Social Spammers in Evolving Multi-Relational Social Network Dataset


This anonymized dataset was collected from a social network website. It contains 5.6 million users and 858 million links between them. Each user has 4 features and is manually labeled as "spammer" or "not spammer". Each link represents an action between two users and carries a timestamp and a type. The network contains 7 anonymized types of links. The original task on the dataset is to classify users as spammers or non-spammers based on their relational and non-relational features.

Files:   TERMS_OF_USE.txt  usersdata.csv.gz  relations.csv.gz  hash.md5

usersdata.csv.gz columns: [userId, sex, timePassedValidation, ageGroup, label]

userId: Anonymized ID
sex: M/F
timePassedValidation: Normalized to 0-1
ageGroup: Binned in 10-year intervals
label: 0 = Not-spammer, 1 = Spammer

Rows (i.e., users or nodes): 5,607,447
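
The column layout above can be read with the Python standard library alone. The following is a minimal sketch, not part of the original release: the function name count_labels and the has_header flag are hypothetical, and whether the file starts with a header row is an assumption you should check against your copy.

```python
import csv
import gzip
from collections import Counter

# Column order as documented for usersdata.csv.gz.
USER_COLUMNS = ["userId", "sex", "timePassedValidation", "ageGroup", "label"]


def count_labels(path, has_header=True):
    """Count users per label (0 = not-spammer, 1 = spammer).

    has_header is an assumption about the file; pass False if the
    first line of your copy is already a data row.
    """
    counts = Counter()
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            record = dict(zip(USER_COLUMNS, row))
            counts[int(record["label"])] += 1
    return counts
```

Calling count_labels("usersdata.csv.gz") returns a Counter keyed by label, which gives a quick view of the class balance before training a classifier.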

relations.csv.gz columns: [day, time_ms, src, dst, relation]

day: Anonymized day [0-9]
time_ms: Time of day in milliseconds
src: Anonymized ID of the source user
dst: Anonymized ID of the destination user
relation: Anonymized ID of the action [0-7]

Rows (i.e., links or actions): 858,247,099
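
With roughly 858 million rows, the relations file is easier to process as a stream than to load at once. The sketch below is illustrative only: iter_relations, relation_counts, and has_header are hypothetical names, the header row is an assumption, and combining day and time_ms into one value assumes time_ms is an integer count of milliseconds since the start of that day.

```python
import csv
import gzip
from collections import Counter

MS_PER_DAY = 24 * 60 * 60 * 1000


def iter_relations(path, has_header=True):
    """Yield (timestamp_ms, src, dst, relation) one row at a time.

    timestamp_ms = day * MS_PER_DAY + time_ms, under the assumption
    that time_ms is an integer offset within the anonymized day.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for day, time_ms, src, dst, relation in reader:
            timestamp_ms = int(day) * MS_PER_DAY + int(time_ms)
            yield timestamp_ms, src, dst, int(relation)


def relation_counts(path, has_header=True):
    """Count how many links of each relation type the file contains."""
    counts = Counter()
    for _, _, _, relation in iter_relations(path, has_header):
        counts[relation] += 1
    return counts
```

Because iter_relations is a generator, memory use stays constant regardless of file size; only the per-type counters are kept.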

This dataset was released by Shobeir Fakhraei, with permission from if(we) Inc., as supplementary material to the paper cited below.
Please read TERMS_OF_USE.txt and use the following citation when referring to the dataset:

author = {Fakhraei, Shobeir and Foulds, James and Shashanka, Madhusudana and Getoor, Lise},
title = {Collective Spammer Detection in Evolving Multi-Relational Social Networks},
booktitle = {Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
series = {KDD '15},
year = {2015},
isbn = {978-1-4503-3664-2},
location = {Sydney, NSW, Australia},
pages = {1769--1778},
doi = {10.1145/2783258.2788606},
publisher = {ACM},

Paper's PDF: [ACM DL]

Paper's Code:

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Shobeir Fakhraei © 2017