Social Spammers in Evolving Multi-Relational Social Network Dataset


This anonymized dataset was collected from a social network website. It contains 5.6 million users and 858 million links between them. Each user has 4 features and is manually labeled as "spammer" or "not spammer". Each link represents an action between two users and includes a timestamp and a type. The network contains 7 anonymized types of links. The original task for this dataset is to identify (i.e., classify) the spammer users based on their relational and non-relational features.

Files:   TERMS_OF_USE.txt  usersdata.csv.gz  relations.csv.gz  hash.md5
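
Because the archive ships with hash.md5, the downloaded files can be checked before use. The sketch below is one way to do that with the Python standard library. It assumes hash.md5 uses the common md5sum layout ("<hex digest>  <file name>" per line); that layout is not documented here, so adjust the parser if the file differs. The function names are illustrative only.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum-style lines into {file name: hex digest}.

    Assumption: each line looks like "<hex digest>  <file name>".
    Lines that do not split into two parts are ignored.
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading "*" on the file name.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(directory, listing_name="hash.md5"):
    """Map each listed file name to True if its MD5 matches, else False."""
    directory = Path(directory)
    expected = parse_md5_listing((directory / listing_name).read_text())
    return {
        name: md5_of_file(directory / name) == digest
        for name, digest in expected.items()
    }
```

Calling verify(".") in the download directory returns a dictionary keyed by file name; any False value suggests a corrupted or incomplete download.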

usersdata.csv.gz columns: [userId, sex, timePassedValidation, ageGroup, label]

userId: Anonymized ID
sex: M/F
timePassedValidation: Normalized to the range 0-1
ageGroup: Binned into 10-year intervals
label: 0 = Not-spammer, 1 = Spammer

Rows (i.e., users or nodes): 5,607,447
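
The user table described above can be streamed directly from the compressed file. The following is a minimal sketch using only the standard library. It assumes the file has no header row and that the columns appear in the order listed above; both are assumptions, so inspect the first few lines before relying on it. Values are kept as strings to avoid guessing at encodings.

```python
import csv
import gzip
from collections import Counter

# Column order as listed in this README (assumed to match the file).
USER_COLUMNS = ["userId", "sex", "timePassedValidation", "ageGroup", "label"]


def iter_users(path):
    """Yield one dict per user row from a gzip-compressed CSV file.

    Assumption: no header row. Rows with an unexpected number of
    fields (for example blank lines) are skipped.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != len(USER_COLUMNS):
                continue
            yield dict(zip(USER_COLUMNS, row))


def label_counts(path):
    """Count how many users carry each label value (as strings)."""
    counts = Counter()
    for user in iter_users(path):
        counts[user["label"]] += 1
    return counts
```

For example, label_counts("usersdata.csv.gz") gives the class balance between label "0" (not spammer) and label "1" (spammer) without loading the whole table into memory.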

relations.csv.gz columns: [day, time_ms, src, dst, relation]

day: Anonymized day [0-9]
time_ms: Time of day in milliseconds
src: Anonymized ID of the source user
dst: Anonymized ID of the destination user
relation: Anonymized ID of the action [1-7]

Rows (i.e., links or actions): 858,247,099
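
With roughly 858 million rows, the relations file is best processed as a stream. The sketch below shows one possible way to combine day and time_ms into a single timestamp and to derive a simple relational feature: per-user counts of outgoing actions by relation type. It assumes no header row, integer-valued day, time_ms, and relation fields, and the column order listed above. The helper names and the choice of feature are illustrative and are not taken from the paper.

```python
import csv
import gzip
from collections import defaultdict

MS_PER_DAY = 24 * 60 * 60 * 1000


def iter_relations(path):
    """Yield (timestamp_ms, src, dst, relation) tuples from a gzip CSV.

    Assumptions: no header row; columns are
    [day, time_ms, src, dst, relation]; day, time_ms, and relation
    are integers. timestamp_ms = day * MS_PER_DAY + time_ms places
    all days on one time axis.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != 5:
                continue  # skip blank or malformed lines
            day, time_ms, src, dst, relation = row
            yield int(day) * MS_PER_DAY + int(time_ms), src, dst, int(relation)


def out_degree_by_relation(path):
    """Count, per source user, the actions sent for each relation type."""
    counts = defaultdict(lambda: defaultdict(int))
    for _, src, _, relation in iter_relations(path):
        counts[src][relation] += 1
    return counts
```

The nested dictionary holds one entry per active source user, so on the full file it can need several gigabytes of memory; processing one day at a time or writing partial counts to disk may be preferable.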

This dataset was released by Shobeir Fakhraei, with permission from if(we) Inc., as supplementary material for the following paper.
Please read TERMS_OF_USE.txt and use the following citation when referring to the dataset:

author = {Fakhraei, Shobeir and Foulds, James and Shashanka, Madhusudana and Getoor, Lise},
title = {Collective Spammer Detection in Evolving Multi-Relational Social Networks},
booktitle = {Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
series = {KDD '15},
year = {2015},
isbn = {978-1-4503-3664-2},
location = {Sydney, NSW, Australia},
pages = {1769--1778},
doi = {10.1145/2783258.2788606},
publisher = {ACM},

Paper's PDF: ACM DL (https://doi.org/10.1145/2783258.2788606)

Paper's Code:

