Prediction of user retweets based on social neighborhood information and topic modelling

Celayes, Pablo Gabriel; Domínguez, Martín Ariel

View/Open

MICAI_2017_-Celayes_-_Dominguez_99990155.pdf (315.4Kb)

Date

2017

Abstract

Twitter and other social networks have become a fundamental source of information and a powerful tool for spreading ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion on Twitter is to study how a user's social neighborhood influences the construction of her retweeting preferences; in particular, to what extent the preferences of a user can be predicted from the preferences of her neighborhood.

We build our own sample graph of Twitter users and study the problem of predicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We train and evaluate user-centered binary classification models that predict retweets with an average F1 score of 87.6%, based purely on social information, that is, without analyzing the content of the tweets.

For users who get low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline that includes a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
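The social-only setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy follow graph, the user names, the helper functions, and the "at least two neighbors retweeted" rule are all hypothetical stand-ins, the last one replacing the trained per-user classifier. The sketch only shows how a per-user dataset might be assembled (one binary feature per neighbor, one row per tweet) and how an F1 score is computed from the predictions.

```python
from typing import Dict, List, Set, Tuple


def second_degree_neighborhood(follows: Dict[str, Set[str]], user: str) -> List[str]:
    """Users followed by `user` plus users followed by those (excluding `user`)."""
    followed = follows.get(user, set())
    followed_by_followed: Set[str] = set()
    for f in followed:
        followed_by_followed |= follows.get(f, set())
    return sorted((followed | followed_by_followed) - {user})


def build_dataset(
    retweets: Dict[str, Set[str]], user: str, neighbors: List[str], tweets: List[str]
) -> Tuple[List[List[int]], List[int]]:
    """One row per tweet; column j is 1 if neighbor j retweeted that tweet."""
    X = [[1 if t in retweets.get(n, set()) else 0 for n in neighbors] for t in tweets]
    y = [1 if t in retweets.get(user, set()) else 0 for t in tweets]
    return X, y


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: who follows whom, and who retweeted which tweet.
follows = {"ana": {"bob", "cat"}, "bob": {"dan"}, "cat": {"dan", "ana"}, "dan": set()}
retweets = {
    "ana": {"t1", "t3"},
    "bob": {"t1", "t2"},
    "cat": {"t1", "t3"},
    "dan": {"t3", "t4"},
}
tweets = ["t1", "t2", "t3", "t4"]

neighbors = second_degree_neighborhood(follows, "ana")
X, y = build_dataset(retweets, "ana", neighbors, tweets)

# Stand-in for a trained classifier (assumption): predict a retweet when
# at least two neighbors retweeted the tweet.
y_pred = [1 if sum(row) >= 2 else 0 for row in X]
score = f1_score(y, y_pred)
```

In a realistic setting the threshold rule would be replaced by a classifier fitted separately for each user on a training split, with the F1 score measured on held-out tweets and then averaged over users.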

URI

http://hdl.handle.net/11086/552488

Collections

Ponencias 2017

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International