Quality-biased ranking of short texts in microblogging services

Date

2011

Publisher

Asian Federation of Natural Language Processing (AFNLP)

Abstract

The abundance of user-generated content comes at a price: its quality may range from very high to very low. We propose a regression approach that incorporates various features to recommend short-text documents from Twitter, with a bias toward content quality. The approach is built on a linear regression model that includes a regularization factor inspired by the content conformity hypothesis: documents similar in content may have similar quality. We test the system on the Edinburgh Twitter corpus. Experimental results show that the regularization factor inspired by this hypothesis can improve ranking performance, and that using unlabeled data can improve it further. Comparative results show that our method outperforms several baseline systems. We also conduct a systematic feature analysis and find that content quality features are dominant in short-text ranking.
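The abstract does not spell out the objective, so the following is only a minimal sketch of one common way to encode the content conformity hypothesis: a graph-Laplacian smoothness penalty added to ridge-regularized linear regression, with the similarity graph built over both labeled and unlabeled tweets. It assumes the penalty takes the form alpha * sum over i<j of S_ij (w.x_i - w.x_j)^2, where S_ij is the cosine similarity of term vectors. The function names (`fit_conformity_regression`, `score`), the weights `alpha` and `beta`, and the use of cosine similarity are all hypothetical illustrations and are not taken from the paper; its exact regularizer may differ.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_conformity_regression(feats: Matrix, labels: Vector, terms: Matrix,
                              alpha: float = 1.0, beta: float = 0.1) -> Vector:
    """Hypothetical sketch, not the paper's exact model.

    feats:  feature vectors for all documents; the first len(labels) are labeled,
            the rest are unlabeled and only enter through the similarity penalty.
    labels: quality scores for the labeled prefix.
    terms:  term vectors used only to measure content similarity.
    Minimizes  sum_labeled (w.x_i - y_i)^2
             + alpha * sum_{i<j} S_ij (w.x_i - w.x_j)^2   (= alpha * f^T L f)
             + beta * ||w||^2
    via the normal equations (X_l^T X_l + alpha X^T L X + beta I) w = X_l^T y.
    """
    n, d, nl = len(feats), len(feats[0]), len(labels)
    # Similarity graph S over labeled + unlabeled docs, and Laplacian L = D - S.
    s = [[cosine(terms[i], terms[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    lap = [[(sum(s[i]) if i == j else 0.0) - s[i][j] for j in range(n)]
           for i in range(n)]
    # L X, an n x d matrix.
    lx = [[sum(lap[i][k] * feats[k][c] for k in range(n)) for c in range(d)]
          for i in range(n)]
    a = [[0.0] * d for _ in range(d)]
    rhs = [0.0] * d
    for p in range(d):
        for q in range(d):
            fit = sum(feats[i][p] * feats[i][q] for i in range(nl))
            smooth = sum(feats[i][p] * lx[i][q] for i in range(n))
            a[p][q] = fit + alpha * smooth + (beta if p == q else 0.0)
        rhs[p] = sum(feats[i][p] * labels[i] for i in range(nl))
    return solve(a, rhs)


def score(w: Vector, x: Sequence[float]) -> float:
    """Predicted quality score used to rank a tweet."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

For example, with two labeled tweets and one unlabeled tweet whose wording matches the first, `fit_conformity_regression([[1.0, 0.2], [0.1, 1.0], [0.9, 0.3]], [1.0, 0.0], [[1, 1, 0], [0, 0, 1], [1, 1, 0]], alpha=0.5)` returns weights under which the unlabeled tweet's score is pulled toward that of its look-alike. Setting `alpha=0` reduces the sketch to plain ridge regression, which is the natural baseline for checking whether the conformity term helps.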

Description

Meeting: 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November 8–13, 2011

Keywords

USER GENERATED CONTENT, BLOGGING, DATA QUALITY

Citation

Minlie Huang, Yi Yang, & Xiaoyan Zhu (2011). Quality-biased Ranking of Short Texts in Microblogging Services. Proceedings of the 5th International Joint Conference on Natural Language Processing, 373-382.