Learning to Identify Review Spam

Li, Fangtao; Huang, Minlie; Yang, Yi; Zhu, Xiaoyan

Date

2011-07

Publisher

AAAI Press / International Joint Conferences on Artificial Intelligence, Menlo Park, California

Abstract

In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are genuine and trustworthy. However, they may encounter fake opinions, also called opinion spam. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their own products or defame their competitors’ products. It is important to identify and filter out review spam. Previous work focuses only on heuristic rules, such as helpfulness voting or rating deviation, which limits performance on this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features on spam identification. We also observe that review spammers consistently write spam. This provides another view for identifying review spam: we can determine whether the author of a review is a spammer. Based on this observation, we propose a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that the proposed method is effective. Our machine learning methods achieve significant improvements over the heuristic baselines.
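
The two-view co-training idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid learner is a toy stand-in for whatever classifier the paper uses, the names `CentroidClassifier`, `fit_views`, and `co_train` are hypothetical, and the feature values are invented. The sketch assumes each review has one feature vector describing the review itself and another describing its author, and that each view's classifier pseudo-labels the unlabeled examples it is most confident about.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, Vector, int]   # (review view, reviewer view, label: 1 = spam)
Unlabeled = Tuple[Vector, Vector]      # (review view, reviewer view)


class CentroidClassifier:
    """Toy nearest-centroid learner; a stand-in for any real classifier."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence); confidence is the squared-distance margin."""
        dists = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                 for lab, c in self.centroids.items()}
        best = min(dists, key=dists.get)
        return best, max(dists.values()) - dists[best]


def fit_views(labeled: List[Labeled]) -> Tuple[CentroidClassifier, CentroidClassifier]:
    """Train one classifier per view on the current labeled set."""
    y = [ex[2] for ex in labeled]
    return (CentroidClassifier().fit([ex[0] for ex in labeled], y),
            CentroidClassifier().fit([ex[1] for ex in labeled], y))


def co_train(labeled: List[Labeled], unlabeled: List[Unlabeled],
             rounds: int = 5, per_round: int = 1):
    """Each round, every view moves its most confident pool examples to the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked: Dict[int, int] = {}  # pool index -> pseudo-label
        for view, clf in enumerate(fit_views(labeled)):
            ranked = sorted(range(len(pool)),
                            key=lambda i: clf.predict(pool[i][view])[1],
                            reverse=True)
            for i in ranked[:per_round]:
                # If both views pick the same example, the first view's label is kept.
                picked.setdefault(i, clf.predict(pool[i][view])[0])
        labeled += [(pool[i][0], pool[i][1], lab) for i, lab in picked.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in picked]
    return fit_views(labeled) + (labeled,)


# Invented toy data: high values loosely mean "spam-like" in both views.
seed = [((0.9, 0.8), (0.9,), 1), ((0.1, 0.2), (0.1,), 0)]
pool = [((0.8, 0.9), (0.8,)), ((0.2, 0.1), (0.2,)),
        ((0.7, 0.7), (0.95,)), ((0.3, 0.2), (0.05,))]

review_clf, reviewer_clf, expanded = co_train(seed, pool)
print(review_clf.predict((0.85, 0.85))[0], len(expanded))
```

The design point the sketch tries to capture is that each view can supply labels the other view would not have been confident about, which is how the unlabeled data gets used.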

Description

Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Keywords

SENTIMENT ANALYSIS, SPAM, PRODUCT REVIEWS, MACHINE LEARNING, MARKETING

Citation

Li, F., Huang, M., Yang, Y., & Zhu, X. (2011). Learning to Identify Review Spam. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, ES. (pp. 2488-2493). AAAI / International Joint Conferences on Artificial Intelligence Press. doi:10.5591/978-1-57735-516-8/IJCAI11-414

URI

http://hdl.handle.net/10625/49046

Collections

IDRC Research Results / Résultats de recherches du CRDI
2010-2019 / Années 2010-2019
Breaking the barriers to Internet access / Faire tomber les obstacles entravant l’accès à Internet
