E-mail spam has existed for at least 15 years now, and arguably over 25. This seems to be due to the combination of an overly naive protocol (SMTP) and the profitability of exceedingly low response rates.
The one benefit of this long history is that a plethora of responses to e-mail spam have been proposed. In fact, it's gotten to the point where new proposals are even met with form responses, like this (mirrored here).
There are really a lot of things people have proposed. For example:
While there has been a great deal of work figuring out how to fight e-mail spam, there has been relatively little thinking about which solutions are general to all social systems, and which solutions are not.
My main contribution was to try to make a taxonomy of spam fighting solutions, in particular, as they related to social web sites. In the magazine paper below (from IEEE Internet Computing) I lay out the following taxonomy:
In the case of e-mail spam, it has only really been possible to implement detection-based methods, because of the nature of SMTP. In the case of web search, it has only really been common to implement demotion-based methods.
The really exciting thing about social websites is that the system owner now can use a wide variety of spam fighting tools that have been developed previously. For example, while it has been really difficult to get users to use proof-of-work for e-mail systems, the owner of a social website could instantly implement such a feature (for example, using WP Hashcash).
There is a lot of future work to do here. While it is now possible to take a more general approach to the spam problem, the big question is what sorts of things users will tolerate. Will a user still go to a website with HashCash? With a CAPTCHA? With limits on posting? Only time will tell.
More directly related to tagging is the question of how to fight spam in tagging systems. In other related work, Georgia Koutrika and Frans Effendi in particular (as well as a few others in our lab) have been looking at this problem.
However, one challenge is that the top tagging sites tend to be pretty proactive about fighting tag spam. Furthermore, what tag spam there is tends to be pretty obvious for one reason or another. Unlike the web or e-mail where many iterations between spammers and anti-spammers have taken place, tagging is relatively new.
As a result, it seems to make more sense at this point to simulate tag spam and look at the result of various tag anti-spam measures.
The model in the papers below has a few issues with it (for example, users contribute a uniform number of tags rather than a power law distributed number of tags). However, it leads to interesting results. For example, it leads to a rejection of limit-based techniques because limits impede both good users and bad, and the more good data we have the better we can do at detecting and demoting spam.
Title: | Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges |
Authors: | Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina |
Type: | Magazine Article |
Keywords: | Social Web Sites, Spam, Tagging |
Accessible: | (info) (pdf) |
Description: | This is a magazine article in IEEE Internet Computing. The main contribution is a taxonomy of spam fighting approaches, and a discussion of their applicability to social websites. Tagging is discussed as a potential use case, and we also give a summary of the previous tag spam work from below. |
Title: | Combating Spam in Tagging Systems |
Authors: | Georgia Koutrika, Frans Effendi, Zoltan Gyongyi, Paul Heymann, Hector Garcia-Molina |
Type: | Workshop Paper |
Keywords: | Tagging, Spam |
Accessible: | (info) |
Description: | This paper presents a model for spam in tagging systems. The model is realistic in some respects and less realistic in others, but it nonetheless leads to some interesting insights. For example, limiting tag contributions by all users is unlikely to help in confronting spam from malicious users. |
Title: | Combating Spam in Tagging Systems: An Evaluation |
Authors: | Georgia Koutrika, Frans Effendi, Zoltan Gyongyi, Paul Heymann, Hector Garcia-Molina |
Type: | Technical Report |
Keywords: | Tagging, Spam |
Accessible: | (info) (pdf) |
Description: | This is a technical report version of the workshop paper above. |
Title: | Combating Spam in Tagging Systems |
Authors: | Georgia Koutrika, Frans Effendi, Zoltan Gyongyi, Paul Heymann, Hector Garcia-Molina |
Type: | Technical Report |
Keywords: | Tagging, Spam |
Accessible: | (info) (pdf) |
Description: | This is a technical report version of a journal paper to appear in ACM Transactions on the Web that expands on the workshop paper, above. |