"Too Many Jelly Beans?"  

This is an archive of my old Stanford pages. These pages will cease to be updated. To find out what I'm up to these days, check out paulheymann.com.


I am a Ph.D. student in the Computer Science Department, working with Professor Hector Garcia-Molina as part of the InfoLab.

Stanford InfoLab
Stanford InfoBlog
Stanford InfoLab DBPubs Publication Server


Recently, I have been investigating collaborative tagging systems. Tagging systems are based around "tags": (usually) single-word, user-contributed keyword annotations. The big difference between tags and traditional keyword annotations is that any user can contribute tags, whereas traditional keyword annotations are usually added by authors or librarians. This allows tagging to scale to massive and dynamic corpora on the web.

Popular collaborative tagging systems work pretty well, but they also have some problems.

Tags have caveats for text corpora (February 2007–February 2008). Social bookmarking systems are a type of tagging system for URLs. We looked at how these systems can impact web search, in work titled "Can Social Bookmarking Improve Web Search?". We found that tags are often redundant, though other features of social bookmarking systems make them valuable. More info...

Users spam tag sites (February–November 2007). When users can add anything they like, some of them add spam. In recent work on Tag Spam, we looked at methods to fight tag spam and at models of attackers in tagging systems. More info...

Tags are flat and disorganized (January–April 2006). One big challenge is to decide how to organize and interface with tags. We did some early work on organizing tags into Tag Hierarchies in 2006. More info...

Ultimately, the question is: how can we improve tags, and how do tags compare to other types of metadata that scale to millions of users?


Gates Building, Room 424
353 Serra Mall
Stanford, CA 94305
cs-stanford-paul AT heymann.be