"Too Many Jelly Beans?"  

UPDATE 2011/08/11:
This site is pretty out of date right now; hopefully I'll get a chance to update it at some point. The most recent update was probably in 2008 or 2009. Yikes!

I am no longer a Ph.D. student. These days I work at Bionica (Bionica Human Computing LLC, to be exact), purveyor of fine crowdsourcing products.

About

I am a Ph.D. student in the Computer Science Department, working with Professor Hector Garcia-Molina as part of the InfoLab.

Stanford InfoLab
Stanford InfoBlog
Stanford InfoLab DBPubs Publication Server

Tagging

Recently, I have been investigating collaborative tagging systems. Tagging systems are built around "tags": (usually) single-word, user-contributed keyword annotations. The key difference between tags and traditional keyword annotations is that any user can contribute tags, whereas keyword annotations are usually added by authors or librarians. This allows tagging to scale to massive, dynamic corpora on the web.

Popular examples of collaborative tagging systems include:

These systems work pretty well, but they also have several problems.

Tags have caveats for text corpora (February 2007–February 2008). Social bookmarking systems are tagging systems for URLs. We looked at how these systems can affect web search, in work we call "Can Social Bookmarking Improve Web Search?". We found that tags are often redundant, although other features of social bookmarking systems make them valuable.

Users spam tag sites (February–November 2007). When users can add anything they like, spam often follows. In recent work on tag spam, we looked at methods for fighting tag spam and at models of attackers in tagging systems.

Tags are flat and disorganized (January–April 2006). One big challenge is deciding how to organize tags and how users should interact with them. In 2006, we did some early work on organizing tags into tag hierarchies.

Ultimately, the question is: how can we improve tags, and how do tags compare to other types of metadata that scale to millions of users?

Address

Gates Building, Room 424
353 Serra Mall
Stanford, CA 94305
cs-stanford-paul AT heymann.be