Google and Del.icio.us
Del.icio.us is a web application for creating what Del.icio.us itself calls “social bookmarks”. I enter my bookmarks and tag each one with the meta-data that I think is relevant to it, so that it is easier for me to find the next time. But then Del.icio.us does another thing: it allows syndication of these bookmarks. When I create meta-data for my bookmarks, what I am in effect doing is categorizing them. In other words, I am classifying them. Another way to put it is that I am adding “semantics” to them (of course, in a limited way). If I share my bookmarks on Del.icio.us, I increase the utility of what I have stored for my personal use. In a sense, I am putting out some knowledge I have and labelling it in a manner that incidentally lets other people find it too. For example, people who are looking for sites on a particular topic can think of all the terms that might apply to the site they are looking for. Then they can go to Del.icio.us and browse the public bookmarks (created by other people) stored under those terms. In a sense, one can use the knowledge of others to find more, and more reliable, sources of information. It is a kind of filtering process.
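To make the idea concrete, here is a minimal sketch of looking up shared bookmarks by tag. It is not how Del.icio.us is actually implemented; the sample bookmarks and the names `BOOKMARKS`, `build_tag_index` and `find_by_tags` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (user, url, tags). Nothing here comes from Del.icio.us.
BOOKMARKS = [
    ("alice", "http://example.com/python-tutorial", {"python", "tutorial"}),
    ("bob", "http://example.com/python-tutorial", {"python", "programming"}),
    ("carol", "http://example.org/search-engines", {"search", "google"}),
]


def build_tag_index(bookmarks):
    """Map each tag to the set of URLs that any user filed under it."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index


def find_by_tags(index, terms):
    """Return the URLs that have been filed under every one of the given terms."""
    url_sets = [index.get(term, set()) for term in terms]
    return set.intersection(*url_sets) if url_sets else set()
```

Because the index pools everyone's tags, a search for both "python" and "programming" finds the tutorial even though no single user applied both tags to it. That is the sense in which one person's labelling helps another person's search.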
Now take Google, on the other hand. How does it show such excellent search results? It uses a mechanism called PageRank. Basically, each page that it indexes has a “PageRank” calculated by a formula that takes into account the other pages linking to that page: how many there are and how highly ranked they are themselves. In other words, it is again using other people’s knowledge to index information and help retrieve it.
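As a rough illustration, the sketch below follows the simplified textbook formulation of PageRank, computed by repeated iteration over a tiny link graph. It is certainly not Google's production system; the function name, the toy graph and the iteration count are my own assumptions, and 0.85 is just the commonly cited damping value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (made up): "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" ends up ranked highest because two pages point to it, "a" comes next because the highly ranked "c" points to it, and "b", which nobody links to, comes last.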
But currently, Google and Del.icio.us work in different domains: Del.icio.us is a widely used site, but mainly for organizing one’s bookmarks, while Google’s forte is retrieving information present on the internet. Del.icio.us offers a syndication mechanism such that a user can subscribe to a particular tag (say ‘XYZ’), and each time some other person (probably at the other end of the world) tags one of his own bookmarks with ‘XYZ’ (on the Del.icio.us site, of course), the first person gets this bookmark in his feed aggregator: another source of knowledge in his information inbox. Google, on the other hand, has a different methodology. It doesn’t ask users for information, but takes what it finds using full-text search. Both approaches are important in their own ways. Del.icio.us doesn’t need full-text indexing; it is sufficient to search the tags. It uses people’s knowledge to classify and cluster information rather than an automated mechanism based on full-text search and domain discovery.
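The tag subscription idea can be sketched in a few lines. This is only an in-memory stand-in that I made up for illustration (the real service delivers feeds to an aggregator), and the class `TagFeeds` and its method names are hypothetical.

```python
from collections import defaultdict


class TagFeeds:
    """Toy model of subscribing to a tag and receiving other people's bookmarks."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # tag -> names of subscribed users
        self.inboxes = defaultdict(list)     # user -> list of (tag, url, tagged_by)

    def subscribe(self, user, tag):
        self.subscribers[tag].add(user)

    def add_bookmark(self, user, url, tags):
        """Record a new bookmark and deliver it to everyone subscribed to its tags."""
        for tag in tags:
            for subscriber in self.subscribers.get(tag, ()):
                if subscriber != user:  # don't echo a user's own bookmarks back
                    self.inboxes[subscriber].append((tag, url, user))


feeds = TagFeeds()
feeds.subscribe("alice", "XYZ")
feeds.add_bookmark("bob", "http://example.com/page", ["XYZ", "other"])
# alice's inbox now holds bob's bookmark, filed under the tag she subscribed to
```

Note that alice never has to know bob exists. The tag is the only thing the two of them share.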
Over the weekend, I was thinking: what would happen if Google tied up with Del.icio.us? For one, what we would get is intelligent clustering of search results. What Google currently lacks is the ability to club search results into meaningful categories for easy traversal, a feature provided by other search engines like Vivisimo and Mooter. Vivisimo and Mooter do their clustering mainly with automated techniques like domain discovery. But these are to a large extent non-intuitive and based mainly on the content of the files being indexed. This has a few drawbacks. Assume I am writing an essay on a piece of Python code in which I never mention the word “Python”. An automated tool might fail to categorize this as a document “about” Python. Even if it did classify it correctly, it would be a mechanical solution, not something intuitive, not something dynamic. Humans, who are the final users of that information, are in a better position to say what a particular document is about.
But currently, one might consider it tedious to categorize each site one visits into the topics that the site relates to. Who has the time to add all this information? Yet we already have such a corpus of information readily available (although currently on a limited scale). This kind of human-based clustering is exactly the basis of sites like Del.icio.us and Flickr (the latter does for photographs what Del.icio.us does for bookmarks). True, these sites were not made specifically with this aim in mind. Flickr and Del.icio.us use tags as a means to let their users easily manage and retrieve the information they store on these sites and also to share it with others (for example, using syndication). So users can subscribe to a particular category of information, and any new entries under that category get delivered to their mailboxes automatically.
If a traditional search engine like Google can tie up with sites like Del.icio.us and Flickr, which have already built a significant index based on meta-data, it can offer more relevant search results and enable people to process those results in a more meaningful manner, for example by using Del.icio.us’s meta-data to categorize the search results. My feeling is that there is a latent resource just waiting to be tapped by search engine companies. The resource I am referring to is the people who use these search engines. Every time we use a search engine and go through the information on the sites it returns, we are in a position to say something about each site: to say what that site is about. Classifying the sites we want to visit later into directories is one way to do this. We do this regularly when we organise our bookmarks into folders.
But Del.icio.us went one step further by letting us apply tags (a more dynamic way of managing information than folders; look at what Gmail is doing) instead of putting bookmarks into some kind of folder, and by letting us keep this information online. Del.icio.us mainly stores URLs, which are exactly what Google and other search engines fetch for us. Currently, Del.icio.us’s bookmarks are limited to people who use and subscribe to that service. But this huge index of information could be used by Google to create something that will benefit the internet-using community as a whole. Stretching my imagination a bit, maybe a semantic search engine, with the tags and meta-data provided by humans supplying the semantics for the sites that the search engine retrieves.
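Here is a small sketch of what categorizing search results with human-supplied tags might look like. Everything in it is an assumption for illustration: the function `cluster_by_tags`, the example URLs and the tag data are invented, and no real Google or Del.icio.us interface is involved.

```python
from collections import defaultdict


def cluster_by_tags(result_urls, url_tags):
    """Group search results under the tags people have given them.

    `result_urls` is the ordered list a search engine returned;
    `url_tags` maps a URL to the set of tags users applied to it.
    """
    clusters = defaultdict(list)
    for url in result_urls:
        tags = url_tags.get(url)
        if not tags:
            # Nobody has bookmarked this page yet, so there is no human label for it.
            clusters["(untagged)"].append(url)
            continue
        for tag in sorted(tags):
            clusters[tag].append(url)
    return dict(clusters)


# Hypothetical data: the essay never contains the word "Python",
# but the people who bookmarked it tagged it that way.
results = [
    "http://example.com/essay",
    "http://example.com/snakes",
    "http://example.com/unknown",
]
tags = {
    "http://example.com/essay": {"python", "programming"},
    "http://example.com/snakes": {"python", "reptiles"},
}
clusters = cluster_by_tags(results, tags)
```

The essay lands in the "python" cluster purely because humans said so, and the companion tags "programming" and "reptiles" separate the two meanings of the word, which a purely content-based tool could not do if the word never appears on the page.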
Is this unthinkable? Let me know.