Sunday, October 16, 2005

Relevance in blog search is very difficult. Google-type PageRank analysis, which looks at incoming links to a piece of content, simply doesn't work because new content doesn't have much in the way of links. Until now, no one has come up with a way to properly sort blog posts by relevance, and the general default way of showing results is reverse-chronological, which simply puts the newest stuff at the top.

A very interesting problem: how do you provide relevant results for new, dynamic posts that don't have a lot of links? Memeorandum still works on the concept of how many links point to a post. How do you provide relevant search results for material that has not been linked to yet? Maybe Sphere has an answer...

Sphere appears to have solved the problem, or at least taken big steps in the right direction. Their approach involves three key algorithms: an analysis of links into and out of a blog, an analysis of metadata around a post (links, post frequency, length of posts, etc.), and something Tony calls their “secret sauce”, which is content semantic analysis to filter out spam and to understand what a blog post is talking about.
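
To make the idea concrete, here is a rough sketch of how three signals like these could be blended into a single relevance score. This is not Sphere's algorithm, which has not been published. Every name, weight, and formula below (Post, link_score, the 0.3/0.2/0.5 weights, the saturation constant of 10, and the precomputed topic_overlap and spam_probability fields) is a made-up assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical per-post features; none of these names come from Sphere."""
    title: str
    blog_inlinks: int         # links pointing into the blog
    blog_outlinks: int        # links pointing out of the blog
    posts_per_week: float     # how often the blog publishes
    word_count: int           # length of this post
    topic_overlap: float      # 0..1, assumed output of some content analysis
    spam_probability: float   # 0..1, assumed output of some spam classifier


def link_score(post: Post) -> float:
    # Saturating score in [0, 1): a blog with no links scores 0,
    # and extra links matter less and less. The constant 10 is arbitrary.
    total = post.blog_inlinks + post.blog_outlinks
    return total / (total + 10)


def metadata_score(post: Post) -> float:
    # Average of two capped signals: post length and posting frequency.
    length = min(post.word_count / 500, 1.0)
    frequency = min(post.posts_per_week / 5, 1.0)
    return 0.5 * length + 0.5 * frequency


def content_score(post: Post) -> float:
    # On-topic posts score high; likely spam is pushed toward zero.
    return post.topic_overlap * (1.0 - post.spam_probability)


def relevance(post: Post, weights=(0.3, 0.2, 0.5)) -> float:
    # Weighted sum of the three signals; the weights are illustrative guesses.
    w_link, w_meta, w_content = weights
    return (w_link * link_score(post)
            + w_meta * metadata_score(post)
            + w_content * content_score(post))


def rank(posts):
    # Most relevant first, instead of newest first.
    return sorted(posts, key=relevance, reverse=True)
```

The point of weighting the content signal most heavily in this sketch is that a brand-new, on-topic post with no links at all can still outrank a heavily linked but spammy one, which is exactly the case where pure link counting falls down.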