July 24, 2005

For the Vox Populi: A Comparison of How Some Blog Aggregation and RSS Search Tools Work

UPDATED: Recently, there has been some blogosphere discussion about different blog search services. People have been asking me for a year and a half to compare them, and I've been reluctant. However, after last week's confusion, I decided that if folks like Robert Scoble are having difficulty comparing the search results of different services that we've been using for some time, we really need to get a few things clear for users. Also, Doc Searls suggested that it was about time. And the other day, he said it again in person.

I'm going to do this as a six-part series, the first of which is below, on how services track links to blogs. The second will be on keyword search, the third will cover subscription search (watchlist) performance, the fourth will look at special services and the fifth will look at spam and controls for it. The sixth will summarize and make recommendations about how to best use the services. I picked the five services I look at every day, Technorati, Feedster, Bloglines, Blogpulse and Pubsub, so I'm familiar with how they have behaved over time. I see watchlists or alerts via RSS feeds from all but Bloglines, for both URL and keyword searches, many of which are duplicate searches that also let me track how the services do with their searches. Note that I'm not reviewing Bloglines as a newsreader, partly because I use Netnewswire for the most part, with Bloglines as one of my backup readers, and partly because the other services are not newsreaders at all, so there is nothing to compare it to.

Additionally, Blogpulse had this write-up in Marketing Vox, suggesting it might be a Technorati killer in the estimation of a blogger they were quoting. However, because what Blogpulse covers is fundamentally different, and the two services' philosophies about how to age information are different, their results for URL searches for inbound links are not so similar. Depending on the user's needs, one or the other service may be the better fit. However, with some of the additional features Blogpulse is now offering, it is doing some of the things that bloggers and others really want from other blog aggregation companies but aren't being offered, like rank, citations and recent posts. So in this sense, Blogpulse is different and more interesting, if its information is what you are looking for about yourself or others you want to analyze.

Finally, Adam Penenberg notes that these kinds of services are like public utilities, so it seems like a good time to compare and contrast the services.

This exercise is an attempt to give readers and users a comparison of how the services work, so that they can take best advantage of the strengths and avoid the weaknesses when tracking URLs, keywords, other special services, and alerts, subscriptions or watchlists. (The services each use different terminology in order to differentiate themselves, but users tell me the terminology is just terribly confusing. They wish that as an industry we would settle on one term, use it across all the services, and then get on to figuring out how to provide the service better.)

Matt Hurst of Intelliseek (parent to Blogpulse) has a very informative post on evaluating blog search services. It includes information on search generally, which I think applies very much to evaluating keyword search, the topic of my next post. URL search is a little more straightforward, in that people want to see everything that is linking to the URL they are looking up. But he makes some excellent points.

In disclosure, I should say that to one degree or another, I'm friends with people at all of these companies. I have also worked for Technorati in the past, and I am currently a member of its advisory board.

URL Searches for Inbound Links

Two weeks ago, Scoble compared the inbound link counts for Dave Sifry's blog on Technorati (735 links at the time of Scoble's post) and Bloglines (2,644 links at the time of Scoble's post). However, the two numbers aren't actually comparable. First, Technorati's count is actually for inbound sites or sources. In other words, you can have 10 links from a blog, but that blog counts once as a 'site'. So Dave has 735 blogs that have linked to him at least once, at this moment in time. Technorati also only counts links and sites from blogs that have a link on the front page. Therefore, if a blogger blogs, which bloggers tend to do, their old posts scroll off the front page, and the links in those old posts drop out of the Technorati count at the same time. Blogroll links stay in the counts because they are permanently on the front pages of blogs, but if a blogger's post links to another blog, that link only gets counted so long as it's on the linking blog's home page.

Bloglines, on the other hand, gives a total link count for all of Bloglines' history. If a blogger is linked to 10 times in the history of Bloglines' aggregation of links, those links count as ten toward the total, as they do for Dave's Bloglines total. Bloglines doesn't give a base count of sources doing the linking. Also, Bloglines shows you everything since they started tracking blogs, so Dave's first link goes back to a post on August 22, 2002. Technorati would age that post off their link counts, since that blog no longer shows the post on the front page (it long ago scrolled off the page). However, I wasn't able to look at Dave's first link on Technorati, because the service kept returning error messages about high search volumes, so I can't compare their first result to Bloglines' first result.
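To make the difference between the two counting methods concrete, here is a minimal Python sketch. It is only an illustration of the two philosophies as described above, not either company's actual code. The `Link` record, its field names, the function names and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Link:
    """One inbound link, as a hypothetical aggregator might record it."""
    source_blog: str      # the blog doing the linking
    target_url: str       # the blog being linked to
    posted: date          # date of the linking post or blogroll entry
    on_front_page: bool   # still visible on the source blog's home page?


def front_page_source_count(links, target_url):
    """Technorati-style count: distinct blogs with at least one
    link to the target that is still on their front page."""
    return len({link.source_blog for link in links
                if link.target_url == target_url and link.on_front_page})


def cumulative_link_count(links, target_url):
    """Bloglines-style count: every link ever recorded, including
    repeat links from the same blog and links that have scrolled off."""
    return sum(1 for link in links if link.target_url == target_url)


TARGET = "http://example.com/target-blog"  # made-up URL for illustration
links = [
    Link("blog-a", TARGET, date(2002, 8, 22), False),  # old post, scrolled off
    Link("blog-a", TARGET, date(2005, 7, 20), True),   # recent post
    Link("blog-a", TARGET, date(2005, 7, 21), True),   # another recent post
    Link("blog-b", TARGET, date(2005, 7, 1), True),    # blogroll link
    Link("blog-c", TARGET, date(2004, 1, 5), False),   # old post, scrolled off
]

print(front_page_source_count(links, TARGET))  # 2 (blog-a and blog-b)
print(cumulative_link_count(links, TARGET))    # 5 (every link counts)
```

The same five links give a count of 2 under one method and 5 under the other, which is why the two services' numbers can't be set side by side as if they measured the same thing.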

Note also that Bloglines' total for Dave's blog is now, two weeks after Scoble's post, 2730 links, versus Technorati's total of 712 sources (each blog counts once). Bloglines is higher than two weeks ago, because it has an aggregate count of all links. Technorati is lower, because some blog posts have scrolled off those blogs' front pages, and until new links are made, Dave's source count might continue to fall. And based on each company's information philosophy, this is actually as it should be, and is correctly counted using each methodology. In fact, the difference is very useful, because one can compare Dave's current activity, blogroll and post links at 712 from Technorati, to his historical link count at Bloglines of 2730, maybe discounted a little for duplication of posts. My assessment might be that Dave is currently a heavily linked-to blogger, but three years ago didn't have so many links, and has grown over time, in an upswing to, say, around 2000 links total over the history of his blog. Probably this has occurred because of the growth of Technorati: Dave is its CEO and his blog is the place where he blogs about Technorati, so its link counts have grown as more attention is paid to Technorati.

On the other hand, my blog has 1012 links from Bloglines over the past couple of years (discount 20% for dups) but 205 site links in Technorati. My assessment might then be that Napsterization is more of a steady blog, with about 800 links over the past two or so years. Since I already know that the blog had similar link counts a year ago, I'd say it's more conversational, linking out and in at similar rates over the past year or so. Not much upswing, but a steady ongoing conversation.
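The back-of-the-envelope discounting used in these assessments can be written out as a small Python sketch. The function name is made up for illustration. The 20% duplicate rate is the rough figure quoted above for Napsterization; applying the same rate to Dave's total is purely an assumption here, since the estimate for his blog was only "discounted a little".

```python
def discounted_links(total_links, duplicate_rate):
    """Estimate unique links by discounting an assumed share of duplicates."""
    return round(total_links * (1 - duplicate_rate))


# Napsterization: 1012 Bloglines links, discounted 20% for duplicates.
print(discounted_links(1012, 0.20))  # 810, i.e. roughly 800

# Dave's blog: 2730 Bloglines links; the 20% rate is an assumption.
print(discounted_links(2730, 0.20))  # 2184, i.e. roughly 2000
```

These are rough estimates only. The Bloglines figures count links while the Technorati figures count linking sites, so the discounted totals are still not the same unit as Technorati's 205 and 712.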

Below, in chart form, is a comparison of Technorati, Bloglines (as an information search tool, not a news reader tool), Feedster, Blogpulse and Pubsub. The chart is a PDF (blog software doesn't render html charts so well... but if you have a suggestion about getting this data into my post, please email me at mary@hodder.org) but as feedback for this post comes in, I will update both the post and chart and note the updated time and date. I'm going to treat this survey as somewhat of a wiki, so that I can incorporate feedback to make this the most accurate survey possible.

Please note the footnotes, as they explain additional information about specific categories of information and how specific services work in those categories of activities. Also, note that some services perform poorly in the URL lookup category, but their usefulness will become apparent in the keyword category, or for subscription search or for other special services. Please don't write anyone off due to a poor showing here in the URL section. All five of these services are very valuable, as they each show us different things, and frankly, for my information needs, I want and use all of them each day to track myself, my projects, companies I consult for, and all of my areas of interest, which are numerous. Often, the combination is the only way to get an accurate picture of what is happening online across blogs and RSS feeds.

NOTE: I've updated the file just now to take into account revised and clarified information about Blogpulse and Technorati. Blogpulse has a bug in their URL search, wherein, if the http:// is not at the front of the URL, very little information is returned. And so rather than 9 links for napsterization, there are 477. And Technorati, I wanted to point out, does not count links in its link counts that have scrolled off the front pages of blogs, but they do still show search results that match keywords that have scrolled off. So users may see older results, but not see them in link counts.
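Until that bug is fixed, one way to avoid it on the user's side is to make sure the URL always carries its scheme before it is pasted into a search box or built into a query. The helper below is a minimal Python sketch of that idea. The function name and the example URLs are hypothetical, and nothing here calls or depends on any Blogpulse API.

```python
def with_http_scheme(url):
    """Return the URL with 'http://' prepended if it has no http(s) scheme."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return "http://" + url


print(with_http_scheme("example.org/blog"))         # http://example.org/blog
print(with_http_scheme("http://example.org/blog"))  # unchanged
```

Normalizing the URL this way also makes it easier to run the identical query against every service, so that differences in the results reflect the services and not the way the URL was typed.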

An additional update regarding Bloglines: they noted that they only serve search results from blogs that at least one subscriber has in their list of subscriptions. This has been added to the chart under information philosophy.

PDF file of the comparison of how blog search services work.

Also, please use the comments below to tell me about areas that need more information, or suggestions. I'd like this to be as accurate as possible and will correct or update with information as I find it, or it's sent to me. Thanks very much for suggestions.

Oh, and you have to answer a question to comment, so please remember to do that, or the comment system gives you that obtuse answer that your comment is 'of questionable content', which isn't really true. It's just that you haven't answered the question. Thanks!

Posted by Mary Hodder at July 24, 2005 11:51 PM
Comments

Mary,

Very interesting post, and a good clarification of how different services work. We (Technorati) actually have the aggregate link counts in our databases, and we could show them if people think that this is an interesting statistic, but what we've found is that using it for ranking or other authority calculation is problematic, because it becomes a very easy thing to game - many links (e.g. a robot posting multiple times linking to your blog) may inflate your counts arbitrarily, and by definition, these counts will always increase, giving significant deference and near-gentrification to bloggers who have been around for a long time. I don't think that we really want that. Of course, no system is perfect, but we're trying to keep our authority rankings pretty relevant - thus the focus on links from the blog homepage making a difference in the rankings.

As soon as I get a chance I'll write more about this on my blog, but it will be interesting to hear what other people think about this practice, our goal is to be of service...

Dave

Posted by: David Sifry at July 25, 2005 07:49 AM

This is good stuff. I'll link to it tonight.

Posted by: Robert Scoble at July 25, 2005 01:53 PM

Love to hear what you think of kbcafe.com search.

http://www.kbcafe.com/search.aspx

Posted by: Randy Charles Morin at July 25, 2005 01:56 PM

Great write-up. Tough job, since those services are evolving every day, it seems.

Posted by: jim wilde at July 31, 2005 01:16 PM

This is hugely valuable for those of us in the PR business trying to sort out what to tell our clients. Thank you so much for putting this together, Mary!

Posted by: Lisa Poulson at August 1, 2005 03:55 PM

What about the blog search by Technorati?

Posted by: sodora search at September 10, 2005 03:19 PM

Quo Vadis, Technorati? Down the drain?

The search result for interface translation at Technorati

www.technorati.com/search/interface+translation

renders, disappointingly:
#
Latina street latin translation, latina ass...

Latina street latin translation, latina ass latina pussy. Algeria Interface - Nov 15 10:50 PM Latina street latin translation, latina ass latina pussy.Save to My Web

* Posted 5 hours ago in big saggy tits 0 links Search this blog

#
ADSL配置静态IP的配置参考

timestamps log uptime ! hostname 827 ! ip subnet-zero no ip domain-lookup ! bridge irb ! interface... address and it's a private IP address. ! interface ATM0 no ip address no atm ilmi-keepalive pvc 0/35 encapsulation aal5snap ! bundle-enable dsl operating-mode auto bridge-group 1 ! interface BVI1 ip

* Posted 5 hours ago in 像鸟儿飞ア  0 links Search this blog

#
THE MATADOR de Richard Shepard (USA-2005): en...

nouvelle interface, de nouvelles options, etc... En principe, tout cela est de la cuisine interne, et..., quelques bugs se sont glissés dans cette nouvelle interface! L'adresse email du Dr Devo (moi!), soit... collier, moins collé monté que LOST IN TRANSLATION, par exemple, autre film d'hôtel et d'ennui, à

Posted by: Gisela Strauss at November 18, 2005 12:47 PM

Mary, this review of different blog search services is very interesting. And thanks for the footnotes; they contain a lot of helpful information too. I have also read the PDF file of the comparison of how blog search services work. Thanks for your work!

Posted by: Bruce at December 9, 2005 02:11 PM

I am new to all this and this has been very useful for me. Many thanks Mary, keep up the good work, all the best from Jim.

Posted by: Jim Rockingham at March 15, 2006 12:51 PM