UPDATED: Recently, there has been some blogosphere discussion about different blog search services. People have been asking me for a year and a half to compare them, and I've been reluctant. However, after last week's confusion, I decided that if folks like Robert Scoble are having difficulty comparing the search results of different services that we've been using for some time, we really needed to get a few things clear for users. Also, Doc Searls suggested that it was about time. And the other day, he said it again in person.
I'm going to do this as a six-part series, the first of which is below, on how services track links to blogs. The second will be on key word search, the third will cover subscription search (watchlist) performance, the fourth will look at special services and the fifth will look at spam and controls for it. The sixth will summarize and make recommendations about how to best use the services. I picked the five services I look at every day, Technorati, Feedster, Bloglines, Blogpulse and Pubsub, so I'm familiar with them over time. I see watchlists or alerts via RSS feeds from all but Bloglines, for both URL and keyword searches, many of which are duplicate searches that also allow me to track how the services do with their searches. Note that I'm not reviewing Bloglines as a newsreader, partly because I use Netnewswire for the most part, with Bloglines as one of my backup readers, and partly because the other services are not news readers at all, so there is nothing to compare.
Additionally, Blogpulse had this write up in Marketing Vox, suggesting it might be a Technorati Killer in the estimation of a blogger they were quoting. However, because what Blogpulse covers is fundamentally different, and their philosophies about how to age information are different, the two are not so similar when comparing results of URL searches for inbound links. Depending on the user's needs, one or the other service may suit those needs better. However, with some of the additional features Blogpulse is now offering, it is doing some of the things that bloggers and others really want from other blog aggregation companies, yet aren't being offered, like rank, citations and recent posts. So in this sense, they are different and more interesting, if Blogpulse information is what you are looking for about you or others you want to analyze.
Finally, Adam Pennenberg notes that these kinds of services are like public utilities, so it seems like a good time to compare and contrast the services.
This exercise is an attempt to give readers and users of the services a comparison of how the services work, so that they can take best advantage of the strengths and avoid the weaknesses in order to track URLs, keywords, other special services, and alerts or subscriptions or watchlists. The services each use different terminology in order to differentiate themselves, but users tell me the terminology is just terribly confusing. They wish that as an industry we would settle on one term, use it across all the services, and then get on to figuring out how to provide the service better.
Matt Hurst of Intelliseek (parent to Blogpulse) has a post on evaluating blog search services which is very informative. It includes information on search generally, which I think applies very much to evaluating key word search, which will be covered in my next post. URL search is a little more straightforward in that people want to see everything that is linking to the URL they are looking up. But he makes some excellent points.
In disclosure, I should say that to one degree or another, I'm friends with people at all of these companies. I also worked for Technorati in the past, and am currently a member of its advisory board.
Two weeks ago, Scoble compared the inbound link counts for Dave Sifry's blog on Technorati (735 links at the time of Scoble's post) and Bloglines (2,644 links at the time of Scoble's post). However, the two numbers, as contrasted, aren't actually comparable. First, Technorati's count is actually for inbound sites or sources. In other words, you can have 10 links from a blog, but that blog counts once as a 'site'. So Dave has 735 blogs that have linked to him at least once, at this moment in time. Technorati also only counts links and sites from blogs that have a link on the front page. Therefore, if a blogger blogs, which bloggers tend to do, their old posts scroll off the front page, and the links in those old posts drop out of the Technorati count at the same time. Blogroll links stay in the counts because they are permanently on the front pages of blogs, but if a blogger's post links to another blog, that link only gets counted so long as it's on the linking blog's top home page.
Bloglines, on the other hand, gives a total link count for all of Bloglines' history. If a blogger is linked to 10 times in the history of Bloglines' aggregation of links, those links count as ten towards Dave's Bloglines total. Bloglines doesn't give a base count of sources doing the linking. Also, Bloglines shows you everything since they started tracking blogs, so Dave's first link goes back to a post on August 22, 2002. Technorati would age that post off their link counts, since that blog no longer shows the post on the front page (it long ago scrolled off the page). However, I wasn't able to look at Dave's first link on Technorati, because the service kept returning error messages about high search volumes, so I can't compare their first result to Bloglines' first result.
Note also that Bloglines' total for Dave's blog is now, two weeks after Scoble's post, 2730 links, versus Technorati's total of 712 sources (each blog counts once). Bloglines is higher than two weeks ago, because it has an aggregate count of all links. Technorati is lower, because some blog posts have scrolled off those blogs' front pages, and until new links are made, Dave's source count might continue to fall. And based on each company's information philosophy, this is actually as it should be, and is correctly counted using each methodology. In fact, the difference is very useful, because one can compare Dave's current activities, blogroll and post links at 712 from Technorati, to his historical link count at Bloglines of 2730, maybe discounted a little for duplication of posts. My assessment might be that Dave is currently a heavily linked-to blogger, but three years ago didn't have so many links, and has grown over time, in an upswing to, say, around 2000 links total over the history of his blog. Probably this has occurred because of the growth of Technorati: since he is its CEO and his blog is the place where he blogs about Technorati, his blog has had its link counts grow as more attention is paid to Technorati.
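To make the difference between the two counting methods concrete, here is a minimal sketch in Python. It is only an illustration of the methodologies as I've described them, not either company's actual code. The `Link` record, the function names and the sample blogs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_blog: str     # the blog doing the linking (hypothetical field)
    target_url: str      # the URL being linked to
    on_front_page: bool  # is the link still on the source blog's front page?


def front_page_source_count(links, target):
    """Technorati-style count as described above: each linking blog counts
    once, and only if at least one of its links is still on its front page."""
    return len({link.source_blog for link in links
                if link.target_url == target and link.on_front_page})


def cumulative_link_count(links, target):
    """Bloglines-style count as described above: every link ever recorded
    counts, no matter how old or how many come from the same blog."""
    return sum(1 for link in links if link.target_url == target)


# Made-up sample data: three blogs linking to one target over time.
TARGET = "http://example.com/dave"
sample_links = [
    Link("blog-a", TARGET, True),    # blogroll link, always on the front page
    Link("blog-a", TARGET, True),    # recent post
    Link("blog-a", TARGET, False),   # old post, scrolled off
    Link("blog-b", TARGET, False),   # old post, scrolled off
    Link("blog-c", TARGET, True),    # recent post
    Link("blog-c", "http://example.com/other", True),  # different target
]

print(front_page_source_count(sample_links, TARGET))
print(cumulative_link_count(sample_links, TARGET))
```

With this sample data, the first count sees two sources (blog-a and blog-c) while the second sees five links, which is the same kind of gap Scoble ran into.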
On the other hand, my blog has 1012 links from Bloglines over the past couple of years (discount 20% for dups) but 205 site links in Technorati. My assessment might then be that Napsterization is more of a steady blog.. with 800 links over the past two or so years, and since I already know that the blog had similar link counts a year ago.. that it's more conversational, linking out and in at similar rates over the past year or so. Not much upswing but a steady conversation ongoing.
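The duplicate discount above is just back-of-the-envelope arithmetic. As a rough sketch (the function name is mine, and the 20% rate is my own estimate, not a figure from Bloglines):

```python
def discounted_links(total_links, duplicate_rate=0.20):
    """Estimate distinct links by discounting an assumed share of duplicates.

    duplicate_rate is a guess (20% by default), not a measured value.
    """
    return round(total_links * (1 - duplicate_rate))


# 1012 Bloglines links, discounted 20% for duplicates: roughly 800.
print(discounted_links(1012))
```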
Below, in chart form, is a comparison of Technorati, Bloglines (as an information search tool, not a news reader tool), Feedster, Blogpulse and Pubsub. The chart is a PDF (blog software doesn't render html charts so well... but if you have a suggestion about getting this data into my post, please email me at firstname.lastname@example.org) but as feedback for this post comes in, I will update both the post and chart and note the updated time and date. I'm going to treat this survey as somewhat of a wiki, so that I can incorporate feedback to make this the most accurate survey possible.
Please note the footnotes, as they explain additional information about specific categories of information and how specific services work in those categories of activities. Also, note that some services perform poorly in the URL lookup category, but their usefulness will become apparent in the keyword category, or for subscription search or for other special services. Please don't write anyone off due to a poor showing here in the URL section. All five of these services are very valuable, as they each show us different things, and frankly, for my information needs, I want and use all of them each day to track myself, my projects, companies I consult for, and all of my areas of interest, which are numerous. Often, the combination is the only way to get an accurate picture of what is happening online across blogs and RSS feeds.
NOTE: I've updated the file just now to take into account revised and clarified information about Blogpulse and Technorati. Blogpulse has a bug in their URL search, wherein, if the http:// is not at the front of the URL, very little information is returned. And so rather than 9 links for napsterization, there are 477. And Technorati, I wanted to point out, does not count links in its link counts that have scrolled off the front pages of blogs, but they do still show search results that match keywords that have scrolled off. So users may see older results, but not see them in link counts.
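If you script your own URL lookups, one way to avoid the missing http:// problem is to add the scheme before you search. This is a small sketch of that workaround, not anything Blogpulse provides. The helper name and the example URLs are hypothetical.

```python
def with_scheme(url, default_scheme="http"):
    """Return the URL with a scheme in front, adding default_scheme
    only when the URL does not already carry one."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


# Hypothetical lookups: both forms end up as the same query string.
print(with_scheme("example.com/blog"))
print(with_scheme("http://example.com/blog"))
```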
An additional update regarding Bloglines: they noted that they only serve search results from blogs that at least one subscriber has in their list of subscriptions. This has been added to the chart under information philosophy.
Also, please use the comments below to tell me about areas that need more information, or suggestions. I'd like this to be as accurate as possible and will correct or update with information as I find it, or it's sent to me. Thanks very much for suggestions.
Oh.. and you have to answer a question to comment.. so please remember to do that, or the comment system gives you that obtuse answer that your comment is 'of questionable content' which isn't really true.. just that you haven't answered the question. Thanks!
Posted by Mary Hodder at July 24, 2005 11:51 PM