August 06, 2005

Link Love Lost or How Social Gestures within Topic Groups are More Interesting Than Link Counts

A discussion about creating a new metric for understanding blogs is something I think the community should have the chance to participate in, so that we can find a different way of perceiving a blog, or the ripples a blog makes. Partly I believe this because of the frustration people express about Google's secret algorithm for PageRank, where they feel something this powerful should not be secret (update: the algorithm is not secret, but the ordering of the search results is). And partly because I see blogging as an opportunity for people to talk transparently, so why shouldn't the algorithm used to express our weight in the blogosphere also be open? Bloggers should have input on the importance of one social gesture over another, one metric over another, and should know what is included, because it will be used to describe them. I also cannot assume that the way I read blogs is the same as everyone else's, so I'd rather have a community algorithm, in the sense that the community has commented on the weight of some metrics over others within it, than assume that the ways I or others weight these gestures in our blog search are correct for everyone.

A closed algorithm is purported to be a kind of spam control, as opposed to an open one. But a community-based standard means the community can help police those who try to game it, if we put mechanisms in place to flag those who abuse the system. Transparency as it exists in open source software, and as it should exist here, is the opposite of security by obscurity. But creating this is also an experiment, and help is needed to make a community-based algorithm possible.

Currently, blogs are measured in systems like Technorati and PubSub by links, or in Feedster by the number of subscribers to a feed. These are the not very interesting, subtle or telling measures used to make indexes like the Technorati Top 100, the PubSub 100 or the Feedster 100. The Technorati Top 100, in particular, is based purely on inbound links. All of these lists tend to favor those who blog in more general, popular topic areas, not those who are specialists in an area.

For many bloggers the relevant sphere of influence is not overall popularity, as those indexes express. It's influence and connection within a community. And the relevant measure of connection isn't the number of connections -- it's the depth and impact of those connections. This is about celebrating the niche, and measuring engagement over time.

Links alone are not a good metric for authority, for several reasons. The most important, I think, is the consequence for the blogosphere: it harms the way people see blogging. People know some bloggers want influence; many bloggers know they want it too, though many others don't want it at all. Counting links is very much like counting magazine subscriptions in order to sell ads: a number that doesn't reflect what is actually going on with the media it's meant to measure. Link counts alone are an analog media model, but online media is dynamic, and what is digital often has the possibility of getting much closer to smaller, more granular, and more interesting ways of perceiving things, orthogonal to legacy media models counting eyeballs.

I should also be clear that this effort, and the discussions I've been a part of on this topic, do not have the goal of making another list to replace the Technorati Top 100 or the PubSub LinkRank 100 or the Feedster list, or any other top (insert #) list. Rather, this is about going beyond lists and links, to understand that the social relationships of expression between and across blogs are really about searching for a "metric for identity", "metric for affiliation", "metric for community", or "metric for influence".

So a couple of months ago, at a dinner at Les Blogs, a group of us (including Ross Mayfield, Stowe Boyd, Doc Searls and Halley Suitt among others) talked about what it would mean to make an index that could give a clearer sense of a blogger's reach and influence, one that might upend the inbound link counts and bring some clarity to what is now opaque and hard to see for blogs we are unfamiliar with but want context around. The service was taking a while, and with 30 or so bloggers in the room, things eventually turned to blogging. We started talking about how inbound links, counted up and reported as a kind of "attention index," a show of interest or attention or conversation, weren't very interesting or telling on their own, partly because they lump together all types of links, no matter when the links were made or where they are from (blogrolls or posts).

Part of what we want is a rich user-generated ontology, resulting in topic groups that constantly adjust to find what's delightful, useful, and interesting across blogs. A more complex metric for understanding those topic groups and individual users, as they blog memes and interact with each other, with some context around those bloggers, would help quite a bit.

This issue came up again in force at BlogHer, where the opening discussion talked about how to play, or not play, the link game. Much of the room was, to one degree or another, very frustrated with using inbound link counts as an expression of attention, and with how the derivation of "A list" bloggers comes from that, ignoring the many blogs that are very influential or conversational in their topic areas.

To automate this process, or create a score, is to judge the stone by the ripples in the pond. Right now, the Technorati Top 100 list is obtuse enough that we can all agree it's not useful for judging 14 million blogs, because blogs are as different as their authors and as the people who would make a link rank for someone in their topic community. As for lists of bloggers based on the number of subscribers, like the Feedster Top 100, the list counts only users of Feedster, so it reflects a small percentage of overall readers.

So I hear people dismiss the current indexes all the time. By doing so, we let the opacity of inbound link counts be a barrier to rankism, a scoring that we don't really want to make more precise. The obtuseness is useful because it can't be relied upon, and therefore the confusion as to the value of a blog is left to be resolved by readers through their own methods, by those who look on their own for the ripples across blogs, combined with some reading of the blogs. And this may make many people happy. For me, I would rather have people do their own assessment of my blog because they read it or participate in discussions I am in, seeing what the activity is around it, versus relying on a score or count of inbound links.

However, I'm beginning to see many reports prepared by PR people, communications consultants and the like that make assessments of 'influential bloggers' for particular clients. These reports 'score' bloggers by some arbitrary number based on something: maybe inbound links or the number of Bloglines subscribers or some such single figure called out next to each blog's name. The Bloglines measure in particular is not a great one on its own. RSS aggregator users are reported to be only about 20% of blog readers, though I believe it's really half that, because my own user studies show that many people who are asked if they use an RSS aggregator say yes when in fact they don't know what it is (they just think they should know, so they answer yes). Of those RSS aggregator users (I think 10% of blog readers), 50% supposedly use Bloglines. But my own assessment of Bloglines is that maybe only 60% of their accounts are used regularly (not abandoned or very rarely checked), so if active Bloglines accounts come to roughly 35% of the RSS reader market, a Bloglines score might reflect only about 3.5% of the total blog-reading market -- a very low sample from which to judge the readership of a blog generally. A Bloglines count only counts users of that aggregation service. As a point of comparison, Bloglines shows 20k subscribers to BoingBoing, but Feedburner, which produces the feed, counts 1.2 million subscribers to the BoingBoing feed itself, though those counts are currently discoverable only by the blog owner.
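To make that back-of-the-envelope arithmetic explicit, here is a minimal sketch using the rough figures above (10% RSS use, ~50% Bloglines share, ~60% active accounts). Every input is an assumption from the text, not a measured value:

```python
# Hypothetical estimate of what share of all blog readers an active
# Bloglines subscriber count actually samples. Every input is a rough
# assumption from the post, not a measured value.

def bloglines_sample_share(rss_reader_share, bloglines_market_share, active_account_share):
    """Fraction of total blog readers represented by active Bloglines accounts."""
    return rss_reader_share * bloglines_market_share * active_account_share

# ~10% of blog readers use RSS, ~50% of those use Bloglines,
# ~60% of Bloglines accounts are checked regularly.
share = bloglines_sample_share(0.10, 0.50, 0.60)
print(f"{share:.1%}")  # prints "3.0%"
```

The product lands around 3%, in the neighborhood of the 3.5% figure above, which is why a raw Bloglines count, on its own, says so little about a blog's total readership.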

And these kinds of counts may or may not reflect actual readership, because users may not necessarily open the feed or posts. On the other hand, I think you do have to weight RSS users a little more heavily right now, as that user base tends to be early adopters, influencers, and a market that also tends to be the blog-writer set. However, this won't always be the case. And I'm not confident that these PR/communications agencies understand how to read this kind of information; it's one thing to gauge the influence of a blogger who writes about their clients by reading the posts, and another to make decisions about sponsorship or advertising based on these kinds of measurements.

So the tension is: do we in the blogosphere figure out a more sophisticated, open, standards-based metric that reflects the way we see blogs, within and across communities, in order to score them? And do we do this within topic areas, or does a more sophisticated algorithm across all blogs make more sense? Or do we allow this all to be done for us, possibly opaquely, by some of the blog search engines or by people trying to figure out blogger influence and communities for their clients? Or do we write off those efforts because we know they cannot possibly understand us anyway?

I have to say, I've resisted this for the past year, even though many people have asked me to work on something like this, because I hate rankism. I think scoring, even a more sophisticated version of it akin to PageRank, is problematic and takes away what is delightful about the blogosphere, namely the fun of discovering a new writer or media creator on their own terms, not others'. What I love is that people who read blogs assess them over time to decide how to take a blogger and their work. But more recently, as I said, I'm seeing these poorly done reports floating around from PR people, communications companies, journalists, advertising entities and others trying to score or weight blogs. And after hearing the degree to which people are upset by the obtuseness of the top counts, and because they do want to monetize their blogs or be included in influencer rankings, I'm at the point where I'd like to consider making something that we agree to, not some secretly held metric that is foisted upon us.

If we are going to do this, I think the algorithm has to be open source, at least as far as the weighting of social gestures and which gestures are included. Many people are upset that PageRank is secret, and that something so powerful online is not open to scrutiny by the community it ranks. So this is an attempt to have the community determine the social weighting that goes into the algorithm, and to have it be transparent to the community.

At the Les Blogs dinner, a group of us made a list of things we might include in this algorithm. The list is an attempt to figure out what we look at when we're trying to figure out where a blog is at, in terms of interest, conversation and value.


I think a newly made blogroll link now, in the age of 14,000,000+ blogs, is far more telling of community and interest than a blogroll link made five years ago, when there were 100,000 blogs (in other words, few choices about where to link). And of course links made in posts, which are more indicative of conversation or immediate attention to specific topics, are lumped in as well, with the same weight as a blogroll link, in the indexes we have now.
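That intuition could be expressed by weighting each link by its type and its age, so a fresh post link counts more than a years-old blogroll link. A minimal sketch, where the half-life and base weights are placeholder assumptions rather than anything settled:

```python
# Hypothetical decayed link weight: post links start heavier than
# blogroll links, and both fade with age. The half-life and base
# weights below are illustrative assumptions, not proposed values.
HALF_LIFE_DAYS = 180.0
BASE_WEIGHT = {"post": 1.0, "blogroll": 0.4}

def link_weight(link_type, age_days):
    """Weight of one inbound link, halving every HALF_LIFE_DAYS."""
    return BASE_WEIGHT[link_type] * 0.5 ** (age_days / HALF_LIFE_DAYS)

# A week-old post link outweighs a two-year-old blogroll link.
print(link_weight("post", 7))        # ~0.97
print(link_weight("blogroll", 730))  # ~0.02
```

Summing such weights over all inbound links would give a count that ages, unlike the flat tallies the current indexes use.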

So.. below is a list based on that earliest discussion, but it really needs refinement and input on what is important and how it could be expressed. I'm looking for feedback to help define these issues better, so we can get the best set of social gestures, weighted the ways we see them across blogs, into a community-based algorithm.

A new metric could balance links with these other representations of activity (not all are available, but if we want them, we should ask tool builders and data aggregators to gather this kind of information for us). Note that many of these are subject to spam, and spam controls for them are implemented by the companies that track this data; using a metric that incorporates them will require additional spam controls.

So then.. we talked about how important these kinds of information are to us as we evaluate a blog or post; whether there is a number associated with a particular piece of information, or a ratio between two sorts of information, that might be interesting; whether it's information we have; and what we might do with it.

A new metric could balance links with the following items in this chart:

| Rank Element | Description | Weighted Value | Metric Base | Information Available? | Note |
|---|---|---|---|---|---|
| Inbound links: post url | links to a post | high | # count | yes | might age over time |
| Inbound links: blog url | links to a blog | low | # count | yes | might age over time |
| Comments to posts | the kinds and numbers of comments others make on a blogger's posts | medium | ratios within topic/post | yes | . |
| Blog server logs | expose how many readers there are and where they come from, though it's very rare that others can see this kind of information; places like Bloglines, Feedburner and Feedster give some indication of reader numbers | high | #'s | yes | information is not public except in rare cases, but we could ask for a tool that shares certain parts, and ask bloggers to post or send a portion of this information |
| Direct mentions without links | direct mentions of a blog or blogger on other blogs (without necessarily linking) | high | #'s | yes | might mean that mentions intended *not* to link would use a link with a 'no-follow' tag |
| Indirect mentions | indirect mentions of a blog or blogger in terms of meme generation (HP algorithm) | medium | #'s | yes | have data, but would have to perfect the meme generation algorithms HP developed 15 months ago |
| 2nd generation links | links to linkers of a post or blogger | high | #'s | yes | . |
| Subscribers | the number of subscribers to an RSS feed, available from Bloglines and Feedburner if they were willing to share it, or from bloggers if we had reporting tools to install on an individual's blog | high | #'s | yes | with appropriate tools and disclosure |
| Time to read / length | the time spent reading a post divided by its length | medium | ratio | no | would require time-on-post data from click-throughs, reporting tools and disclosure |
| Links to post and incoming traffic from them | the links readers click through from, and the overall traffic in a post where someone has linked through to a post | medium | ratio: links/traffic | no | requires reporting tools and disclosure |
| Links from post and outgoing traffic to them | the links readers click through to, and the overall traffic from a post where someone has linked out to a post | high | ratio: links/traffic | no | requires reporting tools and disclosure |
| Topic frequency score | degrees of topic communities: first-degree ripples for bloggers in a community might be those who blog mostly about that topic and frequently (a ratio of posts to topics?), second degree those who blog sometimes about a topic, and third degree those who blog infrequently about those topics | high to low | score | no | . |
| Outbound post links | . | high | #'s | yes | . |
| Outbound blogroll links | . | medium | #'s | yes | age out over time |
| Emailed posts | from referrer | high | #'s | maybe | need tools for referrer logs |
| Topic discussion | keyword analysis of topic and meme discussion around topics the blogger discusses that match frequent topic-group discussion | high | score | no | . |
| Tagged urls | tagged urls showing attention (e.g., from Furl) | high | #'s | yes | . |
| Reputation scoring | rankings from reputation scoring systems like Syndic8 that rate RSS feeds | low | score | yes | . |
| Tagged urls in posts | tagged urls embedded together within tag structures in blog posts | medium | ratio of topics | yes | this sort of pointing with a tag attached could become a kind of topic measure, if we wanted to create a tag structure for that type of tagging from blog posts to other blogs or posts |

We wanted to see these measures used in an algorithm that balanced the weight of each social gesture, put against large data sets to see whether the resulting score or characterization felt right against what we know about blogs as readers and writers. One thing to consider is that some data sets are made up of spidered data (including blogrolls), while others are made up of RSS feed information (some partial and some whole posts, but there are no blogrolls in RSS feeds) and some are a blend. So we would want to adjust the algorithm for different types of data sets.
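As a sketch of what such an experiment might look like: compute a weighted sum of log-scaled gesture counts, with the weight table held out in the open for the community to edit. The gesture names, numeric weights, and log scaling here are all illustrative assumptions, not a proposal:

```python
import math

# Openly published weight table (the chart's high/medium/low rendered
# as numbers). These values are placeholders the community would
# argue over and revise; that argument is the point.
COMMUNITY_WEIGHTS = {
    "inbound_post_links": 3.0,
    "inbound_blog_links": 1.0,
    "comments": 2.0,
    "subscribers": 3.0,
    "direct_mentions": 3.0,
}

def blog_score(gestures, weights=COMMUNITY_WEIGHTS):
    """Weighted sum of log-scaled counts; the log keeps one huge raw
    count (e.g. inbound links) from swamping every other gesture."""
    return sum(w * math.log1p(gestures.get(name, 0.0))
               for name, w in weights.items())

# A niche blog with modest links but lively comments still scores well.
niche = {"inbound_post_links": 40, "comments": 200, "subscribers": 150}
print(round(blog_score(niche), 1))
```

Because the weight table is data rather than code, different topic communities (or different data sets, spidered versus RSS-only) could publish their own tables against the same gesture counts, which is closer to the community-adjusted weighting discussed above than any single global list.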

So this is my first post thinking about making an open source algorithm. And I'm wondering: is this a useful approach? I think it could be worthwhile, done right, and I put it out there to the blogging community to determine what is best here. As I said, after seeing what people who want to work with smaller topic communities are doing, it may be in bloggers' interest to think about how this might be done, so that it is more in keeping with the desires and views of the blogosphere.

Posted by Mary Hodder at August 6, 2005 08:10 AM | TrackBack

Great post Mary!

I replied here:

Posted by: Kevin Burton at August 6, 2005 08:20 PM

My full reply URL is here I think..

Posted by: Kevin Burton at August 6, 2005 08:23 PM

Mary, PageRank is not a secret. It's actually described in Larry and Sergey's paper The Anatomy of a Large-Scale Hypertextual Web Search Engine.

Of course, a lot of other stuff that Google does is secret, including other aspects of their search results ranking. PageRank, though, is not a secret.

-ryan king

Posted by: ryan king at August 6, 2005 10:57 PM

The url for the PageRank paper is - it didn't make it through on the last comment.

Posted by: ryan king at August 6, 2005 11:07 PM

"Part of what we want is a rich user generated ontology resulting in topic groups that is constantly adjusting to find what's delightful, useful, interesting across blogs."

That would be http:/

With the volume of blogs these days, you can find topic-focused blogs with keywords and tag search, which helps get out of the 'one true list' trap.

There's certainly lots of room for experiment here.

Posted by: Kevin Marks at August 6, 2005 11:49 PM

This is well done, Mary, but may I add an element that most people seem to miss when tackling this issue. We all agree that the blogosphere is a bottom-up phenomenon, yet rankings are inherently top-down. This, it seems to me, is the most important factor in determining influence, etc.

Every blog is a part of or member of a tribe of individuals with like interests or other social, work or playtime factors that bring people together. No man is an island. We are all connected, and it seems to me that these connections are what need to be determined before any external value factor can be measured. I mean, it's fine to weigh overall rankings among the population, but that's mass marketing stuff and really irrelevant in a world of bottom-up connectivity.

In my work with LOCAL blogospheres, I find remarkable communities of people, so this is one element -- how does a blog or blogger fit with their geographical tribe?

A second factor is, of course, content. Rankings determined by blogger category, for example, will give an entirely different view of where that blog fits within tribes that are determined by interest.

If this truly is a bottom-up phenomenon -- and I believe it is -- then we must start looking at the expanding circles of influence that surround an individual before we can do any sorts of measuring.

Finally, if any of this produces a lust to "get to the top" in any way, then we've shifted focus from bottom-up to top-down. Frankly, I'm very comfortable on the bottom, because that's where the people are.

Keep up the great work. Terry

Posted by: Terry Heaton at August 7, 2005 07:34 AM

Hi Ryan,
Thanks for letting me know about Google's PageRank algorithm and search results.. I've made a note up in the post to the effect that it is the search result order, not the algorithm, that is secret. But I also wonder: isn't keeping the ordering of search results secret just as problematic? If one part and not another is secret, there is still a kind of security-by-obscurity that the community may have trouble policing. Comparing that to a way of understanding bloggers, I still feel we need an open algorithm for showing blogger influence, especially if the metric is based on so many factors.

Hi Terry,
Thanks! I do describe factoring in topic communities, which are included in the chart and the description, but had not thought about geographic communities. It's a great idea. But how would it work? Would the same algorithm around small communities that talk with each other be applied to show bloggers influence locally? That could be really interesting, but we should probably test it against a large data set, because I can also imagine that some bloggers have no connection at all to bloggers in relatively close physical proximity, and therefore, the results might be really off. But I definitely want to try it!


Posted by: mary hodder at August 7, 2005 08:23 AM

The primary criteria for any metric is utility. Ask the simple question: What's this metric *for*? *Who* will do *what* with it?

Who are the prospective customers for this metric? Who is actually going to *buy* it (so to speak) and actually *use* it (not to mention misuse it)?

I don't want to sound cynical, but is blog ranking just for ego gratification? Or maybe simply a way for SEO consultants to justify that they've accomplished something?

If a business has a blog, don't they have an intended audience in mind? Aren't they most interested in their "reach" and effectiveness *within* that target audience or market?

Aren't most bloggers more interested in their *value* to their "tribe"?

Do we want to encourage niche blogging or encourage broader, "diverse" blogging?

Let's not forget one thing... blogs are about conversations. Comments should count a lot. Cross-blog conversation (not mere linking) should count a lot. Raw links don't seem to capture the quality of a conversation.

Finally, haven't you ever been involved in some conversation that goes back and forth endlessly until some "genius" makes an astute observation and short-circuits the entire conversation? How do you rank the elements of that conversation? The person or persons who resolve the issue deserve a lot of credit (ranking), but their insight may simply have come from carefully listening from a distance, so the "trench warriors" may deserve a lot of credit as well.

Question: Shouldn't comments raise the ranking of the commenter's blog as well?

And how would you rank a blog which had some early success, but then lost its luster. It almost seems like ranking should depend on timeframe as a parameter. Call this "historical ranking."

But even a "current" ranking is by definition somewhat historical. Will someone who spikes up sharply based on the timeliness of an event be ranked above someone who delivers consistent value without spikes over time? Do you want to value spikes over consistency or vice versa, or maybe the poor dumb *user* selects between or weights the balance between the two?

Ultimately you've got a big problem: objectivity versus subjectivity. If you want to come up with global metrics that are inherently "objective", then they'll have little subjective value to each user. And if you focus on tailoring to the needs of each user or class of user, that subjectivity diminishes the global objectivity of the metrics.

Do we really have a handle on what problem we're trying to solve? Show me a robust *problem statement*, and only when there is some consensus about *what* the problem is (and how people will actually *use* the metric(s)) does it make sense to consider solutions.

If the problem is "understanding blogs", I'd suggest that we're talking about some global measure of how *effective* blogs are at reaching out to and engaging in conversations with their target audiences. The keyword there is effectiveness, not quantity or popularity.

If blogs are popular reading, but the conversations are minimal (or non-existent), shouldn't the "reading reach" be discounted by the weakness of the conversations?

How do you measure effectiveness of a conversation?

And what should be the measure or rank of a blog that stimulates angry controversy in the blogosphere without stimulating "useful" conversations within the blog itself? Will "firestarters" be permitted to retain the "fruits" of their ill-gotten gains? Will "frenzy level" tend to be ranked higher than simply offering calm reflections?

One reflection: The complexity of the ranking algorithm will determine the extent of gaming of the ranking system.

There was a little controversy over character blogs some months ago... might your algorithm and ranking system encourage "character communities" and "character tribes"? That reminds me of the need for a more robust "identity" validation system for blog posts and comments.

Note: With advances in software agent technology and multi-agent systems and artificial societies and virtual environments, aren't we just a short time away from a cyberworld in which "artificial bloggers" will be able to dynamically conjure up entire "artificial communities" that may attract "real" users, to form hybrid communities? How are these virtual bloggers to be ranked, especially if you are unable to identify them as "artificial"?

If you want to evaluate the effectiveness of a blog to its audience, I'm not sure you can really do any better than to poll that audience and let them do the ranking (e.g., -10 to +10 in terms of value received.) Even then, the effectiveness of the poll will depend on the willingness of the audience members to actually "vote", and vote responsibly.

Of course, the big remaining problem with such metrics is that so many blogs have an open-ended audience.

Maybe it's like stocks, where you have "growth" stocks, "value" stocks, small-cap stocks, and large-cap stocks, and ranking is relative to your "peer" group. The problem with blogs is that all of the parameters are open-ended, so it can be hard to define boxes that would hold even two blogs.

-- Jack Krupansky

Posted by: Jack Krupansky at August 7, 2005 10:21 AM

I wonder whether rank is the wrong presentation, and clouds are right.

Clouds would primarily show the communities that a blogger is in. It may show secondarily the influence strength within that community, but that should be secondary in the presentation.

A cloud presentation might enable navigation along topic axis. For my blog, you'd be able to traverse to social software and austin clouds.

Influence would be calculated within the cloud. So, Jon Lebkowsky would have separately-calculated influence level within Austin and environmental blog communities.

Perhaps the presentation would allow the browser to traverse communities. One could find "blogher", and traverse to the "sepia mutiny" south asia community.

A cloud presentation would avoid the rankism, because it would focus on the community more than the individual, and allow a browser to traverse communities.

Posted by: Adina Levin at August 7, 2005 10:28 AM

Actually Mary I just started wondering today if it is the very opacity of Google (and other search engine) results that gives it more credibility.

I think the fact that one can see who links to whom as part of blog search results, and how that drives up ranking, leads to at worst dishonest, and at best inevitably distorted, linking behaviors.

Posted by: Elisa Camahort at August 8, 2005 11:35 AM

Hm, maybe the whole idea of making a 'just' system is wrong. To some a blog will be interesting because it has many readers, to others because it has many early-adopter-RSS-readers, to others because it covers a niche, to others because of its writing style.

A newspaper with a high circulation might be boring to me because it covers the wrong region of the world.

Different methods of measurement measure different things. Maybe it makes more sense to build separate tools which measure what you suggest and let the READERS decide what "toplist" to pick.

The marketers will optimize their strategy on totally different data anyway (views, PIs, clickthroughs and such).

Posted by: OliverG at August 8, 2005 11:36 AM

My trackback is failing for some reason: one of the many ways a conversation gets inhibited when the technology gets in the way.

What I think we are after is using technology to draw up a better measure of an audience. How engaged are we? I do more in my posting at

In the revision being drafted, I will be formulating a problem statement as Jack mentions. It is good to agree on the problem we're solving before we begin to solve it.

Posted by: Steve Sherlock at August 8, 2005 05:29 PM

Very helpful post, Mary, thanks.

I may have missed it, but in the table above, should "frequency of posting" and "average length of post" be measurable criteria as well? They're certainly available by the time and date data on most blogging systems, and via word counting software.

Not sure what inferences you'd gather though from frequency of posting, since a lot of great blogs don't post very frequently.

At the same time, I'd venture to say that a majority of the "Top 100" blogs tend to do short, bursty posts several times a day.

How to adjust for this bias in measuring/ranking systems is another important question, methinks.

thanks again.

Posted by: michael parekh at August 9, 2005 08:43 AM

I find this urge to find "another" ranking system somewhat puzzling. What is the purpose of such a ranking system? I share your distaste for it, yet you somehow think it is something essential.

I think a close analogy can be made with books. Do you choose what you read by the sales rank of a particular title? Certainly some do, and some stores allocate shelf space on this basis, but most of what I read is not driven by any particular rank. It is driven by interest and referrals. Reviews and bibliographies and libraries are useful in this regard. I would think that publishing reasonably good reviews of blogs would be more beneficial than finding another way to assign rank.

Algorithms are great for some things, but not particularly for finding something that is interesting to me.

Posted by: Jack Dahlgren at August 9, 2005 02:13 PM

you have really hit the nail on the head. i've been playing around with the idea of a better blog search concept for a few months, and you have done a great job articulating some of the key issues. you may be interested in a couple posts of mine on the topic:

cheers, mark

Posted by: Mark Evans at August 10, 2005 10:29 AM

I like the idea of coming up with a more varied set of metrics for judging blogs than just incoming links. But why should we restrict things to making only One True List of bloggers? If we have a varied set of metrics, let each reader create their own list based on the metrics that matter to them. Maybe incoming links works for some people. For others, only personal referrals matter. For others, it may be writing style or frequency of posts. The power of the Internet is that we are no longer restricted to mass media trying to fit everybody - we can create our own media that is completely customized for us, our own personal blogosphere.

I developed these ideas further in a post on my blog, but for some reason, the trackback didn't work, so I'm commenting manually.

Posted by: Eric Nehrlich at August 10, 2005 10:46 AM

In many ways measuring blog impact is only the tip of the iceberg in measuring the centrality of a person or an organization and their ideas.

So many good parts of the discussion happen through back channels, side conversations, in person, ad hoc little email lists, and the like that the blog-only measurements are always skewed no matter the algorithm. You simply don't have the data.

Posted by: Edward Vielmetti at August 11, 2005 07:36 AM