I think the community should have the chance to take part in a discussion about creating a new metric for understanding blogs, to find a different way of perceiving a blog, or the ripples a blog makes. Partly I believe this because of the frustration people express about Google's secret algorithm for pagerank, where they feel something this powerful should not be secret (update: the algorithm is not secret but the ordering of the search results is secret). And partly because I see that blogging is an opportunity for people to talk transparently, so why shouldn't the algorithm used to express our weight in the blogosphere also be open? Bloggers should have input about the importance of one social gesture over another, one metric over another, and know what is included, because it will be used to describe them. Also, I cannot assume that the way I read blogs is the same as everyone else's, so I'd rather have a community algorithm, in the sense that the community has commented on the weight of some metrics over others within the algorithm, and not just assume that the ways I or others weight these gestures in our blog search are correct for everyone.
A closed algorithm is purported to be a kind of spam control, as opposed to an algorithm that is open. But a community based standard means the community can help police those that try to game it, if we put in place mechanisms to flag those who abuse the system. Transparency as it exists in open source software, and as it should exist here, is the opposite of security by obscurity. But creating this is also an experiment, and help is needed in order to make creating a community based algorithm possible.
Currently, blogs are measured in systems like Technorati or ranked in PubSub by links, or by number of subscribers to a feed in Feedster. These are the not very interesting, subtle or telling measures used to make indexes like the Technorati Top 100 or the PubSub 100 or the Feedster 100. In particular, the Technorati Top 100 is based purely on inbound links. All of these lists tend to favor those who blog in more general, popular topic areas, and not those who are specialists in an area.
For many bloggers the relevant sphere of influence is not overall popularity, as those indexes express. It's influence and connection within a community. And the relevant measure of connection isn't the number of connections -- it's the depth and impact of those connections. This is about celebrating the niche, and measuring engagement over time.
Links alone are not a good metric for authority. There are several reasons for this, but the most important, I think, is the consequence for the blogosphere: it harms the way people see blogging. People know some bloggers want influence; many bloggers know they want it too, though many others don't want it at all. Counting links is very much like counting subscriptions to magazines in order to sell ads: the number does not reflect what is actually going on with the media it's meant to describe. Link counts alone are an analog media model, but online media is dynamic, and what is digital often has the possibility of getting much closer to smaller, more granular, and more interesting ways of perceiving things, orthogonal to legacy media models that count eyeballs.
I should also be clear that this effort, and the discussions I've been a part of on this topic, do not have the goal of making another list to replace the Technorati Top 100 or the Pubsub Link Rank 100 or the Feedster list, or any other top (insert #) list. Rather, this is about going beyond lists and links to understand the social relationships of expression between and across blogs. It is really a search for a "metric for identity", "metric for affiliation", "metric for community", or "metric for influence".
So a couple of months ago, at a dinner at Les Blogs, a group of us (including Ross Mayfield, Stowe Boyd, Doc Searls and Halley Suitt among others) talked about what it would mean to make an index that could give a clearer sense of a blogger's reach and influence, one that might upend the inbound link counts and give some clarity to what is now opaque, making it easier to find context for blogs we are unfamiliar with. Actually, the service was taking a while, and with 30 or so bloggers in the room, eventually things turned to blogging. We started talking about the issue of inbound links and how, counted up and reported as a kind of "attention index," as a show of interest or attention or conversation, they weren't very interesting or telling on their own, partly because they lump together all types of links, no matter when the links were made or where they are from (blogrolls or posts).
Part of what we want is a rich user generated ontology resulting in topic groups that is constantly adjusting to find what's delightful, useful, interesting across blogs. And a more complex metric for understanding those topic groups and individual users as they blog memes and interact with each other, with some context around those bloggers, would help quite a bit.
This issue came up again in force at Blogher where the opening discussion talked about how to play, or not play, the link game. Much of the room was, to one degree or another, very frustrated with using inbound link counts as an expression of attention, and how the derivation of "A list" bloggers comes from that, ignoring the many blogs that are very influential or conversational in their topic areas.
To automate this process, or create a score, is to judge the stone by the ripples in the pond. Right now, the Technorati Top 100 list is obtuse enough that we can all agree that it's not useful for judging 14 million blogs, because blogs are as different as their authors and those who would make a link rank for a person in one's topic community. As for lists of bloggers based on the number of subscribers, like the Feedster Top 100, we know that the list counts only users of Feedster, so it reflects a small percentage of overall readers.
So I hear people dismiss the current indexes all the time. By doing so, we let the opacity of inbound link counts be a barrier to rankism or scoring that we don't really want to make more precise. The obtuseness is useful because it can't be relied upon, and therefore the confusion as to the value of a blog is left to be determined by readers through their own methods, by those who look on their own for the ripples across blogs, combined with some reading of the blogs. And this may make many people happy. For me, I would rather have people do their own assessment of my blog because they read it or participate in discussions I am in, seeing what the activity is around it, to judge it, versus relying on a score or count of inbound links.
However, I'm beginning to see many reports prepared by PR people, communications consultants etc. that make assessments of 'influential bloggers' for particular clients. These reports 'score' bloggers by some random number based on something: maybe inbound links or the number of bloglines subscribers or some such single figure called out next to each blog's name. The bloglines measure in particular is not a great one on its own, because RSS aggregator users are reported to be only approximately 20% of blog readers, though I believe it's really half that, because my own user studies show that many who are asked if they use an RSS aggregator say yes, when in fact they don't know what it is (they just think they should know, so they answer yes to the question of whether they are users). Of those RSS aggregator users (I think they are 10% of blog readers), 50% supposedly use Bloglines. But my own assessment of Bloglines is that maybe 60% of their accounts are probably used regularly (not abandoned or very rarely checked), so if they have 35% of the RSS reader market, a Bloglines score might only reflect 3.5% of the total blog reading market -- a very low sample to judge the readership of a blog generally. The Bloglines count only counts users of that aggregation service. As a point of comparison, Bloglines shows 20k subscribers of BoingBoing, but Feedburner has 1.2 million subscribers to the BoingBoing feed itself, because they produce the feed, though those counts are only discoverable to the blog owner currently.
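To make the sampling problem concrete, here is a quick back-of-the-envelope sketch of that arithmetic. The numbers are just the estimates stated above (10% of blog readers using an RSS aggregator, a 35% Bloglines share of that market, and the two BoingBoing counts); none of them are measured data, and the variable names are only for illustration.

```python
# Estimates taken from the paragraph above; they are guesses, not measurements.
rss_share_of_readers = 0.10    # estimated share of blog readers who use an RSS aggregator
bloglines_share_of_rss = 0.35  # estimated Bloglines share of the RSS reader market

# Share of all blog readers that a Bloglines subscriber count can reflect.
bloglines_share_of_readers = rss_share_of_readers * bloglines_share_of_rss
print(f"Bloglines reach: {bloglines_share_of_readers:.1%} of blog readers")  # 3.5%

# BoingBoing comparison: Bloglines count versus the Feedburner count.
bloglines_subscribers = 20_000
feedburner_subscribers = 1_200_000
ratio = bloglines_subscribers / feedburner_subscribers
print(f"Bloglines shows {ratio:.1%} of the Feedburner subscriber count")  # 1.7%
```

Either way you slice it, a single aggregator's count covers only a few percent of the readership it is being used to describe.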
And these kinds of counts may or may not reflect the actual readership because users may not necessarily open the feed or posts. On the other hand, I think you do have to weight RSS users a little more heavily right now, as that user base tends to be early adopters, influencers, and a market that also tends to be the blog writer set. However, this won't always be the case. And I'm not confident that these PR/Communication agencies understand how to read this kind of information, and while it's one thing to gauge the influence of a blogger who writes about their clients by reading the post, it's another to make decisions to send sponsorship or advertising based upon these kinds of measurements.
So the tension is, do we in the blogosphere figure out a more sophisticated, open standard based metric that reflects the way we see blogs, within and across communities, in order to score blogs? And do we do this within topic areas? Or does using a more sophisticated algorithm across all blogs make more sense? Or do we allow this all to be done for us, possibly in an opaque way by some of the blog search engines or by people who are trying to figure out blogger influence and communities for their clients, or do we write off those efforts because we know they cannot possibly understand us anyway?
I have to say, I've resisted this for the past year, even though many people have asked me to work on something like this, because I hate rankism. I think scoring, even a more sophisticated version of it, akin to page-rank, is problematic and takes what is delightful about the blogosphere away, namely the fun of discovering a new writer or media creator on their terms, not others'. What I love is that people who read blogs are assessing them over time to see how to take a blogger and their work. But more recently, as I said, I'm seeing these poorly done reports floating around by PR people, communications companies, journalists, advertising entities and others trying to score or weight blogs. And after hearing the degree to which people are upset by the obtuseness of the top counts, and because they do want to monetize their blogs or be included in influencer ranks, I'm at the point where I'd like to consider making something that we agree to, not some secretly held metric that is foisted upon us.
If we are going to do this, I think the algorithm has to be open source, at least as far as the weighting of social gestures and what gestures are to be included. Many people are upset that page rank is secret, and that something so powerful online is not open to scrutiny by the community it ranks. So this is an attempt to have the community determine the social weighting as it goes into the algorithm, and have it be transparent to the community.
At the Les Blogs dinner, a group of us made a list of things we might include in this algorithm. This list is an attempt to figure out what things we look at when we're trying to figure out where a blog is at, in terms of interest, conversation and value:
I think a blogroll link made now, in the age of 14,000,000+ blogs, is far more telling of community and interest than a blogroll link made five years ago, when there were 100,000 blogs (in other words, few choices about where to link). And of course, links made in posts, which are more indicative of conversation or immediate attention about specific topics, are lumped in as well, with the same weight as a blogroll link, in the indexes we have now.
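One way to picture the difference between old and new links, and between blogroll and post links, is a rough sketch like the one below. Everything numeric here is an assumption for illustration only: the one-year half-life, the 1.0 and 0.3 type weights, and the function names are hypothetical, and the community would need to decide what the real values should be.

```python
from datetime import date

# Hypothetical weights: post links count more than blogroll links.
LINK_TYPE_WEIGHT = {"post": 1.0, "blogroll": 0.3}
HALF_LIFE_DAYS = 365.0  # assumption: a link loses half its weight every year


def link_weight(link_type: str, created: date, today: date) -> float:
    """Weight of one inbound link, discounted exponentially by its age."""
    age_days = max((today - created).days, 0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return LINK_TYPE_WEIGHT[link_type] * decay


def aged_link_score(links, today: date) -> float:
    """Sum of aged weights over (link_type, created_date) pairs."""
    return sum(link_weight(kind, created, today) for kind, created in links)


today = date(2005, 8, 6)
links = [
    ("blogroll", date(2000, 8, 6)),  # five-year-old blogroll link
    ("blogroll", date(2005, 8, 6)),  # blogroll link made today
    ("post", date(2005, 7, 6)),      # post link from a month ago
]
print(round(aged_link_score(links, today), 3))
```

With these made-up values, the five-year-old blogroll link contributes almost nothing, while the recent post link contributes the most, which is the distinction a raw inbound link count throws away.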
So, below is a list based on the earliest discussion, but it really needs refinement and input on what is important and how it could be expressed. I'm looking for feedback to help define these issues better, to get the best set of social gestures weighted in the ways we see them across blogs for a community based algorithm.
A new metric could balance links with these other representations of activity (not all are available, but if we want them, we should ask tool builders and data aggregators to get these kinds of information for us). Note that many of these are subject to spam, and spam controls for them are implemented by the companies that track this stuff. Using a metric that incorporates those will require additional spam controls.
So then, we talked about how important those kinds of information are to us as we evaluate a blog or post, and then whether or not there was a number associated with that particular information or a ratio between two sorts of information that might be interesting, and whether it's information we have, and what we might do with it.
A new metric could balance links with the following items in this chart:
|Rank Element||Description||Weighted Value||Metric Base||Information Available?||Note|
|Inbound links:||links to a post||high||# count||yes||might age over time|
|Inbound links:||links to a blog||low||# count||yes||might age over time|
|Comments to posts||The kinds and numbers of comments others make on a blogger's posts||medium||ratios within topic/post||yes||.|
|Blog server logs||expose how many readers and where they are coming from, though it's very rare that others can see this kind of information. There are places like Bloglines, Feedburner and Feedster that give some indication about how many readers there are.||High||#'s||yes||information is not public except in rare cases, but could ask for a tool that would share certain parts and ask bloggers to post or send a portion of this information using a specific tool, for sharing|
|direct mentions without links||direct mentions of a blog or blogger on other blogs (without necessarily linking)||high||#'s||yes||might mean that mentions that intend *not* to link would use a link with a 'no-follow' tag|
|indirect mentions||indirect mentions of a blog or blogger in terms of meme generation (HP algorithm)||medium||#'s||yes||have data, but would have to perfect the meme generation algorithms HP developed 15 months ago|
|2nd generation links||links to linkers of a post or blogger||high||#'s||yes||.|
|Subscribers||the number of subscribers to an RSS feed, which can also be found at Bloglines, and Feedburner if they were willing to share this, or from bloggers if we had reporting tools to install on an individual's blog||high||#'s||yes||with appropriate tools and disclosure|
|time to read/length||the time spent reading a post divided by the length of it||medium||ratio||no||would require length of time data on post click through, reporting tools and disclosure|
|links to post and incoming traffic from them||links readers click through from, and the traffic overall in a post where someone has linked through to a post||medium||ratio: links/traffic||no||requires reporting tools and disclosure|
|links from post and outgoing traffic to them||the links readers click through to, and the traffic overall from a post where someone has linked out to a post||high||ratio: links/traffic||no||requires reporting tools and disclosure|
|topic frequency score||degrees of topics communities: first degree ripples for bloggers in a community might be those who blog mostly about that topic and frequently (a ratio of posts to topics?), second degree might be those who blog sometimes about a topic, and third degree ripples might be those that blog infrequently about those topics||high to low||score||no||.|
|outbound post links||.||high||#'s||high||.|
|outbound blogroll links||.||medium||#'s||yes||note: age out over time|
|emailed posts||From referrer||high||#'s||maybe||need tools for referrer logs|
|topic discussion||key word analysis of topic and meme discussion around topics the blogger discusses that match frequent topic group discussion||high||score||no||.|
|tagged urls||tagged urls showing attention from del.icio.us, furl||high||#'s||yes||.|
|Reputation scoring||reputation scoring system rankings like syndic8te that rate rss feeds||low||score||yes||.|
|tagged urls||tagged urls embedded together within tag structures in blog posts||medium||ratio of topics||yes||in a way, this sort of pointing with a tag attached could become a kind of topic measure, if we wanted to create a tag structure for that type of tagging from blog posts to other blogs or posts|
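As one example of how a row in the chart might turn into something computable, here is a minimal sketch of the topic frequency score, using the "ratio of posts to topics" idea from that row. The cutoffs (50% and 20%), the function name, and the extra value 0 for "never posts on the topic" are all assumptions for illustration; the chart does not specify any of them.

```python
def topic_degree(posts_on_topic: int, total_posts: int,
                 first_cutoff: float = 0.5, second_cutoff: float = 0.2) -> int:
    """Return 1, 2 or 3 for a first-, second- or third-degree ripple.

    Returns 0 when the blogger has no posts on the topic at all.
    The cutoffs are hypothetical and would need community input.
    """
    if total_posts <= 0 or posts_on_topic <= 0:
        return 0
    ratio = posts_on_topic / total_posts
    if ratio >= first_cutoff:
        return 1  # blogs mostly and frequently about the topic
    if ratio >= second_cutoff:
        return 2  # blogs sometimes about the topic
    return 3      # blogs infrequently about the topic


print(topic_degree(60, 100))  # 1
print(topic_degree(25, 100))  # 2
print(topic_degree(5, 100))   # 3
```

A ratio like this only captures how much of a blog is about a topic, not how often the blogger posts, so a fuller version would probably need a frequency term as well.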
We wanted to see these measures used in an algorithm that balanced the weight of each social gesture, put against large data sets to see whether the resulting score or characterization felt right against what we know about blogs as readers and writers. One thing to consider is that some data sets are made up of spidered data (including blogrolls), while others are made up of RSS feed information (some partial and some whole posts, but there are no blogrolls in RSS feeds) and some are a blend. So we would want to adjust the algorithm for different types of data sets.
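To give a feel for what balancing the weights could look like, here is a minimal sketch of a weighted score. It is not the algorithm, just one possible shape for it. The numeric values for high, medium and low, the gesture names, and the idea that each signal arrives already normalized to a 0..1 range are all assumptions; only the high/medium/low labels come from the chart. Gestures missing from a data set (for example blogroll links in an RSS-only data set) are skipped and the remaining weights are renormalized.

```python
# Assumed numeric values for the chart's weight labels.
WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}

# A few gestures from the chart, with their weight labels.
GESTURES = {
    "inbound_post_links": "high",
    "inbound_blogroll_links": "low",
    "comments": "medium",
    "subscribers": "high",
    "outbound_post_links": "high",
    "outbound_blogroll_links": "medium",
}


def blog_score(signals: dict) -> float:
    """Weighted average of the available signals.

    signals maps a gesture name to a value already normalized to 0..1.
    Gestures absent from signals are ignored, so the score adapts to
    data sets that lack some kinds of information.
    """
    present = [g for g in GESTURES if g in signals]
    total_weight = sum(WEIGHT[GESTURES[g]] for g in present)
    if total_weight == 0:
        return 0.0
    weighted = sum(WEIGHT[GESTURES[g]] * signals[g] for g in present)
    return weighted / total_weight


# Spidered data set: both kinds of inbound links are available.
print(blog_score({"inbound_post_links": 1.0, "inbound_blogroll_links": 0.0}))  # 0.75

# RSS-only data set: no blogroll information, so those gestures are skipped.
print(blog_score({"inbound_post_links": 0.5, "comments": 0.5}))  # 0.5
```

The point of keeping the weights in a plain, readable table like this is that anyone in the community can see them, argue about them, and propose changes.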
So this is my first post thinking about making an open source algorithm. And I'm wondering, is this a useful approach? I think it could be worthwhile, done right, and I put it out there to the blogging community to determine what is best here. As I said, after seeing what people who want to work with smaller topic communities are doing, it may be in bloggers' interest to think about how this might be done so that it is more in keeping with the desires and views of the blogosphere.
Posted by Mary Hodder at August 6, 2005 08:10 AM