August 09, 2005

Lotta Linkin Going On.. Or Not

I wanted to summarize some of the very interesting things people have been saying in response to my Saturday post, Link Love Lost, about making a community-based algorithm for understanding topic communities.

Elisa Camahort at Worker Bees Blog:

    ... all of this talk and tempest around some relatively new companies and their tools makes me wonder why people don't get as up in arms about discussing the algorithms behind general Internet search tools. Oh yes, people occasionally compare their positions on Google vs. MSN vs. Yahoo search, but there's little accusation associated with those comparisons.
    Why is that?
    ...
    3. Most important: they don't provide an accounting of the links they are using to calculate their rankings.
    Why is this last one important...because it removes the incentive to link to sites simply to get their attention and potential links back. Look, from a publisher's perspective it's great to see who's linking to you. Understood. But it also encourages people to link to the "top" blogs because a) they hope to be noticed by said blogs and b) when someone looks at the top blogs, they can see who's linking to them, and the linker will be on that list...and therefore might hope to be noticed that way too, even if said top blog doesn't link back to them.
    The point of counting links is to calculate relevancy. But the results are distorted because the very transparency of the web of linking in the blog community encourages dishonest linking patterns....

In blog search, the point of counting link *sources* (which is what Technorati does to make link rank.. and the top 100 of those end up on the Top 100 list) is about what Technorati calls "authority." This is different than relevancy. It may or may not get a searcher anywhere near relevancy as they search for a term or URL, and relevancy is far harder and more complex than counting link sources that link to a blog.
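
To make the distinction concrete, here is a minimal sketch of counting link sources, assuming the links are available as (source blog, target blog) pairs. The function name, the sample data, and the choice to ignore self-links are my own assumptions for illustration; this is not Technorati's actual implementation.

```python
from collections import defaultdict

def count_link_sources(links):
    """Return {target: number of distinct blogs linking to it}.

    `links` is an iterable of (source_blog, target_blog) pairs.
    Repeat links from one source count once; self-links are ignored
    (an assumption made for this sketch).
    """
    sources = defaultdict(set)
    for source, target in links:
        if source != target:
            sources[target].add(source)
    return {target: len(srcs) for target, srcs in sources.items()}

# Hypothetical sample data
links = [
    ("a.example", "top.example"),
    ("a.example", "top.example"),    # repeat link from the same source
    ("b.example", "top.example"),
    ("top.example", "top.example"),  # self-link
    ("b.example", "a.example"),
]
authority = count_link_sources(links)
# Sort by count (highest first), then by name to break ties
ranked = sorted(authority.items(), key=lambda kv: (-kv[1], kv[0]))
```

Nothing in this count looks at what a searcher typed or what a post says, which is why it can stand in for "authority" but not for relevancy.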

    I know it's heretical to make any argument for opacity in the blog environment, but a little more of it might lead to more authenticity in our blogging, and in the search tools for blogs.

I think it's perfectly reasonable to talk about every aspect of this, including creating an opaque standard, if that's the best solution.

Eric at CollabuTech:

    There are many implications for the corporate blogosphere. How do you measure the worth of contributions? How do you help people find "blogmates" who have affinity for and knowledge of similar topics? Do you encourage a particular pattern of linking? What norms do you establish?

Is worth something we want to establish, or do we want to let readers retain responsibility for judging what they read?

Shelly at Burningbird:

    Yesterday, Mary released her new effort to identity alternate algorithms based on a dinner she had with Ross, Doc Searls, Halley Suitt, and others in Paris a couple of months ago. It’s a very detailed and thoughtful post, and I respect the amount of work she put in it, but it seems to me that no matter how much the community is involved in this effort, it’s just propagating the same problems, because the issue isn’t about technology, it’s about people and how we behave.
    ...
    think Mary should stop with …I hate rankism. I understand the motivations behind this work, but ultimately, whatever algorithm is derived will eventually end up replicating the existing patterns of ‘authority’ rather than replacing them. This pattern repeated itself within the links to Jay Rosen’s post; it repeated itself within the speaker list that Mary started for women (“where are the women speakers”), but had its first man within a few hours, and whose purpose was redefined within a day to include both men and women.
    Rankings are based on competition. Those who seek to compete will always dominate within a ranking, no matter how carefully we try to ‘route’ around their own particular form of ‘damage’. What we need to challenge is the pattern, not the tools, or the tool results.

She's right.. whenever there is a measurement, a power law develops where those at the top sit and the rest bend their behaviors toward them, trying to attain top status they don't have. Link counts mean people change their behavior to get more links. It's not the spammers I fear, but us.

But if we get rid of rankings, and instead see topic-based communities with long-standing conversation, can we get out of some of that power law dynamic? I'm not sure. Maybe not. Maybe we must simply refuse the metrics altogether. I think it's an open question.

Stowe Boyd at Get Real:

    Instead of developing a single, open source, mega algorithm for determining blog value, how about developing a simple standard for publishing blog metrics so that individuals or groups could easily collate various sorts of interesting metrics about blogs into meta-indices?
    For example, imagine that I were to create an online solution, let's call it Blognetter, that would discover the centrality of any given blog in the implicit social network that the blog is part of (this would be a very useful tool, by the way). Pointing Blognetter at Get Real would discover links from Get Real to Mary's, Doc's, and Ross' blogs, and vice versa. Using various parameters, it would rapidly determine a network that defines a community, of some number of hops via links away from Get Real. Blognetter would calculate that Get Real is connected to and from a specific number of those other blogs. That service could then provide that data in an agreed upon XML format.
    ...
    ...a collating service, let's call it RankOut, could aggregate these various feeds related to Get Real, and any RankOut user could override the default weighting built into RankOut. RankOut may "know" what the feeds "mean" in a sense -- the builders of RankOut may be aware of the point of Blognetter, for example....
    And lastly, specific rating services -- the Robert Parkers of the blogosphere, if you will -- could then publish their ratings, based on what they deem to be most important....
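
The collating step Stowe describes could be sketched roughly as below. This is only an illustration under my own assumptions: each metric is already normalized to the range 0 to 1, the metric names are hypothetical, and the combination is a plain weighted average. The proposal itself does not specify the feed format or how a service like RankOut would combine metrics.

```python
def combine_metrics(metrics, default_weights, user_weights=None):
    """Weighted average of a blog's metrics.

    metrics:         {metric_name: value in 0..1}  (assumed normalized)
    default_weights: {metric_name: weight} built into the collating service
    user_weights:    optional per-user overrides of the defaults
    Metrics without a weight are ignored.
    """
    weights = dict(default_weights)
    if user_weights:
        weights.update(user_weights)  # user choices win over defaults
    total_weight = sum(weights.get(name, 0.0) for name in metrics)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(value * weights.get(name, 0.0)
                       for name, value in metrics.items())
    return weighted_sum / total_weight

# Hypothetical metrics collected about one blog
blog_metrics = {"centrality": 0.8, "inbound_sources": 0.4, "conversation": 0.6}
defaults = {"centrality": 1.0, "inbound_sources": 1.0, "conversation": 1.0}

default_score = combine_metrics(blog_metrics, defaults)
# A user who does not care about raw link counts zeroes that weight out
custom_score = combine_metrics(blog_metrics, defaults, {"inbound_sources": 0.0})
```

The point of the design is that the weighting is not baked in: two readers looking at the same published metrics can end up with different scores.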

These are interesting suggestions.. I'd like to see topic communities, as I proposed in my post, with a conversation weighting.. that would show conversationalness over time. But as Shelly points out, people will bend their behavior toward any system, no matter what it is. As for ranking people, I have real trouble with that. Even Robert Parker acknowledges that people just look at the wine scores (90+ means they buy the wine without reading the review). I think rating each other would produce crazy results, where, just like in junior high, we would all be so concerned about being popular. For me, the value lies in understanding cohesive communities.

Rachel at License to Roam:

    The Technorati Popularity list - you can ignore it, love it or hate it for lots of reasons. It's the equivalent of the All-time greatest hits chart, looking at total number of links over time. But just because Elvis or The Beatles would always be on top of the charts looking at total sales, does not mean they would be on the chart if there was a smaller timescale.
    ...
    (In response to Jason Calcanis' bounty of $50k for a better ranking tool) I'd add another requirement - the ability to slice and dice by category/metadata. That of course would need the categorising data to be collected from the blogs or when blogs are registered with the search services, but I can see the need to be able to assess 'popularity' with a niche, ie movie blogs, music blogs etc. But that's a longer term desire.

But is it popularity within a niche? Or do we have metrics to show conversation, collaboration, interest? I totally agree with wanting to see topic communities.

    In putting this challenge up, you could argue that Jason is acting in the 'old model', or, more likely the 'male model'. There's a problem, here's a solution, throw money at it and get it fixed my way. This is in contrast to the more collaborative, discussion based way I see Mary Hodder's proposal developing. So is Jason just perpetuating the male domination of the space by making more lists based on popularity? I don't think so; he's trying to make what we have (a subjective, measurable analysis) better and is prepared to encourage it.

I don't see the link count and the corresponding rank as necessarily male, but rather as a legacy media model. However, legacy media measurements were developed at a time when men completely ran that business, so they naturally reflect that point of view. Now that digital media allow us to measure things easily in many more ways, and we have many more styles of blogging than just those that fit legacy media paradigms, why not figure out better ways to discover interesting communities and discussions?

danah boyd at M2M and Apophenia on the biases of links:

    There are a few things that we know in social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as those women know. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary and is often attributed to depression later in life.)

And yet, in all of these systems, a link is a link is a link.. with no distinction for type, or network ties, or styles of linking.. or God forbid, types of links (as in no-vote, + or - in the rel tags -- who knows what a blogger means when they use those tags).

    While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated by gender, gender is likely a secondary effect.
    Interestingly, there are distinct clusters of norms with linking in blogging, not a coherent and consistent one. The search engines (and the Technorati 100 and PubSub’s Daily 100 Top Links) are validating one of those clusters, regardless of whether or not that is what searchers are looking for. The Top 100 is a list of blogs who either fit into those norms or have adopted those norms in their patterns (most commonly the companies).
    ...
    These services are definitely measuring something but what they’re measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They’re very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether or companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there’s nothing neutral about an algorithm.

Very interesting stuff. Keeping this in mind as we discuss what we make will be key to ending up with something that describes us the way we consciously want to be described.

Assaf at LabNotes:

    Start with this blog, use it as context and search for the keyword ‘blog’. First observation, there’s a lot of links coming out of this blog. Most are links to sources I find interesting, relevant, authoritative. Others may disagree, but in this particular context, my outbound links rule. Anything these sources have to say about ‘blog’ should be ranked highest.

One problem with outbound links is that they are extraordinarily susceptible to spam. However, we have to deal with that anyway, so thinking about how we weight outbound links is valuable.

    Second observation, those blogs link to other blogs, which they find interesting, relevant, authoritative, etc. So that’s a second hop that increases the sphere of relevance. Repeat enough times and you’ll spider the entire Web, something to do with six degrees of separation. But now we’re just duplicating Google.

There may be a way to derive who uses whom for filters, but this may be reflected in RSS subscriptions and reading habits. However, there are serious identity, privacy and data ownership issues (users should, in my opinion, own their data) to figure out first before we can think about using this kind of information.

    Third observation, limit the number of hops to a small set (say six), and decrease relevance in proportion to distance. So a blog four degrees of separation ranks less than a blog two degrees of separation. Interesting patterns start to emerge.
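
Assaf's three observations can be sketched as a breadth-first walk over outbound links that stops after a fixed number of hops. The linear decay formula, the function name, and the sample graph below are my assumptions; the quote only says relevance should decrease with distance, not how.

```python
from collections import deque

def hop_weights(outbound, start, max_hops=6):
    """Walk outbound links breadth-first, at most `max_hops` from `start`.

    outbound: {blog: [blogs it links to]}
    Returns {blog: weight}, where weight falls linearly with hop distance:
    1.0 at one hop, down to 1/max_hops at the last allowed hop
    (the linear formula is an assumption of this sketch).
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        if distance[blog] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in outbound.get(blog, ()):
            if neighbor not in distance:  # first visit is the shortest path
                distance[neighbor] = distance[blog] + 1
                queue.append(neighbor)
    return {
        blog: (max_hops - d + 1) / max_hops
        for blog, d in distance.items()
        if d > 0  # leave out the starting blog itself
    }

# Hypothetical link graph: me -> a -> b -> c -> d -> me
outbound = {"me": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": ["me"]}
weights = hop_weights(outbound, "me")
```

In this sample, the blog two hops away ("b") gets a higher weight than the one four hops away ("d"), which matches the ordering Assaf describes.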

Link decay is a very interesting idea. PubSub does it now, but it's not clear to me yet what the effects are. However, I plan to discuss it with them so that I can understand it better. Bob Wyman explains more in this post.

The Vision Thing with Enough with the Lists:

    ...Mary Hodder's post about "better algorithms" (sorry to generalize, but my eyes glazed over and I have yet to read the whole thing), and in a nutshell, I’m really sick to death of "lists." If you've seen one Top N list, you've seen them all. Wake me when there's a list that actually conveys something interesting.

This is not about making a single list. This is about making a metric that takes several factors into consideration, to find topic groups who consistently talk about something. At least, that's what I first proposed in my eye-glazing post (sorry about that). However, that may not be what we end up with, as I believe the community should decide what it wants. If something else is better, let's try it.

Posted by Mary Hodder at August 9, 2005 11:55 AM
Comments

Thanks for consolidating so much of the talk going on.

I can appreciate in a vague, gut way that "authority" on a subject and "relevancy" to a subject are different.

But can you articulate that difference...especially when it comes to the technical difference between how Technorati uses links to assess authority (and therefore placement on their list) and how Google uses links to assess relevancy (and therefore placement in keyword search results)?

Thanks again!

Posted by: Elisa Camahort at August 9, 2005 02:00 PM