August 19, 2005
Adina Levin on Conversation Clouds... And Mitch Ratcliffe responds with Cloudmakers R Us
Awesome stuff by Adina Levin on Conversation Clouds which I'm just going to repost:
The cloud would be a picture of a conversation surrounding a person or a topic. The picture would show the relationships between the participants in a conversation. The densest areas would represent people who frequently cross-reference each other over time.
You can start with a participant (the url of a person's weblog), or a search term (a word or tag). Nodes are clustered based on closeness, measured by number of links and reverse links over a period of time (comments, too, if you can measure them).
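To make that closeness measure concrete, here is a minimal sketch in Python. It is not from Adina's post: the link records, the blog names used as keys, and the function name `closeness` are all made up for illustration, and it assumes you already have a list of dated links between blogs.

```python
from collections import Counter
from datetime import date

# Hypothetical link records: (source blog, target blog, date of the link).
LINKS = [
    ("mary", "dina", date(2005, 8, 1)),
    ("dina", "mary", date(2005, 8, 3)),
    ("mary", "dina", date(2005, 8, 10)),
    ("mary", "ross", date(2005, 8, 12)),
    ("stowe", "ross", date(2005, 3, 1)),
]

def closeness(links, start, end):
    """Count links plus reverse links between each pair of blogs in [start, end]."""
    scores = Counter()
    for source, target, day in links:
        if start <= day <= end and source != target:
            # Use an unordered pair so A->B and B->A add to the same score.
            scores[frozenset((source, target))] += 1
    return scores

# Closeness over one month; widen the dates to "zoom out" the time period.
august = closeness(LINKS, date(2005, 8, 1), date(2005, 8, 31))
```

Pairs with higher counts would be drawn closer together. Comments could be added as extra records in the same list if they can be measured.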
If the picture starts with a link, then that link is at the center of the picture. The picture shows the links between the first node and the other nodes, and between other nodes that are connected to each other.
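One way to read that description, sketched in Python: take the starting node, its direct neighbors, and any links among those neighbors. The function name `cloud_around` and the sample links are assumptions for illustration only, not part of the original proposal.

```python
def cloud_around(center, links):
    """Return (nodes, edges) for a cloud centered on `center`.

    `links` is an iterable of (source, target) pairs; direction is ignored.
    """
    # Collapse A->B and B->A into one undirected pair, skipping self-links.
    pairs = {frozenset(link) for link in links if link[0] != link[1]}
    # Neighbors are the blogs that share at least one link with the center.
    neighbors = {
        blog for pair in pairs if center in pair for blog in pair if blog != center
    }
    nodes = {center} | neighbors
    # Keep edges from the center and edges between neighbors.
    edges = {pair for pair in pairs if pair <= nodes}
    return nodes, edges

# Hypothetical sample data.
SAMPLE = [
    ("mary", "dina"),
    ("dina", "stowe"),
    ("mary", "stowe"),
    ("stowe", "ross"),
    ("ross", "adina"),
]
nodes, edges = cloud_around("mary", SAMPLE)
```

Clicking on another blog to re-center would then just mean calling the same function with that blog as `center`.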
If the picture starts with a word, topic, or tag search, then the cloud contains a cluster of blogs that include the term or tag in the last time period. The picture shows lines between blogs that link to each other. Unlinked blogs are thrown out.
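The search-term version could be sketched roughly like this in Python. The post and link records, and the name `tag_cloud`, are hypothetical; a real version would draw on an index such as a blog search engine rather than an in-memory list.

```python
from datetime import date

def tag_cloud(posts, links, term, start, end):
    """Return (blogs, edges) for blogs that used `term` in [start, end].

    `posts` holds (blog, text, date) records and `links` holds
    (source, target) pairs. Blogs with no link to another matching
    blog are thrown out.
    """
    needle = term.lower()
    matching = {
        blog for blog, text, day in posts
        if start <= day <= end and needle in text.lower()
    }
    # Lines are drawn only between matching blogs that link to each other.
    edges = {
        frozenset((a, b)) for a, b in links
        if a in matching and b in matching and a != b
    }
    # Only blogs that appear in at least one edge survive.
    linked = set().union(*edges)
    return linked, edges

# Hypothetical sample data.
POSTS = [
    ("mary", "Thoughts on blog metrics", date(2005, 8, 5)),
    ("dina", "More on Blog Metrics and community", date(2005, 8, 7)),
    ("lone", "blog metrics, from a distance", date(2005, 8, 9)),
    ("ross", "blog metrics last winter", date(2005, 1, 2)),
]
POST_LINKS = [("dina", "mary"), ("lone", "ross"), ("ross", "mary")]

blogs, lines = tag_cloud(
    POSTS, POST_LINKS, "blog metrics", date(2005, 8, 1), date(2005, 8, 31)
)
```

In this sample, `lone` uses the term but links to no other matching blog, so it drops out of the picture.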
The cloud is built from a data set over a time period; the user should be able to scale the time (conversation over a week, a month, six months). The conversation cloud would need to provide ways to navigate through conversation space. If you click on a blog, perhaps you re-center around that blog's conversations. If you click on a tag or topic, you search based on that. You'd need to experiment with several ways of allowing browsing out from the first cloud.
This type of picture would not measure rank. Instead, it would illustrate the connections within subcommunities.
Cloud-browsing represents a pattern of blogsurfing. A reader might start with Mary Hodder's post on blog metrics, and then traverse to Dina Mehta, danah boyd, Stowe Boyd, Ross Mayfield.
The cloud would show in graphical form what a Technorati or Blogpulse search would -- who linked to the post. And it would also illustrate the repeated links and cross-links as people reply. If you zoomed out the time horizon, you'd see some relationships become more obviously dense, with repeated patterns of links and counterlinks.
I think this sort of presentation would get more of what we're looking for -- a picture of the relationships in a community that reveals participants, both loud and quiet. The ability to browse the conversation.
The results would be more interesting than a diagram of an email thread -- where participants already know who's talking to whom. It wouldn't be particularly rankist, since webwide popularity isn't relevant to the picture. It would let you browse to related people, or related ideas that the same people are talking about.
The next step is to test this idea, maybe with a manually drawn picture, and then with a dataset and a toolkit like TouchGraph. This seems like a good experiment to me. It could be somebody's done this already. Or somebody's tried this and proved that it doesn't work. Please share if you know.
p.s. Zawodny talks about the need for content discovery. I don't know about you, but a lot of the content that I discover comes from browsing through a conversation and finding voices that I want to keep hearing.
And Mitch Ratcliffe, who has a company called Persuadio which visualizes relationships between data, responds with Cloudmakers R Us:
I've been following this discussion, mostly holding my tongue because it may look self-serving to respond with "here are the pictures you want of blog relationships" by pointing at the MyDensity site we've put together to show off the social analysis tools built by Persuadio. I also realize it would be incorrect, as we've focused on the big picture to the detriment of a small world.
Simply put, like many of the indexers, we've tried to capture the role of any blog or Web site in any conversation (being about more than blogs has been important to us from the very beginning). Meanwhile, it would have been simple all along to provide what Adina calls a "conversational cloud" that shows the relationships around a single posting or Web page. And, frankly, it took someone asking for that simple solution to realize it was the first thing we should have offered instead of trying to solve the really huge problems we're wrestling.
The current MyDensity maps show all the relationships around a blog, rather than the links to a single page (which we can do, but just hadn't).
Unfortunately, we hear often from customers that they want a "top" this or "top" that list and had decided to focus on that. With limited resources and real money coming from these people, we paid attention. It is what they are ready for.
The desire to see the big picture is endemic in a changing market. Top 10, Top 20 or Top 500 lists make a certain amount of sense if you are trying to aim for plain old low cost-per-thousand (CPM) or cost-per-impression advertising deals. Most advertising and marketing people aren't prepared to think outside the CPM box, and if they do, they think about relatively ineffective cost-per-click (CPC) ads.
The contours of this market are very poorly understood. ComScore, the Reston, Va.-based research firm, in an August 2005 report describes visitor traffic to the top blog hosting sites in aggregate even though the blogs hosted by those services, their authors and readers share few demographic or behavioral characteristics. For marketing and advertising purposes they are separate publications, not a monolith that can be compared to the traffic of the New York Times—however, ComScore does make that spurious comparison. Yes, more readers (ComScore does not distinguish between readers and bloggers visiting BlogSpot to author their own sites, confounding any attempt to characterize audience size) may visit BlogSpot in a month, but the information they are consuming and commenting on there is disorganized; by contrast, the editorially coherent sections of the New York Times create viable venues for addressing audiences with specific interests.
Marketers are stuck between that familiar composed environment of the Times, with all its shortcomings, and the apparent anarchy—from their perspective—of the blogosphere with all the opportunities it represents. Every discussion of a "top" list is predicated on mapping the reach of a site to the community around a blog or group of blogs. There's a hunger for something recognizable to grab hold of, which is why I keep harping on the question of how to get today's content owners to start across a bridge to content sharing.
If we can solve all these problems by laying out the flow of influence, the role of trust and conflict in discussions, magical things will happen to the marketplace of ideas.
When it comes to conversations about specific topics or just conversations between people, though, there are multiple dimensions of value, some personal—the kind of information in the clouds around a single posting—and some profoundly economic: If you can target advertising based on behavioral characteristics, the value of an ad can soar. If you know what people are talking about, you can guess why and position a contextually relevant and high-value CPC ad alongside the content of the page.
If the marketer were really radical, the ads would go away and the message, with all necessary disclaimers so that it would not pollute the content, would come through as part of the conversation.
When it comes to blogs, the content is so personal and bloggers so interested in understanding the intellectual currents around their writing, audio or video, that the first responsibility of a company that wants to be of service to the market is to be of service to the bloggers. So far, Persuadio has been of service to a couple of customers, but if we cannot get more information to bloggers we'll forever be outside the market we most want to serve. For most of us bloggers, it really is more about the neighborhood we're talking with (Ross Mayfield's discussion of the Rule of 150 plays well, even years later) than our rank in the whole blogosphere (though such ranking is a guilty pleasure the honest blogger will cop to).
That said, because as we map blogs we also map the rest of the Web and the relationships between all information, individuals and organizations, we are often mistaken for a mapping service rather than an analytics service. We want to offer information about who is talking and their relationships (even the hidden ones) so that everyone can judge ideas and movements based on the fullest information. We've been aiming at that, but thinking like an old-style analytics company, so we're going to change. I hope you'll remember, though, that there is a lot of social measurement going on in the background that has both social and economic value.
We'll have link clouds for you very shortly. Allow us a bit more time and we'll let you configure the variables of the map, so that you choose to include current or archival links in the calculation of influence, as well. We're awake to this, now.
Nice! Go look at Mitch's diagrams.. but you get the gist. It's just so cool.. I figured why rewrite, just put up their words!
Posted by Mary Hodder at August 19, 2005 04:30 PM
I've been thinking about this same problem from the perspective of search engines but also as a problem of literary quality, which might sound odd to you techies, but I think it is relevant.
During the Culture Wars in the late 80s in academia, the debate around multiculturalism made everyone freak about how society would be destroyed if everyone didn't read the same 'golden bookshelf': the classics, the canon, the 10 best or the 100 best books of all time. Of course: best for what? best for whom? best for what specific purpose? I was in this debate, then left grad school, went to work in tech and eventually for a major search engine, and then back to academia again.
I have been talking about an approach to literary quality that would take for granted the idea of multiple, dynamic canons. Besides conversation clouds (which in literary criticism we call intertextuality; books become important or relevant not just by being talked about, but by talking about other books) it is also helpful to factor in other qualities that are internal or external to the text. And if there were some degree of transparency to the algorithms, and users could adjust them, what then? If they were open source algorithms, then institutions or companies could develop their own secret branded algorithm (e.g. Harvard's top 100 books, or Technorati's top 100 blogs.) And that would have value to them and to others.
But if I, as a user of the search engine, or quality-judgement algorithm, or interest-matching algorithm, could adjust those settings, then I could turn "published by major publisher" all the way down and turn something else up. Probably this would best be done as a set of settings/preferences files that could be user-created and fiddled with, and the search user could toggle easily between them. The idea applies to traditional "literary" texts and to blogs - it's much the same idea. Much of literary criticism is people flailing around trying not to "hyperlink" texts in the cheesy fashion this word usually means -- instead they want to link and tag and annotate as people are doing with literary/textual production in the world of blogs.
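For the techies, a toy sketch in Python of what I mean by toggling between preference files. Everything here is invented for illustration: the feature names, the weights, the two profiles and the two books. A real system would load the profiles from user-editable files.

```python
# Hypothetical preference profiles: each maps a feature name to a weight.
PROFILES = {
    "default": {"major_publisher": 1.0, "cited_by_others": 1.0, "cites_others": 1.0},
    # "published by major publisher" turned all the way down,
    # "talks about other books" turned up.
    "intertextual": {"major_publisher": 0.0, "cited_by_others": 1.0, "cites_others": 2.0},
}

# Hypothetical items with made-up feature values.
BOOKS = {
    "Book A": {"major_publisher": 1, "cited_by_others": 3, "cites_others": 0},
    "Book B": {"major_publisher": 0, "cited_by_others": 1, "cites_others": 2},
}

def score(features, weights):
    """Weighted sum of an item's features; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank(items, profile):
    """Return item titles ordered best-first under the named profile."""
    weights = PROFILES[profile]
    return sorted(items, key=lambda title: score(items[title], weights), reverse=True)

default_order = rank(BOOKS, "default")
intertextual_order = rank(BOOKS, "intertextual")
```

The same two books come out in a different order depending on which profile is switched on, which is the whole point: no single list is "the" canon.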
I've been talking about this in humanities grad school and people occasionally get the idea. danah suggested to me at Blogher that in the tech world people would know what I was talking about... and wow, was she right.
In case that sounds utterly mad: think of a book you think is important, and then think of it as a book in conversation with other books. Jacqueline Carey's "Banewreaker" books are important not because they're high culture (they're in fact super purple prose) but because they're a conversation with Lord of the Rings and with most of the fantasy genre.
People fuss about whether blogs are journals, or diaries, or journalism, but I think we can benefit from seeing them as literature. And literature that is unmediated by institutions which claim to control the production of knowledge.