January 15, 2005

Tagging at Technorati, Flickr and Del.icio.us

Technorati's new tags page has been getting lots of play in the blogosphere since it went live Thursday. It's a brilliant idea: it matches tags from Del.icio.us, Flickr and blog post categories as they come through RSS feeds, and then displays the photos together with the posts that match. Of course, tags like "general" are big because people use that as a category for their blog posts, and the same goes for other broad categories. David Weinberger noted that blog categories aren't really tags, because they usually aren't as granular as tags often are, and so there are results like this, or whathaveyou, because people want broad, general buckets to put posts in. A post categorized as "general" on its author's own blog has a kind of meaning in that context, and that meaning is lost on a page like Tags. But still... the two sets of tags, along with the broader categories, produce very interesting results together. Also, the photos are beautiful and make the pages far more engaging to read than when they were just text. Searching for the serendipitous meanings that emerge while glancing between the two types of information is really fun.
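Technorati hasn't published how its tag matching works, so here is only a minimal sketch of the general idea, assuming that items from each source (blog posts, Flickr photos, Del.icio.us links) arrive with a list of tags and that tags match case-insensitively. The `Item` class, the source names and the sample data are all hypothetical, for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """One feed entry. These fields are hypothetical, for illustration only."""
    source: str  # e.g. "blog", "flickr", "delicious"
    title: str
    tags: list[str] = field(default_factory=list)


def normalize(tag: str) -> str:
    # Assumption: "Paris" and "paris " count as the same tag.
    return tag.strip().lower()


def build_tag_index(items: list[Item]) -> dict[str, list[Item]]:
    """Map each normalized tag to every item, from any source, that carries it."""
    index: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        # Deduplicate within one item so "Paris" and "paris" don't add it twice.
        for tag in {normalize(t) for t in item.tags}:
            index[tag].append(item)
    return dict(index)


def tag_page(index: dict[str, list[Item]], tag: str) -> dict[str, list[str]]:
    """Group the items for one tag by source, roughly what a tag page shows."""
    page: dict[str, list[str]] = defaultdict(list)
    for item in index.get(normalize(tag), []):
        page[item.source].append(item.title)
    return dict(page)


# Hypothetical sample data: one item from each of the three sources.
items = [
    Item("blog", "My trip to Paris", ["travel", "Paris"]),
    Item("flickr", "Eiffel Tower at dusk", ["paris", "night"]),
    Item("delicious", "Paris metro map", ["Paris", "maps"]),
]
index = build_tag_index(items)
print(tag_page(index, "Paris"))
```

The three items share only the tag "paris", so the page for it pulls in one entry from each source. A plain exact match like this is also what lets a broad category such as "general" swamp a tag page, because nothing distinguishes a granular tag from a catch-all bucket.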

There is also a fourth way to get Technorati's Tag page to pick up information, and that is to use a rel="tag" link. This is done by putting a Technorati link (transparent, so other blog companies could use the links, but still proprietary to Technorati) around some words. The words the link goes around do not become the tags. Rather, the tag is picked up from the link itself, specifically from the last segment of the URL. In this example, the tag Technorati's system picks up is the word "tag" at the end of the href: <a href="http://technorati.com/tags/tag" rel="tag">words wrapped by tag href link</a>. I don't think these tags will get used much, relatively speaking, because as the blogosphere gets bigger, bloggers overall become less technical and won't have a clue what this is, and they shouldn't have to. The answer is probably that Technorati and the other blog companies should cache posts and let users tag them on their sites as they read them.
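To make the point about where the tag comes from concrete, here is a rough sketch of how a crawler might pull tags out of rel="tag" links using only Python's standard library. It assumes, as described above, that the tag is the last path segment of the href and that the link text is ignored; it is not Technorati's actual code, and the sample post is made up.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag"> links; the tag is the href's last path segment."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, e.g. rel="tag nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "tag" not in rel_values or not href:
            return
        # Drop a trailing slash, then take the last segment of the URL path.
        path = urlparse(href).path.rstrip("/")
        segment = path.rsplit("/", 1)[-1]
        if segment:
            # Percent-escapes such as %20 are decoded; the link text is never used.
            self.tags.append(unquote(segment))


def extract_rel_tags(html: str) -> list[str]:
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


# Hypothetical post: one rel="tag" link and one ordinary link.
post = ('<p>Notes on <a href="http://technorati.com/tags/tag" rel="tag">words wrapped '
        'by tag href link</a> and an ordinary <a href="http://example.com/">link</a>.</p>')
print(extract_rel_tags(post))  # ['tag']
```

Only the first link counts, and its tag is "tag" rather than any of the words it wraps. This is also why the markup is easy to get wrong by hand: a missing quote or a typo in the URL silently changes or drops the tag.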

What I'm wondering about is how quickly the spammers will figure this all out and use it to their advantage. I block comment spam across my blogs, and I know that Technorati, Feedster, PubSub et al., as well as Google, don't log comments (or at least comment links) because of the spam problem, yet the comment spammers keep trying increasingly clever tricks. They might leave 500 comments in an hour (like I wouldn't notice), each with a different IP address, a different URL they want linked to for Google juice, a different return email address and different products, all in the same blast. Removing them is automatic, but if they are clever, they'll figure out just how to get one or two through, and if I believe I've gotten them all, they've succeeded with just a couple.

I posed the tag-spam question to a friend at Technorati via IM on Thursday, and they indicated that since they block spam blogs, they'll block spam tags, too. Fair enough. But in this case there are three systems, not just Technorati, that need to block spam, and with three systems the possibility exists that partial spam could be cleverly spread out across them, so that the pieces come together to equal a spam situation. How long until the spammers figure this out and use it to their advantage across these different sites?

I could see a spammer putting up a photo, relatively benign and not at all spammy, with specific tags chosen to match a blog post; the post would carry links to spam sites and tags designed to match the photo's tags without looking very spammy on their own. Then, with some coordinated tagging through Del.icio.us, so that the blog post's tags matched both the photo's tags and the Del.icio.us links, the blog posts and photos would show up together in Technorati Tag page results. Depending on the goals of the spammers and their cleverness, it might be very hard for any individual system by itself to see the entries as spam, or for the community moderation on any one system to realize what is happening. It would be only in combination that the information from all three systems would constitute spam.

Part of the problem, I think, is the nature of spamming, which gains exposure through short windows of time and has value even if a very, very small percentage of viewers actually click the links or see the words. Since the Technorati pages would only show posts and photos for a short period, hours maybe, the spammers could succeed with regularly changing information. Recently, I've been getting comment spam (blocked, of course, from appearing on the front end of my blogs, but I can see it on the back end) for hand cream and pet food (like we didn't learn anything from the first bubble...) from what appear otherwise to be legitimate companies that are just looking to capitalize on something they perceive as providing value, even if it doesn't really. It's not just mortgages and porn. Spam might be harder to recognize than we think, because it is changing, and spammers are very, very clever.

Disclosure: I used to work at Technorati, and I'm friends with many of the folks there.


Posted by Mary Hodder at January 15, 2005 06:10 AM
Comments

Here you have a collection of content items, indexed by their tags, where the tags come from an unconstrained vocabulary and have been applied inconsistently.

If you're interested in browsing a large collection of "stuff" without any particular objective in mind, it can indeed be interesting.

However, if you're searching for a specific concept, you'd probably be much better off using simple full-text indexing (ignoring the tags, or at the very least, diminishing their significance in the search).

Posted by: Terry Steichen at January 15, 2005 12:26 PM

Regarding the spamming issue—this is surely another reason we need to solve the persistent digital identity problem.

Some provisional thoughts (draft form): http://www.i-together.net/weaverluke/2004/12/exploring-apparent-polarities-in.html

Posted by: weaverluke at January 17, 2005 02:16 AM

Scott Rosenberg has an interesting (and cautious) perspective on this here: http://blogs.salon.com/0000014/2005/01/17.html#a810

Posted by: Terry Steichen at January 17, 2005 01:26 PM