September 2004

It just works

RSS aggregators haven't achieved the general breakthrough some of us might have hoped they would. Jeremy Zawodny writes about the latest developments over at My Yahoo, summing up the current state of play thus: 'Yahoo is in a unique position to ... bring RSS to the masses. ... My Yahoo has adapted to handle RSS/Atom feeds in addition to all the other content that was previously available'.

Some folks might argue that the world just needs to learn about RSS, download and use a desktop aggregator, and so on. That's true for some people, but probably not the majority. My parents, for example, don't care about the various technologies that make their e-mail work. They just want e-mail. They don't know about HTML either. They just want to use the web.

So this new version of My Yahoo tries to get us closer to the point that it Just Works. To make content discovery easier, there are multiple ways to find feeds: search, a small directory, a list of popular feeds, and even some editor's picks. ...

If you're already using a desktop aggregator and like all the features it provides, I don't expect you'll switch. You're an advanced user. You probably don't realize it, but you are. There's part of the population that does all of their e-mail using web-based mail services only. To others, that seems insane. The same will likely be true of web-based vs. desktop-based aggregators.

(How does this differ from Bloglines? 'My Yahoo isn't simply an RSS aggregator. It's still about pulling together lots of information into a single place. And not all of that information is available via RSS.')


Tim Berners-Lee: the Semantic Web

Technology Review interviews Sir Tim Berners-Lee, inventor of the World Wide Web, about the Semantic Web, 'which adds definition tags to information in Web pages and links them in such a way that computers can discover data more efficiently and form new associations between pieces of information, in effect creating a globally distributed database. Though part of Berners-Lee’s original intention for his invention, the Semantic Web has been 15 years in the making and has met its share of skepticism. But Berners-Lee believes it will soon win acceptance, enabling computers to extract meaning from far-flung information as easily as today’s Internet simply links individual documents.'

TECHNOLOGY REVIEW: For several years, you’ve been promoting something you call the Semantic Web, but people don’t seem too excited. Why not?
TIM BERNERS-LEE: It’s not the first time I’ve had this paradigm-shift problem. Early on, people really didn’t understand why the Web was interesting. They saw it in the smaller scale, and it’s not interesting in the smaller scale. Same thing with the Semantic Web.

TR: How do you get past that?
B-L: Right now we are just starting by putting applications onto the Semantic Web one by one and linking them up where it seems useful. But what’s exciting is the network effect. The vision is that we will get to a critical mass, where everything starts getting linked into an unimaginably large whole. Then, the incentive to add more to it rises exponentially as the value of what is out there also does. Because few people initially get this great “aha!” of connecting to a huge mass of Semantic Web data, it all has to be done by people who are convinced—who understand that it’s worth putting the effort into getting the thing off the ground.

TR: Then please explain: Why is it worth all this up-front effort?
B-L: The common thread to the Semantic Web is that there’s lots of information out there—financial information, weather information, corporate information—on databases, spreadsheets, and websites that you can read but you can’t manipulate. The key thing is that this data exists, but the computers don’t know what it is and how it interrelates. You can’t write programs to use it.

But when there’s a web of interesting global semantic data, then you’ll be able to combine the data you know about with other data that you don’t know about. Our lives will be enriched by this data, which we didn’t have access to before, and we’ll be able to write programs that will actually help because they’ll be able to understand the data out there rather than just presenting it to us on the screen. ...

TR: It would seem an impossibly huge task. How does the technology work?
B-L: The Semantic Web technology tackles the problem in two stages. The more mundane is a common data format. You can take a database or a calendar or an address book or a bank statement or a weather reading—basically anything with hard data in it—and make the machine write it in the basic Semantic Web language, instead of some proprietary or application-specific format. This solves the “syntactic” problem.

It still doesn’t solve the “semantic” one, though. For that, the Semantic Web first gives names to the basic concepts involved in the data: date and time, an event, a check, a transaction, temperature and pressure, and location. These are all defined just to mean whatever they mean in the system which produces the data—for example, “Transaction date as I get on a bank statement,” and so on. This set of concepts is called an ontology. Then, where there are connections between ontologies, such as when the date and time on a photograph is the same concept as the time on a weather report, we write rules to take advantage of these connections. This allows one to query the Semantic Web agent for photos taken on sunny days, for example. Bit by bit, link by link, the data becomes connected, interwoven. The exciting thing is serendipitous reuse of data: one person puts data up there for one thing, and another person uses it another way. ...
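
The two stages described here can be sketched in a few lines of Python. This is only a toy illustration, not the actual Semantic Web stack (which uses RDF and ontology languages): the `photo:` and `wx:` vocabularies, the sample data and the helper names are all invented for the example. Stage one is the common format (every fact is a subject, predicate, object triple); stage two is a rule saying that a photo's date and place mean the same thing as a weather report's date and place.

```python
from datetime import date

# Stage 1: one common format. Every fact is a (subject, predicate, object) triple.
# The "photo:" and "wx:" vocabularies and all data below are hypothetical.
triples = {
    ("photo1", "photo:takenOn", date(2004, 9, 4)),
    ("photo1", "photo:takenAt", "Oxford"),
    ("photo2", "photo:takenOn", date(2004, 9, 5)),
    ("photo2", "photo:takenAt", "Oxford"),
    ("report1", "wx:date", date(2004, 9, 4)),
    ("report1", "wx:place", "Oxford"),
    ("report1", "wx:condition", "sunny"),
    ("report2", "wx:date", date(2004, 9, 5)),
    ("report2", "wx:place", "Oxford"),
    ("report2", "wx:condition", "rain"),
}


def values(subject, predicate):
    """All objects recorded for a given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects_with(predicate):
    """All subjects that have at least one triple with this predicate."""
    return {s for s, p, _ in triples if p == predicate}


def photos_on_sunny_days():
    """Stage 2: a rule linking the two vocabularies.

    photo:takenOn is treated as the same concept as wx:date, and
    photo:takenAt as the same concept as wx:place.
    """
    sunny = set()
    for photo in subjects_with("photo:takenOn"):
        for report in subjects_with("wx:condition"):
            same_day = values(photo, "photo:takenOn") & values(report, "wx:date")
            same_place = values(photo, "photo:takenAt") & values(report, "wx:place")
            if same_day and same_place and "sunny" in values(report, "wx:condition"):
                sunny.add(photo)
    return sorted(sunny)


print(photos_on_sunny_days())  # ['photo1']
```

Neither data set was published with the other in mind; the query only becomes possible once the rule connecting the two vocabularies exists.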

TR: Is there an existing application that shows how the Semantic Web can form such connections?
B-L: If you want to play with the Semantic Web, you can make a friend-of-a-friend file. In a FOAF file [the data component of a personal home page, formatted in a standardized way], you can publish stuff about yourself, your organization, your publication, places, or photographs. You can have a pointer that says “this is a photograph about me” and other data about the photograph, such as who else is in it.

To create a FOAF file, you must fill out a form, such as the one at www.ldodds.com/foaf/foaf-a-matic.html. From this information, a Semantic Web–readable text file is generated that you can add to your personal website. There are semantic websites that will pull that data up and give you things like a list of photographs linking you to somebody else. I’m three photographs from Frank Sinatra because I’m photographed with Bill Clinton who’s been photographed with one of the Kennedys who’s been photographed with Frank Sinatra. That’s a silly application, but it really shows the power of the reuse of information.
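
The "three photographs from Frank Sinatra" chain is a shortest-path search over who appears with whom. A minimal sketch, assuming the photo data has already been harvested into plain Python sets (real FOAF files are RDF, and the function name `photo_distance` is made up for this example):

```python
from collections import deque

# Each set is one photograph and the people depicted in it.
# The chain is the one described in the interview; the data structure is an assumption.
photos = [
    {"Tim Berners-Lee", "Bill Clinton"},
    {"Bill Clinton", "a Kennedy"},
    {"a Kennedy", "Frank Sinatra"},
]


def photo_distance(start, goal):
    """Breadth-first search: fewest photographs linking start to goal."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for depicted in photos:
            if person not in depicted:
                continue
            # "depicted - seen" builds a new set, so adding to seen below is safe.
            for other in depicted - seen:
                if other == goal:
                    return hops + 1
                seen.add(other)
                queue.append((other, hops + 1))
    return None  # no chain of photographs connects the two people


print(photo_distance("Tim Berners-Lee", "Frank Sinatra"))  # 3
```

The search needs nothing from the people who published the photographs except that they described them in the same format, which is the reuse being pointed to here.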

TR: Can you describe a more serious example?
B-L: It’s exciting to see industry focused on implementing these standards. Tool kits from HP and IBM, authoring applications from Adobe, smart content management solutions from Profium and Brandsoft, and search engines from Network Inference are all working to create a Semantic Web at various scales. These and other technologies are being adopted by communities that in turn revolutionize how these groups collaborate and communicate. This is what’s happening in life sciences, which we spoke about earlier.

In the U.K., the Semantic Web Environmental Directory is a prototype of a new kind of directory of environmental organizations and projects. Rather than centralizing the storage, management, and ownership of the information, SWED simply harvests data and uses it to create the directory. From a social perspective, there’s an application nicknamed Fatcats from FoafCorp [a Semantic Web project that extends the friend-of-a-friend format to corporate entities] that allows you to pick a company, and it shows you who’s on its board by displaying a graph of connected people. When you click on one of the people, it shows you all the boards they’re a member of. You can start exploring the spheres of influence in American corporate culture.

The exciting thing is when you find that one of these people has a FOAF file, and you start going from corporate culture into personal culture, and then into photographs, and then into weather information, and then booking flights, and then into booking restaurants, and then into figuring out what wine to have for a meal. ...

TR: Do you believe that the World Wide Web will be your most important contribution?
B-L: My role necessarily had to morph from lone designer through community agitator to lead architect and facilitator of consensus at W3C. But I suspect the Web will be my most important contribution—although it required being in the right place at the right time. The mistake, though, is to think that it is finished. The Semantic Web is just the application of weblike design to data; it will be many more decades before we will be able to say we have really implemented the Web idea in the full, if ever we can.


Guardian blogs

The Guardian is very much at the cutting edge when it comes to the British media online. Here is its current range of blogs:

Newsblog

Newsblog is our flagship weblog, put together by our news team and augmented by dispatches from Guardian journalists around the world. Growing from The Weblog (launched in the summer of 2001) Newsblog makes a point of featuring other people's sites, an eclectic range of links around the web and - now - lively debate between readers and other contributors.

The Guide

The web version of The Guide - our popular Saturday entertainment magazine - sports a weblog front page bringing you updates every weekday. Don't miss the steady flow of gossip, impromptu reviews and bitchiness from The Guide team, alongside links to your favourite Guide regulars.

Onlineblog

Onlineblog - our technology weblog from the team that produces the Online section every Thursday - has become a firm favourite with readers, who nominated it for best technology weblog in the 2004 Bloggies. Check in for the latest internet and technology news - including live posting from the industry's biggest shows and conferences - and knowledgeable discussion about the latest trends.

Gamesblog

Enjoy playing games on your computer, console or mobile phone? Ever felt games magazines weren't for you? Then we hope you'll enjoy Gamesblog. Put together by games writers from the Guardian's technology section, Online, Gamesblog takes an informed, intelligent and - above all - entertaining look at the games world.

For an overview, see here.


That old Digital Rights tune again ...

Last weekend, Memex 1.1 drew our attention to a report looking at file-sharing in the TV, movie, software and music markets. This report, compiled by Jonathan A. Zdziarski utilising the services of Slashdot, is published here. At the school where I teach, we are preparing for a sixth form conference on 'IT and the challenge of change'. Speakers include Cory Doctorow and Jyri Engeström. Cory will be talking about DRM and, in the run-up to this event, I have begun chatting with Colin Greenwood (Radiohead), getting the views of an artist, someone without whom there would be no music to share in the first place. I hope we can have a good debate on this contentious issue. The report by Jonathan Zdziarski suggests:

there is a captive audience and a viable market in reaching the file-sharing community to generate revenue (without litigation). Because of the vast selection of media available to file-sharers, many are finding themselves exploring new music, movies, and even software they would not have normally considered in their purchases. There is demand, and demand creates market. The key to finding the market is adapting to a new business model - one that serves the enlightened consumer. ... There are countless consumers in the Internet community willing to invest in long-term relationships with various artists or manufacturers. All they require is that it is on their terms.

Case in point (via Anil Dash):

Since the release of Give Up early last year, Sub Pop records has offered the Postal Service's two lead singles available as free downloads on their website, and they've sold more than 300,000 copies of their album. Despite the fact that the songs have been downloaded, for free, 1.5 million times since then, Such Great Heights and The District Sleeps Alone have both been in the top 100, sometimes at the same time, on the iTunes Music Store for the past several months.

The path I took to buying Give Up? I downloaded the free files, liked what I heard, read about the band, wanted to support them — and bought the CD.


Rathergate

It's been interesting to follow the whole CBS/Dan Rather hoo-ha. Jeff Jarvis has posted an excellent résumé on his blog. The analysis and timeline that Ernest Miller has posted at Many2Many offers important, and fuller, commentary. He says there:

It is disappointing to me that the major media has been mostly silent in their condemnation of CBS's response to this scandal. Even granting, against reason, that there remains a serious debate about the authenticity of the documents, and that CBS's "checks and balances" for vetting this story were sufficient, the response of CBS to its critics has been outrageous. Where are the outraged calls for more transparency on the part of CBS News from the editorial boards of the New York Times, Washington Post, Chicago Tribune or Wall Street Journal? Why haven't anchors of the other networks called for CBS to establish an internal, or better yet, an external investigation into the issue? Any profession that won't police its own when members egregiously violate the fundamental tenets of that profession will very quickly lose all credibility. More importantly, the press plays a vital and critical role in forcing transparency on government. How effectively will the press be able to play that role if it adopts the stonewalling tactics of the government when it is subject to criticism? If our watchdogs cannot even watch themselves, the Fourth Estate will become ever more ineffective.

Jeff Jarvis subsequently asserted that this affair is 'bigger than Dan Rather. It's bigger than CBS. It's about journalism and Big Media and their relationship with the citizenry and democracy. It's about sharing authority with the people'. I'd like to believe this, but Steven Johnson has surely got it right:

I've been thrilled to see the team effort over the past ten days that toppled the CBS documents story. But could we have a brief reality check for just one split second? For all of you announcing that Rathergate is a watershed moment in the history of journalism, the moment when the swarm Davids finally outfoxed the big media Goliath -- remember that this was a story that was uniquely suited for the living-room journalism that flourishes in the blogging world. You didn't even need Google to crack this case: 95% of the relevant facts that proved the documents to be forged were available simply by switching applications. If there's a watershed here, it's this: from this day on, you can be sure that any time a national news story appears that revolves around Microsoft Word's auto-formatting features -- the blogosphere will OWN that story!

Think about the other major stories that broke in the last year or so involving misrepresentations or other abuses of power: the Plame Affair, Abu Ghraib, the whole missing-WMD madness. Did the bloggers contribute anything substantive to the reporting -- to the facts, not the opinions -- of those stories? No, because the central elements in those stories were not matters of typography; to advance them you couldn't just launch Microsoft Word or Google for "Niger documents." Until the blogosphere figures out a way to contribute to those kinds of stories -- and not just ones where a knowledge of font trivia makes you a genuine expert -- I think we'll still prove to be better at framing the news than making it ourselves.

Update, 5.10.2004: I've just seen this posting (14 September) by Matthew Yglesias —

I'm not quite sure I grasp all the blogosphere triumphalism surrounding the Killian memos. After CBS ran the story, the conservative side of the 'sphere came up with dozens of purported debunkings of their authenticity, almost all of which turned out to be more purported than debunking. Then after a few days of back-and-forth, traditional reporters at The Washington Post came out with a more careful, more accurate, more actually-debunking story. The folks at PowerLine and LGF are, at best, Gettier cases, they didn't do any of the actual debunking. Instead, it was done by reporters working for major papers. And good for them. And shame on CBS. But I don't really see what the blogs had to do with it.

Towards a digital media home

How I appreciated Adam Bosworth's post, PCs and media revamped! With an impending move of sorts (this Wednesday), I'm trying to sort out the best way to rig up a media system for the new house. Like Adam, my media is increasingly stored on hard drives, but I'm sure he's not right when he says that this is the way it is for 'everybody else' — not yet, not in the UK, at any rate.

He has a lot to say, including some good things about iPods as remote controls, but it's this that really echoed my experience:

I go into a PC store and they don't understand amps or speakers and try to push little tiny tinny ones on me. They don't understand wireless at all. I go into a stereo store and they think I still listen to sound from CD's and video from cable or DVD's when again, I typically want it on my disks. They will sell me a flatscreen for $5K or more without blinking, but ask for a 1 terabyte hard disk which should be <$1K and everyone just stares. One terabyte can store a LOT of movies and songs and shows. I also want great recording quality on my sound and the stereo stores seem to know nothing about this.

A9 and 'the Wright brothers phase of search technology'

There has been much lately about search engines. A9, newly emerged from beta, has been summarised by Lance Arthur, posting at glassdog, as: depth of search (Yahoo) + more meaningful results (Google) + personalisation (A9). As Lance notes, A9 also enables you to see different kinds of results in the same page, separated into panes. More importantly,

(A9) keeps track of what you’re doing, providing feedback and reminding you about which results you used in past queries; it also watches everyone else’s clicks and gives you feedback about what other people did with the results.

The success of this “Discover” feature (your results are called “Recover”) depends a great deal on how many people adopt A9 as their search engine of preference, of course, but over time it may be the one thing that separates the search winners from the also-rans.

Udi Manber, the presiding magus behind A9, calls this 'a search engine with memory' (NYT). There are questions: as John Battelle has asked, even as he celebrates the arrival of A9, 'will people get the habit of using it?'. Then, we have to wonder why Amazon is doing this, taking Google search results and building this new front end. Search engines and shopping go hand in hand, of course, but A9 goes beyond any narrow, Amazon-centred purpose and is fully able to compete, as a distinctive search tool, alongside Google, Yahoo and Microsoft.

Response to A9 has been generally very good: see, eg, Future Now and Marc Canter ('What I see is an adaptable, dynamic interface which is really simple to use and understand'), and, to encourage the diffident, there's a discount for A9 customers shopping at Amazon. But not everyone has been so positive. Jeremy Zawodny had this to say:

... search is a skill but it really shouldn't be. The Microsoft research is shining a light on this fact. Our software needs to work harder to pay attention to and react to what we're doing—especially when we're failing! If only the search engine could stop after a few tries and say, "hey, I'm guessing that you're looking for something like..." You know, just like any reasonably bright librarian might. (You do remember libraries, don't you?) Yeah, it'd probably freak some people out, but what if it actually was helpful?

Amazon's A9 is an interesting step in evolving search, but it really seems to be going in a different direction. Rather than making search a "lean and mean" operation the way that Google had, A9 is trying to make searching the web a different kind of experience. They're encouraging exploration while also trying to tie in your previous behavior (past queries).

Elsewhere, Jeremy Zawodny added, 'if we must search, I believe it should be a very natural and conversational thing' (and see Greg Linden's remark to the same effect). Trouble is, as John Battelle says, 'It's really hard to do. Such an approach to results works particularly well with limited and/or structured data sets (ie "I see you're looking for a movie. Did you want a comedy or a drama?") but not so hot with horizontal, unstructured data.'

However, that doesn't mean folks aren't working on it (or that some engines, like Teoma or AllTheWeb, don't have some solutions already, and Yahoo's "Also Try..." is close as well). The problem is that it's hard to make the choices presented relevant enough of the time - so that overall, the service is really, really useful, as opposed to often right, but often also wrong.

In any event, and pace Jeremy, I want a variety of approaches from search engines, depending on the task, my mood, my deadline, etc, but I would also love to have what Ramesh Jain calls a 'steering wheel' (see another John Battelle post): 'If I can be shown how the items are distributed in time and space, I can start controlling what I want to see over this time period or what I want to see in that space'. (In the last of this bumper crop of Battelle links, it's well worth reading his brief comment on the purchase of Furl by Looksmart and his piece on Raymie Stata.)

How Google will pursue all this is a matter of some anticipation. Google needs to do something about the shortfalls in its current search engine (there's an interesting post on this at Google Blogoscoped). Will an improved Google search engine come together with the much-talked-about Gbrowser (Jason Kottke, blogzilla, Ars Technica) and the anticipated move towards more fully integrated services? On this note, and closer to home, I long for a decent local computer search engine, something we've been hearing for a while now that Google intends to roll out: X1 is powerful, but locks up my ThinkPad's CPU for inconveniently long periods of time, and I haven't found Copernic desktop search much kinder ... I am playing with Quicksilver on my office Mac: as Merlin Mann says, 'Quicksilver is moving well beyond its modest roots as an application launcher' and Tiger will bring further benefits through 'access to a very well structured data source (Spotlight) and a whole slew of new action possibilities (Automator)' (see here). This might become another reason for looking at a PowerBook as my next machine.