Digital archives

History, our future

… no civilization has ever saved everything; acknowledging that fact does not obviate the need to try and save as much as we can. — A Working Library

Way back in 2006, I heard Chris talk, demoing Nokia’s mobile web server. I loved that and in my imagination it combined with the idea of owning your own data. Imagine carrying your own data with you, the canonical copy of everything digital that’s you, serving it from your mobile device. (There was a newspaper picture I saw, during the terrible years after Yugoslavia disintegrated, of a refugee family carrying their hard drives stashed around their van and in their bags and coats. They called themselves, I think — I’ve never been able to find the picture since — the first hi-tech refugees, carrying with them all their digital stuff.)

Owning your own data:

I’m building a solution, bit by bit. It’s certainly incomplete, and with rough edges … but iteratively improving as I find time and inspiration to work on it. I’d rather host my data and live with such awkwardness in the open than be a sharecropper on so many beautiful social content farms. — Tantek Çelik

I haven’t even the beginnings of the technical knowledge needed to follow that particular path (‘This is what I mean by “own your data”. Your site should be the source and hub for everything you post online. This doesn’t exist yet, it’s a forward looking vision, and I and others are hard at work building it. It’s the future of the indie web.’), though I’d dearly like to. If someone builds that, I’d buy it.

In 2008, at Open Tech, I heard Danny O’Brien talk, Living on the Edge (pdf), and read his posts on the same theme: 2008–07–16, and then Independence Day; Intermediaries; Death by Boredom; H-T-T-P, You Know Me; Reachability on the Edge; and How Many Nines Does One Person Need? From Independence Day: ‘a trend you couldn’t help but notice in this latest overexcitement is migration of data from the edge to centralised servers. … I’m curious as to what happens when one tries to buck this trend. … how much of our life that we share with the Web 2.0 giants do we really *need* to share? How much of these services can and should we be running from the comfort of our own homes?’

The year before, Ben had written: ‘I’m living out of webapps at the moment: Google Docs, Gmail, Reader, Meebo and the like. It has been a revelation: these things work really well.’ (And see Matt Haughey, writing in April that year.) How long ago that seems now!

Discussion of the issues hasn’t ceased and, for the foreseeable future, how can it? Take John Naughton, writing earlier this year: ‘the components needed for a new, user-controlled architecture are beginning to fall into place. It’s still a bit geeky, but all it needs is a human-friendly front end’ (my italics). And last year’s speech by Eben Moglen (FreedomBox), Freedom in the Cloud. Or, Take Back the Tubes — A DIY Data Manifesto:

… the web will likely never be completely free of centralized services and Winer recognizes that. Most people will still choose convenience over freedom. Twitter’s user interface is simple, easy to use and works on half a dozen devices. Winer doesn’t believe everyone will want to be part of the distributed web, just the dedicated. But he does believe there are more people who would choose a DIY path if they realized it wasn’t that difficult.

For much of the last year, I’ve become preoccupied with archiving and preserving our data; ‘we are all curators, in the post-modern world, whether we want to be or not’ (William Gibson, 2001). mmmarilyn: ‘The one thing that differentiates human beings from all other creatures on Earth is the externalization of subjective memory—first through notches in trees, then through cave paintings, then through the written word and now, through databases of almost otherworldly storage and retrieval power.’  And then — YAHOO!LOCAUST. John Naughton again (from earlier still this year):

Think of the pleasure we get from old family photographs or the delight that comes from clearing out an attic and finding boxes of love letters, school reports, our first exercise books and old appointment diaries. The contemporary versions of these personal documents are mostly stored either on obsolescent PC hard drives or on the servers of internet companies …


The European Union says its member states must do more to digitize Europe’s cultural heritage and not simply leave that work to the private sector. To do otherwise, suggests a recently commissioned report, could steer Europe away from a digital Renaissance and “into a digital dark age.” — ReadWriteWeb, 2011

I’m no programmer, though decades ago I learned to use Fortran, writing my own program for an A level Biology project, and played with BASIC. Now, I’m playing with a Mac Mini server and a Pegasus R6. I want to know that we can hand on certain things … music, audio, photos, text and, increasingly important, video. History for the future.

Last Christmas, I was hoping we’d see some development in 2011 around the Mac Mini, though I suspected the game plan was more likely to be centred on the ecosystem that individuals, families and groups weave around multiple Apple devices. There’s room for both and it seems that Apple thinks so, too. I use cloud services a great deal, and this won’t stop as I play with creating our own, centralised repository of music, audio, photos, text and videos. I want our own backup and personally maintained server and store, but I know the cloud offers us so much, too.

In What if Flickr fails?, Doc Searls looked forward to ‘self-hosted versions of Flickr, or the equivalent’ but also to a future where we ‘pay more for what’s now free’:

I want them, and every other silo out there, to realize that the pendulum has now swung full distance in the silo’d direction — and that it’s going to swing back in the direction of open and distributed everything. And there’s plenty of money to be made there too.

Yes, indeed. If Apple gets it right with iCloud, I’ll happily pay for secure and really useful services in the cloud that respect my privacy and offer a level of backup and reliability that, even with all my best efforts, I’ll probably not (always) achieve at home. But I’ll hold them to the highest standards and aim not to have to miss a beat if it comes to moving to another service. Dave Winer:

The important thing is that you and your ideas live outside the silo and are ported into it at your pleasure. You never have to worry about getting your stuff out of the silo because it never lived in there in the first place.

Things my students might enjoy reading as they, too, wrestle with these matters:

Delicious II

Yahoo sunset 2010

That image from last December.

And after the “decision” to sunset, what? Rumour and speculation. And … a service that began to seem like a ghost town, as far as my network went.

So a little while ago (15 March), I moved over to Pinboard, decoupling, at last, the feeds for my Delicious account and this blog. Maybe Delicious will gain a good new owner and there’ll be life after Yahoo!. But I’m not banking on it.

Pinboard is reliable, fast and lithe. It’s incredibly easy to use and responsive to search. I’m using it far more than I had been Delicious — because it’s so quick to come back at me with the goods. But I miss the social, the enhanced chance of discovery. In Sticking With Delicious, Paul covered well the reasons why one might stay: ‘what’s always made Delicious most useful to me is its network pages in general, and mine in particular … [Pinboard] has a network, but you can only see your own, and friend finding is basically impossible’. (You can always ‘enter someone’s nick and see if they exist’ — The Post-Delicious World, of course, and there’s the independent Delicious → Pinboard username mapper.) Or, as Matt Haughey put it, ‘my Pinboard feed is personally useful, but socially uninteresting. And therein lies the rub … As a personal archive tool, it’s pretty impressive, as a shared space to find interesting bookmarks, it’s problematic. In the end, I’ll likely continue using Delicious to track bookmarks with Pinboard as a backup/archive tool that I’ll gladly continue to pay for’.

Well, time came to move on. And in truth, my network had mostly migrated to a number of other scattered sites, services and feeds.

Previously, as they say, in Delicious (I), I picked out this by Paul, snagged via my Tumblr:

This fracturing of the network is a huge loss, no matter whether all the people you’re following wind up on the same service you do or otherwise.


Pinboard support is also fast — and personal (Maciej is patient, even with my stumblings). And I really like the way it aspires to archive not just pages but dependencies (find the post, ‘Bookmark Archives That Don’t’, dated 25 Nov, 2010: ‘in 2010 I don't believe it makes any sense to try to archive bookmarks if you’re not willing to resolve dependencies’). It’s sometimes proved better at this than Evernote.
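Resolving dependencies means saving not just a page's HTML but also the images, stylesheets, scripts and embedded media it needs in order to render. Pinboard's own archiver isn't described here in any detail, so what follows is only a minimal Python sketch of the first step. It assumes that the (hypothetical) table of tag and attribute pairs below is what counts as a dependency. It parses a page with the standard library's html.parser and turns every referenced resource into an absolute URL, which an archiver could then fetch and store alongside the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice of which tag/attribute pairs count as page dependencies.
DEPENDENCY_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
    "source": "src",
    "iframe": "src",
}


class DependencyCollector(HTMLParser):
    """Collects absolute URLs for the resources a page needs to render."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only stylesheet <link>s are dependencies; skip canonical, alternate, etc.
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower().split():
            return
        wanted = DEPENDENCY_ATTRS.get(tag)
        value = attrs.get(wanted) if wanted else None
        if value:
            # Resolve relative references against the page's own URL.
            url = urljoin(self.base_url, value)
            if url not in self.dependencies:
                self.dependencies.append(url)


def find_dependencies(html, base_url):
    """Return the de-duplicated dependency URLs of an HTML page, in document order."""
    collector = DependencyCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.dependencies


page = (
    '<html><head><link rel="stylesheet" href="/style.css">'
    '<link rel="canonical" href="/posts/1"></head>'
    '<body><img src="cat.png"><script src="https://cdn.example.com/app.js"></script>'
    "</body></html>"
)
print(find_dependencies(page, "https://example.com/posts/1"))
```

A real archiver would go further. It would fetch each URL, rewrite the page's references to point at the local copies, and recurse into stylesheets, which can pull in fonts and images of their own. None of that is shown here.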

Moving over, importing all my data from Delicious, was straightforward.

You can find me on Pinboard, or subscribe to my Pinboard feed.


Every year when I teach our first years (Year 9) about ICT it’s often surprising what stands out as having changed. It’s life-as-we-once-knew-it, Jim, but now more or less of a piece with this digital stuff. We’ve made a new thing. Lots of new things.

But quick! Look after them! They’re vanishing even as we look.

Recent lessons have developed a focus around the web-and-culture, the web-as-culture. So, lesson 15 is all about the stuff James talked about at dConstruct last year: history, historiography, memory; archiving the internet; time, growth, loss, decay … hope. James’s talk is the focus.

Last Thursday, in the discussion about all that, I found other things suggested themselves and fell satisfyingly into place. I’ve added some of these to the lesson as a supplementary page: Lee’s deeply affecting talk at Reboot 9 about Kozarac; the Long Now’s Rosetta Project. But also things I haven’t put on that supplementary page: Yahoo! and Geocities (already in the original lesson 15) led on to Yahoo! and Delicious (I showed them Pinboard and we talked about backing up locally as well as in the cloud), and Yahoo! and Flickr (which Yahoo!’s CEO doesn’t use: ‘One of the most highly visible and trafficked Yahoo properties and you don’t even have an account there’).

That led on to a look at cloud-computing and the ways in which the Wikileaks story has made people readjust their view of providers (see my last post). It got me scrambling around to find this photo that I knew I’d squirrelled away on Tumblr a while back:

Jerry Yang at Congress

Gao Qin Sheng, mother of Shi Tao, a Chinese reporter sentenced to 10 years in prison for leaking state secrets, cries as Yahoo CEO Jerry Yang (left) testifies before a congressional committee hearing. (Photo: Reuters) — The Sydney Morning Herald (2007)

Wikileaks, Egypt … let’s not forget these lessons about cloud-computing and the responsibilities of global communications and cloud-computing providers.

And, as I find myself thinking more and more about archiving, memory and the digital, I really enjoyed Euan’s recent pieces: One small step (‘Goodness - a usenet search just stumbled upon my first ever experience, in 1995, of the power of the internet to make things easier’) and My first blog post. I hope Euan is happy if I re-blog the latter here (it’s so pertinent):

I knew I started blogging around this time of year in 2001 but thanks to a server crash in December 2001 I had no record of my first blog post. I tried The Wayback Machine but couldn’t remember the original url. I had tried various searches on Google and using Devon Agent but with no success but then I remembered that Ev Williams, who started Blogger and now Twitter, had made me a “Blog of Note” on the front page of Blogger in 2002. A search for that got me my old url and The Wayback Machine then came up with the goods.

So my first ever blog posts are preserved here and I began on the 3rd of March 2001. On day three I said:

“I started feeling a bit uneasy about this blog today. Who will ever read it and what will those who do think?”

Ten years later I am still wondering …..

URLs, permalinks, archives … preservation. It all matters so very much.

I’ve been pointing out to my Year 9 pupils the Facebook setting that lets you download your material to a local drive (thanks to Michael for pointing it out to me) — Account > Account settings > Download your information:

This tool lets you download a copy of your information, including your photos and videos, posts on your wall, all of your messages, your friend list and other content you have shared on your profile. Within this zip file, you will have access to your data in a simple, browseable manner.

So many memories are held in Facebook — for now. Will these teenagers be grandparents with few photos of their teenage years to look back on, show and share? Back up, back up, back up.


Thinning out, tidying up. Books to Oxfam, books to booksellers. Analogue to digital.

Here’s something I’ve long wanted to consign to my outboard brain. In a book bought eight years ago and now on its way out, these words, attributed to an unnamed headmaster (but I think I know who it is — they’d be utterly characteristic of him):

… four questions to ask myself in any situation:
What are the facts?
What are the issues?
What am I going to do?
Who do I have to tell?

Teaching’s changed over the course of my life, becoming suppler and subtler, gentler and wiser. Kinder. Looking back, there was a lot of focus on “facts” and not always much sensitivity to issues. Facts often seemed to be the issues.

Schools, like families, are crucibles of intense engagement. Those four questions are a great way of collecting yourself in the rush of a crisis. They’ve been of help and they can live here now. The book can go.

This archiving business, though … Opening up the book to get that bit and put it down here, I found forgotten notes on index cards — one about the book, but others to do with a job interview I had nearly 10 years ago — and a post-it with a rather good quotation on it from … ? And now that it’s so easy to digitalise and store, what do I keep? When should you forget? What should be put clean away?

Guarding our data


The mass of personal information on government databases must be protected or public trust will be damaged, ministers are being warned.  Information Commissioner Richard Thomas says getting details wrong or mixing them up has huge costs to the people concerned, government and businesses.  Details should not be shared just because technology allows it …

Experts estimate that information about the average working adult in the UK is stored on 700 databases. They include information about people's health records, credit checks and household details. "Never before has the threat of intrusion to people's privacy been such a risk," said Mr Thomas. He said many databases were being used to good effect - such as systems for renewing car tax online rather than waiting in Post Office queues. But there can be problems, such as when the Criminal Records' Bureau mistakenly labelled thousands of people as criminals. …

There were severe consequences for people if information on (a) database was out-of-date, inaccurate, or given to the wrong people, he said. He pointed to the case of a father investigated by social services after his young daughter said he had "bonked" her - it turned out he had hit her on the head with an inflatable hammer. While social services had closed the file, police and health authority records were not updated and said the man had been suspected of child abuse.

Information Commissioner's Office; Annual Report, 2005–6 (pdf).

The BBC, Backstage … and what then?

Ben Metcalfe launched Backstage at Open Tech. His presentation can be downloaded here (PowerPoint) and an audio file can be downloaded via the Open Tech 05 site (the talk was one of those given in the Main Theatre). There's a very useful posting on Backstage about the raft of BBC News RSS feeds, including theme-led feeds.

What's happened to the conservative Auntie we grew up with? Earlier this year, Wired News carried a story entitled, 'The Beeb Shall Inherit the Earth' — by Cory: 'America's entertainment industry is committing slow, spectacular suicide, while one of Europe's biggest broadcasters -- the BBC -- is rushing headlong to the future, embracing innovation rather than fighting it. … With Backstage, BBC's online department takes all the goop in its content-management system -- breaking news, editorials and conferences -- and exposes it as a set of standard programming interfaces. Anyone who can hack a little Perl or Python can mix these into any kind of service they can imagine'. (Cory also sums up the BBC's developing relationship with amateur content providers, 'The BBC's news website is the first mainstream news-gathering organization in the Western world to solicit and give prominence to photographs and reporting provided by its visitors', and the Creative Archive — 'an attempt to digitize all the programming the BBC has commissioned, clear the copyrights and post it online with a Creative Commons-like license. This will allow Britons to download the BBC's content, distribute it and noncommercially remix it into their own films, music, gags, projects and school reports'.)
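Here is a minimal sketch of what 'hack a little Python' against a news feed might look like, using only the standard library's xml.etree.ElementTree. The feed is a made-up RSS 2.0 sample, not a real BBC feed, and the function names headlines and matching are hypothetical. In practice the XML would first be fetched over HTTP, for instance with urllib.request, before being parsed.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample standing in for a real news feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.org/news/1</link>
      <description>A one-line snippet of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.org/news/2</link>
      <description>A one-line snippet of the second story.</description>
    </item>
  </channel>
</rss>
"""


def headlines(feed_xml):
    """Return title, link and summary for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "summary": item.findtext("description", default="").strip(),
        }
        for item in root.iter("item")
    ]


def matching(items, keyword):
    """A trivial 'remix': keep only stories whose headline mentions a keyword."""
    return [item for item in items if keyword.lower() in item["title"].lower()]


for story in matching(headlines(SAMPLE_FEED), "second"):
    print(f"{story['title']} <{story['link']}>")
```

Each item carries only a title, a link and a short description. Anything richer than headlines and snippets depends on what the publisher chooses to put in the feed.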

At Open Tech, I found myself sitting just across the aisle from Stef Magdalinski, author of Wikiproxy, the origins of which he explained here (4 October, 2004):

News Online doesn’t engage with its users, it doesn’t provide tools that allow me, the licence payer, to slice and dice their stories, and by refusing to link from its body text, it fails to understand how hypertext works. Also, with its conservative link policy … that only connects the BBC to established brands, it snubs the wider web, the great teeming mass of creativity. Patrician is not authoritative. Aloof is not respected. Conservative and fearful is not engaging. The gap between the BBC’s utterly laudable self image and ambitions and delivery could not be any clearer than at News Online. Finally, by not really allowing user interaction or commenting, News Online forces that debate and activity away from its site, and out onto the wild wild web. I’ve known many people at the organisation since its very earliest days. There’s some incredible talent and ideas, and from what I hear, an equal amount of frustration at how difficult it is to get these ideas to fruition.

Wikiproxy is described by Stef Magdalinski as 'a proxy for the site, that does the following things: retrieves a page from News Online, and regexes out “Capitalised Phrases” and acronyms. It then tests these against a database of wikipedia topic titles. If the phrase is a topic in wikipedia, then it’s turned into a hyperlink; uses the technorati API to add a sidebar of links to blogs referencing the story. Now you can see who’s talking about the story from the story itself …'. And instead of suing him, the BBC went away and came back with Backstage.
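Stef's description amounts to a small text-processing pipeline. The Python sketch below covers only the regex-and-lookup half of it, and several of its details are my own assumptions. The regular expression is one guess at what counts as a 'Capitalised Phrase' (two or more capitalised words) or an acronym (two or more capital letters). An in-memory set stands in for his database of Wikipedia topic titles, and link_topics is a hypothetical name. The Technorati sidebar is left out.

```python
import re
from urllib.parse import quote

# Assumed pattern: two or more Capitalised Words, or an acronym of two or more capitals.
CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z]{2,})\b")


def link_topics(text, known_titles):
    """Hyperlink every candidate phrase that matches a known Wikipedia title."""

    def replace(match):
        phrase = match.group(0)
        if phrase not in known_titles:
            return phrase  # leave unknown phrases untouched
        # Wikipedia article URLs use underscores in place of spaces.
        slug = quote(phrase.replace(" ", "_"))
        return f'<a href="https://en.wikipedia.org/wiki/{slug}">{phrase}</a>'

    return CANDIDATE.sub(replace, text)


# A tiny in-memory set standing in for the database of topic titles.
titles = {"Gordon Brown", "NATO"}
print(link_topics("Gordon Brown met NATO officials in Brussels.", titles))
```

One weakness shows up quickly. A capitalised word at the start of a sentence gets swallowed into the phrase ('Yesterday Gordon Brown'), and the combined phrase then fails the lookup. A fuller version would also try shorter sub-phrases.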

Two or three Backstage-unleashed projects that Ben whipped through caught my attention (many more here):

I had to leave before the session in the late afternoon when Lee spoke about Headshift's work for the BBC 'that looked at how social tagging might work on BBC News to drive both social bookmarking and user-driven related stories'. This project (which Lee spoke about at Reboot, last month) strikes me as really interesting; there's more on it here and here.

So there's Backstage, BBC Open Source, the Creative Archive Licence Group ('The BBC, the bfi, Channel 4 and the Open University set up the Creative Archive Licence Group to make their archive content available for download under the terms of the Creative Archive Licence - a single, shared user licence scheme for the downloading of moving images, audio and stills') and also Action Network: 'The BBC runs Action Network as an open forum for people to influence issues they care about. Most of the content is written by the public and reflects their views. Content provided by the BBC is clearly marked'. Other signs of movement and change at the Beeb keep popping up: Mildly Diverting posted last week that the BBC had authorised the opportunity 'to watch 'The Mighty Boosh' on broadband. A WEEK BEFORE IT GOES OUT ON THE TELLY', and Paul Mason (Newsnight) blogged Gleneagles and G8 from outside the BBC.

What does all this amount to? To stick with Backstage for a moment, it is clearly a GOOD THING:

This is such a good idea and will, I hope, cement the BBC's leading role in innovating for public good within the mainstream media. It is the latest in a long line of developments that illustrate how the BBC has become a safe harbour for some clever people who are committed to building public value through online media. It also proves, I think, how the internet has revitalised the BBC's public service remit, which was previously becoming a bit lost amidst the management debates, multi-skilling and the growing obsession with competing with lower forms of commercial media. (Lee)

But before we all get too excited, both Lee and Lloyd Shepherd, Head of Development at Guardian Unlimited, have some cautionary words. The BBC is moving boldly, but hasn't declared itself a liberty hall. Under Backstage's terms of service,

You can’t redistribute BBC content; only the BBC can do that. And Backstage is an ideal way to encourage distribution of BBC content around the world (a fundamental tenet of the BBC’s public service charter) but click on a link and you’re back on a BBC page to look at the full content. The simple fact is that the BBC is not distributing full-text content by RSS; only headlines and snippets (this is even true of Backstage’s own RSS feeds). As the BBC itself has said, it expects 10 per cent of its website traffic to be coming from RSS by the end of this year. In other words, RSS is just another effective way of building audience and traffic, and Backstage is a very good way of getting BBC RSS feeds out into wider communities. (Lloyd Shepherd)

Nevertheless, and in the spirit of that well-worn line from Robert Frost's 'Mending Wall', 'Something there is that doesn't love a wall', something stirs. Stef Magdalinski said he was 'inspired by meeting Jimmy Wales of Wikipedia', a venture and a vision that 'precisely illustrates how the collaborative, great unwashed web can create more value than ‘authoritative’ institutions', and it's great to see the BBC responding creatively, interpreting its remit in new ways for this new age. Definitely time not just to watch Auntie, then, but to join in.

'This vast, free and open system'

This caught my eye last week — Julian Bond posting about two stories:

Wayback Machine sued: DMCA
IFPI vs Heise vs Allofmp3

The first is about a law suit being brought in the USA where an old copy of a company's web site appears in the Wayback Machine. They are claiming copyright abuse using the much discredited DMCA. Crucially, they claim that old snapshots are available even though more recent snapshots have been prohibited via a robots.txt file that is being honoured. This is a problem that I've hit on Ecademy with Google where somebody has chosen to hide their profile from Google, but Google still maintains an entry in the index and a cached copy of the page from before they made the change.
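A robots.txt file is a plain-text file at the root of a site that tells crawlers which paths they may fetch. Python's standard library can evaluate one with urllib.robotparser, as in the minimal sketch below. The rules are invented for illustration. Blocking the user-agent name ia_archiver is my assumption about how a site might have addressed the Internet Archive's crawler, not something stated in Julian's post. The file only governs fetching from now on. Whether an archive also withholds snapshots taken before the rule appeared is the archive's own policy, and that is exactly the gap Julian describes.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: shut out one named crawler entirely,
# and keep every other crawler out of /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
checks = [
    ("ia_archiver", "https://example.com/about.html"),
    ("SomeOtherBot", "https://example.com/about.html"),
    ("SomeOtherBot", "https://example.com/private/notes.html"),
]
for agent, url in checks:
    allowed = parser.can_fetch(agent, url)
    print(f"{agent:<14} {url:<42} {'allowed' if allowed else 'blocked'}")
```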

The second is about a new law in Germany, where promoting a service which is illegal in Germany is also illegal. A German magazine website that specialises in copyright issues has a link in an article to Allofmp3, the Russian paid for music download site. They are being sued in Germany by the International Federation of the Phonographic Industry (IFPI). And this despite the fact that they have not yet brought a case *in Germany* proving that the Allofmp3 site is illegal under German law and within the German jurisdiction.

I looked into Allofmp3 in March of 2004, going so far as to ring The New Statesman to discuss their reporter's judgement that the site is legal. As Tom Armitage reported back then: 'All the music on the site is licensed by ROMS, the Russian Organisation for Multimedia and Digital Systems, and the assistant to the lawyer of ROMS assured music portal that: “the sites you mentioned conduct their business legally and are licensed by ROMS, in full accordance with Russian and international law“. … The site demonstrates a clear understanding of the internet and how best to exploit it, applying local copyright law to a global marketplace. Whether the record industry will be as impressed with it as the public remains to be seen.' Indeed.

I can't say Colin Greenwood was wowed when I told him about Allofmp3, either, but since then (November of last year) I think things have moved on and many musicians are reconsidering how to distribute their music, how the punter should pay for it and what rights the buyer should then have over his/her purchase. One of the reasons why Allofmp3 has attracted praise is because, as Tom Armitage pointed out, 'The files it provides have no digital rights management information attached to them. This means that there are no restrictions on how many times you copy or distribute the files once they’ve been downloaded. The files can be copied between an unlimited number of computers and electronic devices. It is still illegal to give the files to people who have not paid for them, but clearly feel they can trust their customers to keep the law, rather than potentially crippling the files they have paid for.'

And as for this nonsense about the Wayback Machine … I've long been a fan and user of it, but since the advent of Firefox and the wonderful extension from Kristof Polleunis that adds a right-click menu which 'allows you to check the page you are browsing or any link in the waybackmachine archive' — well, I use it many times a day and it's an indispensable tool. (There's another great extension there for Google cache, Gcache: 'It will add an entry called "Gcache This Page" to your contextmenu'.)

Julian winds up his post with a great comment and a terrific quotation from Doc Searls:

Like Doc Searls, I'm scared that this vast, free and open system will get tied down, monetized and ruined as more and more commercial and governmental interests try to control it.

This is what we are fighting, folks. The open and free marketplace the Internet provides is shortly going to look like the best darn mess of few-to-many distribution systems for "content" the world has ever known. It will not be the free and open marketplace it was in the first place, and should remain. The end-state will (be) a vast matrix of national and private silos and walled gardens, each a contained or filtered distribution environment. And most of us won't know what we missed, because it never quite happened.

More thoughts about DEVONthink

In January, I posted some thoughts and links about GemX TexNotes Pro and DEVONthink. Prentiss Riddle has recently added his thoughts about the latter:

DEVONthink got a lot of attention recently when science writer Steven Johnson wrote an NYT piece about it and similar tools, crediting them with helping him come up with the ideas that go into his work. But in two subsequent blog posts he convinced me that his techniques are not generalizable. He had a research assistant to copy quotes and marginalia from his reading into DEVONthink, and he says directly that its success depended on the quality and granularity of what he saved: “most of the entries are in a sweet spot where length is concerned: between 50 and 500 words. If I had whole eBooks in there, instead of little clips of text, the tool would be useless". Since I need a tool to manage larger, still undigested documents (i.e., PDFs I haven’t read yet), it wouldn’t work its magic for me. Furthermore, DEVONthink only supports a single hierarchical organizational structure without tags or bibliographic metadata. So I’m still looking for a personal library application.

Jamming with your computer

AKAV put me on to GemX TexNotes Pro:

I managed to find a tool supporting a highly stochastic writing process - by keeping track of all my random thoughts. It's highly interlinkable, easy to use and runs smooth so far. … The only thing I miss is integration with a Bib-tex database.

I've just started playing with TexNotes (Windows-only) and so far it looks very good. Like AKAV, I need something that can work with me as I jot down scattered thoughts, quotations and ideas that I know are interlinked and amount to a post, an article or a book.

DEVONthink, a Mac-only program, is also very interesting but seems to go way beyond what TexNotes can do (amongst other things, it's a freeform database). On his blog, Steve Johnson explains a great working relationship he has evolved with this program, and in the NYT he suggests,

… 2005 may be the year when tools for thought become a reality for people who manipulate words for a living, thanks to the release of nearly a dozen new programs all aiming to do for your personal information what Google has done for the Internet. These programs all work in slightly different ways, but they share two remarkable properties: the ability to interpret the meaning of text documents; and the ability to filter through thousands of documents in the time it takes to have a sip of coffee. Put those two elements together and you have a tool that will have as significant an impact on the way writers work as the original word processors did. … These tools are smart enough to get around the classic search engine failing of excessive specificity: searching for ''dog'' and missing all the articles that have only ''canine'' in them. Modern indexing software learns associations between individual words, by tracking the frequency with which words appear near each other.

And this, from his blog, about his 'digital research library': 'When you're freewheeling through ideas that you yourself have collated -- particularly when you'd long ago forgotten about them -- there's something about the experience that seems uncannily like freewheeling through the corridors of your own memory. It feels like thinking.' And a tantalising prospect: 'The other thing that would be fascinating would be to open up these personal libraries to the external world. That would be a lovely combination of old-fashioned book-based wisdom, advanced semantic search technology, and the personality-driven filters that we've come to enjoy in the blogosphere.'

Cory has a fine, general comment on Steve Johnson's use of DEVONthink:

… his computer jams with him, suggesting neat tangents to his subjects. It's a great example of good computer-human interaction, where computers are used to programatically count and compare quantifiable elements (word and phrase frequencies) and human beings are used to pass judgement on the output of the computers. People are good at understanding and crap at counting; computers are just the reverse.
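Johnson's description of indexing software that 'learns associations between individual words, by tracking the frequency with which words appear near each other' fits Cory's point about computers counting word frequencies. Neither source describes DEVONthink's actual algorithm, so this is a generic minimal sketch in Python, not a reconstruction. It counts which words occur within a few tokens of each other. It then treats two words as related when their neighbourhoods look alike, measured by cosine similarity. The window size, the tiny stopword list and the three sample sentences are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# A tiny, illustrative stopword list; real indexers use far larger ones.
STOPWORDS = {"a", "an", "the", "and", "in", "across", "of", "to", "is", "at"}


def tokenize(text):
    """Lower-case word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_counts(documents, window=3):
    """For each word, count the words appearing within `window` tokens of it."""
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            for neighbour in tokens[i + 1 : i + 1 + window]:
                if neighbour != word:
                    counts[word][neighbour] += 1
                    counts[neighbour][word] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_sq = sum(x * x for x in u.values()) * sum(x * x for x in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def similar_words(counts, word, n=3):
    """Words whose neighbourhoods look most like `word`'s neighbourhood."""
    target = counts.get(word, Counter())
    scored = [(cosine(target, vec), other) for other, vec in counts.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties alphabetical
    return [other for score, other in scored[:n] if score > 0]


DOCS = [
    "The dog chased the ball in the park.",
    "A canine chased a ball across the park.",
    "Stock prices fell sharply in the market.",
]
counts = cooccurrence_counts(DOCS)
print(similar_words(counts, "dog"))
```

No sample sentence contains both 'dog' and 'canine', yet the two share neighbours, so a search for one could be widened to include the other. Whether that widening is actually useful is, as Cory says, for the human to judge.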

Wikipedia, Google and open access to knowledge

Is Truth the first victim?

Tech Central Station:

The user who visits Wikipedia to learn about some subject, to confirm some matter of fact, is rather in the position of a visitor to a public restroom. It may be obviously dirty, so that he knows to exercise great care, or it may seem fairly clean, so that he may be lulled into a false sense of security. What he certainly does not know is who has used the facilities before him.

Robert McHenry is a former Editor in Chief of Encyclopædia Britannica.


To the Editor:

Re ''Google Is Adding Major Libraries to Its Database'' (front page, Dec. 14)

While having online access to some great libraries promises to facilitate research in democratizing access to books, it is worth keeping some things in mind. A digital version of a book -- especially a rare one, printed centuries ago -- is not a replacement for the hard copy. Not only has printed paper proved a durable technology, but there is also much to be gained by visiting the libraries, examining the actual books and entering into discussions with librarians and other researchers. Gaining access to a digital reproduction of an older text makes it easier to take a first step, but little good research will be done simply sitting alone in front of a computer screen.

Lisa Shapiro
Vancouver, British Columbia
Dec. 14, 2004

The writer is an assistant professor of philosophy at Simon Fraser University.

The gatekeepers are enraged, a priesthood agitated once again, and it's an easy spectacle to enjoy. But there are difficult issues, too. Larry Sanger (link via Many2Many), Wikipedia's co-founder, who has since left the project:

… the following must be taken in the spirit of someone who knows and supports the mission and broad policy outlines of Wikipedia very well. First problem: lack of public perception of credibility, particularly in areas of detail. … regardless of whether Wikipedia actually is more or less reliable than the average encyclopedia, it is not perceived as adequately reliable by many librarians, teachers, and academics. The reason for this is not far to seek: those librarians etc. note that anybody can contribute and that there are no traditional review processes. … there are a great many benefits that accrue from robust credibility to the public. One benefit, but only one, is support and participation by academia. Second problem: the dominance of difficult people, trolls, and their enablers. … A few of the project's participants can be, not to put a nice word on it, pretty nasty. And this is tolerated. So, for any person who can and wants to work politely with well-meaning, rational, reasonably well-informed people--which is to say, to be sure, most people working on Wikipedia--the constant fighting can be so off-putting as to drive them away from the project. The root problem: anti-elitism, or lack of respect for expertise. There is a deeper problem--or I, at least, regard it as a problem--which explains both of the above-elaborated problems. Namely, as a community, Wikipedia lacks the habit or tradition of respect for expertise. As a community, far from being elitist (which would, in this context, mean excluding the unwashed masses), it is anti-elitist (which, in this context, means that expertise is not accorded any special respect, and snubs and disrespect of expertise is tolerated).

Larry Sanger is now 'on the academic job market'. I can't believe he hasn't discovered for himself that trolls and difficult people are amply represented in academia: a glance almost any week at the Letters pages of the TLS, the LRB, etc., will make that clear. Wikipedia has no monopoly in that market.

Clay Shirky takes each of Sanger's points and deals with them fairly but firmly. He says:

Of course librarians, teachers, and academics don’t like the Wikipedia. It works without privilege, which is inimical to the way those professions operate. This is not some easily fixed cosmetic flaw, it is the Wikipedia’s driving force. … The physical book, the hushed tones, the monastic dedication, and (unspoken) the barriers to use, these are all essential characteristics of the academy today. It’s not that it doesn’t matter what academics think of the Wikipedia — it would obviously be better to have as many smart people using it as possible. The problem is that the only thing that would make the academics happy would be to shoehorn it into the kind of filter, then publish model that is broken, and would make the Wikipedia broken as well. …

(Wikipedia) is valuable as a site of argumentation and as a near-real-time reference, functions a traditional encyclopedia isn't even capable of. (Where, for example, is Britannica's reference to the Indian Ocean tsunami?) The Wikipedia is an experiment in social openness, and it will stand or fall with the ability to manage that experiment. Whining like Sanger's really only merits one answer: the Wikipedia makes no claim to expertise or authority other than use-value, and if you want to vote against it, don't use it. Everyone else will make the same choice for themselves, and the aggregate decisions of the population will determine the outcome of the project. And 5 years from now, when the Wikipedia is essential infrastructure, we'll hardly remember what the fuss was about.

The best thing on this vexed question of authority that I've read in this whole debate is from Collin Brooke (I've cited it here before):

... credibility is something you earn and develop, not something you simply have. When we ask our students to do research and to prepare the results in written form, we are teaching them to earn credibility through breadth and depth of research. You don't earn credibility by citing an "authoritative source," whatever that means. You earn it by testing your sources against one another, understanding what the reasons are for differences of opinion, and figuring out how to resolve them or to choose among positions, etc. In other words, authority should be something that each of us assigns to our sources, not the other way around. It is the result of research, not a prerequisite.

Which goes to support Danah Boyd's view (and she writes as a contributor to and user of Wikipedia):

i do not consider it to be equivalent to an encyclopedia. I believe that it lacks the necessary research and precision. The lack of talent and practice mostly comes from the fact that most entries have limited contributors. Wikipedia is often my first source, but never my last, particularly in contexts where i need to be certain of my facts. Wikipedia is exceptionally valuable to read about multiple sides to a story, particularly in historical contexts, but i don't trust alternative histories any more than i trust privileged ones. … I don't believe that the goal should be 'acceptance' so much as recognition of what Wikipedia is and what it is not. It will *never* be an encyclopedia, but it will contain extensive knowledge that is quite valuable for different purposes.

(See also Slashdot.)