- Aaron Swartz
The Boy Who Could Change the World Page 4
It’s been eight years since Tim Berners-Lee published his Semantic Web Roadmap and it’s difficult to deny that things aren’t exactly going as planned. Actual adoption of Semantic Web technologies has been negligible and nothing that promises to change that appears on the horizon. Meanwhile, the million-dollar-code people have not fared much better. Google has been able to launch a handful of very targeted features, like music search and answers to very specific kinds of questions, but these are mere conveniences, far from changing the way we use the web.
By contrast, Wikipedia has seen explosive growth, Amazon.com has become the premier site for product information, and when people these days talk about user-generated content, they don’t even consider the individualized sense that the W3C and Google assume. Perhaps it’s time to try the third way out.
Wikimedia at the Crossroads
http://www.aaronsw.com/weblog/wikiroads
August 31, 2006
Age 19
A couple weeks ago I had the great privilege of attending Wikimania, the international Wikimedia conference. Hundreds from all over the world gathered there to discuss the magic that is Wikipedia, thinking hard about what it means and why it works. It was an amazing intellectual and emotional experience.
The main attraction was seeing the vibrant Wikipedia community. There were the hardcore Wikipedians, who spend their days reviewing changes and fixing pages. And there were the elder statesmen, like Larry Lessig and Brewster Kahle, who came to meet the first group and tell them how their work fits into a bigger picture. Spending time with all these people was amazing fun—they’re all incredibly bright, enthusiastic, and, most shockingly, completely dedicated to a cause greater than themselves.
At most “technology” conferences I’ve been to, the participants generally talk about technology for its own sake. If use ever gets discussed, it’s only about using it to make vast sums of money. But at Wikimania, the primary concern was doing the most good for the world, with technology as the tool to help us get there. It was an incredible gust of fresh air, one that knocked me off my feet.
There was another group attending, however: the people holding up the platform on which this whole community stands. I spent the first few days with the mostly volunteer crew of hackers who keep the websites up and running. In later days, I talked to the site administrators who exercise the power that the software gives them. And I heard much about the Wikimedia Foundation, the not-for-profit that controls and runs the sites.
Much to my surprise, this second group was almost the opposite of the first. With a few notable exceptions, when they were offstage they talked gossip and details: how do we make the code stop doing this, how do we get people to stop complaining about that, how can we get this other group to like us more. Larger goals or grander visions didn’t come up in their private conversations; instead they seemed absorbed by the issues of the present.
Of course, they have plenty to be absorbed by. Since January, Wikipedia’s traffic has more than doubled and this group is beginning to strain under the load. At the technical level, the software development and server systems are both managed by just one person, Brion Vibber, who appears to have his hands more than full just keeping everything running. The entire system has been cobbled together as the site has grown, a messy mix of different kinds of computers and code, and keeping it all running sounds like a daily nightmare. As a result, actual software development goes rather slowly, which cannot help but affect the development of the larger project.
The small coterie of site administrators, meanwhile, are busy dealing with the ever-increasing stream of complaints from the public. The recent Seigenthaler affair, in which the founding editor of USA Today noisily attacked Wikipedia for containing a grievous error in its article on him, has made people very cautious about how Wikipedia treats living people. (Although to judge just from the traffic numbers, one might think more such affairs might be a good idea . . .) One administrator told me how he spends his time scrubbing Wikipedia clean of unflattering facts about people who call the head office to complain.
Finally, the Wikimedia Foundation Board seems to have devolved into inaction and infighting. Just four people have been actually hired by the Foundation, and even they seem unsure of their role in a largely volunteer community. Little about this group—which, quite literally, controls Wikipedia—is known by the public. Even when they were talking to dedicated Wikipedians at the conference, they put a public face on things, saying little more than “Don’t you folks worry, we’ll straighten everything out.”
The plain fact is that Wikipedia’s gotten too big to be run by just a couple of people. One way or another, it’s going to have to become an organization; the question is what kind. Organizational structures are far from neutral: whose input gets included decides what actions get taken, the positions that get filled decide what things get focused on, the vision at the top sets the path that will be followed.
I worry that Wikipedia, as we know it, might not last. That its feisty democracy might ossify into staid bureaucracy, that its innovation might stagnate into conservatism, that its growth might slow to stasis. Were such things to happen, I know I could not just stand by and watch the tragedy. Wikipedia is just too important—both as a resource and as a model—to see fail.
That is why, after much consideration, I’ve decided to run for a seat on the Wikimedia Foundation’s Board. I’ve been a fairly dedicated Wikipedian since 2003, adding and editing pages whenever I came across them. I’ve gone to a handful of Wikipedia meetups and even got my photo on the front page of the Boston Globe as an example Wikipedian. But I’ve never gotten particularly involved in Wikipedia politics—I’m not an administrator, I don’t get involved in policy debates, I hardly even argue on the “talk pages.” Mostly, I just edit.
And, to be honest, I wish I could stay that way. When people at Wikimania suggested I run for a Board seat, I shrugged off the idea. But since then, I’ve become increasingly convinced that I should run, if only to bring attention to these issues. Nobody else seems to be seriously discussing this challenge.
The election begins today and lasts three weeks. As it rolls on, I plan to regularly publish essays like this one, examining the questions that face Wikipedia in depth. Whether I win or not, I hope we can use this opportunity for a grand discussion about where we should be heading and what we can do to get there. That said, if you’re an eligible Wikipedian, I hope that you’ll please vote for me.
Who Writes Wikipedia?
http://www.aaronsw.com/weblog/whowriteswikipedia
September 4, 2006
Age 19
I first met Jimbo Wales, the face of Wikipedia, when he came to speak at Stanford. Wales told us about Wikipedia’s history, technology, and culture, but one thing he said stands out. “The idea that a lot of people have of Wikipedia,” he noted, “is that it’s some emergent phenomenon—the wisdom of mobs, swarm intelligence, that sort of thing—thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work.” But, he insisted, the truth was rather different: Wikipedia was actually written by “a community . . . a dedicated group of a few hundred volunteers” where “I know all of them and they all know each other.” Really, “it’s much like any traditional organization.”
The difference, of course, is crucial. Not just for the public, who wants to know how a grand thing like Wikipedia actually gets written, but also for Wales, who wants to know how to run the site. “For me this is really important, because I spend a lot of time listening to those four or five hundred and if . . . those people were just a bunch of people talking . . . maybe I can just safely ignore them when setting policy” and instead worry about “the million people writing a sentence each.”
So did the Gang of 500 actually write Wikipedia? Wales decided to run a simple study to find out: he counted who made the most edits to the site. “I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it’s actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users . . . 524 people. . . . And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits.” The remaining 25% of edits, he said, were from “people who [are] contributing . . . a minor change of a fact or a minor spelling fix . . . or something like that.”
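To make the edit-counting metric concrete, here is a minimal sketch of how such a figure could be computed. It is not Wales's actual code, which the text does not show; the function name top_share and the toy data are assumptions made up for illustration. It takes one author name per saved edit and reports what share of all edits came from the most active fraction of users.

```python
from collections import Counter
from math import ceil


def top_share(edit_authors, fraction):
    """Share of all edits made by the most active `fraction` of users.

    `edit_authors` holds one entry per saved edit (hypothetical input format).
    """
    counts = Counter(edit_authors)                 # edits per user
    ranked = sorted(counts.values(), reverse=True)  # most active users first
    k = max(1, ceil(len(ranked) * fraction))        # how many users to include
    return sum(ranked[:k]) / sum(ranked)


# Toy data: four users, ten edits in total.
edits = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(top_share(edits, 0.25))  # share of edits made by the single busiest user
```

Note that this measures only how often someone clicked save, not how much text they added, which is exactly the objection the Stanford students raise.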
Stanford wasn’t the only place he’s made such a claim; it’s part of the standard talk he gives all over the world. “This is the group of around a thousand people who really matter,” he told us at Stanford. “There is this tight community that is actually doing the bulk of all the editing,” he explained at the Oxford Internet Institute. “It’s a group of around a thousand to two thousand people,” he informed the crowd at GEL 2005. These are just the three talks I watched, but Wales has given hundreds more like them.
At Stanford the students were skeptical. Wales was just counting the number of edits—the number of times a user changed something and clicked save. Wouldn’t things be different if he counted the amount of text each user contributed? Wales said he planned to do that in “the next revision” but was sure “my results are going to be even stronger,” because he’d no longer be counting vandalism and other changes that later got removed.
Wales presents these claims as comforting. Don’t worry, he tells the world, Wikipedia isn’t as shocking as you think. In fact, it’s just like any other project: a small group of colleagues working together toward a common goal. But if you think about it, Wales’s view of things is actually much more shocking: around a thousand people wrote the world’s largest encyclopedia in four years for free. Could this really be true?
Curious and skeptical, I decided to investigate. I picked an article at random (“Alan Alda”) to see how it was written. Today the Alan Alda page is a pretty standard Wikipedia page: it has a couple photos, several pages of facts and background, and a handful of links. But when it was first created, it was just two sentences: “Alan Alda is a male actor most famous for his role of Hawkeye Pierce in the television series MASH. Or [sic] recent work, he plays sensitive male characters in drama movies.” How did it get from there to here?
Edit by edit, I watched the page evolve. The changes I saw largely fell into three groups. A tiny handful—probably around 5 out of nearly 400—were “vandalism”: confused or malicious people adding things that simply didn’t fit, followed by someone undoing their change. The vast majority, by far, were small changes: people fixing typos, formatting, links, categories, and so on, making the article a little nicer but not adding much in the way of substance. Finally, a much smaller amount were genuine additions: a couple sentences or even paragraphs of new information added to the page.
Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that’s not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made less than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account.
To investigate more formally, I purchased some time on a computer cluster and downloaded a copy of the Wikipedia archives. I wrote a little program to go through each edit and count how much of it remained in the latest version.* Instead of counting edits, as Wales did, I counted the number of letters a user actually contributed to the present article.
The details: I downloaded a copy of the enwiki-20060717-pages-meta-history.xml.bz2 archive, broke it up into pages, iterated over the revisions and recursively applied Python’s difflib.SequenceMatcher.find_longest_match to each revision and the latest revision. (I used find_longest_match instead of get_matching_blocks because get_matching_blocks didn’t properly handle blocks being reordered.) I only counted the characters which hadn’t already been matched by an earlier revision.
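The procedure described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original program: the function names, the (author, text) input format, and the min_len cutoff (used to ignore coincidental one- or two-letter matches) are all invented for illustration. To tolerate reordered blocks, each unmatched stretch of the latest revision is compared against the whole of the older revision, which is one plausible reading of "recursively applied."

```python
from collections import Counter
from difflib import SequenceMatcher


def surviving_spans(old, latest, min_len=3):
    """Yield (start, end) spans of `latest` that also appear in `old`."""
    matcher = SequenceMatcher(None, old, latest, autojunk=False)
    pending = [(0, len(latest))]  # stretches of `latest` still to examine
    while pending:
        lo, hi = pending.pop()
        if hi - lo < min_len:
            continue
        # Search the whole old revision, so moved blocks can still match.
        m = matcher.find_longest_match(0, len(old), lo, hi)
        if m.size < min_len:
            continue
        yield m.b, m.b + m.size
        pending.append((lo, m.b))           # recurse to the left of the match
        pending.append((m.b + m.size, hi))  # and to the right of it


def letters_by_author(revisions, min_len=3):
    """Count characters of the latest revision credited to each author.

    `revisions` is a chronological list of (author, text) pairs (assumed format).
    A character is credited to the earliest revision that matches it.
    """
    latest = revisions[-1][1]
    owner = [None] * len(latest)
    for author, text in revisions:
        for start, end in surviving_spans(text, latest, min_len):
            for i in range(start, end):
                if owner[i] is None:  # not matched by an earlier revision
                    owner[i] = author
    return Counter(a for a in owner if a is not None)


history = [
    ("alice", "Alan Alda is an actor."),
    ("bob", "Alan Alda is an actor. He played Hawkeye."),
    ("carol", "He played Hawkeye. Alan Alda is an actor."),
]
print(letters_by_author(history))
```

In this toy history each user made exactly one edit, yet the letter count credits carol, who only swapped the two sentences, with a single character, while alice and bob get credit for the text they wrote.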
If you just count edits, it appears the biggest contributors to the Alan Alda article (7 of the top 10) are registered users who (all but 2) have made thousands of edits to the site. Indeed, #4 has made over 7,000 edits while #7 has over 25,000. In other words, if you use Wales’s methods, you get Wales’s results: most of the content seems to be written by heavy editors.
But when you count letters, the picture dramatically changes: few of the contributors (2 out of the top 10) are even registered and most (6 out of the top 10) have made less than 25 edits to the entire site. In fact, #9 has made exactly one edit—this one! With the more reasonable metric—indeed, the one Wales himself said he planned to use in the next revision of his study—the result completely reverses.
I don’t have the resources to run this calculation across all of Wikipedia (there are over 60 million edits!), but I ran it on several more randomly selected articles and the results were much the same. For example, the largest portion of the “Anaconda” article was written by a user who only made 2 edits to it (and only 100 on the entire site). By contrast, the largest number of edits were made by a user who appears to have contributed no text to the final article (the edits were all deleting things and moving things around).
When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site—the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it’s the outsiders who provide nearly all of the content.
And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.
On the other hand, everyone has a bunch of obscure things that, for one reason or another, they’ve come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.
Other encyclopedias work similarly, just on a much smaller scale: a large group of people write articles on topics they know well, while a small staff formats them into a single work. This second group is clearly very important—it’s thanks to them encyclopedias have a consistent look and tone—but it’s a severe exaggeration to say that they wrote the encyclopedia. One imagines the people running Britannica worry more about their contributors than their formatters.
And Wikipedia should too. Even if all the formatters quit the project tomorrow, Wikipedia would still be immensely valuable. For the most part, people read Wikipedia because it has the information they need, not because it has a consistent look. It certainly wouldn’t be as nice without one, but the people who (like me) care about such things would probably step up to take the place of those who had left. The formatters aid the contributors, not the other way around.
Wales is right about one thing, though. This fact does have enormous policy implications. If Wikipedia is written by occasional contributors, then growing it requires making it easier and more rewarding to contribute occasionally. Instead of trying to squeeze more work out of those who spend their life on Wikipedia, we need to broaden the base of those who contribute just a little bit.
Unfortunately, precisely because such people are only occasional contributors, their opinions aren’t heard by the current Wikipedia process. They don’t get involved in policy debates, they don’t go to meetups, and they don’t hang out with Jimbo Wales. And so things that might help them get pushed on the back burner, assuming they’re even proposed.
Out of sight is out of mind, so it’s a short hop to thinking these invisible people aren’t particularly important. Thus Wales’s belief that 500 people wrote half an encyclopedia. Thus his assumption that outsiders contribute mostly vandalism and nonsense. And thus the comments you sometimes hear that making it hard to edit the site might be a good thing.
“I’m not a wiki person who happened to go into encyclopedias,” Wales told the crowd at Oxford. “I’m an encyclopedia person who happened to use a wiki.” So perhaps his belief that Wikipedia was written in the traditional way isn’t surprising. Unfortunately, it is dangerous. If Wikipedia continues down this path of focusing on the encyclopedia at the expense of the wiki, it might end up not being much of either.
Who Runs Wikipedia?
http://www.aaronsw.com/weblog/whorunswikipedia
September 7, 2006
Age 19
During Wikimania, I gave a short talk proposing some new features for Wikipedia. The audience, which consisted mostly of programmers and other high-level Wikipedians, immediately began suggesting problems with the idea. “Won’t bad thing X happen?” “How will you prevent Y?” “Do you really think people are going to do Z?” For a while I tried to answer them, explaining technical ways to fix the problem, but after a couple rounds I finally said:
Stop.
If I had come here five years ago and told you I was going to make an entire encyclopedia by putting up a bunch of web pages that anyone could edit, you would have been able to raise a thousand objections: It will get filled with vandalism! The content will be unreliable! No one will do that work for free!