- Aaron Swartz
The Boy Who Could Change the World Page 6
One presentation was by a usability expert who told us about a study done on how hard people found it to add a photo to a Wikipedia page. The discussion after the presentation turned into a debate over whether Wikipedia should be easy to use. Some suggested that confused users should just add their contributions in the wrong way and more experienced users would come along to clean their contributions up. Others questioned whether confused users should be allowed to edit the site at all—were their contributions even valuable?
As a programmer, I have a great deal of respect for the members of my trade. But with all due respect, are these really decisions that the programmers should be making?
Meanwhile, Jimbo Wales also has a for-profit company, Wikia, which recently received $4 million in venture capital funding. Wales has said, including in his keynote speech at Wikimania, that one of the things he hopes to spend it on is hiring programmers to improve the Wikipedia software.
This is the kind of thing that seems like a thoughtful gesture if you think of the software as neutral—after all, improvements are improvements—but becomes rather more problematic if technical choices have political effects. Should executives and venture capitalists be calling the shots on some of these issues?
The Wikipedia community is enormously vibrant and I have no doubt that the site will manage to survive many software changes. But if we’re concerned about more than mere survival, about how to make Wikipedia the best that it can be, we need to start thinking about software design as much as we think about the rest of our policy choices.
False Outliers
http://www.aaronsw.com/weblog/writefp
September 5, 2006
Age 19
So far my Wikipedia script has churned through about 200 articles, calculating who wrote what in each. This morning I looked through them to see if there were any that didn’t match my theory. It printed out a couple and I decided to investigate.
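A script of this kind could attribute text roughly as follows. This is only a minimal sketch, not the script described above: it assumes each article's history is available as a chronological list of (editor, text) pairs, and the name `attribute_authors` is hypothetical. It diffs each revision against the previous one with Python's standard `difflib` and credits every surviving character to the editor who first added it.

```python
from collections import Counter
from difflib import SequenceMatcher


def attribute_authors(revisions):
    """Credit each character of the final text to an editor.

    `revisions` is an assumed input format: a chronological list of
    (editor, full_text) pairs. Returns a Counter of editor -> characters.
    """
    owners = []   # owners[i] is the editor credited with character i
    current = ""  # text of the previous revision
    for editor, text in revisions:
        matcher = SequenceMatcher(None, current, text, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged text keeps whoever was credited before.
                new_owners.extend(owners[i1:i2])
            elif tag in ("insert", "replace"):
                # New or rewritten text is credited to this editor.
                new_owners.extend([editor] * (j2 - j1))
            # "delete" adds nothing to the new revision.
        owners, current = new_owners, text
    return Counter(owners)
```

A count like this says who typed the surviving text, not where the text came from.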
The first it found was “Alkane,” a long technical article about acyclic saturated hydrocarbons that it said was largely written by Physchim62. Yesterday a good friend was telling me that he thought long technical articles were likely written by a single person, so I immediately thought that here was the proof that he was right. But, just to check, I decided to look in the edit history to make sure my script hadn’t made an error.
It hadn’t, I found, but once again simply looking at the numbers missed the larger point. Physchim62 had indeed contributed most of the article, but according to the edit comments, it was by translating the German version! I don’t have the German data, but presumably it was written in the same incremental way as most of the articles in my study.
The next serious case was “Characters in Atlas Shrugged,” which the script said was written by CatherineMunro. Again, it seemed plausible that one person could have written all those character bios. But again, an investigation into the actual edit history found that Munro hadn’t written them; instead she’d copied them from a bunch of subpages, merging them into one bigger page.
The final serious example was “Anchorage, Alaska,” which appeared to have been written by JeffreyAllen1975. Here the contributions seemed quite genuine; JeffreyAllen1975 made tons of edits, each contributing a paragraph at a time. The work seemed to take quite a toll on him; at his user page he noted, “I just got burned-out and tired of the online encyclopedia. My time is being taken away from me by being with Wikipedia.” He lasted about four months.
Still, something seemed fishy about JeffreyAllen1975, so I decided to investigate further. Currently, the “Anchorage” page has a tag noting that “The current version of the article or section reads like an advertisement.” A bit of Googling revealed why: JeffreyAllen1975’s contributions had been copied and pasted from other websites, like the Anchorage Chamber of Commerce (“Anchorage’s public school system is ranked among the best in the nation. . . . The district’s average SAT and ACT College entrance exam scores are consistently above the national average and Advanced Placement courses are offered at each of the district’s larger high schools”).
I suspect JeffreyAllen1975 didn’t know what he was doing. His writing style suggests he’s just a kid: “In my free time, I am very proud of my-self by how much I’ve learned by making good edits on Wikipedia articles.” I’m pretty sure he just thought he was helping the project: “Wikipedia is like the real encyclopedia books (A through Z) that you see in the library, but better.” But his plagiarism will still have to be removed.
When I started, just looking at the numbers there seemed to be several cases that strongly contradicted my theory. And had I just stuck to looking at the numbers, I would have believed that to be the case as well. But, once again, investigation shows the picture to be far more interesting: translation, reorganization, and plagiarism. Exciting stuff!
(The Dandy Warhols) Come Down
http://www.aaronsw.com/weblog/comedown
September 22, 2006
Age 19
Well, the Wikipedia election has finally ended. The good news is that I can now talk about other things again. (For example, did you know that Erik Möller eats babies?) I have a backlog of about 20 posts that I built up over the course of the election. But instead of springing them on you all at once, I’ll try to do daily posting again starting Monday. (Oooh.)
The actual results haven’t been announced yet (and probably won’t be for another couple days, while they check the list of voters for people who voted twice) but my impression is that I probably lost. Many wags have commented on how my campaign was almost destined to lose: I argued that the hard-core Wikipedia contributors weren’t very important, but those were precisely the people who could vote for me—in other words, I alienated my only constituency.
“Aaron Swartz: Why is he getting so much attention?” wrote fellow candidate Kelly Martin. “The community has long known that edit count is a poor measure of contributions.” Others, meanwhile, insisted my claims were so obviously wrong as to not be even worth discussing.
Jimbo Wales, on the other hand, finally sent me a nice message the other day letting me know that he’d removed the offending section from his talk and looked forward to sitting down with me and investigating the topic more carefully.
And for my part, I hope to be able to take up some of the offers I’ve received for computer time and run my algorithm across all of Wikipedia and publish the results in more detailed form. (I’d also like to use the results to put up a little website where you can type in the name of a page and see who wrote what, color-coded or something like that.)
As for the election itself, it’s much harder to draw firm conclusions. It’s difficult in any election, this one even more so because we have so little data—no exit polls or phone surveys or even TV pundits to rely upon. Still, I’m fairly content seeing the kind words of all the incredible people I respect. Their support means a great deal to me.
The same is true of the old friends who wrote in during my essays along with all the new people who encouraged me to keep on writing. Writing the essays on a regular schedule was hard work—at one point, after sleeping overnight at my mother’s bedside in the hospital, I trundled down at seven in the morning to find an Internet connection so I could write and post one—but your support made it worth the effort.
I hope that whoever wins takes what I’ve written into consideration. I’m not sure who that is yet, but there are some hints. I was reading an irreverent site critical of Wikipedia when I came across its claim that Jimbo Wales had sent an email to the Wikipedia community telling them who they should vote for. I assumed the site had simply made it up to attack Jimbo, but when I searched I found it really was genuine:
I personally strongly strongly support the candidacies of Oscar and Mindspillage.
[. . .]
There are other candidates, some good, but at least some of them are entirely unacceptable because they have proven themselves repeatedly unable to work well with the community.
For those reading the tea leaves, this suggests that the results will be something like: Eloquence, Oscar, Mindspillage. But we’ll see.
The letdown after the election is probably not the best time to make plans, but if I had to, I’d probably decide to stay out of Wikipedia business for a while. It’s a great and important project, but not the one for me.
Anyway, now everyone can go back to vandalizing my Wikipedia page. Laters.
Up with Facts: Finding the Truth in WikiCourt
http://www.aaronsw.com/weblog/001175
February 19, 2004
Age 17
I’m an optimist. I believe that statements like “Bush went AWOL” or “Gore claims to have invented the Internet” can be evaluated and decided pretty much true or false. (The conclusion can be a little more nuanced, but the important thing is that there’s a definitive conclusion.)
And even crazier, I believe that if there was a fair and accurate system for determining which of these things were lies, people would stop repeating the lies. I would certainly try to. No matter how much I wanted to believe “Dean’s state record sealing was normal” or “global warming does exist,” if a fair system had decided against it, I would stop.
And perhaps most crazy of all, I want to stop repeating falsehoods. I believe the truth is more important than particular political goals, so I want to build a system I can trust. I want to know that when I make claims, I’m not speaking out of political distortion but out of honest truth. And I want to be able to evaluate the claims of others too.
So how would such a system work? First, large claims (“Gore is a serial liar,” “Ronald Reagan was a great president”) would be broken down into smaller component parts (“Gore claimed to have invented the Internet,” “Ronald Reagan’s economic plan created jobs”). On each small claim, we’d run The Process. Let’s take “Gore falsely claimed to have invented the Internet.”
First, some ground rules. Everything is open. Anyone can submit anything, and all the records are put on a public website.
We’d begin with collecting evidence. Anyone could submit helpful factual evidence. We’d get videotape from CNN of what exactly Gore said. We’d get congressional records about Gore’s funding of the Arpanet. We’d get testimony from people involved. And so on. If someone challenged a piece of evidence’s validity (e.g., “that photo is doctored,” “that testimony is forged”), a Mini-Process could be started to resolve the issue.
Then there’d be the argument phase. A wiki page would be created where each side would try to take facts from the evidence and use them to build an argument for their case. But then the other side could modify the page to provide their own evidence, expand selective quotations, and otherwise modify the page to make it more accurate and less partisan. Each side would continue bashing the other side’s work until the page gave the best arguments from each side, presented in such a way that nobody could object. (You may think that this is impossible, but Wikipedia has ably proven that it can work.)
Finally, there’d be the adjudication phase. This is the hard part. A group of twelve fair-minded intelligent people (experts in the field, if necessary) would agree to put aside their partisanship and come to a conclusion based on the argument. Hopefully, most of the time this conclusion would be (after a little wiki-rewriting from both sides) unanimous. For example, “While Gore’s phrasing was a little misleading, it is clear Gore was claiming to have led the fight for providing funding for research that was later developed into the Internet—a claim that is mostly true. Gore was one of the research’s major backers, although others were involved.”
The panel would be assembled by selecting people widely seen as fair-minded and intelligent, but coming from different sides of the political spectrum. It is likely many would accept—all they’d need to do was read a page and spend a little time agreeing to summarize it. And in doing so, they’d provide a great contribution to political debate (as well as getting their side represented).
All of these phases would be going on essentially simultaneously—the argument could be updated as new evidence came to light, new evidence could be added to fill holes in the argument, and the adjudicating jury could keep tabs on the page as updated.
And once a decision on an issue was made, it could be cited as evidence in the argument for a related issue (“Gore is a serial liar”).
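The bookkeeping behind such a process might look something like the sketch below. It is only an illustration under assumptions: the `Claim` class, its fields, and its method names are hypothetical, and the real work (gathering evidence, arguing, adjudicating) is done by people, not code. It shows a large claim broken into smaller ones and a settled claim being cited as evidence for a related one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    """Hypothetical record for one claim going through the process."""
    text: str
    evidence: List[str] = field(default_factory=list)       # open submissions
    subclaims: List["Claim"] = field(default_factory=list)  # component parts
    verdict: Optional[str] = None                            # panel's conclusion

    def unresolved(self):
        """List every claim in this tree (self included) with no verdict yet."""
        pending = [] if self.verdict else [self]
        for sub in self.subclaims:
            pending.extend(sub.unresolved())
        return pending

    def settle(self, verdict, cited_by=()):
        """Record a verdict and cite it as evidence in related claims."""
        self.verdict = verdict
        for claim in cited_by:
            claim.evidence.append(f"Decided: {self.text} ({verdict})")


small = Claim("Gore claimed to have invented the Internet")
large = Claim("Gore is a serial liar", subclaims=[small])
small.settle("misleading phrasing, mostly true claim", cited_by=[large])
```

After the last line, `large` is still unresolved, but its evidence now includes the decision on the smaller claim.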
Everything would be very fluid and wiki-like. We’d make up the rules as we went along, seeing what was necessary. And when we learned from our mistakes, we could go back and fix them.
This seems like an awful lot of effort for just coming to a decision on a couple of silly issues, but I think it’s far more than that. The result would be a vast collection of trustable arguments for many of the hot topics of the day, a collection that could be relied on through time to give you the fair truth—because everybody had essentially signed off on it (it is publicly modifiable, after all). And if you look at the effort expended on these claims and political fights, spending a little time getting the facts right seems like a small price to pay.
Welcome, Watchdog.net
http://www.aaronsw.com/weblog/watchdog
April 14, 2008
Age 21
As you’ve probably noticed, it’s political insanity season in the U.S. I can hardly go outside these days without running into someone complaining about the latest piece of campaign gossip. I’ve mostly tried to keep it off this blog, but it’s hard to not get swept up in the fever. As someone who wants to make a difference in the world, I’ve long wondered whether there was an effective way for a programmer to get involved in politics, but I’ve never been able to quite figure it out.
Well, recent events and Larry Lessig got me thinking about it again and I’ve spent the past few months working with and talking to some amazing people about the problem. I’ve learned a lot and must have gone through a dozen different project ideas, but I finally think I’ve found something. It’s not so much a finished solution as a direction, where I hope to figure more of it out along the way.
So the site is called Watchdog.net and the plan has three parts. First, pull in data sources from all over—district demographics, votes, lobbying records, campaign finance reports, etc.—and let people explore them in one elegant, unified interface. I want this to be one of the most powerful, compelling interfaces for exploring a large data set out there.
But just giving people information isn’t enough; unless you give them an opportunity to do something about it, it will just make them more apathetic. So the second part of the site is building tools to let people take action: write or call your representative, send a note to local papers, post a story about something interesting you’ve found, generate a scorecard for the next election.
And tying these two pieces together will be a collaborative database of political causes. So on the page about global warming, you’ll be able to learn more about the problem and proposed solutions, research the donors and votes on the issue, and see or start a letter-writing campaign.
All of it, of course, is free software and free data. And it’s all got a dozen different APIs to make it easy for others to build on what we’ve done in their own work. The goal is to be a hub, connecting citizens, activists, organizations, politicians, programmers, and everybody else who’s interested in politics.
The hope is to make it as interesting and easy as possible to pull people into politics. It’s an ambitious goal with many pieces and possibilities, but with all the excitement right now we want to get something up as fast as possible. So we’ll be developing live on Watchdog.net, releasing pieces as soon as we finish them. Our first goal is to put up data about every representative and a way to write them.
I’ve managed to find an amazing group of people willing to help out with building it so far. And the Sunlight Network has encouraged me and graciously agreed to fund it. But we still need many more hands, especially programmers. If you’re interested in working on it, whether as a volunteer or for pay, please email me, telling me what you’d like to help with.
A Database of Folly
http://crookedtimber.org/2012/07/03/a-database-of-folly/
July 3, 2012
Age 26
The open data movement is a hammer which has gathered the support of many nails. There are the curious taxpayers, who feel their annual checks mean they deserve a peek at the interesting facts the government has collected. There are the ambitious business owners, who see an opportunity to privatize profits from work with socialized costs. And there are the self-styled activists, who believe that if we reveal the data on what the government is really doing, we will arrest corruption by exposing it to sunlight.
The coalition is a confusing mix of these very different motivations (as Tom Slee observes), and the benefits of such a tactical alliance have come with the cost of some confusion. So let’s be clear about what open data can and cannot do.
If the St. Louis Fed publishes reams of economic data, it can certainly make it easier for Mr. Yglesias to make his fantastic charts. If the MTA makes real-time subway information public, it can certainly let Mr. Ernst improve his fantastic app. And, as the talented Mr. Lee pointed out to me, his careful collection of data about members of Congress and the bills they’re passing can be an invaluable resource for professional activists.
So, if I got to choose whether the government should share the data it’s collected, I’d happily vote yes. In fact, I spent several years of my life using the FOIA laws to force it to do just that. I can’t claim my work had any particular impact, but as a curious taxpayer, it was a weirdly enjoyable hobby.
But the open data movement often claims to be much more than that. They insist open data will not just help a few people with their jobs or a few kids with their hobbies but, as the Sunlight Foundation puts it, “make government transparent and accountable.” And that I just don’t see.