
Clash of the encyclopedias - SHARISM conference

Well, this blog is slowly rumbling towards retirement, but I have a nice announcement for any readers in or near China :) In just a couple of weeks I will be speaking at a ‘Sharism’ forum in Shanghai as part of the Get It Louder festival. Pretty cool!! The other speakers look an awesome crowd.

The name “sharism” comes from an essay by Isaac Mao, a venture capitalist and blogger. His conclusion is suitably bold:

Emergent democracy will only happen when Sharism becomes the literacy of the majority. Since Sharism can improve communication, collaboration and mutual understanding, I believe it has a place within the educational system. Sharism can be applied to any cultural discourse, CoP (Community of Practice) or problem-solving context. It is also an antidote to social depression, since sharelessness is just dragging our society down. In present or formerly totalitarian countries, this downward cycle is even more apparent. The future world will be a hybrid of human and machine that will generate better and faster decisions anytime, anywhere. The flow of information between minds will become more flexible and more productive. These vast networks of sharing will create a new social order−A Mind Revolution!

Well, I’m going to be speaking about something much smaller :)

Clash of the Encyclopedias: Is Competition Good for Sharing?

One of the benefits of the open web is that good ideas can flourish easily. In the Chinese-speaking web, the idea of an online encyclopedia has been especially fruitful. With the Chinese Wikipedia enjoying its eighth birthday this month, it’s worth examining whether the fragmentation of efforts ultimately leads to a better product and bigger communities, or if the “us vs them” mentality is harmful to sharing.

I don’t know any details yet about how much it will cost, or even where it will be specifically, but I’m sure those details will be surfacing any day now! I believe there will be simultaneous interpreting (a la Wikimania Buenos Aires) which will be great for the attendees.

11 October, 2010


Is mass collaboration all it's cracked up to be?

On Friday I spoke at the National Library of Australia’s Innovative Ideas Forum. It is an annual day, free and open to the public, that aims to “highlight new ideas”.

The other speakers were excellent and I will probably post again when the videos/podcasts become available.

The NLA encouraged the audience to take part via Twitter and they certainly did. It’s the first time a talk I’ve given has been so heavily “liveblogged” and I was a bit nervous about it but I needn’t have been.

thanks aldellit! And more tweets. My name proved very hard to spell. Just as well I didn’t tell them my username. ;)

Download this essay as a PDF

(the PDF doesn’t have any links, but it may be easier on the eyes)


Is Wikipedia a one-off?

Is mass collaboration all it’s cracked up to be?

This essay is a slightly modified version of a speech given at the National Library of Australia’s Innovative Ideas Forum 2010. It is licensed under the Creative Commons Attribution ShareAlike license.

What is mass collaboration?

This month I had my fifth Wikipedia birthday. Five years ago, I was 21, in my fourth year of university and probably procrastinating before exams, and I discovered Wikipedia, and created an article for the Victorian Women’s Football League. After a few more edits to the same article, I left a lengthy comment on the talk page of the HSC article. And thus my Wikipedia career began.

Over the years I’ve written articles on famous pubs and theatres, the town where I grew up, bands and authors that I like. I’ve taken and added around 100 photographs of tennis players, and hundreds more from my travels abroad and even just in the streets of Melbourne. I’ve made over 10,000 edits on Wikimedia Commons, which is a sister project to Wikipedia that hosts images and media files. I’ve written over 200 blog posts about Wikipedia related topics. I’ve welcomed newcomers, banned trolls, drafted policies, I’ve volunteered for projects, I’ve mediated disagreements. I’ve answered press requests. I’ve been to three international Wikimania conferences and expect to attend my fourth in Poland this July. And for the past couple of years I’ve concentrated my attention on Wikimedia Australia, which is a local not-for-profit that aims to promote free cultural works, such as Wikipedia.

So it’s safe to say that Wikipedia has changed my life. It’s easy to forget how Wikipedia has changed the world. It may not impact on your life in a way that you constantly re-evaluate, but Wikipedia is to the internet as the Library of Alexandria was to the ancient world. The thing is amazing in itself, but it is also amazing in what it stands for, and how it opens our eyes and forces us to re-evaluate the possible and impossible. Ten years ago it was obvious that a project that let anyone contribute, with no pre-publish checks or balances, would never have a chance at creating the biggest reference work ever known, or even anything remotely comprehensive. That was obviously impossible. Today, our horizons for the impossible have been pushed back that much further.

Yet the Library of Alexandria was destroyed, and while fire and imperial coups are not a great concern for Wikipedia, there is no guarantee that Wikipedia as we know it will continue. The greater risk at this point is of complacency. Taking Wikipedia and its success for granted has the potential to harm many a fledgling mass collaboration project as well as Wikipedia itself. And it would be a sad loss if we were to let that happen.

Wikipedia is just the most well known of hundreds or maybe even thousands of mass collaboration ventures that exist online. Mass collaboration projects involve self-selected participants from anywhere in the world coordinating their work with others towards project goals. Many wikis are mass collaboration projects, even if they currently only have dozens of editors rather than hundreds. Social networks, blogs or microblogs can be used as tools in mass collaboration projects, but they are not mass collaboration projects themselves — there is no single goal directing everyone’s efforts.

I’m interested in the questions around the potential for mass collaboration projects to change society as we know it. We have seen that they can achieve amazing things just in coordinating to write a reference work together, or write an operating system together. Could the goal of a project like this be taken offline? Could a mass collaboration project be as effective offline as it is online, or are such efforts doomed to fail?

What are the promises of mass collaboration?

Mass collaboration promises to change the world. And yet like so many new technologies and technological practices, the reality takes a while to catch up to the dream.

Wikipedia promises comprehensiveness, but somehow we get deletionism, where if it doesn’t look like your brand new article was born among the pages of Encyclopaedia Britannica, it might be deleted before you’ve had time to finish congratulating yourself on hitting the edit button in the first place. The experience is hardly a welcoming one for new editors.

It promises depth by accretion, but we get controversies about biographies made defamatory and left unattended for months.

It promises a destruction of the ivory tower of traditional journalism and academia, but as the community experienced exponential growth and the kind of overnight success that takes years to attain, it also embraced a conservativeness in its belief about the Wikipedia product. Embracing encyclopedia traditionalism was a way of defending itself, a kind of appeal to existing authority.

There is the promise of the potential to overturn European and North American biases in how the academic model of the world is constructed, yet the embrace of traditional encyclopedism and verifiable sources means the existing biases are entrenched rather than overturned.

Most of all, it promised that ‘anyone can edit’ and to let you ‘ignore all rules’, but instead we have the dubious honour of creating what is probably the world’s most arcane bureaucracy that ordinary people are actually expected to interact with. Lawyers speak in Latin, but Wikipedians speak in acronyms, and the result is no less exclusionary.

Now these points represent a cynical view. Most of the time, we get both parts. We get everything. Parts of Wikipedia are bureaucratic messes, and other parts are shining beacons for the wisdom of crowds, and how our shared understanding is improved through argumentation. But the honeymoon for Wikipedia is well and truly over, and as part of the web “wallpaper” we now take for granted, it remains to be seen how it might successfully continue.

What is the real potential of mass collaboration?

The vision statement of the Wikimedia Foundation, the US not-for-profit that keeps the lights on at Wikipedia, is as follows:

Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.

The sum of all knowledge. Every single human being. Now that’s a pretty bold claim. It’s also something written with the benefit of hindsight, when Wikipedia was already on its way to making that a potential reality.

Yet as bold as that statement is, to my mind, it doesn’t go far enough. While providing a cost-free, accessible comprehensive educational resource is undoubtedly a force for good, it’s only half the story.

The part of using Wikipedia that really changes how you think, and understand the world and yourself, is the editing part. The contributing, participating part. Understanding that you have the authority, the permission, to contribute to the world’s collective record of itself, that your perspective is valuable and needed, is the game-changing manoeuvre here.

Now if you think that this is nothing special, and it’s always been available to anyone who worked hard enough for it, I invite you to check your privilege. Consider how your nationality, your class, your education, your gender, might contribute to voices like yours being privileged above others. For most people in most of the world, having their voice heard and valued is an unimaginable dream.

Even people in the western world, who generally start out with an advantage, realise how valuable this affirmation is. I have been heartened this week to see criticism of Apple’s iPad for being ‘lean back’ rather than ‘lean forward’. Some media outlets see it as manna from heaven that a popular technology will force people back to being passive consumers, receptacles for advertising and squares of pre-approved “content”. But the presence of the criticism means that we are starting to collectively internalise participation as the default mode for digital interaction. We get a new device, we want to know where the edit tab is. We want to know where the comment box is. We want to know where the feedback forum is.

The owner’s manifesto of MAKE magazine is “if you can’t open it, you don’t own it”. There is no single word yet for the concept of this kind of ownership over digital works. If you own it, it doesn’t mean you created it, but you take some responsibility for participating in it in some way. One day we will be able to say about websites, “if you can’t participate in it you don’t own it”, and the participation right of digital works will be as meaningful and important as the property right of physical objects.

How is Wikipedia typical and how is it atypical?

Now all this talk about participation doesn’t mean that we’re all running around thinking deep thoughts about everything we take part in online. Every community has its die-hards and equally every person has some things which they are passionate about participating in, and others which they are happy to be an observer or spectator of. That’s normal, that’s the power law distribution or long tail that says 90% of Wikipedia readers never edit, 9% have a dabble and 1% become dedicated contributors. All large-enough communities are like that, and that’s what drives these projects, not what marks them as failures. This is what Clay Shirky says in his book Here Comes Everybody, which I highly recommend reading to get a comprehensive understanding of how systems like Wikipedia work.

While Wikipedia serves as a great example of what is possible with mass collaboration, it can also harm potential projects that might take Wikipedia as their template, and expect similar outrageous success.

Taking Wikipedia today as a model for a new project is bound to end in tears. Firstly, the Wikipedia of today has almost ten years of existence baked into it. The Wikipedia of 2001 may be a better comparison, when a guideline that said “ignore all rules” was actually sincere advice and not a sad irony.

Secondly the success of Wikipedia is of course an outlier. The initial promise of Wikipedia was simply that you could add your contribution right away, not that you could help write the world’s biggest encyclopedia. So keep your promise proportional to reality.

Thirdly consider carefully your rhetorical model. What does your ideal contribution look like? It won’t look like Wikipedia’s, unless your project is also to create an encyclopedia. Wikipedia benefited from broad shared understanding of the idea of an encyclopedia. Many new mass collaboration models will have to struggle through creating their own rhetorical model, until such projects become mainstream.

Having said that, there are elements of Wikipedia that any mass collaboration project would do well to consider. The first is that of free content licenses. Wikipedia is licensed under the GNU Free Documentation License and also the Creative Commons Attribution ShareAlike license. These licenses are known as ‘copyleft’: a “hack on copyright” that turns it against itself, and uses the law to tell the public that certain works can be re-used without seeking permission first. These licenses are a guarantee that the community will always be able to salvage its own work, should everything go haywire. It’s not that uncommon: funding dries up, and a host organisation no longer has the resources to devote to an experimental Web 2.0 project. These licenses also give the community a degree of autonomy, which it will need should a managerial decision not be to its liking. This forces a host organisation to be as honest and transparent as the community requires: the threat of a project “fork” means a host organisation keeps its monopoly because the community wants it to, not because the community has no other choice. For the same reasons, projects should use technology that is freely available and uses open file formats, and should make managed data sets easily available, such as through “Export” functionality. In this respect I take my hat off to Google for its Data Liberation project, which aims to make it easier for users to move their data in or out of Google applications. If you’re using closed technology and copyright to hold your community hostage, chances are you’re doing it wrong.

The free content license also means I am not as concerned as you might expect about the prospect of Wikipedia imploding. I believe that Wikipedia as we know it is not currently on a sustainable track. There are too many rules. The community is too unfriendly. It’s even a problem that it’s now too mainstream. While going mainstream is great for the impact of the project, it’s not great for recruiting new editors. Niches, fringes and outsider status are what make people passionate contributors, and Wikipedia as a whole now has none of these.

However, Wikipedia the product, as in what we have all written together, is guaranteed to outlive the current incarnation of Wikipedia the community or Wikipedia the project. Because of the free content license, anyone around the world who wants to can download a copy of the entire thing and start a competing fork. Fork is a term from open source software, where anyone who is unhappy with how a software project is progressing can take an entire copy of it and start working on it independently. While the right to fork is very important in open source development, communities work to avoid doing so where possible, because it divides efforts and loyalties.

Forking the entirety of Wikipedia is likely to be a pretty painful exercise because of its size, and any fork project is likely to be overwhelmed by bad edits before it can make any progress. But I see a missing piece of the puzzle as being the potential for part-forking. Wikipedia in its current incarnation is too reliant on a centralised mechanism for editing. We are missing some technology to let groups easily fork a small set of articles, and just edit those articles, and keep them synchronised. This would let narrowly focused groups take care of a subset of articles, and have them be just as readily available to readers, without the hassle of the centralised bureaucracy. There exists technology for actions like this in the open source software world – we can do it for code, with distributed version control, but we can’t yet do it easily for collections of prose. I think the distributed or decentralised wiki is what will breathe new life into Wikipedia.
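To make the part-forking idea a little more concrete, here is a minimal sketch in Python. It is not how MediaWiki or any existing tool works; the class names (Wiki, PartFork), the sync method and the sample articles are all hypothetical, invented purely for illustration. It assumes each article is just a list of revisions, and it borrows the simplest rule from distributed version control: if only the upstream copy changed, adopt those changes; if both sides changed, flag the article for a human to merge.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """One article with its revision history, oldest first (hypothetical model)."""
    title: str
    revisions: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return self.revisions[-1]


class Wiki:
    """A toy wiki: a mapping from title to Article."""

    def __init__(self) -> None:
        self.articles: dict[str, Article] = {}

    def edit(self, title: str, text: str) -> None:
        article = self.articles.setdefault(title, Article(title))
        article.revisions.append(text)


class PartFork(Wiki):
    """A fork that copies and tracks only a chosen subset of an upstream wiki."""

    def __init__(self, upstream: Wiki, titles: list[str]) -> None:
        super().__init__()
        self.upstream = upstream
        # Number of upstream revisions already incorporated, per title.
        self.base: dict[str, int] = {}
        for title in titles:
            source = upstream.articles[title]
            self.articles[title] = Article(title, list(source.revisions))
            self.base[title] = len(source.revisions)

    def sync(self) -> list[str]:
        """Pull upstream changes; return the titles that need a manual merge."""
        conflicts = []
        for title, seen in self.base.items():
            upstream_revs = self.upstream.articles[title].revisions
            local_revs = self.articles[title].revisions
            upstream_changed = len(upstream_revs) > seen
            local_changed = len(local_revs) > seen
            if upstream_changed and local_changed:
                conflicts.append(title)  # both sides edited: a human decides
            elif upstream_changed:
                # Fast-forward: adopt the new upstream revisions unchanged.
                local_revs.extend(upstream_revs[seen:])
                self.base[title] = len(upstream_revs)
        return conflicts


# Usage: fork two of three articles, edit on both sides, then synchronise.
main = Wiki()
main.edit("Tennis", "Tennis is a racquet sport.")
main.edit("Cricket", "Cricket is a bat-and-ball game.")
main.edit("Chess", "Chess is a board game.")

fork = PartFork(main, ["Tennis", "Cricket"])
fork.edit("Tennis", "Tennis is a racquet sport played on a court.")
main.edit("Cricket", "Cricket is a bat-and-ball game with eleven players a side.")
main.edit("Tennis", "Tennis is a racket sport.")

conflicts = fork.sync()
```

In this toy run the fork picks up the upstream Cricket edit automatically, while Tennis, edited on both sides, is reported as a conflict. The hard parts (merging prose, pushing changes back upstream, keeping readers pointed at the right copy) are exactly what the sketch leaves out.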

Australian mass collaboration projects

Clay Shirky says something which we need to keep in mind when thinking about mass collaboration, which is that we get failure for free, and lots of it. What this means is that the barrier to starting a mass collaboration project is now so low, we can all afford to start dozens each day. We no longer need to evaluate if it will be worth the cost, because if you’re on the internet the cost is next to nothing. This means instead of only seeing projects that have been assessed by an organisation as being likely profitable — including administrative and managerial overheads — we can see nearly every project that’s ever popped into anyone’s head. The overheads have dropped to nearly nil. So what we get looks like an awful lot of failure, and it is. The difference is that previously all this failure never had a chance to get out of the starting blocks. But neither did the rare successes.

It is difficult to convince risk-averse management to take on experimental new projects, and even more so if you have to admit that the most likely outcome is failure. But we will have to find a way of making likely failure acceptable to traditional institutions. If this risk is too great for them to take part, they will need to consider the alternative risk of becoming obsolete by choosing not to engage in new ways at all.

Luckily in Australia we have many bold institutions, or more to the point bold individuals in the cultural and public sectors, who are dipping their toes, or even a whole foot, into the 2.0 Web. I will mention just a few, to demonstrate the breadth and inventiveness of our experiments so far.

WikiNorthia is a wiki local history project coordinated by a few community libraries in Melbourne. The City of Melbourne put their five-year plan into a wiki as part of public consultation, called FutureMelbourne. Founders and Survivors is an Australian Research Council project, tracking Tasmanian convicts and their descendants by combining detailed historical records with publicly contributed family history artefacts. The Powerhouse Museum in Sydney was the first Australian institution to take part in the Flickr Commons project, and engages with its online community in a consistent and considerate way. OpenAustralia is an open source project with no institutional backing that reworks Hansard records to make parliamentary proceedings more accessible and politicians more accountable.

Lastly I could not talk about mass collaboration today without mentioning the National Library’s Australian Newspapers Digitisation Program. After OCRing newspapers spanning some 150 years, Rose Holley and her team put their millions of pages and OCR text up on the web and invited the public to correct the texts. With little publicity, it has found a niche of people who enjoy correcting the text from old Australian newspapers. Thousands of users have collectively corrected millions of lines of text, and told the Library how addictive and interesting they found it. This project has really found a sweet spot for user contributions and I am already looking forward to hearing how it evolves in the future.

The range of projects being undertaken shows there is no shortage of good ideas and interested users in this country. If we keep up the same pace we may soon be showing the rest of the world how it’s done.

Where to now?

Mass collaboration offers a way for groups to organise without having an organisation. But organise to do what? Mass collaboration has been used very effectively to organise around single issues or single events, but it is yet to really cross over to the kind of sustained action that we so far only know how to do through lobby groups, politics, charities and the like.

A major part of the problem lies in crossing back from the online landscape into the political world we are so used to, where groups need to be incorporated to be taken seriously, political responsibilities may stop at an arbitrary border, and the law marches at a speed that could be considered glacial in internet time. These difficulties mean that mass collaboration projects struggle to be taken seriously and often don’t bother, preferring to stay in their sphere of influence — the online sphere.

Wikimedia Australia is such a group — we want to take the interesting, amazing things we’ve learned about mass collaboration offline, and help more sectors and more individuals take part in similar projects. Not only should everyone have equal access to use free cultural works, like Wikipedia, but just as importantly, everyone should have an equal opportunity to participate in creating them. We have certainly struggled at times to meet the expectations of meatspace, and so a challenge I see looming for us as a society, is: how can we better accommodate mass collaborative projects making changes in the real world? What organisational structures might we be able to support? How might they be accommodated in law, especially given their borderless nature? If we can find answers to these questions, then maybe we will see the potential of mass collaboration as a genuinely revolutionary force.

19 April, 2010

Charles Matthews: Backing Limited Perspectives

This is a guest post by Charles Matthews. See also his previous posts. —Brianna

That is not what BLP stands for on Wikipedia, though you might sometimes wonder. This now-notorious three-letter acronym stands for Biography of Living Person. Wikipedia hosts several hundred thousand of them, and the summary deletion of a number of those has recently caused consternation and recrimination, not to speak of admin-on-admin disrespectfulness, of a kind that hasn’t been seen for, oh, all of several years.

BLPs are troublesome because in real world terms they affect lives, in legal terms they may be defamatory, and in Wikipedia terms content policies must be applied very strictly, and still may give poor results. But they predominate among biographies: there is a decay law saying that if you go back a decade by birth date, the number of biographies for that year of birth drops off by a factor (could be something like 20% or 30%), and this is quite marked as you get back to 1900 and before. Around 1983 is the peak (over 8000), which tells us what? Duh, sport (Finnish speedway stars, anyone?) and popular culture. Editors add, but do not necessarily maintain well, numerous articles about young stars aged 27 or so who are mentioned in the media.
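For readers who like to see the arithmetic, the decay law can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads "drops off by 20% or 30%" as "each decade further back multiplies the count by 0.8 or 0.7", it takes the peak as roughly 8000 biographies for birth year 1983, and the function name and parameters are made up for this example rather than taken from any Wikipedia tool or dataset.

```python
def estimated_biographies(birth_year, peak_year=1983, peak_count=8000,
                          drop_per_decade=0.25):
    """Rough count of biographies for a birth year, assuming geometric decay.

    Assumption: going back one decade from the peak multiplies the count
    by (1 - drop_per_decade). Only meaningful for years up to the peak.
    """
    if birth_year > peak_year:
        raise ValueError("the sketch only models years at or before the peak")
    decades_back = (peak_year - birth_year) / 10
    return peak_count * (1 - drop_per_decade) ** decades_back


# Compare the two suggested drop-off rates for a birth year eight decades back.
at_20_percent = estimated_biographies(1903, drop_per_decade=0.20)
at_30_percent = estimated_biographies(1903, drop_per_decade=0.30)
```

Under these assumptions the count for 1903 comes out at somewhere between a few hundred and a little over a thousand, which is the sense in which the drop-off becomes "quite marked" as you approach 1900.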

Back among the grumpy folk known as “old school Wikipedians” the term “MySpace page” may occasionally pass the lips, but surprisingly, perhaps, there is a classic old-style inclusionist argument that works the other way. In a polite form it reads “if you come across work of others on the site that is substandard, your first task is to try to improve it, before cutting it or sending it for deletion”. In the matter of BLPs substandard means just one thing: references absent or low-grade. Wikipedia shouldn’t post things about real people out there that are just made up. We all agree. So, an editor finding a substandard BLP should try to reference it better.

Nice theory. BLPs are speedy-deleted by the thousand as newly-posted pages when unreferenced, sometimes quite wrongly, because as posted they don’t have the references needed to support them (no verifiability and/or no convincing reason to support notability). The drama has come up when the same criteria, or stricter, have been applied to articles dormant on the site for years: unreferenced BLPs that seem not to be going anywhere better.

So what is the “old school” counter-view? ‘Wikipedia has no fixed rules’ is part of the old-time mantra called the ‘five pillars’. Which allows for tectonic shifts in how things are done. Some anti-BLP activism has homed in on the broken nature of incremental change in dealing with the issue. Some BLPs are inherently problematic, functioning only as places of wars between supporters and denigrators of a real person (I have to babysit three of those). As Wikipedia expands, it gets into the area of biographies that are not that easy to reference. And such tenuous biographies may just have to remain, as things stand, because the inclusionist view amounts to saying that you are obliged to nurture them. And indeed better references or a controversy may turn up tomorrow: I started Ruth Padel never dreaming she’d be in the news so prominently. I read a history book by David Gress not knowing he was going to appoint himself to climate change controversy.

So what has happened? The limited perspective that there is no real lower threshold for biography on Wikipedia has created another limited perspective, that only a radical cull and shift to a seriously summary deletionist policy on BLPs can save Wikipedia from a future as a morass of neglected gossip about real people. Some demonstrative admin actions on the site have brought the matter to the top of the agenda. These things get messy and costly in human terms, but the logjam gets broken along with the eggs for the omelette, and the real losers are peaceful editors who detest mixed metaphors. No, this is serious stuff, but the lurching motion is unfamiliar to those who haven’t seen Wikipedia in this mood.

25 January, 2010


Why the reporting on Wikipedia is so bad

So Jimmy Wales has a piece on the Huffington Post about the bad reporting of flagged revs. Frankly, of all the things I would ping traditional media on, confused reporting of a complex new editing approval system is one of the last things. I have yet to see anyone in the community explain clearly and concisely the system under consideration, so I think it is asking rather a lot that outsiders should grok it when we are struggling with it.

Of particular interest was

I believe that the underlying facts about the Wikipedia phenomenon — that the general public is actually intelligent, interested in sharing knowledge, interested in getting the facts straight — are so shocking to most old media people that it is literally impossible for them to report on Wikipedia without following a storyline that goes something like this: “Yeah, this was a crazy thing that worked for awhile, but eventually they will see the light and realize that top-down control is the only thing that works.”

Hmm. So it’s that the Wikipedia story is all sunshine and light, and they’re all cynical hacks? I think it more likely that they simply don’t understand how Wikipedia works.

In musing about Software Freedom Day, I watched a video of a talk by Bill Thompson in which he talked about the “‘10 cultures’ problem” (see Wikipedia for reference, or just watch the video – he gives a detailed explanation), by which he means the divide between those who understand how technology works, and how to work it (in theory, if not practice), and those who do not. (Yes, the title is a binary joke. Did you get it? Then you’re on this side of the divide.)

The fact that we can still see stories published about some article on Wikipedia being wrong says to me that those stories are written by people who simply don’t understand how Wikipedia works. That is not to defend Wikipedia containing wrong information at any given time. But it is to say, the focus is not on the interesting, important parts. As Bill Thompson puts it, in a debate about national ID cards, it’s like focusing the argument on the physical card itself, rather than the national identity register.

I would like to see reporting on wrongful Wikipedia blocks – cf. reporting on when people are wrongly barred from voting. And no I’m not saying Wikipedia is a democracy, or should be one. But when the promise is engaging and empowering people around the world to develop the sum of all knowledge, and when the impact is what it is (top 5 website), then yes, it is right to have the scrutiny of traditional media all over it.

I mean, it is all there for them to find, too. But they don’t know how, is my guess.

22 September, 2009

Charles Matthews: Evolution, not Revolution?

A guest post by Charles Matthews. See also his previous posts. —Brianna

After a quietish half-year, by the drama metric (one of the two unsubtle ways to talk about the English Wikipedia, the other being article count), July is heating up. The constitutional issue is traditionally one great big grey area with a few livid spots. It may now flare up, with results that are less predictable than usual. Where does this upwelling of political angst come from? And will there actually be change? Successful constitutional innovations are in fact few, and the traditional demand for a new, good-looking-on-paper constitution is equally traditionally disappointed. The wiki technology has seen substantial changes, such as logging in with one username on all the Wikimedia (WMF) sites; there is nothing recent you can point to that has smoothly upgraded the social side of the site. In a startling reminder that those at the heart of the matter, the embattled Arbitration Committee (ArbCom), are anything but complacent about the general direction, stalwart Arbitrator Kirill Lokshin resigned a few days ago over the hostile reception to a plan for a new ‘plug-in’ to the system, taking one of the 2009 intake with him.

The whole business is rooted in events of four or even five years ago: the period in which Jimmy Wales started to pull back from micro-managing the English Wikipedia (enWP). His role in the other language Wikipedias has always been largely symbolic, and one question is, with enWP still the flagship of the WMF, whether Jimbo should simply be a figurehead on the ship? This is not in fact a question the ArbCom has worried too much about in the past (I should note that I have been out of the loop entirely for six months). If you take the issue of biography of living persons (BLP) as really concerning, much more so than constitutional niceties, the David Rohde story shows Jimbo still has a role as much more than a symbol: the editor of the New York Times phones him. BLP is vexed because there are hundreds of thousands of such articles, each one being a potential problem. When the ArbCom prompted a noticeboard to be set up on the site for basic admin policing of BLPs in 2008, there was a predictable onsite row about the ArbCom overstepping its role in dispute resolution. (The OTRS email system gets something like 300 emails a week, typically complaints prompted by BLP troubles, but mere statistics cut no ice.) Jimmy Wales summarily deleted an article designed to attack a journalist writing about enWP: more attacks on him. The Rohde story was by remote control as far as Wales’s involvement went, but controversy raged. Was a life really at stake? Some people seem very certain about the answers to questions so indeterminate by nature.

So Jimmy Wales has pulled back some way, and the real point is not that he is still active on some fronts, but that there is no single replacement. The ArbCom is there to handle the worst disputes, but as an elected body has become the default object of constitutional debate. The politics can look simple, one-dimensional: picture an axis with hard-line administration at one end (people who would talk about “executive decisions” if they could get away with it), and at the other end extreme free-speechers and wiki purists. At first sight this looks no contest: enWP is not a purist wiki, because it has content policy (see On Notability), and if you get out of line, there are over 1000 admins to straighten you up. No one says that Wikipedia guarantees free expression. But once you mix special interests into the brew, you find greater complexity. Divisive talk about admins versus “article people” is one sign of this; fringe science and featured articles generate such strong feelings; such matters can constitute planks in electoral platforms for, what else, ArbCom. The way this all pans out can sometimes be read in detail on Wikipedia’s criticism sites, if you feel it worthwhile to make it past the sneery misinformation which is their usual stock-in-trade (believe me, unless you have 90% of the story straight already it is essentially impossible to extract value).

What is hard to believe, right now, is that ArbCom+plug-ins, in other words the setting-up of some other bodies on the site to help management, is such a complete dog of a solution. In another part of the forest, there are people questioning Jimbo’s actual constitutional powers, namely (a) appeals from ArbCom decisions, and (b) implementing ArbCom election results by selecting new Arbitrators. The scandal of User:Sam Blacketer shows that (b) is not a trivial matter: it’s the Internet, folks, and sometimes we’re in an episode of “House” with Hugh Laurie saying “everyone lies”. But in any case it is hard to see how to move ahead by evolution, not revolution, with (a) or with (b), without some sort of plug-ins. An impasse, and while I regret that Kirill resigned, I know how he feels. Wikipedia is taken seriously, now, something I wouldn’t change; I wish on occasion some of that seriousness would percolate into constitutional discussion onsite.

21 July, 2009


Charles Matthews: What did we learn from "Matthew Hoffman"?

This post is by Charles Matthews. Charles was a member of the English Wikipedia ArbCom from 2006 to 2008. His first guest post was On Notability. —Brianna

Some ArbCom (Arbitration Committee) cases on the English Wikipedia can reach the mainstream media: there was a recent decision on Scientology-related editing which did just that. Others are very much for insiders, and the innocuously-named Matthew Hoffman case, the topic of a recent ArbCom statement, is an example. I brought the case, a year and a half ago. This will be part retrospect, and part a meditation on “ArbCom 2009”.

What did we learn, then? The short answer is “not enough”. ArbCom 2009 has come to the view that the case should never have been accepted. I don’t think I’ll hire them as historians: the decision they have recently issued about the case is much the same as saying that in 2009 the case would not have been taken, and if taken would have been handled very differently. I’m not quarrelling with that conclusion since it is probably simply true, and it is well within ArbCom’s remit to reconsider matters and the way they were dealt with in the past. What catches my eye there is that justice was always an issue in the Hoffman case, since User:Matthew Hoffman was permanently banned by two admins on no evidence at all. That is one point, and the new statement changes nothing about it. And the other is that Wikipedia is a dynamic place. ArbCom 2009 is not ArbCom 2007 which accepted the case – only a couple of those Arbitrators are still there – and the whole context changes, particularly since ArbCom is an elected body. Elections also matter in this story, since both admins in the frame ran in the 2007 elections that could have put them on ArbCom 2008, and the case was concurrent with the election period.

The Matthew Hoffman case was brought by me because I thought the ArbCom (of which I was a member 2006-8) should look at how it could happen that two admins at the Administrators’ Noticeboard (AN) could decide on the flimsiest of grounds that the Matthew Hoffman account was a sockpuppet (of some other unspecified account), never think to ask for a CheckUser run to verify this and see what other accounts were involved, and one of them (SH as I shall call him) block the account permanently, with a misleading log entry saying “vandalism-only”. Now, in the light of the Scientology decision, the rationale on the admins’ side can be clarified this way: the class of ‘single-purpose accounts’ (SPAs) brings itself under suspicion, because an SPA edits just in one area. When (as for much Scientology-related editing) there is reason to believe that the editing of a group of SPAs is centrally organized, then worries increase. This argument was brought up in the Hoffman case, with creationism in the place of Scientology. The ArbCom of the time took little notice of this line of reasoning (rightly, in my view). It is still no crime to be an SPA, though it will in practical terms tend to tell against an editor in dispute resolution. Note the distinction, though: Hoffman was blocked by admins not trying to resolve a dispute, because the AN discussion of his case took place while he was blocked for 72 hours. That’s the key problem here with natural justice. Hoffman was locked out of responding on the site to the sockpuppet claim by a short block. (ArbCom found that while the Hoffman account was an SPA, there was no evidence at all that it was a sock. Suspicion is not evidence, but it plays a part in how matters are handled administratively on the site, so that justice is not always served.)

Someone else, before I got there, had put it to SH that the block should be reconsidered, only to be told that “sorry, it was consensus at AN”. Here’s another thing we learned, namely two admins on a noticeboard (meaning an unregulated onsite process) can decide to block someone indefinitely, on no evidence, and then fend off outside interest. That was as of 2007, and I don’t suppose the same uncritical attitude would pass muster now. It took some months for the matter to get to court, and I’ll not rehearse the whole history. The fact is that SH’s block was his personal responsibility, and was so treated by ArbCom when it took the case, which brought forth little general illumination beyond the SPA argument I have mentioned. It was shoehorned into being a case about SH; I (naturally) was recused, and this was not the inquiry I had wanted, but it was all out of my control. For more on the facts see my only extensive onsite discussion; the matter is in the first two questions, but the joint statement in the blue box at the top of the page explains why I’m not going to cover this ground again, and indeed stopped short then.

I was outraged by the whole business: a culture of admins being unreasonable rather than responsive in this matter just created a fall guy. Let’s hope that has changed. How should it all work, in the big picture? My view: admins should be granted plenty of discretion in using their powers to defend Wikipedia’s content and mission. But admins who make poor discretionary decisions should expect to have to defend those decisions rationally when challenged; and failure to engage and make an acceptable case is a serious question mark over the admin. It’s not the mistake (we all make them), but the attitude to discussing the decisions that make up the admin workload. The admin community is in potential conflict with the small ArbCom (of about 1% of the size of the admin body) that can remove their powers. Some other Wikipedias do without an arbitration process, and so the justice mechanism is the admin body and its self-regulation; but self-regulation can be flawed, too. ArbCom can review ‘community bans’, namely bans upheld by all admins, but this kind of review now rarely causes trouble and it is unusual for a community ban appeal to succeed; this path isn’t really controversial.

The dispute that arose could certainly have been avoided by applying the maxim “thoughtful, not combative”. It was disastrous (all round) that a block discussed briefly at AN was confused with a community ban, with so much muddle. Was Hoffman a vandal, a sock, or a disruptive editor, and did anyone care which? None of the above: it was a bad block being covered up. Perfunctory discussion at AN must not be held up as deciding these matters once and for all. Why would it not have been important at least to know of what other account the Matthew Hoffman account was a sock? Why was he run off the site before being asked whether it was a real name? Those questions are pretty much rhetorical, but let’s not lose sight of natural justice. There has been strong advocacy, and much procedural argument, but let’s also hear it for the facts, evidence, and setting matters straight.

Hoffman hasn’t returned to Wikipedia. Moving on, what do we learn about ArbCom 2009? The ArbCom, as of 2009, seems to be binding itself to operate in a more tightly constrained way, by placing emphasis in its Hoffman statement on procedural rather than evidential matters. We are back to justice, but this is more like the apparatus of the television lawyer drama. In fact the ArbCom was changing as of 2008, accepting many fewer cases than before, and we are now at perhaps 25% of the caseload numerically compared to the peak period in 2006/7. These cases are generally more complex, and take several times as long to close.

The bigger picture is of admins plus ArbCom in tension on the English Wikipedia, as a shifting relationship that went through an uneasy period in 2008. We are certainly seeing some movement at the moment.

19 June, 2009

Teaching teachers about Wikipedia

Last year I spoke at the Australian Computers in Education conference (you can see/hear my presentation at Slideshare), and although the crowd at my talk was small, the flow-on effects of giving it have been worthwhile. A couple of months ago I was contacted by a New Zealand teacher, Sandy, who had attended my talk and enjoyed it, and wanted to find a local editor who might give a similar presentation to her local area teachers’ “professional development” day.

After a bit of scouting around I located Matt Lane, a New Zealander editor with over 5k edits at English Wikipedia. Eminently qualified! I put them in touch and hoped it would work out.

Today I heard back from Sandy, who reported that the day was a great success:

Marlborough learning Community ICT cluster Teacher Only Day:

This was our cluster’s 4th Teacher only day based around up-skilling and using ICT in the classroom to enhance the teaching and learning opportunities for students (learning years 9-13).

The day started with a keynote presentation and then staff went to 2 workshops/ breakouts, lasting about 1.5 hours each, and with no more than 16 in a group. One of these was Wikipedia, how it works and how it can be used, by Matt Lane (NZ).

Matt presented a very interesting and entertaining look at Wikipedia and its wide reaching use by internet users and by educationalists. Staff left the presentation excited about the potential to use Wikipedia in their classrooms to aid teaching and as a site for students to access reliable and up to date information about their research topics. Wikipedia was no longer seen as a potentially “dodgy” place to get information but in fact one of the best places to start/ continue one’s research through the links and opportunities to verify information presented there.

Well done Matt! And thank you Sandy for following up.

If you’re not averse to a spot of in-person rambling about the Wikimedia philosophy and practices, see if you can think of ways of increasing your visibility as a Wikimedia editor locally. Making yourself known might lead to more opportunities than you realise.

28 April, 2009

Edits in unexpected places

I went to de.wp because I wanted to contact a user from Commons who had pointed to his German Wikipedia user page. While there, I found that thanks to unified login, I was already logged in there. Sweet! I went to the preferences to change the language to English so I could at least read the menus.

While on the preferences page, I was rather surprised to notice that it said I had 6 edits at the German Wikipedia. To my knowledge I have never edited the German Wikipedia since unified login was turned on. I may have added images to stuff previously as an anon user, but that was definitely pre-unified-login. So what were these edits?

My mysterious six German edits

Hm… they appear to be an utterly random selection of edits that I made years ago… and in English! Trust me, I have never done any German “rewording” or copyediting in my life. :)

Looking at the history of the German article on Mulesing is not very enlightening. A handful of recent German looking edits preceded by a ton of very English looking edits. The log reveals what actually happened:

It was transwikied. My name is there because the history is preserved. What is weird, though, is that edits I made in English point to, and are recorded against, my username in German, although I never made those edits in the German Wikipedia. I don’t know if this is unified login being ultra super smart, or a strange side-effect.

Anyone have any insight into this? Have you ever found any “unusual” edits like this in an unexpected wiki? :)

25 March, 2009

Wikipedia forces the release of a Norwegian encyclopedia

Previously reported by Jon Harald, and a few weeks ago on foundation-l, but I wanted to highlight the good news again:

Our “national lexicon” here in Norway, Store Norske Leksikon, went online with its new free edition today. The new edition has user contributed articles. The chief editor says some of the reason for the new edition is the harsh competition from Wikipedia, especially no.wikipedia.org which outnumbered their previous article count last year, now counting 209,079 articles. Also the alternate version nn.wikipedia.org (a variation in Nynorsk) is growing steadily, now counting 46,466 articles. Store Norske Leksikon now claims they have 300,000 articles after inclusion of two other encyclopedias, a medical encyclopedia Store medisinske leksikon and a biographical encyclopedia Biografisk leksikon. Previously they had 155,000 articles.

Wikipedia in bokmål should have 300K articles around February or March next year, it depends on how we will be influenced by the changes in SNL.

The Norwegian Wikipedia was launched in 2001. It’s one of the top 25 language editions of Wikipedia.

Further from the mailing list:

The release has been given a lot of press coverage, and some comparisons between the encyclopedias have been done. Two of them, in Dagbladet and Dagsavisen, have concluded that Wikipedia is best. According to Aftenposten the new edition will cost Kunnskapsforlaget and their owners Aschehoug og Gyldendal NOK 25 mill over the next 3 years, approx USD 3.6 mill.

cf. Britannica asking for reader contributions

14 March, 2009


Innovation and commerce on the French Wikipedia - WikiPosters

I recently learned of a most interesting project currently taking place on the French Wikipedia. It is called WikiPosters. At each image page on the French Wikipedia, an unobtrusive link is inserted that says “Get a poster of this image (new!)”. Clicking on it drops down a short menu that provides a link to purchase a poster of that particular image through the WikiPosters website.

The “purchase a print” menu on the French Wikipedia image page (link)

Ordering the poster on the WikiPosters website (link)

(And it works with SVGs!!!)

What’s interesting is that this project was organised by the French Wikipedia community, originally spearheaded by Plyd. The printer is a commercial printer. They make a small donation to Wikimedia France for each poster purchased, but they have no contract or arrangement with them. And why would they? Wikimedia France no more controls the French Wikipedia than the Wikimedia Foundation (or Wikimedia Australia :)) controls the English Wikipedia. It is right, I feel, that the agreement should be with the French Wikipedia community.

I emailed Plyd to get some more information about the project. He sent me an excellent reply that I have just copied below.

In my opinion, free knowledge should leave online-only.
Printers are ready to spread free knowledge,
demand of printed knowledge is big,
we have numerous valuable pictures :
let’s just link pictures and printers !

That’s what I proposed in 2007. A test-link “Get a poster of this picture” had a great success on fr.wikipedia.org (over 6000 clicks a day). Unfortunately, I did not spend enough time to get in contact with a first printer. But one year later, in May 2008, a French printer contacted me. He was convinced of the potential of such project and proposed himself as a pilot of the project. That’s how it really started. The project took long discussions on French Wikipedia, about how to respect free licences, about the donations the printers could do, about legal issues etc. We eventually draw an open partnership, without signatures.
Then, the pilot printer developed his specific Website that could receive links from Wikipedia. We made the menu and the generator of licence data to provide along with the poster.
The menu was activated for all accounted-users during one month and we just activated it for everyone yesterday. [that would be 2008-12-16]

Main points of the partnership :

We are impatient to know how many posters will be distributed…
If it works as much as I hope, there are many ideas for next steps :

I asked Plyd if he had had trouble getting the community to accept the idea. While it seems an obvious benefit to me, for contributors to a “non profit” project it can often be confusing that commerce might have any place at all.

Actually most (I’d say 90%) of the community was really defending the project, but some voices did not like the ‘for-profit’ aspect of the printer. We put a parallel with search engines on Wikipedia search page, the booksellers on isbn pages or the geolocalisation tools also provided on French Wikipedia. […] [T]he partnership does not require any donation. This is up to the printer. I think it’s a good point for the printer to help the project by a donation. In my humble opinion, his 1.50€ donation will more convince poster buyers, like the first 1000€ donation helps to convince the community.

[…] They (a really minority part of the community) did not like that some people could make some money from contributors work, without even telling them. This shows that the free licences important lines are still not fully understood by everyone. Fortunately, other Wikipedians helped me explain differences between commercial and non-commercial free licences.
I really appreciate that a large majority was supporting the project.

If the project turns out to be a success on French Wikipedia, the Wikimedia Commons community will just about have to look at implementing it. I think they would be crazy not to. Again it goes back to the mission — disseminating free educational content. What wonderful classroom posters would some of our SVG masterpieces make?

The tricky, or rather interesting, part, then, may be finding suitable printing partners in enough countries in the world. (Although WikiPosters does worldwide shipping, the cost is prohibitively expensive.) Interesting because understanding free content is difficult enough, let alone how to massage the caprices of an amorphous online community. The WikiPosters folk are brave — I commend them.

Purchasing statistics seem to be available (not sure how often that is updated), so it will definitely be an interesting thing to keep an eye on over the next few months.

The other interesting aspect is how Plyd managed to pull this off, that is convince the community enough to take part. It’s well known that the Wikipedia communities (maybe all online communities? maybe all communities?) become increasingly petrified as they age. Petrified in place, and petrified of change.

And while we may all rejoice at the joys of a volunteer-driven non-hierarchy (or something), we rarely recognise the missed opportunities of our leaderless groups. As an example, I think that Wikibooks may be floundering a bit without formal links to curricula, publishers and other open courseware folk. They are in a more crowded “open” field than many of the other Wikimedia projects and struggle to distinguish themselves. (On a side note, I wonder what will come of Neeru Khosla of an “open source textbooks” group joining the WMF Advisory Board. I could not help but think of Wikibooks, but it didn’t rate a mention.)

Maybe Plyd is one of those magical people who can draw people together and convince them to put aside their differences, like a wiki Mary Poppins. But I hope not. I hope he is an ordinary person and that his success in this “real world” endeavour, will convince other ordinary folk in all the Wikimedia projects, to think about how they might pursue “real world” engagements that, yes, disseminate our works effectively and globally.

17 December, 2008

Horses for courses

It’s a good time for NaBloPoMo, because there seems to be a lot going on. Like:

and lots of other stuff.

It was a nice surprise to see the front page of English Wikipedia today:

I’m not sure who organises these things (well actually, I think Featured Articles still has its BDFL position, “Featured Articles Director”, who decides when which articles get to appear on the main page), but: nice work. It’s a nice, relatively subtle way of acknowledging this major event in US affairs without alienating the readership of the rest of the world.

Horses for courses.

Me, on the other hand, well I live in a state that takes a holiday for a horse race. So I’m going to my workmate’s barbie and I’m definitely not going to watch any horse racing at all. :)

04 November, 2008


How to change Wikipedia (not just one article)

This is the only place I heard about this paper — identi.ca.

It is often claimed that anyone can edit Wikipedia, and this is somewhat true for individual articles, but the overall structure is increasingly hard to change.

This is a paper by Lars Aronsson, a Wikimedia Sverige board member, who goes by the name LA2. It was presented at the Free Society Conference and Nordic Summit (FSCONS), and I was deeply envious of those who were able to attend. (WMSV was one of the three principal organisers, along with Creative Commons and the Free Software Foundation Europe.)

In the paper Lars talks about his experiences implementing big changes on the Swedish Wikipedia. And they are really big — like implementing [[Category:Man]] or [[Category:Woman]] on all biography articles. If you wanted to do that on your local project how would you start? How would you rate your chances of success? What would you change about your normal approach to try and increase the odds of community acceptance?

This paper is an interesting read for those interested in the culture of the Swedish Wikipedia, but more importantly it is a must-read for anyone interested in implementing project-wide changes, such as those relating to categories or templates for large numbers of articles. It’s a very valuable case study that I will no doubt look up should I want to return to wiki-reforming. :) Much thanks to Lars for writing it up and sharing it.

Edited to add: I am a bit scared of the paper disappearing, so I have uploaded it locally.
greatchanges.pdf [132.85 kB]

03 November, 2008

WikiDashboard now works on live Wikipedia

I don’t recall reading this yet…WikiDashboard now works on live Wikipedia. Previously it was operating on an April 2008 dump.

Well, mostly. Because it is relying on toolserver feeds, you will often enough get timeout-style errors.

As a reminder here is what the dashboard gives you:

(from the FAQ)

There is a similar thing for users (as opposed to articles).

It’s pretty neat that they have it integrated with a Reddit site, so you can easily share interesting page history analyses with other dashboard users. Now what would be even neater is if the IP names were auto-linked to WikiScanner. Oh and if it was available as an extension or gadget, and didn’t have to rely on the poor toolserver’s contrary feeds, that would be the best thing ever. :)

01 November, 2008


National Library of Australia catalogue incorporates Wikipedia text into author pages

The National Library of Australia catalogue has started incorporating Wikipedia text into author pages, e.g. Donald Knuth.

Screenshot of the author page for James Spigelman (cf. Wikipedia article)

As anyone who has tried to parse wikitext or even Wikipedia’s HTML will know, it’s not an easy task. Looks like the NLA needs to work on scrubbing references and ignoring disambiguation pages.
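To make the scrubbing point concrete, here is a minimal Python sketch. It is not how the NLA does it; the function names and the "may refer to" heuristic are assumptions for illustration, and it works on text that has already been extracted from the page.

```python
import re

# Footnote markers such as [1], [23] or [citation needed] that survive
# when article text is lifted out of rendered Wikipedia HTML.
_REF_MARKER = re.compile(r"\[(?:\d+|citation needed)\]")


def scrub_references(text: str) -> str:
    """Remove footnote markers and tidy the whitespace they leave behind."""
    cleaned = _REF_MARKER.sub("", text)
    cleaned = re.sub(r"\s+([.,;:])", r"\1", cleaned)  # no space before punctuation
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def looks_like_disambiguation(text: str) -> bool:
    """Rough heuristic (an assumption): English disambiguation pages
    usually open with a line containing 'may refer to'."""
    return "may refer to" in text.lower()
```

A catalogue page could then skip any intro for which `looks_like_disambiguation` is true and display `scrub_references(intro)` for the rest. A real implementation would want something sturdier than a phrase match.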

To see if they were caching data or pulling it live, I made a minor edit to the intro of James Spigelman and reloaded the NLA author page. To my surprise the change I had just made was updated, meaning they are pulling data live. (They are also pulling thumbnails from Commons, as in the Knuth bio, although the link to the image page has not been preserved.)

I suppose the NLA’s requests are like a fly on the back of Wikipedia, but still, it may not be a particularly good idea.
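The usual alternative to pulling live on every page view is a small time-limited cache. The sketch below is only an illustration of that idea: `TTLCache`, its parameters and the injected `fetch` function are all hypothetical names, and no real network call is made.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Remember fetched text per title and re-fetch only after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical function that retrieves an intro
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, title: str) -> str:
        now = self._clock()
        hit = self._store.get(title)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # still fresh: no request goes out
        text = self._fetch(title)
        self._store[title] = (now, text)
        return text
```

With a one-hour TTL, an edit like my test on James Spigelman would take up to an hour to show on the author page, but repeated views of the same author would cost Wikipedia one request instead of many.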

(via the Australian Wikipedians’ Notice Board)

01 November, 2008


Video from AussieChix microconf - Wikipedia & the education system

I had a great, although tiring, day yesterday: I went to the AussieChix microconf event. AussieChix is the Australian arm of LinuxChix. The “microconf” was a one-day event simultaneously happening in Melbourne and Sydney, with speakers in both cities, connected via videoconferencing. Giant thanks to Mary and Alice for organising it, and Google Australia and their wonderful employees for donating their space, bandwidth and time to enable us to have this event.

I first got involved with LinuxChix not long after WikiChix was founded, I suppose. I was curious about this group that we were modelling on, and I was probably feeling more confident about exploring Linux. I really can’t speak highly enough about the Australian LinuxChix. They are some amazing women. Every single one of them is just doing really cool stuff. Whether they are quiet or boisterous, they are all really strong and each have their own way of not taking shit from other people. It’s like women-company nirvana for me. And that we all just utterly geek out is the icing on the cake. :)

Anyway enough raving. I gave a ~15 minute talk on “Wikipedia & the education system”. It’s not anything super polished, just some thoughts I have been having since I attended ACEC, the computers in education conference.

Talk: Wikipedia & the education system from Brianna Laugher on Vimeo.

Skimmable format:

26 October, 2008

New MediaWiki search features on Wikipedia

Two great features now appear on Wikipedia search:

1 – Blue box on right – search results from sister projects! (It does show results from other projects, but Wikinews is the most common.)

2 – Parenthetical links after article names – links to specific article sections. Useful for jumping to the relevant part of an article right away.

(For all I know they were enabled ages ago, but I only just noticed.)

I’m guessing these are native MediaWiki features rather than admin-enabled JavaScript etc hacks, but can anyone tell me for sure?

17 October, 2008

☍ Data-scraping Wikipedia with Google Spreadsheets

Forgot this one, but the title says it all. And jam it into a Yahoo Pipe, too! This is so neat. I am more and more impressed with Google Spreadsheets. Oh, and FWIW the API couldn’t handle this. (via waxy.org)
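For anyone who would rather stay out of spreadsheets, the same general idea (turning an HTML table into rows of data) can be sketched in Python with only the standard library. This is not the method from the linked post; the class and function names are made up for illustration, and the sample table simply reuses article counts quoted elsewhere on this blog.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in the parsed HTML as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (links, bold) is still part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html: str) -> List[List[str]]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = """
<table>
  <tr><th>Edition</th><th>Articles</th></tr>
  <tr><td><a href="#">no.wikipedia.org</a></td><td>209,079</td></tr>
  <tr><td>nn.wikipedia.org</td><td>46,466</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE):
        print(row)
```

It assumes the data sits in a plain, non-nested HTML table; real Wikipedia tables with rowspans, sort keys and footnotes need more care.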

17 October, 2008


Charles Matthews on Notability

This is a guest post by Charles Matthews. Charles has been a Wikipedia editor since 2003, an Arbitration Committee member since 2006, and is one of the authors of How Wikipedia Works. He commented on my post Wikipedia, the deeply conservative and traditional encyclopedia and I invited him to expand his thoughts on the topic into a complete post. The links were added by me; apart from that it is wholly his text. —Brianna

Should Wikipedia have an article on Charles Herman Kuhl? Some people would think so. Kuhl’s claim to fame is to have been slapped around by George Patton, in “shell shock is for cissies” mode. The fact that we even have to discuss such an issue seems to me to be a good example of what is wrong with the “notability debate”.

Starting in another way, I was asked in conversation at Wikimania about a proposal to create Wikipedia articles for each human gene. What were my reactions here? I said immediately, there should be no orphans, and the articles shouldn’t be any kind of walled garden. A walled garden would be some sort of sub-project that ordinary Wikipedia editors would find tough to relate to: say if the articles were hard to understand and edit in the ordinary way, or if standard deletion criteria were somehow suspended by fiat. Well, nobody can call off the deletionists that way.

What I did not have as a first or second reaction is “are human genes notable”? I thought, I guess, that the genes are a big part of what makes us us. It isn’t interesting to quibble about that kind of thing. I also didn’t react with a query about “reliable sources”. The way I’d go about such a project is with a big listing of the genes, first. Create articles (therefore not orphans) from the listing page(s). If gene XYZ is not really well documented yet, don’t create the article, but leave it on the listing with whatever verifiable info there is so far. If a new gene article is a bit thin, agree to merge it back into the listing pro tem. Over the years, the gene articles project should grow up to reflect the science.

Where’s the problem here? Well, Wikipedia has its content policy, and part of that (or allied to that) is “topic policy” and/or “title policy”, the business of ruling on what topics the encyclopedia should cover, and the details of titling per topic. We tend not to talk about “topic policy”, and only vaguely about what is “encyclopedic”. Wikipedia anyway is only approximately an encyclopedia. What it is, really, is a tertiary source. And Wikipedia only approximately operates by choosing notable topics, in the everyday sense. It has a topic policy that allows topics in Sumerology to be selected according to what is notable to a Sumerologist. Quite rightly. Per field of endeavour, per academic discipline, Wikipedia is interested in surveying the major and minor but still reckonable topics. This is a different issue, by the way, from the mission statement “to provide the whole world’s information”. It is the question of the packages, not the contents.

So, there are a few annoying and catchy misconceptions around. They are like the tunes to bubblegum pop songs, in the way that you can’t get them out of your head even if you want to. Notability doesn’t apply to facts, but to topics: you are probably thinking of verifiability. We don’t have “notable” facts, but facts have salience or not relative to a given topic (very relevant to BLP). Notability is just a guideline. It therefore cannot be a reason to force inclusion of a topic. This is where US Army Private Charles Herman Kuhl comes in. For all there may be a guideline saying notability can be assessed by the presence of good sources, it cannot be the last word: that Kuhl was written up by Time magazine doesn’t bring him within sensible “topic policy”. An article about him, simply based on the Patton incident, is probably a classic ‘coatrack’ in fact, written to make Patton look bad rather than to inform. The catchy tune here is that “enough reliable sources make a topic notable”. Oh no they don’t. Enough good sources are probably necessary for a topic’s inclusion, otherwise the article will be paltry. But the witness, bystander or (in this case) passive victim in a famous incident is not really notable: the “Patton slaps GI” topic is clearly destined for all time to be a subsection in the Patton article. The necessary condition of reliable sources isn’t sufficient.

Notability works adequately as a way to exclude topics at AfD: given five days to dig up reasons to keep an article, a sensible decision can often be taken, and the false positives and negatives are not so serious. A marginal decision at AfD decides the issue for six months, but not in fact forever, and debates are curtailed where they might cover imponderables. This works well enough. CSD A7 has worked much worse, in the past, and the bar has now been lowered: “An article … that does not indicate why its subject is important or significant. This is distinct from questions of verifiability and reliability of sources, and is a lower standard than notability.” This used to read in terms of an “assertion of notability”: the problem being that some people wouldn’t take office-holders to be notable for their office alone. An arguable point, but it seems to have been admitted that the “assertion” thing was broken.

Topic policy isn’t as broken, but what we now know is that the catchy isn’t always helpful in this area. [[Category:Wikipedia notability]] is a subcategory of [[Category:Wikipedia content selection]], but completely dominant … around 100 pages in there, and only the Fancruft, Neologism and Recentism pages escape. Really, we should start to revamp, explaining more clearly what the policy on topics is. There were recent polls to try to change the policy, but I thought the proposals were nearly all wrong-headed, and probably aimed at some of the successful tenets we have. In effect we do not allow subpages in article space and insist that summary style, highly desirable as it is, operate only through individually notable topics. Here we see topic policy plus concision constraining content policy, and a good thing too.

15 October, 2008

Comment [2]

The trouble with WP:NOT (What Wikipedia is not)

My last post on Wikipedia’s reliance on traditional methods of authority prompted a burst of comments. I want to respond to some of the comments and expand some of my original ideas here.

First, thanks to these people: Poulpy wrote Wikipédia, l’encyclopédie profondément conservatrice et traditionnelle [fr], which appeared on Planète Wikimedia. Ulf Larsen wrote Slettere versus beholdere [no] on the Norwegian Bokmål Wikipedia Village Pump. It was also mentioned by Ben Yates, in the History News Network blog, and on the Russian Wikipedia’s Current issues page. I love to see people seeding interesting discussions across languages. I only wish there were a few more discussions seeded from other languages to English.

Mary asked how much of the Verifiability and No Original Research (NOR) policies I considered to be necessary.

Geoff responded, better than I ever could,

These guidelines originally were created to deal with cranks & kooks who insisted on including their “important research.” Since then, guidelines like these have morphed into criteria with their own rationale for existence, ignoring the need that they be secondary to the need to create a useful work of reference — an encyclopedia.

I’ve often thought that Wikipedia needs to rewrite its policies on a more logical basis, starting with the assumption that we are writing an encyclopedia, a premise from which we logically derive all of the needed guidelines. But I’m damned if I can figure out how to do it: I’m still trying to formulate the starting statement — a useful definition of an encyclopedia.

Yep. OK first, the point about “a useful definition of an encyclopedia”. It is really a problem that there is no positive definition for what Wikipedia is. All we really have is What Wikipedia is not (WP:NOT). The reason that’s a problem is unless you have some very extraordinary and persistent individuals, rules such as policies only expand. It’s extremely difficult to get a shorter, less detailed policy accepted. It takes great writing and diplomatic skills, and probably some very savvy networking and wheedling. Unsurprisingly individuals with the talent and willingness to cull rules pages are in much lower supply than those who are able to just add one or two more sentences. Because WP:NOT is a negative definition, when it expands, it just eats up more things that Wikipedia may not be. Wikipedia can be whatever you may dream — until you put it into action and someone doesn’t like it, and adds your dream to WP:NOT.

A positive definition allows for much more creativity. It just sets the parameters, that an acceptable article/topic area will have these properties, and if you can think of a new presentation method that provides those things, then it should be acceptable.

When I was an administrator at Wikimedia Commons, for a time I tried to work on policies, mainly to resist instruction creep and the mindless importation of Wikipedia policies. Commons has no 3RR rule. For a long time we had no blocking policy. Our desire to minimise policy occasionally proved disconcerting to people who lived exclusively on Wikipedia. (I was disappointed to see recently the formation of What Commons is not.)

The main policy governing Wikimedia Commons is Project scope. It is a positively defined and broadly inclusive document — perhaps too inclusive at times. Partly because Commons was initially set up to serve Wikipedia, and partially because we didn’t have the cultural baggage of an existing academic entity to emulate, Commons’ most important scope restriction is Must be realistically useful for an educational purpose. That is pretty vague, but because of usefulness it ties us to our audience’s needs.

As Geoff mentioned, with his idea of formulating a premise from which all policies can be logically derived, I tried to do this for Commons, too. I came up with a page called Commons:Principles. It was this:

Wikimedia Commons is a collection of useful, free content media files.

It is curated by a worldwide open and multilingual community that shares a commitment to improving the collection.

Wikimedia Commons is a project of the Wikimedia Foundation and this is our contribution, as part of the Wikimedia community, to furthering the Foundation’s vision and mission.

Various words and phrases were linked in the rest of the document to explain it. The idea was that it was a link between the WMF’s Mission and Vision, and our grab-bag of ad-hoc rules. If a rule couldn’t be derived from this document then we shouldn’t really need it, or the document was incomplete. I never pushed this for project-wide adoption so it remains another random page of notes in my userspace. (There are some related notes on another userspace page called Aims and goals.)

Anyway… could Wikipedia have a positive definition? Is one based on “usefulness” and the principle of do-no-harm possible? Wrong and misleading information (or do I mean unsourceable information?) is not useful.

Could we find a way to define Verifiability that was more broad than the traditional definition, including methods such as personal experience, keeping in mind the ideas of usefulness and minimising harm?

Surely we could. Wikipedia has already had to figure out lots of stuff, and we have managed. Although we fall back on traditional encyclopedia ideas, they have no mechanisms for coping with e.g. breaking news or detailed fields of pop culture such as television. We have figured those things out ourselves. We can figure more things out ourselves.

Definitions don’t work in a vacuum, they only work in relation to other things. “Identica is an open-source Twitter.” “Twitter is blogging but only 140 characters at a time.” or “Twitter is SMS but on the web, to everyone.” “Blogging is keeping a diary on the web for the whole world to read.” “Keeping a diary is putting your thoughts into words and sentences.”

Surely Wikipedia (the product) is more than “Encyclopedia Britannica, but on the web”. Don’t get me wrong, “on the web” gives us a lot: links, no real limit on the potential number of articles. But we could do more. We could be more.

So anyway, stuff is broken, how could it be fixed? I mentioned two ways last time, changing policy or essentially having everyone implicitly agree to ignore it (just as, say, much of the populace ignores copyright law on a day to day basis). But these both seem terribly unlikely. There is a third: a fork. Forks are very bad for a project in the short-term — very painful and causing reduced productivity — but in the long-term can either be benign, or even positive, or serve as a death-knell. A successful fork may be a serious wake-up call to a community where previous calls for policy reform have not been successful.

I’m not advocating for a fork at this point… but we should never forget that it is an option available to us. :)

I believe that thinking about the intent of one’s edits is an intended corollary of “Ignore all rules.” (Geoff)

I strongly agree, but unfortunately I think Ignore all rules is pretty much dead. It has been suffocated by all the rules!

12 October, 2008


Wikipedia, the deeply conservative and traditional encyclopedia

Wikipedia, the product, does not aim to be anything new or radical. It aims to be something quite old-fashioned and conservative — a comprehensive secondary reference on all branches of knowledge. An encyclopedia.

Wikipedia, the process, OTOH, is deeply radical. Mass, collaborative, pseudonymous, minimum-standards-free, help-yourself authorship — yep, that looks pretty radical.

On the face of it this observation just makes me think “duh”. But it just crystallised in my mind recently, in discussion with an academic, so I want to capture the things that I think are interesting about this.

The first is that it wasn’t always so. Wikipedians were not always certain about what type of content was appropriate for inclusion in Wikipedia. As time passed, the shared understanding became more distinct, or at least it was clearer that certain things were definitely not appropriate. And thus we have WP:NOT (What Wikipedia is not), which is somehow one of the most interesting policies as it is so heavily relied upon yet still a definition of negativity.

Wikipedia used to contain dictionary definitions. It used to contain tutorial-type material. It used to contain quotes. It used to contain lists. (OK, it still does. It’s a happy mystery to me how they really survived.) Lots of other things that used to be considered OK on Wikipedia were progressively shoved off onto sister projects, other wikis or just forced to find their own place in the web world (deleted). As I wrote in my first blog post, I think it is no accident that the progressively narrower understanding of “what Wikipedia is” has happened parallel with the development and expansion of Wikipedia’s sister projects.

Why did this happen?

My theory is that in the very early days, maybe pre-2004 (which I didn’t personally participate in), no one much was paying attention to Wikipedia. It didn’t matter if someone created an article you had a problem with because chances are you wouldn’t even know about it. Just getting a Wikipedia article in a top 10 search result was a cause for celebration.

By the end of 2005, we had lived through Seigenthaler. This incident especially threw a lot of scrutiny on Wikipedians, who previously had been just happy geeking out and amusing themselves. Seigenthaler was the mainstream media and traditional authorities (academics) demanding to know what the hell Wikipedia thought it was doing and how dare they and just exactly how were they planning to meet the standards of those traditional fonts of knowledge anyway?

I think it panicked the community, and the defensive reaction that was collectively taken was to retreat into the standards and practices of those traditional authorities. Appease them by adopting their methods, deferring to their authority. “Look, we’re not untrustworthy — we’re trying to write the same kinds of documents as you. We have the same ideals about verifiability as you.”

The idea of notability is also very much a product of this time. Academics decide what is worthy of study. Wikipedians mimicked this by becoming gatekeepers to the wiki. This “guideline” is the most obvious mark of a community appealing to established authority. Paper encyclopedias must have inclusion/exclusion criteria because they have limited resources: limited time, limited money (to pay authors), and limited paper/space. But Wikipedia, with no publishing deadline, written by volunteers and provided virtually free over the internet, has none of these excuses [1][2]. The notability guideline is the most blatant example of the Wikipedia community retreating into traditional acceptability because we were spooked. Maybe also because we wanted to be respected, respectable? We wanted to be gatekeepers too, and enjoy that decisive feeling of “keeping order” amongst the rabble of the world?

What’s interesting now is that when I see people say “I love Wikipedia”, it’s almost always for all the ways it is not a traditional encyclopedia. The exhaustive coverage. (A thought: Wikipedia appears exhaustive for everything except your favourite niche.) The articles on stuff you couldn’t have imagined. The totally bizarre lists. The obscure details that you know someone very obsessed has left behind — and you’re pleased they bothered.

I’m sure I remember a newspaper article (or Wired?) commending the Wikipedia article Podcast on being the best comprehensive guide to podcasting available anywhere. This was like six months after podcasting really started to take off. The word was still hardly appearing in print, and had almost definitely never appeared in (let alone been the subject of) a book. But this article was full of very poorly sourced information, definitely no book references. This would be something Wikipedia the Traditional Encyclopedia would frown upon. But it was damn useful!

The further Wikipedia’s coverage and cultural reach expands, the more we will have this problem. Academics do not typically consider “everything in the world” deserving of study. Even if they are po-mo pop-culture theorisers, they will draw the line somewhere. The other thing is that there are not enough of them. Wikipedia is writing up the world faster than academia can study, hypothesise, research and publish about it. (And if you’re lucky, in a language you speak, in a journal you can access via a library near you, too. Good luck.) When we tie our respectability to traditional authority by invoking their methods, we must also accept the limitations of those methods: because there are many things that academics will never study, Wikipedia will never cover those topics in a way that is internally consistent and acceptable. I am not just thinking about the trivia of Western life, but more importantly major cultural knowledge that has not happened to have yet fallen under the Western academic radar.

The next most obvious point is that the understanding that people, both readers and editors, have of “what Wikipedia is” may not match what Wikipedia purports to be (i.e. policy). This is naturally a constant struggle that plays out over hundreds of pages every day, for either (1) individuals to be persuaded to change their understandings to match existing policy, or (2) people to be persuaded to change policy to match the understandings of individuals. Both are necessary, but (2) is much more difficult, and probably, as in any great bureaucracy, becoming ever more so.

I guess I think it is a great shame that Wikipedia has doomed itself to such a limited existence by, as I say, falling back on the methods of traditional authority for respectability. But! Just like the Podcast article, it is only limited if everyone carries out the letter of the law (policy). If we use social pressure to encourage the keeping of articles that are non-harmful and useful (if not sourceable), the law may be irrelevant.

Another alternative is that Wikipedia will come to a less strict understanding of itself that is more in line with readers’ expectations and needs. But given that the trend to date has just been tightening the screws, I do feel it is unlikely in the immediate future. If it were to happen — just imagine the relief! We could figure out our own understanding of what we are, based on our own strengths, instead of trying to live up to someone else’s standard for no reason other than that.

[1] Although it is not free for the Wikimedia Foundation to provide, I have never heard anyone suggest enforcing a Notability guideline in order to save them bandwidth!

[2] There is a good argument for instituting some kind of notability criteria in relation to people, in order to avoid harm. But the notability criterion as it stands applies to basically everything.

10 October, 2008

Comment [12]

WP:DYK + identi.ca -> enwpdidyouknow

I decided to write a script to convert Wikipedia’s main page Did you know? (DYK) updates into identi.ca friendly messages. The result is enwpdidyouknow. If you use identi.ca, you can subscribe and receive regular DYK goodness. If not, you can still subscribe to the RSS feed, although it will seem pretty weird as it is broken into messages of less than 160 characters. I should make it also produce just a regular atom or RSS feed without the message length limitation.

It’s run for 24 hours now and it seems to be working OK. It updates in batches because that’s how Template:did_you_know is updated (except by humans). When a message has to be broken into 2, it posts them virtually together, but it always leaves a 2 minute gap between different messages, to stop flooding a little bit.
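The splitting and pacing described above can be sketched roughly as follows. This is not the actual script (that lives at the link below), just a minimal illustration under a few assumptions: the 160-character limit is taken from the "less than 160 characters" remark, long hooks are broken on whitespace, and the names `split_hook` and `schedule` are made up for this example. Nothing is posted anywhere; the sketch only works out what would be sent and when.

```python
import textwrap

MAX_LEN = 160       # assumed limit, from "messages of less than 160 characters"
GAP_SECONDS = 120   # the 2 minute gap between different messages


def split_hook(hook, max_len=MAX_LEN):
    """Split one DYK hook into parts shorter than max_len, breaking on whitespace."""
    return textwrap.wrap(hook, width=max_len - 1)


def schedule(hooks, gap=GAP_SECONDS):
    """Return (delay_in_seconds, text) pairs for a batch of hooks.

    Parts of the same hook share one delay, so they go out virtually
    together; each new hook waits a further `gap` seconds.
    """
    plan = []
    for index, hook in enumerate(hooks):
        for part in split_hook(hook):
            plan.append((index * gap, part))
    return plan


if __name__ == "__main__":
    batch = ["... that this is a short example hook?", "word " * 50]
    for delay, text in schedule(batch):
        print(delay, len(text), text[:40])
```

Keeping the schedule separate from the posting step means the pacing can be checked without touching the identi.ca API at all.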

I put some info, including my source code, here: http://dyk2identica.modernthings.org/. It’s really rough and ready. No one will be too surprised to hear that by far the hardest bit was figuring out how to correctly parse the wikisyntax. :)

I should probably move it all to the toolserver. I haven’t figured out what license it is yet. Suggestions welcome.

Wikipedia + MediaWiki API + mwclient + enwp.org service + identica API = new article fun :)

18 August, 2008

Comment [1]

WikiProjects starting to use pageview stats in article assessment

While bopping around en.wp I noticed Wikipedia:WikiProject Mathematics/Wikipedia 1.0/Frequently viewed/List.

WikiProject Mathematics use the page view stats and a bot to mark article talk pages of their most popular pages.

WikiProject Aviation has started putting together an annotated list, with the monthly and daily views and also the article’s current assessment rating. That kind of table is a great motivator — their fourth most viewed article during June is only “Start” class.

WikiProject Pharmacology made what looks like a one-off report of their articles that get more than 80,000 pageviews per month (there are 43), annotated with GA/FA icons.

WikiProject Human Genetic History just put together a table for February 2008. It’s not clear what if anything they intended to do with it.

Sage Ross wrote in March about how biographies are much more popular than “history of” articles:

For historians who want to reach a broad audience through Wikipedia, putting historical context into biographies and topics of contemporary interest is probably more effective than writing concept-, artifact- or event-based historical articles.

This is great. It would be even better if there was some kind of toolserver thing that could generate reports for WikiProjects (or maybe specific templates/categories). We have the dots… but we don’t connect them very well.

Are there any other WikiProjects using the pageview stats in this way?

03 August, 2008


Easiest image/video request ever: nodding

Can you find/create free media to illustrate the article Nod ?

Bonus points for the Bulgarian/Sri Lankan “negative nod”!

Extra bonus points if you can do an “acknowledgement nod” into your webcam without cracking up.

28 July, 2008

Comment [1]

Y Combinator's "Startup Ideas We'd Like to Fund": "More open alternatives to Wikipedia"

From Y Combinator’s Startup Ideas We’d Like to Fund:

23. More open alternatives to Wikipedia. Deletionists rule Wikipedia. Ironically, they’re constrained by print-era thinking. What harm does it do if an online reference has a long tail of articles that are only interesting to a few people, so long as everyone can still find whatever they’re looking for? There is room to do to Wikipedia what Wikipedia did to Britannica.


“There is room to do to Wikipedia what Wikipedia did to Britannica.” Now that’s a wake-up call if I ever heard one.

Y Combinator are a venture capital firm who write a good blog for wannabe startups.

(And in case you’re wondering, their Wikipedia article was never nominated for deletion…)

22 July, 2008

Comment [1]

WikiProject so effective, it skews study results

Banksia spinulosa, public domain.

Seriously, how cool is this story?

The paper is Scientific citations in Wikipedia by Finn Årup Nielsen (the paper itself is dual-licensed GFDL and CC-BY-SA), and it analyses the uses of the cite journal template in the April 2007 database dump. The author compares the prevalence of Wikipedia citations to citations in the general scientific community.

The success of WikiProject Banksia causes a noticeable outlier:

Original graph

The one circled in red is Australian Systematic Botany.

Australian botany journals received a considerable number of citations…in part due to concerted effort for the genus Banksia, where several Wikipedia articles for Banksia species have reached “featured article” status.

Right now, there are six. Now it’s just a matter of waiting for the “rest” of Wikipedia to catch up.

The number of people working on this project, you can count on one hand and still have fingers left over.

The Banksia gallery on Wikimedia Commons, and category, are also impeccably sorted and organised (and detailed!).

It makes me smile to be able to report this, because it shows how much just a few dedicated souls can achieve, by quietly and steadily busying themselves.

And it’s damn cool. Congratulations, WikiProject Banksia.

15 April, 2008

Comment [2]

Links for 2008-04-06

Click here to lend your support to: Support the Libre Graphics Meeting and make a donation at www.pledgie.com !

05 April, 2008

Comment [2]

[guest] Rethinking the Top Ten

Written by Waldir Pimenta


Some people might not know about the www.wikipedia.org template. That is the page that defines what appears on the main wikipedia portal, www.wikipedia.org. Evidently, the template is protected, and thus it is common to see people from wikipedias that have reached milestones commenting on its talk page to request an update. However, there is a draft version that can be edited by anyone. This is something more people should be aware of.

Now comes the cool part.

If we remove all the requests for updates from the template’s talk, some very interesting thoughts show up, in discussions spanning several months and even years. These are proposals that cannot be simply put on the draft page to be later synchronized with the main template, since they would represent big changes that require some discussion first.

One of these proposals is the “top ten rule” discussion. The problem is that when the wikipedia.org portal first implemented the globe design with the ten wikipedias floating around it, the natural choice was the ten biggest wikipedias at that time. But when the Russian wikipedia started approaching the 100,000 milestone (the sections below the globe only went up to 10,000 at that time), many people started proposing its inclusion on the globe, since it would “graduate” from the 10,000 level. But what most people didn’t realize was that (quoting User:Mxn) “most of the top 10 editions were featured around the logo long before they reached 100,000 articles, so getting to 100,000 isn’t why they’re up there”. The fact that at some point they ended up being the only 100,000+ editions of wikipedia was merely coincidental.

Nevertheless, those discussions about the Russian wikipedia, and later the Chinese wikipedia (which led to the creation of the 100,000+ section under the globe), questioned the criterion of size for being featured around the globe (which had never been extensively discussed anyway), and proposed some alternative criteria, thus effectively planting the seeds for a long-awaited reform.

This is where the Top Ten Wikipedias discussion comes in. By collecting the ideas spread across the huge www.wikipedia.org template talk page and putting them together on a separate page, and providing a table with some actual results for the application of some of those criteria (and of course, some spamming around the village pumps of the biggest ‘pedias), the arena was opened for a very productive discussion, which is ongoing at this very moment! Times are changing, and excitement is in the air. You could be part of the revolution! Go ahead, be bold and add your comment!



How interesting that the “100,000+ rule” for inclusion on www.wikipedia.org was never originally planned.

The proposal for a new evaluation of what constitutes a “top 10” is very detailed and worth a look, keeping in mind the question: what do you value most about Wikipedia? What factor makes a Wikipedia the most useful? Depending on which factors get favoured, the “top 10” could look extremely different to how it currently does. The question of “what do we value” naturally brings the case of the Volapük Wikipedia to mind (vo.wp scores a prominent text link on this portal, but not top 10 as of yet).

Thankyou to Waldir for taking the time to write this up and share it. —Brianna

04 April, 2008

Where do users go after the main page?

Thanks to Tim and Domas and Henrik, we can examine page views. Yay, statistics.

I copied all the links from the menus (sidebar and topbar) and got their monthly page view totals for February 2008, and then calculated their average daily page views.
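For anyone wanting to repeat the sums, the averaging step is just the monthly total divided by the number of days in the month. The sketch below is only an illustration: the page titles and totals are invented placeholders, not the real figures, and `average_daily_views` is a name made up here. The one thing worth noting is that February 2008 was a leap-year February, so the divisor is 29, which `calendar.monthrange` takes care of.

```python
import calendar


def average_daily_views(monthly_total, year, month):
    """Average daily page views, given a monthly total for that year and month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total / days_in_month


if __name__ == "__main__":
    # Placeholder totals for illustration only; these are not real statistics.
    monthly_totals = {
        "Example sidebar link": 2_900_000,
        "Example topbar link": 58_000,
    }
    for title, total in monthly_totals.items():
        print(title, round(average_daily_views(total, 2008, 2)))
```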


Side bar:

Top bar:

The [1] is because of an unusual access pattern for Portal:Technology and applied sciences, which suggests it was only linked from the main page on 17 February.

Also, the top bar only appears on the main page, whereas the sidebar appears on every page.

Four repeated links may be overkill.

wpmainpagelinks.sxc [6.88 KB]

14 March, 2008

Comment [5]

The responsibility of Wikipedia in the wider world

Jim Redmond has a post on his blog that almost read my mind, called One thing that Wikipedians often overlook: not everybody gets it:

Most non-Wikipedians still don’t get how Wikipedia works; they still think that its content is centrally controlled.

This is part of the reason this week we saw the SMH report More woes for Wikipedia’s Jimmy Wales, about Jeff Merkey’s claims of “cash for kindness” or donations for Wikipedia article editorial favours.

When Wikipedia was small and ranked on the 10th page of Google results or worse, it didn’t matter so much if a person’s Wikipedia article was full of nonsense. But when your Wikipedia article can rank higher than your official site, you have a problem. That’s the major reason for the English Wikipedia policy, Biographies of living persons. I really recommend having a look at it, even if you’re familiar with the acronym.

Biographies of living persons (BLPs) must be written conservatively, with regard for the subject’s privacy. Wikipedia is an encyclopedia, not a tabloid; it is not our job to be sensationalist, or to be the primary vehicle for the spread of titillating claims about people’s lives. An important rule of thumb when writing biographical material about living persons is “do no harm”.

Jimmy Wales has made it clear repeatedly that Zero information is preferred to misleading or false information.

And that is why you might blank a poorly written article about a controversial figure.

It may be hoping too much to ask the general public or the media to understand the purpose and process of OTRS, but it is worth noting that it is a private method of complaining about one’s article. It’s a selection of trusted volunteer editors working together with WMF staff and board (when appropriate) to answer the questions of those who can’t or won’t use a wiki talk page, but can use email.

It is, quite frankly, thankless and largely invisible work. If disputes are resolved successfully, you’ll never hear about it.

As the figurehead for Wikipedia, Jimmy Wales is often approached or written to personally, by people that should actually be writing to OTRS, but the process is too esoteric to figure out. It’s rather like contacting Rupert Murdoch to complain about an article by a staff writer in some random NewsCorp paper, except that Wales takes it on himself to be involved in this resolution process, rather than palming it off to a secretary.

So in blanking Merkey’s article, Wales was actually following the single most ethically serious policy Wikipedia has, showing that Wikipedia is not an anarchy or a free-(libel)-for-all, but a project that takes the responsibility of high web visibility seriously and tries to minimise the negative impact it has on people’s lives.

And while Wales was acting to minimise the harm Wikipedia causes in other people’s lives, the news media shows that when there’s a whiff of controversy, that idea doesn’t apply.

If you had even the vaguest idea about how Wikipedia works, you would surely reject out-of-hand as unlikely if not ridiculous, the idea that Wales would offer editorial favours in exchange for donations. Because he better than anybody knows how impossible that is. The whole article history is right THERE.

But if Wikipedia is just a big black box that somehow produces timely articles, then it is not an unreasonable idea.

Ultimately, recent news stories say to me that while Wikipedia has developed responsible processes over the past couple of years, it has done an extremely poor job of communicating their existence to the outside world. So it’s not enough to be big; we really do have to try and get everyone involved. Only by being a part of it, and understanding how it works, will people know enough to be able to dismiss nonsense claims when they see them.

If Wikipedia was a type of travel, at the moment it’s somewhere between a rocket and an aeroplane, in terms of accessibility and participation and general understanding of how it works. There’s still too much that’s mysterious and seemingly random and magical.
Reading and editing Wikipedia needs to be as familiar as riding a bicycle. Almost everyone can do it, with a few hours’ practice and maybe some training wheels. No special test or license. You can go anywhere. That’s what Wikipedia needs to be like.

14 March, 2008

Comment [5]

Ten possibly provoking thoughts about improving the quality of Swedish Wikipedia

This is the name of an excellent essay by Lennart Guldbrandsson, chair of Wikimedia Sverige (Sweden). You can read the original Swedish or a translated English.

Some of the points are provocative indeed (like point 1, “delete the bad articles”). It is well worth reading to see the perspective of a smaller project, and new ideas on how chapter activities can positively reinforce the online efforts towards greater quality.

10 March, 2008


Templatology, an essay

Templates are one of MediaWiki’s most versatile features. I was thinking about them recently because of a discussion with other editors about whether a particular template should even exist, and if so, what should its wording be. Templates are a now ubiquitous part of English Wikipedia articles and MediaWiki wikis everywhere, so it may be interesting to look at how they have evolved. (Warning: this is quite long.)

What is a template?

Templates are a feature that provides “boilerplate” text or style, whenever you want to have a standard look or text across more than one page. In MediaWiki, to put a template called “foo” (that is, you would find it in the wiki at [[template:foo]]) on any page, you would put {{foo}}. They can also take “parameters”, or particular values that you can change for each time it is used: {{foo|parameter value 1|parameter value 2}}.
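To make the shape of that syntax concrete, here is a small sketch that pulls a flat template call apart into its name and parameter values. It is deliberately naive and is not how MediaWiki itself parses wikitext: it assumes a single, un-nested call with positional parameters only, and it will get things wrong for nested templates, named parameters, or links that contain a pipe. The function name `parse_simple_template` is invented for this example.

```python
def parse_simple_template(text):
    """Parse a flat call such as {{foo|a|b}} into ("foo", ["a", "b"]).

    Assumes one un-nested template call with positional parameters only.
    """
    text = text.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template call: %r" % text)
    inner = text[2:-2]                 # drop the surrounding braces
    name, *params = inner.split("|")   # first piece is the template name
    return name.strip(), [p.strip() for p in params]


if __name__ == "__main__":
    print(parse_simple_template("{{foo}}"))
    print(parse_simple_template("{{foo|parameter value 1|parameter value 2}}"))
```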

Various types of templates are referred to by other names, including infoboxes, navboxes, notices and warnings, which more reflect the purpose of those templates.

Another name used is “tag”. When a template is used on a page, it creates a link in the database between the page name and the template. This means one use of templates is to mark pages that you want to group together for some reason. These grouped pages can then be found listed at Special:Whatlinkshere/Template:Foo. If you only wanted to use a template for this grouping purpose, you could make the template so it actually had no visible content. However categories usually make more sense for this purpose.
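The “tagging” use can be pictured as a reverse index from template name to the pages that use it. The sketch below is only an illustration of that idea in Python, assuming a small in-memory dictionary of page texts (the page titles and contents are hypothetical); the real wiki keeps this link in its database rather than rescanning page text.

```python
import re
from collections import defaultdict

# Matches {{name}} or {{name|...}} and captures the template name (no nesting).
CALL = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")


def build_backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each template name to the set of page titles that use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, wikitext in pages.items():
        for name in CALL.findall(wikitext):
            index[name].add(title)
    return dict(index)


# Hypothetical pages, just to show the shape of the result.
pages = {
    "Page A": "{{foo|some value}} Article text.",
    "Page B": "Article text. {{foo}} {{bar}}",
    "Page C": "No templates here.",
}

backlinks = build_backlinks(pages)
print(sorted(backlinks["foo"]))
```

Looking up backlinks["foo"] plays the role of the “what links here” listing for that template, which is why an invisible template can work as a grouping tag.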

A history of templates

Templates as we know them today were first introduced in August 2004 with MediaWiki v1.3, along with categories and the MonoBook skin still used today. Before this they were in the MediaWiki namespace with the “system messages” or user interface messages. With this move they also got the feature of “parameters”.

The first revision of the Help:Template page on meta was in June 2004 (I suppose by this stage they already had the practice of running the latest MediaWiki version live for Wikimedia sites, rather than the latest release, which typically comes later). The opening paragraph is now cute:

Templates, or custom messages, have grown from humble beginnings as an afterthought in a localisation feature. They are now used in almost 10% of pages in the English Wikipedia database.

I asked Duesentrieb to run a query like this, and apparently there are 229,686 en.wp main namespace non-redirect pages without templates – a very neat 10%. So from 10% usage to 90% usage in less than four years. Pretty impressive, especially given there is no edict mandating their use.

However, this is actually getting well ahead of ourselves. There is an interesting post from Larry Sanger in May 2001 called Do we need templates ?:

From: “Krzysztof P. Jasiutowicz”
> Do we need templates of pages ?
> Groups of pages – rock bands, biographies, film entries share common
> features and therefore want some kind of templates.
> Pages of the same category edited by different people tend to follow
> sometimes incompatible patterns or disagree with each other.

One of the reasons that Wikipedia works—why it is developing so quickly and is so attractive to contributors (compelling, one might say…) is that anyone can come in and contribute in practically any fashion. Instigating templates has a number of implications for how we might begin to think of Wikipedia: it would become a collection of standardized information rather than a collection of information that people just happen to feel inspired to input. Who is interested in inputting “standardized information”? Maybe some people, but surely not nearly as many as those who are interested in inputting whatever information they know.

Suppose we were to require (somehow) that everyone writing about the countries of the world input the information in exactly the format of the CIA Factbook. Who, honestly, would want to do that? And on the other hand, who would want to contribute a lot of generally accurate, useful information that will eventually add up to weighty, detailed articles, not necessarily all in the same format?

If I finish the quote here we can all enjoy a guffaw about how things have changed. I think his answer to the question Who is interested in inputting “standardized information”? has been shown to be wrong. Empty edit boxes freak people out. Structured stuff where you just fill out a missing bit here or there is much easier to deal with. (This is also why bots have been so successful in “seeding” wikis. It’s much easier to correct something that’s wrong, rather than write a correct paragraph from a blank slate.)

However, a fairer quote would include the following, where Larry clearly recognises that “it’s early days yet”:

Eventually, I suspect, we’re going to have huge amounts of information, and it will be possible for people to go in and render related entries in a similar format. It’s generally better to impose order after creation, in a way that reflects the natural categories of things as information is given. […] [I]n a constantly-growing, constantly-improving encyclopedia, why not just let people add whatever information they want, and when it’s reached a certain level of maturity, only then start imposing some uniformity on the way similar information is presented?

And that seems to be more or less what happened. I’m not great at this online ethnography biz, so I don’t have any other choice quotes from 2001 to 2004, although I expect there was further discussion about templates and their appropriateness.

What’s interesting is how far they’ve spread. While first imagined as a kind of article skeleton structure, they’re now just as widely used in all kinds of talk pages, user pages, maintenance and communication tasks.

A taxonomy of templates

There are some broad classes of templates that can be described:

Now into the user realm —

Any other clear classes I missed? (There are a few I can think of which are pretty boring, hence not here.)

Template complexity

This is what you see when you edit the article on the Melbourne suburb of Hawthorn. Note how the template takes up the entire first screen, and it’s not even done! For a newbie it must be pretty bizarre — although frankly this one’s formatted quite well. But if you’re just trying to get into the guts of it (and remember newbies may not know about section editing), it’s quite “WRONG WAY, GO BACK”.

So there is the complexity of templates — and typically these infobox ones — within articles. Maybe one day MediaWiki will get some whiz-bang “template adder” for articles and all that ugly template code won’t appear in the edit box. That would be nice.

Then there is the complexity of trying to edit the templates themselves. This is nothing short of a nightmare. Template syntax is approaching a very ugly programming language, especially if you throw in parser functions. The migration to the new preprocessor (Feb 2008) has shown deeply nested templates all over the place.

I don’t really see a solution to this, unfortunately. People can’t help themselves “improving” stuff. Here is one way things get complex real fast:

  1. There are two or more functions that display different content but in a similar context.
  2. Someone decides to combine them in a single template that takes a parameter, which says which content to display. The old templates get deleted/redirected.
  3. Helloooo, complexity.

Repeat this a few different times, at a few different levels, in a few different contexts, and suddenly you’ll find it all very difficult to try and untangle.
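The three steps above can be sketched in ordinary code. This is an analogy in Python rather than template syntax, and the function names and notice wording are hypothetical:

```python
# Before: two small single-purpose "templates". Each is trivial to read.
def stub_notice() -> str:
    return "This article is a stub."


def cleanup_notice() -> str:
    return "This article needs cleanup."


# After: one merged "template" whose parameter picks the content.
# Every caller now has to know the valid values of `kind`, and every
# new variant adds another branch to the same body.
def notice(kind: str) -> str:
    if kind == "stub":
        return "This article is a stub."
    elif kind == "cleanup":
        return "This article needs cleanup."
    else:
        return f"Unknown notice type: {kind}"


print(notice("stub") == stub_notice())
```

Nothing is gained in output, since the merged version prints exactly what the separate ones did. What changes is that the branching now lives inside the template, and nesting a few of these inside each other is where the untangling gets hard.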

Convenience becomes necessity

All templates begin life because someone finds it easier to make a boilerplate and post that, rather than posting something longer, and having to look it up each time.

However once a template exists, the expectation soon develops that whenever it is applicable, it should be used, and the plain text equivalent should not. Even if previously, you could take or leave the plain text equivalent.

I don’t know why this happens, but it does — without fail.

Templates in user communication

This is actually the crux of what I intended to write about. :) In my 2007 Wikimania presentation I talked quite a bit about the wording, attitude and intent of the English Wikipedia user talk templates. I complained that the wording was often officious, scolding and impersonal, and they were not likely to encourage people to become part of the community.

In hindsight, maybe I had the wrong idea about them all along. John Broughton says this in Wikipedia:The Missing Manual (my review):

The primary purpose of a warning about vandalism or spam, perhaps counter-intuitively, is not to get the problem editor to change her ways. (It would be nice if they did so, but troublemakers aren’t like [sic] to reform themselves just because someone asked nicely.) Rather, when you and other editors post a series of increasingly strong warnings, you’re building a documented case for blocking a user account from further disruptive editing. If the warning leads to the editor changing his ways before blocking is necessary, great – but don’t hold your breath.

(Yes, the gender did change in the middle of that paragraph. :) Srsly, accept singular they already!)

If this is a widespread attitude, that you have to wait until someone receives a level 4 template before it’s legitimate to block them, then it’s not too surprising that there is so much trouble with “gaming” on en.wp. That IS a game, isn’t it? It’s hard for me to not see that situation as leading to a punitive block. It’s certainly not leading to a preventative one!

I guess my problem with user warning templates is I have a feeling they don’t work. I have a feeling they don’t improve a situation. I have a feeling they don’t get read — users don’t pay attention to their content.

If there was evidence that anyone read them, learned something from them, or some situation was averted — that would be nice. [Of course such evidence would be anecdotal. That’s all we have when it comes to user interactions.]

Image deletion notification templates

When an uploaded file is nominated for deletion or is actually deleted, it is commonly considered a courtesy to inform the uploader, via a template on their user talk page. If they didn’t receive this, they would have no idea their upload had been deleted until they tried to go look at it, which is a pretty nasty surprise. It’s now quite common to visit a user talk page and see a dozen-odd notices about missing information on files. Because they are often placed by bots, many can pile up without a human there to notice, “OK, this person seriously doesn’t get this concept, time for a chat”. This is even more true on Commons.

These templates perform two functions: notification + admonishment. They would be better if they were simplified to a single line and only used for notification. Admonishment is something that should be between two humans.

Templates on Commons

There is one benefit to templates that I cannot ignore on Commons and it is that of translation. Translated templates may mean two users can “communicate” (after a fashion) despite not having any language in common.

Templates are for the benefit of the poster, not the receiver

The benefits are

Just as automated phone answering services are for the benefit of the company, not the caller.

Receiving a form reprimand is patronising. I am not the only one who has this emotional reaction – as Wikipedia has Don’t template the regulars.

It follows from this that templates are patronising to newbies too. I guess the only reason this is considered acceptable is that as they’re newbies, they won’t realise this template is a form response. (Well, except for how it’s totally generically worded, yeah.) So, since we’re all equal ‘n all, go ahead and template the regulars.

(So far there is no essay Don’t template the newbies. Instead, treat everyone equally badly. ;))

It would be very valuable to see an in-person observational study of people’s reactions as they learn to edit Wikipedia, including how they react to templates. Maybe the vast majority appreciate the “official” warning as it gives them some direction. Maybe they really do pay attention to them.

Maybe the problem is not the tool, but the way it’s being used. Maybe the only thing to do is take a sharp knife to the language that is used, and help resist the idea of messages as block precipitators, rather than messages as useful informers and educators.

10 March, 2008

Comment [5]

Vanity wiki stats

Ben Yates points to Wikipedia article traffic statistics. Guess what? It’s not just articles. You can also use it to see how many times your userpage was viewed.

Verifiability wins!

(Note this tool doesn’t know about redirects, so for accuracy you should check those too and add them all up.)
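Adding up a page and its redirects by hand is simple enough, but as a sketch in Python it looks like this. The titles and view counts are made-up numbers purely for illustration, not real traffic data, and the two dictionaries are an assumed shape for counts you would copy out of the stats tool yourself.

```python
def total_views(title: str,
                redirects: dict[str, list[str]],
                views: dict[str, int]) -> int:
    """Sum the views of a page and of every redirect pointing at it."""
    names = [title, *redirects.get(title, [])]
    # A title with no recorded views counts as zero.
    return sum(views.get(name, 0) for name in names)


# Hypothetical data: one user page with two redirects to it.
redirects = {"User:Example": ["User:Ex", "User:Example2"]}
views = {"User:Example": 120, "User:Ex": 7, "User:Example2": 3}

print(total_views("User:Example", redirects, views))  # 130
```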

Now can we get some ordered lists out of this data or what?

05 March, 2008

Comment [1]

Wikipedia: the Missing Manual

O’Reilly sent me a copy of Wikipedia: The Missing Manual (also amazon) for review. Really I am a bad person for such a task — they should give it to newbies and encourage them to dive in, see how they go, and then report how they feel about the book. But I guess there is some value in a perspective that learned it the hard way first (or at least, blog buzz).

Is this book needed or necessary? Yes. Wikis are very good at two tasks, at least: writing an encyclopedia and writing documentation. Interestingly, Wikipedia fails massively at the latter. Well, not so much at the writing of it as the organising, culling and simplifying of it.

I suppose it is not helped that policies, guidelines, manual of style, essays and wikiprojects all share the same space. Perhaps it would be useful to create new namespaces for some of these – at least MOS and wikiprojects. Essays could be folded back into user subpages (like userbox templates were). When they are all cited as if they held equivalent weight (I was surprised to learn WP:COOL was only an essay), it makes it extremely difficult to get a grasp on what you’re supposed to know.

Another idea might be to explicitly flag versions of policies and guidelines for “experience”, e.g. everything with an “experience rating 1” would be expected to be read by newbies. “5” would be howtos for bureaucrats, arbcom and Mechanics.

But because devoting oneself to organising and sorting projectspace has bad consequences for encyclopedia involvement, I don’t think it will happen.

On first read I got quite a kick out of seeing the familiar screenshots and policy statements in dead-tree format. Yeah — “we made it”. Chapter 15, on uploading images, was especially dear to my heart as I helped design the current upload forms. (With any luck those screenshots will soon be out of date, actually. A vastly improved JavaScript modified form is in the works.)

We made it all right… Wikipedia is now an institution, there’s no doubt about it. Not looking so radical now.

There are two major omissions from this book and one of them is related to this. There is not a single mention of the policy Ignore all rules. That’s right, Wikipedia’s first ever rule doesn’t rate a mention in a book devoted to the minutiae of how to get an enhanced watchlist and get an article deleted. It’s really quite strange. The author John Broughton would undoubtedly be familiar with it, having authored the Editor’s index to Wikipedia. One can only assume he thinks it’s on the way out. Then, Wikipedia will be a much less interesting community.

The other major omission is an explanation or discussion of the concept of free content. “Free content” scores one reference in the book’s index, to a section “Uploading a Non-free Image” in the “Adding Images” chapter. He refers to the WMF licensing policy and says,

Free content is any work that doesn’t require permission or payment for any use, including commercial. At most, free content requires attribution: crediting the person who created the image. Free content also has no restrictions on redistribution of the image by others.

Well, for a start, this is just wrong. Free content can also require ShareAlike use, which is a “restriction on redistribution”.

He then breezes over Wikipedia’s fair use policy. Considering how much trouble people have with it, I think it would be better to cover it thoroughly or not at all. Simply reciting the conditions that must be met is not that enlightening. Better would be a full expanded explanation of the ideas of free software, free culture, freedom for users, copyleft, etc.

Aside from these two gaping holes, I can’t really fault Broughton’s writing, which is refreshingly free of cynicism. If he sometimes belabors a point of process, it is actually a good indication that that process is due for massive simplification. Adding references, for example. He goes into great detail about article deletion nominations; I thought these could all be done more or less by magic JavaScript now? That would seem a much better option to explain, IMO.

The organisation of the book’s content is not bad, although I don’t understand why the appendices “A Tour of the Wikipedia Page” and “Reader’s Guide to Wikipedia” don’t lead the book rather than being hidden at the back. This book would also become 20% cooler if the MediaWiki syntax cheatsheet and a list of frequent shortcuts/policies and guidelines were printed inside the covers. That would be so much cooler!

Physically, the book is a little crowded. The pages need to be bigger, or the margins smaller, to allow the many screenshots to take up more space. I am not sure the frequent “note” and “tip” asides wouldn’t be better worked into the main text. (Hey, just like trivia sections!) And unfortunately the binding is cheap. Having finished reading it, my index pages are now falling out. That’s disappointing, but a book like this is not really intended to be a tome for all time anyway, so it’s not that surprising.

Sooner or later I will post my smaller nitpicks to the publisher’s errata page, but they’re just small fry.

There is a pretty nice piece about this book, only just published, in the New York Review of Books – The Charms of Wikipedia. The author is clearly pretty enthralled with Wikipedia. Hey, more power to him. The real test is if this book can convert a Wikipedia skeptic, or maybe tame a troublesome user.

Phoebe Ayers, Charles Matthews, Ben Yates, and SJ Klein (four upstanding Wikipedians all) are working on a book called How Wikipedia Works. (see also meta) Reportedly they will license it under the GFDL. This is excellent news.

I hope Broughton’s book is not only massively successful, but that it inspires a host of measured, high-quality documentation of all the Wikimedia projects, and then some.

03 March, 2008

Comment [3]

Why Wikipedia doesn't need protecting from the masses

It’s not as though our existing volunteers are abnormally intelligent, or particularly gifted at writing an encyclopedia; they’re just some people who wound up helping. Why does this indicate the population at large is going to be worse? We are the population at large, we just want to get a bigger slice of it.

Andrew Gray (first emphasis is mine, second is his)

This is from a foundation-l thread in November 2007. It’s been rolling around in my head since then, so finally I’m writing it down so it can leave. Being an expert at using and contributing to Wikipedia has little bearing on encyclopedia-article-writing ability.

26 February, 2008


Links for 2008-02-21

© skenmy, CC-BY

21 February, 2008


linux.conf.au LinuxChix miniconf

Woot, today was the LinuxChix miniconf of linux.conf.au (LCA), one of the three big free software related conferences held around the world each year.

I spoke on Wikipedia (duh), giving a kind of second-level introduction aimed at cutting through bureaucracy by explaining what was important and what could wait until later. I always used to think I had to read all the relevant policies and guidelines before I did anything. So I would spend hours poring over MoS pages and the like before even writing a paragraph.

Later I got much more relaxed about it and figured, correctly, that someone else would clean it up to conform to MoS if it really bothered them that much (and evidently it does, or else it’s easier to make automated changes that relate to formatting than actual content).

In a nice surprise I saw Nick Jenkins, who I didn’t realise was attending LCA. He took notes on my speech and they’re probably better than mine so I recommend reading those. :) You can also read my slides from Wikimedia Commons.

There was lots of video going on and I will link it up whenever I see it published.

Stormy Peters gave a great talk about community managers. As I listened to her talk I realised… I am a community manager. All the things she mentioned are exactly the things I do in Wikimedia, mostly for Wikimedia Commons. How interesting.

Heaps of interesting people at LCA, and interesting talks. In the unlikely event that you are reading this and also attending LCA, come and say hi. It looks like I will be attending a lot of (like, six or so) talks relating to multimedia and Ogg and so on. Well if it’s that or kernel hacking… :)

+ Photo from Mary of me musing during my talk. “Is Wikipedia run by Wikia… let me think…”

29 January, 2008


Of bots and conlangs: the Volapük Wikipedia

“Vükiped”: logo of the Volapük Wikipedia

If you are after some good wikidrama reading as you settle in for 2008, it’s hard to go past the current Volapük Wikipedia. This tale is a potent combination of machine translation, bots, minor constructed languages, language advocacy and statistics. At heart it is a tussle over the answers to the questions, “What is Wikipedia?” and “Why do we create Wikipedias?”

I first became aware of the Volapük Wikipedia (vo.wp) in October when I was doing some planning for the Commons Picture of the Year competition, deciding which languages I should push as a priority. I looked at the meta page List of Wikipedias and found there were 15 Wikipedias with over 100,000 articles. That seemed like a neat cut-off point, and so I made my list.

Except, the 15th one was “Volapük”, and I felt more than a little embarrassed that I had never heard of this language before, because I love languages and linguistics… Looking further along that table revealed vo.wp had only 5 admins and 250 users… proportionally, that was a tenth or less of the size of the others in the top 15. What were they doing?

At that time, SmeiraBot had made over 3/4 of the total edits on the entire wiki. So the disproportional growth was thanks to bots.

A month or so beforehand, someone had had some similar realisations to me, and made a proposal to close vo.wp. I commented on that proposal in favour of deleting the vast majority of the bot-generated articles. In brief, Smeira’s actions offended my feeling of what Wikipedia was, because there would never be a community to maintain 100,000 articles in this language. Is Wikipedia just a free content encyclopedia, or is it a free content encyclopedia written and maintained by a community? That proposal ended up being closed as Keep. Despite all the heat and light, I doubt many of the commenters actually wanted the entire thing deleted.

Then on Christmas Day, Arnomane made a proposal for a Radical cleanup of Volapük Wikipedia. His proposal was not to close the project but just delete the vast majority of the bot articles. That set off a lengthy thread on foundation-l called A dangerous precedent which is still ongoing.

There are two red herrings that have been floating about in this debate. The first is that if people are opposed to this bot bomb, then they are opposed to all bot-generated articles. Of course not. Bots have a time and place. Seeding new wikis is certainly a very useful function of bots. But “seeding” evokes the idea that people will be around, a community, to tend to the articles after that. This was a seeding for a wiki bigger than the Romanian Wikipedia. Romanian has 28 million first- or second-language speakers. 28 million people to potentially tend to ro.wp’s 98 736 articles. Volapük has 20. Twenty. Total. vo.wp’s bot-generated content is hugely out of proportion to the reality of its speakers.
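To put a number on “out of proportion”, here is the arithmetic as a small Python sketch. The Romanian figures and the count of 20 Volapük speakers are the ones quoted above; the vo.wp article count of 100,000 is an assumption, taken as a lower bound because vo.wp made the “over 100,000 articles” list.

```python
# Figures quoted in the post.
ro_articles = 98_736
ro_speakers = 28_000_000
vo_speakers = 20

# Assumption: vo.wp has at least 100,000 articles (it was on the
# "over 100,000" list), so this is a lower bound, not an exact count.
vo_articles = 100_000

ro_ratio = ro_articles / ro_speakers   # articles per Romanian speaker
vo_ratio = vo_articles / vo_speakers   # articles per Volapük speaker

print(f"ro.wp: {ro_ratio:.4f} articles per speaker")
print(f"vo.wp: {vo_ratio:.0f} articles per speaker (at least)")
print(f"vo.wp has roughly {vo_ratio / ro_ratio:,.0f} times more articles per speaker")
```

That works out to about 0.0035 articles per speaker for Romanian against at least 5,000 per speaker for Volapük, a gap of more than a million-fold.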

Why do we create Wikipedias? This is where the “language ego” must come in. I don’t know the right term for it but I’m sure there is one… People want to create a Wikipedia, an encyclopedia, when they feel that their language is one worthy of communicating written knowledge. That is part of the reason why people get so hot under the collar when they get even a hint of a suggestion that someone has said a minority language does not deserve some X the same as other, larger languages. Linguistic rights belong to speakers of natural languages, I think, not constructed languages. If you want to disagree on that point, then OK, but they should definitely not just be swept together as “minority languages” of equal cultural and historical importance to the human race.

Is it OK for Wikipedia to be used as a conlang-promotional experiment if it is shaped like a free content encyclopedia, even one that is virtually doomed to permanent poor quality? That’s not a trick question…

31 December, 2007

Comment [12]

Breaking news: German Wikipedia rids the world of sexism!

What an achievement!

…Oh, wait, they just deleted the category.

I look forward to hearing that they have rid the world of idiocy by similar methods.

13 November, 2007

Comment [1]

What's hard about Wikipedia?

Child + computer lessons = free knowledge?
(Nevit Dilman, GFDL )

Erik reported some good news to foundation-l recently: WikiEducator has won a grant of US$100,000 for “the Learning4Content project to assist in building capacity in MediaWiki editing skills for at least 2500 educators in 52 countries of the Commonwealth”.

I’m not very familiar with WikiEducator, but they look like WMF might if you dragged everyone away from their computers. I imagine they overlap a fair bit. Maybe it’s like: WMF is all about the content creation, and WikiEducator is about the content distribution.

The full Learning4Content proposal is here.

Luckily Erik has got in their ear – they only want to use CC-BY or CC-BY-SA. :D (see section G)

One of the outcomes is “The establishment of a community of free content developers.” (I think they mean developers as in editors, rather than coders.) But the main activity that seems like it will lead to this is “Develop tutorials for Wiki editing[…]” which is reflected in the summary as “MediaWiki editing skills”.

So, what’s hard about Wikipedia? Is it just learning how to use MediaWiki? I don’t think so. That is just the first step, and for the computer-literate, one that is soon passed.

What’s hard?

Although I’ve talked about Wikipedia, these points all apply to all Wikimedia projects, with the possible exception of NPOV.

So I wonder, what else is essential to the Wikimedian culture? Is anything here superfluous?

How well are we doing at sharing these as our values? (Especially given half of them are not explicitly stated)

I wonder if WikiEducator will cover these kinds of things?

28 October, 2007

Comment [1]

CaFeConf 2007; unacademic knowledge

CaFeConf 2007 has just finished, and WMF had no less than Wikimedia Argentina’s Patricio Lorente representing. CaFeConf 2007 is the 6th conference of open/free software and GNU/Linux and is held each year in Buenos Aires (at least, as far as I can tell from Google’s translation of the Spanish Wikipedia article – any volunteers for translating it to English? :)).

Patricio’s slides are licensed under the GFDL and there is also video, although the sound quality in particular is not too great. I believe his talk was about the problems wiki communities face as they grow in size, but since I don’t understand Spanish I can’t tell you the nuances of it.

I was lucky enough to have Patricio attend my Wikimania talk. Lucky, because Patricio is a true believer, passionate and enthusiastic, and interested in the kinds of problems I mentioned in my talk. (And a lovely chap to boot.)

One of the last slides from his talk says this:

Recordar, todo el tiempo, que son
los novatos quienes llegan con
contenido nuevo en su equipaje.
La megalólopis Wikipedia debe
poder recibirlos con la calidez y
comprensión propia de la pequeña

Or, as rendered by Google Translate:

Remember, all the time, which are
Novices who arrive with
New content in his luggage.
The encyclopedia should megalólopis
Able to receive them with warmth and
Own understanding of the small

I suppose this is a poetic restating of WP:BITE, which is just as well, because it never hurts to be reminded why exactly biting newcomers is bad (not just because others are watching). (If you can speak Spanish I’d love to know a more natural translation.)

I did an interview this morning on a friendly morning talk show, your basic “what is Wikipedia, how do you know it’s reliable, WikiScanner/Captain Smirk” deal. At one point they commented on my job title (computational linguist) and said something like, “I suppose that helps with all the wiki stuff.” And I remembered no… Wikipedia is not just for the geeks and the technically literate. Two million articles, big deal. If we really want to accurately represent “the sum of all human knowledge” we need input from all humans, not just the ones who understand 1s and 0s.

I mentioned farming and parenting as two fields that we need more input on. I have a farmer friend and I know he knows a ton of things that are poorly represented in Wikipedia, if at all. Farmers are generally out farming, rather than watching morning TV with a laptop in hand, no surprise there. But I guess in the future there will be more conflict between “knowledge” and “stuff without sources”. The ever-increasing crackdown on the need for citations and reliable sources should make the showdown necessary. Because it is no secret that science and the arts and academia have not studied everything that makes up people’s lives, even in a western country like Australia.

Do I sound anti-sources? I’m not. For a good many topics a reliable sources crackdown is the only way to go. But when otherwise uncontroversial, useful articles get deleted as “non-notable” because there are no possible sources because academia hasn’t come to it yet, I think we are not applying the fifth pillar quite often enough.

If there is no conflict, it could only mean the sources brigade had a victory and the keepers of “unacademic knowledge” left early, defeated. I would consider that a loss.

23 October, 2007

Comment [1]

Freebase, Wikipedia and the right to fork

Screenshot of Freebase personal type definition, 'free content collection'

Two nights ago I went to the first Freebase user meeting outside the US. (You can tell I’m setting myself up for a, “I was there when…”)

It was organised by Kirrily Robert, who’s taken enough with her “new crack habit” to set up a specialised blog just for it.

So, what is Freebase? It claims to be a “database of everything”. There are several points of comparison with Wikipedia. Where Wikipedia is an “encyclopedia”, Freebase wants to be “everything”. It is far more structured than Wikipedia (which anyone who’s ever wrangled with an esoteric template might appreciate). Like Wikipedia, it’s a free content project: data derived from Wikipedia is GFDL (natch) and everything else is CC-BY. They have a very excellent and well-documented API — they’re not afraid to share. Bring on the mash-ups!

There are several more differences worth discussing. Currently, Freebase is alpha and invitation-only for write permission (ie an account). No worries, give it time.

More importantly, the back-end. Freebase is built on Metaweb’s closed-source back-end that is going to remain that way. Apparently they intend to release some kind of regular data dump, and even allegedly would have no problem with someone taking that entire data set and throwing it into MySQL or what-have-you and setting up a total project fork.

If it was free software, there would be a right to fork. But this is only free content. Is there any kind of corresponding “right to fork” for a free content community? Should there be?

If not, maybe this joke from Evan about “crowdsourcing” is just a truth:

The other reason that I would wait until I had an entire data dump downloaded on my own disk before really barracking for Freebase is because I read their TOS:


We provide access to portions of the Site and Service through an API; for purposes of this Terms of Service, such access constitutes use of the Site and Service. You agree only to use the API as outlined in documentation provided by us on the Site. You may not use the API or any other features of the Site or Service to duplicate or copy the Site or Service.

Bummer. Although — here’s a thought — I wonder if that conflicts with the CC-BY?

(clause 8.e from CC-BY-3.0)

This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.

It’s not quite viral freedom, but almost as good. It seems to me this nice clause would render their TOS impotent.

So, interesting to see what will happen there. It’s Wiki[p|m]edia that convinced me (and taught me) about the absolutely vital right to fork. That is an incredible freedom which is vastly underappreciated by the journalists who are generally impressed with Wikipedia’s “freeness” (meaning no ads, or free access). And as a project leader, any kind of project, that is what keeps you on your toes. Maybe it is a good benchmark for deciding if you want to be a contributor to a particular project. If management gets too heavy, you can keep them in line by threatening to exercise your right to fork. Yeah!

Back to Freebase… another related, interesting aspect will be watching the development of their community and how it will be managed. Where Wikipedia was pretty grass-roots, it seems like Freebase is top-heavy, for the moment at least. Letting go, giving up control and trusting the unwashed masses is a very difficult psychological moment for anyone (who’s not a Wikimedian). Trying to get those same unwashed masses to behave themselves is a whole other kettle of fish. When I first contemplated this for Freebase two nights ago I was filled with cynicism, until I remembered… The thing about Wikipedia is that it only works in practice. In theory, it can never work.

I should make that my mantra. Every time I get cynical about something, think about that idea again. It only works in practice.

11 October, 2007


Content reuse, a nice deja vu

Patience of the Grates
© CC-BY-SA Flickr’s pinkbelt

I just got into last.fm, a web 2.0ish site about music, and so have been trying to figure out how to train it up to know what I like. The only way I have figured out is by downloading a program that pays attention to what music I play on my computer. A window pops up with some pictures, tags and an intro bio to each musician as you play. When I played the Grates’ 19-20-20, I couldn’t help thinking the text seemed oddly familiar:

The Grates are a three-piece band from Brisbane, Australia, comprising Patience Hodgson (vocals), John Patterson (guitar) and Alana Skyring (drums). They have been lauded for their catchy songs and enthusiastic and energetic live show (Patience spends much of the show bouncing around, even while singing). They are frequently described as fun: “We just wanna have fun and hope other people do too.” (Patience). Their sound has been compared to the Ramones, the Yeah Yeah Yeahs and be your own PET. In March 2006 they played at the South by Southwest trade music fair in Texas.

Hmm… I went looking into the Grates’ Wikipedia article history and found my first edit to it. Since then I have only made three other edits to it, one to insert a chart position, one to revert vandalism, and one just today to replace the photo (with the one linked above). The article has had a hundred or so edits by other people in the year and a half since then, but the lead has hardly changed. The first moral of the story is: take the time to write a decent lead, and it can really stick around.

Back to last.fm. The blurb on this window linked to last.fm’s wiki. At the bottom of the article there is a note indeed crediting the text as GFDL and a link to the history. The original was definitely “forked” from Wikipedia, but the attribution is sadly lacking. It’s not too surprising that last.fm users aren’t as anal as Wikipedians about attribution.

I am not too sure if the moral here is that Wikipedians should take a leaf out of the last.fm users’ book (in the spirit of sharing ‘n all that) or vice versa. Unfortunately I think the Wikipedians are fighting a losing battle.

The Grates, 19-20-20

08 September, 2007

