Tag results

Y Combinator's "Startup Ideas We'd Like to Fund": "More open alternatives to Wikipedia"

From Y Combinator’s Startup Ideas We’d Like to Fund:

23. More open alternatives to Wikipedia. Deletionists rule Wikipedia. Ironically, they’re constrained by print-era thinking. What harm does it do if an online reference has a long tail of articles that are only interesting to a few people, so long as everyone can still find whatever they’re looking for? There is room to do to Wikipedia what Wikipedia did to Britannica.

Ouch!

“There is room to do to Wikipedia what Wikipedia did to Britannica.” Now that’s a wake-up call if I ever heard one.

Y Combinator is a venture capital firm that writes a good blog for wannabe startups.

(And in case you’re wondering, their Wikipedia article was never nominated for deletion…)

22 July, 2008

Comment [1]

WikiProject so effective, it skews study results


Banksia spinulosa, public domain.

Seriously, how cool is this story?

The paper is Scientific citations in Wikipedia by Finn Årup Nielsen (the paper itself is dual-licensed GFDL and CC-BY-SA), and it analyses uses of the cite journal template in the April 2007 database dump. The author compares the prevalence of Wikipedia citations to citations in the general scientific community.

The success of WikiProject Banksia causes a noticeable outlier:


Original graph

The one circled in red is Australian Systematic Botany.

Australian botany journals received a considerable number of citations…in part due to concerted effort for the genus Banksia, where several Wikipedia articles for Banksia species have reached “featured article” status.

Right now, there are six. Now it’s just a matter of waiting for the “rest” of Wikipedia to catch up.

The number of people working on this project, you can count on one hand and still have fingers left over.

The Banksia gallery on Wikimedia Commons, and category, are also impeccably sorted and organised (and detailed!).

It makes me smile to be able to report this, because it shows how much just a few dedicated souls can achieve, by quietly and steadily busying themselves.

And it’s damn cool. Congratulations, WikiProject Banksia.

15 April, 2008

Comment [2]

Links for 2008-04-06

Support the Libre Graphics Meeting: make a donation at www.pledgie.com.

05 April, 2008

Comment [2]

[guest] Rethinking the Top Ten

Written by Waldir Pimenta

http://www.wikipedia.org/

Some people might not know about the www.wikipedia.org template: the page that defines what appears on the main Wikipedia portal, www.wikipedia.org. The template is protected, so people from Wikipedias that reach milestones frequently comment on its talk page to request an update. However, there is a draft version that anyone can edit. This is something more people should be aware of.

Now comes the cool part.

If we remove all the requests for updates from the template’s talk, some very interesting thoughts show up, in discussions spanning several months and even years. These are proposals that cannot be simply put on the draft page to be later synchronized with the main template, since they would represent big changes that require some discussion first.

One of these proposals is the “top ten rule” discussion. The problem is this: when the wikipedia.org portal first implemented the globe design with ten Wikipedias floating around it, the natural choice was the ten biggest Wikipedias at that time. But when the Russian Wikipedia started approaching the 100,000 milestone (the sections below the globe only went up to 10,000 at that time), many people started proposing its inclusion on the globe, since it would “graduate” from the 10,000 level. What most people didn’t realize was that (quoting User:Mxn) “most of the top 10 editions were featured around the logo long before they reached 100,000 articles, so getting to 100,000 isn’t why they’re up there”. The fact that at some point they ended up being the only 100,000+ editions of Wikipedia was merely coincidental.

Nevertheless, those discussions about the Russian Wikipedia, and later the Chinese Wikipedia (which led to the creation of the 100,000+ section under the globe), questioned size as the criterion for being featured around the globe (which had never been extensively discussed anyway) and proposed some alternative criteria, thus effectively planting the seeds for a long-awaited reform.

This is where the Top Ten Wikipedias discussion comes in. By collecting the ideas spread across the huge www.wikipedia.org template talk page and presenting them together on a separate page, with a table of actual results from applying some of those criteria (and, of course, some spamming around the village pumps of the biggest ’pedias), the arena was opened for a very productive discussion, which is ongoing at this very moment! These are times of change, and excitement is in the air. You could be part of the revolution! Go ahead, be bold and add your comment!

——

Comment:

How interesting that the “100,000+ rule” for inclusion on www.wikipedia.org was never originally planned.

The proposal for a new evaluation of what constitutes a “top 10” is very detailed and worth a look, keeping in mind the question: what do you value most about Wikipedia? What factor makes a Wikipedia the most useful? Depending on which factors get favoured, the “top 10” could look extremely different to how it currently does. The question of “what do we value” naturally brings the case of the Volapük Wikipedia to mind (vo.wp scores a prominent text link on this portal, but not top 10 as of yet).

Thank you to Waldir for taking the time to write this up and share it. —Brianna

04 April, 2008

Where do users go after the main page?

Thanks to Tim and Domas and Henrik, we can examine page views. Yay, statistics.

I copied all the links from the menus (sidebar and top bar) and got their monthly page view totals for February 2008, and then calculated their average daily page views.
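The averaging step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet I actually used: the page names and totals below are made up, and the only real fact relied on is that February 2008 had 29 days.

```python
import calendar


def average_daily_views(monthly_total, year, month):
    """Divide a monthly page view total by the number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]  # 29 for February 2008
    return monthly_total / days_in_month


# Hypothetical monthly totals, for illustration only (not the real figures).
monthly_totals = {
    "Example sidebar link": 290_000,
    "Example top bar link": 58_000,
}

for page, total in monthly_totals.items():
    print(page, round(average_daily_views(total, 2008, 2)))
```

Using the calendar module rather than a hard-coded 28 matters here, because 2008 was a leap year.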

Note:

Side bar:

Top bar:

The [1] is because of an unusual access pattern for Portal:Technology and applied sciences, which suggests it was only linked from the main page on the 17th of February.

Also, the top bar only appears on the main page, whereas the sidebar appears on every page.

Four repeated links may be overkill.

Figures:
wpmainpagelinks.sxc [6.88KB]

14 March, 2008

Comment [5]

The responsibility of Wikipedia in the wider world

Jim Redmond has a post on his blog that almost read my mind, called One thing that Wikipedians often overlook: not everybody gets it:

Most non-Wikipedians still don’t get how Wikipedia works; they still think that its content is centrally controlled.

This is part of the reason this week we saw the SMH report More woes for Wikipedia’s Jimmy Wales, about Jeff Merkey’s claims of “cash for kindness” or donations for Wikipedia article editorial favours.

When Wikipedia was small and ranked on the 10th page of Google results or worse, it didn’t matter so much if a person’s Wikipedia article was full of nonsense. But when your Wikipedia article can rank higher than your official site, you have a problem. That’s the major reason for the English Wikipedia policy, Biographies of living persons. I really recommend having a look at it, even if you’re familiar with the acronym.

Biographies of living persons (BLPs) must be written conservatively, with regard for the subject’s privacy. Wikipedia is an encyclopedia, not a tabloid; it is not our job to be sensationalist, or to be the primary vehicle for the spread of titillating claims about people’s lives. An important rule of thumb when writing biographical material about living persons is “do no harm”.

Jimmy Wales has made it clear repeatedly that Zero information is preferred to misleading or false information.

And that is why you might blank a poorly written article about a controversial figure.

It may be hoping too much to ask the general public or the media to understand the purpose and process of OTRS, but it is worth noting that it is a private method of complaining about one’s article. It’s a selection of trusted volunteer editors working together with WMF staff and board (when appropriate) to answer the questions of those who can’t or won’t use a wiki talk page, but can use email.

It is, quite frankly, thankless and largely invisible work. If disputes are resolved successfully, you’ll never hear about it.

As the figurehead for Wikipedia, Jimmy Wales is often approached or written to personally by people who should actually be writing to OTRS but find the process too esoteric to figure out. It’s rather like contacting Rupert Murdoch to complain about an article by a staff writer in some random NewsCorp paper, except that Wales takes it upon himself to be involved in this resolution process, rather than palming it off to a secretary.

So in blanking Merkey’s article, Wales was actually following the single most ethically serious policy Wikipedia has, showing that Wikipedia is not an anarchy or a free-(libel)-for-all, but a project that takes the responsibility of high web visibility seriously and tries to minimise the negative impact it has on people’s lives.

And while Wales was acting to minimise the harm Wikipedia causes in other people’s lives, the news media shows that when there’s a whiff of controversy, that idea doesn’t apply.

If you had even the vaguest idea about how Wikipedia works, you would surely reject out-of-hand as unlikely if not ridiculous, the idea that Wales would offer editorial favours in exchange for donations. Because he better than anybody knows how impossible that is. The whole article history is right THERE.

But if Wikipedia is just a big black box that somehow produces timely articles, then it is not an unreasonable idea.

Ultimately, recent news stories say to me that while Wikipedia has developed responsible processes over the past couple of years, it has done an extremely poor job at communicating their existence to the outside world. So it’s not enough to be big; we really do have to try and get everyone involved. Only by being a part of it, and understanding how it works, will people know enough to be able to dismiss nonsense claims when they see them.

If Wikipedia was a type of travel, at the moment it’s somewhere between a rocket and an aeroplane, in terms of accessibility and participation and general understanding of how it works. There’s still too much that’s mysterious and seemingly random and magical.

Reading and editing Wikipedia needs to be as familiar as riding a bicycle. Almost everyone can do it, with a few hours’ practice and maybe some training wheels. No special test or license. You can go anywhere. That’s what Wikipedia needs to be like.

14 March, 2008

Comment [5]

Ten possibly provoking thoughts about improving the quality of Swedish Wikipedia

This is the name of an excellent essay by Lennart Guldbrandsson, chair of Wikimedia Sverige (Sweden). You can read the original Swedish or an English translation.

Some of the points are provocative indeed (like point 1, “delete the bad articles”). It is well worth reading to see the perspective of a smaller project, and new ideas on how chapter activities can positively reinforce the online efforts towards greater quality.

10 March, 2008

Comment

Templatology, an essay

Templates are one of MediaWiki’s most versatile features. I was thinking about them recently because of a discussion with other editors about whether a particular template should even exist, and if so, what should its wording be. Templates are a now ubiquitous part of English Wikipedia articles and MediaWiki wikis everywhere, so it may be interesting to look at how they have evolved. (Warning: this is quite long.)

What is a template?

Templates are a feature that provides “boilerplate” text or style, whenever you want to have a standard look or text across more than one page. In MediaWiki, to put a template called “foo” (that is, you would find it in the wiki at [[template:foo]]) on any page, you would put {{foo}}. Templates can also take “parameters”, or particular values that you can change each time the template is used: {{foo|parameter value 1|parameter value 2}}.
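To make the substitution idea concrete, here is a toy sketch in Python. It is emphatically not how MediaWiki's parser works: it handles only flat, non-nested calls with positional parameters, and the body of the hypothetical “foo” template is invented for the example. The one real convention it borrows is that a template body refers to its positional parameters as {{{1}}}, {{{2}}} and so on.

```python
import re

# Hypothetical template store: template name -> template body.
TEMPLATES = {
    "foo": "Hello, {{{1}}}! You said: {{{2}}}",
}


def expand(wikitext, templates):
    """Expand flat {{name|arg1|arg2}} calls; nested calls are not handled."""

    def replace_call(match):
        name, *args = match.group(1).split("|")
        body = templates.get(name.strip())
        if body is None:
            return match.group(0)  # unknown template: leave the call untouched

        def fill(param):
            index = int(param.group(1)) - 1
            # A missing argument leaves the {{{n}}} placeholder in place.
            return args[index] if index < len(args) else param.group(0)

        return re.sub(r"\{\{\{(\d+)\}\}\}", fill, body)

    return re.sub(r"\{\{([^{}]+)\}\}", replace_call, wikitext)


print(expand("{{foo|parameter value 1|parameter value 2}}", TEMPLATES))
```

The same stored body gets reused on every page that calls it, which is the whole point: change the template once and every page that uses it changes with it.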

Various types of templates are referred to by other names, including infoboxes, navboxes, notices and warnings, which better reflect the purpose of those templates.

Another name used is “tag”. When a template is used on a page, it creates a link in the database between the page name and the template. This means one use of templates is to mark pages that you want to group together for some reason. These grouped pages can then be found listed at Special:Whatlinkshere/Template:Foo. If you only wanted to use a template for this grouping purpose, you could make the template so it actually had no visible content. However, categories usually make more sense for this purpose.

A history of templates

Templates as we know them today were first introduced in August 2004 with MediaWiki v1.3, along with categories and the MonoBook skin still used today. Before this they were in the MediaWiki namespace with the “system messages” or user interface messages. With this move they also got the feature of “parameters”.

The first revision of the Help:Template page on meta was in June 2004 (I suppose by this stage they already had the practice of running the latest MediaWiki version live for Wikimedia sites, rather than the latest release, which typically comes later). The opening paragraph is now cute:

Templates, or custom messages, have grown from humble beginnings as an afterthought in a localisation feature. They are now used in almost 10% of pages in the English Wikipedia database.

I asked Duesentrieb to run a query like this, and apparently there are 229,686 en.wp main namespace non-redirect pages without templates – a very neat 10%. So from 10% usage to 90% usage in less than four years. Pretty impressive, especially given there is no edict mandating their use.

However, this is actually getting well ahead of ourselves. There is an interesting post from Larry Sanger in May 2001 called Do we need templates ?:

From: “Krzysztof P. Jasiutowicz”
> Do we need templates of pages ?
> Groups of pages – rock bands, biographies, film entries share common
> features and therefore want some kind of templates.
> Pages of the same category edited by different people tend to follow
> sometimes incompatible patterns or disagree with each other.

One of the reasons that Wikipedia works—why it is developing so quickly and is so attractive to contributors (compelling, one might say…) is that anyone can come in and contribute in practically any fashion. Instigating templates has a number of implications for how we might begin to think of Wikipedia: it would become a collection of standardized information rather than a collection of information that people just happen to feel inspired to input. Who is interested in inputting “standardized information”? Maybe some people, but surely not nearly as many as those who are interested in inputting whatever information they know.

Suppose we were to require (somehow) that everyone writing about the countries of the world input the information in exactly the format of the CIA Factbook. Who, honestly, would want to do that? And on the other hand, who would want to contribute a lot of generally accurate, useful information that will eventually add up to weighty, detailed articles, not necessarily all in the same format?

If I finish the quote here we can all enjoy a guffaw about how things have changed. I think his answer to the question Who is interested in inputting “standardized information”? has been shown to be wrong. Empty edit boxes freak people out. Structured stuff where you just fill out a missing bit here or there is much easier to deal with. (This is also why bots have been so successful in “seeding” wikis. It’s much easier to correct something that’s wrong, rather than write a correct paragraph from a blank slate.)

However, a fairer quote would include the following, where Larry clearly recognises that “it’s early days yet”:

Eventually, I suspect, we’re going to have huge amounts of information, and it will be possible for people to go in and render related entries in a similar format. It’s generally better to impose order after creation, in a way that reflects the natural categories of things as information is given. […] [I]n a constantly-growing, constantly-improving encyclopedia, why not just let people add whatever information they want, and when it’s reached a certain level of maturity, only then start imposing some uniformity on the way similar information is presented?

And that seems to be more or less what happened. I’m not great at this online ethnography biz, so I don’t have any other choice quotes from 2001 to 2004, although I expect there was further discussion about templates and their appropriateness.

What’s interesting is how far they’ve spread. While first imagined as a kind of article skeleton structure, they’re now just as widely used in all kinds of talk pages, user pages, maintenance and communication tasks.

A taxonomy of templates

There are some broad classes of templates that can be described:

Now into the user realm —

Any other clear classes I missed? (There are a few I can think of which are pretty boring, hence not here.)

Template complexity

This is what you see when you edit the article on the Melbourne suburb of Hawthorn. Note how the template takes up the entire first screen, and it’s not even done! For a newbie it must be pretty bizarre — although frankly this one’s formatted quite well. But if you’re just trying to get into the guts of it (and remember newbies may not know about section editing), it’s quite “WRONG WAY, GO BACK”.

So there is the complexity of templates — and typically these infobox ones — within articles. Maybe one day MediaWiki will get some whiz-bang “template adder” for articles and all that ugly template code won’t appear in the edit box. That would be nice.

Then there is the complexity of trying to edit the templates themselves. This is nothing short of a nightmare. Template syntax is approaching a very ugly programming language, especially if you throw in parser functions. The migration to the new preprocessor (Feb 2008) has shown deeply nested templates all over the place.

I don’t really see a solution to this, unfortunately. People can’t help themselves “improving” stuff. Here is one way things get complex real fast:

  1. There are two or more functions that display different content but in a similar context.
  2. Someone decides to combine them in a single template that takes a parameter, which says which content to display. The old templates get deleted/redirected.
  3. Helloooo, complexity.

Repeat this a few different times, at a few different levels, in a few different contexts, and suddenly you’ll find it all very difficult to try and untangle.
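The same pattern is easy to show outside wikitext. Here is a loose analogy in Python, with made-up notice texts and parameter names, of two trivial “templates” being folded into one parameterised one:

```python
# Step 1: two simple, separate "templates" used in a similar context.
def stub_notice():
    return "This article is a stub."


def cleanup_notice():
    return "This article needs cleanup."


# Step 2: someone merges them into one "template" with a selector parameter,
# and then each variant grows optional parameters of its own.
def notice(kind, topic=None, since=None):
    if kind == "stub":
        text = "This article is a stub."
        if topic is not None:
            text = f"This {topic} article is a stub."
    elif kind == "cleanup":
        text = "This article needs cleanup."
        if since is not None:
            text += f" (tagged since {since})"
    else:
        raise ValueError(f"unknown notice kind: {kind}")
    return text


print(notice("stub", topic="botany"))
print(notice("cleanup", since="March 2008"))
```

Every caller now has to know which parameters apply to which kind, and that is after only one round of merging. In template syntax, without real functions or indentation, it reads far worse.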

Convenience becomes necessity

All templates begin life because someone finds it easier to make a boilerplate and post that, rather than posting something longer, and having to look it up each time.

However once a template exists, the expectation soon develops that whenever it is applicable, it should be used, and the plain text equivalent should not. Even if previously, you could take or leave the plain text equivalent.

I don’t know why this happens, but it does — without fail.

Templates in user communication

This is actually the crux of what I intended to write about. :) In my 2007 Wikimania presentation I talked quite a bit about the wording, attitude and intent of the English Wikipedia user talk templates. I complained that the wording was often officious, scolding and impersonal, and they were not likely to encourage people to become part of the community.

In hindsight, maybe I had the wrong idea about them all along. John Broughton says this in Wikipedia:The Missing Manual (my review):

The primary purpose of a warning about vandalism or spam, perhaps counter-intuitively, is not to get the problem editor to change her ways. (It would be nice if they did so, but troublemakers aren’t like [sic] to reform themselves just because someone asked nicely.) Rather, when you and other editors post a series of increasingly strong warnings, you’re building a documented case for blocking a user account from further disruptive editing. If the warning leads to the editor changing his ways before blocking is necessary, great – but don’t hold your breath.

(Yes, the gender did change in the middle of that paragraph. :) Srsly, accept singular they already!)

If this is a widespread attitude, that you have to wait until someone receives a level 4 template before it’s legitimate to block them, then it’s not too surprising that there is so much trouble with “gaming” on en.wp. That IS a game, isn’t it? It’s hard for me to not see that situation as leading to a punitive block. It’s certainly not leading to a preventative one!

I guess my problem with user warning templates is I have a feeling they don’t work. I have a feeling they don’t improve a situation. I have a feeling they don’t get read — users don’t pay attention to their content.

If there was evidence that anyone read them, learned something from them, or some situation was averted — that would be nice. [Of course such evidence would be anecdotal. That’s all we have when it comes to user interactions.]

Image deletion notification templates

When an uploaded file is nominated for deletion or is actually deleted, it is commonly considered a courtesy to inform the uploader, via a template on their user talk page. If they didn’t receive this, they would have no idea their upload had been deleted until they tried to go look at it, which is a pretty nasty surprise. It’s now quite common to visit a user talk page and see a dozen-odd notices about missing information on files. Because they are often placed by bots, many can pile up without a human there to notice, “OK, this person seriously doesn’t get this concept, time for a chat”. This is even more true on Commons.

These templates perform two functions: notification + admonishment. They would be better if they were simplified to a single line and only used for notification. Admonishment is something that should be between two humans.

Templates on Commons

There is one benefit to templates that I cannot ignore on Commons, and it is that of translation. Translated templates may mean two users can “communicate” (after a fashion) despite not having any language in common.

Templates are for the benefit of the poster, not the receiver

The benefits are

Just as automated phone answering services are for the benefit of the company, not the caller.

Receiving a form reprimand is patronising. I am not the only one who has this emotional reaction: Wikipedia has the essay Don’t template the regulars.

It follows from this that templates are patronising to newbies too. I guess the only reason this is considered acceptable is that as they’re newbies, they won’t realise this template is a form response. (Well, except for how it’s totally generically worded, yeah.) So, since we’re all equal ‘n all, go ahead and template the regulars.

(So far there is no essay Don’t template the newbies. Instead, treat everyone equally badly. ;))

It would be very valuable to see an in-person observational study of people’s reactions as they learn to edit Wikipedia, including how they react to templates. Maybe the vast majority appreciate the “official” warning as it gives them some direction. Maybe they really do pay attention to them.

Maybe the problem is not the tool, but the way it’s being used. Maybe the only thing to do is take a sharp knife to the language that is used, and help resist the idea of messages as block precipitators, rather than messages as useful informers and educators.

10 March, 2008

Comment [5]

Vanity wiki stats

Ben Yates points to Wikipedia article traffic statistics. Guess what? It’s not just articles. You can also use it to see how many times your userpage was viewed.

Verifiability wins!

(Note this tool doesn’t know about redirects, so for accuracy you should check those too and add them all up.)
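The adding up is trivial, but for the record here is a sketch in Python. The page names and counts are invented, and the counts are assumed to have been looked up by hand in the stats tool, one title at a time.

```python
def total_views(title, redirects, views):
    """Sum the view counts for a page and every redirect that points to it."""
    return sum(views.get(name, 0) for name in [title, *redirects])


# Hypothetical per-title counts, as read off the stats tool.
views = {"User:Example": 120, "User:Example2": 15, "User:Ex": 5}

print(total_views("User:Example", ["User:Example2", "User:Ex"], views))  # prints 140
```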

Now can we get some ordered lists out of this data or what?

05 March, 2008

Comment [1]

Wikipedia: the Missing Manual

O’Reilly sent me a copy of Wikipedia: The Missing Manual (also Amazon) for review. Really I am a bad person for such a task — they should give it to newbies and encourage them to dive in, see how they go, and then report how they feel about the book. But I guess there is some value in a perspective that learned it the hard way first (or at least, blog buzz).

Is this book needed or necessary? Yes. Wikis are very good at two tasks, at least: writing an encyclopedia and writing documentation. Interestingly, Wikipedia fails massively at the latter. Well, not so much at the writing of it as the organising, culling and simplifying of it.

I suppose it is not helped that policies, guidelines, manual of style, essays and wikiprojects all share the same space. Perhaps it would be useful to create new namespaces for some of these – at least MOS and wikiprojects. Essays could be folded back into user subpages (like userbox templates were). When they are all cited as if they held equivalent weight (I was surprised to learn WP:COOL was only an essay), it makes it extremely difficult to get a grasp on what you’re supposed to know.

Another idea might be to explicitly flag versions of policies and guidelines for “experience”, e.g. everything with an “experience rating 1” would be expected to be read by newbies. “5” would be howtos for bureaucrats, arbcom and Mechanics.

But because devoting oneself to organising and sorting projectspace has bad consequences for encyclopedia involvement, I don’t think it will happen.

On first read I got quite a kick out of seeing the familiar screenshots and policy statements in dead-tree format. Yeah — “we made it”. Chapter 15, on uploading images, was especially dear to my heart as I helped design the current upload forms. (With any luck those screenshots will soon be out of date, actually. A vastly improved JavaScript modified form is in the works.)

We made it all right… Wikipedia is now an institution, there’s no doubt about it. Not looking so radical now.

There are two major omissions from this book and one of them is related to this. There is not a single mention of the policy Ignore all rules. That’s right, Wikipedia’s first ever rule doesn’t rate a mention in a book devoted to the minutiae of how to get an enhanced watchlist and get an article deleted. It’s really quite strange. The author John Broughton would undoubtedly be familiar with it, having authored the Editor’s index to Wikipedia. One can only assume he thinks it’s on the way out. Then, Wikipedia will be a much less interesting community.

The other major omission is an explanation or discussion of the concept of free content. “Free content” scores one reference in the book’s index, to a section “Uploading a Non-free Image” in the “Adding Images” chapter. He refers to the WMF licensing policy and says,

Free content is any work that doesn’t require permission or payment for any use, including commercial. At most, free content requires attribution: crediting the person who created the image. Free content also has no restrictions on redistribution of the image by others.

Well, for a start, this is just wrong. Free content can also require ShareAlike use, which is a “restriction on redistribution”.

He then breezes over Wikipedia’s fair use policy. Considering how much trouble people have with it, I think it would be better to cover it thoroughly or not at all. Simply reciting the conditions that must be met is not that enlightening. Better would be a full expanded explanation of the ideas of free software, free culture, freedom for users, copyleft, etc.

Aside from these two gaping holes, I can’t really fault Broughton’s writing, which is refreshingly free of cynicism. If he sometimes belabors a point of process, it is actually a good indication that that process is due for massive simplification. Adding references, for example. He goes into great detail about article deletion nominations; I thought these could all be done more or less by magic JavaScript now? That would seem a much better option to explain, IMO.

The organisation of the book’s content is not bad, although I don’t understand why the appendices “A Tour of the Wikipedia Page” and “Reader’s Guide to Wikipedia” don’t lead the book rather than being hidden at the back. This book would also become 20% cooler if the inside of the covers had the MediaWiki syntax cheatsheet and a list of frequent shortcuts/policies and guidelines printed inside them. That would be so much cooler!

Physically, the book is a little crowded. The pages need to be bigger, or the margins smaller, to allow the many screenshots to take up more space. I am not sure the frequent “note” and “tip” asides wouldn’t be better worked into the main text. (Hey, just like trivia sections!) And unfortunately the binding is cheap. Having finished reading it, my index pages are now falling out. That’s disappointing, but a book like this is not really intended to be a tome for all time anyway, so it’s not that surprising.

Sooner or later I will post my smaller nitpicks to the publisher’s errata page, but they’re just small fry.

There is a pretty nice piece, only just published in the New York Review of Books, about this book: The Charms of Wikipedia. The author is clearly pretty enthralled with Wikipedia. Hey, more power to him. The real test is if this book can convert a Wikipedia skeptic, or maybe tame a troublesome user.

Phoebe Ayers, Charles Matthews, Ben Yates, and SJ Klein (four upstanding Wikipedians all) are working on a book called How Wikipedia Works. (see also meta) Reportedly they will license it under the GFDL. This is excellent news.

I hope Broughton’s book is not only massively successful, but that it inspires a host of measured, high-quality documentation of all the Wikimedia projects, and then some.

03 March, 2008

Comment [3]

Why Wikipedia doesn't need protecting from the masses

It’s not as though our existing volunteers are abnormally intelligent, or particularly gifted at writing an encyclopedia; they’re just some people who wound up helping. Why does this indicate the population at large is going to be worse? We are the population at large, we just want to get a bigger slice of it.

Andrew Gray (first emphasis is mine, second is his)

This is from a foundation-l thread in November 2007. It’s been rolling around in my head since then, so finally I’m writing it down so it can leave. Being an expert at using and contributing to Wikipedia has little bearing on encyclopedia-article-writing ability.

26 February, 2008

Comment

Links for 2008-02-21


© skenmy, CC-BY

21 February, 2008

Comment

linux.conf.au LinuxChix miniconf

Woot, today was the LinuxChix miniconf of linux.conf.au (LCA), one of the three big free software related conferences held around the world each year.

I spoke on Wikipedia (duh), giving a kind of second-level introduction aimed at cutting through bureaucracy by explaining what was important and what could wait until later. I always used to think I had to read all the relevant policies and guidelines before I did anything. So I would spend hours poring over MoS pages and the like before even writing a paragraph.

Later I got much more relaxed about it and figured, correctly, that someone else would clean it up to conform to MoS if it really bothered them that much (and evidently it does, or else it’s easier to make automated changes that relate to formatting than actual content).

In a nice surprise I saw Nick Jenkins, who I didn’t realise was attending LCA. He took notes on my speech and they’re probably better than mine so I recommend reading those. :) You can also read my slides from Wikimedia Commons.

There was lots of video going on and I will link it up whenever I see it published.

Stormy Peters gave a great talk about community managers. As I listened to her talk I realised… I am a community manager. All the things she mentioned are exactly the things I do in Wikimedia, mostly for Wikimedia Commons. How interesting.

Heaps of interesting people at LCA, and interesting talks. In the unlikely event that you are reading this and also attending LCA, come and say hi. It looks like I will be attending a lot of talks (like, six or so) relating to multimedia and Ogg and so on. Well, if it’s that or kernel hacking… :)

+ Photo from Mary of me musing during my talk. “Is Wikipedia run by Wikia… let me think…”

29 January, 2008 •

Comment

Of bots and conlangs: the Volapük Wikipedia


“Vükiped”: logo of the Volapük Wikipedia

If you are after some good wikidrama reading as you settle in for 2008, it’s hard to go past the current Volapük Wikipedia. This tale is a potent combination of machine translation, bots, minor constructed languages, language advocacy and statistics. At heart it is a tussle over the answers to the questions, “What is Wikipedia?” and “Why do we create Wikipedias?”

I first became aware of the Volapük Wikipedia (vo.wp) in October when I was doing some planning for the Commons Picture of the Year competition, deciding which languages I should push as a priority. I looked at the meta page List of Wikipedias and found there were 15 Wikipedias with over 100,000 articles. That seemed like a neat cut-off point, and so I made my list.

Except, the 15th one was “Volapük”, and I felt more than a little embarrassed that I had never heard of this language before, because I love languages and linguistics… Looking further along that table revealed vo.wp had only 5 admins and 250 users… proportionally, that was a tenth or less of the size of the others in the top 15. What were they doing?

At that time, SmeiraBot had made over three-quarters of the total edits on the entire wiki. So the disproportionate growth was thanks to bots.

A month or so beforehand, someone had come to similar realisations, and made a proposal to close vo.wp. I commented on that proposal in favour of deleting the vast majority of the bot-generated articles. In brief, Smeira’s actions offended my feeling of what Wikipedia was, because there would never be a community to maintain 100,000 articles in this language. Is Wikipedia just a free content encyclopedia, or is it a free content encyclopedia written and maintained by a community? That proposal ended up being closed as Keep. Despite all the heat and light, I doubt many of the commenters actually wanted the entire thing deleted.

Then on Christmas Day, Arnomane made a proposal for a Radical cleanup of Volapük Wikipedia. His proposal was not to close the project but just delete the vast majority of the bot articles. That set off a lengthy thread on foundation-l called A dangerous precedent which is still ongoing.

There are two red herrings that have been floating about in this debate. The first is that if people are opposed to this bot bomb then they are opposed to all bot-generated articles. Of course not. Bots have a time and place. Seeding new wikis is certainly a very useful function of bots. But “seeding” evokes the idea that people will be around, a community, to tend to the articles after that. This was a seeding for a wiki bigger than the Romanian Wikipedia. Romanian has 28 million first- or second-language speakers. 28 million people to potentially tend to ro.wp’s 98 736 articles. Volapük has 20. Twenty. Total. vo.wp’s bot-generated content is hugely out of proportion to the reality of its speakers.

Why do we create Wikipedias? This is where the “language ego” must come in. I don’t know the right term for it but I’m sure there is one… People want to create a Wikipedia, an encyclopedia, when they feel that their language is one worthy of communicating written knowledge. That is part of the reason why people get so hot under the collar when they get even a hint of a suggestion that someone has said a minority language does not deserve some X the same as other, larger languages. Linguistic rights belong to speakers of natural languages, I think, not constructed languages. If you want to disagree on that point, then OK, but they should definitely not just be swept together as “minority languages” of equal cultural and historical importance to the human race.

Is it OK for Wikipedia to be used as a conlang-promotional experiment if it is shaped like a free content encyclopedia, even one that is virtually doomed to permanent poor quality? That’s not a trick question…

31 December, 2007 •

Comment [12]

Breaking news: German Wikipedia rids the world of sexism!

What an achievement!

…Oh, wait, they just deleted the category.

I look forward to hearing that they have rid the world of idiocy by similar methods.

13 November, 2007 •

Comment [1]

What's hard about Wikipedia?


Child + computer lessons = free knowledge?
(Nevit Dilman, GFDL)

Erik reported some good news to foundation-l recently: WikiEducator has won a grant of US$100,000 for “the Learning4Content project to assist in building capacity in MediaWiki editing skills for at least 2500 educators in 52 countries of the Commonwealth”.

I’m not very familiar with WikiEducator, but they look like WMF might if you dragged everyone away from their computers. I imagine they overlap a fair bit. Maybe it’s like: WMF is all about the content creation, and WikiEducator is about the content distribution.

The full Learning4Content proposal is here.

Luckily Erik has got in their ear – they only want to use CC-BY or CC-BY-SA. :D (see section G)

One of the outcomes is “The establishment of a community of free content developers.” (I think they mean developers as in editors, rather than coders.) But the main activity that seems like it will lead to this is “Develop tutorials for Wiki editing[…]”, which is reflected in the summary as “MediaWiki editing skills”.

So, what’s hard about Wikipedia? Is it just learning how to use MediaWiki? I don’t think so. That is just the first step, and for the computer-literate, one that is soon passed.

What’s hard?

Although I’ve talked about Wikipedia, these points all apply to all Wikimedia projects, with the possible exception of NPOV.

So I wonder, what else is essential to the Wikimedian culture? Is anything here superfluous?

How well are we doing at sharing these as our values? (Especially given half of them are not explicitly stated)

I wonder if WikiEducator will cover these kinds of things?

28 October, 2007 •

Comment [1]

CaFeConf 2007; unacademic knowledge

CaFeConf 2007 has just finished, and WMF had no less than Wikimedia Argentina’s Patricio Lorente representing. CaFeConf 2007 is the 6th conference of open/free software and GNU/Linux and is held each year in Buenos Aires (at least, as far as I can tell from Google’s translation of the Spanish Wikipedia article – any volunteers for translating it to English? :)).

Patricio’s slides are licensed under the GFDL and there is also video, although the sound quality in particular is not too great. I believe his talk is about the problems wiki communities face as they grow in size, but since I don’t understand Spanish I can’t tell you the nuances of it.

I was lucky enough to have Patricio attend my Wikimania talk. Lucky, because Patricio is a true believer, passionate and enthusiastic, and interested in the kinds of problems I mentioned in my talk. (And a lovely chap to boot.)

One of the last slides from his talk says this:

Recordar, todo el tiempo, que son
los novatos quienes llegan con
contenido nuevo en su equipaje.
La megalólopis Wikipedia debe
poder recibirlos con la calidez y
comprensión propia de la pequeña
aldea.

Or, as rendered by Google Translate:

Remember, all the time, which are
Novices who arrive with
New content in his luggage.
The encyclopedia should megalólopis
Able to receive them with warmth and
Own understanding of the small
Village.

I suppose this is a poetic restating of WP:BITE, which is just as well, because it never hurts to be reminded why exactly biting newcomers is bad (not just because others are watching). (If you can speak Spanish I’d love to know a more natural translation.)

I did an interview this morning on a friendly morning talk show, your basic “what is Wikipedia, how do you know it’s reliable, WikiScanner/Captain Smirk” deal. At one point they commented on my job title (computational linguist) and said something like, “I suppose that helps with all the wiki stuff.” And I remembered: no… Wikipedia is not just for the geeks and the technically literate. Two million articles, big deal. If we really want to accurately represent “the sum of all human knowledge” we need input from all humans, not just the ones who understand 1s and 0s.

I mentioned farming and parenting as two fields that we need more input on. I have a farmer friend and I know he knows a ton of things that are poorly represented in Wikipedia, if at all. Farmers are generally out farming, rather than watching morning TV with a laptop in hand, no surprise there. But I guess in the future there will be more conflict between “knowledge” and “stuff without sources”. The ever-increasing crackdown on the need for citations and reliable sources should make the showdown necessary. Because it is no secret that science and the arts and academia have not studied everything that makes up people’s lives, even in a western country like Australia.

Do I sound anti-sources? I’m not. For a good many topics a reliable sources crackdown is the only way to go. But when otherwise uncontroversial, useful articles get deleted as “non-notable” because there are no possible sources because academia hasn’t come to it yet, I think we are not applying the fifth pillar quite often enough.

If there is no conflict, it could only mean the sources brigade had a victory and the keepers of “unacademic knowledge” left early, defeated. I would consider that a loss.

23 October, 2007 •

Comment [1]

Freebase, Wikipedia and the right to fork

Screenshot of Freebase personal type definition, 'free content collection'

Two nights ago I went to the first Freebase user meeting outside the US. (You can tell I’m setting myself up for a, “I was there when…”)

It was organised by Kirrily Robert, who’s taken enough with her “new crack habit” to set up a specialised blog just for it.

So, what is Freebase? It claims to be a “database of everything”. There are several points of comparison with Wikipedia. Where Wikipedia is an “encyclopedia”, Freebase wants to be “everything”. It is far more structured than Wikipedia (which anyone who’s ever wrangled with an esoteric template might appreciate). Like Wikipedia, it’s a free content project: data derived from Wikipedia is GFDL (natch) and everything else is CC-BY. They have a very excellent and well-documented API — they’re not afraid to share. Bring on the mash-ups!

There are several more differences worth discussing. Currently, Freebase is alpha and invitation-only for write permission (i.e. an account). No worries, give it time.

More importantly, the back-end. Freebase is built on Metaweb’s closed-source back-end that is going to remain that way. Apparently they intend to release some kind of regular data dump, and even allegedly would have no problem with someone taking that entire data set and throwing it into MySQL or what-have-you and setting up a total project fork.

If it was free software, there would be a right to fork. But this is only free content. Is there any kind of corresponding “right to fork” for a free content community? Should there be?

If not, maybe this joke from Evan about “crowdsourcing” is just a truth:

The other reason that I would wait until I had an entire data dump downloaded on my own disk before really barracking for Freebase is because I read their TOS:

5. API USE

We provide access to portions of the Site and Service through an API; for purposes of this Terms of Service, such access constitutes use of the Site and Service. You agree only to use the API as outlined in documentation provided by us on the Site. You may not use the API or any other features of the Site or Service to duplicate or copy the Site or Service.

Bummer. Although — here’s a thought — I wonder if that conflicts with the CC-BY?

(clause 8.e from CC-BY-3.0)

This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.

It’s not quite viral freedom, but almost as good. It seems to me this nice clause would render their TOS impotent.

So, interesting to see what will happen there. It’s Wiki[p|m]edia that convinced me (and taught me) about the absolutely vital right to fork. That is an incredible freedom which is vastly underappreciated by the journalists who are generally impressed with Wikipedia’s “freeness” (meaning no ads, or free access). And as a project leader, any kind of project, that is what keeps you on your toes. Maybe it is a good benchmark for deciding if you want to be a contributor to a particular project. If management gets too heavy, you can keep them in line by threatening to exercise your right to fork. Yeah!

Back to Freebase… another related, interesting aspect will be watching the development of their community and how it will be managed. Where Wikipedia was pretty grass-roots, it seems like Freebase is top-heavy, for the moment at least. Letting go, giving up control and trusting the unwashed masses is a very difficult psychological moment for anyone (who’s not a Wikimedian). Trying to get those same unwashed masses to behave themselves is a whole other kettle of fish. When I first contemplated this for Freebase two nights ago I was filled with cynicism, until I remembered… The thing about Wikipedia is that it only works in practice. In theory, it can never work.

I should make that my mantra. Every time I get cynical about something, think about that idea again. It only works in practice.

11 October, 2007 •

Comment [1]

Content reuse, a nice deja vu

Patience of the Grates
© CC-BY-SA Flickr’s pinkbelt

I just got into last.fm, a web 2.0-ish site about music, and so have been trying to figure out how to train it up to know what I like. The only way I have figured out is by downloading a program that pays attention to what music I play on my computer. A window pops up with some pictures, tags and an intro bio to each musician as you play. When I played the Grates’ 19-20-20, I couldn’t help thinking the text seemed oddly familiar:

The Grates are a three-piece band from Brisbane, Australia, comprising Patience Hodgson (vocals), John Patterson (guitar) and Alana Skyring (drums). They have been lauded for their catchy songs and enthusiastic and energetic live show (Patience spends much of the show bouncing around, even while singing). They are frequently described as fun: “We just wanna have fun and hope other people do too.” (Patience). Their sound has been compared to the Ramones, the Yeah Yeah Yeahs and be your own PET. In March 2006 they played at the South by Southwest trade music fair in Texas.

Hmm… I went looking into the Grates’ Wikipedia article history and found my first edit to it. Since then I have only made three other edits to it, one to insert a chart position, one to revert vandalism, and one just today to replace the photo (with the one linked above). The article has had a hundred or so edits by other people in the year and a half since then, but the lead has hardly changed. The first moral of the story is: take the time to write a decent lead, and it can really stick around.

Back to last.fm. The blurb on this window linked to last.fm’s wiki. At the bottom of the article there is a note indeed crediting the text as GFDL and a link to the history. The original was definitely “forked” from Wikipedia, but the attribution is sadly lacking. It’s not too surprising that last.fm users aren’t as anal as Wikipedians about attribution.

I am not too sure if the moral here is that Wikipedians should take a leaf out of the last.fm users’ book (in the spirit of sharing ‘n all that) or vice versa. Unfortunately I think the Wikipedians are fighting a losing battle.

The Grates: 19-20-20

08 September, 2007 •

Comment
