Community-curated works (CCW)

I am pleased that my post on an alternative term for ‘user-generated content’ has gained a little traction. The AboutUs.org (corrected) blog highlighted my post, which prompted Josh Bancroft to agree. Well, at least we’re in violent agreement that “user-generated content” needs to go, but the ideal term to replace it has perhaps yet to be revealed.

Jeff Moriarty says CCW “sounds like the way a museum cares for its relics… things that the normal person cannot touch”. (I guess that must be from the “curated” rather than the “community”.)

Mike Mathews also dislikes “curated”:

I do have a problem with “curated” as well, it means more “overseen”, “assisted” or “managed” instead of created. Brent [previous commenter] suggested “donated” or “invested”, both good suggestions. I’d like to add for consideration: “written”, “created”, “developed”, “produced”, “prepared”.

I contemplated “created” when I first tried to think of a replacement term. But for me “create” doesn’t capture the essence of how wikis work best. Wikis work best over longer scales of time, by allowing — encouraging — continuous incremental improvement. Certainly you can jump in and make dramatic changes (and most Wikipedia articles benefit from this every so often, and maybe Wikibooks too), but most stable, reliable, high-quality content comes from 95% tweaking and 5% dramatic changes. On other wikis like Wikimedia Commons, Wiktionary or Wikisource, dramatic changes are almost never appropriate.

Wikisource in fact demonstrates how well the wiki model and methods work with very little “creation” to speak of at all, since it’s solely about adding value to existing works. Other wikis like fan wikis or documentation wikis would rather be scraping into the “creative” domain by my reckoning.

“Developed” and “prepared” would be OK by me, except they are fairly empty terms. “Curated” seems to be too austere — how about “maintained”?

Or, any new suggestions from left-field? I am not wedded to CCW, only to finding an alternative to “user-generated content” for describing the things that wikis produce.

31 August, 2008 •

An alternative term for "User-generated content"

For a long time I have not liked the phrase User-generated content to describe what Wikimedia is doing. There are just three problems with this phrase: the words “user”, “generated” and “content”.


“Treat me as a person, not some user, consumer, addict, shallow person defined by your brand or some other form of low life.” (— Rishad Tobaccowala)

The problem is that the discourse on trends in online media still clings to the language of “us” [publishers] and “them,” [users] when it is all about the breakdown of that distinction.

Death of the User

“User” is software-speak from a block diagram. I’m not just someone who you let create an account on your website.


I understand crowdsourcing as kind of an industrial age, corporatist framing of a cultural phenomenon. There’s human energy being expended here. A company can look at that as either a threat — to their copyrights and intellectual property or as some unwanted form of competition — or, if they see it positively, then they see it as almost this new affinity group population to be exploited as a resource. (— Douglas Rushkoff)

In other words, to look at what we’re doing and only see that we’re “generating” “content” for you — generate, an only slightly less mechanical synonym for ‘create’ than manufacture — is to almost entirely miss the point.


[U]sing [‘content’] as a noun to describe written and other works of authorship is worth avoiding. That usage adopts a specific attitude towards those works: that they are an interchangeable commodity whose purpose is to fill a box and make money. In effect, it treats the works themselves with disrespect. (— Richard Stallman)

Actually, there’s only one problem at root: the attitude which leads one to choose these words. That attitude is one from the corporate world. That the best term they could come up with was “user-generated content” shows what a limited understanding the business world has of what it is we’re doing. And why should we settle for the best term THEY can come up with?

Clay Shirky suggests “Indigenous Content”, which is fairly problematic all on its own.
The best alternative I know of is Participatory culture. But beyond that, I think it’s also worth distinguishing between the works that wikis produce vs other “user-generated content”.

Community-curated works is the best term I’ve come up with so far, to describe what the Wikimedia movement is creating, and what most other wiki communities are creating too. The individual contribution is not what’s important, it’s not what makes everything work — it’s the fact that we have a community of contributors who implicitly agree to work together, to collaborate, to try and constantly improve the content.

There seems to be intentional collaboration and incidental collaboration. Wikis are almost entirely intentional collaboration. With intentional collaboration you can directly affect other people’s contributions. With incidental collaboration, the derived value is due to some software intervention in the middle, e.g. Amazon’s recommendations.

Look at the use of tags/categories. On Flickr and del.icio.us, everyone just uses whatever tags they want in whatever manner, and there is no attempt to (or even an idea they should) try to standardise their usage. On wikis it’s a very active notion: only one label for one concept. (LibraryThing has an interesting hybrid thing happening, because individuals put their own tags as they like, but it’s then possible that they can be grouped with other tags to be considered synonymous.)

I also put Freebase as a hybrid beast because I’m not sure how much interaction and influence there is, or will be, between people. I know if you want you can use Freebase as your own personal database and not worry about trying to make your data useful to others, but I suspect it’s going to slide more and more towards the wiki-like “community-curated” side of things.

So that’s it, really — I’m happy because I figured out a decent replacement for “user-generated content”, and I can now consign it with “crowdsourcing” in the ‘bad biz-speak’ bin.

02 August, 2008 •

The Wikipedia of metaphors

Apparently Wikipedia has been well-known enough to provide a conceptual basis for people to understand other new-fangled concepts since at least September 2005. Loose Wire blog called It’s the Wikipedia of… the “new cliche”. They cite a dozen odd examples of the phrase “it’s the Wikipedia of X”. In most cases, X was referring to a website, often (but not always) a wiki, that was intending to be authoritative or all-encompassing (excellent coverage) on a particular topic. So Wikipedia is the Wikipedia of, well, everything, but for specific domains there may be a better Wikipedia. Any conceivable thing you could think about X, you can find it in the Wikipedia of X. One particularly incongruous example was that of describing the New York Times as “the Wikipedia of newspapers”. …

Really, you don’t gain much from this comparison that is missing from “the encyclopedia of X”, except that it is online, maybe.

Cogmap apparently claims to be “the Wikipedia of organisational charts”. Here the comparison doesn’t rely on coverage but on the use of a wiki applied to a different type of content: instead of an encyclopedia article, an organisational chart. Many other wikis use this comparison, since presumably knowing what Wikipedia is, is easier than knowing what a wiki is. Wikileaks also relies on this in saying their mission is to develop “an uncensorable Wikipedia”, where clearly they are not actually developing an uncensorable wiki encyclopedia.

Aaron Swartz calls Open Library “an attempt to create a Wikipedia for books”. (Eh, is this not Wikisource?) Besides comprehensiveness, he may also be borrowing the concepts of volunteer-driven and “openness/freedom”, although I suspect this aspect of Wikipedia is little-appreciated by the general public so it may be lost.

Let’s see…

By comparison to Wikipedia, Britannica should be something old-fashioned, whose business model has been destroyed by a participatory web-based equivalent.

18 March, 2008 •

Of bots and conlangs: the Volapük Wikipedia

“Vükiped”: logo of the Volapük Wikipedia

If you are after some good wikidrama reading as you settle in for 2008, it’s hard to go past the current Volapük Wikipedia. This tale is a potent combination of machine translation, bots, minor constructed languages, language advocacy and statistics. At heart it is a tussle over the answers to the questions, “What is Wikipedia?” and “Why do we create Wikipedias?”

I first became aware of the Volapük Wikipedia (vo.wp) in October when I was doing some planning for the Commons Picture of the Year competition, deciding which languages I should push as a priority. I looked at the meta page List of Wikipedias and found there were 15 Wikipedias with over 100,000 articles. That seemed like a neat cut-off point, and so I made my list.

Except, the 15th one was “Volapük”, and I felt more than a little embarrassed that I had never heard of this language before, because I love languages and linguistics. Looking further along that table revealed vo.wp had only 5 admins and 250 users, proportionally a tenth or less of what the others in the top 15 had. What were they doing?

At that time, SmeiraBot had made over 3/4 of the total edits on the entire wiki. So the disproportional growth was thanks to bots.

A month or so beforehand, someone had had some similar realisations to me, and made a proposal to close vo.wp. I commented on that proposal in favour of deleting the vast majority of the bot-generated articles. In brief, Smeira’s actions offended my feeling of what Wikipedia was, because there would never be a community to maintain 100,000 articles in this language. Is Wikipedia just a free content encyclopedia, or is it a free content encyclopedia written and maintained by a community? That proposal ended up being closed as Keep. Despite all the heat and light, I doubt many of the commenters actually wanted the entire thing deleted.

Then on Christmas Day, Arnomane made a proposal for a Radical cleanup of Volapük Wikipedia. His proposal was not to close the project but just delete the vast majority of the bot articles. That set off a lengthy thread on foundation-l called A dangerous precedent which is still ongoing.

There are two red herrings that have been floating about in this debate. The first is that if people are opposed to this bot bomb, then they must be opposed to all bot-generated articles. Of course not. Bots have a time and place. Seeding new wikis is certainly a very useful function of bots. But “seeding” implies that people will be around, a community, to tend to the articles after that. This was a seeding for a wiki bigger than the Romanian Wikipedia. Romanian has 28 million first- or second-language speakers. 28 million people to potentially tend to ro.wp’s 98 736 articles. Volapük has 20. Twenty. Total. vo.wp’s bot-generated content is hugely out of proportion to the reality of its speakers.
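
To put that disproportion in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted above. The one assumption is vo.wp's article count: it is taken as 100,000, the cut-off it had passed, so it is a lower bound rather than an exact figure. The function name is just illustrative.

```python
# Figures quoted in this post. vo.wp's article count is an assumption:
# 100,000 is the cut-off it had passed, used here as a lower bound.
ro_articles = 98_736
ro_speakers = 28_000_000
vo_articles = 100_000
vo_speakers = 20


def articles_per_speaker(articles, speakers):
    """Return how many articles exist for each potential maintainer."""
    return articles / speakers


ro_ratio = articles_per_speaker(ro_articles, ro_speakers)
vo_ratio = articles_per_speaker(vo_articles, vo_speakers)

print(f"ro.wp: {ro_ratio:.4f} articles per speaker")
print(f"vo.wp: {vo_ratio:.0f} articles per speaker")
print(f"vo.wp has about {vo_ratio / ro_ratio:,.0f} times more articles per speaker")
```

On these figures each Volapük speaker would be responsible for at least 5,000 articles, while Romanian has well under one article per hundred speakers.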

Why do we create Wikipedias? This is where the “language ego” must come in. I don’t know the right term for it but I’m sure there is one… People want to create a Wikipedia, an encyclopedia, when they feel that their language is one worthy of communicating written knowledge. That is part of the reason why people get so hot under the collar when they get even a hint of a suggestion that someone has said a minority language does not deserve some X the same as other, larger languages. Linguistic rights belong to speakers of natural languages, I think, not constructed languages. If you want to disagree on that point, then OK, but they should definitely not just be swept together as “minority languages” of equal cultural and historical importance to the human race.

Is it OK for Wikipedia to be used as a conlang-promotional experiment if it is shaped like a free content encyclopedia, even one that is virtually doomed to permanent poor quality? That’s not a trick question…

31 December, 2007 •

PacLing 2007

Today I attended PacLing2007, the 10th Conference of the Pacific Association for Computational Linguistics. I attended sessions on Named Entities, Lexical Semantics, Machine Translation and Terminology. There was also an invited talk by Ann Copestake on applying robust semantics. She had a neat example of how underspecification works, in solving Sudoku, and how you can make inferences from something underspecified. Well it’s easy with sudoku, I wonder how easy it is with language. :)

There were two main points of interest for me. The first is that Francis Bond, the Program Chair, asked all the presenters to license their papers under the Creative Commons Attribution 3.0 license, and they did. All of the papers from the conference program are available under this liberal license. (The webpage doesn’t say so, but each paper’s PDF has this as a footnote on the first page.) I think this is a fantastic, forward-thinking and commendable move on the part of PacLing. It acknowledges that all human knowledge builds on what came before.

The second thing that was interesting was the session Bridging the Gap: Thai – Thai Sign Language Machine Translation, although in the end it was perhaps not a terribly exciting MT system. I was curious about how TSL was represented. Apparently they have a big dictionary of Thai word <-> photograph of someone making the equivalent TSL sign(s). Given that movement is a meaningful part of sign language, I wonder how well this works. I am not sure now if the presenter told me that they slice up a video of the movement into frames to represent it, or if I imagined that. :)

I spoke to the presenter (I think it was Srisavakon Dangsaart) afterwards about signwriting, which she had heard of. She seemed to indicate it wasn’t used for TSL. I asked if it couldn’t be useful for TSL ‘speakers’ to be able to write using it. Her MT system is definitely useful and cool, but it’s basically one way: it is not really possible for TSL ‘speakers’ to create sentences using photographs of people making signs. She said it would mean they would have to learn three languages: TSL, signwriting, and written Thai (to communicate with the rest of the population). I don’t disagree, but I imagine it would be easier to learn to write Thai given literacy first in signwriting, which I presume would be an order of magnitude easier to acquire than any phonemic representation of a language (such as an alphabet-based script, which Thai is). That would be a fertile area for research, I imagine.

20 September, 2007 •


