The coming challenges for Wikimedia Commons

This is thought to be the three millionth file for Wikimedia Commons. We celebrated two million files on October 9, 2007. That was exactly a month after English Wikipedia reached two million articles. Currently, en.wp contains 2.47 million articles. (En.wp is three years older than Commons, but the difference in absolute numbers is not so surprising. Probably most articles can be illustrated in some way, and many in a number of different ways.)

So, Commons is growing fast: it is adding over 100,000 files per month. That’s well over 3,000 files every single day.
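As a rough sanity check on those figures, here is a back-of-the-envelope calculation. It assumes the three millionth file arrived around the date of this post (23 July 2008), which is an approximation rather than a recorded milestone date. The function name is purely illustrative.

```python
from datetime import date


def growth_rate(start, end, files_added):
    """Return (days elapsed, files per day, files per average month)."""
    days = (end - start).days
    per_day = files_added / days
    # An average month is 365.25 / 12 days long.
    per_month = per_day * 365.25 / 12
    return days, per_day, per_month


two_million = date(2007, 10, 9)    # date given in the post
three_million = date(2008, 7, 23)  # ASSUMPTION: approximated by the post date

days, per_day, per_month = growth_rate(two_million, three_million, 1_000_000)
print(f"{days} days, about {per_day:.0f} files/day, about {per_month:.0f} files/month")
```

Under that assumption, the result is consistent with "over 100,000 files per month" and "well over 3,000 files every single day".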

3M is a friendly warning. It’s another nice milestone but most of our current contributors probably were around for 2M, and a good chunk of them are likely to be around for 4M. The part that makes me wary is the rapid growth we’re facing, and how prepared we are to not only cope with that, but make the most of it.

In my talk at Wikimania, I finished with what I saw as the four big challenges facing Wikimedia Commons in the near future:

  1. Usability
  2. Scope/censorship
  3. Project relationships
  4. Partnerships.

Partnerships:

There is an excellent post at the Powerhouse Museum’s “Fresh and New(er)” blog about their experience with three months of contributing to Flickr’s “The Commons” project. Compared to the kind of great statistics (and, it must be said, control) that Flickr offers, Wikimedia Commons offers zip. Although we are taking a lot of excellent content from Flickr, we currently give those authors and institutions next to no information about how we use their image. (In fact for a good long time we didn’t even have a regular practice of leaving the author a comment to let them know we’d copied their image.)

Of course, according to the licenses we accept (PD, CC-BY, CC-BY-SA) we don’t have to notify the authors or collection owners of anything. But it would be damn nice if we could. It might go a long way in making institutions feel more comfortable upon discovering their content let loose in Wikimedia Commons.

Which is why we need to create and offer institutional stats reports. What we do after that, I’m not too sure, but it’s a start.

Project relationships:

Here I mean the relationship between the Commons community and each community of each other Wikimedia wiki. If I ever see another project get so annoyed that they threaten to (or actually do) tell their users to stop uploading to Commons and just upload locally, then I will consider that the Commons community has massively failed. Luckily I haven’t seen too much sign of this for a while. I think perhaps we are much better integrated with other communities now than we used to be, and that will definitely continue with SUL (Single User Login). SUL should mean more participation from non-regular users, I think, so it will definitely be interesting to compare the activity levels on Commons 6 months before and 6 months after SUL.

I would like to see an automatic move-to-Commons functionality in MediaWiki. I would like to see more communities choosing to turn off local uploads. (As a listener in my talk pointed out, that probably means more communities need to choose to reject fair use.)

Usability:

If you talk to me for long about Commons, you will soon find out I have a list of feature and bug requests as long as my arm. Major functionality required includes file rename capability, increased file size limit, bulk uploading, format transcoding on upload, global Whatlinkshere, search improvements (incl. specific template fields), category improvements, i18n improvements, write/upload via API, integrated workflow processes, integrated feeds and geo-info. Yeah, a bunch of stuff!

(I am glossing over here how tech improvements are going to make sorting and searching a trillion times better, somehow…)

Now having worked a bit on the most recent incarnation of our upload form’s interface, and now having used it a bit, I’m sad to say: it still pretty much sucks. It’s still way too freaking long and picky, with requirements that are annoying when all you want to do is upload.

So, um, it’s not all just lack-of-tech that makes the Commons user experience pretty disappointing. It’s also the Commons community effort (and this definitely includes me) to desperately request every conceivable useful piece of information right at upload.

I think there are a few reasons for this:

  1. It’s hard to correct mistakes, so we try to pre-empt them with warnings. (e.g. renaming files, hence lots of warnings about choosing a descriptive filename)
  2. It’s not as straightforward to add information after uploading as it is at upload-time. (Afterwards, you have to deal with template syntax, rather than a nice database-like form. This generally reflects the Commons community’s wish to have an underlying database-like structure rather than free-form text.)
  3. The Commons community is obsessive and loves metadata. (This results in a rich experience for the later image browser, but a generally poor one for the image uploader. Unfortunately we lack good methods of obtaining such data apart from asking for it.)
  4. The Commons community is vigilant against copyright infringements and copyright fraud. Having your content deleted is about the most crushing thing that can happen to you on a wiki, so we try to annoy you so much that you don’t upload in the first place… unless you’re really, really sure.

Yeah, see how that’s not a great strategy, there?

So our usability problems boil down to two things: Missing functionality, and the balance of discouraging copyright violations against ease of access.

If we were able to handle copyvio deletion more agilely (maybe each file has a tab “flag as copyvio”? and then admins can view that queue of flagged files?), it may be that we would feel comfortable lowering the barrier of ease of access.
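To make the "flag as copyvio" idea a bit more concrete, here is a minimal sketch of what such a queue might look like. Everything in it (the class, method names and fields) is hypothetical and invented for illustration. It is not an existing MediaWiki feature or API.

```python
from collections import OrderedDict


class CopyvioQueue:
    """Hypothetical review queue: flagged files, oldest first flag first."""

    def __init__(self):
        # filename -> list of (reporter, reason); insertion order is kept,
        # so the file flagged earliest is reviewed first.
        self._flags = OrderedDict()

    def flag(self, filename, reporter, reason=""):
        """What a 'flag as copyvio' tab would do: record one report."""
        self._flags.setdefault(filename, []).append((reporter, reason))

    def pending(self):
        """What admins would see: flagged filenames in review order."""
        return list(self._flags)

    def resolve(self, filename, delete):
        """An admin decision removes the file from the queue."""
        reports = self._flags.pop(filename)
        return {"file": filename, "deleted": delete, "reports": len(reports)}


queue = CopyvioQueue()
queue.flag("Example_A.jpg", "UserOne", "taken from a news site")
queue.flag("Example_B.jpg", "UserTwo")
queue.flag("Example_A.jpg", "UserThree", "watermark visible")
print(queue.pending())
print(queue.resolve("Example_A.jpg", delete=True))
```

Repeated flags on one file collapse into a single queue entry, so admins review each file once, however many people report it.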

Or, alternately, we grow ever more anal with upload requirements, and piss everyone off to the high heavens. :)

Anyway, it looks like Commoners generally suffer from the same instruction creep as Wikipedians. That poor Uploadtext.

Scope/censorship:

Saving the most difficult for last.

Just as Wikipedia has WP:IS and WP:NOT, Commons has COM:SCOPE. The Project scope describes the boundaries of acceptable content for Wikimedia Commons.

The problem is, those boundaries tend to be pretty freaking wide, when you want to serve media for an encyclopedia in every single language, and textbooks and courseworks and dictionary definitions and all the rest of it, oh and mind your historical shortsightedness (what is pop-junk today may be sociology gold in 5, 10, 50 or 100 years).

If I read the Village pump, I see concerns raised regularly that Commons is not exactly being used in the way it was intended. Let me elaborate.

And according to wikistics (thank you, Melancholie), the most popular category, by page views so far this month, is Vulva (NSFW). The most popular image (that hasn’t been deleted) is Vagina,Anus,Pereneum-Detail.jpg (NSFW).

OK, these pages are graphic. Is it such a shock that they’re so popular? Is it necessarily a bad thing?

Although for each individual image you could probably mostly construct a more-or-less encyclopedic use for it, taken as a context-less whole, it’s hardly surprising that many users are, well, taken aback. Is this what was envisaged when the project was started?

And yet we are hardly going to delete sex- and nudity-related content wholesale (you’ll never be allowed to forget that Wikipedia is not censored!!) — so where and how can we draw a line?

After my Wikimania talk, one listener essentially suggested we introduce an adult-content filter. After thinking about it for a while, I think this may be a very good compromise. Such a filter amounts to a “warning” (or probably an account preference) before viewing potentially shocking/NSFW content.

If you have different opinions about what the major challenges facing Commons over the next 3-5 years will be (and their potential solutions), I’d love to hear them.

23 July, 2008

Comment

1

On the scope and censorship point, putting an adult content warning up isn't a bad way to go. I'm a Commons user, and a staunch anti-censorship person. But I also don't want Commons to devolve into an open source porn portal.

A relevant example might be the other wiki I work with, AboutUs.org. AboutUs is a wiki guide to the internet with pages for 11 million websites. Obviously that's going to include a fair share of porn sites. The solution we use is to flag all the adult pages and put them in a walled garden where you have to log in to view them. "Log in to view" would hurt Commons too much, so that's not an option. But some sort of warning before you enter might be nice.

Steven Walling · 23. July 2008, 05:53

2

I’m warming up to a click-through warning (incidentally, this is what Flickr does). But I still don’t know that it’s even right to be concerned about it. Wikimedia is all about the success of the long tail, right? So our educational POTD is never more popular than graphic pictures of genitalia; well, does it even matter?

Maybe logically it doesn’t matter, but I’m still having trouble shaking the vague feeling of disturbedness.

pfctdayelise · 23. July 2008, 22:48

3

I’m not too surprised that our nudes are popular, though I admit the degree of their relative popularity is a bit surprising.

I agree that click-through, in and of itself, isn’t too troublesome, although identifying which images deserve to be thus ‘protected’ would likely be rather nightmarish.

Granted that we could come to some agreement as to which images should be protected, I am still uncertain of the value of doing so. The benefit of adding such a click-through warning, it seems to me, would be to prevent people from accidentally seeing images that they wouldn’t want to see. However, I imagine that those images are popular precisely because people did want to see them, i.e. a very low percentage of people would actually find the click-through warnings useful.

Unless we suspect that a decent percentage of people are seeing these images by mistake (could it even be as much as 5%? I doubt it), I don’t believe that the benefits of having these warnings will outweigh the drawbacks.

Tracy Poff · 24. July 2008, 12:23

4

Regarding the uploader and your comment of "I'm sad to say: it still pretty much sucks."

Has anyone taken a look at the newest Flickr Uploader?

http://www.flickr.com/tools/uploadr/

It is written on top of Mozilla, so it is all open-source and cross-platform (Win, Mac, Lin). If you had someone savvy, you could repurpose that application for Commons.

Gen Kanai · 24. July 2008, 13:18

5

Gen, the Flickr Uploader says Mac OS -- so does that mean Linux as well? I didn't think they were exactly compatible?

Also, there is no write/upload capability in the MediaWiki API, so writing a tool for mass uploading is still a pretty prone-to-breaking process. Nonetheless there are two so far: Commonist and Commonplace.

Anyway upload tools are nice, but the default web-based interface still needs to not suck since it's most people's starting point.

pfctdayelise · 25. July 2008, 00:41
