
APIs: Ask, and ye shall receive

Wow. Wikis just gave me another lesson in awesome. I love it.

While thinking about the problem of Zemanta attribution strings, I mused that we really needed to develop a “Commons API”. There is a MediaWiki API for Commons, but there are more project-specific pieces of information we would like to provide. The big three are:

  1. Deletion markers (warning to machines: don’t use this file)
  2. License info for a given file
  3. Author/copyright holder attribution string for a given file

So I made a bit of a start at Commons:API, thinking we could use the wiki pages to write pseudocode algorithms for the different problems. I already knew that at least Bryan, Duesentrieb and I had run across these problems before, and definitely others had too. So it made sense to combine our individual algorithms, define a single, strongest-possible-permutation version, and recommend that others use it. I imagined we could describe the algorithm in pseudocode and let people provide implementations in various programming languages. Versioning would kind of suck, but hey, an imperfect solution is better than nothing.
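To make the “pseudocode algorithm” idea a little more concrete, here is a minimal sketch of the big three checks run over a file description page's wikitext. It assumes the wikitext has already been fetched, and the template sets (DELETION_TEMPLATES, LICENSE_TEMPLATES) are illustrative placeholders, not an agreed Commons list; a real version would need the full, maintained lists and would have to cope with redirects and nested templates.

```python
from __future__ import annotations

import re

# Illustrative placeholders only: a real Commons API would need the full,
# community-maintained lists of deletion and license templates.
DELETION_TEMPLATES = {"delete", "speedydelete", "copyvio", "no license", "no source"}
LICENSE_TEMPLATES = {"gfdl", "cc-by-sa-3.0", "cc-by-3.0", "pd-old", "pd-self"}

# Name part of "{{Name}}" or "{{Name|...".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")
# "|Author=..." inside an {{Information}}-style template.
AUTHOR_FIELD = re.compile(r"\|\s*author\s*=\s*([^|\n}]*)", re.IGNORECASE)


def template_names(wikitext: str) -> list[str]:
    """Lower-cased names of templates used in the wikitext.

    Lower-casing everything is a simplification: MediaWiki only treats
    the first letter of a title as case-insensitive.
    """
    return [name.strip().lower() for name in TEMPLATE_NAME.findall(wikitext)]


def describe_file(wikitext: str) -> dict:
    """Answer the 'big three' questions for one file description page."""
    names = template_names(wikitext)
    author = AUTHOR_FIELD.search(wikitext)
    return {
        "marked_for_deletion": any(n in DELETION_TEMPLATES for n in names),
        "licenses": [n for n in names if n in LICENSE_TEMPLATES],
        "author": author.group(1).strip() if author else None,
    }
```

For a page containing `{{Information|Description=A bike|Author=Jane Doe}}` plus `{{GFDL}}`, this reports no deletion marker, the license `gfdl`, and the author `Jane Doe`. The point of writing it down once is that every reuser would then agree on the same answer.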

However, a perfect solution is even better than both! I barely raised the topic when Magnus actually implemented it (warning: seriously alpha).

First, Magnus is one of my wiki-heroes. You could not ask for a more responsive developer, so it is just delightful when he chimes in on a “what if” discussion. Cool new shiny things are never far away. (Surely one of the strangest things to ever grace the toolserver is still Flommons, the “Flickr-like Commons” interface. Cut away the cruft!) And he is a lovely chap to boot. He tirelessly tweaks and prods his tools in response to any number of “what about…” or “why not move this here?” queries.

My pythonfu is not strong enough to code something like this up in half an hour, as he does, though with some practice and effort I could probably manage it eventually. I recognise the neat, nifty factor in creating stuff that was previously just a “what if”. Programming rocks.

Secondly, I love how responsive a wiki community can be. Sure, for every five ideas you might have, four will garner a lukewarm response at best, but every now and then one will strike a chord and get some momentum. “Build it and they will come”; wikis can also obey “name it and they will build it”. [Of course, I’m hardly the first person to suggest Commons needs an API.]

Thirdly, thinking about the other Wikimedia projects (and indeed a good many third-party MediaWiki installs), it seems likely that many of them would like the chance to define their own API. If nothing else, they could define their own “deletion markers” and featured content (rather like another of Magnus’ tools, Catfood, a category image RSS feed).

So what does that suggest? That wiki users need a flexible method of defining new parts of the API. Special:DefineAPI? Probably not; too subject to change.

Extensions can define API modules. So perhaps we should develop Extension:WikimediaCommonsAPI? If every project wanted to do this it might get a bit messy, but I imagine most projects wouldn’t bother.

Again we run up against the need for Commons to have a more structured database, rather than just store all information about an image in a big text blob.

At any rate, I hope we can set the current “pre-alpha” API up as a serious toolserver or SVN project with multiple contributors. Wikimedia Commons is lucky to have attracted a relatively techy community of contributors, with a number of people who are MediaWiki- or toolserver-savvy. Let’s see how we go.

01 April, 2008


Links for 2008-03-18

This is a preview of what the Commons upload form may look like one of these days… if I have anything to do with it :)

Things to note:

I love this form :) Try it yourself, if you’re logged in at Commons.

18 March, 2008


Templatology, an essay

Templates are one of MediaWiki’s most versatile features. I was thinking about them recently because of a discussion with other editors about whether a particular template should even exist, and if so, what its wording should be. Templates are now a ubiquitous part of English Wikipedia articles and of MediaWiki wikis everywhere, so it may be interesting to look at how they have evolved. (Warning: this is quite long.)

What is a template?

Templates are a feature that provides “boilerplate” text or styling wherever you want a standard look or text across more than one page. In MediaWiki, to put a template called “foo” (that is, the page you would find in the wiki at [[Template:Foo]]) on any page, you would write {{foo}}. Templates can also take “parameters”, values that can change each time the template is used: {{foo|parameter value 1|parameter value 2}}.
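The parameter mechanism can be modelled in a few lines. This is a toy sketch, not MediaWiki's parser: it assumes a template body that refers to positional parameters as {{{1}}}, {{{2}}} and so on, to named parameters as {{{name}}}, and to defaults as {{{1|default}}}. An unfilled parameter with no default is left visible as literal text. Nesting, parser functions and MediaWiki's exact whitespace rules are ignored.

```python
import re

# {{{name}}} or {{{name|default}}}; nested braces are not handled.
PARAM = re.compile(r"\{\{\{\s*([^|{}]+?)\s*(?:\|([^{}]*))?\}\}\}")


def expand(body: str, args: list) -> str:
    """Substitute call arguments into a template body."""
    params = {}
    position = 1
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            params[key.strip()] = value.strip()
        else:
            # Unnamed arguments are numbered 1, 2, 3... in order.
            params[str(position)] = arg
            position += 1

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        if default is not None:
            return default
        return match.group(0)  # unfilled parameter is shown literally

    return PARAM.sub(substitute, body)


def transclude(call: str, templates: dict) -> str:
    """Expand the inside of a call such as "foo|value 1|value 2"."""
    name, *args = call.split("|")
    return expand(templates[name.strip()], args)
```

With a hypothetical template body `Hello {{{1}}}, meet {{{2|nobody}}}.` stored under "foo", the call `foo|Alice|Bob` yields `Hello Alice, meet Bob.`, while `foo|Alice` falls back to the default and yields `Hello Alice, meet nobody.`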

Various types of templates go by other names that better reflect their purpose, including infoboxes, navboxes, notices and warnings.

Another name used is “tag”. When a template is used on a page, it creates a link in the database between the page and the template. This means one use of templates is to mark pages that you want to group together for some reason. These grouped pages are then listed at Special:Whatlinkshere/Template:Foo. If you only wanted to use a template for this grouping purpose, you could give it no visible content at all. However, categories usually make more sense for this purpose.
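The grouping effect amounts to an inverted index from each template to the pages that use it. MediaWiki keeps this in its templatelinks table; the sketch below is a hypothetical in-memory stand-in that scans each page's wikitext for template calls and answers the same question as Special:Whatlinkshere/Template:Foo. The page titles and the template name “Infobox suburb” are made up for illustration.

```python
import re
from collections import defaultdict

# Name part of "{{Name}}" or "{{Name|...". Template parameters ({{{1}}})
# and parser functions are not treated specially in this sketch.
TEMPLATE_CALL = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def build_template_index(pages: dict) -> dict:
    """Map each template name to the set of page titles that use it."""
    index = defaultdict(set)
    for title, wikitext in pages.items():
        for name in TEMPLATE_CALL.findall(wikitext):
            index[name.strip()].add(title)
    return index


def what_links_here(index: dict, template: str) -> list:
    """Sorted titles of pages that transclude the given template."""
    return sorted(index.get(template, set()))
```

A template with no visible output, such as a bare `{{todo}}`, still lands its pages in this index, which is exactly why templates can double as tags.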

A history of templates

Templates as we know them today were introduced in August 2004 with MediaWiki v1.3, along with categories and the MonoBook skin, which is still in use. Before this, templates lived in the MediaWiki namespace alongside the “system messages”, or user interface messages. With the move to their own namespace they also gained “parameters”.

The first revision of the Help:Template page on Meta was in June 2004. (I suppose that by this stage Wikimedia sites were already running the latest development version of MediaWiki rather than waiting for the formal release, which typically comes later.) Its opening paragraph now reads as cute:

Templates, or custom messages, have grown from humble beginnings as an afterthought in a localisation feature. They are now used in almost 10% of pages in the English Wikipedia database.

I asked Duesentrieb to run a query like this, and apparently there are 229,686 en.wp main namespace non-redirect pages without templates – a very neat 10%. So from 10% usage to 90% usage in less than four years. Pretty impressive, especially given there is no edict mandating their use.

However, this is actually getting well ahead of ourselves. There is an interesting post from Larry Sanger in May 2001 called Do we need templates ?:

From: “Krzysztof P. Jasiutowicz”
> Do we need templates of pages ?
> Groups of pages – rock bands, biographies, film entries share common
> features and therefore want some kind of templates.
> Pages of the same category edited by different people tend to follow
> sometimes incompatible patterns or disagree with each other.

One of the reasons that Wikipedia works—why it is developing so quickly and is so attractive to contributors (compelling, one might say…) is that anyone can come in and contribute in practically any fashion. Instigating templates has a number of implications for how we might begin to think of Wikipedia: it would become a collection of standardized information rather than a collection of information that people just happen to feel inspired to input. Who is interested in inputting “standardized information”? Maybe some people, but surely not nearly as many as those who are interested in inputting whatever information they know.

Suppose we were to require (somehow) that everyone writing about the countries of the world input the information in exactly the format of the CIA Factbook. Who, honestly, would want to do that? And on the other hand, who would want to contribute a lot of generally accurate, useful information that will eventually add up to weighty, detailed articles, not necessarily all in the same format?

If I finish the quote here we can all enjoy a guffaw about how things have changed. I think his answer to the question “Who is interested in inputting ‘standardized information’?” has been shown to be wrong. Empty edit boxes freak people out. Structured stuff where you just fill out a missing bit here or there is much easier to deal with. (This is also why bots have been so successful in “seeding” wikis. It’s much easier to correct something that’s wrong than to write a correct paragraph from a blank slate.)

However, a fairer quote would include the following, where Larry clearly recognises that “it’s early days yet”:

Eventually, I suspect, we’re going to have huge amounts of information, and it will be possible for people to go in and render related entries in a similar format. It’s generally better to impose order after creation, in a way that reflects the natural categories of things as information is given. […] [I]n a constantly-growing, constantly-improving encyclopedia, why not just let people add whatever information they want, and when it’s reached a certain level of maturity, only then start imposing some uniformity on the way similar information is presented?

And that seems to be more or less what happened. I’m not great at this online ethnography biz, so I don’t have any other choice quotes from 2001 to 2004, although I expect there was further discussion about templates and their appropriateness.

What’s interesting is how far they’ve spread. While first imagined as a kind of article skeleton, templates are now just as widely used on all kinds of talk pages and user pages, and for maintenance and communication tasks.

A taxonomy of templates

There are some broad classes of templates that can be described:

Now into the user realm —

Any other clear classes I missed? (There are a few I can think of which are pretty boring, hence not here.)

Template complexity

This is what you see when you edit the article on the Melbourne suburb of Hawthorn. Note how the template takes up the entire first screen, and it’s not even done! For a newbie it must be pretty bizarre — although frankly this one’s formatted quite well. But if you’re just trying to get into the guts of it (and remember newbies may not know about section editing), it’s quite “WRONG WAY, GO BACK”.

So there is the complexity of templates — and typically these infobox ones — within articles. Maybe one day MediaWiki will get some whiz-bang “template adder” for articles and all that ugly template code won’t appear in the edit box. That would be nice.

Then there is the complexity of editing the templates themselves. This is nothing short of a nightmare. Template syntax is approaching a very ugly programming language, especially if you throw in parser functions. The migration to the new preprocessor (Feb 2008) has exposed deeply nested templates all over the place.

I don’t really see a solution to this, unfortunately. People can’t help themselves “improving” stuff. Here is one way things get complex real fast:

  1. There are two or more templates that display different content in a similar context.
  2. Someone decides to combine them into a single template that takes a parameter saying which content to display. The old templates get deleted or redirected.
  3. Helloooo, complexity.

Repeat this a few different times, at a few different levels, in a few different contexts, and suddenly you’ll find it all very difficult to try and untangle.

Convenience becomes necessity

All templates begin life because someone finds it easier to make a boilerplate and post that, rather than typing out something longer and having to look it up each time.

However, once a template exists, the expectation soon develops that whenever it is applicable it should be used, and the plain text equivalent should not. This happens even if, previously, you could take or leave the plain text.

I don’t know why this happens, but it does — without fail.

Templates in user communication

This is actually the crux of what I intended to write about. :) In my 2007 Wikimania presentation I talked quite a bit about the wording, attitude and intent of the English Wikipedia user talk templates. I complained that the wording was often officious, scolding and impersonal, and they were not likely to encourage people to become part of the community.

In hindsight, maybe I had the wrong idea about them all along. John Broughton says this in Wikipedia:The Missing Manual (my review):

The primary purpose of a warning about vandalism or spam, perhaps counter-intuitively, is not to get the problem editor to change her ways. (It would be nice if they did so, but troublemakers aren’t like [sic] to reform themselves just because someone asked nicely.) Rather, when you and other editors post a series of increasingly strong warnings, you’re building a documented case for blocking a user account from further disruptive editing. If the warning leads to the editor changing his ways before blocking is necessary, great – but don’t hold your breath.

(Yes, the gender did change in the middle of that paragraph. :) Srsly, accept singular they already!)

If this is a widespread attitude, that you have to wait until someone receives a level 4 template before it’s legitimate to block them, then it’s not too surprising that there is so much trouble with “gaming” on en.wp. That IS a game, isn’t it? It’s hard for me not to see that situation as leading to punitive blocks. It’s certainly not leading to preventative ones!

I guess my problem with user warning templates is I have a feeling they don’t work. I have a feeling they don’t improve a situation. I have a feeling they don’t get read — users don’t pay attention to their content.

If there were evidence that anyone read them, learned something from them, or that some situation was averted, that would be nice. [Of course, such evidence would be anecdotal. That’s all we have when it comes to user interactions.]

Image deletion notification templates

When an uploaded file is nominated for deletion or is actually deleted, it is commonly considered courtesy to inform the uploader via a template on their user talk page. If they didn’t receive this, they would have no idea their upload had been deleted until they tried to look at it, which is a pretty nasty surprise. It’s now quite common to visit a user talk page and see a dozen or so notices about missing information on files. Because they are often placed by bots, many can pile up without a human there to notice, “OK, this person seriously doesn’t get this concept, time for a chat”. This is even more true on Commons.

These templates perform two functions: notification + admonishment. They would be better if they were simplified to a single line and only used for notification. Admonishment is something that should be between two humans.

Templates on Commons

There is one benefit to templates on Commons that I cannot ignore, and that is translation. Translated templates mean two users can “communicate” (after a fashion) despite not having any language in common.

Templates are for the benefit of the poster, not the receiver

The benefits are

Just as automated phone answering services are for the benefit of the company, not the caller.

Receiving a form reprimand is patronising. I am not the only one who has this emotional reaction, as the existence of the Wikipedia essay Don’t template the regulars shows.

It follows from this that templates are patronising to newbies too. I guess the only reason this is considered acceptable is that as they’re newbies, they won’t realise this template is a form response. (Well, except for how it’s totally generically worded, yeah.) So, since we’re all equal ‘n all, go ahead and template the regulars.

(So far there is no essay Don’t template the newbies. Instead, treat everyone equally badly. ;))

It would be very valuable to see an in-person observational study of people’s reactions as they learn to edit Wikipedia, including how they react to templates. Maybe the vast majority appreciate the “official” warning as it gives them some direction. Maybe they really do pay attention to them.

Maybe the problem is not the tool, but the way it’s being used. Maybe the only thing to do is take a sharp knife to the language that is used, and help resist the idea of messages as block precipitators, rather than messages as useful informers and educators.

10 March, 2008


Links for 2008-03-04

(Correction: not enabled on test.wikipedia. Try this random testwiki.)


(via cc-au)

04 March, 2008


WMF is hiring

The Wikimedia Foundation is hiring: Software Developer / IT Support.

According to the organisation chart there’s room for one more dev, presumably more experienced than this position.

Yay devs :)

26 January, 2008


Library of Congress & Flickr: that should have been us

Some big news this week is a deal between the Library of Congress and Flickr in something they’re calling The Commons, “The Library of Congress Pilot Project”. LoC says:

We are offering two sets of digitized photos: the 1,600 color images from the Farm Security Administration/Office of War Information and about 1,500 images from the George Grantham Bain News Service. Why these photos? They have long been popular with visitors to the Library; they have no known restrictions on publication or distribution, and they have high resolution scans. We look forward to learning what kinds of tags and comments these images inspire.

This is a great initiative on their part. As a public institution they should be applauded for seeking to make their collections more accessible and more useful. They are indeed a leading example for other cultural institutions to look to and, hopefully, take inspiration from.

It’s also a very smart move on Flickr’s part. It inspires warm fuzzy “public good” feelings, and let’s face it, Flickr does have the best interface for social image management, and tagging is awesome fun.

But when I read this announcement I had a bit of a feeling of being stopped in my tracks. Library of Congress and Flickr? Why wasn’t it Library of Congress & Wikimedia?

Wikimedia Commons users have long recognised the value of the LoC’s collections and there are literally thousands of their images hosted on Commons.

Sharp-eyed Lupo also reminded me of this piece in the Wikipedia Signpost, July 2006:

Wikimedia Foundation representatives met this week with officials from two major institutions regarding the issue of access to archival materials. The United States Library of Congress has expressed interest in including Wikipedia content as part of its archive collection, while also indicating that it could make a sizable amount of its own material available for use on Wikimedia projects. […]

Wikimedia interim executive director Brad Patrick, accompanied by Danny Wool, Kat Walsh, and Gregory Maxwell, met with representatives from the Library of Congress this week to discuss sharing information, sources, and media. The Library, one of the largest and most comprehensive in the world, has offered access to nearly 40 terabytes (approximately 10 million items) of digital information. “That there would be a moment’s hesitation to cooperate fully with the Library of Congress is beyond my comprehension,” said Patrick. “I’m glad that we are moving in this direction.”

Indeed… so what happened in the last eighteen months?

Brad Patrick and Danny Wool have left as staff; Kat Walsh is now on the WMF Board (I’m not sure if she was then), and Danny and Greg are still active within Wikimedia even if not as much as they once were. So not all of the connections from that time have moved on. But whatever they were thinking might happen clearly didn’t happen.

It’s disappointing that we weren’t able to make this happen. More importantly, I hope we will be able to pull our shit together and not miss such opportunities in the future.

There are three aspects:

One is on the organisational side: positioning ourselves as the partner for these kinds of ventures, public-interest and smart at collectively managing huge media sets. I don’t know how we’re doing on that front. It looks like 18 months ago we weren’t so great at following through, but a lot can change in 18 months, and I imagine a lot has.

The second is the software side, where we are not the best prospect. Right now Flickr probably does have a better set-up. I can only repeat my request that WMF hire more software developers and put some priority on functionality relating to media-management. It may take a year or two of serious improvements before we provide anywhere near the kind of usability that Flickr does.

The third is the community side: whether Wikimedians welcome these kinds of ventures. And for once this is actually the easy part. For Wikimedia Commons I feel pretty confident in saying we would rejoice to receive this kind of news.

It is a bit of a kick up the proverbial.

18 January, 2008


Top 10 software extensions Wikimedia Commons needs in 2008


D-I-Y: © Cburnett, GFDL

The end of the year is typically a time for reflection and planning. Planning is much easier than reflection :) so here’s my list of the top ten MediaWiki extensions that Wikimedia Commons (hereafter, just “Commons”) needs.

#0. SUL

Ah, SUL. No acronym brings wry grimaces to the face of a Wikimedian better than this one, and perhaps no issue better demonstrates the consequences of the Wikimedia Foundation’s shoestring budget. Bug 57, Single user login, unified login, CentralAuth — whatever you call it, it should mean that an account created at any Wikimedia wiki allows one to log in at any Wikimedia wiki. Promised since at least 2006, you can currently take part in the testing at test.wikipedia, so there is progress. The full spec is on meta at Help:Unified login.

Why is this relevant to Commons? Because it’s the wiki editors are most likely to use after their “home wiki”. SUL can reasonably be expected to indirectly promote uploading at Commons among Wikimedians, as another barrier to doing so is removed. (Spare a thought especially for the Spanish and Portuguese Wikipedians, who have disabled local uploads entirely.)

#1. Image search

AKA inbuilt Mayflower. Mayflower exists, is open source, and rocks the socks of everyone who uses it. All that’s needed is for some bright spark to specialpage-ize it, and then a little fairy dust to have that as the default search engine/page to be used within Commons.

If you need to be convinced, it’s easy: default MediaWiki search vs Mayflower

#2. Multilingual categories/tagging

Commons is a multilingual project, but since category redirects don’t work as desired, a category only works if everyone uses the same name. The category needs to “work”, so that a visitor can go there and expect to find all the media relevant to that concept. But the redirect/alias bizzo also needs to “work”, so that users can tag/categorise their files in their native language.

The urgency of this task comes from the great shame of a multilingual project having to enforce a single language of description on its users. Seriously uncool.
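One way to make both kinds of “working” happen at once is an alias table that maps localised names onto a single canonical category, so a tag in any language lands in the same place. This is a minimal sketch; the ALIASES table and its entries are hypothetical, and a real feature would need the table to be editable on-wiki.

```python
# Hypothetical alias table: localised category name -> canonical category.
ALIASES = {
    "Fahrräder": "Bicycles",
    "Bicicletas": "Bicycles",
    "Vélos": "Bicycles",
}


def canonical_category(name: str, aliases: dict = ALIASES) -> str:
    """Resolve an alias; names without an alias are already canonical."""
    return aliases.get(name, name)


def categorise(tags: list, aliases: dict = ALIASES) -> list:
    """Canonical categories for one file's tags, de-duplicated and sorted."""
    return sorted({canonical_category(tag, aliases) for tag in tags})


def category_members(files: dict, category: str, aliases: dict = ALIASES) -> list:
    """Every file whose tags resolve to the given canonical category."""
    return sorted(name for name, tags in files.items() if category in categorise(tags, aliases))
```

A German uploader tagging “Fahrräder” and a Spanish uploader tagging “Bicicletas” both end up in Bicycles, so a visitor to that one category sees everything.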

#3. Rating system

Someone did contact me about making progress in coding this up, but I haven’t heard a progress report lately, so it’s definitely something I need to follow up. As Commons grows, any given query may turn up dozens or even hundreds of relevant files. So having a rank-by-quality or rank-by-average-rating option in the search engine could make a dramatic improvement to the search results.

People love rating stuff, so hey, free data on quality. Off the top of my head I can’t think of any other image database that has a rank-by-rating option but I would be pretty surprised if no one had done it yet.
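A plain average would put a file with a single five-star vote level with one that has dozens, so a rank-by-rating option needs some damping. The sketch below uses a Bayesian (damped) average, which pulls files with few ratings towards a prior mean. That is one common choice, not anything Commons has specified, and the values of prior_mean and prior_weight are arbitrary assumptions.

```python
def damped_average(ratings: list, prior_mean: float = 3.0, prior_weight: int = 5) -> float:
    """Bayesian-style average: behaves as if every file already had
    prior_weight extra ratings equal to prior_mean."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


def rank_by_rating(ratings_by_file: dict) -> list:
    """File names ordered best first by damped average."""
    return sorted(
        ratings_by_file,
        key=lambda name: damped_average(ratings_by_file[name]),
        reverse=True,
    )
```

With these defaults, a file with one 5-star rating scores (3×5 + 5)/(5 + 1) ≈ 3.33, while a file with ten 5-star ratings scores (15 + 50)/(5 + 10) ≈ 4.33, so the well-tested favourite ranks first.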

#4. SVG editing as text
#5. SVG display – pick language labels
#6. SVG display – animated SVGs

These three are naturally related, and arise from the project that’s currently occupying my thoughts (if not my time). SVGs are like the wiki version of an image, as I recently said, because they are so easy to edit. You can open an SVG in a text editor, twiddle with it and save it, and you’ve got a brand new SVG.

But that’s kind of annoying if MediaWiki wants you to download the file first and then re-upload it. Instead, it would make more sense to be able to edit an SVG just like a wiki page, and have the edits recorded in the image history just like text page revisions. It would be a little bit tricky, because you would still want to retain the ability to upload a new version of the file, but it is surely doable.
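Because an SVG is plain text, revision diffs come almost for free. The sketch below is a hypothetical SvgPage class, not an existing MediaWiki feature. It keeps SVG revisions as a list, refuses edits that are not well-formed XML, and uses Python's difflib to produce the same kind of line diff a text page history shows.

```python
import difflib
import xml.etree.ElementTree as ET


class SvgPage:
    """Hypothetical SVG file page whose source is edited like wikitext."""

    def __init__(self, initial_svg: str):
        ET.fromstring(initial_svg)  # raises ET.ParseError if not well-formed
        self.revisions = [initial_svg]

    def edit(self, new_svg: str) -> int:
        """Save a new revision and return its index."""
        ET.fromstring(new_svg)  # reject broken XML before saving
        self.revisions.append(new_svg)
        return len(self.revisions) - 1

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two revisions, like a page history diff."""
        return "".join(
            difflib.unified_diff(
                self.revisions[old].splitlines(keepends=True),
                self.revisions[new].splitlines(keepends=True),
                fromfile=f"revision {old}",
                tofile=f"revision {new}",
            )
        )
```

Relabelling “Saddle” as “Seat” in a diagram then shows up as a one-line change in the history, instead of as an opaque new upload.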

From there, it should only be a small hop-step-and-a-leap to a special page extension that allowed one to easily translate text labels inside SVG diagrams.

Take for example this diagram of a bicycle. There are currently five different files: Bicycle diagram-en.svg, Bicycle diagram-es.svg, Bicycle diagram-fi.svg, Bicycle diagram2-fr.svg, Bicycle diagram-pl.svg. But there is no need for five different files. Instead, it would be better to keep all the labels within a single document and extend the image syntax to allow something like this:

[[Image:Bicycle diagram.svg|thumb|language=en]]

So the main part of this request is for the image syntax extension. A tool could be hacked up fairly easily for easy label-translating on the toolserver, I think, although of course it would be preferable within MediaWiki natively.
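SVG itself already has a mechanism for this. A `<switch>` element renders its first child whose `systemLanguage` condition matches the viewer's language, and a child with no `systemLanguage` acts as the fallback. So a `language=` parameter could, in principle, be honoured by pre-selecting one branch of each `<switch>` before rendering. The sketch below does that with ElementTree; `select_language` and `labels` are hypothetical helper names, and the matching rule follows the SVG 1.1 description (an exact language tag, or a prefix followed by “-”).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def matches(element, lang: str) -> bool:
    """SVG 1.1 style systemLanguage test for a single requested language."""
    attr = element.get("systemLanguage")
    if attr is None:
        return True  # no condition: always eligible (the fallback)
    lang = lang.lower()
    tags = [t.strip().lower() for t in attr.split(",")]
    return any(t == lang or t.startswith(lang + "-") for t in tags)


def select_language(svg_text: str, lang: str) -> ET.Element:
    """Keep only the branch of each <switch> that would render for lang."""
    root = ET.fromstring(svg_text)
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        chosen = next((child for child in switch if matches(child, lang)), None)
        for child in list(switch):
            if child is not chosen:
                switch.remove(child)
    return root


def labels(root: ET.Element) -> list:
    """Text of every <text> element left in the document."""
    return [el.text for el in root.iter(f"{{{SVG_NS}}}text")]
```

With each bicycle label wrapped in a `<switch>` holding, say, Spanish, Finnish and fallback English children, `language=fi` would keep only the Finnish branch. The pruned tree could then be serialised with `ET.tostring` and rasterised as usual, so one file could serve all five languages.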

Lastly, animated SVGs! They’re possible, although admittedly I’ve only ever seen one in existence. GIFs are just so crappy. :( Would be awesome to be bleeding edge on this one.

#7. Gallery preview

Gallery preview exists as JavaScript (and you can install it on Commons now via Special:Preferences > Gadgets), but it would be great to have as a default behaviour. It’s just so nifty! And it encourages browsing around more than the category links at the bottom of the page, I think.

#8. InstantCommons

The idea of InstantCommons is to let any MediaWiki wiki use Commons media as easily and transparently as the Wikimedia wikis do, that is, as if the media were uploaded locally. Such a feature would be of immediate interest to Wikitravel and Wikia, both non-Wikimedia projects, and would be a huge leap forward in Commons’ success at sharing free content.

Current status is unknown, but there’s some code in SVN.

#9. Native CheckUsage

As a consequence of #8, this one becomes much more pressing. CheckUsage exists on the toolserver and tells you in which projects a Commons image is being used. Indirectly that tells you how many people you’re likely to piss off if you delete the image without delinking it first.

This is basic, necessary functionality for the Commons community. Not having it is like being able to block users but not unblock them. We have the ability to do the damage, but we also need the ability to survey the scene and minimise it. So it is an uncomfortable situation that we rely on half-hacked-up tools for such a critical task, and it would be even more so if InstantCommons were enabled.
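At its core, CheckUsage answers one query: which pages, on which wikis, embed this file? The sketch below assumes the per-wiki usage data has already been gathered somehow; `check_usage`, `safe_to_delete` and the nested-dictionary data layout are all hypothetical.

```python
def check_usage(filename: str, usage_by_wiki: dict) -> dict:
    """Return {wiki: [pages]} for every page, on any wiki, that embeds filename.

    usage_by_wiki maps a wiki name to {page title: set of file names used}.
    """
    result = {}
    for wiki, pages in sorted(usage_by_wiki.items()):
        using = sorted(title for title, files in pages.items() if filename in files)
        if using:
            result[wiki] = using
    return result


def safe_to_delete(filename: str, usage_by_wiki: dict) -> bool:
    """True only if no page anywhere would be left with a broken image."""
    return not check_usage(filename, usage_by_wiki)
```

With InstantCommons, `usage_by_wiki` would have to include third-party wikis as well, which is exactly why doing this natively, rather than with a side tool, starts to matter.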

#10. ImportFreeImages

This one’s a gimme. The extension exists, it’s already had a decent workout on Wikia, all that’s needed is some code review and a switch-flick.

ImportFreeImages allows the user to search Flickr and transfer an image locally all within MediaWiki. Because Flickr enables Creative Commons licensing, it is a major source of freely-licensed media. But there are two problems. One is that Flickr also allows non-free licensing, so we have major headaches in teaching people the fine distinction between tiny icons. The second is just that it’s annoying to have to manually save the image locally, upload it again, make sure you copy all the relevant author info and so on, and asking people to do that leaves a lot of room for mistakes.

So ImportFreeImages solves all those problems, and because you can restrict it to showing only images under certain licenses, it solves the licensing confusion as well. It acts as a filter on Flickr, and just makes the whole thing a breeze for the user. So: awesome.
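The filtering step is the heart of the idea. The sketch below is hypothetical and uses neither the real ImportFreeImages code nor the Flickr API. `FlickrPhoto` stands in for a search result, `FREE_LICENSES` is an illustrative allowlist, and `import_record` copies the author and license details at transfer time so nobody has to re-type them.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative allowlist; the real set would follow Commons licensing policy.
FREE_LICENSES = {"CC BY 2.0", "CC BY-SA 2.0"}


@dataclass
class FlickrPhoto:
    """Stand-in for one Flickr search result."""
    photo_id: str
    title: str
    author: str
    license: str


def importable(results: list, allowed: set = FREE_LICENSES) -> list:
    """Hide every result whose license is not on the allowlist."""
    return [photo for photo in results if photo.license in allowed]


def import_record(photo: FlickrPhoto, retrieved: datetime, allowed: set = FREE_LICENSES) -> dict:
    """Description data copied automatically at upload time.

    The license is stored with a timestamp because the license shown on
    Flickr later may differ from the one in force at transfer.
    """
    if photo.license not in allowed:
        raise ValueError(f"{photo.license} is not an accepted license")
    return {
        "source": f"Flickr photo {photo.photo_id}",
        "author": photo.author,
        "license": photo.license,
        "license_checked": retrieved.isoformat(),
    }
```

Because non-free results never reach the user, the “fine distinction between tiny icons” problem simply disappears from the upload path.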

There are at least 55,000 images from Flickr in Commons at the moment. (Around 2.5% of the total.) It’s common enough, and causes enough confusion, that the community has built a plethora of tools to try and make it easier:

(The main reason Commons instituted the flickrreview system was because Flickr lets people change their licenses without any kind of historical display, which is seriously uncool, as far as trying to figure out if a stated license was ever valid goes.)

If we had ImportFreeImages, we could more or less forbid people from manually uploading Flickr images, and goodbye Flickr hassle!

So that’s my list. There are other things that I want, such as structured data, but I don’t really see it as likely to happen by the end of 2008. 2010, maybe. These ones all seem within reach (OK, with the exception of #2, but you have to dream big, right?). If there’s anything you think I missed, drop me a note and let’s hear it.

20 December, 2007

