A brief presentation I gave for Melhack last week:
There is a huge amount of rich data in Wikipedia and other MediaWiki collections, naturally, but as there is no API evangelist you have to do a bit of digging to figure out how to get at it. Regular readers may recall that I am quite a fan of the API and what it means for reusers.
MediaWiki derives its structure from links, templates and categories. You don’t need to do very much to develop something quite powerful. The site culturalcartography.net, for example, uses only links. They skinned their MediaWiki install and used it to develop the bulk of the site’s content, then wrote their own SVG interface that calls their MediaWiki API and presents the link web in a dynamic and interesting way.
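Everything an interface like that needs is one api.php query per page. A sketch of building such a query for a page’s links (the endpoint and page title here are just placeholders):

```python
from urllib.parse import urlencode

def links_query_url(api="https://en.wikipedia.org/w/api.php", title="Melbourne"):
    """Build an api.php URL asking for the internal links on a page."""
    params = {
        "action": "query",
        "prop": "links",      # list the page's wikilinks
        "titles": title,
        "pllimit": "500",     # links per request
        "format": "json",
    }
    return api + "?" + urlencode(params)

print(links_query_url())
```

Fetch that URL and you get back a machine-readable link web, ready to feed into an SVG front-end or whatever else you dream up.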
Using Bryan’s quite excellent mwclient Python module, you can log in, view a page’s current content, make your change and then view it again in fewer than 10 lines. Really: see for yourself. (My password is not ‘password’ BTW. :)) In theory it could be one line shorter, since you shouldn’t need to log in, but for some reason mwclient raised a LoginError when I tried without logging in.
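Roughly what those ten lines look like (a sketch only: the account, page title and wiki are illustrative, and mwclient’s interface has changed between versions; in the version I used, page.edit() returned the current wikitext):

```python
def main():
    # mwclient is a third-party module (easy_install mwclient)
    import mwclient

    site = mwclient.Site('test.wikipedia.org')
    site.login('MyUser', 'not-password')  # skipping this gave me a LoginError
    page = site.Pages['Sandbox']          # hypothetical page
    text = page.edit()                    # view the current wikitext
    page.save(text + '\n\ntest edit', summary='testing mwclient')
    print(page.edit())                    # view the page with your change

if __name__ == '__main__':
    main()
```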
There are probably also nice wrapper modules already written for your favourite language.
What kinds of imaginative interfaces for editing Wikipedia can you dream up? Now we can build them! :)
I decided to write a script to convert Wikipedia’s main page Did you know? (DYK) updates into identi.ca-friendly messages. The result is enwpdidyouknow. If you use identi.ca, you can subscribe and receive regular DYK goodness. If not, you can still subscribe to the RSS feed, although it will seem pretty weird as it is broken into messages of fewer than 160 characters. I should make it also produce a regular Atom or RSS feed without the message-length limitation.
It’s been running for 24 hours now and it seems to be working OK. It updates in batches because that’s how Template:did_you_know is updated (except by humans). When a message has to be broken in two, it posts the parts virtually together, but it always leaves a two-minute gap between different messages, to limit flooding a little.
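The message-splitting half is straightforward; a naive sketch (not the actual enwpdidyouknow code) that breaks on spaces and marks continuations:

```python
def split_message(text, limit=160, marker=" …"):
    """Split text into pieces of at most `limit` characters, breaking on
    spaces, with a continuation marker on every piece but the last."""
    pieces, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) + len(marker) > limit and current:
            pieces.append(current + marker)  # this piece is full; continue in the next
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces
```

A hook short enough to fit goes out as a single message; longer ones come back as a list of ≤160-character chunks to post virtually together.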
I put some info, including my source code, here: http://dyk2identica.modernthings.org/. It’s really rough and ready. No one will be too surprised to hear that by far the hardest bit was figuring out how to correctly parse the wikisyntax. :)
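For the curious, the core of that problem is reducing markup like [[target|label]] to its display text. A toy version of the idea (nothing like a complete parser, and not my actual code):

```python
import re

def strip_wikisyntax(text):
    """Reduce a fragment of wikitext to plain text: wikilinks become
    their display text, bold/italic quote runs vanish."""
    text = re.sub(r"\[\[([^|\]]*)\|([^\]]*)\]\]", r"\2", text)  # [[a|b]] -> b
    text = re.sub(r"\[\[([^\]]*)\]\]", r"\1", text)             # [[a]] -> a
    text = re.sub(r"'{2,}", "", text)                           # ''i'', '''b'''
    return text
```

Templates, references, and nested markup are where it stops being this tidy.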
I should probably move it all to the toolserver. I haven’t decided on a license yet. Suggestions welcome.
Wow. Wikis just gave me another lesson in awesome. I love it.
While thinking about the problem of Zemanta attribution strings, I mused that we really needed to develop a “Commons API”. There is a MediaWiki API for Commons, but there are more project-specific pieces of information we would like to provide. The big three are:
- Deletion markers (warning to machines: don’t use this file)
- License info for a given file
- Author/copyright holder attribution string for a given file
So I made a bit of a start at Commons:API, thinking we could use the wiki pages to write pseudocode algorithms for the different problems. I already knew that at least Bryan, Duesentrieb and I had run across these problems before, and certainly others had too. It therefore made sense to combine our individual algorithms, define a single strongest-possible version, and recommend it to others. I imagined we could describe each algorithm in pseudocode and let people provide implementations in various programming languages. Versioning would kind of suck, but hey, an imperfect solution is better than nothing.
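The deletion-markers check, for instance, could be as simple as matching a file’s transcluded templates against an agreed list. A sketch (the marker names and function here are hypothetical; the real list is exactly the kind of thing the Commons:API wiki pages would maintain):

```python
# Hypothetical marker templates, for illustration only.
DELETION_MARKERS = {
    "Template:Delete",
    "Template:Copyvio",
    "Template:No source since",
    "Template:No license since",
}

def is_flagged_for_deletion(templates):
    """Given the templates transcluded on a file description page
    (e.g. from an api.php prop=templates query), say whether
    machines should avoid using the file."""
    return any(t in DELETION_MARKERS for t in templates)
```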
However, a perfect solution is even better than both! I had barely raised the topic when Magnus actually implemented it (warning: seriously alpha).
First, Magnus is one of my wiki-heroes. You could not ask for a more responsive developer, so it is just delightful when he chimes in on a “what if” discussion. Cool new shiny things are never far away. (Surely one of the strangest things to ever grace the toolserver is still Flommons, the “Flickr-like Commons” interface. Cut away the cruft!) And he is a lovely chap to boot. He tirelessly tweaks and prods in response to any number of “what about…” or “why not move this here?” queries.
My Python-fu is not strong enough to code something like this up in half an hour, as he does, but with some practice and effort I could probably manage it in time. I recognise the neat-or-nifty factor in creating stuff that was previously just a “what if”. Programming rocks.
Secondly, I love how responsive a wiki community can be. Sure, for every five ideas you might have, four will garner a lukewarm response at best, but every now and then one will strike a chord and get some momentum. “Build it and they will come”; wikis can also obey “name it and they will build it”. [Of course, I’m hardly the first person to suggest Commons needs an API.]
Thirdly, thinking about the other Wikimedia projects — and indeed a good many third-party MediaWiki installs — it is obvious that each project might like the chance to define its own API. If nothing else, to define its “deletion markers” and its featured content (rather like another of Magnus’ tools, Catfood, a category image RSS feed).
So what does that suggest? That wiki users need a flexible method of defining new parts of the API. Special:DefineAPI? Probably not: too subject to change.
Extensions can define API modules. So perhaps we should develop Extension:WikimediaCommonsAPI? If every project wanted to do this it might get a bit messy, but I imagine most projects wouldn’t bother.
Again we run up against the need for Commons to have a more structured database, rather than just store all information about an image in a big text blob.
At any rate, I hope we can set the current “pre-alpha” API up as a serious toolserver or svn project with multiple contributors. Wikimedia Commons is lucky to have attracted a relatively techy community of contributors, with a number of people MediaWiki- or toolserver- savvy. Let’s see how we go.
- The FLOSS Posse is soon running a Wikiversity course, Composing free and open online educational resources. The course is for “teachers and teacher-students who do not have prior knowledge or skills related to free and open education resources.” Participants will have to write blog posts as part of the course, and their posts will be corralled together at the jaiku channel #oercourse (you don’t need a jaiku account to subscribe to the feed).
- “action=edit” has been added to the MediaWiki API! At the moment it’s only enabled on the test wikipedia, so bot and tool authors should get busy giving it a workout. Very exciting! (via wikitech-l)
(Correction: not enabled on test.wikipedia. try this random testwiki.)
- This is brill: become a Public domain donor!
Two nights ago I went to the first Freebase user meeting outside the US. (You can tell I’m setting myself up for a, “I was there when…”)
So, what is Freebase? It claims to be a “database of everything”. There are several points of comparison with Wikipedia. Where Wikipedia is an “encyclopedia”, Freebase wants to be “everything”. It is far more structured than Wikipedia (which anyone who’s ever wrangled with an esoteric template might appreciate). Like Wikipedia, it’s a free content project: data derived from Wikipedia is GFDL (natch) and everything else is CC-BY. They have an excellent, well-documented API — they’re not afraid to share. Bring on the mash-ups!
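For a flavour of that API: reads go through MQL, a JSON query-by-example language where empty values mark the fields you want filled in. A sketch of the kind of query envelope it takes (I haven’t run this against the live service; the artist example follows their documentation):

```python
import json

# Ask for all albums by an artist: "album": [] means "fill this list in".
query = {
    "type": "/music/artist",
    "name": "The Police",
    "album": [],
}

# Queries are wrapped in an envelope and sent to the mqlread service.
payload = json.dumps({"query": query})
print(payload)
```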
There are several more differences worth discussing. Currently, Freebase is alpha and invitation-only for write permission (i.e. an account). No worries, give it time.
More importantly, the back-end. Freebase is built on Metaweb’s closed-source back-end, which is going to remain that way. Apparently they intend to release some kind of regular data dump, and allegedly would even have no problem with someone taking the entire data set, throwing it into MySQL or what-have-you, and setting up a total project fork.
If it was free software, there would be a right to fork. But this is only free content. Is there any kind of corresponding “right to fork” for a free content community? Should there be?
If not, maybe this joke from Evan about “crowdsourcing” is simply the truth:
The other reason that I would wait until I had an entire data dump downloaded on my own disk before really barracking for Freebase is because I read their TOS:
5. API USE
We provide access to portions of the Site and Service through an API; for purposes of this Terms of Service, such access constitutes use of the Site and Service. You agree only to use the API as outlined in documentation provided by us on the Site. You may not use the API or any other features of the Site or Service to duplicate or copy the Site or Service.
Bummer. Although — here’s a thought — I wonder if that conflicts with the CC-BY?
(clause 8.e from CC-BY-3.0)
This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.
It’s not quite viral freedom, but almost as good. It seems to me this nice clause would render their TOS impotent.
So, interesting to see what will happen there. It’s Wiki[p|m]edia that convinced me (and taught me) about the absolutely vital right to fork. That is an incredible freedom which is vastly underappreciated by the journalists who are generally impressed with Wikipedia’s “freeness” (meaning no ads, or free access). And as a project leader, any kind of project, that is what keeps you on your toes. Maybe it is a good benchmark for deciding if you want to be a contributor to a particular project. If management gets too heavy, you can keep them in line by threatening to exercise your right to fork. Yeah!
Back to Freebase… another related, interesting aspect will be watching the development of their community and how it will be managed. Where Wikipedia was pretty grass-roots, it seems like Freebase is top-heavy, for the moment at least. Letting go, giving up control and trusting the unwashed masses is a very difficult psychological moment for anyone (who’s not a Wikimedian). Trying to get those same unwashed masses to behave themselves is a whole other kettle of fish. When I first contemplated this for Freebase two nights ago I was filled with cynicism, until I remembered… The thing about Wikipedia is that it only works in practice. In theory, it can never work.
I should make that my mantra. Every time I get cynical about something, I’ll think about that idea again. It only works in practice.