
Wiki[mp]edia data sources & the MediaWiki API

A brief presentation I gave for Melhack last week:

[Slides: Wiki[mp]edia data sources & the MediaWiki API, by Brianna Laugher]

I wrote a bit on my techiturn blog about what I worked on in my 24-hour hack.

There is a huge amount of rich data in Wikipedia and other MediaWiki collections, naturally, but as there is no API evangelist you have to do a bit of digging to figure this out. Regular readers may recall that I am quite a fan of the API and what it means for reusers.
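Here is a minimal sketch of where that digging starts. Every MediaWiki install exposes an api.php endpoint, and read requests are plain GET URLs. The Python below uses only the standard library and builds (but does not fetch) a query for a page's current wikitext. The English Wikipedia endpoint and the helper name build_query_url are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's endpoint; any MediaWiki install has its own api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(api_url, **params):
    """Build a GET URL for a read-only action=query request.

    format=json is always added so the response is machine-readable;
    callers pass module-specific parameters (prop, list, titles, ...).
    """
    query = {"action": "query", "format": "json", **params}
    return api_url + "?" + urlencode(query)


# Ask for the current wikitext of one page (prop=revisions with rvprop=content).
url = build_query_url(API_URL, prop="revisions", rvprop="content", titles="Main Page")
print(url)
```

The same helper covers other read modules, for example list=categorymembers, by passing different keyword arguments.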

13 November, 2009

The prettiest MediaWiki you've ever seen

MediaWiki derives its structure from links, templates and categories. You don’t need to do very much to develop something quite powerful. The site culturalcartography.net, for example, uses only links. They skinned their MediaWiki, used it to develop the bulk of the site’s content, then wrote their own SVG interface that calls their MediaWiki API and presents the link web in a dynamic and interesting way.

I learned about this from a paper, Building an SVG interface to MediaWiki (full paper), presented at SVG Open, which is currently under way in Nuremberg.

Pretty skin for MediaWiki.

The edit box: see, it’s really MediaWiki underneath!

The custom-built SVG interface view that calls on the MediaWiki page of the same name, showing the link web between this page and others in the same wiki.
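For a rough idea of how a front end like this could get its link web: the API's prop=links module lists the links on a page. This sketch assumes the default JSON layout (pages keyed by page ID under query.pages) and turns a response into a mapping from each page title to the titles it links to. The sample response is made up. A real client would also need to follow the API's continue parameter for pages with many links. This is not the culturalcartography.net code.

```python
import json


def link_web(response):
    """Map each page title in a prop=links response to the titles it links to.

    Assumes the default (formatversion=1) JSON layout, where pages are keyed
    by page ID under query.pages. Continuation is not handled here.
    """
    web = {}
    for page in response.get("query", {}).get("pages", {}).values():
        web[page["title"]] = [link["title"] for link in page.get("links", [])]
    return web


# Made-up response trimmed to the fields used above; a real one would come
# from action=query&prop=links&titles=...&pllimit=max&format=json.
sample = json.loads("""
{"query": {"pages": {"42": {"pageid": 42, "ns": 0, "title": "Map",
  "links": [{"ns": 0, "title": "Atlas"}, {"ns": 0, "title": "Globe"}]}}}}
""")
print(link_web(sample))
```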

28 August, 2008

Write API enabled on Wikimedia sites!

Brion announced that MediaWiki’s ‘write’ API has been enabled for Wikimedia wikis. This means you can now edit Wikipedia and her friends without opening your browser. :)

Using Bryan’s quite excellent mwclient Python module, you can log in, view a page’s current content, make your change and then view it in fewer than 10 lines. Really: see for yourself. (My password is not ‘password’, BTW. :)) In theory it could be one line shorter, since you shouldn’t need to log in, but for some reason mwclient gave me a LoginError when I tried it without logging in.

Check it!

There are probably also nice wrapper modules already written for your favourite language.
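Whatever wrapper you use, an edit boils down to a POST to api.php with action=edit, plus an edit token obtained on the same logged-in session. Below is a minimal sketch of building that request body. It assumes today's token flow (action=query&meta=tokens), which may differ from what the API expected in 2008. The helper name and all values are hypothetical. mwclient takes care of login, tokens and the POST itself.

```python
from urllib.parse import urlencode


def build_edit_body(title, text, summary, token):
    """Form-encoded POST body for an action=edit request.

    The token goes last, as the API:Edit documentation advises, so that a body
    truncated in transit is rejected instead of saving partial text.
    """
    params = [
        ("action", "edit"),
        ("format", "json"),
        ("title", title),
        ("text", text),
        ("summary", summary),
        ("token", token),
    ]
    return urlencode(params)


# Hypothetical values: a real CSRF token comes from action=query&meta=tokens
# on the same logged-in session (cookies), and the body is POSTed to api.php.
body = build_edit_body("User:Example/sandbox", "Hello from the API!", "API test", "abc123+\\")
print(body)
```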

What kind of imaginative interfaces for editing Wikipedia can you think of? Now we can build them! :)

26 August, 2008

WP:DYK + identi.ca -> enwpdidyouknow

I decided to write a script to convert Wikipedia’s main page Did you know? (DYK) updates into identi.ca-friendly messages. The result is enwpdidyouknow. If you use identi.ca, you can subscribe and receive regular DYK goodness. If not, you can still subscribe to the RSS feed, although it will seem pretty weird, as it is broken into messages of fewer than 160 characters. I should also make it produce a regular Atom or RSS feed without the message length limitation.

It has been running for 24 hours now and seems to be working OK. It updates in batches because that’s how Template:did_you_know is updated (except by humans). When a message has to be broken into two, it posts them virtually together, but it always leaves a two-minute gap between different messages, to limit flooding a little.
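The splitting and pacing described above might look something like the sketch below. It is not the actual enwpdidyouknow source. The sketch splits at word boundaries with Python's textwrap, treats "fewer than 160 characters" as "at most 159", and uses the two-minute gap from the post. The hooks are placeholders.

```python
import textwrap

LIMIT = 159        # the post says "fewer than 160 characters"
GAP_SECONDS = 120  # two-minute gap between different items


def split_message(text, limit=LIMIT):
    """Split one item into chunks of at most `limit` characters at word boundaries."""
    return textwrap.wrap(text, width=limit)


def schedule(items, gap=GAP_SECONDS):
    """Return (offset_in_seconds, chunk) pairs.

    Chunks of the same item share an offset so they go out back to back;
    each new item starts `gap` seconds after the previous one.
    """
    plan = []
    for index, item in enumerate(items):
        for chunk in split_message(item):
            plan.append((index * gap, chunk))
    return plan


# Placeholder hooks, not real DYK items.
hooks = [
    "... that " + "word " * 50 + "?",  # about 260 characters: needs two messages
    "... that a short hook fits in one message?",
]
plan = schedule(hooks)
for offset, chunk in plan:
    print(f"+{offset}s ({len(chunk)} chars) {chunk[:30]}")
```

Giving both parts of a split item the same offset mirrors the "posts them virtually together" behaviour. The fixed gap only applies between items.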

I put some info, including my source code, here: http://dyk2identica.modernthings.org/. It’s really rough and ready. No one will be too surprised to hear that by far the hardest bit was figuring out how to correctly parse the wikisyntax. :)
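To give a flavour of the parsing problem, here is a deliberately rough sketch that flattens a DYK-style hook to plain text using regular expressions. It handles only plain and piped links and bold/italic quote runs, and it simply deletes templates, including nested ones. This is not how the script described here works. Real wikitext, with file links, HTML tags and references, will defeat it quickly.

```python
import re

TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                     # innermost {{...}}
LINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")   # [[Target]] or [[Target|label]]
EMPHASIS = re.compile(r"'{2,}")                              # '' italic, ''' bold


def wikitext_to_plain(text):
    """Very rough wikitext-to-plain-text conversion for short one-line hooks."""
    # Delete templates from the inside out, so nested ones disappear too.
    while True:
        text, count = TEMPLATE.subn("", text)
        if count == 0:
            break
    # Keep a link's label if it has one, otherwise its target.
    text = LINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = EMPHASIS.sub("", text)
    # Collapse the whitespace left behind by deleted templates.
    return " ".join(text.split())


hook = "... that '''[[Foo Bar|Foo]]''' was the first {{citation needed}} [[widget]]?"
print(wikitext_to_plain(hook))
```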

I should probably move it all to the toolserver. I haven’t figured out what license to release it under yet. Suggestions welcome.

Wikipedia + MediaWiki API + mwclient + enwp.org service + identica API = new article fun :)

18 August, 2008

APIs: Ask, and ye shall receive

Wow. Wikis just gave me another lesson in awesome. I love it.

While thinking about the problem of Zemanta attribution strings, I mused that we really needed to develop a “Commons API”. There is a MediaWiki API for Commons, but there are more project-specific pieces of information we would like to provide. The big three are:

  1. Deletion markers (warning to machines: don’t use this file)
  2. License info for a given file
  3. Author/copyright holder attribution string for a given file

So I made a bit of a start at Commons:API, thinking we could use the wiki pages to write pseudocode algorithms for the different problems. I already knew that at least Bryan, Duesentrieb and I had run into these problems before, and certainly others had too. So it made sense to combine our individual algorithms into a single, strongest-possible-permutation version and recommend that others use it. I imagined we could describe the algorithm in pseudocode and let people provide implementations in various programming languages. Versioning would kind of suck, but hey, an imperfect solution is better than nothing.
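As an illustration of what one implementation of the first two checks might look like, here is a Python function that works on a page object from a query using prop=templates|categories. The template and category names are assumptions chosen for illustration, not an agreed Commons list. Settling that list is exactly what the Commons:API page was for. The attribution string (item 3) is left out, because it usually lives in free-form text on the file description page.

```python
# Assumption: an illustrative, incomplete set of deletion templates.
DELETION_TEMPLATES = {
    "Template:Delete",
    "Template:Speedydelete",
    "Template:Copyvio",
    "Template:No license",
}

# Hypothetical mapping from license categories to short license names.
LICENSE_CATEGORIES = {
    "Category:CC-BY-3.0": "CC-BY-3.0",
    "Category:CC-BY-SA-3.0": "CC-BY-SA-3.0",
    "Category:GFDL": "GFDL",
}


def titles(entries):
    """Collect the 'title' fields from a prop=templates or prop=categories list."""
    return {entry["title"] for entry in entries}


def check_file(page):
    """Summarise one page object from action=query&prop=templates|categories."""
    templates = titles(page.get("templates", []))
    categories = titles(page.get("categories", []))
    return {
        "deletion_marked": bool(templates & DELETION_TEMPLATES),
        "licenses": sorted(LICENSE_CATEGORIES[c] for c in categories if c in LICENSE_CATEGORIES),
    }


# Made-up page object in the shape the API returns under query.pages.
page = {
    "title": "File:Example.jpg",
    "templates": [{"ns": 10, "title": "Template:Information"},
                  {"ns": 10, "title": "Template:Delete"}],
    "categories": [{"ns": 14, "title": "Category:GFDL"},
                   {"ns": 14, "title": "Category:CC-BY-SA-3.0"}],
}
print(check_file(page))
```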

However, a perfect solution is even better than both! I barely raised the topic when Magnus actually implemented it (warning: seriously alpha).

First, Magnus is one of my wiki-heroes. You could not ask for a more responsive developer, so it is just delightful when he chimes in on a “what if” discussion. Cool new shiny things are never far away. (Surely one of the strangest things ever to grace the toolserver is still Flommons, the “Flickr-like Commons” interface. Cut away the cruft!) And he is a lovely chap to boot, tirelessly tweaking and prodding in response to any number of “what about…” or “why not move this here?” queries.

My pythonfu is not strong enough for me to code something like this up in half an hour, as he does, but with practice and some effort I could probably manage it eventually. I recognise the neat, nifty factor in creating stuff that was previously just a “what if”. Programming rocks.

Secondly, I love how responsive a wiki community can be. Sure, for every five ideas you might have, four will garner a lukewarm response at best, but every now and then one will strike a chord and get some momentum. “Build it and they will come”; wikis can also obey “name it and they will build it”. [Of course, I’m hardly the first person to suggest Commons needs an API.]

Thirdly, thinking about the other Wikimedia projects — and indeed a good many third-party MediaWiki installs — it is obvious that all the projects may like the chance to define their own API. If nothing else, to define the “deletion markers” and the featured content (rather like another of Magnus’ tools, Catfood – category image RSS feed).
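A category feed in the spirit of Catfood can be sketched with list=categorymembers (cmtype=file, newest first via cmsort=timestamp and cmdir=desc) plus a few lines of RSS 2.0. This is a hypothetical illustration, not how Catfood is built. The Commons endpoint is assumed, and the rss_for function takes file titles directly instead of fetching them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Assumption: the Wikimedia Commons endpoint and its /wiki/ article path.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def category_files_url(category, limit=20):
    """URL listing the newest files in a category via list=categorymembers."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",       # files only, no pages or subcategories
        "cmsort": "timestamp",  # sort by when each file was added to the category
        "cmdir": "desc",        # newest first
        "cmlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def page_url(title):
    """Article URL for a title, with spaces turned into underscores."""
    return COMMONS_WIKI + quote(title.replace(" ", "_"), safe=":/")


def rss_for(category, file_titles):
    """Render a minimal RSS 2.0 document with one item per file title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = category
    ET.SubElement(channel, "link").text = page_url(category)
    ET.SubElement(channel, "description").text = "Newest files in " + category
    for title in file_titles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = page_url(title)
    return ET.tostring(rss, encoding="unicode")


feed = rss_for("Category:Featured pictures on Wikimedia Commons", ["File:Example.jpg"])
print(feed)
```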

So, what does that suggest… that suggests wiki users need a flexible method of defining new parts of the API. Special:DefineAPI? Probably not, too subject to change.

Extensions can define API modules. So perhaps we should develop Extension:WikimediaCommonsAPI? If every project wanted to do this it might get a bit messy, but I imagine most projects wouldn’t bother.

Again we run up against the need for Commons to have a more structured database, rather than just store all information about an image in a big text blob.

At any rate, I hope we can set the current “pre-alpha” API up as a serious toolserver or svn project with multiple contributors. Wikimedia Commons is lucky to have attracted a relatively techy community of contributors, with a number of people who are MediaWiki- or toolserver-savvy. Let’s see how we go.

01 April, 2008

Links for 2008-03-04

(Correction: not enabled on test.wikipedia. Try this random testwiki.)


(via cc-au)

04 March, 2008


Freebase, Wikipedia and the right to fork

Screenshot of Freebase personal type definition, 'free content collection'

Two nights ago I went to the first Freebase user meeting outside the US. (You can tell I’m setting myself up for a, “I was there when…”)

It was organised by Kirrily Robert, who’s taken enough with her “new crack habit” to set up a specialised blog just for it.

So, what is Freebase? It claims to be a “database of everything”. There are several points of comparison with Wikipedia. Where Wikipedia is an “encyclopedia”, Freebase wants to be “everything”. It is far more structured than Wikipedia (which anyone who’s ever wrangled with an esoteric template might appreciate). Like Wikipedia, it’s a free content project: data derived from Wikipedia is GFDL (natch) and everything else is CC-BY. They have an excellent, well-documented API: they’re not afraid to share. Bring on the mash-ups!

There are several more differences worth discussing. Currently, Freebase is in alpha, and write permission (i.e. an account) is by invitation only. No worries, give it time.

More importantly, there is the back-end. Freebase is built on Metaweb’s closed-source back-end, which is going to stay closed. Apparently they intend to release some kind of regular data dump, and allegedly would even have no problem with someone taking that entire data set, throwing it into MySQL or what-have-you and setting up a total project fork.

If it was free software, there would be a right to fork. But this is only free content. Is there any kind of corresponding “right to fork” for a free content community? Should there be?

If not, maybe this joke from Evan about “crowdsourcing” is simply the truth:

The other reason I would wait until I had an entire data dump downloaded to my own disk before really barracking for Freebase is that I have read their TOS:


We provide access to portions of the Site and Service through an API; for purposes of this Terms of Service, such access constitutes use of the Site and Service. You agree only to use the API as outlined in documentation provided by us on the Site. You may not use the API or any other features of the Site or Service to duplicate or copy the Site or Service.

Bummer. Although — here’s a thought — I wonder if that conflicts with the CC-BY?

(clause 8.e from CC-BY-3.0)

This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You.

It’s not quite viral freedom, but almost as good. It seems to me this nice clause would render their TOS impotent.

So it will be interesting to see what happens there. It’s Wiki[p|m]edia that convinced me of (and taught me about) the absolutely vital right to fork. That is an incredible freedom, and one vastly underappreciated by the journalists who are generally impressed with Wikipedia’s “freeness” (meaning no ads, or free access). And for the leader of any kind of project, that is what keeps you on your toes. Maybe it is a good benchmark for deciding whether you want to contribute to a particular project. If management gets too heavy, you can keep them in line by threatening to exercise your right to fork. Yeah!

Back to Freebase… another related, interesting aspect will be watching the development of their community and how it will be managed. Where Wikipedia was pretty grass-roots, it seems like Freebase is top-heavy, for the moment at least. Letting go, giving up control and trusting the unwashed masses is a very difficult psychological moment for anyone (who’s not a Wikimedian). Trying to get those same unwashed masses to behave themselves is a whole other kettle of fish. When I first contemplated this for Freebase two nights ago, I was filled with cynicism, until I remembered… The thing about Wikipedia is that it only works in practice. In theory, it can never work.

I should make that my mantra. Every time I get cynical about something, I should think about that idea again. It only works in practice.

11 October, 2007
