Geoscience Australia goes CC-BY

Following the Australian Bureau of Statistics, it appears that Geoscience Australia is making the leap and going CC-BY.

In the press release, their CIO says

Our agency is custodian of a vast range of valuable geological and spatial datasets that are used by the public sector and private sector industries in the exploitation of resources, management of the environment, safety of critical infrastructure and the resultant well-being of all Australians. The Creative Commons licence has created a more efficient process for them to access this valuable information.

Exactly, right?

Although, looking around their website, it seems that you still need to specially order or buy various bits of their data. I wonder if that will be changing as they update their website.

I’m not really up on “map stuff” but I am sure the attendees of the recent FOSS4G conference (Free and Open Source Software for Geospatial) in Sydney will be pleased about this.

Wiki[mp]edia data sources & the MediaWiki API

A brief presentation I gave for Melhack last week:

Wiki[mp]edia data sources & the MediaWiki API
I wrote a bit on my techiturn blog about what I worked on in my 24-hour hack.

There is a huge amount of rich data in Wikipedia and other MediaWiki collections, naturally, but as there is no API evangelist you have to do a bit of digging to figure this out. Regular readers may recall that I am quite a fan of the API and what it means for reusers.
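For anyone who has not dug into the API yet, here is a minimal sketch of what a request looks like. It only builds the query URL and does not send anything. The helper name `build_query_url` is made up for illustration, and the English Wikipedia endpoint is just one example; other MediaWiki installs expose their own `api.php`.

```python
from urllib.parse import urlencode

# Example endpoint (English Wikipedia); other wikis have their own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, prop="info", endpoint=API_ENDPOINT):
    """Build a MediaWiki API URL asking for `prop` on the given page titles.

    `build_query_url` is a hypothetical helper, not part of any library.
    """
    params = {
        "action": "query",           # the general-purpose read module
        "format": "json",            # ask for a JSON response
        "prop": prop,                # which page property to return
        "titles": "|".join(titles),  # multiple titles are pipe-separated
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(["Ant", "Maricopa harvester ant"]))
```

Pasting the printed URL into a browser is an easy way to poke at the response before writing any real client code.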

[guest] AntWeb goes CC-BY-SA

Waldir has previously guest-blogged here and I am happy to welcome him back for his second post. Congrats on helping make this cool project happen! —Brianna

Image by AntWeb, licensed CC-BY-SA-3.0

written by Waldir Pimenta

Did you know that the most venomous insect in the world is an ant? That’s right. One sting from the Maricopa Harvester Ant is equivalent to twelve honey bee stings — the required amount to kill a 4.5 pound rat.

I found that out over a year ago, through the University of Florida’s Book of Insect Records. I immediately headed to Wikipedia to see what it had to say about it, but to my surprise there was no such article! I thus started one from scratch, using information I found on several ant-related websites. Eventually people started adding to the article, to the point that it contained a fairly good collection of information about this fascinating species. But still one thing was missing, something that could single-handedly make the article ten times more useful: an image.

So, when searching for images to illustrate it, I found the fantastic images from AntWeb, a project from The California Academy of Sciences, which aims to illustrate the enormous diversity of the ants of the world. I was especially happy to find that they were using a Creative Commons license — but soon after I was disappointed to find that the specific one they used (CC-BY-NC) was not appropriate for Wikipedia (or, more generally, free cultural works, and thus discouraged by Creative Commons itself).

So I sent them an email suggesting that they change the license. When they replied, I found out that they had actually been discussing license issues internally for quite a while. I kept in touch, and made sure to let them know the advantages of having their work showcased on such high-traffic websites as Wikipedia, Commons or WikiSpecies.

I like to think that my two cents helped in their decision, some time later, to not only change their license to CC-BY-SA, but also upload all their images to Commons themselves! This was part of their overall mission: “universal access to ant information”. Before, the AntWeb project focused only on digitization of content and development of the web portal; but now they also decided to “export” AntWeb content to improve access. Putting the images and associated metadata in Commons was an example of their outreach initiatives.

This was warmly welcomed by the community, and there was a lot of input on how best to perform the mass upload so that the images would be easy to find and use to illustrate articles and other relevant pages. The process took several days, but finally, over 30,000 images were uploaded, complete with EXIF tags, taxonomic data, and geographic information when available.

This is just the beginning, though! As usual in the wiki world, you can help! There are articles to be illustrated in the various Wikipedia language versions (Magnus’ FIST tool comes in handy for finding them!). There are WikiSpecies pages to be illustrated. There are categories in Commons to be created so that the ant category tree can be navigated and every ant image reached through it. And more importantly, there is this great news to spread, to let people who are interested in ants know that they can now count on what’s possibly the greatest online repository of free, high-quality ant images.

Many thanks to Brian Fisher, AntWeb Project Leader, who coordinated the license change process, Dave Thau, AntWeb Software Engineer, who wrote the upload script and performed the upload, and to all the AntWeb staff for their outstanding work!

Why the reporting on Wikipedia is so bad

So Jimmy Wales has a piece on the Huffington Post about the bad reporting of flagged revs. Frankly, of all the things I would ping traditional media on, confused reporting of a complex new editing approval system is one of the last. I have yet to see anyone in the community explain clearly and concisely the system under consideration, so I think it is asking rather a lot that outsiders should grok it when we are struggling with it ourselves.

Of particular interest was

I believe that the underlying facts about the Wikipedia phenomenon — that the general public is actually intelligent, interested in sharing knowledge, interested in getting the facts straight — are so shocking to most old media people that it is literally impossible for them to report on Wikipedia without following a storyline that goes something like this: “Yeah, this was a crazy thing that worked for awhile, but eventually they will see the light and realize that top-down control is the only thing that works.”

Hmm. So it’s that the Wikipedia story is all sunshine and light, and they’re all cynical hacks? I think it is more likely that they simply don’t understand how Wikipedia works.

In musing about Software Freedom Day, I watched a video of a talk by Bill Thompson in which he talked about the “‘10 cultures’ problem” (see Wikipedia for reference, or just watch the video – he gives a detailed explanation), by which he means the divide between those who understand how technology works, and how to work it (in theory, if not practice), and those who do not. (Yes, the title is a binary joke. Did you get it? Then you’re on this side of the divide.)

The fact that we can still see stories published about some article on Wikipedia being wrong says to me that those stories are written by people who simply don’t understand how Wikipedia works. That is not to defend Wikipedia containing wrong information at any given time. But it is to say, the focus is not on the interesting, important parts. As Bill Thompson puts it, in a debate about national ID cards, it’s like focusing the argument on the physical card itself, rather than the national identity register.

I would like to see reporting on wrongful Wikipedia blocks (cf. reporting on when people are wrongly barred from voting). And no, I’m not saying Wikipedia is a democracy, or should be one. But when the promise is engaging and empowering people around the world to develop the sum of all knowledge, and when the impact is what it is (top 5 website), then yes, it is right to have the scrutiny of traditional media all over it.

I mean, it is all there for them to find, too. But they don’t know how, is my guess.

Thinking about chapters and WIGs

So, since at least the chapters meeting in April, and especially at Wikimania, I have been Thinking about Chapters. What is a chapter, what should a chapter do, how should it operate, all that fun stuff. What I realised during my chapters panel is that, for one thing, we have a dearth of terminology. So here are some modest steps towards clearer definitions.

A Wikimedia interest group (WIG) — any group, with a common characteristic or theme, that aims to further the Wikimedia mission.

Now, borrowing the idea of features from linguistics, we can start to map out some identifiable WIGs.

Features are also typically arranged into sets of features. The first set should be geographic.


V: Virtual
W: Whole world
R: Region covering multiple nations
C: Country
S: (Sub-national) region

So typically a WIG would only have one of these features… so possibly they are all a single feature? Normally features are binary.

Another set of features could be scope of interest.


P: Project (a single project, such as Wikibooks)
L: Language
T: Theme (e.g. linguistics)

A third set of features could be characteristics, or functions. My initial guess is that these are only relevant for non-virtual WIGs.


LB: Legal body
F: Performs fundraising
M: Organises meet-ups or community-oriented events
O: Performs outreach events
P: Partners with other organisations or groups
L: Performs political lobbying

So a typical national chapter, would look like this:

-V -W -R +C -S
-P -L -T

(This means that, rather than being focused on a single project, language or theme, they are at least nominally interested in all/many of them. In practice some countries tend towards monolingualism, so it may appear that some chapters are focused on a single language, but that is generally not an explicit part of their being.)

+LB +F +M +O +P +L

So these sets of features are not independent… a value in the geographic features will have implications for the other sets. But that is OK I think.

A typical mailing list/project community (say, French Wikibooks) could also be defined thus:

+V -W -R -C -S
+P +L -T
-LB -F -M -O -P -L
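For the programmers in the audience, the two examples above can be sketched as plain data. This is only one possible encoding, assuming each WIG is just three sets of the feature letters listed earlier; the `WIG` class and its `matrix` method are names invented here for illustration. Keeping the three feature sets separate also sidesteps the fact that P and L each appear in both the scope set and the functions set.

```python
from dataclasses import dataclass

# Feature labels, in the order they are listed in the post.
GEOGRAPHIC = ("V", "W", "R", "C", "S")
SCOPE = ("P", "L", "T")
FUNCTIONS = ("LB", "F", "M", "O", "P", "L")


@dataclass(frozen=True)
class WIG:
    """A Wikimedia interest group described by three sets of features.

    Hypothetical model: a feature is '+' if its label is in the set.
    """
    name: str
    geographic: frozenset = frozenset()
    scope: frozenset = frozenset()
    functions: frozenset = frozenset()

    def matrix(self):
        """Return one '+X -Y ...' row per feature set."""
        pairs = (
            (GEOGRAPHIC, self.geographic),
            (SCOPE, self.scope),
            (FUNCTIONS, self.functions),
        )
        return [
            " ".join(("+" if label in present else "-") + label
                     for label in labels)
            for labels, present in pairs
        ]


national_chapter = WIG(
    "typical national chapter",
    geographic=frozenset({"C"}),
    functions=frozenset(FUNCTIONS),
)
french_wikibooks = WIG(
    "French Wikibooks community",
    geographic=frozenset({"V"}),
    scope=frozenset({"P", "L"}),
)

if __name__ == "__main__":
    for wig in (national_chapter, french_wikibooks):
        print(wig.name)
        print("\n".join(wig.matrix()))
```

Using sets rather than a single value per row leaves room for a WIG that turns out to have more than one geographic feature, which the post leaves as an open question.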

So… what features am I missing so far? Have I listed any that are redundant? I’m sure I’m not thinking enough outside the box just yet.

