Daily-image-l is a mailing list that I started in February 2007. It is the Commons counterpart to Wikipedia’s daily-article-l which I believe is still lovingly crafted by hand. (Here’s today’s, and apparently it has expanded to encompass Wiktionary and Wikiquote’s daily offerings.)
Of course, mailing people an actual image each day would quickly eat up quite a lot of resources, especially considering the recipients won’t necessarily even look at it. So I decided to just send people a link and the brief description, and let them choose whether or not to click through. In keeping with Wikimedia Commons’ multilinguality I also decided to make daily-image-l multilingual, and all the different language captions come in a single email. Here is today’s, as rendered by Gmail:
I guess I initially crafted the daily-image-l emails by hand, but you don’t need too many pattern recognition skills to see that this email is something a machine can do. It’s a Python script that uses wget to get the HTML of the page for the descriptions (hacking it apart with regexes), the MediaWiki API for the license and category info, and sendmail to bundle it off to Mailman. And if I was doing it today? I don’t think it would be much different.
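For the curious, the three steps above can be sketched roughly like this. This is not the actual script, just a minimal illustration under stated assumptions: the `description` markup matched by the regex, the subject line, the one-caption-per-line body layout and the `/usr/sbin/sendmail` path are all hypothetical, and the function names are made up for the example. The API parameters (`action=query`, `prop=categories`, `titles`, `format=json`) are standard MediaWiki ones, and fetching the page itself (the wget step) is left out.

```python
import re
import subprocess
from email.message import EmailMessage
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

# Assumption: each caption sits in <div class="description" lang="xx">...</div>.
# The real page markup may differ; adjust the pattern to whatever the HTML uses.
CAPTION_RE = re.compile(
    r'<div class="description" lang="([a-z-]+)">(.*?)</div>', re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_captions(html):
    """Return {language code: caption text} hacked out of the page HTML."""
    captions = {}
    for lang, raw in CAPTION_RE.findall(html):
        # Strip inner tags and collapse whitespace.
        captions[lang] = " ".join(TAG_RE.sub("", raw).split())
    return captions


def category_query_url(file_title):
    """Build a MediaWiki API URL asking for the categories of one file page."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": file_title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def build_message(day, file_title, captions, sender, list_address):
    """Compose one email: a link to the file page, then every caption."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = f"Picture of the day for {day}"  # hypothetical subject
    link = "https://commons.wikimedia.org/wiki/" + file_title.replace(" ", "_")
    lines = [link, ""]
    lines += [f"[{lang}] {text}" for lang, text in sorted(captions.items())]
    msg.set_content("\n".join(lines))
    return msg


def send(msg):
    """Hand the message to the local sendmail binary (path is an assumption)."""
    subprocess.run(
        ["/usr/sbin/sendmail", "-t", "-oi"], input=msg.as_bytes(), check=True
    )
```

Keeping `send` separate from `build_message` means the message can be printed and eyeballed before anything is actually mailed to a few thousand people.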
Once I got the thing working consistently, it really didn’t require much maintenance. It just does its thing. I just try to notice when something goes wrong. “Noticing” is more difficult than you might think, namely because of this:
Mailman -owner spam!
In theory, if you admin a Mailman mailing list with the address foo at lists.bar.com, the list subscribers will be able to reach you by emailing foo-owner at lists.bar.com. However, in practice, if your list is even remotely public or remotely old, anything you write to this address will never be seen by the list admin, because 99% of what they receive to it is spam.
In case you missed it, two of these messages are valid:
From: [removed] To: <email@example.com> Date: Tue, 7 Jul 2009 11:39:59 +0200 Subject: help non mi arrivano più le mail con l'immagini del giorno la ringrazio anticipatamente
[English: I am no longer receiving the emails with the picture of the day. Thank you in advance.]
(this is buried in the message with the subject “Notifica errore non riconosciuta”, roughly “Unrecognized error notification”)
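From: [removed] To: <firstname.lastname@example.org> Date: Tue, 7 Jul 2009 07:04:50 +0200 Subject: AW: Datenschutz-Warnung von Mailman Sehr geehrter "Mailman", obwohl ich selbst es war, der diesen erneuten Abonnementsantrag für die Mailingliste email@example.com gestellt habe, betrachte ich dieses Schreiben nicht ganz als gegenstandslos. Denn seit ca. 5 Tagen funktioniert die tägliche Mail für das "Bild des Tages" bei mir nicht mehr. Die tägliche Mail bleibt einfach aus. Was ist da los? Vielleicht können Sie weiterhelfen. Vielen Dank und freundliche Grüße [name]
[English: Subject: Re: Privacy warning from Mailman. Dear "Mailman", although it was I myself who submitted this renewed subscription request for the mailing list email@example.com, I do not consider this message entirely moot. For about 5 days the daily mail for the "Picture of the Day" has no longer been working for me. The daily mail simply does not arrive. What is going on? Perhaps you can help. Many thanks and kind regards, [name]]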
I don’t know why people like to write to the bounce address. Or are they just hitting reply, trying to post to the mailing list, and then getting this bounce, which for some reason is forwarded to -owner? And beyond this I don’t speak Italian or German anyway.
Luckily, eventually, someone leaves a comment in the right place, which is http://commons.wikimedia.org/wiki/Commons_talk:Daily-image-l. Although I don’t check that every day, RSS comes to the rescue!:
which leads me to inspect the July archive and indeed something has gone awry. It’s 8th July but there are only posts for the first 3 days. (And if I had actually paid attention, it might not have been such a surprise.)
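This sort of gap is easy to check for mechanically. Here is a minimal sketch, assuming the posting dates have already been pulled out of the month's archive somehow (that part is left out, and the function name `missing_days` is made up for the example):

```python
from datetime import date, timedelta


def missing_days(posted, today):
    """Return the days from the 1st of the month through `today` with no post."""
    day = today.replace(day=1)
    gaps = []
    while day <= today:
        if day not in posted:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# The situation described above: 8th July, posts only for the first 3 days.
posted = {date(2009, 7, 1), date(2009, 7, 2), date(2009, 7, 3)}
gaps = missing_days(posted, date(2009, 7, 8))
print(len(gaps), "days without a post, starting", gaps[0].isoformat())
```

Run daily, a check like this could raise the alarm after one missed day rather than five.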
Now this script is run from the toolserver — the stable toolserver, in fact. Stable toolserver was set up to run allegedly “stable” projects with multiple caretakers or maintainers. I agreed to be a maintainer for the poty project with Bryan (this software was used to conduct Picture of the Year voting in 2007 and 2008). Because I already had a stable toolserver account (as opposed to regular toolserver), it seemed easiest to set up my daily-image-l script on stable as well. Bryan agreed to be a maintainer for that (and he actually did make some improvements :)) and stable project potd was born.
But I basically haven’t touched it for what seems like years. So to try and find out what had happened this time I had to dredge all the bits and pieces back together from the depths of my memory:
- remember my username on the stable toolserver. (damn 8-character limit!)
- remember that I had given my ssh key from my webhost server, so I had to SSH to that first before going to the stable toolserver. (this was from the days when I had multiple non-networked machines and giving them all the same SSH key seemed difficult.)
- remember that I had to ‘become’ the role account for the project in order to find and edit the files
- remember the name of my project… correctly (another 8-character limit)
- figure out that you have to set a value for EDITOR before crontab -e will do anything meaningful
- make polite requests to the nearest admin to install emacs so I don’t have to figure out how to use nano…
An hour or so later and I am pretty sure daily-image-l will return to its regular programming. (So to speak.)
While all this useful information is fresh in my brain, I think I will try and put a copy of this measly script in SVN. daily-image-l now has over 2,500 subscribers, which is pretty neat considering the MEAN amount of work I do on it each day is 0 seconds. Better to put it in source control before any crisis hits.
I think I’m trying to teach myself something. The moral is: authoring code is finite, but maintenance is forever. Do yourself a favour and document how all the bits bolt together. Because if you don’t have a sysadmin at your beck and call, trying to piece it all together from stray emails will be really irritating!
You would think that if anyone could appreciate the limited value of security through obscurity, it would be Wikipedia editors. After all, Wikipedia is destroying the similar notion of “authority through obscurity” or “reliability through obscurity”. There’s a very clear parallel between the open source software development model and the Wikipedia editorial process. And yet… it is not the case.
The latest drama on en.wp is about intentionally adding hundreds of useless edits to the [[Main Page]] to make it undeletable. Deleting the main page is a hallmark of an administrator gone rouge, you see. I think it’s kind of cool that you can earn the cred to be able to delete the front page of a top ten website. Apparently for some people this is too much temptation.
In part these useless edits were added by an already-contentious bot, which performs a variety of routine tasks. The question came up of what the “contingency plan” was, given that this bot account had been blocked.
I don’t know that we have a contingency plan for such things. The bot system is like the wild west. Everyone runs their own code and there is very little [sic?] redundancy. — Carl
The bot owner responded:
As for the source for my bots, I am willing to share it with people that I can trust. I wrote RfC bot and gladly handed that code out to a user that I know is responcible [sic]. I have also written code for other users and they have abused it, since then I only give it to people I can trust. — βcommand
Simetrical, one of the developers, responded:
Of course, all this would be an excellent argument for requiring that all bots on Wikipedia be entirely open-source, and that this be periodically verified by someone attempting to run the bot on a test wiki and making sure it actually works as advertised. Why Wikipedia has not yet agreed on this I’m not sure, except to the extent that it seems never to be able to agree on anything. (Yes, yes, anti-vandal bots’ source code will be open, I’m sure that will be a great aid to the huge number of vandals who are also programmers and malicious enough to spend hours analyzing twisty heuristic-based source code. The idea of security through openness is that they’ll be outnumbered by the group that’s identical but willing to help out by sharing any exploits they find.) Without open-source bots, it seems to me Wikipedia is asking to have major bot contributors get annoyed with the project and leave, or just disappear for any reason, seriously inconveniencing everyone. Actually, this has happened in the past, if I’m not wrong. How is it that The Free Encyclopedia is relying so heavily on non-free software? If not for the bots and scripts that are permitted to be closed, you could come close to saying that the only proprietary software used in creating and serving the encyclopedia is routing software. — Simetrical
There is somewhat similar code anarchy on the toolserver. Limited collaboration leads to multiple tools performing the same function: the developer of an early version loses interest, some database configuration changes, and the tool becomes permanently broken because it has no maintainer.
Recently a stable toolserver was introduced, which requires a project to have at least two maintainers before it can be hosted there, in an attempt to alleviate some of the problems described. It has not had very enthusiastic uptake yet.
In a similar vein I found it odd to be asked to contemplate a Windows toolserver just this week. Apparently the toolserver is considered exempt from the strict free software requirements of the Foundation proper because it is hosted by the German chapter. Or something. I do not find it very convincing.
An essential part of the Wikimedia Foundation’s mission is encouraging the development of free-content educational resources that may be created, used, and reused by the entire human community. We believe that this mission requires thriving open formats and open standards on the web to allow the creation of content not subject to restrictions on creation, use, and reuse.
At the creation level, we want to provide the editing community with freely-licenced tools for participation and collaboration. Our community should also have the freedom to fork thanks to freely available dumps. The community will in turn create a body of knowledge which can be distributed freely throughout the world, viewable or playable by free software tools.
We, the community, clearly have some catching up to do. People in glass houses not throwing stones and all that! Closed source should not be acceptable for bots or toolserver tools.