Randall Munroe, the guy behind xkcd, said about translating his webcomic: "Translating humor is often difficult between groups that speak the same language, let alone totally different cultures. So it's inherently a hard problem."
That problem seems tractable with a Wikipedia-style solution: let the Crowd Source Translations. I've put together a RESTful service to facilitate that solution, and the xkcd example pages to provide a live test of that service.
The intention is to post the first to /r/xkcd and, with their leave, post the following five before stopping. This would let me:
- Enjoy xkcd while on-the-clock
- Goof off on reddit
- Load and test the back-end server in live use
The 'xkcd' example pages can be found through http://www.csi18n.com/xkcd.
The code for the pages can be found at https://github.com/Csi18nAlistairMann/xkcd
The xkcd example pages make use of JSON and XHR2, a relatively recent technology: Firefox has supported it since 2012, Chrome since 2013, and Internet Explorer 10 since about the same time. Safari v6.1.2 (Feb 2014) or later seems necessary on Apple Macs; iOS v8 (Sept 2014) seems necessary on Apple fondleslabs. Chrome on a stock Android 4.4.2 is fine, although the inbuilt browser fails XHR2.
Once each new comic is uploaded, the following facilities are available:
Clicking the Globe at the top right hides/shows the service menu.
- Hyperlinks on/off. The textual content on each page is usually clickable to reveal the translation menu. Setting hyperlinks to off turns off that facility.
- Username/Password. When editing individual translations, these fields provide the credentials supplied to the server. Use "Join" underneath to obtain an account; you may also use the test account, but watch out! Anyone can edit 'test05' content!
- APIKey. You should not need to change this unless you copy these pages to your own site. The APIKey may be used to manage access from faulty clients, and new keys can also be found via "Join" underneath.
- Default language. The pages will try to serve content appropriate to the visitor's language; that language is supplied by the browser in the Accept-Language header. This is not always satisfactory, so this field allows it to be changed. To read the original xkcd translations, the field should include "en"; to read in British English, include "en-GB"; for French, include "fr". The field observes Accept-Language's rules, so you can include multiple languages in preference order: "da,en-AU;q=0.5,de;q=0.001" would say "Give me Danish, Australian English if you've no Danish, and German in the absence of either".
- Cache timeout. Used in tracking down caching problems: clicking it starts a ten-minute countdown during which caching will be sidestepped both in the browser and on the server. An [X] allows for the countdown to be cancelled; reloading the page will do the same.
- [Join]. Translators and developers are encouraged to create their own accounts on the csi18n server; password resets can also be started through here.
- [Documentation]. You'll end up on this page!
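To make the Default language rules concrete, here is a minimal sketch of how an Accept-Language style value can be read into a preference order. It is written in Python for illustration only; the function name parse_accept_language is hypothetical and this is not the code the csi18n server or the xkcd pages actually run.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language, q) pairs, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue  # skip empty items such as a trailing comma
        q = 1.0  # a language with no q-value carries full weight
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unreadable weight: treat as not wanted
        entries.append((tag, q, position))
    # Highest q first; equal weights keep the order given in the field.
    entries.sort(key=lambda e: (-e[1], e[2]))
    return [(tag, q) for tag, q, _ in entries]


print(parse_accept_language("da,en-AU;q=0.5,de;q=0.001"))
# [('da', 1.0), ('en-AU', 0.5), ('de', 0.001)]
```

Note that each language is separated by a comma and its weight is attached with ";q=".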
Translations are normally shown as hyperlinks - clicking each brings up the translation menu on the right-hand side of the page.
- See more. If there are multiple translations available, only one is shown - but click here and all the others matching the user's default language (above) are visible along with the uploader's ID and its language as described by that uploader. If one of these is clicked on, that translation will be preferred: at the user's next visit, they'll see that translation specifically rather than have another semi-randomly chosen. Those preferences are used to skew the semi-random choice towards translations that have broader acceptance. Note that language preference ("English before French") takes precedence over translation preference ("French translation C, instead of A, B or D").
- Offer another. Use this to offer a brand new translation. The newmark is used to determine which translation is required. For these xkcd pages the visibility should normally be 'anonymous'. Make sure the Content-Language field matches the code of the translation's language! That is, if that field is "da,en-AU;q=0.5;de=0.001", the translation will never be returned - unless the visitor's language is "%22da%2Cen-AU%3Bq%3D0.5%3Bde%3D0.001%22"!
- Edit. If you did accidentally mistype the translation or its language, Edit provides the opportunity to correct it. Editing someone else's translation is the same as creating one's own matching translation and editing that. If a translation gives a 409 Conflict, then some user has it as a preference and edits are now prohibited. If it must be changed nevertheless, this translation should be deleted then created again.
- Delete. The service is to no longer offer this translation.
- Options/Head. Rerun the request using HTTP's 'Options' or 'Head' method. Options asks the server: what are my options with this resource? Head asks the server: without sending the resource, what would happen if I requested it?
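The "See more" behaviour (language order first, then a semi-random pick skewed by preferences) can be sketched roughly as below. This is an illustration under assumptions, not the server's algorithm: the Translation class, the preferred_by count and the "1 + preferences" weighting are all hypothetical, and languages are matched by simple case-insensitive equality.

```python
import random
from dataclasses import dataclass


@dataclass
class Translation:
    record_id: int
    language: str      # language as described by the uploader
    preferred_by: int  # hypothetical: how many users prefer this one


def choose_translation(candidates, languages, rng=random):
    """Pick one translation: language order wins, then a weighted draw."""
    for language in languages:  # most-wanted language first
        matches = [t for t in candidates
                   if t.language.lower() == language.lower()]
        if matches:
            # Assumed weighting: every translation gets a base chance of 1,
            # plus 1 for each user who has marked it as preferred.
            weights = [1 + t.preferred_by for t in matches]
            return rng.choices(matches, weights=weights, k=1)[0]
    return None  # nothing available in any requested language


french_a = Translation(1, "fr", 0)
french_c = Translation(2, "fr", 5)
english = Translation(3, "en", 0)
# English is asked for before French, so the popular French entry loses.
print(choose_translation([french_a, french_c, english], ["en", "fr"]))
```

With only the two French entries available, the one with more preferences would be drawn more often, but either can still appear.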
These last few are implemented on the server, but have not been implemented on the xkcd example pages:
- Bump/Unbump. To bump/unbump is a form of moderation. It causes anonymous or public translations to have their visibility changed to private or personal, and is a capability reserved to the page's creators (for xkcd, that's user #92) or their moderators.
- Lock/Unlock. To lock a translation is to have the service always return that particular translation, with two extremely limited exceptions: if a user has a personal translation, or if the translation has been 451d.
- History. View the edits made to that translation to date.
Some HTML markup is honoured on the xkcd pages: tags for line breaks, for italics and for bold, and &#x for numeric character references.
Note that the back end server does not escape HTML; the xkcd pages themselves do this.
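Anyone copying the pages will need to do the same escaping. A minimal sketch of the idea follows: escape everything, then let a small whitelist back through. It is in Python rather than the pages' own JavaScript, the name render_translation is hypothetical, and the tag names in ALLOWED_TAGS are an assumption about which tags are honoured.

```python
import html
import re

# Assumed whitelist: line break, italics and bold.
ALLOWED_TAGS = ("br", "i", "b")


def render_translation(raw: str) -> str:
    """Escape untrusted translation text, keeping a few harmless tags."""
    escaped = html.escape(raw, quote=True)
    # Restore the opening and closing forms of the allowed tags only.
    for tag in ALLOWED_TAGS:
        escaped = re.sub(rf"&lt;(/?){tag}&gt;", rf"<\g<1>{tag}>",
                         escaped, flags=re.IGNORECASE)
    # Restore hexadecimal numeric character references such as &#x263A;
    escaped = re.sub(r"&amp;#x([0-9A-Fa-f]+);", r"&#x\1;", escaped)
    return escaped


print(render_translation("<b>Hi</b><script>alert(1)</script> &#x263A;"))
# <b>Hi</b>&lt;script&gt;alert(1)&lt;/script&gt; &#x263A;
```

Only bare tags are restored, so anything carrying attributes stays escaped.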
- If visitor arrives by permalink, the language in that link is used. For example, in the link below the language is 'emoticon':
- If visitor arrives with a link that's not a permalink, the language in that link is also used. For example:
the translations returned are semi-randomly chosen among those described as being "en-US" by their uploaders. Interesting aside:
Will see translations drawn from all languages present, subject to authorisation. Refreshing is likely to see the same translations returned until the browser cache expires.
- If visitor changes Globe | Default Language, that value is used
- If Globe | Default Language is empty, then it's filled in from the user's own Accept-Language header.
Internet Explorer v11 on Windows 8.1 on British laptops uses the sole language code of "en-GB". Safari 8.0.3 on British Macs uses the sole language code of "en-us" - not even the recommended standard of "en-US". As I upload the XKCDs with language "en" unless specifically required otherwise, there would be a risk that English-speakers on both sides of the Atlantic could not actually receive ... English.
If a form of English is asked for, but broad English itself is not, the XKCD pages will add "en;q=0.001" to the user's languages: "... and if all else fails, ask for English".
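The fallback rule can be sketched as follows. This is a Python illustration of the rule as described, not the pages' actual JavaScript; the function name add_english_fallback is hypothetical, and "a form of English" is assumed to mean any language tag starting with "en-".

```python
def add_english_fallback(accept_language: str) -> str:
    """Append broad English at very low weight if only a variant is listed."""
    # Keep just the language tags, dropping any ;q= weights.
    tags = [item.split(";")[0].strip().lower()
            for item in accept_language.split(",")]
    tags = [t for t in tags if t]
    asks_for_variant = any(t.startswith("en-") for t in tags)
    asks_for_broad_english = "en" in tags
    if asks_for_variant and not asks_for_broad_english:
        return accept_language + ",en;q=0.001"
    return accept_language


print(add_english_fallback("en-GB"))  # en-GB,en;q=0.001
print(add_english_fallback("en-us"))  # en-us,en;q=0.001
print(add_english_fallback("fr"))     # fr
```

The comparison is case-insensitive, so Safari's "en-us" is treated the same as "en-US".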
- Text instead of graphics. Randall uses text as picture elements, flowing around other elements. Until the back-end server supports binaries, the example pages use text as html text elements. This shows up particularly in flowcharts, where square html elements look crap in a rhombus, and arbitrarily shaped text (as with the "Yes, I want to look at something else" text) isn't directly possible.
- Image titles. Randall uses image titles as one of his easter eggs: hovering the mouse over each image reveals a further message. Image titles are not clickable: thus a reader could not directly choose to translate it from within the browser. The example pages handle this by making each image title a clickable but hidden element, revealed when rolling the mouse over an image map. In Jurassic World, hovering the pointer over the dinosaur reveals the message.
- Anigifs. Where Randall has used an Animated GIF - such as for #1264 Slideshow - the csi18n service cannot easily help: the GIF format does not support hyperlinks. The whole would need to be reworked into a different form to be usable.
Recursive machine translation
It can be insightful and occasionally hilarious to test machine translators by feeding them English, then feeding the response through other languages and finally back into English.
I tried this with #1495 Hard Reboot and #1469 UV. In both cases, the original English had to be forced to lower case and re-capitalised as Google Translate gets confused by all caps.
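For anyone repeating the experiment, the lower-case-then-recapitalise step might look like the sketch below. This is one plausible way to do it, not necessarily how I prepared the text: the function name recapitalise is hypothetical, and it only capitalises sentence starts and the pronoun "I", so proper nouns would still need fixing by hand.

```python
import re


def recapitalise(text: str) -> str:
    """Turn ALL-CAPS text into ordinary sentence case."""
    lowered = text.lower()
    # Capitalise the first letter of the text and of each sentence
    # that follows a full stop, exclamation mark or question mark.
    sentence_start = re.compile(r"(^\s*|[.!?]\s+)([a-z])")
    result = sentence_start.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered)
    # Restore the standalone pronoun "I".
    return re.sub(r"\bi\b", "I", result)


print(recapitalise("THE SERVER IS DOWN. RESTART IT!"))
# The server is down. Restart it!
```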
An example permalink:
"hard_reboot" indicates to the server which xkcd webcomic to use.
"92" is the Subscriber ID of the uploader providing the following translations.
"recursivetranslation-via-de" is the language of the translations. The language does not have to be an IETF language code.
"925,926,927,928,929,930,931" are the record numbers for each translation used, in the order in which they'll be needed. Here they are consecutive, but won't always be so.