Facilities I'd like to offer in a later version

From Alistair Mann / csi18n
Revision as of 23:25, 30 March 2015 by Alistair Mann (Preferences: Directly expose resources)

A wish list of facilities I'd like to add.

Antispam nonce

A facility wherein the user can upload a nonce of his choosing to be added to any automated email sent to him. He can then set his antispam filter to look for and act on the presence of that nonce.

My filters currently have six or seven different rules to route LinkedIn email. The emails left over are phishing emails, but I had to check each to be sure. By adding a nonce of my choice, I could use just one rule, and I could be sure the remainder were fakes.

At time of writing, password recovery is the only use of automated emails.
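
As a rough sketch of how the nonce might travel with an automated email: the header name `X-Csi18n-Nonce`, the sender address and both function names below are assumptions made for illustration, not part of csi18n.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical header name; the nonce could equally go in the subject or body.
NONCE_HEADER = "X-Csi18n-Nonce"


def build_recovery_email(to_addr: str, nonce: Optional[str]) -> EmailMessage:
    """Build an automated email, stamped with the user's nonce if one is set."""
    msg = EmailMessage()
    msg["From"] = "noreply@example.org"  # placeholder sender
    msg["To"] = to_addr
    msg["Subject"] = "Password recovery"
    if nonce:
        msg[NONCE_HEADER] = nonce
    msg.set_content("Follow the link in this message to reset your password.")
    return msg


def is_genuine(msg: EmailMessage, expected_nonce: str) -> bool:
    """The single filter rule the user would need: does the nonce match?"""
    return msg.get(NONCE_HEADER) == expected_nonce
```

A mail filter would then need only the one `is_genuine`-style rule; anything claiming to be from the service without the nonce can be treated as fake.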


csi18n currently supports resources with a limited set of text-based MIME types, such as text/plain and text/html.

It would be useful to store binary types too. Examples:

  • image/png. A specifically created overlay on top of a cartoon could render alternative language text 'as if' it were part of the original.
  • audio/webm. Hear/record/distribute a translation of that TED lecture dialog in your language - no reading skills required of your audience.
  • video/mp4. I could imagine intertitles making use of this, I could imagine public service announcements using it too.
  • application/vnd.ms-powerpoint. Do you use these presentations as a teaching aid? How useful would it be that alternate language versions were available?

With binaries, people would also be able to upload suggested translations of (say) Glaswegian English from video into a voiceover ... which would see the translation with broader acceptance bubble to the top.

Docker access

Not strictly speaking part of this project, but it would be very interesting to hide servers behind docker-based caching servers.

Fixed IP

Who knows what the current IP is, or how long it will last? Not me, for sure. DDNS can only take things so far.

At the moment, the server can be down for up to half an hour after an IP address change: ICMP is used to check every 15 seconds, and the demopage server checks every 15 seconds whether it needs to adapt its firewall to that change, but DDNS itself may not complete the changes for some time. Even if that were fixed, it would take 15 minutes for other DNS servers to pick up the change because of my 900s TTL, so improving DDNS is worth less than getting a fixed IP.
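
The arithmetic can be sketched as below. Treating the stages as strictly sequential is an assumption, and the DDNS delay is a free parameter because its real value is unknown; the function name is hypothetical.

```python
def worst_case_outage_seconds(ddns_update_delay: float,
                              icmp_interval: float = 15,
                              firewall_interval: float = 15,
                              dns_ttl: float = 900) -> float:
    """Upper bound on downtime after an IP change, assuming each stage
    only starts once the previous one has finished."""
    return icmp_interval + firewall_interval + ddns_update_delay + dns_ttl


# Even with an instantaneous DDNS update, the TTL dominates:
print(worst_case_outage_seconds(0) / 60)  # minutes
```

With the DDNS delay set to zero the bound is still over 15 minutes, which is the point of the comparison above.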

Instant report for objectionable content

While content can be moderated within the service, end-user reports of such content happen outside the service, by email and the like.

It would be useful if users could have instant report links. This could be handled with a link for each translation; visiting the link adds that translation to an investigation queue.
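
One way this might look, as a minimal in-memory sketch: the `/report/<token>` URL layout and the `ReportQueue` class are hypothetical, and a real version would need persistence and abuse protection.

```python
import secrets
from collections import deque
from typing import Optional


class ReportQueue:
    """Hands out one report link per translation and queues visited ones."""

    def __init__(self) -> None:
        self._token_to_translation = {}
        self._queue = deque()
        self._queued = set()  # avoids queueing the same translation twice

    def link_for(self, translation_id: str) -> str:
        token = secrets.token_urlsafe(16)  # unguessable, so links can't be forged
        self._token_to_translation[token] = translation_id
        return f"/report/{token}"  # hypothetical URL layout

    def visit(self, path: str) -> bool:
        """Called when a report link is visited; returns False for unknown links."""
        token = path.rsplit("/", 1)[-1]
        translation_id = self._token_to_translation.get(token)
        if translation_id is None:
            return False
        if translation_id not in self._queued:
            self._queued.add(translation_id)
            self._queue.append(translation_id)
        return True

    def next_for_investigation(self) -> Optional[str]:
        if not self._queue:
            return None
        translation_id = self._queue.popleft()
        self._queued.discard(translation_id)
        return translation_id
```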


It's astonishing that IPv6 is still not supported at the consumer level. It is available from datacentres, so I'd expect to offer IPv6 access as and when service can be provided from such a place.

Local caching

All requests received are routed directly to the origin server. For testing purposes as well as performance reasons, I'd prefer to route traffic first to a local cache, or even a load balancer if warranted.
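
A minimal sketch of the idea, assuming a simple time-to-live policy: `LocalCache` and its parameters are illustrative names, and a real deployment would more likely use an off-the-shelf caching proxy.

```python
import time
from typing import Callable


class LocalCache:
    """Answers from memory while an entry is fresh, else asks the origin."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (expiry time, body)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._fetch(url)  # miss or expired: go to the origin
        self._entries[url] = (now + self._ttl, body)
        return body
```

Injecting the clock and the origin fetcher keeps the cache testable without a network, which suits the "testing purposes" motive as much as the performance one.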

Local servers

There is just the one server right now. It would be useful to have several around the world, with replication between them, leaving software clients to work out which gives best service.
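
How a client might work out which server gives best service can be sketched as below. It assumes "best" means lowest round-trip time for a single probe; the `probe` callable is supplied by the caller, and the function name is hypothetical.

```python
import time
from typing import Callable, Iterable, Optional


def pick_fastest(servers: Iterable[str],
                 probe: Callable[[str], None],
                 clock: Callable[[], float] = time.perf_counter) -> Optional[str]:
    """Time one probe per server and return the quickest reachable one."""
    best, best_elapsed = None, float("inf")
    for server in servers:
        start = clock()
        try:
            probe(server)  # e.g. a small HTTP request; supplied by the caller
        except OSError:
            continue  # unreachable servers are skipped
        elapsed = clock() - start
        if elapsed < best_elapsed:
            best, best_elapsed = server, elapsed
    return best
```

A single probe is noisy; a real client would probably average several measurements and re-check from time to time.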

Meta-data mining server

It would be useful to have a server dedicated to meta-data mining, and searches on the data that mining produces.

Searches on meta-data. Meta-data is data about data. Which country did the request come from? What time was it made? What was the answer? What resources did it take? For this service, it raises the possibility of analysing, for example: for product X, which languages are most in demand? In country Y, which product was most in demand? Which of the users contributing translations generate the most widely acceptable content?
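
The first two example questions could be answered along these lines. The `RequestRecord` fields are an assumed log format for illustration, not one the service currently produces.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple


class RequestRecord(NamedTuple):
    """One logged request; the field names are assumptions."""
    country: str
    product: str
    language: str


def languages_in_demand(records: Iterable[RequestRecord],
                        product: str) -> List[Tuple[str, int]]:
    """For product X, which languages are most in demand?"""
    counts = Counter(r.language for r in records if r.product == product)
    return counts.most_common()


def top_product(records: Iterable[RequestRecord],
                country: str) -> Optional[str]:
    """In country Y, which product was most in demand?"""
    counts = Counter(r.product for r in records if r.country == country)
    return counts.most_common(1)[0][0] if counts else None
```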

Separate server. These kinds of queries are better conceptually - and so logically - separated from the service itself for two reasons: Security and Impact.

  • Security: meta-data is often found in logs to which the core service cannot and/or should not get read access. To provide access to these from within the service is to increase the risk of data breach, inadvertent or otherwise.
  • Impact: compared with the core work of the service, meta-data searches are resource-intensive while their results are time-insensitive: core work should take microseconds, but an analysis could well be drawn down hours before a human needs it. To provide access to meta-data searches from within the service is therefore to prioritise people who don't mind waiting at the expense of those who do.

All this considered, it would be useful to migrate the meta-data mining to a physically separate server: the logs are securely transferred to it, the search itself is initiated on the main server, but the actual processing happens on, and the response is returned from, this second, data-mining, server.

At the time of writing, only manual data-mining is undertaken on the main server.

Misc notes on meta-data searches

Meta-data searches seem to involve either data on entry or exit to the system.

  • An entry example: the service itself does not record URLs for which requests are received; however the web-server front-end does.
  • An exit example: the service doesn't itself record status responses to each URL such as "404". Nor is this necessarily known, as the service could respond "501" even after the "404" was organised. Again, the web-server front end could log these.
  • A processing example (these seem to be far rarer): the service can record how long is spent in various subroutines in order to better discover bottlenecks. Such analysis should happen off the main server.
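
For the processing example, recording time spent per subroutine might be sketched with a decorator. The `timings` store and the `lookup_translation` function are hypothetical stand-ins; in practice the measurements would be written to a log and analysed elsewhere.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory store for the sketch: function name -> list of durations in seconds.
timings = defaultdict(list)


def timed(func):
    """Record how long each call to func takes, even if it raises."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper


@timed
def lookup_translation(key: str) -> str:
    # Stand-in for a real subroutine of the service.
    return key.upper()
```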

Preferences: Directly expose resources

Although a CRUD system, preferences only expose Create. Retrieve, Update and Delete are all hidden from the user agent, which is undesirable.

Reports per Language/Newmark etc

Requests to the service can list a variety of acceptable languages, whereas the response comes in a single language chosen from among them. All this data is useful: a first-choice language passed over for a later choice represents an opportunity lost. The time between the first request for a newmark in a particular language and a translation becoming available is useful data about contributor responsiveness. The number of requests for one language versus another suggests something about how desirable the product is to that market.

These facts can be collected with judicious logging; it'd be good to form them into reports.
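
A sketch of where the "opportunity lost" figure could come from, assuming languages are requested through an HTTP `Accept-Language` header. This is a simplified parser (exact, case-insensitive tag matching, no `*` wildcard), and both function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language tags ordered by descending q-value (ties keep header order)."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # unparseable weight: treat as not acceptable
        if q > 0:
            weighted.append((q, tag.strip().lower()))
    weighted.sort(key=lambda item: -item[0])  # list.sort is stable
    return [tag for _, tag in weighted]


def negotiate(header: str,
              available: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Pick the best available language and list the preferred ones passed over."""
    have = {tag.lower() for tag in available}
    wanted = parse_accept_language(header)
    for position, tag in enumerate(wanted):
        if tag in have:
            return tag, wanted[:position]
    return None, wanted
```

Logging the second element of each `negotiate` result would give the raw counts for a per-language report.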

RFC7230-7 Compliance

HTTP/1.1, as used by this service, was defined by RFC 2616. Since June 2014 it has been defined by RFCs 7230 through 7237. I would like to revisit the service to make it comply with those new RFCs.


It would be of use to form some kind of a ranking system such that interested parties can discover higher-quality uploaders. Or perhaps discover lower-quality uploaders, or both. Similarly, it would be useful for uploaders to see subscribers who best treat them.


Security cannot be taken too seriously. I would commit to:

Inside-Out escalator

Were there funds to do so, I'd contract someone to Blue Team the existing security. I've done the best I can, but I doubt that's enough. Blue Teaming would happen regularly, with successive reviews getting more and more demanding.

Outside-In attack platform

It'd be interesting to offer up a server 'A' to be attacked and an Internet-accessible server 'B' from which Red Team attacks could be launched against 'A', over a private network between the two. A DoS against server 'A' or an attack against server 'B' would not count as a successful attack.

Translation machines

Where a client declares she would be happy for a lower quality translation in the absence of a human translation, it makes sense to silently route her request through to one or more online translation machines; there's no technical barrier to the machine translation being copied until a human later edits it.
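
The fallback might be sketched as follows. `TranslationStore` is a hypothetical in-memory stand-in, and the `machine` callable represents whichever online translation service would be used; none is named here.

```python
from typing import Callable, Dict, Optional, Tuple


class TranslationStore:
    """Serves human translations, falling back to a machine one on request."""

    def __init__(self, machine: Callable[[str, str], str]) -> None:
        self._machine = machine  # (source text, target language) -> translation
        self._entries: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def put_human(self, key: str, lang: str, text: str) -> None:
        # A human edit always replaces whatever was stored, machine or not.
        self._entries[(key, lang)] = (text, "human")

    def get(self, key: str, lang: str, source_text: str,
            accept_machine: bool = False) -> Optional[Tuple[str, str]]:
        """Return (text, origin), where origin is 'human' or 'machine'."""
        entry = self._entries.get((key, lang))
        if entry is not None:
            return entry
        if not accept_machine:
            return None
        text = self._machine(source_text, lang)
        self._entries[(key, lang)] = (text, "machine")  # kept until a human edits it
        return text, "machine"
```

Recording the origin alongside the text lets clients who did not opt in be told apart from those who did, and lets reports distinguish the two kinds of translation.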

User to user contact

If one user wants to contact another user, it would be useful to facilitate that. For instance, suppose user X provides consistently good translations into Arabic, and so user Y wants to invite user X to a further project.

Such a service could be a cross between fingering and pseudonymous remailing: user X can elect to expose data or not, and take advantage of the service forwarding a message without revealing X's contact details; user Y can send a message but does not know more about X than X has chosen to reveal. X can at that point take the approach to some other medium like email, or block Y altogether.
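
A minimal sketch of such a relay, assuming everything is held in memory and that actual delivery is handled elsewhere. The `Relay` class and its method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class Relay:
    """Forwards messages between users without revealing contact details."""

    def __init__(self) -> None:
        self._address: Dict[str, str] = {}              # private, never returned
        self._exposed: Dict[str, Dict[str, str]] = {}   # what each user chose to reveal
        self._blocked: Set[Tuple[str, str]] = set()     # (recipient, sender) pairs
        self.outbox: List[Tuple[str, str]] = []         # (private address, message)

    def register(self, user: str, address: str,
                 exposed: Optional[Dict[str, str]] = None) -> None:
        self._address[user] = address
        self._exposed[user] = dict(exposed or {})

    def finger(self, user: str) -> Dict[str, str]:
        """Return only the data the user elected to expose."""
        return dict(self._exposed.get(user, {}))

    def block(self, recipient: str, sender: str) -> None:
        self._blocked.add((recipient, sender))

    def send(self, sender: str, recipient: str, body: str) -> bool:
        """Queue a message for delivery; the sender never sees the address."""
        if recipient not in self._address or (recipient, sender) in self._blocked:
            return False
        self.outbox.append((self._address[recipient],
                            f"Message from {sender}: {body}"))
        return True
```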


The existing implementation requires client software to specifically poll for updates. An advantage of WebSockets is that change-polling can be shifted to the server, thus saving bandwidth and improving responsiveness.

With WebSocket support, csi18n would provide a more seamless background for real-time collaborative endeavours.
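
The shift from polling to server push can be illustrated without any WebSocket library. In this sketch each `asyncio.Queue` stands in for one open connection, and `ChangeHub` is a hypothetical name; it shows the publish/subscribe shape only, not the WebSocket protocol itself.

```python
import asyncio
from typing import Set, Tuple


class ChangeHub:
    """Pushes each change to every subscriber, so clients never have to poll."""

    def __init__(self) -> None:
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        inbox: asyncio.Queue = asyncio.Queue()  # stand-in for one open connection
        self._subscribers.add(inbox)
        return inbox

    def unsubscribe(self, inbox: asyncio.Queue) -> None:
        self._subscribers.discard(inbox)

    def publish(self, resource: str, text: str) -> None:
        for inbox in self._subscribers:
            inbox.put_nowait((resource, text))


async def demo() -> Tuple[str, str]:
    hub = ChangeHub()
    inbox = hub.subscribe()
    hub.publish("greeting/fr", "Bonjour")  # the server announces the change
    return await inbox.get()               # the client simply waits for it
```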