Staś Małolepszy

Frequently asked questions about L20n

A good Q&A after a talk is a great way to gauge whether the audience was able to relate to the topic, and a chance to identify new pitch opportunities.

I'm typing this on a train from Poznań to Warsaw, on my way back from a meet.js event organized by Michał Maćkowiak. It was a great pleasure to attend the event and to have an opportunity to talk about L20n to Polish developers.

Whenever I try to explain L20n to people, I first try to understand what they already know about localization. I found that it's useful to ask if a developer has ever shipped a multilingual product, or if an end-user has ever seen anything similar to the image below, where all but one word of the message was truncated to fit the screen (there's a good chance they have).

Another great way to better know your audience is to listen to questions. Good questions are a sign that the audience was able to relate to the problem L20n is trying to solve. (To see why the solution is compelling, one has to first acknowledge the problem.)

I was really happy with the questions I got after my talk at meet.js. They were insightful and inspiring, and each one brought something more to the conversation. Below I try to recap my answers to recurring and common themes.

Only localizers know what they need

Question: Do you expect localizers to have programming experience?

No. We're realistic. We expect localizers to be experts in localization, not programming. We don't impose any complex syntax, nor do we require anyone to write macros or dict-like data structures. We do give localizers the freedom to get as complex as they need, when they need it. After all, only the localizers know what it is that they need, and L20n doesn't make any assumptions about the language it's used to localize into. It's grammar-agnostic to the extreme.

Fortunately, more often than not there is no need to get any more complex than the baseline. When it comes to simple cases, L20n is pleasantly simple:

    <identifier "A translated value">
    <multiline """
      Multiline strings are easy to read and edit 
      thanks to the triple-quote syntax.
    """>

Localizing Mozilla projects has taught us that as many as 90-95% of messages found in the UI are simple key-value pairs that won't require any advanced knowledge of L20n's features.

For instance, the file that we use to measure L20n's performance under real-life usage consists of over 500 strings, only 4 of which are not key-value pairs. The file is based on the localization file of Firefox OS's largest app, Settings.

So why develop L20n at all? Because it's the remaining 5% that make or break the UI of your app. They're too important to ignore and they are responsible for the experience of the user.

L20n was created to cater to these 5%, and admittedly, L20n examples can get rather complex (see L20n Tinker for an example). We're trying to show off advanced features to explain what's possible with the framework and how it handles common challenges of UI design.

The vast majority of languages and use cases probably won't even need them.
L20n's motto says it best:

Keep simple things simple. Make complex things possible.

Common helpers for each language

Question: Will there be plural macros built into L20n for each language?

We may consider adding common rules for all languages at some point in the future, but for the 1.0 release we decided to keep the framework lean and leave them out.

Instead, we're hoping that more experienced localizers, preferably with some programming knowledge, will step in and create common L20n files to be included by others in their own projects. Such files would include commonly-used helpers like plural macros, translations of the names of the months and the days of the week, formal and informal greetings depending on the time of day et cetera.

Consider the following L20n file, common.lol:

    <plural($n) {
      $n == 0 ? "zero" :
        $n == 1 ? "one" :
          "many" }>

You could use the above plural macro in another file by using L20n's handy import statement.

    import('common.lol')

    <unread[plural($unreadMessages)] {
      zero: "No unread messages.  Good job!",
      one: "You have one unread message.",
     *many: "You have {{ $unreadMessages }} unread messages."
    }>

Thanks to this approach, the localizer keeps control over even such basic features of the localization as the plural selector, which allows for great flexibility (if needed).
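To make the mechanics a bit more concrete, here is a rough JavaScript model of what the macro and the indexed entity above do together. It is only a sketch: plural, unread, format and unreadMessage are names made up for this illustration and are not part of the L20n library, and the placeholder substitution is far simpler than what a real L20n context does.

```javascript
// Rough model only: none of these names come from the L20n library.

// Stand-in for the plural macro defined in common.lol.
function plural(n) {
  return n === 0 ? 'zero' : n === 1 ? 'one' : 'many';
}

// Stand-in for the <unread> entity: its variants and its default key
// (the variant marked with * in the L20n source).
var unread = {
  defaultKey: 'many',
  variants: {
    zero: 'No unread messages.  Good job!',
    one: 'You have one unread message.',
    many: 'You have {{ $unreadMessages }} unread messages.'
  }
};

// Pick the variant named by the key, fall back to the default variant,
// then substitute {{ $name }} placeholders from the data object.
function format(entity, key, data) {
  var hasKey = Object.prototype.hasOwnProperty.call(entity.variants, key);
  var template = entity.variants[hasKey ? key : entity.defaultKey];
  return template.replace(/\{\{\s*\$(\w+)\s*\}\}/g, function (match, name) {
    return String(data[name]);
  });
}

function unreadMessage(count) {
  return format(unread, plural(count), { unreadMessages: count });
}

console.log(unreadMessage(0));
console.log(unreadMessage(1));
console.log(unreadMessage(7));
```

Because the selector is just another piece of the localization, a Polish localizer could swap in a macro that also returns a "few" category and add a matching variant, all without any change to the application code.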

We'd love to see such common files created organically by the community and shared among localizers in a decentralized manner.

Context data and for-loops

Question: If the context data is global, how do I build multi-item UIs?

The interface between L20n's context and the data that the developer wants to expose to it indeed makes the data available context-wide. In other words, if the developer defines the context data as follows:

    {
      "currentUser": {
        "name": "Jane",
        "gender": "feminine"
      }
    }

…it will be available to the localizer as $currentUser and can be used in any entity. This is an important design concept of L20n which goes back to the idea that only the localizers can know what they need to properly localize a message in the UI. English may not require the user's gender to build grammatically-correct sentences, but other languages certainly do.

To achieve the same in other localization frameworks, the developer would need to pass the currentUser object in every call to the framework, just in case the requested string makes use of that information in some language.
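The idea can be sketched with a toy model in plain JavaScript. This is not L20n's implementation or API: createContext, the messages table and the loggedIn message are hypothetical names invented for this illustration. The point is only that the data is registered once, and each locale decides for itself which parts of it to read.

```javascript
// Toy model; createContext and the messages table are hypothetical.

// Each locale declares its messages as functions of the context data.
var messages = {
  en: {
    // English ignores the user's gender here.
    loggedIn: function (data) {
      return data.currentUser.name + ' is logged in';
    }
  },
  pl: {
    // Polish uses the gender to pick the verb form, with a neutral fallback.
    loggedIn: function (data) {
      var user = data.currentUser;
      if (user.gender === 'feminine') { return user.name + ' zalogowała się'; }
      if (user.gender === 'masculine') { return user.name + ' zalogował się'; }
      return 'Zalogowano jako ' + user.name;
    }
  }
};

// The developer registers the data once; every message can read it.
function createContext(locale, data) {
  return {
    get: function (id) {
      return messages[locale][id](data);
    }
  };
}

var data = { currentUser: { name: 'Jane', gender: 'feminine' } };

// The calling code is identical for both locales and never mentions gender.
console.log(createContext('en', data).get('loggedIn'));
console.log(createContext('pl', data).get('loggedIn'));
```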

Sometimes, however, you want to build the UI out of repeating blocks and exposing data needed by each block would lead to naming conflicts. In such cases, you can pass data on a per-entity basis as a second argument to Context::get and Context::getEntity. Consider the following JavaScript code:

    document.l10n.addEventListener('ready', function() {
      // ... 
      notifications.forEach(function(notification, i) {
        nodes[i].textContent = document.l10n.get('notification', { 
          // pass notification.fromUser as $fromUser to the requested entity
          fromUser: notification.fromUser
        });
      });
      // ... 
    });

Notice how the developer only asks for the value of the notification entity and passes notification.fromUser as $fromUser. Here's what the corresponding L20n code could look like:

    <notification "{{ $fromUser.name }} wants to add you as a friend">

Or assuming $fromUser has a gender property:

    <notification[$fromUser.gender] {
      feminine: "{{ $fromUser.name }} wants you to add her to your friends",
      masculine: "{{ $fromUser.name }} wants you to add him to your friends",
     *unknown: "{{ $fromUser.name }} wants you to add them to your friends"
    }>

It's still possible, of course, to use the context-wide $currentUser variable in the translation. In the following example in Polish, the "wants you to add" part needs to be accorded with the gender of the recipient, so it makes use of both $fromUser passed to it directly and $currentUser available context-wide.

    <notification[$fromUser.gender, $currentUser.gender] {
      feminine: {
        feminine: "{{ $fromUser.name }} chce, żebyś dodała ją do znajomych",
        masculine: "{{ $fromUser.name }} chce, żebyś dodał ją do znajomych",
       *unknown: "{{ $fromUser.name }} chce zostać twoją znajomą"
      },
      masculine: {
        feminine: "{{ $fromUser.name }} chce, żebyś dodała go do znajomych",
        masculine: "{{ $fromUser.name }} chce, żebyś dodał go do znajomych",
       *unknown: "{{ $fromUser.name }} chce zostać twoim znajomym"
      },
     *unknown: "{{ $fromUser.name }} chce dodać cię do znajomych"
    }>

See how this example can progressively get more complex? It's interesting to realize that the unknown case for both genders is, in fact, enough to get the message across in a grammatically correct sentence. Using more advanced features of L20n, however, allows us to create a much more user-friendly interface which seamlessly adapts to the user's gender. As Charles Eames famously put it:

The details are not the details. They make the design.
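For readers who want to see the fallback logic spelled out, the two-level index in the Polish example above can be modelled in plain JavaScript. This is a sketch of the selection as described in this post, not L20n's actual code: resolve, getNotification and the { defaultKey, variants } shape are made-up names, and the rule that per-call data is layered over context-wide data on a name clash is an assumption.

```javascript
// Sketch only; resolve and the { defaultKey, variants } shape are hypothetical.

// Model of the Polish <notification> entity. defaultKey marks the
// variant flagged with * in the L20n source.
var notification = {
  defaultKey: 'unknown',
  variants: {
    feminine: {
      defaultKey: 'unknown',
      variants: {
        feminine: '{name} chce, żebyś dodała ją do znajomych',
        masculine: '{name} chce, żebyś dodał ją do znajomych',
        unknown: '{name} chce zostać twoją znajomą'
      }
    },
    masculine: {
      defaultKey: 'unknown',
      variants: {
        feminine: '{name} chce, żebyś dodała go do znajomych',
        masculine: '{name} chce, żebyś dodał go do znajomych',
        unknown: '{name} chce zostać twoim znajomym'
      }
    },
    unknown: '{name} chce dodać cię do znajomych'
  }
};

// Walk the nested variants, one key per index level. A missing key falls
// back to that level's default; reaching a string ends the walk.
function resolve(entity, keys) {
  var node = entity;
  var level = 0;
  while (typeof node !== 'string') {
    var key = keys[level++];
    var hasKey = Object.prototype.hasOwnProperty.call(node.variants, key);
    node = node.variants[hasKey ? key : node.defaultKey];
  }
  return node;
}

// Assumption: per-call data is layered over the context-wide data, so
// fromUser can differ per notification while currentUser stays shared.
function getNotification(contextData, callData) {
  var data = Object.assign({}, contextData, callData);
  var template = resolve(notification, [data.fromUser.gender, data.currentUser.gender]);
  return template.replace('{name}', data.fromUser.name);
}

var contextData = { currentUser: { name: 'Jane', gender: 'feminine' } };

console.log(getNotification(contextData, { fromUser: { name: 'Adam', gender: 'masculine' } }));
console.log(getNotification(contextData, { fromUser: { name: 'Alex' } }));
```

The second call passes no gender for the sender, so the walk stops at the outermost default and the recipient's gender is never consulted.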

L20n is declarative

Question: Can L20n automatically create declensions of unknown nouns?

It's tempting to try to create a framework that, with enough rules codified in it, would be able to automatically decline nouns or conjugate verbs. Sadly, this has proven to be an extremely difficult task in practice, one which has kept linguists and programmers busy for years. Natural languages have too many rules, too many exceptions and are not context-free.

I could imagine tools built into L20n that would help create heuristics for grammatical inflection. For instance, we could add a method to check if a word or an n-tuple starts or ends with a given string of characters. This would likely lead, however, to localizations becoming state machines, imperatively trying to guess the correct inflection. Unfortunately, state machines don't model the intricacies of natural languages well.

L20n takes a declarative approach to localization, working around grammar when needed. Looking at examples of L20n code, it's important to remember that they are not trying to codify the entire logic of the language's grammar, but instead, they aim at declaring possible messages that the user might see in the interface.

In one of the examples I demoed during my talk at meet.js, I built a local entity to store declensions of common city names. The entity only defined a nominative and a locative, as these were the only cases that mattered in the use case. Furthermore, even though the locative case in Polish works with such prepositions as in, inside, on, on top of and about, the example only made use of in. I ended up declaring my own case which I dubbed locativeIn, which included the preposition. This allowed me to easily handle exceptions where the Polish word for in changes slightly to make the whole expression sound better (w Warszawie for in Warsaw, but we Wrocławiu for in Wrocław).
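A declarative table of this kind might look as follows when modelled in JavaScript. This is not the code from the demo: the cities object, the cityForm helper and the surrounding sentence ("Jestem", Polish for "I am") are assumptions made for illustration. Only the two locativeIn forms come from the example described above.

```javascript
// Illustrative model; cities and cityForm are hypothetical names.

// Each city declares only the forms the UI needs. locativeIn is a
// made-up case that bundles the preposition with the noun, so the
// w/we alternation is stored rather than computed.
var cities = {
  warsaw: { nominative: 'Warszawa', locativeIn: 'w Warszawie' },
  wroclaw: { nominative: 'Wrocław', locativeIn: 'we Wrocławiu' }
};

// Look a declared form up. There is no rule engine guessing inflections:
// a form that was never declared is an error, not a guess.
function cityForm(cityId, grammaticalCase) {
  var city = cities[cityId];
  if (!city || !Object.prototype.hasOwnProperty.call(city, grammaticalCase)) {
    throw new Error('No ' + grammaticalCase + ' declared for ' + cityId);
  }
  return city[grammaticalCase];
}

console.log('Jestem ' + cityForm('warsaw', 'locativeIn'));
console.log('Jestem ' + cityForm('wroclaw', 'locativeIn'));
```

The table never tries to derive we Wrocławiu from Wrocław. It simply declares the strings the interface can show, which is the trade-off this section argues for.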

Redundant by design

Question: Do I need to repeat the entire sentence just to change one preposition?

The last example that I showed was a tiny website, created during a single coffee break, that counted down the days and hours to my vacation. The L20n resource file for this example is just one macro and one string, and yet it's 34 lines long. We designed L20n to be like this on purpose.

The goal was to encourage some repetition and redundancy in the syntax in order to allow for more customization. All for the sake of being able to create a better UI.

ICU's MessageFormat and its JavaScript implementation handle redundancy by allowing branching locally inside of localizable strings. While this is technically possible (or almost possible) in L20n by using local entities, I actually think that some redundancy can be healthy and can help improve the readability of L20n code.

If you want to learn more, I posted my thoughts on MessageFormat to the tools-l10n mailing list a few months ago.

Published on 30.06.2013
Permalink: http://informationisart.com/19
