Staś Małolepszy

Syntax changes to L20n data types

We're introducing a few changes to the L20n syntax in the hope of making its data types easier to use and to tell apart.

Objects in L20n can be grouped according to the data type they represent. At the highest level we have:

  • entities -- localizable objects that correspond to messages displayed on the screen, or that store abstracted data to be used and possibly reused in other entities,
  • macros -- named expressions which can accept arguments of primitive types, where 'primitive' refers to strings, numbers, arrays and hashes,
  • context data provided by the developer at runtime -- objects of primitive types which are used to complete the translation, for instance the user's name or the number of seconds remaining in a download.

In the old L20n syntax, you referenced any of these objects simply by writing its name, with no prefix or other modifier.

We found this confusing, mostly because you could never tell whether you were referring to an entity defined in the same file or to a piece of data provided by the developer. It also invited naming conflicts. What if the developer passes a variable called username to the context, but the L20n file already contains an entity called username (possibly added by a different developer)?
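To make the conflict concrete, here is a minimal Python sketch of the old behavior, modeled as one flat namespace shared by entities and context data. It is a hypothetical model, not how the L20n library is implemented: the names `entities`, `context` and `resolve_old`, and the choice that the entity wins the clash, are assumptions made for illustration.

```python
# Hypothetical model of the OLD lookup (not the real L20n runtime):
# entities and developer-provided context data share a single namespace,
# so a bare identifier like `username` is ambiguous.

entities = {"username": "Guest"}  # assumed entity defined in the .l20n file
context = {"username": "Dave"}    # assumed data passed in by the developer


def resolve_old(name: str) -> str:
    """Look a bare identifier up in one merged namespace.

    Whichever source is merged last silently wins; in this sketch the
    entity shadows the developer's context variable.
    """
    merged = {**context, **entities}
    return merged[name]


print(resolve_old("username"))  # Guest (the developer's "Dave" is lost)
```

Had the merge order been reversed, the entity would be the one silently lost; either way, nothing in the localization file tells the reader which value they will get.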

Let's look at some examples and see how the introduced changes solve these problems.

Example 1

In this example we define a plural macro which takes one argument. Then, based on the value the macro returns when called with numOpenTabs (a number passed in by the developer), we select the right text to show in the quitConfirm entity. We also reference the brandName entity.

The old syntax:

    <plural(n) { n == 1 ? 'one' : 'many' }>
    <brandName "Firefox">
    <quitConfirm[plural(numOpenTabs)] {
      one: "Close one open tab and quit {{ brandName }}.",
      many: "Close {{ numOpenTabs }} open tabs and quit {{ brandName }}."
    }>

The new syntax:

    <plural($n) { $n == 1 ? 'one' : 'many' }>
    <brandName "Firefox">
    <quitConfirm[plural($numOpenTabs)] {
      one: "Close one open tab and quit {{ brandName }}.",
      many: "Close {{ $numOpenTabs }} open tabs and quit {{ brandName }}."
    }>

Notice how the variable from the developer is now referred to by $numOpenTabs. The $ prefix now indicates that the target object comes from the developer and is guaranteed to be of a primitive type (number, string, list, hash).

The reference to the brandName entity remains unchanged: a plain brandName, no prefixes.
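The new rule is simple enough to model in a few lines. The following Python sketch is hypothetical (it is not the L20n implementation, and `lookup`, `interpolate` and `quit_confirm` are invented names; the `{{ }}` handling is a bare-bones regex, not a real parser). It reproduces Example 1 and shows that the prefix alone decides which namespace a reference goes to.

```python
import re

# Hypothetical model of the NEW lookup rules (not the real L20n runtime):
# a bare identifier names an entity, a $-prefixed one names context data.

entities = {
    "brandName": "Firefox",
    # Hash value of <quitConfirm[plural($numOpenTabs)] {...}>
    "quitConfirm": {
        "one": "Close one open tab and quit {{ brandName }}.",
        "many": "Close {{ $numOpenTabs }} open tabs and quit {{ brandName }}.",
    },
}


def plural(n: int) -> str:
    """Python stand-in for the <plural($n) ...> macro."""
    return "one" if n == 1 else "many"


def lookup(name: str, context: dict):
    """Route a reference to a namespace based solely on its prefix."""
    if name.startswith("$"):
        return context[name[1:]]  # developer-provided context data
    return entities[name]         # entity defined in the localization file


def interpolate(template: str, context: dict) -> str:
    """Replace every {{ ref }} placeholder with the value lookup() returns."""
    return re.sub(
        r"\{\{\s*(\$?\w+)\s*\}\}",
        lambda m: str(lookup(m.group(1), context)),
        template,
    )


def quit_confirm(context: dict) -> str:
    """Pick the variant via the index, then fill in the placeholders."""
    key = plural(lookup("$numOpenTabs", context))
    return interpolate(entities["quitConfirm"][key], context)


print(quit_confirm({"numOpenTabs": 1}))  # Close one open tab and quit Firefox.
print(quit_confirm({"numOpenTabs": 3}))  # Close 3 open tabs and quit Firefox.
# A context variable that happens to be called brandName no longer collides:
print(quit_confirm({"numOpenTabs": 3, "brandName": "Other"}))  # Close 3 open tabs and quit Firefox.
```

Because the namespace is chosen from the prefix rather than from the order in which sources are searched, the last call resolves brandName to the entity even though the context carries a key with the same name.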

Lastly, the arguments in the macro are prefixed with $ as well. The rationale is threefold:

  1. we didn't want to complicate the syntax with yet another prefix specific to macro arguments; this left us with a choice of using the plain identifier, i.e. n, or using the $ prefix, i.e. $n,
  2. we want users to know that whenever they see a plain identifier, like foo, it's an entity's name; this meant we needed a prefix for macro arguments,
  3. macro arguments can only be primitive values, so using $ is consistent with the syntax for variables from the developer; whenever you see the attribute access operator (the double dot ..) applied to a $-prefixed identifier, you can instantly tell it's a syntax error, since primitive types don't have attributes.
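Point 3 means the check can be purely syntactic: no value needs to be looked up to reject it. Here is a hypothetical Python sketch of such a lint rule. The regex and the name `find_attr_on_primitive` are assumptions for illustration, not part of L20n, and identifiers are assumed to consist of word characters only.

```python
import re

# Hypothetical lint rule (not part of L20n): a $-prefixed reference is always
# primitive, so applying the attribute operator ".." to it is an error.
# Optional ".member" hops (hash access) are allowed before the "..".
ILLEGAL_ATTR_ACCESS = re.compile(r"\$\w+(?:\.\w+)*\.\.\w+")


def find_attr_on_primitive(expression: str) -> list[str]:
    """Return every $-prefixed reference followed by '..attribute'."""
    return ILLEGAL_ATTR_ACCESS.findall(expression)


print(find_attr_on_primitive("brandName..gender"))  # [] (entity attribute, fine)
print(find_attr_on_primitive("$user.gender"))       # [] (hash member, fine)
print(find_attr_on_primitive("$user..gender"))      # ['$user..gender']
```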

However, even though we settled on the same syntax as for context variables, it's important to understand that macro arguments and context variables are not the same thing! Macro arguments are locally scoped and shadow context variables in case of a name conflict. In the following code snippet, $username inside the macro is different from $username in the hello entity: it refers to the argument passed to the macro when the macro is called.

    <hello "Hello, {{ $username }}.">

    <byeMacro($username) { "Bye, {{ $username }}." }>
    <loggedOut "You've been logged out. {{ byeMacro('Dave') }}">

On the other hand, $username in the hello entity is a reference to a context variable provided by the developer.
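The scoping rule behaves like an ordinary scope chain. A minimal Python sketch, assuming a hypothetical runtime where each macro call puts a local scope in front of the context data; the value "Joe" and the function names are invented for illustration:

```python
from collections import ChainMap

# Hypothetical model of $-name resolution (not the real L20n runtime):
# a macro call pushes a local scope holding its arguments in front of the
# developer-provided context, so an argument shadows a context variable.

context = {"username": "Joe"}  # assumed runtime context data


def hello() -> str:
    """Stand-in for <hello "Hello, {{ $username }}.">: reads the context."""
    return f"Hello, {context['username']}."


def bye_macro(username: str) -> str:
    """Stand-in for <byeMacro($username) { "Bye, {{ $username }}." }>."""
    scope = ChainMap({"username": username}, context)  # locals searched first
    return f"Bye, {scope['username']}."


def logged_out() -> str:
    """Stand-in for <loggedOut "... {{ byeMacro('Dave') }}">."""
    return f"You've been logged out. {bye_macro('Dave')}"


print(hello())       # Hello, Joe.
print(logged_out())  # You've been logged out. Bye, Dave.
```

`ChainMap` searches its mappings in order, which is exactly the "local arguments first, then context" rule described above; the context itself is never modified by the macro call.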

A side note: we also considered using the colon (:) both as the prefix for entities and as the attribute access operator. You'd then write :brandName:accesskey to get the value of the accesskey attribute of brandName. While we liked that syntax, it didn't meet our criterion from point 2 above. We also quickly ran into conflicts with the colon in the conditional (ternary) operator, ? :.

Example 2

Let's see how the changes help avoid ambiguity and improve the readability of l20n code.

In the following (slightly contrived, but please bear with me) example, let's assume we're writing firmware for androids, which are sufficiently similar to humans to justify the use of the pronouns he and she instead of it.

In some models, brandName will be defined like this:

    <brandName {
        nom: "Number Eight",
        gen: "Number Eight's"
      }
      gender: "female"
    >

In others, it will look like this:

    <brandName {
        nom: "Number One",
        gen: "Number One's"
      }
      gender: "male"
    >

The rest of the localization code is shared by all models. Here's what it looks like.

The old syntax:

    <firmwareUpdateComplete[brandName..gender] {
      male: "{{ brandName.nom }} has been updated.  He will now reboot.",
      female: "{{ brandName.nom }} has been updated.  She will now reboot."
    }>

    <greeting[user.gender] {
      male: "Hello, Mr. {{ user.lastname }}",
      female: "Hello, Ms. {{ user.lastname }}"
    }>

Both entities take an index, and both look at the gender of a certain object: brandName and user, respectively. Where is user defined, though? And why is it that in the old L20n syntax we'd write brandName..gender with a double dot, but user.gender with a single dot? What's going on here?

The answer is that brandName is an entity and as such, it can have attributes in addition to having a value. The single dot and the double dot are two different operators, serving different purposes:

  • selecting hash values: brandName.nom means 'the value of brandName is a hash; take the member of that hash whose key is nom';
  • selecting attributes: brandName..gender means 'take the value of the attribute called gender which is defined on the brandName entity'.
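The two operators read from two different places on an entity. A hypothetical Python data model makes the split visible; the `Entity` class and the `member`/`attribute` helpers are invented for illustration and are not L20n's internal representation.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical data model (not L20n's internals): an entity has a value
# (here a hash) and, separately, a set of named attributes.


@dataclass
class Entity:
    value: Any
    attributes: dict[str, Any] = field(default_factory=dict)


def member(entity: Entity, key: str) -> Any:
    """entity.key -- index into the entity's hash *value*."""
    return entity.value[key]


def attribute(entity: Entity, name: str) -> Any:
    """entity..name -- read an *attribute* defined on the entity."""
    return entity.attributes[name]


brand_name = Entity(
    value={"nom": "Number Eight", "gen": "Number Eight's"},
    attributes={"gender": "female"},
)

print(member(brand_name, "nom"))        # Number Eight
print(attribute(brand_name, "gender"))  # female
```

In this model, `member(brand_name, "gender")` would raise a `KeyError`: gender is not a key of the value hash, which is why the attribute needs its own operator.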

What's up with user.gender, then? It's a reference to context data available at runtime. This data is defined as a JSON structure, which means that it can only be of a primitive type. For example, the context data might be defined like so:

    {
        "user": {
            "firstname": "Joe",
            "lastname": "Doe",
            "gender": "male"
        }
    }

We can see that user is a simple hash, so we're using the single dot to access its members: user.gender and user.lastname. It's not an entity, so it can't have attributes.
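Seen from the developer's side, this is just parsed JSON. A hypothetical Python sketch of how the greeting entity's selection could work on such data; `greeting_variants` and `greeting` are invented stand-ins, and Python's `str.format` placeholders replace the `{{ }}` syntax for brevity.

```python
import json

# Hypothetical: the developer serializes context data as JSON; once parsed,
# it consists only of plain dicts, lists, strings and numbers.
raw_context = """
{
    "user": {
        "firstname": "Joe",
        "lastname": "Doe",
        "gender": "male"
    }
}
"""
context = json.loads(raw_context)

# Stand-in for the hash value of <greeting[user.gender] {...}>.
greeting_variants = {
    "male": "Hello, Mr. {lastname}",
    "female": "Hello, Ms. {lastname}",
}


def greeting(ctx: dict) -> str:
    user = ctx["user"]                                  # a plain hash
    template = greeting_variants[user["gender"]]        # user.gender
    return template.format(lastname=user["lastname"])   # user.lastname


print(greeting(context))  # Hello, Mr. Doe
```

Nothing in the old L20n source itself, however, signals that user comes from this JSON rather than from an entity.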

This is precisely where things can get confusing. How is the localizer supposed to know that user is a context variable provided by the developer? How is she supposed to know, on the other hand, that brandName isn't? Finally, how do we handle name conflicts?

Enter the new syntax.

The new syntax:

    <firmwareUpdateComplete[brandName..gender] {
      male: "{{ brandName.nom }} has been updated.  He will now reboot.",
      female: "{{ brandName.nom }} has been updated.  She will now reboot."
    }>

    <greeting[$user.gender] {
      male: "Hello, Mr. {{ $user.lastname }}",
      female: "Hello, Ms. {{ $user.lastname }}"
    }>

The localizer can easily tell the difference between brandName and $user, and it is now clear where each of the two is defined. Because entities and context variables no longer share a namespace, the problem of name conflicts (and scoping) disappears entirely.

Everyone wins.

A new data type: globals

As we were solidifying the above changes, we realized that this was a perfect opportunity to address another data type-related idea we had been throwing around: globals. Globals are variables exposed by the L20n library itself, holding interesting data about the runtime environment. Think @os, @time, @region and @screenWidth.

I'll go into details in my next post.


The discussion takes place in the l20n newsgroup. Please post your thoughts and questions there. Thanks!

Published on 20.04.2012
