Posts tagged ‘i18n’


So in my last post about Internationalization, I covered some non-obvious things that you should consider when adding translation capabilities to your code.

Today, let’s add to that by covering some non-obvious translation functions. You’re probably not using these, since they don’t get talked about as much. But there’s probably places where you should be using them, so knowing about them is the first step. And knowing is half the battle.

Basic functions, again

Last time I talked about these functions:

  • __()
  • _e()
  • _x()
  • _ex()
  • _n()

Let’s cover the ones I didn’t talk about.

Escaping output

In practice, you tend to use these mostly when outputting things onto the main page or in the admin. But, one thing you also use a lot when outputting text is the standard escaping functions. These are things like esc_html(), which outputs text in a way that makes it “safe” to go onto a webpage, without being interpreted as HTML. If the text comes from user input, then this is a good idea.

Now, if you think about it, then the text you have may be translated in some other file, which you don’t control either. So escaping that text might be a good idea too. If somebody snuck bad code into a translation file, a user might get bad things displayed without being able to easily find it.

So you could write something like echo esc_html(__('text','text-domain')), but that’s a bit wordy. Let’s talk about some shortcuts.

The esc_html__() function is the equivalent of esc_html(__(...)). It does the escaped html and the double-underscore translation all in one go. Similarly, the esc_html_e() function does the same thing, but it echoes the result, just like the _e() function would. And there’s also esc_html_x(), which is the equivalent of combining esc_html() and _x().

Along with those three are the three identical equivalents for attributes: esc_attr__(), esc_attr_e(), and esc_attr_x(). As the name implies, these combine the translation functions with esc_attr(), which is the escape function specifically intended when you’re outputting text into html attributes.

Also note there’s no shortcut for the equivalent of _ex(). It’s just not used that much, or at least not enough to need something special for it. Use an echo esc_html_x() instead.

There are no shortcuts for the other escaping functions as yet, but these can save a few keystrokes and make your code that much more readable.

The Numerical No-op

So we’ve got some shortcuts for escaping with those three functions, but where’s the love for _n()?

One of the problems with _n() is that it tends to require the strings to be in the same place that the PHP variable is. For all the other functions, you could have a big file of strings in an array, and then reference those strings by name or something elsewhere because they don’t require any PHP variables. Nothing about them is computed at the time of the output.

But not so with _n(), that $number to decide which string to use means that the strings have to be right there, they can’t be translated separately and referenced.

This is where _n_noop() comes in. The _n_noop() function basically takes the singular and plural strings for something, along with the text domain, and stores them in an array so that they can be referenced later by a function named translate_nooped_plural().

Perhaps an example is in order. Let’s go back to the tacos:

$string = sprintf( _n('You have %d taco.', 'You have %d tacos.', $number, 'plugin-domain'), $number );

What if we wanted those strings somewhere else? Like in a big file with all of our strings. Here’s a way to separate the strings from the _n() call:

$taco_plural = _n_noop('You have %d taco.', 'You have %d tacos.', 'plugin-domain');
$string = sprintf( translate_nooped_plural( $taco_plural, $number) , $number );

Now, that $taco_plural can be defined anywhere. Note that it contains no references to PHP variables. It’s basically static and unchanging. This allows us to separate it, then reference it elsewhere for the actual translation. The translate_nooped_plural() function performs the same job as _n() does, choosing which string to use based on the $number of tacos. The sprintf then pushes the $number into the chosen string, replacing the %d with the number.

Thus, that lets us extract the translatable strings out and put them anywhere we choose.

Also of note: The _nx_noop() function is a cross between _n_noop() and _x(). It takes a context for the translators as the third argument, and the domain becomes the fourth argument. Useful if you need to explain to the translators the context surrounding the pluralization choice.

Numbers and Dates

The number_format_i18n() function is functionally equivalent to the PHP number_format function. It lets you format numbers with commas at the thousands mark and so forth, except that it also takes localization into account. Not everybody uses commas for thousands and periods for decimals. This function will do the translation appropriately for that aspect.

The date_i18n() function is functionally equivalent to the PHP date function. It will handle all the same string formatting parameters as date() will, but it will cause output to be translated for month names, day-of-week names, and so forth. Of note is that it doesn’t change the format requested. If some places put days before months, for example, it won’t handle that. But it will output the month name in the native language (if the translation pack has the right month name in it). So you may want to run the date formatting string through __() as well, to let translators adjust the date format accordingly.

Wrap up

And that’s pretty much all the rest of the translation functions that I didn’t cover before. I may have forgotten a few useful ones here or there. Feel free to comment about anything I missed, or what you see most often, especially if you’re doing translations yourself.

Shortlink:

Fun fact of the day: about 37% of WordPress downloads are for non-English, localized versions.

So as a plugin or theme author, you should be thinking of localization and internationalization (L10N and I18N) as pretty much a fact of life by this point.

Fun total guess of the day: based on my experience in browsing through the thing, roughly, ohh… all plugins and themes in the directory are doing-it-wrong in some manner.

Yes friends, even my code is guilty of this to some degree.

It’s understandable. When you’re writing the thing, generally you’re working on the functionality, not form. So you put strings in and figure “hey, no biggie, I can come back and add in the I18N stuff later.” Sometimes you even come back and do that later.

And you know what? You probably still get it wrong. I did. I still often do.

The reason you are getting it wrong is because doing I18N right is non-obvious. There’s tricks there, and rules that apply outside of the normal PHP ways of doing things.

So here’s the unbreakable laws of I18N, as pertaining to WordPress plugins and themes.

Note: This is not a tutorial, as such. You are expected to already be translating your code in some way, and to have a basic grasp on it. What I’m going to show you is stuff you are probably already doing, but which is wrong. With any luck, you will have much slapping-of-the-head during this read, since I’m hoping to give you that same insight I had, when I finally “got it”.

Also note: These are laws, folks. Not suggestions. Thou shalt not break them. They are not up for debate. What I’m going to present to you here today is provably correct. Sorry, I like a good argument as much as the next guy, but arguing against these just makes you wrong.

Basic I18N functions

First, lets quickly cover the two top translation functions. There’s more later, and the laws apply to them too, but these are the ones everybody should know and make the easiest examples.

The base translation function is __(). That’s the double-underscore function. It takes a string and translates it, according to the localization settings, then returns the string.

Then there’s the shortcut function of _e(). It does the same, but it echoes the result instead.

There’s several functions based around these, such as esc_attr_e() for example. These functions all behave identically to their counterparts put together. The esc_attr_e() function first runs the string through __(), then it does esc_attr() on it, then it echo’s it. These are named in a specific way so as to work with existing translation tools. All the following laws apply to them in the exact same way.

So, right down to it then.

Law the First: Thou shalt not use PHP variables of any kind inside a translation function’s strings.

This code is obviously wrong, or it should be:

$string = __($string, 'plugin-domain');

The reason you never do this is because translation relies on looking up strings in a table and then translating them. However, that list of strings to be translated is built by an automated process. Some code scans your PHP code, without executing it, and pulls out all the __()’s it finds, then builds the list of strings to be translated. That scanning code cannot possibly know what is inside $string.

However, sometimes it’s more subtle than that. For example, this is also wrong:

$string = __("You have $number tacos", 'plugin-domain');

The translated string here will be something like ‘You have 12 tacos’, but the scanning code can’t know what $number is in advance, nor is it feasible to expect your translators to translate all cases of what $number could be anyway.

Basically, double quoted strings in translation functions are always suspect, and probably wrong. But that rule can’t be hard and fast, because using string operations like ‘You have ‘.$number.’ tacos’ is equally wrong, for the exact same reason.

Here’s a couple of wrongs that people like to argue with:

$string = __('You have 12 tacos', $plugin_domain);
$string = __('You have 12 tacos', PLUGIN_DOMAIN);

These are both cases of the same thing. Basically, you decided that repetition is bad, so you define the plugin domain somewhere central, then reference it everywhere.

Mark Jaquith went into some detail on why this is wrong on his blog, so I will refer you to that, but I’ll also espouse a general principle here.

I said this above, and I’m going to repeat it: “that list of strings to be translated is built by an automated process“. When I’m making some code to read your code and parse it, I’m not running your code. I’m parsing it. And while the general simplistic case of building a list of strings does not require me to know your plugin’s text domain, a more complicated case might. There are legitimate reasons that we want your domain to be plain text and not some kind of variable.

For starters, what if we did something like make a system where you could translate your strings right on the wordpress.org website? Or build a system where you could enlist volunteer translators to translate your strings for you? Or made a system where people could easily download localized versions of your plugin, with the relevant translations already included?

These are but a few ideas, but for all of them, that text domain must be a plain string. Not a variable. Not a define.

Bottom line: Inside all translation functions, no PHP variables are allowed in the strings, for any reason, ever. Plain single-quoted strings only.

Law the Second: Thou shalt always translate phrases and not words.

One way people often try to get around not using variables is like the following:

$string = __('You have ', 'plugin') . $number . __(' tacos', 'plugin-domain');

No! Bad coder! Bad!

English is a language of words. Other languages are not as flexible. In some other languages, the subject comes first. Your method doesn’t work here, unless the localizer makes “tacos” into “you have” and vice-versa.

This is the correct way:

$string = sprintf( __('You have %d tacos', 'plugin-domain'), $number );

The localizer doing your translation can then write the equivalent in his language, leaving the %d in the right place. Note that in this case, the %d is not a PHP variable, it’s a placeholder for the number.

In fact, this is a good place to introduce a new function to deal with pluralization. Nobody has “1 tacos”. So we can write this:

$string = sprintf( _n('You have %d taco.', 'You have %d tacos.', $number, 'plugin-domain'), $number );

The _n function is a translation function that picks the first string if the $number (third parameter to _n) is one, or the second one if it’s more than one. We still have to use the sprintf to replace the placeholder with the actual number, but now the pluralization can be translated separately, and as part of the whole phrase. Note that the last argument to _n is still the plugin text domain to be used.

Note that some languages have more than just a singular and a plural form. You may need special handling sometimes, but this will get you there most of the time. Polish in particular has pluralization rules that have different words for 1, for numbers ending in 2, 3, and 4, and for numbers ending in 5-1 (except 1 itself). That’s okay, _n can handle these special cases with special pluralization handling in the translator files, and you generally don’t need to worry about it as long as you specify the plural form in a sane way, using the whole phrase.

You might also note that _n() is the one and only translation function that can have a PHP variable in it. This is because that third variable is always going to be a number, not a string. Therefore no automated process that builds strings from scanning code will care about what it is. You do need to take care than the $number in _n is always a number though. It will not be using that $number to insert into the string, it will be selecting which string to use based on its value.

Now, using placeholders can be complex, since sometimes things will have to be reversed. Take this example:

$string = sprintf( __('You have %d tacos and %d burritos', 'plugin-domain'), $taco_count, $burrito_count );

What if a language has some strange condition where they would never put tacos before burritos? It just wouldn’t be done. The translator would have to rewrite this to have the burrito count first. But he can’t, the placeholders are such that the $taco_count is expected to be first in the sprintf. The solution:

$string = sprintf( __('You have %1$d tacos and %2$d burritos', 'plugin-domain'), $taco_count, $burrito_count );

The %1$d and such is an alternate form that PHP allows called “argument swapping“. In this case, the translator could write it correctly, but put the burritos before the tacos by simply putting %2$d before %1$d in the string.

Note that when you use argument swapping, that single-quoted string thing becomes very important. If you have “%1$s” in double quotes, then PHP will see that $s and try to put your $s variable in there. In at least one case, this has caused an accidental Cross-Site-Scripting security issue.

So repeat after me: “I will always only use single-quoted strings in I18N functions.” There. Now you’re safe again. This probably should be a law, but since it’s safe to use double-quoted strings as long as you don’t use PHP variables (thus breaking the first law), I’ll just leave you to think about it instead. 🙂

Law the Third: Thou shalt disambiguate when needed.

When I say “comment” to you, am I talking about a comment on my site, or am I asking you to make a comment? How about “test”? Or even “buffalo”?

English has words and phrases that can have different meanings depending on context. In other languages, these same concepts can be different words or phrases entirely. To help translators out, use the _x() function for them.

The _x() function is similar to the __() function, but it has a comment section where the context can be specified.

$string = _x( 'Buffalo', 'an animal', 'plugin-domain' );
$string = _x( 'Buffalo', 'a city in New York', 'plugin-domain' );
$string = _x( 'Buffalo', 'a verb meaning to confuse somebody', 'plugin-domain' );

Though these strings are identical, the translators will get separated strings, along with the explanation of what they are, and they can translate them accordingly.

And just like __() has _e() for immediate echoing, _x() has _ex() for the same thing. Use as needed.

Finally, this last one isn’t a law so much as something that annoys me. You’re free to argue about it if you like. 🙂

Annoyance the First: Thou shalt not put unnecessary HTML markup into the translated string.

$string = sprintf( __('<h3>You have %d tacos</h3>', 'plugin-domain'), $number );

Why would you give the power to the translator to insert markup changes to your code? Markup should be eliminated from your translated strings wherever possible. Put it outside your strings instead.

$string = '<h3>'.sprintf( __('You have %d tacos', 'plugin-domain'), $number ).'</h3>';

Note that sometimes though, it’s perfectly acceptable. If you’re adding emphasis to a specific word, then that emphasis might be different in other languages. This is pretty rare though, and sometimes you can pull it out entirely. If I wanted a bold number of tacos, I’d use this:

$string = sprintf( __('You have %s tacos', 'plugin-domain'), '<strong>'.$number.'</strong>' );

Or more preferably, the _n version of same that I discussed above.

Conclusion

Like I said at the beginning, we’ve all done these. I’ve broken all these laws of I18N in the past (I know some of my plugins still do), only to figure out that I was doing-it-wrong. Hopefully, you’ve spotted something here you’ve done (or are currently doing) and have realized from reading this exactly why your code is broken. The state of I18N in plugins and themes is pretty low, and that’s something I’d really like to get fixed in the long run. With any luck, this article will help. 🙂

Disclaimer: Yes, I wrote this while hungry.

Shortlink: