Fun fact of the day: about 37% of WordPress downloads are for non-English, localized versions.
So as a plugin or theme author, you should be thinking of localization and internationalization (L10N and I18N) as pretty much a fact of life by this point.
Fun total guess of the day: based on my experience in browsing through the thing, roughly, ohh… all plugins and themes in the directory are doing-it-wrong in some manner.
Yes friends, even my code is guilty of this to some degree.
It’s understandable. When you’re writing the thing, generally you’re working on the functionality, not form. So you put strings in and figure “hey, no biggie, I can come back and add in the I18N stuff later.” Sometimes you even come back and do that later.
And you know what? You probably still get it wrong. I did. I still often do.
The reason you are getting it wrong is because doing I18N right is non-obvious. There’s tricks there, and rules that apply outside of the normal PHP ways of doing things.
So here’s the unbreakable laws of I18N, as pertaining to WordPress plugins and themes.
Note: This is not a tutorial, as such. You are expected to already be translating your code in some way, and to have a basic grasp on it. What I’m going to show you is stuff you are probably already doing, but which is wrong. With any luck, you will have much slapping-of-the-head during this read, since I’m hoping to give you that same insight I had, when I finally “got it”.
Also note: These are laws, folks. Not suggestions. Thou shalt not break them. They are not up for debate. What I’m going to present to you here today is provably correct. Sorry, I like a good argument as much as the next guy, but arguing against these just makes you wrong.
Basic I18N functions
First, lets quickly cover the two top translation functions. There’s more later, and the laws apply to them too, but these are the ones everybody should know and make the easiest examples.
The base translation function is __(). That’s the double-underscore function. It takes a string and translates it, according to the localization settings, then returns the string.
Then there’s the shortcut function of _e(). It does the same, but it echoes the result instead.
There’s several functions based around these, such as esc_attr_e() for example. These functions all behave identically to their counterparts put together. The esc_attr_e() function first runs the string through __(), then it does esc_attr() on it, then it echo’s it. These are named in a specific way so as to work with existing translation tools. All the following laws apply to them in the exact same way.
So, right down to it then.
Law the First: Thou shalt not use PHP variables of any kind inside a translation function’s strings.
This code is obviously wrong, or it should be:
$string = __($string, 'plugin-domain');
The reason you never do this is because translation relies on looking up strings in a table and then translating them. However, that list of strings to be translated is built by an automated process. Some code scans your PHP code, without executing it, and pulls out all the __()’s it finds, then builds the list of strings to be translated. That scanning code cannot possibly know what is inside $string.
However, sometimes it’s more subtle than that. For example, this is also wrong:
$string = __("You have $number tacos", 'plugin-domain');
The translated string here will be something like ‘You have 12 tacos’, but the scanning code can’t know what $number is in advance, nor is it feasible to expect your translators to translate all cases of what $number could be anyway.
Basically, double quoted strings in translation functions are always suspect, and probably wrong. But that rule can’t be hard and fast, because using string operations like ‘You have ‘.$number.’ tacos’ is equally wrong, for the exact same reason.
Here’s a couple of wrongs that people like to argue with:
$string = __('You have 12 tacos', $plugin_domain);
$string = __('You have 12 tacos', PLUGIN_DOMAIN);
These are both cases of the same thing. Basically, you decided that repetition is bad, so you define the plugin domain somewhere central, then reference it everywhere.
Mark Jaquith went into some detail on why this is wrong on his blog, so I will refer you to that, but I’ll also espouse a general principle here.
I said this above, and I’m going to repeat it: “that list of strings to be translated is built by an automated process“. When I’m making some code to read your code and parse it, I’m not running your code. I’m parsing it. And while the general simplistic case of building a list of strings does not require me to know your plugin’s text domain, a more complicated case might. There are legitimate reasons that we want your domain to be plain text and not some kind of variable.
For starters, what if we did something like make a system where you could translate your strings right on the wordpress.org website? Or build a system where you could enlist volunteer translators to translate your strings for you? Or made a system where people could easily download localized versions of your plugin, with the relevant translations already included?
These are but a few ideas, but for all of them, that text domain must be a plain string. Not a variable. Not a define.
Bottom line: Inside all translation functions, no PHP variables are allowed in the strings, for any reason, ever. Plain single-quoted strings only.
Law the Second: Thou shalt always translate phrases and not words.
One way people often try to get around not using variables is like the following:
$string = __('You have ', 'plugin') . $number . __(' tacos', 'plugin-domain');
No! Bad coder! Bad!
English is a language of words. Other languages are not as flexible. In some other languages, the subject comes first. Your method doesn’t work here, unless the localizer makes “tacos” into “you have” and vice-versa.
This is the correct way:
$string = sprintf( __('You have %d tacos', 'plugin-domain'), $number );
The localizer doing your translation can then write the equivalent in his language, leaving the %d in the right place. Note that in this case, the %d is not a PHP variable, it’s a placeholder for the number.
In fact, this is a good place to introduce a new function to deal with pluralization. Nobody has “1 tacos”. So we can write this:
$string = sprintf( _n('You have %d taco.', 'You have %d tacos.', $number, 'plugin-domain'), $number );
The _n function is a translation function that picks the first string if the $number (third parameter to _n) is one, or the second one if it’s more than one. We still have to use the sprintf to replace the placeholder with the actual number, but now the pluralization can be translated separately, and as part of the whole phrase. Note that the last argument to _n is still the plugin text domain to be used.
Note that some languages have more than just a singular and a plural form. You may need special handling sometimes, but this will get you there most of the time. Polish in particular has pluralization rules that have different words for 1, for numbers ending in 2, 3, and 4, and for numbers ending in 5-1 (except 1 itself). That’s okay, _n can handle these special cases with special pluralization handling in the translator files, and you generally don’t need to worry about it as long as you specify the plural form in a sane way, using the whole phrase.
You might also note that _n() is the one and only translation function that can have a PHP variable in it. This is because that third variable is always going to be a number, not a string. Therefore no automated process that builds strings from scanning code will care about what it is. You do need to take care than the $number in _n is always a number though. It will not be using that $number to insert into the string, it will be selecting which string to use based on its value.
Now, using placeholders can be complex, since sometimes things will have to be reversed. Take this example:
$string = sprintf( __('You have %d tacos and %d burritos', 'plugin-domain'), $taco_count, $burrito_count );
What if a language has some strange condition where they would never put tacos before burritos? It just wouldn’t be done. The translator would have to rewrite this to have the burrito count first. But he can’t, the placeholders are such that the $taco_count is expected to be first in the sprintf. The solution:
$string = sprintf( __('You have %1$d tacos and %2$d burritos', 'plugin-domain'), $taco_count, $burrito_count );
The %1$d and such is an alternate form that PHP allows called “argument swapping“. In this case, the translator could write it correctly, but put the burritos before the tacos by simply putting %2$d before %1$d in the string.
Note that when you use argument swapping, that single-quoted string thing becomes very important. If you have “%1$s” in double quotes, then PHP will see that $s and try to put your $s variable in there. In at least one case, this has caused an accidental Cross-Site-Scripting security issue.
So repeat after me: “I will always only use single-quoted strings in I18N functions.” There. Now you’re safe again. This probably should be a law, but since it’s safe to use double-quoted strings as long as you don’t use PHP variables (thus breaking the first law), I’ll just leave you to think about it instead. 🙂
Law the Third: Thou shalt disambiguate when needed.
When I say “comment” to you, am I talking about a comment on my site, or am I asking you to make a comment? How about “test”? Or even “buffalo”?
English has words and phrases that can have different meanings depending on context. In other languages, these same concepts can be different words or phrases entirely. To help translators out, use the _x() function for them.
The _x() function is similar to the __() function, but it has a comment section where the context can be specified.
$string = _x( 'Buffalo', 'an animal', 'plugin-domain' );
$string = _x( 'Buffalo', 'a city in New York', 'plugin-domain' );
$string = _x( 'Buffalo', 'a verb meaning to confuse somebody', 'plugin-domain' );
Though these strings are identical, the translators will get separated strings, along with the explanation of what they are, and they can translate them accordingly.
And just like __() has _e() for immediate echoing, _x() has _ex() for the same thing. Use as needed.
Finally, this last one isn’t a law so much as something that annoys me. You’re free to argue about it if you like. 🙂
Annoyance the First: Thou shalt not put unnecessary HTML markup into the translated string.
$string = sprintf( __('<h3>You have %d tacos</h3>', 'plugin-domain'), $number );
Why would you give the power to the translator to insert markup changes to your code? Markup should be eliminated from your translated strings wherever possible. Put it outside your strings instead.
$string = '<h3>'.sprintf( __('You have %d tacos', 'plugin-domain'), $number ).'</h3>';
Note that sometimes though, it’s perfectly acceptable. If you’re adding emphasis to a specific word, then that emphasis might be different in other languages. This is pretty rare though, and sometimes you can pull it out entirely. If I wanted a bold number of tacos, I’d use this:
$string = sprintf( __('You have %s tacos', 'plugin-domain'), '<strong>'.$number.'</strong>' );
Or more preferably, the _n version of same that I discussed above.
Like I said at the beginning, we’ve all done these. I’ve broken all these laws of I18N in the past (I know some of my plugins still do), only to figure out that I was doing-it-wrong. Hopefully, you’ve spotted something here you’ve done (or are currently doing) and have realized from reading this exactly why your code is broken. The state of I18N in plugins and themes is pretty low, and that’s something I’d really like to get fixed in the long run. With any luck, this article will help. 🙂
Disclaimer: Yes, I wrote this while hungry.