Fun fact of the day: about 37% of WordPress downloads are for non-English, localized versions.

So as a plugin or theme author, you should be thinking of localization and internationalization (L10N and I18N) as pretty much a fact of life by this point.

Fun total guess of the day: based on my experience in browsing through the thing, roughly, ohh… all plugins and themes in the directory are doing-it-wrong in some manner.

Yes friends, even my code is guilty of this to some degree.

It’s understandable. When you’re writing the thing, generally you’re working on the functionality, not form. So you put strings in and figure “hey, no biggie, I can come back and add in the I18N stuff later.” Sometimes you even come back and do that later.

And you know what? You probably still get it wrong. I did. I still often do.

The reason you are getting it wrong is because doing I18N right is non-obvious. There’s tricks there, and rules that apply outside of the normal PHP ways of doing things.

So here’s the unbreakable laws of I18N, as pertaining to WordPress plugins and themes.

Note: This is not a tutorial, as such. You are expected to already be translating your code in some way, and to have a basic grasp on it. What I’m going to show you is stuff you are probably already doing, but which is wrong. With any luck, you will have much slapping-of-the-head during this read, since I’m hoping to give you that same insight I had, when I finally “got it”.

Also note: These are laws, folks. Not suggestions. Thou shalt not break them. They are not up for debate. What I’m going to present to you here today is provably correct. Sorry, I like a good argument as much as the next guy, but arguing against these just makes you wrong.

Basic I18N functions

First, let’s quickly cover the two top translation functions. There’s more later, and the laws apply to them too, but these are the ones everybody should know and make the easiest examples.

The base translation function is __(). That’s the double-underscore function. It takes a string and translates it, according to the localization settings, then returns the string.

Then there’s the shortcut function of _e(). It does the same, but it echoes the result instead.

There’s several functions based around these, such as esc_attr_e() for example. These functions all behave identically to their counterparts put together. The esc_attr_e() function first runs the string through __(), then it does esc_attr() on it, then it echoes it. These are named in a specific way so as to work with existing translation tools. All the following laws apply to them in the exact same way.
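As a rough illustration of that "put together" behavior, here is a minimal sketch. The fallback definitions of __() and esc_attr() are my own stand-ins so the snippet runs outside WordPress (they assume no translation is loaded and approximate the escaping with htmlspecialchars()); inside WordPress the real functions are used instead.

```php
<?php
// Fallback stubs, only defined when WordPress is not loaded (assumption:
// no translation file exists, so the string comes back unchanged).
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        // Rough approximation of WordPress's attribute escaping.
        return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Translate first, then escape for use inside an HTML attribute.
$title = esc_attr( __( 'Save "draft"', 'plugin-domain' ) );

// Echoing $title is what esc_attr_e( 'Save "draft"', 'plugin-domain' )
// does in a single call inside WordPress.
echo '<input type="submit" title="' . $title . '">', "\n";
```

The order matters: the escaping is applied to the translated text, so whatever the translator wrote is what gets made safe for the attribute.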

So, right down to it then.

Law the First: Thou shalt not use PHP variables of any kind inside a translation function’s strings.

This code is obviously wrong, or it should be:

$string = __($string, 'plugin-domain');

The reason you never do this is because translation relies on looking up strings in a table and then translating them. However, that list of strings to be translated is built by an automated process. Some code scans your PHP code, without executing it, and pulls out all the __()’s it finds, then builds the list of strings to be translated. That scanning code cannot possibly know what is inside $string.

However, sometimes it’s more subtle than that. For example, this is also wrong:

$string = __("You have $number tacos", 'plugin-domain');

The string that gets looked up at runtime here will be something like ‘You have 12 tacos’, but the scanning code can’t know what $number is in advance, nor is it feasible to expect your translators to translate all cases of what $number could be anyway.

Basically, double quoted strings in translation functions are always suspect, and probably wrong. But that rule can’t be hard and fast, because using string operations like 'You have ' . $number . ' tacos' is equally wrong, for the exact same reason.
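To make the scanning problem concrete, here is a deliberately simplified sketch of a string extractor. This is not the real tooling; demo_extract_strings() is a hypothetical function of mine, and its regular expression only recognizes a plain single-quoted literal as the first argument to __(). The point is just that it reads the code as text and never runs it.

```php
<?php
// Hypothetical, simplified stand-in for a string extraction tool.
// It scans source text (without executing it) and collects only plain
// single-quoted literals passed as the first argument to __().
function demo_extract_strings( $source ) {
    $pattern = '/\b__\(\s*\'([^\']*)\'\s*[,)]/';
    preg_match_all( $pattern, $source, $matches );
    return $matches[1];
}

// The four forms discussed above, held as text in a nowdoc so that
// nothing in them is evaluated.
$source = <<<'SRC'
$a = __('You have 12 tacos', 'plugin-domain');
$b = __($string, 'plugin-domain');
$c = __("You have $number tacos", 'plugin-domain');
$d = __('You have ' . $number . ' tacos', 'plugin-domain');
SRC;

$found = demo_extract_strings( $source );
print_r( $found );
```

Only the first line yields a translatable string. The variable, the interpolated string, and the concatenation give the scanner nothing it can hand to a translator.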

Here’s a couple of wrongs that people like to argue with:

$string = __('You have 12 tacos', $plugin_domain);
$string = __('You have 12 tacos', PLUGIN_DOMAIN);

These are both cases of the same thing. Basically, you decided that repetition is bad, so you define the plugin domain somewhere central, then reference it everywhere.

Mark Jaquith went into some detail on why this is wrong on his blog, so I will refer you to that, but I’ll also espouse a general principle here.

I said this above, and I’m going to repeat it: “that list of strings to be translated is built by an automated process”. When I’m making some code to read your code and parse it, I’m not running your code. I’m parsing it. And while the general simplistic case of building a list of strings does not require me to know your plugin’s text domain, a more complicated case might. There are legitimate reasons that we want your domain to be plain text and not some kind of variable.

For starters, what if we did something like make a system where you could translate your strings right on the wordpress.org website? Or build a system where you could enlist volunteer translators to translate your strings for you? Or made a system where people could easily download localized versions of your plugin, with the relevant translations already included?

These are but a few ideas, but for all of them, that text domain must be a plain string. Not a variable. Not a define.
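Here is a small sketch of why the domain matters to a tool. Again, this is an illustration under my own assumptions: demo_strings_by_domain() is a hypothetical function, and it only understands a single-quoted string followed by a single-quoted domain. Any tool that works by reading code has the same basic limitation, though.

```php
<?php
// Hypothetical scanner that groups translatable strings by text domain.
// It reads source text only; it cannot resolve variables or constants.
function demo_strings_by_domain( $source ) {
    $pattern = '/\b__\(\s*\'([^\']*)\'\s*,\s*\'([^\']*)\'\s*\)/';
    preg_match_all( $pattern, $source, $matches, PREG_SET_ORDER );

    $by_domain = array();
    foreach ( $matches as $match ) {
        // $match[1] is the string, $match[2] is the literal domain.
        $by_domain[ $match[2] ][] = $match[1];
    }
    return $by_domain;
}

$plugin_source = <<<'SRC'
$a = __('You have 12 tacos', 'plugin-domain');
$b = __('You have 12 burritos', $plugin_domain);
$c = __('You have 12 nachos', PLUGIN_DOMAIN);
SRC;

$by_domain = demo_strings_by_domain( $plugin_source );
print_r( $by_domain );
```

Only the tacos end up filed under 'plugin-domain'. The burritos and nachos have a domain the scanner cannot read, so it has no idea which plugin they belong to.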

Bottom line: Inside all translation functions, no PHP variables are allowed in the strings, for any reason, ever. Plain single-quoted strings only.

Law the Second: Thou shalt always translate phrases and not words.

One way people often try to get around not using variables is like the following:

$string = __('You have ', 'plugin-domain') . $number . __(' tacos', 'plugin-domain');

No! Bad coder! Bad!

English is a language of words. Other languages are not as flexible. In some other languages, the subject comes first. Your method doesn’t work here, unless the localizer makes “tacos” into “you have” and vice-versa.

This is the correct way:

$string = sprintf( __('You have %d tacos', 'plugin-domain'), $number );

The localizer doing your translation can then write the equivalent in his language, leaving the %d in the right place. Note that in this case, the %d is not a PHP variable, it’s a placeholder for the number.

In fact, this is a good place to introduce a new function to deal with pluralization. Nobody has “1 tacos”. So we can write this:

$string = sprintf( _n('You have %d taco.', 'You have %d tacos.', $number, 'plugin-domain'), $number );

The _n function is a translation function that picks the first string if the $number (third parameter to _n) is one, or the second one if it’s anything else (in English, that includes zero). We still have to use the sprintf to replace the placeholder with the actual number, but now the pluralization can be translated separately, and as part of the whole phrase. Note that the last argument to _n is still the plugin text domain to be used.

Note that some languages have more than just a singular and a plural form. You may need special handling sometimes, but this will get you there most of the time. Polish in particular has pluralization rules that use different forms for 1, for numbers ending in 2, 3, and 4 (except 12 through 14), and for everything else: numbers ending in 5 through 9, in 0, or in 1 (except 1 itself). That’s okay, _n can handle these special cases with special pluralization handling in the translator files, and you generally don’t need to worry about it as long as you specify the plural form in a sane way, using the whole phrase.
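For the curious, the Polish rule can be sketched as a function that maps a number to one of three form indexes. The function name is hypothetical and this is only an illustration of the rule as it is usually written in a translation file's plural header; you never write this yourself in a plugin, since the translation file supplies it and _n applies it.

```php
<?php
// Illustrative only: which of three plural forms Polish uses for $n.
// 0 = singular, 1 = the form for 2-4, 2 = the form for everything else.
function demo_polish_plural_index( $n ) {
    if ( 1 === $n ) {
        return 0;
    }
    $last_digit      = $n % 10;
    $last_two_digits = $n % 100;
    // Ends in 2, 3, or 4, but is not 12, 13, or 14.
    if ( $last_digit >= 2 && $last_digit <= 4
        && ( $last_two_digits < 10 || $last_two_digits >= 20 ) ) {
        return 1;
    }
    return 2;
}

foreach ( array( 0, 1, 2, 5, 12, 21, 22 ) as $n ) {
    echo $n, ' => form ', demo_polish_plural_index( $n ), "\n";
}
```

This is why the $number passed to _n has to be the real number: the choice of form depends on its exact value, not just on whether it is one.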

You might also note that _n() is the one and only translation function that can have a PHP variable in it. This is because that third parameter is always going to be a number, not a string. Therefore no automated process that builds strings from scanning code will care about what it is. You do need to take care that the $number in _n is always a number though. It will not be using that $number to insert into the string, it will be selecting which string to use based on its value.

Now, using placeholders can be complex, since sometimes things will have to be reversed. Take this example:

$string = sprintf( __('You have %d tacos and %d burritos', 'plugin-domain'), $taco_count, $burrito_count );

What if a language has some strange condition where they would never put tacos before burritos? It just wouldn’t be done. The translator would have to rewrite this to have the burrito count first. But he can’t, the placeholders are such that the $taco_count is expected to be first in the sprintf. The solution:

$string = sprintf( __('You have %1$d tacos and %2$d burritos', 'plugin-domain'), $taco_count, $burrito_count );

The %1$d and such is an alternate form that PHP allows called “argument swapping”. In this case, the translator could write it correctly, but put the burritos before the tacos by simply putting %2$d before %1$d in the string.
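Argument swapping is plain PHP, so it is easy to try outside WordPress. In this sketch the $reordered string is just a stand-in for what a translator might hand back (it is still English here so you can read it); the counts are made-up values.

```php
<?php
$taco_count    = 3;
$burrito_count = 5;

// What the developer writes.
$original = 'You have %1$d tacos and %2$d burritos';

// Stand-in for a translation that needs the burritos first.
$reordered = 'You have %2$d burritos and %1$d tacos';

// The sprintf arguments stay in the same order in both calls.
$first  = sprintf( $original, $taco_count, $burrito_count );
$second = sprintf( $reordered, $taco_count, $burrito_count );

echo $first, "\n";  // You have 3 tacos and 5 burritos
echo $second, "\n"; // You have 5 burritos and 3 tacos
```

The code never changes; only the string does, and that string is the part the translator controls.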

Note that when you use argument swapping, that single-quoted string thing becomes very important. If you have “%1$s” in double quotes, then PHP will see that $s and try to put your $s variable in there. In at least one case, this has caused an accidental Cross-Site-Scripting security issue.
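A quick way to see the problem for yourself is the sketch below. The $s value is a made-up example of something hostile that happens to be sitting in a variable with that name.

```php
<?php
// Pretend this came from somewhere untrusted.
$s = '<script>alert(1)</script>';

// Single quotes: the placeholder survives untouched.
$single = 'Hello %1$s';

// Double quotes: PHP interpolates $s before sprintf ever sees the string.
$double = "Hello %1$s";

var_dump( $single ); // string "Hello %1$s"
var_dump( $double ); // string "Hello %1<script>alert(1)</script>"
```

The double-quoted version is no longer a format string with a placeholder at all. It is whatever was in $s, glued onto the end of a broken "%1".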

So repeat after me: “I will always only use single-quoted strings in I18N functions.” There. Now you’re safe again. This probably should be a law, but since it’s safe to use double-quoted strings as long as you don’t use PHP variables (thus breaking the first law), I’ll just leave you to think about it instead. 🙂

Law the Third: Thou shalt disambiguate when needed.

When I say “comment” to you, am I talking about a comment on my site, or am I asking you to make a comment? How about “test”? Or even “buffalo”?

English has words and phrases that can have different meanings depending on context. In other languages, these same concepts can be different words or phrases entirely. To help translators out, use the _x() function for them.

The _x() function is similar to the __() function, but it takes an extra parameter, between the string and the text domain, where the context can be specified.

$string = _x( 'Buffalo', 'an animal', 'plugin-domain' );
$string = _x( 'Buffalo', 'a city in New York', 'plugin-domain' );
$string = _x( 'Buffalo', 'a verb meaning to confuse somebody', 'plugin-domain' );

Though these strings are identical, the translators will get separated strings, along with the explanation of what they are, and they can translate them accordingly.
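One way to picture how identical strings stay separate is a lookup table keyed on context plus string. The sketch below is hypothetical (the demo_lookup() function, the catalog, and the German stand-in translations are mine), but joining context and string with an "\x04" character is how compiled gettext catalogs store such entries.

```php
<?php
// Hypothetical catalog: context and original string form one key,
// joined with "\x04" as in compiled gettext catalogs.
$catalog = array(
    "an animal\x04Buffalo"                          => 'Büffel',
    "a city in New York\x04Buffalo"                 => 'Buffalo',
    "a verb meaning to confuse somebody\x04Buffalo" => 'verwirren',
);

// Hypothetical lookup: fall back to the original text when the
// context/string pair has no entry.
function demo_lookup( array $catalog, $text, $context ) {
    $key = $context . "\x04" . $text;
    return isset( $catalog[ $key ] ) ? $catalog[ $key ] : $text;
}

echo demo_lookup( $catalog, 'Buffalo', 'an animal' ), "\n";
echo demo_lookup( $catalog, 'Buffalo', 'a city in New York' ), "\n";
echo demo_lookup( $catalog, 'Buffalo', 'a verb meaning to confuse somebody' ), "\n";
```

Same English word, three different keys, three different answers. Without the context there would be one key and the translator would have to pick a single meaning.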

And just like __() has _e() for immediate echoing, _x() has _ex() for the same thing. Use as needed.

Finally, this last one isn’t a law so much as something that annoys me. You’re free to argue about it if you like. 🙂

Annoyance the First: Thou shalt not put unnecessary HTML markup into the translated string.

$string = sprintf( __('<h3>You have %d tacos</h3>', 'plugin-domain'), $number );

Why would you give the power to the translator to insert markup changes to your code? Markup should be eliminated from your translated strings wherever possible. Put it outside your strings instead.

$string = '<h3>'.sprintf( __('You have %d tacos', 'plugin-domain'), $number ).'</h3>';

Note that sometimes though, it’s perfectly acceptable. If you’re adding emphasis to a specific word, then that emphasis might be different in other languages. This is pretty rare though, and sometimes you can pull it out entirely. If I wanted a bold number of tacos, I’d use this:

$string = sprintf( __('You have %s tacos', 'plugin-domain'), '<strong>'.$number.'</strong>' );

Or more preferably, the _n version of same that I discussed above.
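Spelled out, that _n version might look like the sketch below. The fallback _n() is my own stub so the snippet runs outside WordPress (it assumes English rules and no loaded translation), and demo_bold_taco_count() is a hypothetical wrapper just to make it easy to try different numbers.

```php
<?php
// Fallback stub, only defined when WordPress is not loaded (assumption:
// English rules, no translation file).
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return ( 1 === (int) $number ) ? $single : $plural;
    }
}

// Hypothetical wrapper: whole phrases are translated, markup stays outside.
function demo_bold_taco_count( $number ) {
    return sprintf(
        _n( 'You have %s taco', 'You have %s tacos', $number, 'plugin-domain' ),
        '<strong>' . $number . '</strong>'
    );
}

echo demo_bold_taco_count( 1 ), "\n";
echo demo_bold_taco_count( 12 ), "\n";
```

The translator sees two clean phrases with a %s in them and no HTML, and the bold stays on the number no matter where the number lands in the translated sentence.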

Conclusion

Like I said at the beginning, we’ve all done these. I’ve broken all these laws of I18N in the past (I know some of my plugins still do), only to figure out that I was doing-it-wrong. Hopefully, you’ve spotted something here you’ve done (or are currently doing) and have realized from reading this exactly why your code is broken. The state of I18N in plugins and themes is pretty low, and that’s something I’d really like to get fixed in the long run. With any luck, this article will help. 🙂

Disclaimer: Yes, I wrote this while hungry.

113 Comments

  1. Good read. You should add using number_format_i18n() and swap out your %d for %s.

  2. I’m glad and happy to tell “i’m doing it right since first time :D” thanks the codex, i’m a codex_reader 🙂
    See you Otto !

  3. Great, Otto. You’ve finally cleared up _x() for me 😀

  4. Probably one of your first articles I’ve read where I felt more confident coming out of it than I did going in to it. Good read.

  5. There’s a lot of what ifs regarding not using variables for the text domain. Until WP.org or someone prominent actually writes something that a developer actually wants his code to be supported on, then it’s easier for the developer to continue using a variable, constant or static method call. Whatever their preference, a simple search and replace can make it compatible, *if* that time comes. In the meantime, there’s an advantage to thinking along the 37 Signals philosophy and build for the Now, not for a future that might never happen.

    • So, basically your argument is that you’re just going to keep doing it wrong, because you just can’t be bothered to do it right. I mean, you could do your search and replace right now, fix the bug (and it is a bug, BTW), and make your code conform to the actual standard… but no, you’re going to do keep on doing it wrong instead.

      What you don’t get is that your continued refusal to fix your code’s bugs makes it impossible for other people to try to write code that expects your code to be standardized. We can’t develop these tools that would be super-cool-as-shit because we end up having to try to code around your code’s deficiencies. We want to be inclusive, and we’re having to make our code try to deal with your broken code instead.

      If we do develop something along these lines, what we’ll probably have to do is to write code that recognizes your code being broken and then eliminate it from the system. We have to spend time to code around your bugs instead of spending time writing cool things. Thanks for that.

      • So, which “super-cool-as-shit” tools have been built in the mean time? 🙂

        • None, because we’re not there yet. Language packs have not been a priority for core in the last couple versions, and that is sort of a prerequisite for building this sort of thing into the directory and APIs. But it’s getting there, slowly.

          The point is still valid, however. If you’re not using static strings for that identifier, then no scanning tool will work to build potential language packs properly.

      • DEFINE("IRONY","Common Code Smells
        ...
        Excessive use of literals: these should be coded as named constants, to improve readability and to avoid   programming errors. Additionally, literals can and should be externalized into resource files/scripts where possible, to facilitate localization of software if it is intended to be deployed in different regions.
        "
        )
        

        From Wikipedia – Code Smell

        A law that code should be written in a way that is difficult to maintain and likely to introduce actual bugs (as opposed to breaking a convention), instead of requiring the externalisation/centralisation of strings into dedicated resource files (like the rest of the programming world would do), is so very, well, PHP.

        I would have thought any upcoming cool and awesome tools would work a lot better running over dedicated files in a specified format, than parsing code that (as you assert, almost always) breaks the required convention.

  6. Thanks Otto for thinking of non-English users and translators. It’s quite annoying to have to run through all the files to correct them and make them ready for translation because the developer doesn’t follow those few rules. A must-read, put-into-practice article.

  7. Otto,
    Would you mind shedding some light on this?
    For a site which doesn’t need translation at all, does it make performance a bit better to remove the I18N functions?
    Another question: does it make things worse if I don’t load the text domain and leave the I18N functions all over the site?

    • Regarding speed:
      When a string is sent off to be translated, first the domain for those translations is loaded. This is saved for later, so each domain only gets loaded once, no matter how many strings are translated in that domain. A class called “Gettext_Translations” (which extends the “Translations” class) takes that loaded domain and does the actual string lookup. It’s pretty fast, since it’s just doing some array work to find the proper translation string.

      If the translations don’t exist for a particular language and domain and such, then a class called “NOOP_Translations” gets used instead. This implements the same logic, but without doing any of the array handling. It’s just returning the string as-is. So it’s super quick. So there’s basically no realistic performance penalty for not loading the domain or not having the language files needed.

      In theory, yes, not translating is faster. In practice, the difference will be so small that it’s a bit difficult to measure. If you’re doing one-off or custom code, then you may not need to do I18N. If you’re releasing code, then you kinda have to do it, unless you don’t mind alienating a third of the potential userbase.

  8. What a nice service, you seem to write stuff when Im arming up to do stuff in this exact topic! 🙂

    The http://codex.wordpress.org/I18n_for_WordPress_Developers page consequently uses examples like this..
    $hello = __("Hello, dear user!");
    – breaking the I will always only use single-quoted strings in I18N functions.

    Some of us are not that educated in PHP, but stand on the shoulders of others (well, thats civilization for ya) and when examples are bad, thats just how the world goes – unless you write up and make us aware of the errors – so thanks otto.

    • Thing is that double quotes work perfectly well, as long as you don’t break the first rule and use a PHP variable in them. It’s just somewhat safer to always use single quotes, since that ensures that your PHP-trained mind is always remembering that variables won’t be expanded inside the translation functions, and ensures that if you do need to use sprintf with argument swapping, that you don’t accidentally include the $s or $d variables.

  9. Thank you for bringing the i18n topic to your readers. Lack of localization is the weak spot of WordPress. In fact, most plugin authors never localize their code.

    Your Law the First is new to me. I’m happy my own plugins are following your law.

    Additionally to what John James says, people should use date_i18n().

  10. Hi Otto!

    It’s a good thing that you wrote this post. I was actually convinced that putting a variable inside __() is a good practice, since wordpress.org suggests that:

    http://codex.wordpress.org/Translating_WordPress

    take a look under “Localization Technology”, they say “__($message)” which suggests that you should put variable there. Thanks for clearing this up.

    cheers!

  11. Thank you Otto, very useful information. I honestly had no idea about the variables for strings and domains – good to know that.

  12. Wow! Where do I line up to be flogged for some of my mistakes. Thanks for posting this. It clears up a lot of questions I had about translating our themes. Now, I am off to make some necessary fixes.

  13. Thank you Otto for touching the subject of internationalization!

    As I wrote when I plussed your article: Finally someone with real WordPress authority has written a very informative article on the laws of internationalization (i18n) of WordPress themes and plugins!

    Read it, bookmark it, use it!

  14. […] In it he lays out the 3 laws of internationalization. […]

  15. […] Internationalization: You’re probably doing it wrong […]

  16. Thanks for this. I’m just about to embark on internationalizing my plugins and this information will be incredibly beneficial.

    Can I just check one thing, however. If, say, I have a line of output that has some HTML part way through – let’s say a word is emphasized with bold – would you accept this an exception? For example, is this okay?

    _e( 'Part way through the sentence I suddenly emphasis a word' );

    Thanks,
    David.

    • Ok, will try that example again…

      _e( 'Part way through the sentence I suddenly emphasis a word' );

    • Do you ever have one of those days? Okay, last attempt to get the example right…

      _e( 'Part way through the sentence I <strong>suddenly</strong> emphasis a word' );

    • Yes, that’s okay, because the emphasis in another language might make more sense on a different word.

      • Are you saying that upon translation the emphasis will be lost anyway?

        David.

        • No, but the emphasis in English might not make sense on the same word/phrase in another language. Other languages have different rules of behavior and ways of expressing ideas, so by putting the strong into the translation phrase, you’re basically giving whoever’s doing the translation a chance to change the emphasis around as appropriate.

          This makes sense for emphasis of words inside phrases, but perhaps not for variables inside phrases. In my last example in the post, I talk about making a bold number of tacos. In that case, I want the emphasis on that variable number, regardless of the language. For a word in a phrase that isn’t subject to variation, then I’d want the translator to be able to change the emphasis as appropriate.

          Anything in that first set of quotes is what the translator will be translating, and putting HTML in there will let the translator change it.

    • Taking this one step further. If the line for translation includes a link, what’s the recommended way of dealing with this?

      As has already been stated the word(s) being used in the link may not be as appropriate after translation, so you want this to not be fixed. However, bearing in mind that you may be passing the translation over to a third party to do for you, is a full link HTML appropriate? Should you use the HTML or use something such as SPRINTF?

      David.

      • It depends on the context, actually.

        If the link is separated from any text surrounding it, then it would make sense to only translate the text of the link and not the HTML of the link itself.

        If the link is embedded into a paragraph, then pulling the link out separately would be bad, since you’re breaking law number two up there.

        You could sprintf it into place if you didn’t want the translator to be able to change the link itself. But this might not work for the resulting language if he had no meaningful set of words in the resulting paragraph to wrap the link around.

        So, it’s kind of a toss-up, depending on specific circumstances.

  17. On a more generic note, when I recently looked into translation of my plugins I was surprised to struggle to find adequate documentation. There’s plenty on how to make the internationalisation changes required – however, the actual translation always appeared to be lacking.

    Some instructions told you how to generate a .POT file but only that but most others told you how to create .PO and .MO files only.

    I guess what I was after was something a bit more complete – covering all the files, what they’re actually used for (something else that is rarely mentioned) and how to generate each type.

    David.

    • POT = Portable Object Template. Generating a POT file can be done with a utility called “xgettext”, although there are many other programs that can do the same basic thing. xgettext comes with the gettext package. The POT file is what you give to a translator to fill out.

      PO – Portable Object. POT and PO files are the same basic thing, it’s just that the POT file has blank spaces for the translations, and the PO file is what you get after it’s been translated into a language (somebody filled in all the blank spaces). They will probably use a tool like POEdit to produce the PO file from the POT file.

      MO – Machine Object. The MO file is what you get when you “compile” the PO file. It’s basically a binary version of the PO file. The “msgfmt” command in the gettext tools is used to compile this file. The MO file is the only thing that actually matters to WordPress, and it’s the only part that you have to ship with the code.

      The file naming scheme is important for WordPress as well. For plugins, the file is expected to be named “textdomain-locale.mo” where “textdomain” is the domain of the plugin and “locale” is the locale string. For example, my SFC plugin uses “sfc” as the text domain. So a German translation would be “sfc-de_DE.mo”. Some locale strings are five characters, some are just two (although we should standardize and make them all five, IMO).

      For themes, the text domain isn’t used by WordPress in the filename scheme, it just looks for locale.mo in the theme’s base directory, or whatever path is specified by the load_theme_textdomain call.

  18. Otto, thanks again for a great post. I love that you write just like you speak.

  19. Thanks for the article, I have thought about starting a wordpress site that is in german, I need to look for some plugins!

  20. Hey,

    I was wondering if you could help me out with a small problem. I am interested in a multilingual homepage. The content for the homepage is inputted from the theme options. I tested the following code and it worked with a .mo file. After reading your article I realized it was a very good idea. Could you tell how I should do it?

    Thank you

    echo __($options['cta_text'],'theme-options');

    • That doesn’t work. You’ve broken rule one.

      How do you build the POMO files from that? The text isn’t in the __() function, so any scanning tools won’t be able to know what text to translate.

      • I understand that I broke rule one with that. Sorry I forgot to write “not” in the second last sentence. I meant not a very good idea. Sorry!

        I created a custom POMO for it. There are only six custom fields creating only six strings all in the same file.
        e.g
        #: home.php
        msgid "Call to Action English"
        msgstr "Call-to-Action Deutsch"

        The custom po file is in child theme so that it does not interfere with the main POMO files. The reason for this is that everyone is going to input their own text and will need a way to translate it if they want a multilingual site.

        If you look at the image, you can see that I entered “Call To Action English”. That text was output in the home.php file with the code I used above, and then it was translated.
        https://www.dropbox.com/s/v7hke904zu4ssgx/Capture.JPG

        • Well, it doesn’t really make any sense to use this type of functionality to translate “content”. If the creator of the website is translating it themselves anyway, then just give them a place to type in the translated text and save it in the database.

          The whole point of the POMO files concept is that the end-user doesn’t need to translate the text themselves. They download the pre-made translation file, set a few settings, and it all translates. If the end-user has to put in their own translations, then you can save it in the DB same as any other text they’re putting in.

  21. I bow humbly before you, since I have broken at least one rule that I know of. Your article prompted me to go back to my older plugins and I found that I did indeed use PLUGIN_DOMAIN as domain name constant in one of them.

    Moreover, I found your explanations on the _n() and _x() functions very enlightening. Thanks for this great write-up!

  22. Hello Otto.

    First, thank you to recall all these laws.

    3.4 update is just done, but translated versions of defaults themes are not included in the automatic update. An old ticket already described this: http://core.trac.wordpress.org/ticket/18960

    Could you have a look on that?

    Thank you.

  23. Hi Otto,
    Thanks for a good & solid explanation. I am guilty of breaking above laws (putting an end to that).

    Thanks again.

  24. Otto, could you elaborate a little more on how to best handle these types of strings, maybe give a couple examples of best practices? I’m helping to update the strings in the Thematic Theme and we’d like to not be “doing it wrong”. For example we have a line in our post’s footer that looks like

    $string = '<a class="comment-link" title="Post a comment" href="#respond">Post a comment</a> or leave a trackback : <a href="' . get_trackback_url() . '" title ="Trackback URL for your post" rel="trackback">Trackback URL</a>';
    

    Are there 4 discrete strings here that I can sprintf into place, or should I try to maintain the entire string as a unit, complete with its markup?

    • I’d try to keep the 4 phrases separate but intact as possible. They’re all related, but not part of the same phrase, and the HTML doesn’t need to be altered by the translation process.

      I’d probably do something like this:

      $string = 
      sprintf( __('%sPost a comment%s or leave a trackback', 'theme-domain'), 
      	sprintf('<a class="comment-link" title="%s" href="#respond">', esc_attr__('Post a comment', 'theme-domain')), 
      	'</a>') 
      . ' : '.
      sprintf('<a href="' . get_trackback_url() . '" title ="%s" rel="trackback">%s</a>',
      	esc_attr__('Trackback URL for your post', 'theme-domain'),
      	__('Trackback URL', 'theme-domain')
      );
      

      Essentially you have two separate strings here, each with a substring.

      The first string is “Post a comment”, and it’s a whole phrase in the title. On line 3 you can see where I’m pushing it into the initial anchor link. Note that I use the esc_attr__ function to translate it, because it’s going into an attribute. This isn’t strictly necessary for hardcoded strings, but it’s a good thing to get in the habit of doing anyway. Where a thing is going to be in the end matters.

      The second string is “Post a comment or leave a trackback”, with just the first part linked. I elected to wrap the linked words in %s’s and use sprintf to bring the link itself in there. This gives the translator the ability to move the link around on the words, but not actually change the link itself. Of course, this required a nested sprintf, since the link contained the first translatable string.

      The third and fourth strings were relatively straightforward, as you can see. I translated them independently, then sprintf’d them into place in the link structure. No special handling was needed since the phrases were whole and not broken up by anything. Again, I used esc_attr__ for the one that was going into an HTML attribute.

      The colon in the middle was just concatenated in, and the whole thing was assigned to $string.

      • Otto,

        wow! that is a great and thorough answer. I really appreciate it and think it will go a long way in our string clean up. I didn’t know about sub-strings, but they seem like they’ll help keep markup out of the translatable part of the string. Thanks a lot!

        cheers,
        -kathy

      • Thanks for that walkthrough. I didn’t know about esc_attr__ before. I am also involved in the string updating of Thematic and I wonder about rtl languages. Wouldn’t it be better to include the colon and last trackback url in the first string, so that the order could be swapped around if needed? Like

        $string = 
        sprintf( __('%sPost a comment%s or leave a trackback: %s', 'theme-domain'), 
        	sprintf('<a class="comment-link" title="%s" href="#respond">', esc_attr__('Post a comment', 'theme-domain')), 
        	'</a>',
        	sprintf('<a href="' . get_trackback_url() . '" title ="%s" rel="trackback">%s</a>',
        		esc_attr__('Trackback URL for your post', 'theme-domain'),
        		__('Trackback URL', 'theme-domain')
        	)
        );
        
        • Sure, that’s a valid consideration. However, in that first string, I’d specify argument swapping values, because the argument order to sprintf doesn’t change even if the RTL order does.

          So it would be this for the first string:

          __('%1$sPost a comment%2$s or leave a trackback: %3$s', 'theme-domain')
          

          This didn’t matter so much for the link order bits, because the link will always have the a in front of the /a, but if you’re including orders that can change, then specifying the numbers is a clue to the translator as to what’s going on.
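
          As a rough, standalone sketch of why the numbers matter: the $translated string below is a made-up stand-in for whatever a translator might supply (it is not a real translation), and the link markup and URLs are simplified placeholders.

          ```php
          <?php
          // The arguments are always handed to sprintf() in this same order.
          $open  = '<a class="comment-link" href="#respond">';
          $close = '</a>';
          $track = '<a href="#trackback" rel="trackback">Trackback URL</a>'; // placeholder URL

          // The original string, as it would be passed to __().
          $original = '%1$sPost a comment%2$s or leave a trackback: %3$s';

          // Hypothetical translation that moves the trackback link to the front.
          $translated = '%3$s: leave a trackback, or %1$spost a comment%2$s';

          echo sprintf( $original, $open, $close, $track ), "\n";
          echo sprintf( $translated, $open, $close, $track ), "\n";
          ```

          Because every placeholder is numbered, the second call puts the trackback link first even though the argument list never changed.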

          • Yes of course. And I suppose also include a translators note so they know that 1 and 2 can’t change order. Thanks again, now off to do some multiple sprintf exercises. 🙂

            • Translators tend to be smart, but yes. If you need to include a note, use the _x translation function.

              _x('%1$sPost a comment%2$s or leave a trackback: %3$s', '1s and 2s are the a href link wrappers, do not reverse them', 'theme-domain')
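
              One caveat, sketched here as an alternative rather than a correction: the second argument to _x is a context, so it becomes part of the lookup key as well as being shown to translators. The string extraction tools also pick up a comment starting with "translators:" placed directly before the call, which leaves the key alone. The stub for __() below exists only so the snippet runs outside WordPress, and the URLs are placeholders.

              ```php
              <?php
              // Stand-in so this runs outside WordPress; the real __() looks up a translation.
              if ( ! function_exists( '__' ) ) {
                  function __( $text, $domain = 'default' ) {
                      return $text;
                  }
              }

              /* translators: 1: opening link tag, 2: closing link tag, 3: trackback link. Keep 1 before 2. */
              $format = __( '%1$sPost a comment%2$s or leave a trackback: %3$s', 'theme-domain' );

              echo sprintf( $format, '<a href="#respond">', '</a>', '<a href="#trackback">Trackback URL</a>' ), "\n";
              ```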
              
  25. Hi Otto,

    Another translation question for you. A user who wanted to work on the german translation of thematic asked us:

    “Should we include HTML-Entities for umlauts?”

    And since I didn’t have the faintest idea I was hoping you could illuminate me. Should translators use UTF-8 or HTML entities for accents and special characters?

    Thanks,
    -Kathy

    • Most likely, I’d say HTML entities if the output is going to be in the HTML code and you’re not escaping it in some manner. UTF-8 is a valid option if it a) works and b) works everywhere. The kicker there is B, of course, it might not work on every site that you attempt to apply it to, while entities will almost surely always display correctly.

  26. […] Internationalization Fun September 19, 2012, 5:10 pm So in my last post about Internationalization, I covered some non-obvious things that you should consider when adding translation capabilities to […]

  27. I have a few questions that as far as I can tell, no one has discussed them before, and although some experimentation might provide me with something working, it might not be appropriate at all (I’m one of those that were using a constant instead of a string domain). Here it goes then:

    1) How legal is it to use multiple domains within a file/theme/plugin? E.g. a widget bundled in a couple of themes. Its file copied across unchanged, so, why not load its own .mo files since they may be already translated?
    2) If the above is legal, how to load the separate .mo files?
    3) Is it legal to use gettext calls without a domain, as long as the strings are registered by WordPress itself? E.g. it’s counter-productive for everyone involved having to translate the word “Post” over and over.

    Thanks,
    Anastis

    • Any thoughts on my questions above? 1 and 3 are my main concerns.
      About 2) of course the .mo files can be loaded by load_textdomain(), just asking if there is any other “proper” way.

    • 1) A theme/plugin should only contain its own strings, and one domain. Sharing domains across multiple-things leads to confusion. What happens when there is a conflict? You’ll run into versioning issues. Strings change over time. Things like that.

      3) No, because of the same issue. What happens when WordPress changes strings? Tying your plugin/theme’s translations to the core translations isn’t the best idea. Yes, it’s duplicated work, but it’s not all that bad. Better to keep them separate because different things can be different sometimes.

      Also, your example of “Post” isn’t a good one. What does “Post” mean? It’s uncertain whether you’re using it as a noun or a verb, even. Confusion like that can come from using the wrong core string as well.

      • Thanks for your response.

        Indeed, “Post” without _x() isn’t a good example, but you got my point.

        I never thought of what would happen if the dependency of the translations changed, so yeah, your answer makes sense and covers me completely.

        Cheers!

      • Exactly what I was looking for, thanks! Maybe worth posting a separate article about this topic? I see there are lots of questions about this specific topic, some of which you have also answered at places like stackexchange.

  28. […] –  Pig Latin : If you activate this plugin will believe that WordPress is broken. All words are jumbled or letters inserted, nothing is understood. Used to check if our theme is ready for translation. Those words appear correctly, it means that we have not prepared for translation. Then I recommend reading this article . […]

  29. I was curious how the combined form of the tacos and burritos would best be written taking _n() and argument swapping into account? As websites have become so dynamic in content and as most clients want message output to be spoken language, these situations seem to happen with increasing frequency. Thanks so much for the article. It is incredibly helpful.

    • Yeah, that’s a bit of a problem. _n only accounts for a single pluralization, not two separate ones. Which means that when you combine two pluralizations in a single phrase, you just can’t translate it without breaking one of the laws.

      Generally speaking, when a situation like that crops up, you should sit down and rethink your phrasing instead. If it’s necessary to combine things like that, then maybe a list format would be a better presentation to the user or some such thing.
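
      A rough sketch of the list idea, using the taco and burrito counts as the example. The counts and variable names are hypothetical, and _n is stubbed with a plain English singular/plural rule only so this runs outside WordPress.

      ```php
      <?php
      // Stand-in so this runs outside WordPress; the real _n() applies the locale's plural rules.
      if ( ! function_exists( '_n' ) ) {
          function _n( $single, $plural, $number, $domain = 'default' ) {
              return ( 1 === (int) $number ) ? $single : $plural;
          }
      }

      $tacos    = 1; // hypothetical counts
      $burritos = 3;

      // Each list entry is a whole phrase containing exactly one pluralization.
      $lines = array(
          sprintf( _n( '%d taco', '%d tacos', $tacos, 'theme-domain' ), $tacos ),
          sprintf( _n( '%d burrito', '%d burritos', $burritos, 'theme-domain' ), $burritos ),
      );

      echo implode( "\n", $lines ), "\n";
      ```

      Each phrase can then be pluralized on its own by the translator, and no single string has to carry two plural forms.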

  30. While explaining how to do it right, maybe you should add that it would be very beneficial if people added the two extra rows to the docblocks in a plugin or theme:
    * Text Domain: my-domain
    * Domain Path: /lang

    • I see these a fair amount, but they’re not standard by any means. Nothing in the core reads or uses these fields. Nothing on the WordPress.org website reads or uses them either. It doesn’t matter whether you add them or not.

      • As much as I recognize you as a high authority on all matters “wordpress” (and I really do), I found this to be otherwise (maybe wrongly).
        In my experiments, core does read it, and in fact it has some influence on the $domains global in l10n, or at least on the ORDER in which the plugin/theme domain is read.
        Also, without those, the DESCRIPTION of the plugin, the name, or anything else in the DocBlocks cannot be translated (by core).
        I will be happy to send you some code to verify this.

        • Okay, you’re correct. Those elements are indeed read by core and used for the translation of the header parts of the plugin/theme. That behavior was added in 3.4, and so I’ve never really seen or used it.

          However, they are not required elements by any means and translation of the rest of the plugin works perfectly fine without them. The only need for them is for translating those header elements. If you do not desire to translate those, then these two extra headers are wholly unnecessary.
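
          For reference, a header carrying those two fields would look something like this sketch. The plugin name and description are made up; the Text Domain and Domain Path values are the ones from the comment above.

          ```php
          <?php
          /*
           * Plugin Name: My Example Plugin
           * Description: Header text like this is what the two extra fields make translatable.
           * Text Domain: my-domain
           * Domain Path: /lang
           */
          ```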

  31. […] addition to this, Otto has written two very in-depth articles about the pitfalls of i18n as well as an overview of the lesser known translation […]

  32. Very nicely structured article, I enjoyed reading it a lot! From my experience, I would like to recommend some tools that would help the internationalization process a lot. This: https://poeditor.com/ is a very nice online localization tool that offers a few nice features such as setting a reference language for translations or using automatic translation from Google or Bing. For WordPress it has an extra plugin which you can download from the WordPress plugin list, which allows you to work directly from your blog.

  33. […] Internationalization: You’re probably doing it wrong […]

  34. Thanks Otto for the great work in this post and several plugins to WP.

    After preparing all the strings for localization and having my Theme/Plugin ready, I usually make the first PO file (rename it to a POT file) using Codestyling Localization plugin, and then, send the POT file to Transifex. It’s basically an online PO editor with team management for several languages. With one account I can manage, coordinate and/or translate several projects simultaneously. Powerful and easy to use, a must-have tool to any serious multi-language project.

    It’s free to open-source projects, and paid for private projects.

    http://www.transifex.com

  35. Hi,

    Very good article. But I have problems getting started: I have wp 3.6-fi, Poedit 1.5.7, and I use nplurals=2; plural=n != 1; and UTF-8. I tried to modify one Finnish wording in fi.po, which is very fresh from a just-installed package, and when I try to save it I get:

    21:08:06: C:\Tarmo\Ilmailu\SIK\WEB\all_wp_config_files\lanquages\fi.po:4375: `msgid’ and `msgstr’ entries do not both begin with ‘\n’
    21:08:06: C:\Program Files (x86)\Poedit\bin\msgfmt.exe: found 2 fatal errors ?

    Please tell me what is happening there 🙂

  36. […] Internationalization: You’re probably doing it wrong […]

  37. I came here to learn how to handle HTML tags in the translation string but I learnt a LOT of other things.

    You work wonders even when you’re hungry 😀

  38. […] will be translated. All three of them do the exact same thing. Check out the WordPress Codex, read this post from Otto, and check out the video from Lisa Sabin-Wilson (below) to learn how to properly […]

  39. Solid post Otto. I love the tip on shifting HTML markup from the translation string to the arguments.

  40. […] as it is), or even language packs. I strongly recommend you read Otto’s take on it all. He said it much better than I ever […]

  41. I don’t understand the need for a Domain since everything will be related to the WordPress blog or the host website.

    • The textdomain provides separation between different pieces of the code.

      A plugin or theme can have strings, and those strings need to be translated separately because they’re separate modules. Their strings won’t be in the translation files for the main WordPress core code. Having a domain allows their files to be separated and thus allows their strings to have their own translations.

      Considering that it is infeasible to download a translation file which contains every possible string for every possible piece of code, then it makes sense to only download the strings for the pieces of code you’re actually running.

  42. […] WordPress Plugins and Themes: Don’t Get Clever” and Otto’s post “Internationalization: You’re probably doing it wrong” Here are some quotes from Mark’s […]

  43. […] Have you know? 45% WordPress downloads are for non-English. Making your Theme translation ready is strongly recommended. Read introduction about Internationalization in WordPress Handbook first. I prefer Otto’s the pitfalls of i18n. […]

  44. Great article Otto:-) but still it leaves me with questions.

    I have a theme with many settings in admin area. To give some numbers – there are 800 strings that only appear in admin panel, and 100 of strings that appear in front-end of site.

    Users are usually interested in translating only these 100 front-end strings. So to make their lives easier I have used 4 different wrapper functions for __() and _e(), so I can later distinguish them while collecting strings with Poedit. They looked something like
    function frontend__($string){ return __($string,'theme_frontend'); }
    function frontend_e($string){ _e($string,'theme_frontend'); }
    function backend__($string){ return __($string,'theme_backend'); }
    function backend_e($string){ _e($string,'theme_backend'); }

    It breaks the rules but gave me two separate POT files. Now I want to do it the correct way, and I would like to know the limits of what I can achieve.

    Questions:
    1. Can I use separate text domains in such a context?
    2. If yes, then do you know any tool that can collect strings with respect of text domain? Poedit can’t do it, I have even asked Poedit author about it, and only thing he advised is to put this different translation groups in different directories, but it is not so easy 🙂

    Thanks for any opinion 🙂

  45. […] which you can use depending on what you need. For further reading, I definitely suggest post by Otto in which he covers the topic in more […]

  46. I have read your article, and I have a question of a translator TO a foreign (to you) language.
    What to do when you need to translate a string in a theme which is NOT inside the php but in an HTML expression, and is therefore not read by the translator? In addition my site is bilingual, so while I may just go in and translate directly in the child theme’s file, I cannot do it here: I need both the original and the translated versions.
    Example:
    `Written by <a href="” title=””> `
    There is a string there “Written by”.
    How to get it into the translation?
    Thank you,
    Vera

  47. Hi, I see you are using sprintf everywhere, why not printf?

    • Because of the nature of WordPress filters. Generally speaking, you’re not creating output immediately, but returning a string from a function hooked in somewhere, or things of that nature. So, you end up using sprintf more often than printf. Obviously, you would use whichever is appropriate, it’s just that sprintf is more common.
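
      To make the difference concrete, here is a small standalone sketch (the URL and wording are placeholders). It also shows what happens when printf is used for an inner piece: that piece is echoed on the spot, and what gets passed along is its length, a number.

      ```php
      <?php
      // sprintf() builds the string and returns it; nothing is output.
      $link = sprintf( '<a href="%s">', 'http://example.com/' );

      // printf() outputs the string immediately and returns its length as an integer.
      ob_start(); // capture the output so the effect can be inspected
      $length  = printf( '<a href="%s">', 'http://example.com/' );
      $printed = ob_get_clean();

      // So build the pieces with sprintf() and print only the finished result.
      printf( 'Visit our %ssupport%s page.', $link, '</a>' );
      echo "\n";
      ```

      If the inner call were printf, the outer format would receive $length in place of the opening tag, which is how numbers end up where the links should be.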

      • I will understand one day what the exact difference is; for now that is too complicated, I think. But what I did notice is that WooCommerce is using printf in their woocommerce email template files. As you probably know, we can override these ourselves by creating the same files in our child theme folder, which I did. I replaced printf with sprintf and that simply doesn’t output anything anymore.

        Also I followed exactly your advice in the comments about links, but the emails can’t handle this as well for some reason.

        <p><?php printf( __( 'Heb je een vraag of opmerking? Neem contact met ons op via %sinfo@keratinehaarproducten.nl%s of bezoek onze %sklantenservice%s en %sinformatie%s paginas.', 'wpbootstrap-child' ),
        printf( '<a href="mailto:%s">', esc_attr__( 'info@keratinehaarproducten.nl', 'wpbootstrap-child')), '</a>',
        printf( '<a href="%s">', esc_attr__( 'http://keratinehaarproducten.nl/klantenservice/', 'wpbootstrap-child' )), '</a>',
        printf( '<a href="%s">', esc_attr__( 'http://keratinehaarproducten.nl/informatie/', 'wpbootstrap-child' )), '</a>' ); ?></p>

        It will end up really funny, the second and third link not working at all, %s replaced by numbers for some reason as well. Hope you can help but it seems I have to break the rules for now again and put the HTML back into the string again :-(.
