I’ve been playing with the concept of creating a multilingual site and, after polling the experts, I started toying with PHP’s implementation of gettext. What a ball-ache.
Ok, so some things you need to know before we start.
- L10n stands for Localisation (or Localization): an L, ten letters, then an n. The abbreviation sidesteps both the word’s length and its two spellings.
- gettext is a GNU standard. I shouldn’t need to explain why standards are cool, but needless to say, there are plenty of tools to make using gettext a lot easier than a “roll-your-own” solution.
- The PHP implementation of gettext is good when it’s working. But when it’s not working it’s like a scorned girlfriend — it will *not* tell you what is wrong. You have to figure it out yourself.
- Your solution might not be *exactly* the same as mine because gettext relies on system locales, which in turn are structured differently on pretty much every Linux distribution. Having said that, I’m sure the stuff you’ll read here will get you going.
Firstly, why gettext? Besides the fact that it’s a standard, gettext is good because:
- gettext’s database files (.mo) are indexed and compiled.
- PHP’s implementation is a thin wrapper, written in C, around the GNU gettext library, so the lookups are done by programmers who are probably better than you and me at writing efficient searches.
- The gettext domain (your strings) is cached by the implementation, so it really is quite fast.
Ok, so let me run through quickly how it’s meant to work, starting with some code:
echo gettext("Hello World!");
Line for line:
- setlocale tells PHP which locale to use; in this example I’m using af_ZA (Afrikaans).
- bindtextdomain tells PHP which domain to look for and where your gettext locale folders are. A “domain” is really just a collection of strings. In this case we’re arbitrarily calling it “messages”.
- textdomain tells PHP which domain to use from now on. It seems redundant, I know, but you can bind several text domains and this picks the default one that gettext looks in.
- gettext will look to see if it can find a translation for the locale you set earlier (af_ZA) for this index. gettext’s index is the original string in the original language. In this case we originally used the phrase “Hello World!”. If the locale can’t be found or there isn’t a translation for this index in the locale, gettext will return the original language — in this case “Hello World!”.
- What is really important to note at this point is that there are a million or so variations of this code on the Internet. This is the stripped down version that works fine under PHP5 and Apache2.
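Putting those calls together, here’s a minimal sketch of the whole script, written out from the shell so you can drop it straight into a test folder. The file name hello.php, the LC_ALL category and the ./locale path are my assumptions; adjust them to your own layout.

```shell
# Write the four calls described above into hello.php (hypothetical file name).
cat > hello.php <<'PHP'
<?php
setlocale(LC_ALL, 'af_ZA');              // which locale to use (LC_ALL is an assumption)
bindtextdomain('messages', './locale');  // domain name, and where the locale folders live (assumed path)
textdomain('messages');                  // make "messages" the default domain
echo gettext('Hello World!');            // the translation, or the original string as a fallback
PHP
```

Run it with `php hello.php` or load it through Apache, assuming the gettext extension is enabled. Until the remaining steps are done it should simply print the original string.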
Next we have to create our locale directory structure. This is what it looks like:
/locale
    /af_ZA
        /LC_MESSAGES
            messages.po (you'll create this later)
            messages.mo (you'll create this later)
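If you’d rather not click folders together by hand, something like this should do it. It’s only a sketch: I’m assuming the locale folder lives in your project directory and that you want the same two locales I use (af_ZA and en_ZA).

```shell
# Create one LC_MESSAGES folder per supported locale under ./locale.
for lang in af_ZA en_ZA; do
  mkdir -p "locale/$lang/LC_MESSAGES"
done
```

Add more locale codes to the loop as you add languages.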
Obviously you would create one per language you want to support. I therefore have one for en_ZA and one for af_ZA.
Next we need to create our .po file. The .po file is the unindexed, uncompiled “language” file. Basically it has human-readable plaintext in it. If you had a horde of translators working for you, you would send them your .po files, which they would add their translations to and then send back.
The important stuff in a .po file is:
msgid "Hello World!"
msgstr ""
Now, the cool thing about using gettext is that there are tools to generate a messages.po file from .php files automatically. The following command will scan all the PHP files in the current directory for calls to gettext and generate a messages.po file for you (-n adds the source file and line number of each string as a comment).
xgettext -n *.php
The next step is to copy your messages.po file into each of your LC_MESSAGES folders. Once you’ve copied them you can add the translation to the af_ZA one.
msgid "Hello World!"
msgstr "Hello Wêreld!"
Next you need to compile both your messages.po files by running the following command in the respective directories. This command will output a binary, indexed messages.mo file.
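The usual tool for this step is msgfmt from GNU gettext. As a sketch, assuming the ./locale layout from earlier and that msgfmt is on your PATH, this compiles every catalogue in one go rather than once per directory. compile_catalogs is just a helper name I made up.

```shell
# Compile every messages.po under the given locale root into a messages.mo beside it.
compile_catalogs() {
  root="${1:-locale}"
  for po in "$root"/*/LC_MESSAGES/messages.po; do
    [ -e "$po" ] || continue                  # the glob matched nothing
    msgfmt -o "${po%.po}.mo" "$po" || return 1
  done
}

compile_catalogs locale
```

Doing it by hand in a single LC_MESSAGES folder comes down to `msgfmt -o messages.mo messages.po`.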
Once you’ve done that you should be able to rerun your code and it should give you the Afrikaans version… although it won’t, because here’s what you don’t get told: gettext will only work for locales the system recognises, and even though you selected South Africa when you installed your Ubuntu, it still doesn’t know what Afrikaans is. Cue a myriad of confusion, solved, as usual, by one simple command.
This will, if you’re on a newish Debian-ish box, enable the af_ZA locale and then, after restarting Apache (remember it caches), your app should be speaking Afrikaans fluently.
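Before blaming your code, it’s worth checking whether the system knows the locale at all. Here’s a small sketch, assuming a glibc-style `locale -a`. has_locale is a made-up helper name, and the locale-gen invocation in the message is what I’d expect to work on Ubuntu rather than a guarantee for every distribution.

```shell
# Report whether a locale is already known to the system (uses `locale -a`).
has_locale() {
  locale -a 2>/dev/null | grep -qi "^$1"
}

if has_locale af_ZA; then
  echo "af_ZA is available"
else
  # Assumed Ubuntu-style fix; other distributions enable locales differently.
  echo "af_ZA is missing; try: sudo locale-gen af_ZA.UTF-8, then restart Apache"
fi
```

PHP’s setlocale also returns false when it can’t honour the locale you asked for, so checking its return value is another cheap way to catch this.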
4 thoughts on “L10n, gettext, php5 and Afrikaans”
If I’d been on IRC, I’d’ve warned you about the last one. Made us have to use a totally different solution in KnowledgeTree, since we couldn’t rely on our users being able to run that command on their systems (ie, they might not have root).
Is that the only reason you chose to use something else?
What was the solution you used?
How is the language implementation? By that I mean, is the translation and sentence structure pretty accurate?
Obviously we are not talking 100%, but if we can get sentence structure and basic translation correct then the potential for any site to reach any market is limitless.
Mind shedding some light on any results you have seen?
There were other benefits to doing it ourselves, but the main reason was that the locales only worked if the system understood the locale.