C Internationalization made easy through GNU gettext
<kiko at async.com.br>

August 2000 (Updated March 2003)

GNU gettext is ridiculously easy to use and works wonders; however, there is little documentation on how to use it beyond the rather verbose GNU gettext manual. I outline here a simple task list for internationalizing your applications so you don't have to suffer too much to learn how to use it. The task list is for C (and thus C++), but if your language has support for gettext it is bound to function very similarly.

Gettext has been built into glibc2/libc6, so there is no need to link with libintl (-lintl) any more if you're on a Linux glibc system such as Slackware 7. If you're on a platform other than Linux you will need to link with libintl, which is included with gettext; look for the latest version here.

  1. Alter your source to include setlocale(), bindtextdomain() and textdomain(). These lines require some commenting:
    setlocale(LC_ALL,"");
    LC_ALL is a catch-all locale category (LC); setting it alters all LC categories. There are other, more specific categories: for example, LC_MESSAGES is the category for message translation, and LC_CTYPE is the category that indicates the character set supported.

    By setting the locale to "", you implicitly select the user's own locale (taken from the user's LC_* or LANG environment variables). If there is no user-defined locale, the default locale "C" is used.

    bindtextdomain("foo","/usr/local/share/locale/");
    This call binds the domain name "foo" to the root directory of the message files. Use it to specify where your locale files are stored; using the standard /usr/local/share/locale or /usr/share/locale is a good idea. "foo" should correspond to the application name; you will use it when setting the gettext domain through textdomain(), and it corresponds to the name of the file to be looked up in the appropriate locale directory.

    The bindtextdomain() call is not mandatory; if you install your catalog in the system's default locale directory it can be omitted. Since that default can change from system to system, however, calling it explicitly is recommended.

    textdomain("foo");
    This sets the current domain to "foo", as cited above, which makes gettext() calls look for the file foo.mo in the appropriate directory. By binding several domains and changing the text domain at runtime (or using dcgettext(), explained elsewhere), you can switch between different domains as desired.
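
As a minimal sketch of step 1, the three calls can be wrapped in one helper that also checks their return values. The name init_i18n() is my own invention, not part of gettext, and the warning text is only illustrative.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not part of gettext): sets up one text domain.
 * Returns 0 on success, -1 if the domain could not be bound or selected. */
int init_i18n(const char *domain, const char *localedir)
{
    /* Adopt the user's locale. A NULL result means the environment names
     * a locale that is not available; the program then stays in "C". */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "warning: locale not supported, staying in \"C\"\n");

    /* Both calls return NULL on failure (for example, out of memory). */
    if (bindtextdomain(domain, localedir) == NULL)
        return -1;
    if (textdomain(domain) == NULL)
        return -1;
    return 0;
}
```

Calling init_i18n("foo", "/usr/local/share/locale") at the top of main() would then replace the three separate lines shown above.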

  2. Mark strings for extraction in your C source:

    Substitute string references such as

    printf("foo");
    for code using gettext():
    printf(gettext("foo"));
    To make things simple, _() (the underscore function) is often defined as shorthand for gettext():
    #define _(str) gettext(str)
    printf(_("foo"));

    [This affects the way you call xgettext; see the next step.]
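
Strings that take arguments can be marked the same way. The sketch below assumes a made-up helper, format_open_error(), and a made-up message; the point is only that the whole sentence, placeholders included, goes inside _() so that a translator can reorder the words.

```c
#include <libintl.h>
#include <stdio.h>

#define _(str) gettext(str)

/* Hypothetical helper: formats an error message into buf.
 * Returns whatever snprintf() returns. */
int format_open_error(char *buf, size_t size, const char *path, int err)
{
    /* Mark the complete format string rather than gluing translated
     * fragments together; word order differs between languages. */
    return snprintf(buf, size, _("Could not open %s (error %d)\n"), path, err);
}
```

When no catalog is found for the current domain, gettext() simply returns the string it was given, so the helper falls back to the English text.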

  3. Extract these strings using xgettext. xgettext scans your source code and creates a .po file containing the messages to be translated. It does this by checking which strings are wrapped in gettext() in your source; if you use a macro for gettext() such as _(), you must tell xgettext about the extra keyword with the -k option, as in "xgettext -k_".
    xgettext -k_ foo.c -o foo.po
    This will create the file foo.po with the messages marked in your source file. Based on the source file

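        #include <libintl.h>   /* gettext(), bindtextdomain(), textdomain() */
        #include <locale.h>    /* setlocale() */
        #include <stdio.h>     /* printf() */
        #define _(str) gettext(str)
        int main(void) {
            setlocale(LC_MESSAGES,"");   /* language of the messages */
            setlocale(LC_CTYPE,"");      /* character set */
            bindtextdomain("foo","/usr/local/share/locale");
            textdomain("foo");
            printf(_("foo_in_english\n"));
    
            printf(_("bar_in_english\n"));
            return 0;
        }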
    	
    the generated .po file will look like
    
        # [omitting comments and meta-definitions]
    
        #: foo.c:10
        msgid "foo_in_english\n"
        msgstr ""
        
        #: foo.c:12
        msgid "bar_in_english\n"
        msgstr ""
    
    	
    This file should be used as a template for all the translations you produce. The .po file is a simple key-value database: each msgid field contains the original string for the default (C) locale, and msgstr contains the translated string. Gettext uses the msgid itself as the key for looking up the translation, which greatly reduces the work you'd otherwise have modifying the source code and indexing the translations.
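
To make the key-value idea concrete, here is a toy model of the lookup. It is not how gettext is implemented (the real thing reads binary catalogs); toy_gettext() and the table are made up for illustration. It only mirrors the behaviour that matters: a translated msgid yields its msgstr, and anything else comes back unchanged.

```c
#include <stddef.h>
#include <string.h>

struct entry {
    const char *msgid;   /* original string, used as the key */
    const char *msgstr;  /* translation; "" means not translated yet */
};

/* Toy stand-in for a translated catalog; the strings are placeholders. */
static const struct entry catalog[] = {
    { "foo_in_english\n", "foo_in_target_language\n" },
    { "bar_in_english\n", "" },
};

/* Illustrative only: returns the translation, or msgid as a fallback. */
const char *toy_gettext(const char *msgid)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++) {
        if (strcmp(catalog[i].msgid, msgid) == 0 &&
            catalog[i].msgstr[0] != '\0')
            return catalog[i].msgstr;
    }
    return msgid;  /* unknown or untranslated: keep the original */
}
```

Because the fallback is the original string, a program with a missing or partial translation still prints something sensible.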

  4. Make a copy of this .po file for each language you want, and translate the strings into the target languages. Your translated strings should look like this:
    
        # [Omitted target language headings, etc]
    
        #: foo.c:10
        msgid "foo_in_english\n"
        msgstr "foo_in_target_language\n"
    
        #: foo.c:12
        msgid "bar_in_english\n"
        msgstr "bar_in_target_language\n"
    	
    Note here that you should *not* have a msgid of "" among your own messages. gettext("") returns the header information in the .po file, and that isn't what you want.

  5. Generate .mo files with msgfmt, which produces the binary message catalogs that gettext reads at runtime.
    msgfmt foo.po -o foo.mo
    If you have trouble running msgfmt, try its -v option, which increases verbosity and reports any errors in more detail.

  6. Place the message file in the proper place. The directory hierarchy is as follows:
    <LOCALE_ROOT>/<LL_CODE>/LC_MESSAGES/<DOMAIN>.mo
    <LOCALE_ROOT> is the directory you bound to your domain in your bindtextdomain() call. <LL_CODE> is the ISO 639 code of the language for which you're providing a translated catalog. The <DOMAIN> part is the domain name you've set using textdomain(). In our example, the text domain is "foo" and the locale root is /usr/local/share/locale; a German (ISO 639 code "de") message catalog should be copied to

    /usr/local/share/locale/de/LC_MESSAGES/foo.mo
    As a side note: if you're developing a standalone test and don't want to install the message file, simply use bindtextdomain() to point elsewhere. If you use a relative directory, it is resolved against your current working directory:
    bindtextdomain("foo","intl");
    will have gettext() search for the message catalogs in the hierarchy rooted in the directory intl inside your current working directory.
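
When a catalog isn't being found, it helps to print the path you expect it to live at. The helper below is a sketch of the layout described in this step; catalog_path() is a hypothetical name, and the real lookup may also try other spellings of the locale (for example a code with a country suffix), which this does not model.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes <root>/<lang>/LC_MESSAGES/<domain>.mo to buf.
 * Returns 0 on success, -1 if buf is too small. */
int catalog_path(char *buf, size_t size, const char *root,
                 const char *lang, const char *domain)
{
    size_t len = strlen(root);

    /* Ignore one trailing slash so "dir/" and "dir" give the same path. */
    if (len > 1 && root[len - 1] == '/')
        len--;

    int n = snprintf(buf, size, "%.*s/%s/LC_MESSAGES/%s.mo",
                     (int)len, root, lang, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the example values from this step (root /usr/local/share/locale, language "de", domain "foo") it produces the same path as the one shown above.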

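  7. Test your setup by setting LC_MESSAGES to the ISO code of a language you've translated to, then recompiling and running your program. Depending on the system, the value may need to name an installed locale (such as de_DE) rather than the bare language code. It should work, and if it doesn't, 'strace -eopen' is your friend (on newer systems the relevant call may show up as openat instead). The most common mistake I've encountered is placing the file in the wrong place, and stracing will certainly help you see where libc is looking.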
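
Before reaching for strace, you can also ask the library what it thinks the current state is. This is a rough debugging sketch; dump_i18n_state() is a made-up name, but the three query calls are the same functions used in step 1, called with NULL so that they report the current setting instead of changing it.

```c
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical debugging aid: prints the state gettext will work from. */
void dump_i18n_state(FILE *out)
{
    const char *domain = textdomain(NULL);           /* NULL: query only */
    const char *dir = bindtextdomain(domain, NULL);  /* NULL: query only */
    const char *loc = setlocale(LC_MESSAGES, NULL);  /* NULL: query only */

    fprintf(out, "LC_MESSAGES locale: %s\n", loc ? loc : "(unknown)");
    fprintf(out, "text domain:        %s\n", domain ? domain : "(unknown)");
    fprintf(out, "catalog root:       %s\n", dir ? dir : "(unknown)");
}
```

If the locale line still says "C" after your setlocale() call, the environment value was probably not accepted, and no catalog will be consulted at all.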

  8. For other languages, basic items to check are:

    1. Is there gettext support in the API?
    2. Is there an easy way to extract strings from the source code? (In other words, is there an equivalent of xgettext for the language?)

    Other than that, there shouldn't be much difference from C gettext.

    Glade and libglade now support i18n as well, and even include libglade-xgettext for extracting strings from the XML; building an internationalized GTK interface is now a trivial task.

    PHP (3.0.7 onwards) also includes a full set of gettext calls (including _() as an alias for gettext()), making internationalizing websites and applications straightforward and portable.

    Python has full gettext support. Version 2.0 onwards has full Unicode and intl support built in; if you are using 1.5.2 or earlier you can use Martin von Loewis' standalone intl module. For PyGTK issues on internationalization see the PyGTK FAQ.