The Language Database Mechanism

The jstools applications (and other well-behaved applications built with the jldb package) support localisation of the text (labels, command names, prompts, menu choices, etc.) presented by the application. That is, just because the author of the application wrote it in English (or German) doesn't mean you have to use it in English (or German) - it can be customised to display its messages and labels in any language.

This document describes the language database mechanism implemented by version 0.1 of the jldb package. It focusses on the use of language databases by the jstools applications, which require Tk. The jldb package itself does not require Tk, however, so this mechanism can be used for non-graphical applications as well.

Introduction

When jstools applications display natural-language strings, they look them up with a key in a database, called the language database. They then display the result. So in order to make jedit display strings in Romanian, you need a Romanian natural-language database for jedit. (Actually, you also need a natural-language database for the strings displayed by the jstools libraries, which jedit uses, and a font capable of displaying Romanian accented characters.)

Assuming the application uses the jldb mechanism consistently for displaying strings to the user, this database can be provided by the author or distributor of the application, by the person who installed it at your site, or by you, the end user of the application. Moreover, if you already have a database that has most of what you want, but you want to make a few changes to it, you don't need to create an entirely new database; you can just create one that contains the changes you want and inherits the rest of its strings from the `parent' database. (You could handle differences between Australian and US English this way - or you could just change a few strings to match terminology used at your site, or to make them more concise or more descriptive.)

The Language Database Hierarchy

Language databases are organised into a tree-structured hierarchy according to how specific they are. This hierarchy is reflected in their names.

The least specific form of a language database name is just the official ISO 639 two-letter abbreviation for a language, such as en for English or fr for French. Optionally, a two-letter country abbreviation can be appended, separated by a period, such as es.ar for Argentine Spanish or la.fr for Latin as spoken in Gaul. (The country abbreviation is the same as a two-letter top-level Internet domain name, which mostly agrees with the ISO 3166 standard, except for uk instead of gb for Great Britain.) This can optionally be followed by another period and an arbitrary string to identify a more specific language practice, such as en.us.mit for English as used at the Massachusetts Institute of Technology. The process of tacking on more specific identifiers can continue indefinitely, so in theory you could specify US English as spoken by the President as en.us.dc.whitehouse.president, but in practice I suspect three levels - language, country, and perhaps an additional specifier for an organisation, a region within a country, or an individual - will be enough for most people.

When an application starts, it tries to read in your preferred language database, and all more general `parent' databases, starting with the most general database. For instance, if your preferred database is en.uk.susan, then the application will first try to read in the en database, then the en.uk database, and finally the en.uk.susan database. That way, strings specified in en.uk override strings in en, and strings in en.uk.susan override strings in either en or en.uk. Only those strings which differ between UK English and other varieties of English need be specified in en.uk, and Susan only needs to specify those strings she wants to change in en.uk.susan.
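The loading order described above can be sketched in a few lines. This is only an illustration written in Python (jldb itself is a Tcl package, and this is not its code); the function names and the sample strings are made up for the example.

```python
def database_chain(preferred):
    """Return database names from most general to most specific."""
    parts = preferred.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def merge_databases(preferred, databases):
    """Overlay the databases in chain order; more specific entries win."""
    strings = {}
    for name in database_chain(preferred):
        strings.update(databases.get(name, {}))
    return strings


# Hypothetical database contents, for illustration only.
databases = {
    "en": {"colour": "Color", "quit": "Quit"},
    "en.uk": {"colour": "Colour"},
    "en.uk.susan": {"quit": "Bye"},
}

print(database_chain("en.uk.susan"))   # ['en', 'en.uk', 'en.uk.susan']
print(merge_databases("en.uk.susan", databases))
# {'colour': 'Colour', 'quit': 'Bye'}
```

Because the most general database is applied first, each more specific database only has to list the strings it changes.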

Not every language database in the hierarchy needs to exist. For instance, if Sam's language preference is en.us.mit.sam and there are language databases called en and en.us.mit.sam for a particular application, but no databases called en.us or en.us.mit, then both existing databases will be read, and strings in en.us.mit.sam will override definitions in en.

If no language database at all is found for an application, it will use defaults specified by the application's author.
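Sam's case, and the fallback to the author's defaults, might look like the following Python sketch. It assumes that the default for each string is supplied at the point of lookup; that detail, like the names load_strings and lookup and the sample strings, is an assumption for illustration rather than a description of the jldb API.

```python
def load_strings(preferred, available):
    """Read every database in the chain that exists.

    Returns (names_read, strings). Missing levels are simply skipped.
    """
    parts = preferred.split(".")
    names_read, strings = [], {}
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        if name in available:
            names_read.append(name)
            strings.update(available[name])
    return names_read, strings


def lookup(strings, key, default):
    """Return the database string for key, or the author's default."""
    return strings.get(key, default)


# Hypothetical databases: en.us and en.us.mit do not exist.
available = {
    "en": {"open": "Open", "save": "Save"},
    "en.us.mit.sam": {"save": "Stash"},
}

names_read, strings = load_strings("en.us.mit.sam", available)
print(names_read)                        # ['en', 'en.us.mit.sam']
print(lookup(strings, "save", "Save"))   # Stash
print(lookup({}, "save", "Save"))        # Save (no database found at all)
```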

Where Databases Are Found

Language databases are searched for in the following directories, in order:
        ~/.tk/jldb/app
        ~/.tk/jldb/default
        jstools_library/jldb/app
        jstools_library/jldb/default
where app is the name of the application or package, and jstools_library is the place where the jstools library files are installed, typically /usr/local/jstools/lib. [This list isn't quite correct at the moment, but the important point is looking in the user's home directory as well as in a site-wide location.]

The directory path above is searched for each database (e.g., for de and then again for de.ch), but when a particular database is found in one of these directories, it is used, and the remaining directories are skipped for this database (although they may be searched again for a more specific database). An example will make this clearer:

An Example

Suppose you are running an application called notebook and you have a database called en.us.bill in your ~/.tk/jldb/notebook directory and there's also a system-wide English-language database for the application in /usr/local/jstools/lib/jldb/notebook/en (but no en.us database). If your language database preference is en.us.bill, then when it starts, the notebook application will first look for a database named en. It will check your ~/.tk/jldb/notebook and ~/.tk/jldb/default directories, but won't find anything. It will then check /usr/local/jstools/lib/jldb/notebook and find an en database there, so it will read it in. It won't check /usr/local/jstools/lib/jldb/default, because it's already found a database. Then it will search for a database named en.us, but (for our example) no such database will be found in any of the directories. Finally, it will search for a database called en.us.bill. This will be found in the first directory checked: ~/.tk/jldb/notebook, so it will be read in and none of the other directories will be checked. The net result is that, presumably, most of the strings in the application are defined in the file /usr/local/jstools/lib/jldb/notebook/en, but some of them are overridden by definitions in the file ~/.tk/jldb/notebook/en.us.bill. (Alternatively, you could have copied the entire en database for notebook to your directory and modified it, but it's better just to override the strings you want to change.)
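The walk-through above can be reproduced with a small Python sketch that rebuilds the notebook layout in a scratch directory. It assumes the search order the example follows (the user's application directory, the user's default directory, then the two site-wide equivalents); the function names are hypothetical, and the real search list may differ, as noted earlier.

```python
import os
import tempfile


def search_dirs(app, home, jstools_library):
    """Directories to search, in the order the example assumes."""
    return [
        os.path.join(home, ".tk", "jldb", app),
        os.path.join(home, ".tk", "jldb", "default"),
        os.path.join(jstools_library, "jldb", app),
        os.path.join(jstools_library, "jldb", "default"),
    ]


def find_databases(preferred, dirs):
    """For each name in the chain, keep the first file found, if any."""
    parts = preferred.split(".")
    found = []
    for i in range(1, len(parts) + 1):
        name = ".".join(parts[:i])
        for directory in dirs:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                found.append(path)
                break  # remaining directories are skipped for this name
    return found


# Recreate the example: a personal en.us.bill and a site-wide en.
root = tempfile.mkdtemp()
home = os.path.join(root, "home")
lib = os.path.join(root, "jstools", "lib")
for path in (
    os.path.join(home, ".tk", "jldb", "notebook", "en.us.bill"),
    os.path.join(lib, "jldb", "notebook", "en"),
):
    os.makedirs(os.path.dirname(path))
    open(path, "w").close()

found = find_databases("en.us.bill", search_dirs("notebook", home, lib))
for path in found:
    print(os.path.relpath(path, root))
```

Two files are reported, in reading order: the site-wide en database first, then the personal en.us.bill database. No en.us file exists, so that level contributes nothing.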

Application and Library Databases

Actually, even the above is a slight simplification. When an application that uses the jstools libraries starts up, it actually reads in two sets of databases: one for the jstools libraries and one for the application itself. The library database is searched using jstools as the application/package name (i.e., it's searched for in ~/.tk/jldb/jstools, ~/.tk/jldb/default, jstools_library/jldb/jstools, and jstools_library/jldb/default). Strings from both sets of files are loaded into the application. The jstools library database is read first, so strings in any of the application-specific databases will override strings in any of the jstools library databases. (E.g., a string in notebook/en can override a string in jstools/en.us.dc.whitehouse.) It is intended, however, that there should be no conflicts - the jstools databases are for strings used by procedures in the jstools library, and the application-specific databases are for strings used by the application code itself.
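The precedence between the two sets of databases can be pictured as follows. Again this is a Python illustration with invented names and strings, not the jldb implementation: the library chain is merged first, then the application chain is laid over it.

```python
def overlay(preferred, databases):
    """Merge one package's databases, most general first."""
    parts = preferred.split(".")
    strings = {}
    for i in range(1, len(parts) + 1):
        strings.update(databases.get(".".join(parts[:i]), {}))
    return strings


def load_application(app, preferred, all_databases):
    """Library strings first, then application strings on top."""
    strings = overlay(preferred, all_databases.get("jstools", {}))
    strings.update(overlay(preferred, all_databases.get(app, {})))
    return strings


# Hypothetical contents, keyed by package name and then database name.
all_databases = {
    "jstools": {
        "en": {"ok": "OK", "cancel": "Cancel"},
        "en.us": {"cancel": "Never mind"},
    },
    "notebook": {
        "en": {"cancel": "Abort", "page": "Page"},
    },
}

strings = load_application("notebook", "en.us", all_databases)
print(strings["cancel"])   # Abort
print(strings["ok"])       # OK
```

Here the application's general en database wins over the library's more specific en.us database, which is why it is best for the two sets of keys not to overlap.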

Creating a Database for a New Language

To create a database for a new language for a jstools application, you should copy an existing database (provided by the application author) to a new name, and modify the natural-language strings. You should use the two-letter language code in the ISO 639 international standard for your language as the name of your new database. You will then need to choose the appropriate database as your preferred language in the jstools Global Preferences panel in order to see the new strings. You need to do this both for the application's own database and for the database for the jstools libraries.

For instance, if you wanted to translate the strings used in jdoc into Welsh, you would copy the provided English-language database from (perhaps) /usr/local/lib/jstools/jldb/jdoc/en to /usr/local/lib/jstools/jldb/jdoc/cy (because cy is the two-letter code for Welsh in ISO 639) and replace the English strings with Welsh ones. (Alternatively, you can copy it to ~/.tk/jldb/jdoc/cy, in case you don't have write permission under /usr/local.) You'd also need to copy and modify the jstools library database, (e.g.) /usr/local/lib/jstools/jldb/jstools/en, unless you'd already done so for a different application.

For details of the syntax used in language databases, you should see the documentation for the ::jldb::set_strings procedure in the jldb package. To summarise, however, the bulk of the database file consists of a series of entries, where each entry starts with a key. The key is fixed and should not be changed, even if it looks like a natural-language string, because it's how the application looks things up in the database. The second element of an entry is a string, which is the natural-language expression corresponding to the key in this particular language. (The string can contain embedded references to Tcl variables or Tcl commands, which will be substituted before the string is used.)

Some keys have up to three additional values as well as the string: a numeric underline position in the string (used to indicate an Alt-key shortcut in a menu entry), a Tk event specification used to create an accelerator binding (not necessarily involving the Alt key), and a short string to indicate the accelerator binding in a menu entry. All of these are optional. Again, for more details, see the documentation for the jldb package.
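One way to picture an entry with its optional values is as a small record. The Python sketch below assumes the fields appear in the order listed above (key, string, underline position, event, accelerator label); that order, the Entry type, and the sample key and values are assumptions for illustration. The authoritative syntax is in the documentation for the jldb package.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    key: str
    string: str
    underline: Optional[int] = None     # position of the underlined character
    event: Optional[str] = None         # Tk event specification
    accelerator: Optional[str] = None   # label shown in the menu entry


def make_entry(fields):
    """Build an Entry from two to five already-split fields."""
    if not 2 <= len(fields) <= 5:
        raise ValueError("an entry needs a key, a string, and at most three extras")
    key, string, *extras = fields
    underline = int(extras[0]) if len(extras) > 0 else None
    event = extras[1] if len(extras) > 1 else None
    accelerator = extras[2] if len(extras) > 2 else None
    return Entry(key, string, underline, event, accelerator)


# Hypothetical entries: one plain string, one menu item with a shortcut.
print(make_entry(["title", "Notebook"]))
print(make_entry(["file:save", "Save", "0", "<Meta-s>", "[s]"]))
```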

If you translate strings for any of the jstools applications (or the library) into any other languages, I would be very grateful to receive copies, so I can incorporate support for languages other than English into future releases.

Customising the Strings Displayed in an Existing Language

To customise the strings displayed in an existing language database for a particular language group (whether that's your country, your region, your site, or just your own personal preferences), you create a new language database (named as specified above in The Language Database Hierarchy) which contains just those strings you want to override. For instance, if you were adapting an application that had something to do with cars, which was provided with an English-language database (which happens to contain American English strings), you might take the en database (English, variety unspecified) containing:
        {tires {Tires}}
        {hood {Hood}}
        {seats {Seats}}
        {paint {Paint Job}}
        {windows {Windows}}
        {radio {Radio}}
        {red {Red}}
        {black {Black}}
        {silver {Silver}}
        {gray {Gray}}
        {gold {Gold}}
        {yellow {Yellow}}
and create a new database en.uk specifying just those strings you want to change:
        {tires {Tyres}}
        {hood {Bonnet}}
        {gray {Grey}}
Now, assuming your new database is installed system-wide, anyone whose language preference is en (or en.us, assuming there isn't a separate en.us database - or en.za, for that matter) will see the original strings in the database distributed with the application, but anyone whose language preference is en.uk will see the modified versions of those strings that differ.
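The effect of the en.uk overlay can be checked with a short Python sketch. The parser below understands only whitespace and nested braces, which is enough for the entries shown here but is not full Tcl list syntax, and a real database file may contain more than a bare series of entries. The names parse_list and parse_database are invented for the example.

```python
def parse_list(text):
    """Split a brace-delimited list into elements, honouring nesting."""
    items, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text[i] == "{":
            depth, start = 1, i + 1
            i += 1
            while i < n and depth:
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unbalanced braces")
            items.append(text[start:i - 1])
        else:
            start = i
            while i < n and not text[i].isspace():
                i += 1
            items.append(text[start:i])
    return items


def parse_database(text):
    """Map each entry's key to its string (any extra fields are ignored)."""
    entries = (parse_list(entry) for entry in parse_list(text))
    return {fields[0]: fields[1] for fields in entries}


EN = """
{tires {Tires}} {hood {Hood}} {seats {Seats}} {paint {Paint Job}}
{windows {Windows}} {radio {Radio}} {red {Red}} {black {Black}}
{silver {Silver}} {gray {Gray}} {gold {Gold}} {yellow {Yellow}}
"""
EN_UK = """
{tires {Tyres}}
{hood {Bonnet}}
{gray {Grey}}
"""

en = parse_database(EN)
en_uk = {**en, **parse_database(EN_UK)}   # en.uk laid over en

print(en["tires"], en["hood"], en["gray"])            # Tires Hood Gray
print(en_uk["tires"], en_uk["hood"], en_uk["gray"])   # Tyres Bonnet Grey
print(en_uk["paint"])                                 # Paint Job
```

Only three keys differ between the two views; the other nine strings are inherited unchanged from en.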