Internationalization with Qt
The internationalization and localization of an application are the processes of adapting the application to different languages, regional differences, and technical requirements of a target market.
- Internationalization means designing an application so that it can be adapted to various languages and regions without engineering changes.
- Localization means adapting internationalized applications for a specific region or language by adding locale-specific components (such as date, time, and number formats) and translating text.
The need for internationalization ranges from spelling changes to enabling the application to operate in different languages and to use different input techniques, character encoding, and presentation conventions.
All input controls and text drawing methods in Qt offer built-in support for all supported languages. The built-in font engine correctly renders text that contains characters from a variety of different writing systems at the same time.
For more information about | See |
---|---|
Internationalizing source code | Writing Source Code for Translation |
Configuring and deploying translations, as well as using existing Qt module translations | Localizing Applications |
Using the Qt translation tools | Qt Linguist Manual |
Qt Classes for Internationalization
The following classes support the internationalization of Qt applications.
Class | Description |
---|---|
QCollator | Compares strings according to a localized collation algorithm |
QCollatorSortKey | Can be used to speed up string collation |
QLocale | Converts between numbers and their string representations in various languages |
QStringConverter | Base class for encoding and decoding text |
QStringDecoder | State-based decoder for text |
QStringEncoder | State-based encoder for text |
QTextCodec | Conversions between text encodings |
QTextDecoder | State-based decoder |
QTextEncoder | State-based encoder |
QTranslator | Internationalization support for text output |
See Writing Source Code for Translation for more information about how to use the classes in applications.
Languages and Writing Systems
Qt supports most languages in use today.
Input controls, such as the Qt Quick TextInput type and the QLineEdit, QTextEdit, and derived classes, as well as display controls, such as the Text type and the QLabel class, handle the following special features of the different writing systems:
- Line breaks
Some Asian languages are written without spaces between words. Line breaking can occur either after any character (with exceptions), as in Chinese, Japanese, and Korean, or after logical word boundaries, as in Thai.
- Bidirectional writing
Arabic and Hebrew are written from right to left, except for numbers and embedded English text, which are written left to right. The exact behavior is defined in Unicode Standard Annex #9.
- Non-spacing or diacritical marks, such as accents or umlauts in European languages
Some languages, such as Vietnamese, make extensive use of these marks and some characters can have more than one mark at the same time to clarify pronunciation.
- Ligatures
In special contexts, some pairs of characters get replaced by a combined glyph forming a ligature. Common examples are the fl and fi ligatures used in typesetting US and European books.
Qt's text engine supports different writing systems that work on all platforms if the fonts for rendering them are installed.
You do not need to know about the writing system used in a particular language, unless you want to write your own text input controls. In some languages, such as Arabic or languages from the Indian subcontinent, the width and shape of a glyph change depending on the surrounding characters. To take this into account in C++ code, use QTextLayout. Writing input controls also requires some knowledge of the scripts they are going to be used in. Usually, the easiest way is to subclass QLineEdit or QTextEdit.
Encoding
Encoding is relevant both for application source files and the text files that the application reads or writes.
Encoding Source Code
QML documents are always encoded in UTF-8 format. Since Qt 6, UTF-8 is also the predominant 8-bit encoding in Qt C++.
The lupdate tool extracts UI strings from your application. It expects all source code to be encoded in UTF-8 by default.
However, some editors, such as Visual Studio, use a different encoding by default. One way to avoid encoding issues is to limit any source code to ASCII, and use escape sequences for translatable strings with other characters, for example:
label->setText(tr("F\303\274r \303\210lise"));
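As a rough illustration of the escape-sequence approach, the following sketch uses only the C++ standard library (no Qt) to show that an ASCII-only literal with octal escapes carries the same bytes as the UTF-8 text it stands for. The helper names `escapedTitle` and `checkEscapedTitle` are hypothetical, and the escapes here spell UTF-8 byte sequences, on the assumption that the literal is later interpreted as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns "F<u-umlaut>r <E-grave>lise" written with
// ASCII-only source text. Each non-ASCII character is spelled as the octal
// escapes of its UTF-8 bytes:
//   U+00FC (u-umlaut) -> 0xC3 0xBC -> \303\274
//   U+00C8 (E-grave)  -> 0xC3 0x88 -> \303\210
// An octal escape ends after at most three digits, so the letter that
// follows it is never absorbed into the escape.
std::string escapedTitle()
{
    return "F\303\274r \303\210lise";
}

// Compares the literal against the expected UTF-8 bytes, one by one.
void checkEscapedTitle()
{
    const std::string title = escapedTitle();
    const unsigned char expected[] = {'F', 0xC3, 0xBC, 'r', ' ',
                                      0xC3, 0x88, 'l', 'i', 's', 'e'};
    assert(title.size() == sizeof(expected));
    for (std::size_t i = 0; i < title.size(); ++i)
        assert(static_cast<unsigned char>(title[i]) == expected[i]);
}
```

Because the source file then contains nothing but ASCII, the result does not depend on the encoding the editor or compiler assumes for the file.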
QString::toUtf8() returns the text in UTF-8 encoding, which preserves Unicode information while looking like plain ASCII if the text is wholly ASCII. To convert Unicode to local 8-bit encoding, use QString::toLocal8Bit(). On Unix systems, this is equivalent to toUtf8(). On Windows, the system's current code page is used.
For converting from UTF-8 and local 8-bit encoding to QString, use the QString::fromUtf8() and QString::fromLocal8Bit() convenience functions.
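To make the remark about UTF-8 and ASCII more concrete, here is a minimal sketch of UTF-8 encoding written against the C++ standard library only. It is not Qt's implementation, and the helper names `encodeCodePoint` and `encodeUtf8` are hypothetical. It shows that code points below U+0080 are emitted as a single, unchanged byte, which is why wholly ASCII text looks the same in UTF-8, while other code points take two to four bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode scalar value as UTF-8.
std::string encodeCodePoint(char32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, identical to the ASCII code.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encodes a whole UTF-32 string as UTF-8.
std::string encodeUtf8(const std::u32string &text)
{
    std::string out;
    for (char32_t cp : text)
        out += encodeCodePoint(cp);
    return out;
}
```

For example, `encodeUtf8(U"Qt")` yields the two ASCII bytes of "Qt", whereas U+00FC becomes the two bytes 0xC3 0xBC.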
Encoding Text Input/Output
Use QTextStream::setEncoding() to select one of the common encodings for a text stream.
If you need some other legacy encoding, use the QTextCodec class from the Qt5Compat module.
When an application starts, the locale of the machine determines the 8-bit encoding used for external 8-bit data. QTextCodec::codecForLocale() returns a codec that you can use to convert between this locale encoding and Unicode.
The application may occasionally require encoding other than the default local 8-bit encoding. For example, an application in a Cyrillic KOI8-R locale (the de-facto standard locale in Russia) might need to output Cyrillic in the ISO 8859-5 encoding. Code for this would be:
QString string = ...; // some Unicode text
QTextCodec *codec = QTextCodec::codecForName("ISO 8859-5");
QByteArray encodedString = codec->fromUnicode(string);
The following code demonstrates the conversion from ISO 8859-5 Cyrillic to Unicode:
QByteArray encodedString = ...; // some ISO 8859-5 encoded text
QTextCodec *codec = QTextCodec::codecForName("ISO 8859-5");
QString string = codec->toUnicode(encodedString);
For a complete list of supported encodings see the QTextCodec documentation.
Operating and Windowing Systems
Some of the operating systems and windowing systems that Qt runs on only have limited support for Unicode. The level of support available in the underlying system has some influence on the support that Qt can provide on those platforms, although in general Qt applications need not be too concerned with platform-specific limitations.
Unix/X11
- Qt hides locale-oriented fonts and input methods and provides Unicode input and output.
- Most Unix variants use filesystem conventions such as UTF-8 by default. All Qt file functions allow Unicode, but convert filenames to the local 8-bit encoding, as this is the Unix convention.
- File I/O defaults to the local 8-bit encoding, with Unicode options in QTextStream.
- Some older Unix distributions contain only partial support for some locales. For example, even if you have a /usr/share/locale/ja_JP.EUC directory, you cannot display Japanese text unless you install Japanese fonts and the directory is complete. For best results, use complete locales from your system vendor.
Linux
- Qt provides full Unicode support, including input methods, fonts, clipboard, and drag-and-drop.
- The file system is encoded in UTF-8 on all modern Linux distributions. File I/O defaults to UTF-8.
Windows
- Qt provides full Unicode support, including input methods, fonts, clipboard, drag-and-drop, and file names.
- File I/O defaults to Latin1, with Unicode options in QTextStream. However, some Windows programs do not understand big-endian Unicode text files even though that is the order prescribed by the Unicode standard in the absence of higher-level protocols.
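The byte-order point can be made concrete with a small sketch. It uses only the C++ standard library rather than Qt, and the helper names `toUtf16Bytes` and `demoByteOrder` are hypothetical. It serializes the same text as UTF-16 in both byte orders, each prefixed with a byte order mark (U+FEFF), so the difference that a reading program has to cope with is visible in the first two bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: serializes UTF-16 code units to bytes in the requested
// byte order, prefixed with a byte order mark (U+FEFF).
std::string toUtf16Bytes(const std::u16string &text, bool bigEndian)
{
    std::string bytes;
    auto append = [&](char16_t unit) {
        const char high = static_cast<char>(unit >> 8);
        const char low = static_cast<char>(unit & 0xFF);
        if (bigEndian) {
            bytes += high;
            bytes += low;
        } else {
            bytes += low;
            bytes += high;
        }
    };
    append(0xFEFF); // byte order mark
    for (char16_t unit : text)
        append(unit);
    return bytes;
}

// Usage sketch: the letter "A" (U+0041) in both byte orders.
void demoByteOrder()
{
    // Big-endian: FE FF 00 41
    assert(toUtf16Bytes(u"A", true) == std::string("\xFE\xFF\x00\x41", 4));
    // Little-endian: FF FE 41 00
    assert(toUtf16Bytes(u"A", false) == std::string("\xFF\xFE\x41\x00", 4));
}
```

A program that only understands little-endian files would expect the FF FE prefix. In a Qt application, the stream encoding would normally be selected on QTextStream rather than assembled by hand like this.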
Related Topics
- Localizing Qt and Qt Quick apps into multiple languages.
- Using Qt Linguist to internationalize your Qt application.
- Using Qt translation tools: lupdate, lrelease, and Qt Linguist.
- Text ID based internationalization provides support for large scale projects with many target locales and many texts to translate.
- A summary of the translation rules for plural forms produced by Qt's translation tools.
- Writing source code that enables the localization of applications.