Boost.Locale
atoi because they may not use the "ordinary" digits 0..9 at all. You can't assume that "space" characters are frequent, because in Chinese the space character does not separate words. The text may be written right-to-left or top-to-bottom, and so on.
Unicode is not limited to wide strings. Both std::string and std::wstring can hold and process Unicode text. Moreover, the semantics of std::string are much cleaner in multi-platform applications, because all "Unicode" strings are UTF-8. "Wide" strings may be encoded in UTF-16 or UTF-32, depending on the platform, so they may be even less convenient for dealing with Unicode than char-based strings.
There is a common assumption that UTF-16 is the best encoding for storing information because it gives the "shortest" representation of strings.
In fact, it is probably the most error-prone encoding to work with. The biggest issue is code points that lie outside the BMP, which must be represented with surrogate pairs. These characters are very rare, so many applications are never tested with them.
For example:
So UTF-16 can be used for Unicode; in fact, ICU and many other applications use UTF-16 as their internal Unicode representation. But you should be very careful and never assume that one code point equals one UTF-16 code unit.