On Sat, Feb 24, 2007 at 07:00:42PM +0100, Jean-Marc Lasgouttes wrote:
> Bug 1247 is about recognizing properly letters in non-latin1 text.
> This is fixed on linux using 32bits wchar_t. However, on windows
> another strategy has to be found, as discussed here
> http://bugzilla.lyx.org/show_bug.cgi?id=1247
>
> Enrico, Abdel, could you have a look?

I think that your idea of using Qt is the right one. When I run the
attached test program (linked to QtCore), I obtain the following output:

$ test-qt
Using ISO 8859-1:
¿ is NOT a letter.
à is a letter and toUpper(à) = À
Using ISO 8859-2:
¿ is a letter and toUpper(¿) = Z
à is a letter and toUpper(à) = R

I think that this is the correct result, apart from the fact that the
toUpper() characters are not represented correctly for me in the
ISO-8859-2 case, as I have a latin1 locale.

-- 
Enrico

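#include <iostream>
#include <QByteArray>
#include <QTextCodec>
#include <QString>

using std::cout;
using std::endl;

int main()
{
	// Two single-byte characters; this assumes the source file is
	// saved as ISO 8859-1, so that each character is exactly one byte.
	QByteArray const localString = "¿à";
	char const * encoding[2] = { "ISO 8859-1", "ISO 8859-2" };

	for (int j = 0; j < 2; ++j) {
		QTextCodec * codec = QTextCodec::codecForName(encoding[j]);
		if (!codec) {
			// codecForName() returns 0 for an unknown encoding name.
			cout << "No codec for " << encoding[j] << endl;
			continue;
		}
		// Interpret the same bytes according to the current encoding.
		QString s = codec->toUnicode(localString);
		cout << "Using " << encoding[j] << ":" << endl;
		for (int i = 0; i < s.length(); ++i) {
			// The raw byte is echoed as is, so the terminal shows it
			// in the local (here latin1) encoding.
			cout << localString.at(i) << " is ";
			if (s.at(i).isLetter()) {
				QString const upper = s.at(i).toUpper();
				cout << "a letter and toUpper("
				     << localString.at(i) << ") = "
				     << upper.toLocal8Bit().at(0) << endl;
			} else
				cout << "NOT a letter." << endl;
		}
	}
	return 0;
}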
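
The different answers for the same two bytes come from the decoding step, not from isLetter() itself. The Qt-free sketch below illustrates this for the two byte values the test program is assumed to contain (0xBF and 0xE0). The names Charset and decodeByte are hypothetical, invented for this illustration only, and are not part of Qt or LyX. The ISO 8859-2 table is deliberately limited to those two bytes.

```cpp
// Hypothetical helper, for illustration only (not Qt or LyX API).
enum class Charset { Latin1, Latin2 };

// Map one byte to a Unicode code point under the given encoding.
// ISO 8859-1 bytes are identical to their code points. For ISO 8859-2
// only ASCII and the two bytes from the test program are handled here;
// any other high byte yields U+FFFD (replacement character).
char32_t decodeByte(unsigned char byte, Charset cs)
{
	if (cs == Charset::Latin1 || byte < 0x80)
		return byte;
	switch (byte) {
	case 0xBF: return 0x017C; // LATIN SMALL LETTER Z WITH DOT ABOVE
	case 0xE0: return 0x0155; // LATIN SMALL LETTER R WITH ACUTE
	default:   return 0xFFFD; // not covered by this sketch
	}
}
```

Under ISO 8859-1, byte 0xBF decodes to U+00BF (inverted question mark, a punctuation character), while under ISO 8859-2 it decodes to U+017C, which is a letter. This matches the "NOT a letter" versus "a letter" lines in the output above.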