> It is actually UTF8 to Unicode from everything I've been able to read.
> utf8towcs is, from what I have read, supposed to represent every
> Unicode character as a single wchar_t which is supposed to be wide
> enough to hold the entire Unicode point value in a single space. If
> I'm mistaken and someone knows otherwise, I'd appreciate knowing. So
You are (a little) mistaken. wchar_t is defined differently on different platforms. On Windows it is only 16 bits, which *isn't* wide enough to hold every Unicode code point in a single unit; anything above U+FFFF needs a surrogate pair. I believe on modern *nix systems it is defined as 32 bits. I do not believe that CLucene bothers with any characters that do not fit in 16 bits, so they just don't work, at least on Windows.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
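
To make the 16-bit versus 32-bit point concrete, here is a minimal sketch in C++. The names unitsNeeded, SurrogatePair and toSurrogates are hypothetical helpers made up for illustration; they are not part of SWORD or CLucene, and this is not how either library does its conversion. It only shows how many wchar_t units one code point needs for a given wchar_t width, and how a code point above U+FFFF is split when the units are 16 bits wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of wchar_t units needed to hold one
// Unicode code point, given the width of wchar_t in bytes.
int unitsNeeded(std::uint32_t codePoint, std::size_t wcharBytes) {
    if (wcharBytes >= 4) {
        return 1;  // a 32-bit unit holds any code point directly
    }
    // 16-bit units: code points above U+FFFF need a surrogate pair
    return codePoint > 0xFFFF ? 2 : 1;
}

// Hypothetical type: the two 16-bit units of a UTF-16 surrogate pair.
struct SurrogatePair {
    std::uint16_t high;
    std::uint16_t low;
};

// Hypothetical helper: split a code point in the range U+10000..U+10FFFF
// into its UTF-16 surrogate pair. The caller must check the range first.
SurrogatePair toSurrogates(std::uint32_t codePoint) {
    const std::uint32_t offset = codePoint - 0x10000;
    SurrogatePair pair;
    pair.high = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    pair.low = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return pair;
}
```

Calling unitsNeeded(cp, sizeof(wchar_t)) on your own platform shows which case you are in. Code that assumes one wchar_t per character silently mishandles the two-unit case, which is the failure described above.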