> It is actually UTF8 to Unicode from everything I've been able to read.
>  utf8towcs is, from what I have read, supposed to represent every
> Unicode character as a single wchar_t which is supposed to be wide
> enough to hold the entire Unicode point value in a single space.  If
> I'm mistaken and someone knows otherwise, I'd appreciate knowing.  So

You are (a little) mistaken. The width of wchar_t is platform-dependent.
On Windows it is only 16 bits, which *isn't* wide enough to hold every
Unicode code point in a single wchar_t: anything above U+FFFF needs two
of them (a surrogate pair). I believe that on modern *nix systems it's
defined as 32 bits. I do not believe that clucene bothers with
characters that do not fit in 16 bits, so they just don't work, at
least on Windows.
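
To make the 16-bit limit concrete, here is a minimal sketch of how a
code point gets split into 16-bit units. The names toUtf16 and
fitsInOneWchar are made up for illustration; they are not part of
clucene or sword, and this is not a claim about how either library
does its conversion.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from clucene or sword): encode one Unicode
// code point as UTF-16 code units. Assumes cp is a valid scalar value
// (not a surrogate, not above U+10FFFF).
std::vector<std::uint16_t> toUtf16(std::uint32_t cp) {
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Needs a surrogate pair: 20 payload bits split 10/10.
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: can this code point be stored in ONE wchar_t on
// the platform this is compiled for? True for everything when wchar_t
// is 32 bits; only true up to U+FFFF when it is 16 bits.
bool fitsInOneWchar(std::uint32_t cp) {
    return sizeof(wchar_t) >= 4 || cp < 0x10000;
}
```

For example, U+0041 comes out as one unit, while U+1D11E (a musical
clef) comes out as the pair 0xD834 0xDD1E. Code that assumes one
wchar_t per character will mishandle the second case wherever wchar_t
is 16 bits.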

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
