Thanks again, Matthew. Writing quickly for lack of time right now. In general, we avoid the use of wchar_t because it is defined differently on different systems. That makes it confusing and ambiguous at the least and, for its intended use as a Unicode character holder, essentially useless for anything other than UTF-16 at worst.
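To make the ambiguity concrete, here is a minimal sketch of how the same code point takes a different number of wide units depending on the unit size. The helper name encodeWide is hypothetical; it is not clucene or SWORD code. It assumes a 2-byte unit means UTF-16 and a 4-byte unit means UTF-32, which is the usual convention but not something the C++ standard guarantees for wchar_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only (not clucene or SWORD code).
// Encodes one Unicode code point into "wide" code units of the given size:
//   unitSize == 2 -> UTF-16 (surrogate pair for code points above U+FFFF)
//   unitSize == 4 -> UTF-32 (always a single unit)
// Assumes cp is a valid Unicode scalar value.
std::vector<std::uint32_t> encodeWide(std::uint32_t cp, std::size_t unitSize) {
    assert(unitSize == 2 || unitSize == 4);
    std::vector<std::uint32_t> units;
    if (unitSize == 2 && cp > 0xFFFF) {
        std::uint32_t v = cp - 0x10000;          // 20 significant bits remain
        units.push_back(0xD800 + (v >> 10));     // high surrogate: top 10 bits
        units.push_back(0xDC00 + (v & 0x3FF));   // low surrogate: bottom 10 bits
    } else {
        units.push_back(cp);                     // fits in a single unit
    }
    return units;
}
```

Calling encodeWide(0x1D11E, sizeof(wchar_t)) would give two units where wchar_t is 2 bytes and one unit where it is 4 bytes, so a buffer sized in "characters" means different things on different systems.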
I could probably look this up, but since you know where everything is in clucene by now... What EXACTLY is TCHAR defined as (i.e., what is sizeof(TCHAR))? Is it the same on all platforms? What does lucene_utf8towc return? TCHAR? wchar_t?

What I'm trying to determine is: Is clucene expecting UTF-16 (which represents code points up to U+FFFF in a single 2-byte unit and uses a surrogate pair, 4 bytes, for anything above that)? ... or is clucene just saying 16 bits of Unicode code point space is good enough for government work; we're not gonna worry about the rest?

From the prose in the definition of the method you gave, it sounds like knowing the sizeof the return value for lucene_utf8towc might tell us the answer.

Thanks again for doing the legwork.

-Troy.

Matthew Talbert wrote:
>>> We have methods to convert to both UTF-16 and UTF-32 in our engine,
>>> which don't need a fixed length buffer, so I would like to replace:
>>>
>>> lucene_utf8towcs(wcharBuffer, content, MAX_CONV_SIZE);
>>>
>>> with a call to our code, if we can nail down exactly what clucene wants
>>> in the resultant wcharBuffer
>
> lucene_utf8towcs calls lucene_utf8towc for every character; the
> comment on the function is this:
>
> /**
>  * lucene_utf8towc:
>  * @p: a pointer to Unicode character encoded as UTF-8
>  *
>  * Converts a sequence of bytes encoded as UTF-8 to a Unicode character.
>  * If @p does not point to a valid UTF-8 encoded character, results are
>  * undefined. If you are not sure that the bytes are complete
>  * valid Unicode characters, you should use lucene_utf8towc_validated()
>  * instead.
>  *
>  * Return value: the resulting character
>  **/
>
> The call to doc->Add actually expects a TCHAR, so if your utf8 to
> utf16 conversion can produce a TCHAR, then that's all that would be
> necessary I think.
>
> Matthew
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page