This is something that I've been wondering about for a while.

My unexamined assumption had been that in the 'new' fully Unicode LC, text was held in UTF-8. However, when I saved some text strings in binary I got something like UTF-8 - but not quite. The recent experiments with offset suggest that LC is at least able to distinguish a string that can be fully represented as single-byte (or perhaps ASCII?) from one that cannot. And the reports of the ingenious investigators using UTF-32 to speed up offset, who discovered that offset somehow managed to be case-insensitive in this case, made me wonder whether, after textEncode(xt, "UTF-32"), LC marks the string in some way to give a clue about how to interpret it as text.

So could someone who is familiar with this bit of the engine enlighten us? In particular:
- What is the internal format?
- Is it different on different platforms?
- Given that it appears to include a flag to indicate whether it is single-byte text or not, are there any other attributes?
- Does saving a string to a 'binary' file faithfully reflect the internal format?
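
For anyone who wants to compare their own binary dumps against the standard encodings, here is a minimal sketch of how I would line the candidates up. It is in Python rather than LC, purely for illustration; the function name encoded_forms and the sample string are my own assumptions, and it says nothing about what the engine actually stores.

```python
def encoded_forms(text):
    """Return the bytes of `text`, as spaced hex, in several standard encodings.

    An encoding that cannot represent the text maps to None.
    """
    encodings = ["latin-1", "utf-8", "utf-16-le", "utf-32-le"]
    forms = {}
    for name in encodings:
        try:
            forms[name] = text.encode(name).hex(" ")
        except UnicodeEncodeError:
            # e.g. latin-1 cannot hold characters above U+00FF
            forms[name] = None
    return forms


# Hypothetical sample: three ASCII letters plus one non-ASCII letter.
sample = "caf\u00e9"
for name, hexbytes in encoded_forms(sample).items():
    print(f"{name:10} {hexbytes}")
```

If the bytes in a saved file match one of these rows for ASCII-only text but a different row once a non-ASCII character is added, that would at least be consistent with the single-byte flag suspected above.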

TIA,

Ben

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode