This is something that I've been wondering about for a while.
My unexamined assumption had been that in the 'new' fully Unicode LC, text was
held in UTF-8. However, when I saved some text strings as binary I got
something like UTF-8, but not quite. The recent experiments with offset
suggested that LC can at least distinguish a string that is fully
representable as single-byte (or perhaps ASCII?) text from one that is not.
And the reports of the ingenious investigators who used UTF-32 to speed up
offset, and discovered that offset somehow managed to be case-insensitive in
that case, made me wonder: after textEncode(xt, "UTF-32"), does LC mark the
result in some way to give a clue about how to interpret it as text?
So could someone who is familiar with this bit of the engine enlighten us? In
particular:
- What is the internal format?
- Is it different on different platforms?
- Given that it appears to include a flag to indicate whether it is
single-byte text or not, are there any other attributes?
- Does saving a string to a 'binary' file faithfully reflect the internal
format?
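
For anyone who wants to check a saved file against the usual suspects, here is
a rough sketch of the comparison I have in mind. It is Python rather than LC,
and it says nothing about what the engine actually does: the helper name
matching_encodings, the candidate list and the sample string are all my own
assumptions. It just encodes a known string in several standard encodings and
reports which ones reproduce the dumped bytes exactly.

```python
# Hypothetical helper (not part of LC): which standard encodings of `text`
# reproduce the bytes that were dumped to a binary file?
CANDIDATES = (
    "ascii", "latin-1", "cp1252", "utf-8",
    "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be",
)

def matching_encodings(text, dumped):
    """Return the names of the candidate encodings whose output equals `dumped`."""
    matches = []
    for name in CANDIDATES:
        try:
            encoded = text.encode(name)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if encoded == dumped:
            matches.append(name)
    return matches

if __name__ == "__main__":
    sample = "caf\u00e9"  # assumed sample: one non-ASCII character
    # Stand-ins for file contents; in practice read them with open(path, "rb").
    print(matching_encodings(sample, sample.encode("latin-1")))
    print(matching_encodings(sample, sample.encode("utf-16-le")))
```

An empty result would mean the dump is none of these, which is roughly the
"like UTF-8, but not quite" situation described above.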
TIA,
Ben
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com