What is LC's internal text format?

Ben Rubinstein via use-livecode Mon, 12 Nov 2018 14:38:03 -0800

This is something that I've been wondering about for a while.

My unexamined assumption had been that in the 'new' fully Unicode LC, text was held in UTF-8. However, when I saved some text strings in binary I got something like UTF-8, but not quite. The recent experiments with offset also suggested that LC is at least able to distinguish a string which is fully represented as single-byte (or perhaps ASCII?) from one which is not. Then there were the reports of the ingenious investigators who used UTF-32 to speed up offsets and discovered that offset somehow managed to be case-insensitive in this case. That made me wonder whether, after using textEncode(xt, "UTF-32"), LC marks the string in some way to give a clue about how to interpret it as text.
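To make the "like UTF-8, but not quite" observation and the UTF-32 offset trick concrete, here is a small sketch in Python (not LiveCode, and not a description of the LC engine). It shows that ASCII characters produce identical bytes in UTF-8 and in a single-byte encoding such as Latin-1, so a binary dump only diverges at non-ASCII characters. Treating Latin-1 as the single-byte encoding is an assumption for illustration. It also shows why a fixed-width encoding helps offset searching: in UTF-32 every character is exactly four bytes, so an aligned byte offset divided by 4 is a character index. The helper names `dump` and `find_char_index_utf32` are hypothetical.

```python
def dump(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ")


def find_char_index_utf32(haystack: str, needle: str) -> int:
    """Hypothetical helper: find `needle` by searching UTF-32LE bytes.

    Returns a 0-based character index, or -1 if not found. A byte match
    that does not start on a 4-byte boundary straddles two characters,
    so it is skipped rather than reported."""
    hay = haystack.encode("utf-32-le")  # "-le" avoids a byte-order mark
    pat = needle.encode("utf-32-le")
    start = 0
    while True:
        pos = hay.find(pat, start)
        if pos == -1:
            return -1
        if pos % 4 == 0:
            return pos // 4  # fixed width: 4 bytes per character
        start = pos + 1  # misaligned hit, keep searching


print(dump("café", "utf-8"))    # 63 61 66 c3 a9
print(dump("café", "latin-1"))  # 63 61 66 e9
print(find_char_index_utf32("naïve café", "café"))  # 6
```

Note that a plain byte search like this is case-sensitive; it says nothing about how LC's offset behaves on UTF-32 data.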

So could someone who is familiar with this bit of the engine enlighten us? In particular:

- What is the internal format?
- Is it different on different platforms?

- Given that it appears to include a flag to indicate whether it is single-byte text or not, are there any other attributes?

- Does saving a string to a 'binary' file faithfully reflect the internal format?
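For comparison only, CPython (PEP 393) uses a similar idea: each string is stored with the narrowest width (1, 2 or 4 bytes per character) that fits its largest code point, and that width is recorded with the string. The Python sketch below shows that kind of check. It is an analogy, not LC's engine. `is_native_single_byte` assumes Latin-1 as a stand-in for LC's native single-byte encoding, which may differ by platform. Both function names are hypothetical.

```python
def storage_width(text: str) -> int:
    """Bytes per character a PEP 393-style compact string would use."""
    max_cp = max(map(ord, text), default=0)
    if max_cp < 0x100:
        return 1  # fits in one byte per character
    if max_cp < 0x10000:
        return 2  # Basic Multilingual Plane only
    return 4      # needs full 32-bit code points


def is_native_single_byte(text: str, native: str = "latin-1") -> bool:
    """True if every character is encodable in the assumed native encoding."""
    try:
        text.encode(native)
        return True
    except UnicodeEncodeError:
        return False


print(storage_width("café"), storage_width("Ā"), storage_width("😀"))  # 1 2 4
```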

TIA,

Ben

_______________________________________________
use-livecode mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
