Dar Scott wrote:
Yeah, there is no need to use binfile, but it is OK to. If you do, you can
process the line endings before or after converting to Unicode.
That was not too cautious, given what you knew. Being aware of potential
problems and making code robust against them is the right approach, but now
you know.
Assuming a valid UTF-8 file...
Only the ASCII characters in UTF-8 have the high bit zero. They are
represented as single bytes. (ASCII files are UTF-8 files.) All other
characters are represented with multiple bytes, every one of which has the
high bit set, the first byte and the following bytes alike. (In binary, the
first byte is 11xxxxxx and the continuation bytes are 10xxxxxx.)
This means there are no CR, LF, tab, or comma hidden in the non-ASCII
characters. ASCII never has the high bit set. You can use line and item
chunks with UTF-8. You can use offset (with care) and replace.
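
For anyone who wants to see the byte patterns for themselves, here is a minimal sketch in Python (not LiveCode, and not taken from the post above). The function names classify_bytes and split_items_bytewise are made up for this illustration. It labels each byte of a UTF-8 string by its high bits and shows that splitting the raw bytes on a comma gives the same items as splitting the decoded text. The last line of the demo shows one plausible reason for the "with care" above, on the assumption that the concern is byte offsets versus character offsets.

```python
# Illustrative Python, not LiveCode; all names here are hypothetical.

def classify_bytes(text):
    """Label each byte of the UTF-8 encoding as 'ascii', 'lead', or 'cont'."""
    labels = []
    for b in text.encode("utf-8"):
        if (b & 0x80) == 0:         # 0xxxxxxx: a single-byte ASCII character
            labels.append("ascii")
        elif (b & 0xC0) == 0xC0:    # 11xxxxxx: first byte of a multi-byte character
            labels.append("lead")
        else:                       # 10xxxxxx: continuation byte
            labels.append("cont")
    return labels


def split_items_bytewise(data, delimiter=b","):
    """Split raw UTF-8 bytes on an ASCII delimiter, then decode each piece."""
    return [piece.decode("utf-8") for piece in data.split(delimiter)]


if __name__ == "__main__":
    text = "café,日本語,ok"
    data = text.encode("utf-8")

    # No byte of a non-ASCII character is ever labelled 'ascii'.
    print(classify_bytes("a,é"))

    # Splitting the bytes first and decoding afterwards loses nothing.
    print(split_items_bytewise(data))

    # Character offset versus byte offset of the first comma: these differ
    # as soon as a non-ASCII character comes before the match.
    print(text.find(","), data.find(b","))
```

Because a comma, tab, CR, or LF byte can only ever be the real ASCII character, the byte-level split never cuts a multi-byte character in half, so each piece still decodes cleanly.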
Thanks for that background, Dar. I had suspected there might be
something that makes such distinctions identifiable, but didn't know the
details. Now I can use "file" with confidence (and less work handling
line endings).
Really nice to have you back on this list.
--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
Follow me on Twitter: http://twitter.com/FourthWorldSys