lines of UTF16 text are broken?

Kee Nethery Tue, 29 Mar 2011 18:04:30 -0700

I convert UTF8 text into UTF16 and then work through each line of text.

The problem I run into is that on my Intel Mac (little-endian), a return is 
encoded as "10 0", and if I have a set of characters 
uniencode("123" & return & "456") 
it encodes into UTF16 as the bytes:
49 0 50 0 51 0_10_0_52 0 53 0 54 0


When I look at line 1 I get:
49 0 50 0 51 0_

When I look at line 2 I get:
_0_52 0 53 0 54 0

The 0 from the return (actually a linefeed) is being interpreted as part of the 
next line. The system is not treating "10 0" as the line break; it treats "10" 
alone as the line break.
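
The behaviour described above can be reproduced outside LiveCode. The following is a minimal sketch in Python, chosen only as a neutral illustration; it is not LiveCode and makes no claim about how LiveCode's line handling is implemented. It assumes the text is little-endian UTF16 without a byte order mark, which matches the byte listing above.

```python
# Assumption: "utf-16-le" mirrors the byte order shown in the listing above.
data = "123\n456".encode("utf-16-le")
print(list(data))

# Splitting on the single byte 10, as a byte-oriented line splitter would,
# strands the linefeed's trailing 0 at the start of the second piece.
pieces = data.split(b"\n")
for piece in pieces:
    print(list(piece))
```

The second piece starts with the orphaned 0, so every byte pair after it is misaligned by one.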

How do I get it to break at "10 0" instead of at "10"? My fear is that I'm 
going to come across a Unicode character that includes "10" in the right 
location, such as "32 10" (no clue what that is), and the system is going 
to see the "10" and treat it as the divider between two lines.

How do people deal with this? Do I need to build a UTF16 version of all the 
text parsing routines to safely get each line?
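
One possible approach, sketched here in Python rather than LiveCode, is to walk the data two bytes at a time and split only where an aligned pair equals "10 0". The sketch assumes little-endian UTF16 without a byte order mark, and the helper name `split_utf16le_lines` is hypothetical. The sample includes the code point U+0A20, whose little-endian bytes are "32 10", to show that a 10 inside another character is left alone.

```python
def split_utf16le_lines(data: bytes) -> list:
    """Split little-endian UTF-16 bytes on linefeed code units (10 0) only."""
    lines = []
    start = 0
    # Step through aligned 2-byte code units so that a 10 belonging to
    # another character (for example the bytes 32 10) is never mistaken
    # for a linefeed.
    for i in range(0, len(data) - 1, 2):
        if data[i] == 10 and data[i + 1] == 0:
            lines.append(data[start:i])
            start = i + 2
    lines.append(data[start:])
    return lines

# "123", a linefeed, then U+0A20 (bytes 32 10) followed by "456".
sample = "123\n\u0a20456".encode("utf-16-le")
result = split_utf16le_lines(sample)
naive = sample.split(b"\n")  # byte-wise split, for comparison
```

Here `result` holds two intact lines, while the byte-wise `naive` split cuts the data into three misaligned pieces. Where the environment allows it, decoding the UTF16 to text first and splitting the decoded text into lines avoids the problem altogether.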

Kee Nethery



