On 8/19/13 2:15 PM, Devin Asay wrote:

> On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote:
>
>> I need to read and process a tab-delimited text file that is in
>> UTF8 format containing unicode. The final goal is to get it into an
>> array with the first tabbed item as the keys, preserving all
>> unicode. There are some HTML format tags in it as well.
>>
>> If I read the file as binfile, carriage returns are all lost.
>
> Jacque,
>
> Where are the files coming from? Maybe they're using ASCII 13 as a
> line terminator, or ASCII 10 + 13. Can't you replace whatever the
> native line delimiter is with numToChar(10)?

I forgot about that. The line endings are ASCII 13, and replacing them with numToChar(10) does keep the line breaks. Thanks.
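For readers following along outside LiveCode, here is a minimal sketch of the same line-ending fix, written in Python rather than LiveCode. The function name `normalize_newlines` and the sample data are made up for illustration. It works on the raw bytes, which is safe for UTF-8 because bytes 13 and 10 never occur inside a multi-byte character.

```python
def normalize_newlines(raw: bytes) -> bytes:
    """Turn CRLF and lone CR line endings into LF (byte 10)."""
    # Handle CRLF first so a CR+LF pair does not become two line breaks.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical sample: two tab-delimited records terminated by ASCII 13.
sample = b"key1\tvalue1\rkey2\tvalue2\r"
lines = normalize_newlines(sample).split(b"\n")
```

After this step, splitting on byte 10 yields one entry per record, plus an empty trailing entry when the data ends with a line terminator.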

When I run uniEncode(tData,"UTF8") on it, the high-ASCII characters show up in the variable watcher as "+" and an unprintable box. Can I assume the real character is in there? Will it work for text chunking, etc.? When I split it into an array, will the keys be intact?
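As a language-neutral illustration of the overall goal (not an answer about how LiveCode's uniEncode or its variable watcher behave), here is a hedged Python sketch: decode the UTF-8 bytes, normalize the line endings, and build a table keyed by the first tab-delimited item. The name `build_table` and the sample records are assumptions made up for this example.

```python
def build_table(raw: bytes) -> dict:
    """Map the first tab-delimited item of each line to the rest of the line."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if the data is not valid UTF-8
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    table = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip empty lines, such as the one after a trailing terminator
        key, _, rest = line.partition("\t")
        table[key] = rest  # remaining tabs and any HTML tags are kept verbatim
    return table


# Hypothetical sample: non-ASCII keys, an HTML tag, ASCII 13 line endings.
raw = "café\t<b>naïve</b>\rключ\tзначение\r".encode("utf-8")
table = build_table(raw)
```

In this sketch the keys survive intact because the text is decoded before any splitting takes place. Whether the same holds in LiveCode depends on how it handles the encoded data, which this example does not test.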

--
Jacqueline Landman Gay         |     jac...@hyperactivesw.com
HyperActive Software           |     http://www.hyperactivesw.com
