Re: Text encoding.

Mark Waddingham via use-livecode Thu, 02 Sep 2021 10:35:16 -0700

On 2021-09-02 12:12, Alex Tweedly via use-livecode wrote:

> Sorry to drag us off the interesting topic of licensing :-) into a
> LiveCode question.


> I know little or nothing about Unicode, text encodings, etc., so my
> question is indeed naive.

> I have a text file (War & Peace from Project Gutenberg), about 3.4 MB.
> The Mac describes it simply as "Plain text".


Do you have a link to the file handy?


> When I read that into a variable, and then do
>     replace tChar with space in tWholeText
> it takes between 1000 and 4000 msecs, versus the 8-10 msecs I had
> expected from other samples.

> If I put in
>     put textEncode(tWholeText, "UTF8") into tWholeText
> before the replace, then it does indeed take 8-10 msecs.

What exact code are you using in both cases (including reading in the file, the char you are replacing, etc.)?

> Additional info: I just discovered that, according to the 'more' command
> line tool, the file starts with:
>
> <U+FEFF>The Project ....


That suggests the file is Unicode encoded: U+FEFF is the 'byte order mark' (BOM).

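The character is encoded as a different sequence of bytes in each of the main encodings (UTF-8, UTF-16LE/BE, UTF-32LE/BE), and because its byte-swapped form U+FFFE is a noncharacter, it also reveals the byte order. If you run `hexdump -c` on the file and pipe the output through `less`, then if it is UTF-8 there will be three bytes (EF BB BF) before the T; if it is UTF-16 there will be a two-byte BOM (FF FE for little-endian, FE FF for big-endian), and every character after it will also take two bytes.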
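To make those byte patterns concrete, here is a small Python sketch (not LiveCode, and not something from the original thread) that checks the first bytes of a file's contents for each of the BOMs listed above. The function names `sniff_bom` and `decode_with_bom` are hypothetical; the BOM byte constants come from Python's standard `codecs` module. It assumes the BOM, if present, is the very first thing in the data.

```python
import codecs

# Known BOMs, longest first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the UTF-32 forms must be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the encoding implied by a leading BOM (dropping the
    BOM itself), falling back to `default` when no BOM is found."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


if __name__ == "__main__":
    # Encode the start of the Gutenberg text in each encoding; the
    # explicit-endian codecs do not add a BOM of their own, so the leading
    # U+FEFF produces exactly the BOM bytes.
    text = "\ufeffThe Project"
    for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        raw = text.encode(enc)
        name, skip = sniff_bom(raw)
        print(f"{enc:10} BOM={raw[:skip].hex(' ')} detected={name}")
```

One caveat of sniffing by prefix: a UTF-16LE file whose first real character is U+0000 would look like a UTF-32LE BOM, which is why real files rarely need more than this heuristic but it is not foolproof. For the UTF-8 case specifically, Python's built-in `utf-8-sig` codec also strips a leading BOM when decoding.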


Warmest Regards,

Mark.

--
Mark Waddingham ~ [email protected] ~ http://www.livecode.com/
LiveCode: Everyone can create apps

_______________________________________________
use-livecode mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
