Following this with interest, but also a little confusion.  I completely fell 
into the trap of assuming you encode outgoing and decode incoming.

Alex states that put textEncode(tWHoleText, "UTF8") into tWholeText speeds 
replace up, but David B says LC's internal format is UTF16.  Doesn’t the 8 vs 16 
difference matter?  Or does it matter less than with other encodings?
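
For what it's worth, here is a minimal sketch of the direction I had originally assumed, which is also how I read the dictionary entries: textDecode turns binary data into text, and textEncode turns text into binary data. The file name and the "e" being replaced are made up for illustration. If I understand it correctly, textEncode hands back bytes rather than text, so a replace done after it is working on bytes, which may be part of why it is quick. I have not confirmed that this is the reason for Alex's timings.

```livecode
on mouseUp
   local tPath, tBytes, tText, tStart

   -- Hypothetical location, purely for illustration
   put specialFolderPath("documents") & "/war_and_peace.txt" into tPath

   -- "binfile:" reads the raw bytes with no conversion
   put URL ("binfile:" & tPath) into tBytes

   -- Incoming: decode the bytes (assumed here to be UTF-8) into text
   put textDecode(tBytes, "UTF-8") into tText

   -- Time the replace on the decoded text
   put the milliseconds into tStart
   replace "e" with space in tText
   put the milliseconds - tStart && "ms for replace on decoded text"

   -- Outgoing: encode the text back into UTF-8 bytes before writing
   put textEncode(tText, "UTF-8") into URL ("binfile:" & tPath & ".out")
end mouseUp
```

Timing the same replace on tBytes instead of tText would show whether the 8 vs 16 question is really a bytes vs text question.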

Cheers

David Glasgow


> On 2 Sep 2021, at 1:01 pm, David Bovill via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> Thanks for the question Alex, I’m wrestling with the same issues - but so far 
> got no responses from encoding gurus here :)
> 
> This is my understanding:
> 
> 1) Yes, it's recommended to textEncode text that comes from outside into 
> Livecode’s internal native format (which is utf16).  Livecode handles 
> everything internally “transparently” from then on - which I guess means all 
> usual language and control operations expect this utf16 internal format. My 
> guess is this is why a few things have got slower as compared with early 
> versions of Livecode.
> 2) Without doing textEncode the engine tries to guess the encoding 
> (duck-typing?) and does this in a platform-specific way? Again, exactly what 
> is going on there is a bit opaque to me, but the take-home message is that 
> this is slower and less robust. So yes - you lose nothing (assuming the 
> original file is utf8), and yes, it's the best alternative.
> 
> I think the hard thing to find out is exactly what type of encoding some 
> files are - it would be great if there was a duck-typing service where we 
> could paste text or upload files and it would say - hey this looks like utf8 - 
> but that’s asking too much.
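
On the wish for something that says "hey this looks like utf8": a full detector is hard, but a byte order mark, when one is present, is cheap to check. The U+FEFF that Alex saw at the start of his file is such a mark. Here is a rough sketch. guessEncodingFromBOM is a name I made up, not a built-in, and an empty result only means "no BOM found", since plenty of UTF-8 files have none.

```livecode
-- Hypothetical helper: inspect the first bytes of raw file data for a BOM.
-- Returns an encoding name usable with textDecode, or empty if no BOM is seen.
-- UTF-32 marks are not checked in this sketch.
function guessEncodingFromBOM pBytes
   local tB1, tB2, tB3

   if the number of bytes in pBytes < 2 then return empty

   put byteToNum(byte 1 of pBytes) into tB1
   put byteToNum(byte 2 of pBytes) into tB2

   -- FF FE is the UTF-16 little-endian mark, FE FF the big-endian one
   if tB1 is 255 and tB2 is 254 then return "UTF-16LE"
   if tB1 is 254 and tB2 is 255 then return "UTF-16BE"

   -- EF BB BF is U+FEFF written as UTF-8
   if the number of bytes in pBytes >= 3 then
      put byteToNum(byte 3 of pBytes) into tB3
      if tB1 is 239 and tB2 is 187 and tB3 is 191 then return "UTF-8"
   end if

   return empty
end guessEncodingFromBOM
```

It needs the raw bytes, so the data should be read with URL ("binfile:" & tPath) rather than as text. A non-empty result can be passed as the second parameter of textDecode.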
> 
> On 2 Sep 2021, 12:12 +0100, Alex Tweedly via use-livecode 
> <use-livecode@lists.runrev.com>, wrote:
>> Sorry to drag us off the interesting topic of licensing :-) into some
>> Livecode question.
>> 
>> I know little or nothing about Unicode, text encodings, etc. - so my
>> question is indeed naive.
>> 
>> I have a text file (War & Peace from Project Gutenberg), about 3.4 MB.
>> The Mac describes it simply as "Plain text".
>> 
>> When I read that into a variable, and then do
>>     replace tChar with SPACE in tWholeText
>> it takes between 1000 and 4000 millisecs - versus the 8-10 msecs I had
>> expected from other samples.
>> 
>> If I put in
>>     put textEncode(tWHoleText, "UTF8") into tWholeText
>> before the replace then it does indeed take 8-10 msecs.
>> 
>> Q1. What (if anything) am I losing by doing that ?
>> 
>> Q2. Is this the best alternative ?
>> 
>> Additional info - I just discovered that, according to the 'more' command, 
>> the file starts with:
>> 
>> <U+FEFF>The Project ....
>> 
>> if that is useful.
>> 
>> Many thanks,
>> 
>> Alex.
>> 
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription 
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode


