Following this with interest, but also a little confusion. I completely fell into the trap of assuming you encode outgoing and decode incoming.
Alex states that
    put textEncode(tWholeText, "UTF8") into tWholeText
speeds replace up, but David B says LC's internal format is UTF16. Doesn’t the 8 vs 16 difference matter? Or does it matter less than with other encodings?

Cheers

David Glasgow

> On 2 Sep 2021, at 1:01 pm, David Bovill via use-livecode
> <use-livecode@lists.runrev.com> wrote:
>
> Thanks for the question Alex, I’m wrestling with the same issues - but so far
> I've got no responses from the encoding gurus here :)
>
> This is my understanding:
>
> 1) Yes, it's recommended to textEncode text that comes from outside into
> Livecode’s internal native format (which is utf16). Livecode handles
> everything internally “transparently” from then on - which I guess means all
> the usual language and control operations expect this utf16 internal format. My
> guess is this is why a few things have got slower compared with early
> versions of Livecode.
> 2) Without doing textEncode the engine tries to guess the encoding
> (duck-typing?) and does this in a platform-specific way? Again, exactly what
> is going on there is a bit opaque to me, but the take-home message is that
> this is slower and less robust. So yes, you lose nothing (assuming the original
> file is utf8), and yes, it's the best alternative.
>
> I think the hard thing is to find out exactly which encoding some
> files use - it would be great if there was a duck-typing service where we could
> paste text or upload files and it would say - hey, this looks like utf8 - but
> that’s asking too much.
>
> On 2 Sep 2021, 12:12 +0100, Alex Tweedly via use-livecode
> <use-livecode@lists.runrev.com>, wrote:
>> Sorry to drag us off the interesting topic of licensing :-) into some
>> Livecode question.
>>
>> I know little or nothing about Unicode, text encodings, etc. - so my
>> question is indeed naive.
>>
>> I have a text file (War & Peace from Project Gutenberg), about 3.4Mb.
>> The Mac describes it simply as "Plain text".
>>
>> When I read that into a variable, and then do
>>     replace tChar with SPACE in tWholeText
>> it takes between 1000 and 4000 millisecs - versus the 8-10 msecs I had
>> expected from other samples.
>>
>> If I put in
>>     put textEncode(tWholeText, "UTF8") into tWholeText
>> before the replace then it does indeed take 8-10 msecs.
>>
>> Q1. What (if anything) am I losing by doing that?
>>
>> Q2. Is this the best alternative?
>>
>> Additional info - I just discovered that according to the 'more' command
>> line, the file starts with:
>>
>> <U+FEFF>The Project ....
>>
>> if that is useful.
>>
>> Many thanks,
>>
>> Alex.
>>
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
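
For reference, a minimal sketch of the "decode incoming, encode outgoing" pattern raised at the top of this message. In LiveCode, textDecode turns bytes into text and textEncode turns text into bytes; after textEncode the variable holds binary data rather than text. The encoding name passed to either function describes the bytes in the file, not the engine's internal format, so the script does not need to match whatever the engine uses internally. The handler names readUtf8File and writeUtf8File are hypothetical, and the sketch assumes the file really is UTF-8. It does not deal with the leading U+FEFF mentioned in the thread (whether textDecode keeps it is not checked here), and it makes no claim about reproducing the timings quoted above.

```livecode
-- Hypothetical helper: read a file assumed to be UTF-8 and return it as text
function readUtf8File pPath
   local tData
   -- binfile: gives the raw bytes, with no encoding guess or line-ending conversion
   put URL ("binfile:" & pPath) into tData
   -- incoming bytes -> text
   return textDecode(tData, "UTF-8")
end readUtf8File

-- Hypothetical helper: write text back out as UTF-8 bytes
command writeUtf8File pPath, pText
   -- outgoing text -> bytes
   put textEncode(pText, "UTF-8") into URL ("binfile:" & pPath)
end writeUtf8File

on mouseUp
   local tWholeText, tStart
   answer file "Choose a UTF-8 text file"
   if it is empty then exit mouseUp
   put readUtf8File(it) into tWholeText
   -- time a replace on the decoded text
   put the milliseconds into tStart
   replace "e" with space in tWholeText
   put "replace took" && (the milliseconds - tStart) && "ms"
end mouseUp
```

Timing the same replace on the decoded text and on the textEncode'd data would show whether the difference Alex saw comes from working on bytes rather than text; that explanation is an assumption, not something confirmed in this thread.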