It sure helped me to understand it! Thanks. As I understand the performance issue between 6.7 and later versions of LC, though, it revolves around having to process all the Unicode strings that are now native. Or so the discussion has gone in the past. If not, then the performance hit since v7 has yet to be explained sufficiently.
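
To check my own understanding of the point in the quoted message below (Unicode specifies numbers, UTF8 and UTF16 are two ways of turning those numbers into bytes, and ASCII text is already valid UTF8), here is a minimal sketch. It is in Python 3 rather than LiveCode, purely as an illustration; the function name `encoded_bytes` and the sample strings are my own assumptions and come from nowhere in LC.

```python
# Illustration only: Python 3, not LiveCode. All names here are hypothetical.

def encoded_bytes(text):
    """Return the code points of `text` and its bytes in UTF-8 and UTF-16 (big-endian)."""
    return {
        "code_points": [ord(ch) for ch in text],
        "utf8": text.encode("utf-8"),
        "utf16": text.encode("utf-16-be"),
    }

ascii_sample = encoded_bytes("A")
euro_sample = encoded_bytes("\u20ac")  # U+20AC, the euro sign

# ASCII text is byte-for-byte identical when encoded as ASCII or as UTF-8.
ascii_matches_utf8 = "LiveCode".encode("ascii") == "LiveCode".encode("utf-8")

# The reverse is not necessarily true: the UTF-8 bytes of a non-ASCII
# character cannot be decoded as ASCII.
try:
    euro_sample["utf8"].decode("ascii")
    euro_is_ascii = True
except UnicodeDecodeError:
    euro_is_ascii = False

print(ascii_sample)
print(euro_sample)
print(ascii_matches_utf8, euro_is_ascii)
```

The same code point comes out as one byte in UTF8 and two in UTF16 for "A", and as three bytes in UTF8 and two in UTF16 for the euro sign, which is the "numbers, not bytes" distinction as I read it.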

Bob S

> On Sep 8, 2021, at 02:42, Ben Rubinstein via use-livecode
> <use-livecode@lists.runrev.com> wrote:
>
> On 07/09/2021 17:22, Bob Sneidar via use-livecode wrote:
>> This makes sense to me (I think) because if I am not mistaken, UTF16 is
>> Unicode, and UTF8 is simple ASCII. The slowdown from 6.7 to 7.0 was
>> precisely the support for Unicode text. Someone will correct me if I am
>> wrong about this. As a hobbyist, I try and stay away from localization
>> issues. But I am interested in the idea that all text incoming should be
>> text decoded and outgoing the inverse. (Did I get that right??)
>
> Cue scenes of strong men reeling back in horror, ladies fainting, etc
> (Bateman cartoons, for those of a British persuasion).
>
> UTF16 is not Unicode, UTF8 is not simple ASCII, and I'm not even sure that
> the slowdown from 6.7 to 7.0 was precisely the support for Unicode text,
> though I'm not sure about that.
>
> Unicode and ASCII are both conventions that assign character interpretations
> to numbers. ASCII assigned approximately 94 character interpretations to the
> numbers 32-126 (plus a few control interpretations to some other numbers).
> WindowsLatin1, MacRoman, ISO-8859-1 etc all did the same but to a wider range
> of numbers up to 255. Unicode does the same thing for a... much... larger
> number of characters and glyphs, and hence using a... much... larger range of
> numbers.
>
> Unicode specifies numbers, not bytes. UTF8 and UTF16 are two of several ways
> of representing Unicode strings in bytes. UTF8 is designed to do so in a way
> that makes ASCII text compatible with UTF8, i.e. a file of ASCII text is a
> valid UTF8 file; the reverse is not necessarily true.
>
> A long-running problem with Metacard, Revolution, LC up to v6 was being
> surprisingly platform-centric about character sets.
> To this day, textEncode etc only support MacRoman on Mac, only support
> ISO-8859-1 on Linux, and so on; as if we never are on one platform,
> needing to deal with character streams generated on another. See
> https://quality.livecode.com/show_bug.cgi?id=12205
> https://quality.livecode.com/show_bug.cgi?id=22391
> https://quality.livecode.com/show_bug.cgi?id=21320
>
> LC7 brought LiveCode into the later part of the 20th century by properly
> supporting Unicode, and by breaking the assumed link between bytes and
> characters. However if I understand correctly, the internal format of strings
> does not, or at least not necessarily, correspond to any externally defined
> standard, but can take various forms in order to maximise efficiencies of
> processing and storage.
>
> Not sure if this helps, but it helped me to write it!
>
> Ben
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode