Thanks for the explanation--very much appreciated. I more or less stumbled onto 1. and am using that in every relevant circumstance, but I'll revise to 4. (I can't generally use 5. because some of the strings use what I assume are non-ASCII characters like "←".)
Thanks again,
Chris

On 6/7/23 10:24, Dr. Jürgen Sauermann wrote:
Hi Chris,

wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is the proper way to go. Consider the differences between:

1.  UCS_string yyy(UTF8_string(xxx));    // almost proper, but ambiguous (most vexing parse error)
2.  UCS_string yyy(xxx);                 // now private: so never use it
3a. UTF8_string utf(xxx);                // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx)));  // also proper (this is 1. without the most vexing parse error)
5.  UCS_ASCII_string yyy(xxx)

If *xxx* is entirely *ASCII* then all of the above are equivalent. Otherwise the difference is that 1. properly decodes UTF8-encoded strings while the old 2. (which is now disabled by private:) did not, and the compiler has no way to detect an incorrect usage of 2. Even worse, C++ would sometimes do 2. automatically (and incorrectly) and without notice. Probably some of the recent Tokenization Errors reported on bug-apl were caused by this. Although 1. was throwing an assertion when used incorrectly, some people wrapped a *try {} catch {}* around it, which caused the error to slip through unnoticed (at least up to the tokenizer).

A somewhat unfortunate decision in the C++11 ff. standards was to resolve *yyy* in 1. (which is ambiguous at a closer look) into a declaration of function *yyy()* and not (as gcc still does) into two constructor calls: *UTF8_string(xxx)* followed by *UCS_string()* with the result of the first. This problem can apparently be avoided by using 4. instead of 1. (note the extra pair of () which is NOT redundant).

Finally, 5. is a safe replacement for 2. (the comment in the .hh file is still valid, so *xxx* MUST be ASCII), which should hopefully avoid the automatic use of 2. by the compiler. It is also easier to use with *grep* in order to spot the (still possible) incorrect usage of 5.

Hope this helps,
Jürgen

On 6/6/23 22:13, Chris Moller wrote:

Yeah, I saw your comment in one of the .hh files. What I did was wrap all the edif ASCII strings in UTF8_string() calls.
That works, but if it's circumventing what you're trying to do, let me know and I'll think of something else.

Even after a lot of years, I'm still not sure of the differences between UTF, UCS, Unicode, etc., etc.

--cm

On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:

Hi,

sorry for that. The reason for making it private is to entirely prevent its usage. The former implementation of it only worked for ASCII strings. There was a note about that in the header file, but I have seen quite a few incorrect usages of it (read: with UTF8-encoded strings) which then caused other, difficult to find, errors later on.

Best Regards,
Jürgen

On 6/6/23 17:31, Chris Moller wrote:

Hi, Xtian,

Just pushed a fix for edif if you want to give it a try. Works for me on SVN 1706 and yesterday's SVN 1708.

--cm

On 6/5/23 03:33, Christian Robert wrote:

SVN 1704 completely broke libedif

Juergen made UCS_string (const char *) a private member of the class so a lot of compile errors in edif.cc ...

Not sure if this can be fixed. I reverted to SVN 1702 meanwhile. There is no way I'll revert to the "DEL Editor" !

Xtian.
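
For readers following the thread, the forms 1., 3a/3b and 4. discussed above can be tried out in isolation. The following is only a minimal sketch and not code from GNU APL: Utf8Bytes and CodePoints are hypothetical stand-ins for UTF8_string and UCS_string, the decoder assumes well-formed input, and widen_bytes() is merely an assumption about how a conversion that "only worked for ASCII strings" might behave. The part that does not depend on these assumptions is the C++ parsing rule: a statement that can be read as a declaration is read as one, so form 1. declares a function named yyy, while the extra parentheses in form 4. force an object definition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for UTF8_string: just holds the raw bytes.
struct Utf8Bytes {
    std::string bytes;
    explicit Utf8Bytes(const char* s) : bytes(s) {}
};

// Hypothetical stand-in for UCS_string: one element per Unicode code point.
struct CodePoints {
    std::vector<char32_t> cps;

    // Decodes UTF-8; assumes well-formed input (no validation).
    explicit CodePoints(const Utf8Bytes& u) {
        static const unsigned char lead_mask[4] = {0x7F, 0x1F, 0x0F, 0x07};
        const std::string& b = u.bytes;
        std::size_t i = 0;
        while (i < b.size()) {
            const unsigned char c = static_cast<unsigned char>(b[i]);
            // Number of continuation bytes announced by the lead byte.
            const int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : (c >= 0xC0) ? 1 : 0;
            char32_t cp = c & lead_mask[extra];
            for (int k = 1; k <= extra && i + k < b.size(); ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(b[i + k]) & 0x3F);
            cps.push_back(cp);
            i += extra + 1;
        }
    }
};

// Assumption: an ASCII-only conversion that widens each byte separately.
std::vector<char32_t> widen_bytes(const char* s) {
    std::vector<char32_t> out;
    for (; *s; ++s) out.push_back(static_cast<unsigned char>(*s));
    return out;
}

// Form 4: the extra parentheses make the argument an expression,
// so yyy is an object.
std::size_t decoded_length(const char* xxx) {
    CodePoints yyy((Utf8Bytes(xxx)));
    return yyy.cps.size();
}

// Forms 3a/3b: a named intermediate object, no ambiguity.
std::size_t decoded_length_two_step(const char* xxx) {
    Utf8Bytes utf(xxx);
    CodePoints yyy(utf);
    return yyy.cps.size();
}

// Form 1: parsed as the declaration "CodePoints yyy(Utf8Bytes xxx);",
// i.e. a function yyy, not an object. Compilers may warn about this line.
bool form1_declares_function(const char* xxx) {
    (void)xxx;  // the outer xxx is not used by the declaration below
    CodePoints yyy(Utf8Bytes(xxx));
    return std::is_function<decltype(yyy)>::value;
}

// Small usage example; "\xE2\x86\x90" is the UTF-8 encoding of U+2190.
void self_check() {
    const char* arrow = "\xE2\x86\x90";
    assert(decoded_length(arrow) == 1);           // one code point
    assert(decoded_length_two_step(arrow) == 1);  // same result via 3a/3b
    assert(widen_bytes(arrow).size() == 3);       // three separate bytes
    assert(form1_declares_function(arrow));       // form 1 is a function
}
```

Calling self_check() exercises all three forms. For a pure-ASCII argument the decoded and the byte-widened results have the same length, which matches the remark above that the forms only differ for non-ASCII input.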