Re: SVN 1704 completely broke libedif

Chris Moller Wed, 07 Jun 2023 08:28:05 -0700

Thanks for the explanation--very much appreciated.

I more or less stumbled onto 1. and am using that in every relevant circumstance, but I'll revise to 4. (I can't generally use 5. because some of the strings use what I assume are non-ASCII characters like "←".)


Thanks again,
Chris


On 6/7/23 10:24, Dr. Jürgen Sauermann wrote:

Hi Chris,

wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is
the proper way to go. Consider the differences between:

1.  UCS_string yyy(UTF8_string(xxx));      // almost proper, but ambiguous (most vexing parse error)
2.  UCS_string yyy(xxx);                   // now private: so never use it
3a. UTF8_string utf(xxx);                  // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx)));    // also proper (this is 1. without the most vexing parse error)
5.  UCS_ASCII_string yyy(xxx)

If *xxx* is entirely *ASCII* then all of the above are equivalent.

Otherwise the difference is that 1. properly decodes UTF8-encoded
strings while the old 2. (which is now disabled by private:) did not
(and the compiler has no way to detect an incorrect usage of 2.).
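
To make the difference between 1. and 2. concrete, here is a minimal
sketch. The functions widen_bytes() and decode_utf8() are hypothetical
and are not taken from the GNU APL sources: widen_bytes() shows one way
an ASCII-only conversion could behave (one output character per input
byte), and decode_utf8() is a simplified decoder that assumes
well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical ASCII-only conversion: every byte becomes one character.
// Correct for pure ASCII, wrong for multi-byte UTF-8 sequences.
inline std::vector<std::uint32_t> widen_bytes(const std::string& s) {
    std::vector<std::uint32_t> out;
    for (unsigned char c : s) out.push_back(c);
    return out;
}

// Simplified UTF-8 decoder (assumes well-formed input, no validation).
inline std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

For a pure ASCII string both functions return the same sequence. For
"←" (bytes E2 86 90 in UTF-8) the byte-wise version yields three bogus
characters, while the decoder yields the single code point U+2190.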

Even worse, C++ would sometimes do 2. automatically (and incorrectly)
and without notice. Probably some of the recent Tokenization Errors
reported on bug-apl were caused by this.

Although 1. was throwing an assertion when used incorrectly, some
people wrapped a *try {} catch {}* around it which caused the error
to slip through unnoticed (at least up to the tokenizer).

A somewhat unfortunate decision in the C++11 ff. standards was to
resolve *yyy* in 1. (which is ambiguous at a closer look) into a declaration
of function *yyy()* and not (as gcc still does) into two constructor calls:
*UTF8_string(xxx)* followed by *UCS_string()* with the first as its argument.
This problem can apparently be avoided by using 4. instead of 1. (note the
extra pair of () which is NOT redundant).
Finally, 5. is a safe replacement for 2. (the comment in the *.hh* file
is still valid, so *xxx* MUST be ASCII), which should hopefully avoid the
automatic use of 2. by the compiler. It is also easier to use with *grep*
in order to spot the (still possible) incorrect usage of 5.
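
The parsing difference between 1. and 4. can be checked at compile time.
The following is a minimal sketch with hypothetical stand-in classes
(Utf8Text and UcsText are NOT the real UTF8_string and UCS_string); it
only assumes a compiler that follows the standard's rule that anything
which can be read as a declaration is read as one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for illustration only.
struct Utf8Text {
    std::string bytes;
    explicit Utf8Text(const char* s) : bytes(s) {}
};

struct UcsText {
    std::size_t source_bytes;
    UcsText(const Utf8Text& u) : source_bytes(u.bytes.size()) {}
};

inline bool vexing_parse_demo() {
    const char* xxx = "abc";

    // Form 1: read as the declaration of a function yyy1 that takes a
    // Utf8Text parameter (named xxx) and returns UcsText. No object is
    // created here; many compilers warn about this line.
    UcsText yyy1(Utf8Text(xxx));

    // Form 4: the extra parentheses make the argument an expression,
    // so yyy4 is an object built from a temporary Utf8Text.
    UcsText yyy4((Utf8Text(xxx)));

    static_assert(std::is_function<decltype(yyy1)>::value,
                  "form 1 declares a function");
    static_assert(std::is_same<decltype(yyy4), UcsText>::value,
                  "form 4 declares an object");

    return yyy4.source_bytes == 3;
}
```

Brace initialization, as in UcsText yyy{Utf8Text(xxx)}, is another common
way to avoid the ambiguity in C++11 and later, as is the two-step form
3a./3b.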

Hope this helps,
Jürgen


On 6/6/23 22:13, Chris Moller wrote:
Yeah, I saw your comment in one of the .hh files. What I did was wrap all the edif ASCII strings in UTF8_string() calls. That works, but if it's circumventing what you're trying to do, let me know and I'll think of something else.
Even after a lot of years, I'm still not sure of the differences between UTF, UCS, Unicode, etc, etc.
--cm

On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,
sorry for that. The reason for making it private is to entirely prevent its usage. The former implementation of it only worked for ASCII strings. There was a note about that in the header file, but I have seen quite a few incorrect usages of it (read: with UTF8-encoded strings) which then caused other, difficult-to-find errors later on.
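
A minimal sketch of the technique, using hypothetical stand-in classes
(Utf8Bytes and WideText are NOT the real GNU APL classes): once the
const char * constructor is private, code outside the class can no
longer construct from a raw C string, neither explicitly nor through an
implicit conversion, while construction from the UTF-8 wrapper still
works.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for illustration only.
struct Utf8Bytes {
    std::string bytes;
    explicit Utf8Bytes(const char* s) : bytes(s) {}
};

class WideText {
public:
    // Public: construction from an explicitly wrapped UTF-8 string.
    WideText(const Utf8Bytes& u) : byte_count_(u.bytes.size()) {}
    std::size_t byte_count() const { return byte_count_; }

private:
    // Private and never defined: overload resolution still selects it
    // for a raw C string, and the access check then rejects the call.
    WideText(const char*);

    std::size_t byte_count_;
};

// Construction from a raw C string is rejected outside the class ...
static_assert(!std::is_constructible<WideText, const char*>::value,
              "raw C strings must not construct a WideText");
// ... while construction from the wrapper is still allowed.
static_assert(std::is_constructible<WideText, Utf8Bytes>::value,
              "wrapped strings construct a WideText");

inline bool private_ctor_demo() {
    WideText ok((Utf8Bytes("abc")));
    return ok.byte_count() == 3;
}
```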

Best Regards,
Jürgen


On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,
Just pushed a fix for edif if you want to give it a try. Works for me on SVN 1706 and yesterday's SVN 1708.
--cm

On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif

Juergen made UCS_string (const char *)  a private member of the class
so a lot of compile errors in edif.cc ...
Not sure if this can be fixed. I reverted to SVN 1702 meanwhile. There is no way I'll revert to the "DEL Editor" !
Xtian.
