michael.a wrote:
It would be interesting for someone to try to make a practical argument that
is anything but a nest of technicalities, as to why ctors and unions
shouldn't be mixable.

The Fortran language specification allows essentially this, although the terms are initializers and equivalences rather than ctors and unions. Just this week, I reviewed the patch to add this functionality to the GCC Fortran front end, and I wrote a bit of the infrastructure it uses, so I can speak somewhat to the problems of implementing it.

(This was PR29786; you can see how long it took before it was fixed, even though it was a regression against the old Fortran front end, and was also for quite a while one of only two elements of the Fortran standard that we hadn't implemented yet.)

In Fortran, the rule is that any element in an equivalence (i.e., union) can be initialized, so long as no two initializers attempt to initialize the same piece of memory to different values.

The implementation for this creates an unsigned-char buffer array representing target memory, and goes through every initializer (i.e., ctor) in the equivalence, converting their values into their target-memory representations, checking to see what bits of memory they touch and whether those have already been initialized to something different, and then writing them into the buffer array. Then, an entirely new initializer is created from that buffer array.
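As a rough illustration of that buffer-merging step (a minimal Python sketch, not the actual GCC code): the function name `merge_initializers`, the use of `None` to mark untouched bytes, and the little-endian target with 4-byte integers are all assumptions made for the example.

```python
import struct

def merge_initializers(size, inits):
    """Merge (byte_offset, data) initializers into one buffer of `size` bytes.

    Entries left as None were never initialized. Raises ValueError if two
    initializers try to give the same byte different values.
    """
    buf = [None] * size
    for offset, data in inits:
        if offset < 0 or offset + len(data) > size:
            raise ValueError("initializer falls outside the equivalence")
        for i, byte in enumerate(data):
            old = buf[offset + i]
            if old is not None and old != byte:
                raise ValueError(f"conflicting initializers at byte {offset + i}")
            buf[offset + i] = byte
    return buf

# Assumed target: little-endian, 4-byte integers.
whole = struct.pack("<2i", 1, 2)   # two-element integer array at offset 0
second = struct.pack("<i", 2)      # scalar sharing storage with element 2
merged = merge_initializers(8, [(0, whole), (4, second)])
```

Here the two initializers overlap but agree byte for byte, so the merge succeeds; changing the scalar's value to 3 would raise the conflict error. The final step, building a new initializer from the buffer, is omitted.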

That all had to be built on a fair pile of front-end code to convert values into their target-memory representations, and then rather more code that was essentially a special-purpose initializer constructor to deal with the buffer array.

A lot of the trickiness is in exactly how you specify what's allowed. The Fortran rule requires explicitly simulating the target memory storage and checking byte-value versions of the initializers against each other, which is a rather messy thing to be doing in the front end, but it's at least simple to specify.

An alternate version would be to specify that overlapping ctors are not allowed even if they do result in the same byte-values. Aside from being a somewhat arbitrary restriction, this doesn't simplify things very much, since the front end still needs to look pretty deeply into the target memory representation to see if things overlap.
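To make that concrete, here is a minimal Python sketch of the stricter rule. The helper name `find_overlap` is hypothetical, and the byte offsets and lengths are assumed to have already been computed from the target layout, which is exactly the part that remains hard.

```python
def find_overlap(ranges):
    """Return the first pair of overlapping (offset, length) byte ranges, or None.

    After sorting by offset, any overlap must show up between two
    neighbouring ranges, so one pass over adjacent pairs is enough.
    """
    ordered = sorted(ranges)
    for (a_off, a_len), (b_off, b_len) in zip(ordered, ordered[1:]):
        if b_off < a_off + a_len:
            return (a_off, a_len), (b_off, b_len)
    return None

# An 8-byte initializer at offset 0 overlaps a 4-byte one at offset 4,
# whether or not the byte values happen to agree.
clash = find_overlap([(4, 4), (0, 8)])
```

The check itself is trivial; the cost is in producing the offsets and lengths, so the front end still has to know the target's storage layout.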

The version we used to have in the Fortran front end simply allowed only one item in each equivalence to have an initializer. That seemed to work without doing anything particular to the initializers, but I'm not sure whether things are tracked in the other front ends in ways that would make enforcing such a rule easy -- and, very likely, it wouldn't work for the example you describe (with a four-number rectangle being unioned with two two-point vectors) because you have two vectors in the union and they both have initializers. It's also a rather arbitrary rule that's not the sort of thing one would really want in a language standard.
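A minimal Python sketch of that one-initializer rule, applied to the rectangle example. The function `check_single_initializer` and the member names `rect`, `top_left` and `bottom_right` are hypothetical, chosen only for illustration.

```python
def check_single_initializer(members):
    """members maps each name in the equivalence to its initializer, or None.

    Returns the single initialized member's name (or None if there is none),
    and raises ValueError if more than one member has an initializer.
    """
    initialized = [name for name, init in members.items() if init is not None]
    if len(initialized) > 1:
        raise ValueError("more than one initializer: " + ", ".join(initialized))
    return initialized[0] if initialized else None

# Rectangle unioned with two vectors, both vectors initialized.
rectangle_union = {
    "rect": None,
    "top_left": (0, 0),
    "bottom_right": (10, 10),
}
try:
    check_single_initializer(rectangle_union)
    accepted = True
except ValueError:
    accepted = False   # rejected, although the two vectors never overlap
```

The rule rejects this case even though the two vectors occupy disjoint storage, which shows how arbitrary it is.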

Now, as for "shouldn't"? I can't speak to that, given that the Fortran committee thought it a valuable feature to include, and that we did implement it and it works. Well, mostly works, at least -- I wouldn't at all swear that we've got all the bugs out of it yet. But it was a pain, and it (along with one other feature that required simulating the writing of things to target memory) required an amount of effort to implement that was dramatically out of proportion to the importance of the feature.

- Brooks
