Eric Blake <ebl...@redhat.com> writes: > On 02/17/2016 11:08 AM, Markus Armbruster wrote: >> Eric Blake <ebl...@redhat.com> writes: >> >>> Commit cee2dedb noticed that if you have a partial flat union >>> (such as if an input parse failed due to a missing >>> discriminator), calling the dealloc visitor could result in >>> trying to dereference a NULL pointer if we attempted to visit >>> an object branch without an earlier successful call to >>> visit_start_implicit_struct() allocating the pointer for that >>> branch. But the "fix" it implemented requires the use of a >>> '.data' member in the union, which may or may not be the same >>> size as other branches of the union (consider a 32-bit platform >>> where one of the branches is an int64), which feels fairly dirty. >> >> Well, until the previous commit, it was the same, wasn't it? All >> pointers. > > For simple unions, you could have (well, still can have, until my later > patch gets rid of the simple_union_type() magic): > > struct SU { > SUKind type; > union { > void *data; > int8_t byte; > } u; > };
Begs the question why that works :) > But you're right - for flat unions, ALL branches were represented as > pointers (until this series unboxed them). > >> >>> Plus, as mentioned in that commit, it only works if you can >>> assume that '.data' would be zero-initialized even if '.kind' was >>> uninitialized, which is rather poor logic: our usage of >>> visit_start_struct() happens to zero-initialize both fields, >>> which means '.kind' is never truly uninitialized - but if we >>> changed visit_start_struct() to use g_new() instead of g_new0(), >>> then '.data' would not be any more reliable as a condition on >>> whether to visit the branch matching '.kind', regardless of >>> whether '.kind' was 0). >>> >>> Menawhile, now that we have just inlined the fields of all flat > > Meanwhile, > >>> unions, there is no longer the possibility of a null pointer to >>> dereference in the first place. Where the branch structure used >>> to be separately allocated by visit_start_implicit_struct(), it >>> is now just pointing to a subset of the memory already >>> zero-allocated by visit_start_struct(). > > I guess I may try and reword this slightly, and point to the fact that > the NULL dereference was due to calling visit_start_implicit_FOO() (only > done for flat unions; for simple unions the branches call > visit_type_FOO(), and that call safely handled NULL); That's why it works? > because we were > using visit_start/end_implicit_struct() for its allocation effects. But > the net result is the same - now that we no longer call > visit_start_implicit_struct() for a union visit, the dealloc visitor no > longer has to worry about a NULL dereference on a partially constructed > object, so we no longer need to probe if the union contains any data. > >>> +++ b/scripts/qapi-visit.py >>> @@ -246,9 +246,6 @@ void visit_type_%(c_name)s(Visitor *v, const char >>> *name, %(c_name)s **obj, Error >>> if variants: >>> ret += gen_err_check(label='out_obj') >>> ret += mcgen(''' >>> - if (!visit_start_union(v, !!(*obj)->u.data, &err) || err) { >>> - goto out_obj; >>> - } >> >> I'm afraid the previous commit broke this for flat unions. >> >> Before the previous commit, all members of (*obj)->u were pointers to >> the struct holding the variant members both for flat and simple unions. >> !!(*obj)->u.data tests whether the struct holding the variant members >> has been allocated. This relies on uniform pointer format. >> >> The dealloc visitor uses the "has been allocated" bit to suppress >> visiting the struct when it hasn't been allocated. >> >> The previous commit unboxes the struct for flat unions. Now ->u.data >> reinterprets the first few bytes of that struct as pointer. If you're >> "lucky", they're not all zero, and the struct gets visited. > > You're right - and I bet I could come up with a case where valgrind > could call me on it. > >> >> Obvious fix: squash this hunk into the previous commit, then let this >> commit drop the code that's no longer used. > > Yep, for bisectability, I think that's what I'll end up doing. > >> >> However, simple unions are still boxed. Why can't their pointer be null >> in the dealloc visitor? > > Simple unions still go through visit_type_FOO(), and _that_ function > properly checks for NULL. It was only visit_type_implicit_FOO() that > blindly dereferenced things. In fact, in the earlier incantation of > this patch, my fix was to teach visit_type_implicit_FOO() how to check > for NULL: > https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05442.html > > But now that visit_type_implicit_FOO() is gone, my earlier incantation > got reduced in size. I guess it's all in how I document the commit message. Give it a try :)