On 02/17/2016 11:08 AM, Markus Armbruster wrote: > Eric Blake <ebl...@redhat.com> writes: > >> Commit cee2dedb noticed that if you have a partial flat union >> (such as if an input parse failed due to a missing >> discriminator), calling the dealloc visitor could result in >> trying to dereference a NULL pointer if we attempted to visit >> an object branch without an earlier successful call to >> visit_start_implicit_struct() allocating the pointer for that >> branch. But the "fix" it implemented requires the use of a >> '.data' member in the union, which may or may not be the same >> size as other branches of the union (consider a 32-bit platform >> where one of the branches is an int64), which feels fairly dirty. > > Well, until the previous commit, it was the same, wasn't it? All > pointers.
For simple unions, you could have (well, still can have, until my later patch gets rid of the simple_union_type() magic): struct SU { SUKind type; union { void *data; int8_t byte; } u; }; But you're right - for flat unions, ALL branches were represented as pointers (until this series unboxed them). > >> Plus, as mentioned in that commit, it only works if you can >> assume that '.data' would be zero-initialized even if '.kind' was >> uninitialized, which is rather poor logic: our usage of >> visit_start_struct() happens to zero-initialize both fields, >> which means '.kind' is never truly uninitialized - but if we >> changed visit_start_struct() to use g_new() instead of g_new0(), >> then '.data' would not be any more reliable as a condition on >> whether to visit the branch matching '.kind', regardless of >> whether '.kind' was 0). >> >> Menawhile, now that we have just inlined the fields of all flat Meanwhile, >> unions, there is no longer the possibility of a null pointer to >> dereference in the first place. Where the branch structure used >> to be separately allocated by visit_start_implicit_struct(), it >> is now just pointing to a subset of the memory already >> zero-allocated by visit_start_struct(). I guess I may try and reword this slightly, and point to the fact that the NULL dereference was due to calling visit_start_implicit_FOO() (only done for flat unions; for simple unions the branches call visit_type_FOO(), and that call safely handled NULL); because we were using visit_start/end_implicit_struct() for its allocation effects. But the net result is the same - now that we no longer call visit_start_implicit_struct() for a union visit, the dealloc visitor no longer has to worry about a NULL dereference on a partially constructed object, so we no longer need to probe if the union contains any data. >> +++ b/scripts/qapi-visit.py >> @@ -246,9 +246,6 @@ void visit_type_%(c_name)s(Visitor *v, const char *name, >> %(c_name)s **obj, Error >> if variants: >> ret += gen_err_check(label='out_obj') >> ret += mcgen(''' >> - if (!visit_start_union(v, !!(*obj)->u.data, &err) || err) { >> - goto out_obj; >> - } > > I'm afraid the previous commit broke this for flat unions. > > Before the previous commit, all members of (*obj)->u were pointers to > the struct holding the variant members both for flat and simple unions. > !!(*obj)->u.data tests whether the struct holding the variant members > has been allocated. This relies on uniform pointer format. > > The dealloc visitor uses the "has been allocated" bit to suppress > visiting the struct when it hasn't been allocated. > > The previous commit unboxes the struct for flat unions. Now ->u.data > reinterprets the first few bytes of that struct as pointer. If you're > "lucky", they're not all zero, and the struct gets visited. You're right - and I bet I could come up with a case where valgrind could call me on it. > > Obvious fix: squash this hunk into the previous commit, then let this > commit drop the code that's no longer used. Yep, for bisectability, I think that's what I'll end up doing. > > However, simple unions are still boxed. Why can't their pointer be null > in the dealloc visitor? Simple unions still go through visit_type_FOO(), and _that_ function properly checks for NULL. It was only visit_type_implicit_FOO() that blindly dereferenced things. In fact, in the earlier incantation of this patch, my fix was to teach visit_type_implicit_FOO() how to check for NULL: https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg05442.html But now that visit_type_implicit_FOO() is gone, my earlier incantation got reduced in size. I guess it's all in how I document the commit message. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature