Re: [Qemu-devel] [PATCH v1 01/10] qdict: implement a qdict_crumple method for un-flattening a dict

Max Reitz Sat, 05 Mar 2016 07:18:11 -0800

On 03.03.2016 12:01, Daniel P. Berrange wrote:
> On Wed, Mar 02, 2016 at 05:13:59PM +0100, Max Reitz wrote:
>> On 19.02.2016 17:47, Daniel P. Berrange wrote:
>>> The qdict_flatten() method will take a dict whose elements are
>>> further nested dicts/lists and flatten them by concatenating
>>> keys.
>>>
>>> The qdict_crumple() method aims todo the reverse, taking a flat
>>> qdict, and turning it into a set of nested dicts/lists. It will
>>> apply nesting based on the key name, with a '.' indicating a
>>> new level in the hierarchy. If the keys in the nested structure
>>> are all numeric, it will create a list, otherwise it will create
>>> a dict.
>>>
>>> If the keys are a mixture of numeric and non-numeric, or the
>>> numeric keys are not in strictly ascending order, an error will
>>> be reported.


[...]

>>> +            if (!child) {
>>> +                goto error;
>>> +            }
>>> +
>>> +            qdict_put_obj(tmp2, entry->key, child);
>>> +        } else {
>>> +            qobject_incref(entry->value);
>>> +            qdict_put_obj(tmp2, entry->key, entry->value);
>>> +        }
>>> +
>>> +        entry = next;
>>> +    }
>>> +    QDECREF(tmp1);
>>> +
>>> +    /* Step 3: detect if we need to turn our dict into list */
>>
>> You could use qdict_array_entries(tmp2, "") > 0 for this.
>>
>> If you do want to make { "0": 42, "2": 23 } and { "0": 42, "x": 23 }
>> errors (my qdict_unflatten() just kept those QDicts), then you'd have to
>> check on qdict_array_entries() error whether the QDict contained an
>> integer key, but that would still be simpler than the following loop and
>> the check in step 4.
> 
> If I use qdict_array_entries() then it does 2 O(N) iterations
> of the input qdict, and to check the errors, I'll have todo
> yet another iteration.  My inlined code here does everything
> in a single O(N) iteration instead of 3. I think that's
> compelling enough to reason to not use that function.

O(3N) = O(N) :-)

Making qdict_array_entries() O(N) (pretending that O(N) and O(2N) were
different things) for this case is trivial: Just omit the second loop if
"" has been passed as the subqdict.

So then you'd end up with "O(2N)", if you do the error checking.

So you are right, there is a reason not to use qdict_array_entries(),
but I'd personally still use it. I only have very limited knowledge of
the whole qemu code base, but it doesn't appear to me as if QDict
functions are ever used in a hot path or as if QDicts ever contain a
large number of elements (with large being more than, say, 1000),
especially not the kind of QDicts you'd want to crumple.

Because of that, I chose to use these existing functions, even while
knowing that they are probably not optimal for this case.

You did propose removing qdict_array_entries() and qdict_array_split()
in favor of just using this function in their stead (see my reply
below), so if we do so, reusing them would obviously not be ideal any
more. In that case, I'd still put this into its own static function if
possible.

tl;dr: I don't think runtime complexity is an issue here, but if we are
going to drop qdict_array_entries() and qdict_array_split() anyway, then
using them is a bad idea, obviously. It would then still be nice to put
this into an own function.

>>> +    entry = qdict_first(tmp2);
>>> +    while (entry != NULL) {
>>> +        next = qdict_next(tmp2, entry);
>>> +
>>> +        errno = 0;
>>> +        if (qemu_strtoll(entry->key, NULL, 10, &val) == 0) {
>>> +            if (!dst) {
>>> +                dst = (QObject *)qlist_new();
>>> +                isList = true;
>>> +            } else if (!isList) {
>>> +                error_setg(errp,
>>> +                           "Key '%s' is for a list, but previous key is "
>>> +                           "for a dict", entry->key);
>>> +                goto error;
>>> +            }
>>> +            listLen++;
>>> +            if (val > listMax) {
>>> +                listMax = val;
>>> +            }
>>> +        } else {
>>> +            if (!dst) {
>>> +                dst = (QObject *)tmp2;
>>> +                qobject_incref(dst);
>>> +                isList = false;
>>> +            } else if (isList) {
>>> +                error_setg(errp,
>>> +                           "Key '%s' is for a dict, but previous key is "
>>> +                           "for a list", entry->key);
>>> +                goto error;
>>> +            }
>>> +        }
>>> +
>>> +        entry = next;
>>> +    }
>>> +
>>> +    /* Step 4: Turn the dict into a list */
>>
>> Why not just use qdict_array_split(tmp2, &dst);?
> 
> Again qdict_array_split() is a pretty inefficiently written
> function. It does multiple nested iterations over its input
> giving it O(n^2) time, compared to O(n) for my code here.

Strictly speaking, it's not O(n^2) but O(nm), where n is qdict_size(src)
and m is the size of the resulting QList.

(Side note: This function only is O(n) if qdict_get() is O(1), which it
is in best case. In worst case, it's O(n), and then this code becomes
O(n^2).)

Again, I don't think that O(nm) is much worse than O(n) for the use case
at hand, but if you are going to drop qdict_array_split() anyway, then,
again, it doesn't make sense to call it at all.

You could again put the code inside of the "if (isList) {}" block into
an own function, but it's not that long so I'd be fine with it staying
here (although it is certainly self-contained enough to look good in an
own function :-)).

> The only user of qdict_array_entries() and qdict_array_split()
> is the block quorum driver. From a brief look at that code
> I think it could probably be changed to just call this new
> qdict_crumple() method, and those 2 inefficient functions
> could be deleted, so we don't have duplication of code.

I'm not sure. First, you do not want to crumple recursively there, so
you'd have to re-flatten all the QList elements after you have crumpled
them. Could be solved by adding a "bool recurse" parameter to this function.

Second, yes, qdict_array_entries() is used by quorum. However, it does
(no longer) use qdict_array_split(); the users of that are
util/qemu-config.c and blockdev.c. Replacing those with a non-recursing
call to qdict_crumple() should be trivial, but quorum is going to be a
bit more difficult. You could make it just call qdict_crumple(), get the
size of the resulting list and then discard it, but that seems a bit
over the top.

All in all, I think you're right in that this function can replace the
(potentially) worse qdict_array_split() function (if the caller can stop
it from recursing), but I don't know whether we can get rid of
qdict_array_entries() so easily.

Max

>>> +    if (isList) {
>>> +        if (listLen != (listMax + 1)) {
>>> +            error_setg(errp, "List indexes are not continuous, "
>>> +                       "saw %zu elements but %zu largest index",
>>> +                       listLen, listMax);
>>> +            goto error;
>>> +        }
>>> +
>>> +        for (i = 0; i < listLen; i++) {
>>> +            char *key = g_strdup_printf("%zu", i);
>>> +
>>> +            child = qdict_get(tmp2, key);
>>> +            g_free(key);
>>> +            if (!child) {
>>> +                error_setg(errp, "Unexpected missing list entry %zu", i);
>>> +                goto error;
>>> +            }
>>> +
>>> +            qobject_incref(child);
>>> +            qlist_append_obj((QList *)dst, child);
>>> +        }
>>> +    }
>>> +    QDECREF(tmp2);
>>> +
>>> +    return dst;
>>> +
>>> + error:
>>> +    QDECREF(tmp2);
>>> +    QDECREF(tmp1);
>>> +    qobject_decref(dst);
>>> +    return NULL;
>>> +}
>>> +
>>> +
> 
> Regards,
> Daniel
>

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v1 01/10] qdict: implement a qdict_crumple method for un-flattening a dict

Reply via email to