David Hildenbrand <da...@redhat.com> writes: > On 14.11.18 18:38, Markus Armbruster wrote: >> David Hildenbrand <da...@redhat.com> writes: >> >>> The input visitor has some problems right now, especially >>> - unsigned type "Range" is used to process signed ranges, resulting in >>> inconsistent behavior and ugly/magical code >>> - uint64_t are parsed like int64_t, so big uint64_t values are not >>> supported and error messages are misleading >>> - lists/ranges of int64_t are accepted although no list is parsed and >>> we should rather report an error >>> - lists/ranges are preparsed using int64_t, making it hard to >>> implement uint64_t values or uint64_t lists >>> - types that don't support lists don't bail out >> >> Known weirdness: empty list is invalid (test-string-input-visitor.c >> demonstates). Your patch doesn't change that (or else it would update >> the test). Should it be changed? >> > > I don't change the test, so the old behavior still works. > (empty string -> error)
Understand. Design question: should it remain an error? Feel free to declare the question out of scope for this patch. >>> So let's rewrite it by getting rid of usage of the type "Range" and >>> properly supporting lists of int64_t and uint64_t (including ranges of >>> both types), fixing the above mentioned issues. >>> >>> Lists of other types are not supported and will properly report an >>> error. Virtual walks are now supported. >>> >>> Tests have to be fixed up: >>> - Two BUGs were hardcoded that are fixed now >>> - The string-input-visitor now actually returns a parsed list and not >>> an ordered set. >>> >>> While at it, do some smaller style changes (e.g. use g_assert). >> >> Please don't replace assert() by g_assert() in files that consistently >> use assert() now. > > Alright, will use assert() and don't replace. (I thought g_assert() was > the new fancy thing to do it in QEMU). > >> In my opinion, g_assert() is one of the cases where GLib pointlessly >> reinvents wheels that have been carrying load since forever. >> >>> Signed-off-by: David Hildenbrand <da...@redhat.com> >>> --- >>> include/qapi/string-input-visitor.h | 3 +- >>> qapi/string-input-visitor.c | 436 +++++++++++++++++----------- >>> tests/test-string-input-visitor.c | 18 +- >>> 3 files changed, 264 insertions(+), 193 deletions(-) >>> >>> diff --git a/include/qapi/string-input-visitor.h >>> b/include/qapi/string-input-visitor.h >>> index 33551340e3..2c40f7f5a6 100644 >>> --- a/include/qapi/string-input-visitor.h >>> +++ b/include/qapi/string-input-visitor.h >>> @@ -19,8 +19,7 @@ typedef struct StringInputVisitor StringInputVisitor; >>> >>> /* >>> * The string input visitor does not implement support for visiting >>> - * QAPI structs, alternates, null, or arbitrary QTypes. It also >>> - * requires a non-null list argument to visit_start_list(). >>> + * QAPI structs, alternates, null, or arbitrary QTypes. >> >> Preexisting: neglects to spell out the list limitations, i.e. can do >> only flat lists of integers. Care do fix that, too? > > Added "Only flat lists of integers (int64/uint64) are supported." Hmm, do lists of narrower integer types also work? I guess they do: the narrower visit_type_*int*() call v->type_*int64() via visit_type_*intN(). Lists of type size are expressly excluded, in parse_type_size() below. That's okay, we can lift the restriction when it gets in the way. >>> */ >>> Visitor *string_input_visitor_new(const char *str); >>> >>> diff --git a/qapi/string-input-visitor.c b/qapi/string-input-visitor.c >>> index dee708d384..16da47409e 100644 >>> --- a/qapi/string-input-visitor.c >>> +++ b/qapi/string-input-visitor.c >>> @@ -4,10 +4,10 @@ >>> * Copyright Red Hat, Inc. 2012-2016 >>> * >>> * Author: Paolo Bonzini <pbonz...@redhat.com> >>> + * David Hildenbrand <da...@redhat.com> >>> * >>> * This work is licensed under the terms of the GNU LGPL, version 2.1 or >>> later. >>> * See the COPYING.LIB file in the top-level directory. >>> - * >>> */ >>> >>> #include "qemu/osdep.h" >>> @@ -18,21 +18,39 @@ >>> #include "qapi/qmp/qerror.h" >>> #include "qapi/qmp/qnull.h" >>> #include "qemu/option.h" >>> -#include "qemu/queue.h" >>> -#include "qemu/range.h" >>> #include "qemu/cutils.h" >>> >>> +typedef enum ListMode { >>> + /* no list parsing active / no list expected */ >>> + LM_NONE, >>> + /* we have an unparsed string remaining */ >>> + LM_UNPARSED, >>> + /* we have an unfinished int64 range */ >>> + LM_INT64_RANGE, >>> + /* we have an unfinished uint64 range */ >>> + LM_UINT64_RANGE, >>> + /* we have parsed the string completely and no range is remaining */ >>> + LM_END, >>> +} ListMode; >>> + >>> +typedef union RangeLimit { >>> + int64_t i64; >>> + uint64_t u64; >>> +} RangeLimit; >>> >>> struct StringInputVisitor >>> { >>> Visitor visitor; >>> >>> - GList *ranges; >>> - GList *cur_range; >>> - int64_t cur; >>> + /* Porperties related to list processing */ >> >> "Properties" >> >> Suggest >> >> /* List parsing state */ > > Yes, will use that. > >> >>> + ListMode lm; >>> + RangeLimit rangeNext; >>> + RangeLimit rangeEnd; >> >> RangeLimit is a good name for rangeEnd, but not so hot for rangeNext. >> Naming is hard. > > Initially I had two unnamed unions to not have to give a name ;) > > RangeElement ? Will use that for now. Works for me. >> >>> + const char *unparsed_string; >>> + void *list; >> >> You drop /* Only needed for sanity checking the caller */. >> Double-checking: is that not true anymore? > > I consider such comments bad as they can easily become outdated (and > IMHO don't help anybody). But I can leave it here, don't really care. I'm okay with dropping it. > [...] > >>> + error_setg(errp, "Already processing a list."); >> >> Drop the period, please. error_setg()'s contract stipulates: >> >> * The resulting message should be a single phrase, with no newline or >> * trailing punctuation. >> >> More of the same below. > > All dots dropped :) > > [...] >>> static GenericList *next_list(Visitor *v, GenericList *tail, size_t size) >>> { >>> StringInputVisitor *siv = to_siv(v); >>> - Range *r; >>> - >>> - if (!siv->ranges || !siv->cur_range) { >>> - return NULL; >>> - } >>> >>> - r = siv->cur_range->data; >>> - if (!r) { >>> + switch (siv->lm) { >>> + case LM_NONE: >>> + case LM_END: >>> + /* we have reached the end of the list already or have no list */ >>> return NULL; >> >> Hmm. Is there a way to reach case LM_NONE other than a programming >> error? If no, we should abort then. To do so, fold LM_NONE into the >> default case below. > > Alright, all programming errors will be encoded as asserts. Yes, please. In my experience, distinguishing between programming errors and other errors really helps with understanding the code. >> >>> - } >>> - >>> - if (!range_contains(r, siv->cur)) { >>> - siv->cur_range = g_list_next(siv->cur_range); >>> - if (!siv->cur_range) { >>> - return NULL; >>> - } >>> - r = siv->cur_range->data; >>> - if (!r) { >>> - return NULL; >>> - } >>> - siv->cur = range_lob(r); >>> + case LM_INT64_RANGE: >>> + case LM_UINT64_RANGE: >>> + case LM_UNPARSED: >>> + /* we have an unparsed string or something left in a range */ >>> + break; >>> + default: >>> + g_assert_not_reached(); >> >> abort() suffices, and is more consistent with the rest of qapi/. > > Right, qapi is special ;) Sort of. I like keeping things consistent and simple. Some parts of QEMU consistently use plain assert(), some parts consistently use g_assert(), and some mix them up with abandon. I'm fighting a rearguard action against the latter. > Will use s/g_assert/assert/ > s/g_assert_not_reached/abort/ Thanks! >> >>> } >>> >>> tail->next = g_malloc0(size); >>> @@ -179,88 +109,216 @@ static GenericList *next_list(Visitor *v, >>> GenericList *tail, size_t size) >>> static void check_list(Visitor *v, Error **errp) >>> { >>> const StringInputVisitor *siv = to_siv(v); >>> - Range *r; >>> - GList *cur_range; >>> >>> - if (!siv->ranges || !siv->cur_range) { >>> + switch (siv->lm) { >>> + case LM_NONE: >>> + error_setg(errp, "Not processing a list."); >> >> Same question as for next_list(). > > abort() it is :) -> default case > >> >>> + case LM_INT64_RANGE: >>> + case LM_UINT64_RANGE: >>> + case LM_UNPARSED: >>> + error_setg(errp, "There are elements remaining in the list."); >> >> Hmm. qobject_input_check_list() reports "Only %u list elements expected >> in %s" here. Looks unintuitive, until you remember how it's normally >> used: to parse user input. The error gets reported when the parsing >> didn't consume the whole list. Can only happen with a virtual walk. >> And when it happens, the issue to report is "you provided a longer list >> than I can accept". qobject_input_check_list() does exactly that. Your >> message is less clear. > > We don't keep track of element count (and I don't like adding it just to > print a error message). The QObject input visitor has more of a need for identifying the offending part of the input precisely, because it supports nesting. > What about "Less list elements expected"? That at least matches the > other error. Good enough. I'd say "fewer", though. >> >>> return; >>> - } >>> - >>> - r = siv->cur_range->data; >>> - if (!r) { >>> + case LM_END: >>> return; >>> + default: >>> + g_assert_not_reached(); >>> } >>> - >>> - if (!range_contains(r, siv->cur)) { >>> - cur_range = g_list_next(siv->cur_range); >>> - if (!cur_range) { >>> - return; >>> - } >>> - r = cur_range->data; >>> - if (!r) { >>> - return; >>> - } >>> - } >>> - >>> - error_setg(errp, "Range contains too many values"); >>> } >>> >>> static void end_list(Visitor *v, void **obj) >>> { >>> StringInputVisitor *siv = to_siv(v); >>> >>> - assert(siv->list == obj); >> >> I suspect state LM_NONE is also a programming error here. If that's >> correct, let's assert(siv->lm != LM_NONE). > > Indeed, added. > > [...] >>> + if (siv->rangeNext.i64 > siv->rangeEnd.i64 || *obj == INT64_MAX) { >>> + /* end of range, check if there is more to parse */ >>> + if (siv->unparsed_string[0]) { >>> + siv->lm = LM_UNPARSED; >>> + } else { >>> + siv->lm = LM_END; >>> + } >>> } >> >> Are you sure this is kosher? >> >> if siv->rangeNext.i64 and siv->rangeEnd are both initially INT64_MAX, >> siv->rangeNext++ wraps around to INT64_MIN. Except when the compiler >> exploits undefined behavior. > > *obj == INT64_MAX checks exactly that > > (it is the old value of siv->rangeNext.i64 before a possible wraparound). By the time you check *obj == INT64_MAX, you already executed siv->rangeNext++ with undefined behavior. But wait... your code is fine because we compile with -fwrapv. > BTW I added a test case for exactly that as I had the BUG you described > in the code ;) > > [...] >>> + return; >>> + case LM_UNPARSED: >>> + if (try_parse_int64_list_entry(siv, obj)) { >>> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name ? name : >>> "null", >>> + "list of int64 values or ranges"); >>> + } >> >> I figure I'd make try_parse_int64_list_entry() just parse, and on >> success fall through to case LM_INT64_RANGE. But your solution works, >> too. > > Then we would have to represent even single values as ranges, which is > something I'd like to avoid. Your artistic license applies. >>> + return; >>> + case LM_END: >>> + error_setg(errp, "No more elements in the list."); >> >> This is the buddy of check_list() case LM_UNPARSED etc. There, input >> provides more list members than we expect. Here, input provides less. >> Again, the error is less clear than the one the QObject input visitor >> reports: "Parameter '%s' is missing". >> >> Same for the unsigned copy below. > > Will also use "Less list elements expected". Good enough. I'd say "fewer", though. > "Parameter '%s' is missing" does not really apply to list elements > without a name. We can discuss when I resend. The QObject input visitor uses the list index as name, see full_name_nth(). I'm *not* suggesting you implement that for the string input visitor. >> >>> + return; >>> + default: >>> + error_setg(errp, "Lists don't support mixed types."); >> >> Is this a programming error? >> >> Same for the unsigned copy below. > > abort() it is. > > [...] >>> + >>> + if (siv->rangeNext.u64 > siv->rangeEnd.u64 || *obj == UINT64_MAX) { >>> + /* end of range, check if there is more to parse */ >>> + if (siv->unparsed_string[0]) { >>> + siv->lm = LM_UNPARSED; >>> + } else { >>> + siv->lm = LM_END; >>> + } >>> + } >> >> Unsigned wraps around fine. But when you fix the signed code, you >> should probably keep this unsigned code as similar as practical. > > Again, *obj == UINT64_MAX checks this. > >> >>> + return; >>> + case LM_UNPARSED: >>> + if (try_parse_uint64_list_entry(siv, obj)) { >>> + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name ? name : >>> "null", >>> + "list of uint64 values or ranges"); >>> + } >>> + return; >>> + case LM_END: >>> + error_setg(errp, "No more elements in the list."); >>> + return; >>> + default: >>> + error_setg(errp, "Lists don't support mixed types."); >>> + return; >>> } >>> } >>> >>> @@ -271,6 +329,11 @@ static void parse_type_size(Visitor *v, const char >>> *name, uint64_t *obj, >>> Error *err = NULL; >>> uint64_t val; >>> >>> + if (siv->lm != LM_NONE) { >>> + error_setg(errp, "Lists not supported for type \"size\""); >>> + return; >>> + } >>> + >> >> Programming error? More of the same below. > > assert() it is. > > [...] >>> visit_type_uint64List(v, NULL, &res, &error_abort); >>> tail = res; >>> for (i = 0; i < n; i++) { >>> @@ -117,10 +107,10 @@ static void check_ulist(Visitor *v, uint64_t >>> *expected, size_t n) >>> static void test_visitor_in_intList(TestInputVisitorData *data, >>> const void *unused) >>> { >>> - /* Note: the visitor *sorts* ranges *unsigned* */ >>> - int64_t expect1[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 20 }; >>> + int64_t expect1[] = { 1, 2, 0, 2, 3, 4, 20, 5, 6, 7, 8, 9, 1, 2, 3, 4, >>> 5, >> >> Please wrap this line a bit earlier. > > Will do. > >> >>> + 6, 7, 8 }; >>> int64_t expect2[] = { 32767, -32768, -32767 }; >>> - int64_t expect3[] = { INT64_MAX, INT64_MIN }; >>> + int64_t expect3[] = { INT64_MIN, INT64_MAX }; >> >> Did you mean to change this line? >> > > Yes, we don't sort the list anymore. D'oh! >>> uint64_t expect4[] = { UINT64_MAX }; >>> Error *err = NULL; >>> int64List *res = NULL; >>> @@ -189,7 +179,7 @@ static void >>> test_visitor_in_intList(TestInputVisitorData *data, >>> visit_type_int64(v, NULL, &tail->value, &err); >>> g_assert_cmpint(tail->value, ==, 0); >>> visit_type_int64(v, NULL, &val, &err); >>> - g_assert_cmpint(val, ==, 1); /* BUG */ >>> + error_free_or_abort(&err); >> >> Which of the problems listed in commit message is this? > > I guess the not mentioned "visiting beyond the end of a list is not > handled properly" bug (most probably due to old range handling). Will > add it to the description, Thanks! >> >>> visit_check_list(v, &error_abort); >>> visit_end_list(v, (void **)&res); >> >> Please don't let my review comments discourage you! This is so much >> better than the code we have now, and feels pretty close to ready. >> Thanks a lot! >> > > I appreciate people reviewing my code :) So don't worry, I'll do > requested changes and resend, then we'll see what we are still missing. > > BTW: I'll add support for single-element ranges (1-1) as the previous > code seemed to accepted that. Good idea.