Re: C++ AVRO-3601 broke custom non-string attributes

John Dickson Tue, 06 Aug 2024 14:18:48 -0700

OK, I am subscribed now.
I've raised 2 PRs with possible implementations to select between.


PR3064 custom attributes that are simple string values are stored as the
value, all other types are stored as a string containing a JSON fragment.
This means you can't be certain if the value was a string that contained
JSON or if it was the JSON value itself, but it avoids changing the current
behavior.
PR3069 custom attributes are always stored in JSON encoding, avoiding any
ambiguity. This changes the existing behavior as string values will be
quoted.

Thanks,
John

On Wed, Jul 31, 2024 at 9:14 AM Martin Grigorov <mgrigo...@apache.org>
wrote:

> Adding John to CC because he is not subscribed.
>
> On Tue, Jul 30, 2024 at 8:32 PM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
>> Hey John,
>>
>> Would you be interested in fixing this? Feel free to open up a pull
>> request
>> on Github including some tests to capture this behavior.
>>
>> Kind regards,
>> Fokko
>>
>> Op di 30 jul 2024 om 17:44 schreef John Dickson <jmdick...@gmail.com>:
>>
>> > AVRO-3601 fixed the install issue, but any schema with non-string custom
>> > attributes now fails to compile with an exception.
>> > For example the following IDL generates a schema that cannot be used in
>> C++
>> > 1.11.2, while it worked in earlier versions:
>> >
>> > protocol A {
>> > record Test {
>> >   string @extra({"custom1":"value", "custom2": true}) f1;
>> > }
>> > }
>> >
>> > schema:
>> > {
>> >   "type": "record",
>> >   "name": "Test",
>> >   "fields": [ {
>> >     "name": "f1",
>> >     "type": "string",
>> >     "extra": {
>> >       "custom1": "value",
>> >       "custom2": true
>> >     }
>> >   } ]
>> > }
>> >
>> > The following patch allows the schema to compile while maintaining the
>> > current behavior for string attributes, but leaves it ambiguous if the
>> > content of the string is also a valid JSON value.
>> >
>> >
>> > ~/projects/libraries/avro$ git diff lang/c++/impl/Compiler.cc
>> > lang/c++/test/SchemaTests.cc
>> > diff --git a/lang/c++/impl/Compiler.cc b/lang/c++/impl/Compiler.cc
>> > index 383798c4d..e5990e6e2 100644
>> > --- a/lang/c++/impl/Compiler.cc
>> > +++ b/lang/c++/impl/Compiler.cc
>> > @@ -275,7 +275,11 @@ static void getCustomAttributes(const Object& m,
>> > CustomAttributes &customAttribu
>> >    const std::unordered_set<std::string>& kKnownFields =
>> getKnownFields();
>> >    for (const auto &entry : m) {
>> >      if (kKnownFields.find(entry.first) == kKnownFields.end()) {
>> > -      customAttributes.addAttribute(entry.first,
>> > entry.second.stringValue());
>> > +      if (entry.second.type() == json::EntityType::String) {
>> > +        customAttributes.addAttribute(entry.first,
>> > entry.second.stringValue());
>> > +      } else {
>> > +        customAttributes.addAttribute(entry.first,
>> > entry.second.toString());
>> > +      }
>> >      }
>> >    }
>> >  }
>> > diff --git a/lang/c++/test/SchemaTests.cc b/lang/c++/test/SchemaTests.cc
>> > index d73b759cf..ee1014510 100755
>> > --- a/lang/c++/test/SchemaTests.cc
>> > +++ b/lang/c++/test/SchemaTests.cc
>> > @@ -112,6 +112,8 @@ const char *basicSchemas[] = {
>> >      "{\"type\": \"record\",\"name\": \"Test\",\"fields\": "
>> >          "[{\"name\": \"f1\",\"type\": \"long\","
>> >          "\"extra field1\": \"1\",\"extra field2\": \"2\"}]}"
>> > +    ,R"({"type": "record","name": "Test","fields":
>> > +        [{"name": "f1","type": "string", "extra": {"custom1":
>> > "value","custom2": true }}]})"
>> >  };
>> >
>> >  const char *basicSchemaErrors[] = {
>> >
>> >
>> > To get rid of the ambiguity at the cost of altering the existing
>> behavior,
>> > you could just change Compiler.cc line 278 from:
>> > customAttributes.addAttribute(entry.first, entry.second.stringValue());
>> > to:
>> > customAttributes.addAttribute(entry.first, entry.second.toString());
>> >
>> > Thanks,
>> > John
>> >
>>
>

Re: C++ AVRO-3601 broke custom non-string attributes

Reply via email to