[ 
https://issues.apache.org/jira/browse/AVRO-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217747#comment-17217747
 ] 

Adam Bellemare commented on AVRO-2702:
--------------------------------------

I seem to have hit this issue when attempting to use Confluent's 6.0.0 version 
of Schema Registry, containing Avro 1.9.2 (just released last week). We have a 
very substantial amount of data written without the "avro.java.string" tags, 
which now always fails to resolve due to this issue. I think the impact of this 
is going to grow as more Kafka users move onto Avro 1.9.2, as many of us have 
created a ton of domain events using Kafka Connect (with an older version of 
Avro to produce the records, where this wasn't an issue) and are now updating 
our consumers to keep up with the latest releases. 

Extra info: change-data-capture mechanisms, like the popular Debezium 
(https://debezium.io/documentation/reference/1.3/connectors/mysql.html) for 
Kafka Connect, nest optional strings within optional records. I encountered 
this bug while using the Maven SpecificRecord generator plugin (setting 
<stringType>String</stringType>, as I have done for a number of years now) to 
consume records produced by Debezium running on Kafka Connect. 
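
For reference, a plugin setup along the following lines yields the 
stringType behaviour mentioned above. This is only a minimal sketch, assuming 
the stock avro-maven-plugin with its "schema" goal; the version and the 
directory paths are placeholders and are not taken from any actual build.

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <!-- Placeholder version; use whichever Avro release the build targets. -->
  <version>1.9.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- Placeholder paths, assumed for illustration only. -->
        <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/avro</outputDirectory>
        <!-- Generated classes declare string fields as java.lang.String and
             tag the embedded schema with "avro.java.string": "String". -->
        <stringType>String</stringType>
      </configuration>
    </execution>
  </executions>
</plugin>
```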


Prior versions of Confluent's Schema Registry & Serdes didn't have this issue, 
and I would go so far as to say that this issue is a blocker for most Kafka 
Avro users. I am currently looking to just roll my own version of the jars to 
force our internal clients to be able to resolve this, but I think that if we 
(someone? anyone?) can get a fix in ASAP it would be worth rolling out an 
update just for this issue alone. I suspect many Kafka users upgrading to 
Confluent 6.0.0 will be here in short order wondering the same thing as me.

> Avro ResolvingGrammarGenerator does not honor "avro.java.string" property in 
> inner record schemas
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2702
>                 URL: https://issues.apache.org/jira/browse/AVRO-2702
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.1
>            Reporter: Thorsten Hake
>            Priority: Major
>              Labels: ClassCastException, Deserialize
>         Attachments: Bar.kt
>
>
> The type property "avro.java.string" is being used to qualify the 
> CharSequence implementation of a string type in java. This property will be 
> set in the java code generated by the avro maven plugin, if the <stringType> 
> property is set to "String".
> However the ResolvingGrammarGenerator, which helps in matching the writer 
> schema to the reader schema, does not honor this property for inner records 
> within unions. Instead of deserializing to java.lang.String, the strings of 
> the inner record will be deserialized to org.apache.avro.util.Utf8. String 
> properties belonging to the outer record will be correctly deserialized to 
> java.lang.String.
> If you try to deserialize an Avro record from a schema that has an inner 
> record within a union type with the java code generated by the maven plugin 
> (<stringType> is set to "String"), you'll get a ClassCastException:
> {noformat}
> Caused by: java.lang.ClassCastException: class org.apache.avro.util.Utf8 
> cannot be cast to class java.lang.String
> {noformat}
> This is because the generated java code expects the strings to be 
> deserialized according to the "avro.java.string" property which does not 
> happen for the inner record.
> I would expect that the deserializer treats the strings in the inner record 
> the same as the strings in the outer record.
> Example:
> writer schema:
> {code:json}
> {
>   "type": "record",
>   "name": "foo",
>   "fields": [
>     {
>       "name": "k",
>       "type": "string"
>     },
>     {
>       "name": "value",
>       "type": [
>         "null",
>         {
>           "type": "record",
>           "name": "bar",
>           "fields": [
>             {
>               "name": "str",
>               "type": "string"
>             }
>           ]
>         }
>       ]
>     }
>   ]
> }
> {code}
> reader schema:
> {code:json}
> {
>   "type": "record",
>   "name": "foo",
>   "fields": [
>     {
>       "name": "k",
>       "type": {
>         "type": "string",
>         "avro.java.string": "String"
>       }
>     },
>     {
>       "name": "value",
>       "type": [
>         "null",
>         {
>           "type": "record",
>           "name": "bar",
>           "fields": [
>             {
>               "name": "str",
>               "type": {
>                 "type": "string",
>                 "avro.java.string": "String"
>               }
>             }
>           ]
>         }
>       ]
>     }
>   ]
> }
> {code}
> You'll find some example kotlin code demonstrating the problem in the 
> attached Bar.kt.
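
The cast failure described in the quoted report can be illustrated without 
Avro on the classpath. The sketch below is not the attached Bar.kt and does 
not call any Avro API: FakeUtf8 is a hypothetical stand-in for 
org.apache.avro.util.Utf8 (a CharSequence that is not a java.lang.String), and 
strictCast is an assumed simplification of a generated accessor that casts the 
decoded value directly. The lenient method is one possible consumer-side 
workaround, not necessarily the fix the project would adopt.

```java
import java.nio.charset.StandardCharsets;

public class StringTypeCastSketch {

    /** Hypothetical stand-in for org.apache.avro.util.Utf8: a CharSequence that is not a String. */
    static final class FakeUtf8 implements CharSequence {
        private final byte[] bytes;

        FakeUtf8(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Assumed shape of a generated accessor: casts the decoded value straight to String. */
    static String strictCast(Object decoded) {
        return (String) decoded;
    }

    /** Possible workaround: accept any CharSequence and convert it explicitly. */
    static String lenient(Object decoded) {
        return decoded == null ? null : decoded.toString();
    }

    /** Returns true when the strict cast throws ClassCastException. */
    static boolean castFails(Object decoded) {
        try {
            strictCast(decoded);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object outerField = "key";                 // outer string decoded as java.lang.String
        Object innerField = new FakeUtf8("value"); // inner string decoded as a Utf8-like type

        System.out.println(castFails(outerField)); // false
        System.out.println(castFails(innerField)); // true
        System.out.println(lenient(innerField));   // value
    }
}
```

The point of the sketch is that both values are valid CharSequences, so the 
failure only surfaces at the cast in code that was generated on the 
assumption that "avro.java.string" is honored everywhere.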



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
