[jira] [Commented] (AVRO-3408) Schema evolution with logical types

Ivan Zemlyanskiy (Jira) Mon, 04 Apr 2022 13:55:07 -0700


[https://issues.apache.org/jira/browse/AVRO-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517117#comment-17517117]


Ivan Zemlyanskiy commented on AVRO-3408:
----------------------------------------

[~opwvhk], thank you for your precise description! 

Indeed, in my PR I implemented option 2:
{quote}the reader applies logical type conversions on the data as described by 
the write schema (before / instead of schema resolution) 
{quote}
Whenever a reader sees a logical type and a type mismatch, it tries to find a
chain of conversions that lets the read succeed. In other words:

For a field *N*, if the reader schema is

{ "name" : "N", "type" : { "type" : "A", "logicalType" : "LT" } }

and the writer schema is

{ "name" : "N", "type" : { "type" : "B", "logicalType" : "LT" } }

and the logical type *LT* has conversions to both *A* and *B*, then we have the
conversion chain *B -> LT instance -> A*.
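
To make the single-logical-type chain concrete, here is a minimal sketch outside of Avro, assuming (for illustration only) a `uuid` field that the writer stores as a 16-byte `fixed` (*B*) and the reader expects as a `string` (*A*). `UuidChain` and its members are hypothetical names, not Avro API; the point is just that the resolver decodes with the writer's physical type and re-encodes with the reader's.

{code:java}
import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical helper (not Avro API): one logical type ("uuid"), two physical types.
final class UuidChain {
  // Writer side, B -> LT instance: 16 big-endian bytes -> UUID.
  static UUID fromFixed(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("uuid fixed must be 16 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer reads big-endian by default
    long msb = buf.getLong();
    long lsb = buf.getLong();
    return new UUID(msb, lsb);
  }

  // Reader side, LT instance -> A: UUID -> canonical 36-character string.
  static String toStringRepr(UUID uuid) {
    return uuid.toString();
  }

  // Generic chain B -> LT instance -> A as plain function composition.
  static <B, T, A> Function<B, A> chain(Function<B, T> writerToInstance,
                                        Function<T, A> instanceToReader) {
    return writerToInstance.andThen(instanceToReader);
  }

  // Resolved reader for a field written as fixed(16) and read as string.
  static final Function<byte[], String> FIXED_TO_STRING =
      chain(UuidChain::fromFixed, UuidChain::toStringRepr);
}
{code}

Because the chain is plain function composition, the resolver only needs to know, per logical type, how to go from each physical type to the instance and back.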

In theory, we could go even further and apply the same rule to two different
logical types.

For a field *N*, if the reader schema is

{ "name" : "N", "type" : { "type" : "A", "logicalType" : "LT1" } }

and the writer schema is

{ "name" : "N", "type" : { "type" : "B", "logicalType" : "LT2" } }

and the logical type *LT1* has conversions to *A* and *C*, while the logical type
*LT2* has conversions to *B* and *C*, then we have a chain that lets the read
succeed: *B -> LT2 instance -> C -> LT1 instance -> A*.
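
The two-logical-type case is the same composition with one more hop. A minimal sketch, assuming *LT2* = `timestamp-millis` on a `long` (*B*), *LT1* = `timestamp-micros` on a `long` (*A*), and an ISO-8601 string as the shared type *C*. The Avro spec does not define a string form for timestamps; *C* is an assumption chosen only so that both logical types share a representation, and `TwoLogicalTypeChain` is a hypothetical helper.

{code:java}
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.function.Function;

// Hypothetical helper (not Avro API) for the chain B -> LT2 -> C -> LT1 -> A.
final class TwoLogicalTypeChain {
  // LT2 = timestamp-millis: B (long millis) -> Instant.
  static Instant millisToInstant(long millis) {
    return Instant.ofEpochMilli(millis);
  }

  // LT2 instance -> C (assumed shared representation: ISO-8601 string).
  static String instantToIso(Instant t) {
    return t.toString();
  }

  // C -> LT1 (timestamp-micros) instance.
  static Instant isoToInstant(String iso) {
    return Instant.parse(iso);
  }

  // LT1 instance -> A (long micros since the epoch).
  static long instantToMicros(Instant t) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, t);
  }

  // The full chain, built by composing the four steps in order.
  static final Function<Long, Long> MILLIS_TO_MICROS =
      ((Function<Long, Instant>) TwoLogicalTypeChain::millisToInstant)
          .andThen(TwoLogicalTypeChain::instantToIso)
          .andThen(TwoLogicalTypeChain::isoToInstant)
          .andThen(TwoLogicalTypeChain::instantToMicros);
}
{code}

A real implementation would presumably prefer the shortest path (here millis could be converted to micros directly); the detour through *C* only illustrates the general rule.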

IMHO, this approach is pretty straightforward because it relies only on
contracts that the fields and the logical types already have:
 # (The most important) It is the same field *N*, so the writer and reader types
are just different ways to represent the same information.
 # Logical type conversions are deterministic and don't lose any information
when converting from a type to an instance and vice versa.

At the end of the day, any conversion chain for the same field should lead to a
valid result, provided the conversion functions have no bugs.
I took a look at
[the spec part about logical types|https://avro.apache.org/docs/current/spec.html#Logical+Types]
and didn't find anything about conversions. IMHO, constraints on conversions,
such as being deterministic, should be added to the spec as well.
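
Contract 2 is what makes any chain safe, and it can be checked on sample values. A minimal sketch with a hypothetical `RoundTrip` helper (not Avro API), using `timestamp-millis` (`Instant` <-> `long` milliseconds) as the example: it round-trips millisecond instants but not an instant with sub-millisecond precision, which is the kind of limitation a spec-level rule about conversions would have to spell out.

{code:java}
import java.time.Instant;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper (not Avro API) checking contract 2 on sample values.
final class RoundTrip {
  // True if decode(encode(x)) equals x for every sample.
  static <T, R> boolean isLossless(List<T> samples,
                                   Function<T, R> encode,
                                   Function<R, T> decode) {
    for (T x : samples) {
      if (!x.equals(decode.apply(encode.apply(x)))) {
        return false;
      }
    }
    return true;
  }

  // timestamp-millis as the example: Instant <-> long milliseconds since the epoch.
  static final Function<Instant, Long> TO_MILLIS = Instant::toEpochMilli;
  static final Function<Long, Instant> FROM_MILLIS = Instant::ofEpochMilli;
}
{code}

For example, `Instant.ofEpochMilli(1_650_000_000_123L)` passes, while `Instant.ofEpochSecond(0, 1_500_000)` (1.5 ms) fails because the millisecond encoding truncates it to 1 ms.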

As an Avro user, I would be grateful for more flexibility during schema
evolution (because it's always a pain).

> Schema evolution with logical types 
> ------------------------------------
>
>                 Key: AVRO-3408
>                 URL: https://issues.apache.org/jira/browse/AVRO-3408
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.11.0
>            Reporter: Ivan Zemlyanskiy
>            Assignee: Ivan Zemlyanskiy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Hello!
> First of all, thank you for this project. I love Avro encoding from both a
> technology and a code-culture point of view. (y)
> I know you recommend migrating a schema by adding a new field and removing
> the old one later, but please-please-please consider my case as well.
> In my company, we have some DTOs with 200+ fields in total that we encode
> with Avro and send over the network. About a third of them are of type
> `java.math.BigDecimal`. At some point, we discovered that we send them with
> a schema like
> {code:json}
> {
>   "name":"performancePrice",
>   "type":{
>     "type":"string",
>     "java-class":"java.math.BigDecimal"
>   }
> }
> {code}
> That's something of a disaster for us because we have a pretty high load of
> ~2 million RPS.
> So we started thinking about migrating to something lighter than strings (no
> blame for choosing it as the default; I know BigDecimal has a lot of pitfalls,
> and string is the easiest way to encode/decode it).
> A standard precision for all such fields was fine for us, so we found
> `Conversions.DecimalConversion` and eventually decided to use this logical
> type with a recommended schema like
> {code:java}
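>     @Override
>     public Schema getRecommendedSchema() {
>         // Encode BigDecimal as "bytes" with the "decimal" logical type.
>         Schema schema = Schema.create(Schema.Type.BYTES);
>         // MathContext.DECIMAL32 has a precision of 7 digits;
>         // DecimalUtils.MONEY_ROUNDING_SCALE is a project-specific constant.
>         LogicalTypes.Decimal decimalType =
>                 LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(),
>                         DecimalUtils.MONEY_ROUNDING_SCALE);
>         decimalType.addToSchema(schema);
>         return schema;
>     }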
> {code}
> (we use `org.apache.avro.reflect.ReflectData`)
> It all looks good and promising, but the question is: how do we migrate to
> such a schema?
> As I said, we have a lot of such fields, and migrating all of them by
> duplicating fields and removing the old ones later would be painful and
> would cost us considerable overhead.
> I ran some tests and found that if two applications register the same
> `BigDecimalConversion`, but for one application `getRecommendedSchema()`
> is like the method above and for the other application
> `getRecommendedSchema()` is
> {code:java}
>     @Override
>     public Schema getRecommendedSchema() {
>         Schema schema = Schema.create(Schema.Type.STRING);
>         schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
>         return schema;
>     }
> {code}
> then they can easily read each other's messages using the _SERVER_ schema.
> But when I made two applications and wired them up with `ProtocolRepository`,
> `ReflectResponder` and all that stuff, I found that it doesn't work, because
> `org.apache.avro.io.ResolvingDecoder` completely ignores logical types for
> some reason.
> So one application explicitly says "I encode this field as a byte array,
> which is supposed to be a logical type 'decimal' with precision N", but the
> other application just tries to convert those bytes to a string and make a
> BigDecimal from the resulting string. As a result, we got
> {code:java}
> java.lang.NumberFormatException: Character ' is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
> {code}
> In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect
> logical types in the _SERVER_ (_ACTUAL_) schema and use the corresponding
> conversion instance for reading values. In my example, I'd say the flow
> might be
> {code}
> ResolvingDecoder#readString() -> read the actual logical type ->
> find BigDecimalConversion instance ->
> conversion.fromBytes(readValueWithActualSchema()) ->
> conversion.toCharSequence(readValueWithConversion)
> {code}
> I'd love to read your opinion on all of that. 
> Thank you in advance for your time, and sorry for the long issue description. 
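
For the `BigDecimal` case from the issue description, the proposed `ResolvingDecoder` flow can be sketched without Avro itself. This is a minimal illustration, not the actual change: `DecimalToStringResolver` and its methods are hypothetical, and it assumes the writer schema is `bytes` with the `decimal` logical type (length-prefixed, unscaled value as a big-endian two's-complement integer, scale taken from the schema) and that the reader's string form is `BigDecimal.toString()`. In the real decoder, step 2 would presumably go through the registered `Conversion` instead.

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helper (not Avro API) mirroring the proposed flow:
// read with the writer's decimal schema, then produce the reader's string.
final class DecimalToStringResolver {
  // Reads one Avro zig-zag varint long (used here for the bytes length prefix).
  static long readLong(ByteBuffer in) {
    long raw = 0;
    int shift = 0;
    int b;
    do {
      b = in.get() & 0xFF;
      raw |= (long) (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
  }

  // Step 1: read the datum with the WRITER schema: "bytes" + decimal(scale).
  static BigDecimal readDecimalBytes(ByteBuffer in, int writerScale) {
    int len = (int) readLong(in);
    byte[] unscaled = new byte[len];
    in.get(unscaled);
    // Big-endian two's-complement unscaled value; the scale comes from the schema.
    return new BigDecimal(new BigInteger(unscaled), writerScale);
  }

  // Step 2: give the READER what its "string" + java-class BigDecimal schema expects.
  static CharSequence readStringViaDecimal(ByteBuffer in, int writerScale) {
    return readDecimalBytes(in, writerScale).toString();
  }
}
{code}

For example, the four bytes `06 12 D6 87` (length 3, then the unscaled value 1234567) written with scale 2 come out as the string `12345.67`, which the string-schema application can parse with `new BigDecimal(String)` instead of failing with the `NumberFormatException` above.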



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
