Thanks for answering Oscar,

Ok, just so I understand, you're basically saying that the Parsing Canonical 
Form doesn't currently include information on logical types and that's why it 
doesn't make sense to merge this now? Because the logical type info in the 
writer schema might have been stripped away and in that case the validation in 
this PR won't work? 

Or is it that we want to solve more cases of logical type evolution/validation 
than the one in this PR while we're at it, and for all of that to work the 
Parsing Canonical Form must be extended?

If it is the latter, I would argue for maybe separating this into two different 
issues - one for solving the (IMO) more pressing problem with following the 
spec for decimal types and another for looking at the entire scope of possible 
evolution between logical types.

//Katrin


On 2024-08-20, 09:47, "Oscar Westra van Holthe - Kind" <opw...@apache.org 
<mailto:opw...@apache.org>> wrote:


EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust 
the sender and know the content is safe.


Hi all,


With respect to schema evolution, the current rules allow only type changes
that have the exact same byte representation, and are guaranteed to fit.
Like int->long: thanks to the zigzag encoding, ints and smaller longs are
encoded to the same byte sequence.


When reading, type evolution is essentially pretending the underlying type
is different.


Evolution between the logical time types is therefore not possible, as the
bytes have a different meaning. Ditto for the scale of decimals: we store
the unscaled number, and use the scale to restore the decimal point. The
only possibility for additional evolution is increasing the precision of
decimals, as long as it fits the underlying type.


To make other forms of schema evolution possible, like changing the
parameters or the underlying type of a logical type, the parsing process
needs knowledge of both the read and write schemata. This goes beyond what
happens for evolving names, and is similar to what we do with default
values. However, this would mean that the Parsing Canonical Form is no
longer sufficient and needs to change: it does not contain any information
on (previous) logical types.


So in the current state of Avro, I'd vote not to merge this. We need to
decide on the future shape of the Parsing Canonical Form and schema
fingerprints first, to make these types of evolution possible.


Kind regards,
Oscar


-- 
Oscar Westra van Holthe - Kind <opw...@apache.org <mailto:opw...@apache.org>>


Op ma 19 aug. 2024 14:25 schreef Katrin Skoglund
<katrin.skogl...@avanza.se.inva <mailto:katrin.skogl...@avanza.se.inva>lid>:


> I'm more than happy to fix the merge conflicts in a new PR (unless the
> original PR author can pick it up and do it), but I want to check first if
> there is a chance that it can be merged at all. The last comment in the
> Jira says as follows:
>
> --- quote ---
> nandorKollar commented on issue #247: AVRO-2078: Avro does not enforce
> schema resolution rules for Decimal …
> URL: https://github.com/apache/avro/pull/247#issuecomment-398326638 
> <https://github.com/apache/avro/pull/247#issuecomment-398326638>
> Hey @big-andy-coates ! I didn't yet merge this PR, because the logical
> type schema resolution rules are not well defined, and there was some
> debate how we should handle it. The spec mentions an evolution rule for
> Decimal, but other types are not mentioned at all, therefore no evolution
> rule is defined for example Time -> Timestamp promotion. Is it allowed? How
> should Avro interpret the value? Right now if one uses generated classes,
> the methods will return wrong values, because the in memory representation
> is incorrect!
> The current implementation doesn't conform with the specification, because
> it omits changes in the decimal parameters, but I'm not sure if we should
> change the spec, or get this PR merged. I found a discussion on AVRO-1721
> (actually you're the last one who commented it), which is about similar
> topic, and looks like there was no consensus about the correct approach.
> Could you please open this topic for discussion on the dev list?
> --- end quote ---
>
> IMO there is no need to increase the scope like this. The questions
> regarding other logical types are not problematic in practice since any
> changes in those will break compilation in Java. Moreover, they are not
> mentioned in the spec. The problem with Decimal is that it will have the
> same representation in Java (BigDecimal), so everything will keep working,
> only the data will be wrong.
>
> //Katrin
>
>
>
> On 2024-08-19, 13:56, "Martin Grigorov" <mgrigo...@apache.org 
> <mailto:mgrigo...@apache.org> <mailto:
> mgrigo...@apache.org <mailto:mgrigo...@apache.org>>> wrote:
>
>
> EXTERNAL EMAIL: Do not click any links or open any attachments unless you
> trust the sender and know the content is safe.
>
>
> Hi,
>
>
> On Mon, Aug 19, 2024 at 2:51 PM Katrin Skoglund
> <katrin.skogl...@avanza.se.inva <mailto:katrin.skogl...@avanza.se.inva> 
> <mailto:katrin.skogl...@avanza.se.inva 
> <mailto:katrin.skogl...@avanza.se.inva>>lid>
> wrote:
>
>
> > Hey folks,
> >
> > I’m wondering about this old Jira and PR from 2017 (
> > https://issues.apache.org/jira/browse/AVRO-2078 
> > <https://issues.apache.org/jira/browse/AVRO-2078> <
> https://issues.apache.org/jira/browse/AVRO-2078> 
> <https://issues.apache.org/jira/browse/AVRO-2078&gt;>). It’s a bit scary to
> > work with decimal types in Avro from Java since changes in precision and
> > scale are not considered incompatible but will royally mess up the data
> if
> > introduced. We have implemented our own validation for this and I was
> > considering adding that as a PR when I found this one.
> >
> > The Avro spec says “For the purposes of schema resolution, two schemas
> > that are decimal logical types match if their scales and precisions
> match”,
> > so that’s pretty clear IMO.
> >
> > My question is basically:
> > Is there a chance of getting this validation merged? If not as-is, then
> > maybe modified in some way (like making logical types validation
> > configurable).
> >
>
>
> I am not very familiar with the Java SDK, so I cannot comment on the
> technical solution but the PR has 5 files with merge conflicts, so it
> cannot be merged "as-is" for sure.
> Someone will have to rebase it to main branch!
>
>
> Martin
>
>
>
>
>
>
> >
> > I’m happy to help with the PR or create a new one if needed.
> >
> > Best,
> > Katrin
> >
>
>
>
>



Reply via email to