[ 
https://issues.apache.org/jira/browse/AVRO-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043113#comment-17043113
 ] 

Michael A. Smith commented on AVRO-2748:
----------------------------------------

I think you're suggesting we could match schemas once and for all when we
instantiate the DatumReader. Is that right? I'm not sure how to reconcile
that with cases where the schemas may or may not match, depending on the
datum. Here's an example:

{code:python}
#!/usr/bin/env python3

import io, avro.io, avro.schema

##
# Define the schema and the DatumReader and Writer.
wsc=avro.schema.parse('["int", "string"]')
rsc=avro.schema.parse('"int"')
w=avro.io.DatumWriter(wsc)
r=avro.io.DatumReader(wsc, rsc)

##
# Define the encoder and write the integer 12 to it.
enc=avro.io.BinaryEncoder(io.BytesIO())
w.write(12, enc)

##
# Define the decoder and read the integer 12 from it.
dec=avro.io.BinaryDecoder(io.BytesIO(enc.writer.getvalue()))
print(r.read(dec))

##
# Clear the write buffer and then write the string "hello" to it.
enc.writer.truncate(0)
enc.writer.seek(0)
w.write("hello", enc)

##
# Redefine the decoder and read the string from it.
# This fails because the reader schema expects an integer. 
dec=avro.io.BinaryDecoder(io.BytesIO(enc.writer.getvalue()))
print(r.read(dec))
{code}

The second read fails with a {{SchemaResolutionException}} saying the schemas
don't match, while the first read succeeds. The writer and reader schemas are
the same in both cases; only the datum differs.
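
For a union like this, resolving up front can at best record which writer
branches match the reader schema; the failure still has to happen at read
time. Here is a minimal sketch of that idea. It does not use or reflect the
avro.io implementation: {{schemas_match}}, {{plan_union}} and
{{read_with_plan}} are hypothetical names, and the matching rule is reduced
to primitive type names only.

```python
import json

# Hypothetical, simplified stand-ins; none of this is part of the avro package.
# A schema here is plain JSON: a primitive type name, or a list (union) of them.


def schemas_match(writer, reader):
    """Simplified rule: same primitive name, or int promoted to long."""
    return writer == reader or (writer, reader) == ("int", "long")


def plan_union(writer_union, reader):
    """Resolve once, up front: note for each writer branch whether it matches."""
    return [schemas_match(branch, reader) for branch in writer_union]


def read_with_plan(plan, branch_index, value):
    """Per-datum step: just a table lookup, but it can still fail."""
    if not plan[branch_index]:
        raise ValueError(
            "writer branch %d does not match the reader schema" % branch_index
        )
    return value


plan = plan_union(json.loads('["int", "string"]'), json.loads('"int"'))
print(plan)                         # [True, False]
print(read_with_plan(plan, 0, 12))  # 12
try:
    read_with_plan(plan, 1, "hello")
except ValueError as exc:
    print("failed:", exc)
```

In this sketch the matching work is done once per reader, but whether a given
read succeeds is still only known once the datum's union branch is decoded.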

Am I misunderstanding what you're describing?

> python schema resolution occurs on every read
> ---------------------------------------------
>
>                 Key: AVRO-2748
>                 URL: https://issues.apache.org/jira/browse/AVRO-2748
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.9.2
>            Reporter: Erik Erlandson
>            Priority: Minor
>
> In Python, schema resolution appears to happen on each read operation.
> I'm not an Avro expert, but in perusing the Python io code I haven't yet
> noticed a reason that schema resolution couldn't happen once up front,
> during the construction of DataFileReader, when it first loads the
> write_schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
