Hello,

I'm seeing a problem with the pip package avro-python3 1.9.0.

The package seems to have an issue with raw codec files containing no records 
(just a '0' block count), in which the empty block is then followed by a sync 
marker. I've attached an example file, but I'm not sure it will come through; 
let me know if you'd like it. It was written by a process external to us.
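
In case the attachment does not come through, the sketch below builds a file with the same shape using only the standard library, following the Avro object container file layout (magic bytes, metadata map, sync marker, then blocks). The helper names, the one-field record schema, and the choice of the "null" codec (assuming that is what "raw" refers to) are illustrative assumptions. I have not confirmed that this output triggers the same assertion.

```python
import io
import json
import os


def encode_long(n):
    """Encode an int as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(b):
    """Encode a byte string as Avro bytes: length as a long, then the data."""
    return encode_long(len(b)) + b


def empty_block_file(schema, sync):
    """Return a container file whose only block has count 0 and size 0."""
    meta = {
        "avro.schema": json.dumps(schema).encode("utf-8"),
        "avro.codec": b"null",  # assumption: uncompressed codec
    }
    out = io.BytesIO()
    out.write(b"Obj\x01")              # magic
    out.write(encode_long(len(meta)))  # metadata map: one map block
    for key, value in meta.items():
        out.write(encode_bytes(key.encode("utf-8")))
        out.write(encode_bytes(value))
    out.write(encode_long(0))          # end of metadata map
    out.write(sync)                    # header sync marker
    out.write(encode_long(0))          # block: object count = 0
    out.write(encode_long(0))          # block: serialized size = 0
    out.write(sync)                    # block sync marker
    return out.getvalue()


# Hypothetical schema, for illustration only.
SCHEMA = {"type": "record", "name": "Example",
          "fields": [{"name": "text", "type": "string"}]}
SYNC = os.urandom(16)
DATA = empty_block_file(SCHEMA, SYNC)
```

Writing DATA to disk with open("empty_block.avro", "wb") should give a file that can be passed to the reader code shown further down.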

The "avro-tools" package reads these kinds of files fine.

The problem files generate this traceback and assertion. Example code and 
traceback:


from avro.datafile import DataFileReader
from avro.io import DatumReader

with DataFileReader(open("28.avro", 'rb'), DatumReader()) as r:
    print(r.meta)
    for rec in r:
        print(rec)


Traceback (most recent call last):
  File "./test.py", line 31, in <module>
    for rec in r:
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/datafile.py", line 526, in __next__
    datum = self.datum_reader.read(self.datum_decoder)
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 489, in read
    return self.read_data(self.writer_schema, self.reader_schema, decoder)
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 534, in read_data
    return self.read_record(writer_schema, reader_schema, decoder)
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 734, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 512, in read_data
    return decoder.read_utf8()
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 257, in read_utf8
    input_bytes = self.read_bytes()
  File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 249, in read_bytes
    assert (nbytes >= 0), nbytes
AssertionError: -11


I think the issue is in the __next__ function of DataFileReader, which seems to 
assume that a datum will always follow a block header read. The following 
implementation fixes the bug for me. Is it correct?

  def __next__(self):
    """Return the next datum in the file."""
    while True:
        if self.block_count == 0:
            if self.is_EOF():
                raise StopIteration
            elif self._skip_sync():
                pass
            else:
                self._read_block_header()
        else:
            datum = self.datum_reader.read(self.datum_decoder)
            self._block_count -= 1
            return datum
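
To make the intended control flow easier to check, here is a toy model of the same loop that does not use avro at all. Blocks are represented as plain Python lists, and the name iter_records is hypothetical. It is only a sketch of the idea that the reader keeps advancing to the next block until it finds one with records or runs out of input, not a replacement for the library code.

```python
def iter_records(blocks):
    """Yield records from a sequence of blocks, skipping empty blocks."""
    blocks = iter(blocks)
    current = iter(())
    remaining = 0  # plays the role of the reader's block count
    while True:
        if remaining == 0:
            # Current block is exhausted: move to the next one, or stop at EOF.
            try:
                block = next(blocks)
            except StopIteration:
                return
            remaining = len(block)
            current = iter(block)
        else:
            # Only read a datum when the block really has one left.
            remaining -= 1
            yield next(current)


RESULT = list(iter_records([[], ["a", "b"], [], ["c"], []]))
```

An empty block simply sends the loop around again instead of falling through to a datum read, which is the behaviour the assertion above suggests is missing.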


Please also note that it seems two __next__ methods have been mistakenly put in 
this class.

Regards,
David


Attachment: 28.avro