Hello, I'm having a problem with the pip package avro-python3-1.9.0.
The package seems to have an issue with raw codec files that contain a block with no records (just a '0' block count), where the empty block is then followed by a sync marker. I've attached an example file, but I'm not sure whether it will come through; let me know if you'd like it. It was written by a process external to us. The "avro-tools" package reads these kinds of files fine. The problem files generate the traceback and assertion below.

Example code and traceback:

    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader

    with DataFileReader(open("28.avro", 'rb'), DatumReader()) as r:
        print(r.meta)
        for rec in r:
            print(rec)

    Traceback (most recent call last):
      File "./test.py", line 31, in <module>
        for rec in r:
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/datafile.py", line 526, in __next__
        datum = self.datum_reader.read(self.datum_decoder)
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 489, in read
        return self.read_data(self.writer_schema, self.reader_schema, decoder)
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 534, in read_data
        return self.read_record(writer_schema, reader_schema, decoder)
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 734, in read_record
        field_val = self.read_data(field.type, readers_field.type, decoder)
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 512, in read_data
        return decoder.read_utf8()
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 257, in read_utf8
        input_bytes = self.read_bytes()
      File "/home/dbeswick/.local/lib/python3.6/site-packages/avro/io.py", line 249, in read_bytes
        assert (nbytes >= 0), nbytes
    AssertionError: -11

I think the issue is in the __next__ function of DataFileReader, which seems to assume that a datum will always follow a block header read. The following implementation fixes the bug for me. Is it correct?

    def __next__(self):
        """Return the next datum in the file."""
        # Loop until a block with at least one datum is found or the file ends.
        while True:
            if self.block_count == 0:
                if self.is_EOF():
                    raise StopIteration
                elif self._skip_sync():
                    pass
                else:
                    self._read_block_header()
            else:
                datum = self.datum_reader.read(self.datum_decoder)
                self._block_count -= 1
                return datum

Please also note that two __next__ methods seem to have been put in this class by mistake.

Regards,
David
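
Note: the looping idea in the proposed __next__ can be tried out without an Avro file. The following is only a minimal sketch under stated assumptions: ToyBlockReader and its helper methods are hypothetical stand-ins, not part of avro, and blocks are modelled as plain Python lists, with an empty list playing the role of a block whose header reports a count of 0. Sync markers are not modelled at all. It shows that looping over block headers until a non-empty block turns up skips empty blocks cleanly, including an empty block at the end of the input.

```python
class ToyBlockReader:
    """Hypothetical stand-in for a block-structured reader (not avro's DataFileReader)."""

    def __init__(self, blocks):
        # Each block is a list of records; an empty list models a block
        # whose header reports a count of 0.
        self._blocks = list(blocks)
        self._next_block = 0   # index of the next block header to read
        self._current = []     # records of the block being consumed
        self._block_count = 0  # records still unread in the current block

    def _is_eof(self):
        return self._next_block >= len(self._blocks)

    def _read_block_header(self):
        self._current = list(self._blocks[self._next_block])
        self._next_block += 1
        self._block_count = len(self._current)

    def __iter__(self):
        return self

    def __next__(self):
        # Keep reading headers until a block with records turns up
        # or the input is exhausted.
        while self._block_count == 0:
            if self._is_eof():
                raise StopIteration
            self._read_block_header()
        datum = self._current[len(self._current) - self._block_count]
        self._block_count -= 1
        return datum


blocks = [["a", "b"], [], [], ["c"], []]
records = list(ToyBlockReader(blocks))
print(records)  # ['a', 'b', 'c']
```

A reader that reads one header and then immediately tries to read a datum would, in this toy model, index into an empty block; in the real file format the equivalent mistake would mean decoding bytes that do not belong to a record.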
28.avro