Hi, I get an error when reading back a file that was written with no record batches. I came across this while implementing a simple way to spill the buffer to disk automatically (is this potentially coming in release 0.12?).
I'm using pyarrow 0.11. Is there a JIRA related to this, or is there a problem in the simple example below?

    my_schema = pa.schema([('field0', pa.int32())])
    sink = pa.BufferOutputStream()
    writer = pa.RecordBatchFileWriter(sink, my_schema)
    writer.close()
    buf = sink.getvalue()
    reader = pa.open_file(buf)
    print(reader.schema)
    print(reader.num_record_batches)

Traceback (most recent call last):
  ...
    reader = pa.open_file(buf)
  File "pyarrow/ipc.py", line 142, in open_file
    return RecordBatchFileReader(source, footer_offset=footer_offset)
  File "pyarrow/ipc.py", line 89, in __init__
    self._open(source, footer_offset=footer_offset)
  File "pyarrow/ipc.pxi", line 352
  File "pyarrow/error.pxi", line 81
pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size

Thanks,
Ryan

Ryan Mackenzie White, Ph.D.
Senior Research Analyst - Administrative Data Division, Analytical Studies, Methodology and Statistical Infrastructure Field
Statistics Canada / Government of Canada
ryan.whi...@canada.ca / Tel: 613-608-0015