Hello Ryan,

For CentOS and pip, I would recommend using the docker scripts that we use to build the manylinux1-compatible wheels (the ones we also upload to PyPI): https://github.com/apache/arrow/tree/master/python/manylinux1

They will bootstrap an isolated environment in docker that is independent of your host system. The resulting wheels then work on all modern Linux systems with glibc (read: not on Alpine Linux). Hope this helps with debugging on master.
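One quick way to confirm that an environment is really running a master build rather than a released wheel (a minimal check; the exact version string depends on how the package was built) is:

    import pyarrow as pa
    # A source build from master typically reports a '.dev' suffix,
    # e.g. '0.12.0.devNNN', while a PyPI release reports e.g. '0.11.0'.
    print(pa.__version__)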
Otherwise, conda is the recommended way to build and test the Arrow source code, as outlined here: http://arrow.apache.org/docs/latest/python/development.html#development That's what all the main (py)arrow developers use.

Cheers
Uwe

On Wed, Jan 9, 2019, at 3:10 PM, White4, Ryan (STATCAN) wrote:
> Thanks Wes,
>
> I confirmed this is fixed in master. In the future, I'll check against
> master if we come across anything. We would be interested in using the
> nightly builds, for sure. We do not use conda as of now, so it may be
> best to become more familiar with it. I needed to go home and use my
> Mac because I could not get the build working properly on either
> CentOS or Fedora, possibly because I used pip. It failed when trying
> to run py.test.
>
> Also, thanks very much for the planning document posted in December.
> That has been an excellent resource.
>
> Best, Ryan
>
>
> -----Original Message-----
> From: Wes McKinney [mailto:wesmck...@gmail.com]
> Sent: Tuesday, January 8, 2019 3:16 PM
> To: dev@arrow.apache.org
> Subject: Re: RecordBatchFile with no batches, Error:
> pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size.
>
> I think I fixed this in master. Are you able to build from source to
> try it out?
>
> I am hopeful that sometime this year my team and I can provide a conda
> channel with nightly Arrow builds to help with testing and development.
>
> On Tue, Jan 8, 2019 at 1:49 PM White4, Ryan (STATCAN)
> <ryan.whi...@canada.ca> wrote:
> >
> > Hi,
> >
> > I get an error when writing a file with no record batches. I came
> > across this when implementing a simple way to spill the buffer to
> > disk automatically (this is potentially coming in release 0.12?).
> >
> > I'm using pyarrow 0.11.
> > Is there a JIRA related to this, or is there a problem in this simple
> > example below?
> >
> > import pyarrow as pa
> >
> > my_schema = pa.schema([('field0', pa.int32())])
> > sink = pa.BufferOutputStream()
> > writer = pa.RecordBatchFileWriter(sink, my_schema)
> > writer.close()
> > buf = sink.getvalue()
> >
> > reader = pa.open_file(buf)
> > print(reader.schema)
> > print(reader.num_record_batches)
> >
> > Traceback (most recent call last):
> >   reader = pa.open_file(buf)
> >   File "pyarrow/ipc.py", line 142, in open_file
> >     return RecordBatchFileReader(source, footer_offset=footer_offset)
> >   File "pyarrow/ipc.py", line 89, in __init__
> >     self._open(source, footer_offset=footer_offset)
> >   File "pyarrow/ipc.pxi", line 352
> >   File "pyarrow/error.pxi", line 81
> > pyarrow.lib.ArrowInvalid: File is smaller than indicated metadata size.
> >
> > Thanks,
> > Ryan
> >
> >
> > Ryan Mackenzie White, Ph. D.
> >
> > Senior Research Analyst - Administrative Data Division, Analytical
> > Studies, Methodology and Statistical Infrastructure Field
> > Statistics Canada / Government of Canada
> > ryan.whi...@canada.ca / Tel: 613-608-0015
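For anyone stuck on pyarrow 0.11 before the fix in master, a possible workaround is to always write at least one zero-row batch so the file footer contains a block to read back. This is a minimal sketch, not verified against 0.11 specifically:

    import pyarrow as pa

    schema = pa.schema([('field0', pa.int32())])
    sink = pa.BufferOutputStream()
    writer = pa.RecordBatchFileWriter(sink, schema)

    # Write a single zero-row batch matching the schema so the file is
    # non-empty; pa.open_file() should then succeed even with no data.
    empty_batch = pa.RecordBatch.from_arrays(
        [pa.array([], type=pa.int32())], schema.names)
    writer.write_batch(empty_batch)
    writer.close()

    buf = sink.getvalue()
    reader = pa.open_file(buf)
    print(reader.schema)              # field0: int32
    print(reader.num_record_batches)  # 1 (a single empty batch)

On master, per Wes's note above, the original example (writer.close() with no batches written) should itself produce a file that opens cleanly and reports num_record_batches == 0, making this workaround unnecessary.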