Re: Pandas Block Manager

Nicholas White Wed, 11 Nov 2020 14:52:45 -0800

Thanks all, this has been interesting. I've made a patch that sort of does
what I want [1]; I hope the test case is clear! I made the batch writer use
the `alignment` field that was already in `IpcWriteOptions` to align
the buffers, instead of fixing their alignment at 8 bytes. Arrow then writes
out the buffers consecutively, so you can map them as a 2D memory array like I
wanted. There's one problem though: the test case thinks the Arrow data is
invalid because it can't read the metadata properly (error below). Do you have
any idea why? I think it's because Arrow puts the metadata at the end of
the file after the now-unaligned buffers, yet assumes the metadata is still
8-byte aligned (which it probably no longer is).
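
To make the suspicion above concrete, here is a minimal sketch of the padding arithmetic. It is plain Python, not Arrow's writer code; the helper names `align_up` and `layout_buffers` and the 12-byte buffer sizes are made up for illustration. It only shows that with an alignment smaller than 8, whatever is written after the buffers can start at an offset that is not a multiple of 8.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (offset + alignment - 1) // alignment * alignment


def layout_buffers(sizes, alignment):
    """Return (start offsets, end offset) for buffers written back to back,
    with each start (and the final end) padded up to `alignment`.

    Hypothetical helper: it mimics the idea of padded consecutive buffers,
    not the actual Arrow IPC writer.
    """
    offsets = []
    position = 0
    for size in sizes:
        position = align_up(position, alignment)
        offsets.append(position)
        position += size
    return offsets, align_up(position, alignment)


# Three made-up 12-byte buffers (e.g. three float32 columns of 3 rows each).
sizes = [12, 12, 12]

offsets_8, end_8 = layout_buffers(sizes, alignment=8)
offsets_4, end_4 = layout_buffers(sizes, alignment=4)

print(offsets_8, end_8)  # [0, 16, 32] 48
print(offsets_4, end_4)  # [0, 12, 24] 36
print(end_4 % 8 == 0)    # False
```

With an alignment of 4 the buffers are contiguous (stride 12, so they can be viewed as a 3 x 3 float32 block), but the data ends at offset 36, which is not 8-byte aligned. If the reader assumes 8-byte alignment for what follows, that would be consistent with the error below, though this is only a guess.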


Nick

````
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/ipc.pxi:494: in pyarrow.lib.RecordBatchReader.read_all
    check_status(self.reader.get().ReadAll(&table))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise ArrowInvalid(message)
E   pyarrow.lib.ArrowInvalid: Expected to read 117703432 metadata bytes, but only read 19
````

[1] https://github.com/apache/arrow/pull/8644
