It seems this could be due to our use of MAP_PRIVATE for read-only memory maps:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L393

Some more investigation would be required.
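
As a way to start that investigation outside of Arrow, here is a minimal sketch using only Python's standard mmap module. It maps a scratch file read-only with a given flag, overwrites the file through a separate handle, and returns the mapped bytes before and after. The helper name read_after_overwrite and the 4096-byte payload are made up for illustration, and this is not what file.cc does internally. Note that POSIX leaves it unspecified whether a MAP_PRIVATE mapping sees later changes to the file, so the MAP_PRIVATE result may differ between platforms.

```python
import mmap
import os
import tempfile


def read_after_overwrite(flags):
    """Map a scratch file read-only with `flags`, overwrite the file on disk,
    and return the mapped bytes as seen (before, after) the overwrite.

    Hypothetical helper for illustration; Unix only.
    """
    old, new = b"A" * 4096, b"B" * 4096
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, old)
        os.fsync(fd)
        with mmap.mmap(fd, len(old), flags=flags, prot=mmap.PROT_READ) as m:
            before = m[:]
            # Overwrite in place through a second handle, like dd conv=notrunc.
            with open(path, "r+b") as f:
                f.write(new)
            after = m[:]
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    for name in ("MAP_SHARED", "MAP_PRIVATE"):
        before, after = read_after_overwrite(getattr(mmap, name))
        print(name, "sees overwrite:", after != before)
```

If MAP_PRIVATE and MAP_SHARED give the same answer here, the flag alone probably does not explain the behavior described below.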

On Wed, May 22, 2019 at 7:43 PM John Muehlhausen <j...@jgm.org> wrote:
>
> Is there an example somewhere of referring to the RecordBatch data in a 
> memory-mapped IPC File in a zero-copy manner?
>
> I tried to do this in Python and must be doing something wrong.  (I don't 
> really care whether the example is Python or C++)
>
> In the attached test, when I get to the first prompt and hit return, I get 
> the same content again.  Likewise when I hit return on the second prompt I 
> get the same content again.
>
> However, if before hitting return on the first prompt I issue:
>
> dd conv=notrunc if=/dev/urandom of=/tmp/test.batch bs=478 count=1
>
>
> i.e. overwrite the contents of the file, I get a garbled result.  (Replace 
> 478 with the size of your file.)
>
> However, if I wait until the second prompt to issue the dd command before 
> hitting return, I do not get an error.  Instead, batch.to_pandas() works the 
> same both before and after the data is overwritten.  This was not expected as 
> I thought that the batch object was looking at the file in-place, i.e. 
> zero-copy?
>
> Am I tying together the memory-mapping and the batch construction in the 
> wrong way?
>
> Thanks,
> John
