I don't think that is it.  I changed my mmap to MAP_PRIVATE in the first
raw mmap test and the dd changes are still visible.  I also changed to
storing the stream format instead of the file format and got the same
result.

Where is the code that constructs a buffer/array by pointing it into the
mmap space instead of by allocating space?  Sorry I'm so confused about
this, I just don't see how it is supposed to work.
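
To make the question concrete, here is a minimal sketch of what I mean by
"pointing into the mmap space". It uses only the Python standard library, not
Arrow, so it says nothing about what Arrow's reader actually does. The helper
names (write_int64s, overwrite_int64s, map_int64s) are made up for this
illustration. The typed view is built over the mapping itself, so no element
data is copied. I would expect an in-place overwrite of the file to show up
through the view on Linux, but that is platform behavior and an assumption here.

```python
import mmap
import os
import struct
import tempfile


def write_int64s(path, values):
    """Create (or truncate) a file holding the values as native 64-bit ints."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}q", *values))


def overwrite_int64s(path, values):
    """Overwrite the start of an existing file in place, without truncating."""
    with open(path, "r+b") as f:
        f.write(struct.pack(f"{len(values)}q", *values))


def map_int64s(path):
    """Map the file read-only and return (mapping, typed view).

    The view is a window onto the mapped pages: memoryview.cast() only
    reinterprets the bytes as int64, it does not allocate or copy them.
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm).cast("q")
    return mm, view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ints.bin")  # hypothetical file name
        write_int64s(path, [1, 2, 3])
        mm, view = map_int64s(path)
        print(list(view))
        # Same role as the dd command: change the bytes behind the mapping.
        overwrite_int64s(path, [7, 8, 9])
        # Whether the new values appear here depends on the platform's
        # mmap coherence; no output is asserted in this sketch.
        print(list(view))
        view.release()  # the mapping cannot be closed while a view exists
        mm.close()
```

If Arrow's batch were built this way over the mapped file, I would expect
batch.to_pandas() to change after the dd, which is not what I am seeing.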

On Wed, May 22, 2019 at 7:58 PM Wes McKinney <wesmck...@gmail.com> wrote:

> It seems this could be due to our use of MAP_PRIVATE for read-only memory
> maps
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L393
>
> Some more investigation would be required
>
> On Wed, May 22, 2019 at 7:43 PM John Muehlhausen <j...@jgm.org> wrote:
> >
> > Is there an example somewhere of referring to the RecordBatch data in a
> > memory-mapped IPC File in a zero-copy manner?
> >
> > I tried to do this in Python and must be doing something wrong.  (I don't
> > really care whether the example is Python or C++)
> >
> > In the attached test, when I get to the first prompt and hit return, I get
> > the same content again.  Likewise when I hit return on the second prompt
> > I get the same content again.
> >
> > However, if before hitting return on the first prompt I issue:
> >
> > dd conv=notrunc if=/dev/urandom of=/tmp/test.batch bs=478 count=1
> >
> >
> > i.e. overwrite the contents of the file, I get a garbled result.  (Replace
> > 478 with the size of your file.)
> >
> > However, if I wait until the second prompt to issue the dd command before
> > hitting return, I do not get an error.  Instead, batch.to_pandas() works
> > the same both before and after the data is overwritten.  This was not
> > expected, as I thought that the batch object was looking at the file
> > in-place, i.e. zero-copy?
> >
> > Am I tying together the memory-mapping and the batch construction in the
> > wrong way?
> >
> > Thanks,
> > John
>
