Hi,
Which use of mmap in the code base are you referring to? mmap in general has 
a lot of different uses. The point of the paper you linked is that 
database management systems should explicitly manage their paging to and from 
disk to maintain transactional consistency or to avoid performance penalties when 
the working set doesn’t fit in memory. Arrow doesn’t care about the former. As 
for the latter, something like IPC might make good use of mmap. It might not 
even be writing to a real file on disk but to a stream or even to another 
process’s address space. In that scenario mmap definitely does make sense. 
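
To make the "not a real file on disk" case concrete, here is a minimal sketch 
in Python. It assumes only the standard-library mmap and struct modules, not 
Arrow's own memory-mapping code, and the helper names write_values and 
read_values are hypothetical. It uses an anonymous mapping, so nothing is 
paged to or from a file; a real IPC setup would additionally share the mapping 
with another process.

```python
import mmap
import struct


def write_values(buf, values):
    """Pack int64 values into the mapping: a count header, then the payload."""
    struct.pack_into("<q", buf, 0, len(values))
    struct.pack_into(f"<{len(values)}q", buf, 8, *values)


def read_values(buf):
    """Read the count header, then unpack that many int64 values."""
    (count,) = struct.unpack_from("<q", buf, 0)
    return list(struct.unpack_from(f"<{count}q", buf, 8))


# Anonymous mapping: a fileno of -1 means there is no backing file on disk.
region = mmap.mmap(-1, 4096)
write_values(region, [1, 2, 3])
result = read_values(region)
region.close()
```

The reader works directly on the mapped bytes rather than going through 
read() calls, which is the property that makes mmap attractive for this kind 
of exchange.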

That’s not to say this isn’t something worth discussing, but I feel the paper’s 
results are much more nuanced than “we should remove mmap because mmap is bad”. 
It would help to have some specific instances to look at to see if it makes 
sense to switch to something else. 

Sasha Krassovsky

> On May 5, 2022, at 23:03, Alvin Chunga Mamani <al...@voltrondata.com> 
> wrote:
> 
> Hi all,
> I'm starting this discussion about a change to disable the use of mmap
> by default, since mmap represents a risk on non-local/pseudo file systems
> and can affect performance.
> Part of the solution would be a compile-time flag that lets you enable
> or disable the use of mmap in Arrow C++/pyarrow.
> An analysis of the use of mmap in database management systems is
> presented in [1].
> 
> 
> Thanks.
> 
> [1] https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf
