Hi,

Which use of mmap are you referring to in the code base? mmap in general has many different uses. The point of the paper you linked is that database management systems should explicitly manage their paging to and from disk, either to maintain transactional consistency or to avoid performance penalties when the working set doesn’t fit in memory. Arrow doesn’t care about the former. As for the latter, something like IPC might make good use of mmap: it might not even be writing to a real file on disk, but to a stream or even to another process’s address space. In that scenario mmap definitely does make sense.
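To make the distinction concrete, here is a minimal sketch using only Python's standard-library `mmap` module. It is not Arrow's API, and the function names are hypothetical. It contrasts letting the OS page data in lazily through a mapping with the application explicitly reading a byte range itself, which is the kind of explicit control the paper argues a DBMS wants.

```python
import mmap
import os
import tempfile


def write_payload(path: str, payload: bytes) -> None:
    # Hypothetical helper: write an IPC-style buffer to a regular file.
    with open(path, "wb") as f:
        f.write(payload)


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    # Map the whole file read-only; the OS decides when pages are loaded
    # or evicted, so the application gives up control over paging.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies only the requested range into a bytes object.
            return mm[offset:offset + length]


def read_via_file(path: str, offset: int, length: int) -> bytes:
    # Explicit I/O: the application decides exactly what is read and when.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "buffer.bin")
        write_payload(path, b"header" + bytes(range(10)))
        # Both approaches return the same bytes; they differ in who manages paging.
        print(read_via_mmap(path, 6, 4) == read_via_file(path, 6, 4))
```

Both paths return identical data; the trade-off is only in who manages paging and buffering, which is why the answer depends on the specific use in the code base.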
That’s not to say this isn’t something worth discussing, but I feel the paper’s results are much more nuanced than “we should remove mmap because mmap is bad”. It would help to have some specific instances to look at, to see whether it makes sense to switch to something else.

Sasha Krassovsky

> On May 5, 2022, at 23:03, Alvin Chunga Mamani <al...@voltrondata.com> wrote:
>
> Hi all,
> I am starting this discussion to comment on the change to disable the use of mmap
> by default, which represents a risk on non-local/pseudo file systems that
> can affect performance.
> Part of the solution would be to have a compile-time flag that
> allows you to enable or disable the use of mmap in Arrow C++/pyarrow.
> In [1], an analysis of the use of mmap in database management systems is
> presented.
>
> Thanks.
>
> [1] https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf