Hello, I'm using the R package arrow 7.0.0 (arrow_7.0.0.tar.gz) with
R 4.1.1 on Linux (Ubuntu 18.04.4 LTS).

In R, I am mmap-ing many small Arrow files by calling arrow::read_feather()
with as_data_frame=FALSE on each one.  Compressed with lz4, each file
is quite small, often only 25 kB or so, but I'll often be mmap-ing
many thousands of them.  From the time this takes, I suspect that
Arrow is reading the full contents of each file rather than just
setting up the mmap, but I don't know how to properly check that.
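
For concreteness, here is a minimal sketch of the pattern described
above.  The temporary directory, file names, and table contents are
made up for illustration, and the wall-clock timing is only a crude
hint; it cannot by itself distinguish mmap setup from a full read:

```r
library(arrow)

# Hypothetical setup: write a few small Feather (Arrow IPC) files into a
# temporary directory so the sketch is self-contained.
dir <- tempfile("arrow_demo_")
dir.create(dir)
paths <- file.path(dir, sprintf("part_%03d.arrow", 1:5))

# Use lz4 when this Arrow build supports it; otherwise fall back to
# uncompressed so the sketch still runs.
codec <- if (codec_is_available("lz4")) "lz4" else "uncompressed"

for (p in paths) {
  df <- data.frame(x = runif(1000), y = sample(letters, 1000, replace = TRUE))
  write_feather(df, p, compression = codec)
}

# The pattern in question: open each file as an Arrow Table rather than
# converting it to an R data.frame.
elapsed <- system.time(
  tables <- lapply(paths, read_feather, as_data_frame = FALSE)
)[["elapsed"]]

# Report total and per-file wall-clock time for the open step.
cat(sprintf("%d files in %.3f s (%.4f s per file)\n",
            length(paths), elapsed, elapsed / length(paths)))
```

With many thousands of real files, the per-file figure printed at the
end is the number that prompted the suspicion above.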

I would like to make sure that at this stage, I JUST mmap each file,
and defer reading their data until later when I actually need it.  Are
there any settings or arguments I can use to make sure that happens?
Or ways to verify precisely what is happening?

I think I found the relevant C++ code in "r/src/io.cpp" and
"cpp/src/arrow/io/file.cc", but I definitely don't understand its
performance implications, nor how to control this sort of thing.

Thanks for your help and advice!

-- 
Andrew Piskorski <a...@piskorski.com>
