Hello,

I'm using R package arrow_7.0.0.tar.gz, in R 4.1.1, on Linux (Ubuntu 18.04.4 LTS).

In R, I am mmap-ing many small Arrow files by calling arrow::read_feather() with as_data_frame=FALSE on each one. Compressed with lz4, each file is quite small, often only 25 kB or so, but I'll often be mmap-ing many thousands of them. From the time this takes, I suspect that Arrow is reading the full contents of each file rather than just setting up the mmap, but I don't know how to properly check that.

I would like to make sure that at this stage, I JUST mmap each file, and defer reading their data until later when I actually need it. Are there any settings or arguments I can use to make sure that happens? Or ways to verify precisely what is happening?

I think I found the relevant C++ code in "r/src/io.cpp" and "cpp/src/arrow/io/file.cc", but I definitely don't understand its performance implications, nor how to control this sort of thing.

Thanks for your help and advice!

-- 
Andrew Piskorski <a...@piskorski.com>
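
For concreteness, here is a minimal sketch of the access pattern described above. The temp directory, file names, and toy data are made up for illustration, and it assumes an arrow build with the lz4 codec available. It only reproduces the calls and times them; it does not show whether the file contents are actually read at that point.

```r
library(arrow)

# Hypothetical stand-in data: three small lz4-compressed Feather files
# written to a temporary directory (the real files and paths differ).
dir <- tempfile("arrow-demo-")
dir.create(dir)
paths <- file.path(dir, sprintf("part-%03d.arrow", 1:3))
for (p in paths) {
  df <- data.frame(x = runif(1000),
                   y = sample(letters, 1000, replace = TRUE))
  write_feather(df, p, compression = "lz4")
}

# The pattern in question: open each file as an Arrow Table
# (as_data_frame = FALSE) instead of converting it to a data.frame.
elapsed <- system.time(
  tables <- lapply(paths, function(p) read_feather(p, as_data_frame = FALSE))
)

# Wall-clock seconds spent opening all files, and the class of one result.
print(elapsed[["elapsed"]])
print(class(tables[[1]]))
```

With many thousands of files, the elapsed time from a loop like this is what makes me suspect more than an mmap setup is happening per file.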