Hi Everyone,

I've encountered a memory-mapping error when attempting to read a Parquet
file into a Pandas DataFrame. It appears to be intermittent; so far I've
hit it only once. In my case the pq.read_table call is invoked inside a
Linux Docker container. I've had a look at the PyArrow memory and IO
management docs here:
https://arrow.apache.org/docs/python/memory.html

What could give rise to the stack trace below?

  File "read_file.py", line 173, in load_chunked_data
    return pq.read_table(data_obj_path, columns=columns).to_pandas()
  File "/opt/anaconda-python-5.0.1/lib/python2.7/site-packages/pyarrow/parquet.py", line 890, in read_table
    pf = ParquetFile(source, metadata=metadata)
  File "/opt/anaconda-python-5.0.1/lib/python2.7/site-packages/pyarrow/parquet.py", line 56, in __init__
    self.reader.open(source, metadata=metadata)
  File "pyarrow/_parquet.pyx", line 624, in pyarrow._parquet.ParquetReader.open (/arrow/python/build/temp.linux-x86_64-2.7/_parquet.cxx:11558)
    get_reader(source, &rd_handle)
  File "pyarrow/io.pxi", line 798, in pyarrow.lib.get_reader (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:58504)
    source = memory_map(source, mode='r')
  File "pyarrow/io.pxi", line 473, in pyarrow.lib.memory_map (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:54834)
    mmap._open(path, mode)
  File "pyarrow/io.pxi", line 452, in pyarrow.lib.MemoryMappedFile._open (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:54613)
    check_status(CMemoryMappedFile.Open(c_path, c_mode, &handle))
  File "pyarrow/error.pxi", line 79, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:8345)
    raise ArrowIOError(message)
ArrowIOError: Memory mapping file failed, errno: 22
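One documented cause of EINVAL (errno 22) from mmap(2) is a zero-length mapping, so I wondered whether the reader might occasionally be racing whatever process writes the file and seeing it while it is still empty or truncated. A minimal standard-library sketch of that scenario (the file name is purely illustrative; Python's own mmap module rejects the empty file up front with a ValueError, whereas PyArrow calls mmap directly and surfaces the raw errno):

```python
import mmap
import os
import tempfile

# Create a zero-byte file, simulating a reader that sees the file
# before the writer has flushed any data into it.
fd, path = tempfile.mkstemp(suffix=".parquet")
os.close(fd)  # leaves an empty file on disk

try:
    with open(path, "rb") as f:
        # Mapping a zero-length file is invalid; mmap(2) would return
        # EINVAL (errno 22) here, and Python raises ValueError instead.
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
except ValueError as exc:
    print(exc)
finally:
    os.remove(path)
```

If that is the cause, it would also explain the intermittency: the error would only occur when the read happens to land inside the write window.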



Thanks for the help.

Kind Regards
Simba
