[ https://issues.apache.org/jira/browse/ARROW-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662179#comment-17662179 ]
Rok Mihevc commented on ARROW-5156:
-----------------------------------

This issue has been migrated to [issue #21635|https://github.com/apache/arrow/issues/21635] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] `df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'`
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5156
>                 URL: https://issues.apache.org/jira/browse/ARROW-5156
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: Mac, Linux
>            Reporter: Victor Shih
>            Priority: Major
>              Labels: parquet
>
> According to [https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files], writing a parquet to S3 with `partition_cols` should work, but it fails for me. Example script:
> {code:python}
> import pandas as pd
> import sys
> print(sys.version)
> print(pd.__version__)
> df = pd.DataFrame([{'a': 1, 'b': 2}])
> df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow')
> print('OK 1')
> df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
> print('OK 2')
> {code}
> Output:
> {noformat}
> 3.5.2 (default, Feb 14 2019, 01:46:27)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
> 0.24.2
> OK 1
> Traceback (most recent call last):
>   File "./t.py", line 14, in <module>
>     df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 118, in write
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1227, in write_to_dataset
>     _mkdir_if_not_exists(fs, root_path)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1182, in _mkdir_if_not_exists
>     if fs._isfilestore() and not fs.exists(path):
> AttributeError: 'NoneType' object has no attribute '_isfilestore'
> {noformat}
>
> Original issue - [https://github.com/apache/arrow/issues/4030]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
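
The last two frames of the traceback suggest that `write_to_dataset` reached `_mkdir_if_not_exists` with `fs` set to `None`, i.e. no filesystem object had been resolved for the `s3://` path. The sketch below reproduces only that failure mode without touching S3. The helper is a hypothetical stand-in that copies the single line quoted in the traceback; its `fs.mkdir(path)` body and the `FakeFS` class are assumptions for illustration, not pyarrow source.

```python
# Hypothetical stand-in for the helper named in the traceback.
# Only the `if` condition is taken from the report; the body is assumed.
def _mkdir_if_not_exists(fs, path):
    if fs._isfilestore() and not fs.exists(path):
        fs.mkdir(path)


class FakeFS:
    """Illustrative in-memory filesystem (not a pyarrow or s3fs class)."""

    def __init__(self):
        self.dirs = set()

    def _isfilestore(self):
        return True

    def exists(self, path):
        return path in self.dirs

    def mkdir(self, path):
        self.dirs.add(path)


def try_mkdir(fs, root_path="s3://my_s3_bucket/x2.parquet"):
    """Return the AttributeError message, or None if the call succeeds."""
    try:
        _mkdir_if_not_exists(fs, root_path)
    except AttributeError as exc:
        return str(exc)
    return None


# With no filesystem resolved, the call fails as in the report.
message_without_fs = try_mkdir(None)
print(message_without_fs)

# With any object exposing the three methods, the same call goes through.
fake = FakeFS()
message_with_fs = try_mkdir(fake)
print(message_with_fs, sorted(fake.dirs))
```

`pyarrow.parquet.write_to_dataset` does accept a `filesystem` argument, but whether passing one explicitly avoids the error on 0.12.1 is not confirmed by this report.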