Hello! We recently updated Arrow to 7.0.0 and hit an error with our old code (details below). Is there a new way to do this with the current version?
```python
import pandas as pd
import pyarrow
import pyarrow.parquet as pq

df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
uri = "gs://amp_bucket_liao/try"
s3fs = ...  # filesystem object; construction elided

# Writing the partitioned dataset works fine.
pq.write_to_dataset(
    table=pyarrow.Table.from_pandas(df=df, preserve_index=True),
    root_path=uri,
    filesystem=s3fs,
    partition_cols=["aa"],
)

# Reading it back raises the error below.
test_df = pq.read_table(source=uri, filesystem=s3fs)
```

Error:

```
/home/tsdist/vats_deployments/modeling.env.interactive-bc9b04a0-708b-45b8-90bc-14b9ca6ee9ba/ext/public/python/pyarrow/7/0/x/dist/lib/python3.9/pyarrow/error.pxi in pyarrow.lib.check_status()
     97 
     98     if status.IsInvalid():
---> 99         raise ArrowInvalid(message)
    100     elif status.IsIOError():
    101         # Note: OSError constructor is

ArrowInvalid: GetFileInfo() yielded path 'amp_bucket_liao/try/aa=3/235add6629d44a2f8fa4ec772340b73d.parquet', which is outside base dir 'gs://amp_bucket_liao/try'
```
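Judging from the error message, the reader lists files through the filesystem object, which returns bucket-relative paths such as `amp_bucket_liao/try/aa=3/...`, and then compares them against the base dir exactly as it was passed, `gs://amp_bucket_liao/try`. The `gs://` prefix makes that comparison fail. A likely workaround is to pass a scheme-less path whenever an explicit `filesystem=` is given. The sketch below assumes the filesystem object (named `s3fs` in the post, presumably a GCS filesystem) expects `bucket/key` paths; `strip_scheme` and `read_partitioned` are hypothetical helper names, not pyarrow APIs.

```python
from urllib.parse import urlparse


def strip_scheme(uri: str) -> str:
    """Turn 'gs://bucket/key' into 'bucket/key'; scheme-less paths pass through.

    Hypothetical helper: assumes the filesystem object expects bucket-relative
    paths, as the paths in the ArrowInvalid message suggest.
    """
    parsed = urlparse(uri)
    if not parsed.scheme:
        return uri
    # netloc holds the bucket name; path holds the key prefix with a leading '/'.
    return parsed.netloc + parsed.path


def read_partitioned(uri, filesystem):
    """Read a dataset written by pq.write_to_dataset (hypothetical helper).

    Passes a scheme-less path so the base dir matches the paths the
    filesystem object lists.
    """
    import pyarrow.parquet as pq  # imported lazily; assumes pyarrow is installed

    return pq.read_table(source=strip_scheme(uri), filesystem=filesystem)
```

With the post's variables this would be `test_df = read_partitioned(uri, s3fs)`, which is equivalent to calling `pq.read_table(source="amp_bucket_liao/try", filesystem=s3fs)` directly. Keeping the URI with its scheme for writing and stripping it only at read time is a choice made for this sketch; using the scheme-less path in both places may be simpler.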