Hello!

We recently updated Arrow to 7.0.0 and hit an error with our old code
(details below). Is there a new way to do this with the current
version?

import pandas as pd
import pyarrow
import pyarrow.parquet as pq

df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
uri = "gs://amp_bucket_liao/try"
s3fs = # ...

pq.write_to_dataset(
    table=pyarrow.Table.from_pandas(df=df, preserve_index=True),
    root_path=uri, filesystem=s3fs, partition_cols=["aa"]
)
# So far it works fine.

# The following raises an error (error message below).
test_df = pq.read_table(
    source=uri, filesystem=s3fs
)



Error:

/home/tsdist/vats_deployments/modeling.env.interactive-bc9b04a0-708b-45b8-90bc-14b9ca6ee9ba/ext/public/python/pyarrow/7/0/x/dist/lib/python3.9/pyarrow/error.pxi in pyarrow.lib.check_status()
     97
     98         if status.IsInvalid():
---> 99             raise ArrowInvalid(message)
    100         elif status.IsIOError():
    101             # Note: OSError constructor is

ArrowInvalid: GetFileInfo() yielded path
'amp_bucket_liao/try/aa=3/235add6629d44a2f8fa4ec772340b73d.parquet',
which is outside base dir 'gs://amp_bucket_liao/try'
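
Reading the message, the path yielded by GetFileInfo() has no `gs://` prefix while the base dir still carries it, so a plain prefix comparison between the two would fail. Below is a minimal sketch of that mismatch. It assumes (this is not confirmed) that passing a scheme-less path alongside `filesystem` would avoid it; `strip_scheme` and `bare_path` are hypothetical names introduced only for illustration.

```python
def strip_scheme(uri: str) -> str:
    """Drop a leading '<scheme>://' prefix, if present (hypothetical helper)."""
    _scheme, sep, rest = uri.partition("://")
    return rest if sep else uri


uri = "gs://amp_bucket_liao/try"
bare_path = strip_scheme(uri)

# Path exactly as it appears in the error message above.
listed = "amp_bucket_liao/try/aa=3/235add6629d44a2f8fa4ec772340b73d.parquet"

print(listed.startswith(uri))        # False: base dir still has the scheme
print(listed.startswith(bare_path))  # True: both sides are scheme-less

# Untested assumption: the read might then be attempted as
# test_df = pq.read_table(source=bare_path, filesystem=s3fs)
```

This only illustrates the string mismatch reported in the error; whether the scheme-less path is the intended usage in 7.0.0 is exactly the open question.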
