Hi Li Jin,

I'm not sure yet what changed, but I believe you can fix that error simply
by omitting the scheme prefix from the URI and using just the path when
loading the dataset. Here's my repro:

import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow.fs import S3FileSystem

s3fs = S3FileSystem(
    endpoint_override="https://storage.googleapis.com",
    anonymous=True
)

uri = "gs://voltrondata-labs-datasets/nyc-taxi"

# This works
ds.dataset(uri[5:], filesystem=s3fs)

# With prefix causes error
ds.dataset(uri, filesystem=s3fs)
# ArrowInvalid: Expected an S3 object path of the form 'bucket/key...', got
# a URI: 'gs://voltrondata-labs-datasets/nyc-taxi'
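
If you'd rather not hard-code the slice, here's a minimal sketch of a helper
that drops the scheme and keeps the 'bucket/key...' part. Note that
strip_scheme is a name I made up for illustration, not a pyarrow function,
and I've only checked it against simple URIs like the ones in this thread:

```python
from urllib.parse import urlsplit


def strip_scheme(uri: str) -> str:
    """Return 'bucket/key...' for a URI such as 'gs://bucket/key...'.

    Hypothetical helper, not part of pyarrow. Strings without a scheme
    are returned unchanged.
    """
    parts = urlsplit(uri)
    if not parts.scheme:
        # Already a plain 'bucket/key...' path.
        return uri
    # netloc holds the bucket; path holds '/key...'.
    return parts.netloc + parts.path


path = strip_scheme("gs://voltrondata-labs-datasets/nyc-taxi")
# Then pass `path` (instead of the full URI) together with the filesystem:
# ds.dataset(path, filesystem=s3fs)
```

The same path should be usable as root_path / source in your parquet calls,
though I haven't run those against your bucket.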

Best,

Will Jones

On Mon, Aug 1, 2022 at 9:00 AM Li Jin <ice.xell...@gmail.com> wrote:

> Hello!
>
> We recently updated Arrow to 7.0.0 and hit some error with our old code
> (Details below). I wonder if there is a new way to do this with the current
> version?
>
> import pyarrow
> import pyarrow.parquet as pq
>
> df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
> uri = "gs://amp_bucket_liao/try"
> s3fs = # ...
>
> pq.write_to_dataset(
>     table=pyarrow.Table.from_pandas(df=df, preserve_index=True),
>     root_path=uri, filesystem=s3fs, partition_cols=["aa"]
> )
> # so far it works fine.
>
> # The following gives an error, error message in the thread
> test_df = pq.read_table(
>     source=uri, filesystem=s3fs
> )
>
>
>
> Error:
>
>
> /home/tsdist/vats_deployments/modeling.env.interactive-bc9b04a0-708b-45b8-90bc-14b9ca6ee9ba/ext/public/python/pyarrow/7/0/x/dist/lib/python3.9/pyarrow/error.pxi
> in pyarrow.lib.check_status()
>      97
>      98         if status.IsInvalid():
> ---> 99             raise ArrowInvalid(message)
>     100         elif status.IsIOError():
>     101             # Note: OSError constructor is
>
> ArrowInvalid: GetFileInfo() yielded path
> 'amp_bucket_liao/try/aa=3/235add6629d44a2f8fa4ec772340b73d.parquet',
> which is outside base dir 'gs://amp_bucket_liao/try'
>
