Will Jones created ARROW-17069:
----------------------------------

             Summary: [Python][R] GCSFileSystem reports cannot resolve host on public buckets
                 Key: ARROW-17069
                 URL: https://issues.apache.org/jira/browse/ARROW-17069
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python, R
    Affects Versions: 8.0.0
            Reporter: Will Jones
            Assignee: Will Jones
             Fix For: 9.0.0

GCSFileSystem returns {{Couldn't resolve host name}} for a public bucket unless you supply {{anonymous}} as the user in the URI:

{code:python}
import pyarrow.dataset as ds

# Fails:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/taxi-data/?retry_limit_seconds=3")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
#     return _filesystem_dataset(source, **kwargs)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
#     fs, paths_or_selector = _ensure_single_source(source, filesystem)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 417, in _ensure_single_source
#     raise FileNotFoundError(path)
# FileNotFoundError: voltrondata-labs-datasets/taxi-data

# This works fine:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
{code}

I would expect the connection to succeed without specifying a user, since the bucket is public.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
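
For readers unfamiliar with the URI form used above, the following is a minimal sketch of how such a {{gs://}} URI breaks down into a user, a bucket, a path, and query options. It uses only the Python standard library; {{split_gcs_uri}} is a hypothetical helper written for illustration and is not how Arrow parses URIs internally. The URI without a user is an assumed example of the "no {{anonymous}}" case described in the report.

```python
from urllib.parse import urlsplit, parse_qs


def split_gcs_uri(uri):
    """Split a gs:// URI into user, bucket, path, and query options.

    Hypothetical helper for illustration only; not Arrow's own parser.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {
        # The text before "@" in the authority part, or None if absent.
        "user": parts.username,
        # The authority's host component carries the bucket name.
        "bucket": parts.hostname,
        # The object prefix inside the bucket, without surrounding slashes.
        "path": parts.path.strip("/"),
        # Query parameters such as retry_limit_seconds, kept as strings.
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


# URI form from the report, with "anonymous" as the user.
with_user = split_gcs_uri(
    "gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
)

# Assumed form of the same URI with the user omitted.
without_user = split_gcs_uri(
    "gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
)

print(with_user)
print(without_user)
```

The two results differ only in the {{user}} field, which is the part of the URI this report is about.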