Will Jones created ARROW-17069:
----------------------------------

             Summary: [Python][R] GCSFileSystem reports cannot resolve host on public buckets
                 Key: ARROW-17069
                 URL: https://issues.apache.org/jira/browse/ARROW-17069
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python, R
    Affects Versions: 8.0.0
            Reporter: Will Jones
            Assignee: Will Jones
             Fix For: 9.0.0


When accessing a public bucket, GCSFileSystem returns a {{Couldn't resolve host name}} error unless you supply {{anonymous}} as the user in the URI:
{code:python}
import pyarrow.dataset as ds

# Fails:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/taxi-data/?retry_limit_seconds=3")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
#     return _filesystem_dataset(source, **kwargs)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
#     fs, paths_or_selector = _ensure_single_source(source, filesystem)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 417, in _ensure_single_source
#     raise FileNotFoundError(path)
# FileNotFoundError: voltrondata-labs-datasets/taxi-data

# This works fine:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
{code}

I would expect to be able to connect to a public bucket without having to specify {{anonymous}} as the user.
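Until this is fixed, the workaround described above is to put {{anonymous}} in the user position of the URI. The snippet below is a minimal sketch of that rewrite. {{with_anonymous_user}} is a hypothetical helper, not part of pyarrow; it only manipulates the URI string using the standard library and does not contact GCS.

{code:python}
from urllib.parse import urlsplit, urlunsplit


def with_anonymous_user(uri: str) -> str:
    """Return a gs:// URI with "anonymous" as the user component.

    Hypothetical helper (not a pyarrow API): it only rewrites the string.
    URIs that already carry a user are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    if "@" in parts.netloc:
        # A user (e.g. "anonymous@") is already present; leave the URI alone.
        return uri
    # For gs:// URIs the netloc is the bucket name; prefix the anonymous user.
    return urlunsplit(parts._replace(netloc="anonymous@" + parts.netloc))
{code}

The rewritten URI can then be passed to {{ds.dataset()}} as in the working example above. Alternatively, {{pyarrow.fs.GcsFileSystem(anonymous=True)}} can be passed through the {{filesystem}} argument of {{ds.dataset()}}, in which case the source path is given without the {{gs://}} scheme.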



--
This message was sent by Atlassian Jira
(v8.20.10#820010)