[ 
https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Jones updated ARROW-17069:
-------------------------------
    Description: 
GCSFileSystem will return {{Couldn't resolve host name}} if you don't supply {{anonymous}} as the user:
{code:python}
import pyarrow.dataset as ds

# Fails:
dataset = ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
#     return _filesystem_dataset(source, **kwargs)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
#     fs, paths_or_selector = _ensure_single_source(source, filesystem)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 408, in _ensure_single_source
#     file_info = filesystem.get_file_info(path)
#   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
#     info = GetResultValue(self.fs.GetFileInfo(path))
#   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
#     return check_status(status)
#   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
#     raise IOError(message)
# OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)

# This works fine:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
{code}

I would expect the connection to succeed without supplying a user, since the bucket is public.
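Until the default behaviour changes, the workaround from the "This works fine" example is to put {{anonymous}} in the user-info part of the {{gs://}} URI. As a sketch only, the hypothetical helper {{with_anonymous_user}} below (not part of pyarrow) applies that rewrite with the standard library's {{urllib.parse}}. It assumes the bucket is publicly readable and does not change how pyarrow itself resolves credentials.

{code:python}
from urllib.parse import urlsplit, urlunsplit


def with_anonymous_user(uri: str) -> str:
    """Return a gs:// URI whose user-info is "anonymous".

    Hypothetical helper for illustration; it is not a pyarrow API.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    # netloc is either "bucket" or "user@bucket": drop any existing
    # user-info and replace it with "anonymous", keeping path and query.
    bucket = parts.netloc.rsplit("@", 1)[-1]
    return urlunsplit(parts._replace(netloc=f"anonymous@{bucket}"))


if __name__ == "__main__":
    uri = "gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
    print(with_anonymous_user(uri))
    # The rewritten URI can then be passed to ds.dataset(...) after
    # `import pyarrow.dataset as ds` (requires network access).
{code}

The helper is idempotent: a URI that already carries {{anonymous@}} comes back unchanged, so it can be applied unconditionally to public-bucket URIs.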

  was:
GCSFileSystem will return {{Couldn't resolve host name}} if you don't supply 
{{anonymous}} as the user:
{code:python}
import pyarrow.dataset as ds

# Fails:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/taxi-data/?retry_limit_seconds=3")
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
#     return _filesystem_dataset(source, **kwargs)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
#     fs, paths_or_selector = _ensure_single_source(source, filesystem)
#   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 417, in _ensure_single_source
#     raise FileNotFoundError(path)
# FileNotFoundError: voltrondata-labs-datasets/taxi-data

# This works fine:
dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
{code}

I would expect that we could connect.


> [Python][R] GCSFileSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)