[ https://issues.apache.org/jira/browse/ARROW-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rok Mihevc updated ARROW-4943:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21450

> pyarrow.lib.HadoopFileSystem._connect failed due to TypeError
> -------------------------------------------------------------
>
>                 Key: ARROW-4943
>                 URL: https://issues.apache.org/jira/browse/ARROW-4943
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: Kernel: 4.4.95.x86_64
> Python: 2.7.5
>            Reporter: vanderliang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  
> Running the pytorch_hello_world.py script from 
> [https://github.com/uber/petastorm.git] fails with the TypeError shown in 
> the traceback below.
> It seems that pyarrow.lib.HadoopFileSystem._connect requires unicode 
> arguments, but the arguments passed in are always of type str. Adding a 
> unicode() conversion ensures that the arguments are unicode.
> Traceback (most recent call last):
>   File "pytorch_hello_world.py", line 31, in <module>
>     pytorch_hello_world()
>   File "pytorch_hello_world.py", line 25, in pytorch_hello_world
>     with DataLoader(make_reader(dataset_url)) as train_loader:
>   File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
>     resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
>   File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
>     self._filesystem = connector.connect_to_either_namenode(namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
>     return HAHdfsClient(cls, list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
>     self._do_connect()
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
>     self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
>     cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
>     return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
>     extra_conf=extra_conf)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
> TypeError: Expected unicode, got str
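
The conversion the reporter describes could be sketched roughly as follows. This is only an illustration, not the patch attached to the issue: the helper name ensure_text and the UTF-8 default are assumptions made for the example. It coerces a Python 2 str (bytes) to unicode and leaves other values, such as an integer port, unchanged.

```python
import sys

# Pick the native text type: unicode on Python 2, str on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists only on Python 2)


def ensure_text(value, encoding="utf-8"):
    """Return value as text; hypothetical helper, not part of pyarrow."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # On Python 2 a plain str is bytes, so 'default' is decoded here.
        return value.decode(encoding)
    # Non-string values (for example an integer port) pass through as-is.
    return value


# Example: coerce the host before it reaches a function that expects unicode.
host = ensure_text(b"default")
```

Under this assumption, a caller such as the hdfs_connect_namenode frame in the traceback would wrap the host (and any other string arguments) with the helper before calling pyarrow.hdfs.connect.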



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
