[ https://issues.apache.org/jira/browse/ARROW-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-4943:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21450

> pyarrow.lib.HadoopFileSystem._connect failed due to TypeError
> -------------------------------------------------------------
>
>                 Key: ARROW-4943
>                 URL: https://issues.apache.org/jira/browse/ARROW-4943
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: Kernel: 4.4.95.x86_64
>                      Python: 2.7.5
>            Reporter: vanderliang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Running the pytorch_hello_world.py script from
> [https://github.com/uber/petastorm.git] fails with the TypeError shown below.
> It seems that pyarrow.lib.HadoopFileSystem._connect requires a unicode
> argument, but the argument passed in is always of type str. The proposed fix
> is to add a unicode() conversion so that the argument is always of type
> unicode.
>
> Traceback (most recent call last):
>   File "pytorch_hello_world.py", line 31, in <module>
>     pytorch_hello_world()
>   File "pytorch_hello_world.py", line 25, in pytorch_hello_world
>     with DataLoader(make_reader(dataset_url)) as train_loader:
>   File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
>     resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
>   File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
>     self._filesystem = connector.connect_to_either_namenode(namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
>     return HAHdfsClient(cls, list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
>     self._do_connect()
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
>     self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
>     cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
>     return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
>     extra_conf=extra_conf)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
> TypeError: Expected unicode, got str

--
This message was sent by Atlassian Jira
(v8.20.10#820010)