[ https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17661435#comment-17661435 ]
Rok Mihevc commented on ARROW-4413:
-----------------------------------

This issue has been migrated to [issue #20975|https://github.com/apache/arrow/issues/20975] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] pyarrow.hdfs.connect() failing
> ---------------------------------------
>
>                 Key: ARROW-4413
>                 URL: https://issues.apache.org/jira/browse/ARROW-4413
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>         Environment: Python 2.7
> Hadoop distribution: Amazon 2.7.3
> Hive 2.1.1
> Spark 2.1.1
> Tez 0.8.4
> Linux 4.4.35-33.55.amzn1.x86_64
>            Reporter: Bradley Grantham
>            Assignee: Antoine Pitrou
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Trying to connect to hdfs using the below snippet. Using {{hadoop-libhdfs}}.
> This error appears in {{v0.12.0}}. It doesn't appear in {{v0.11.1}}. (I used
> the same environment when testing that it still worked on {{v0.11.1}})
>
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: fs = pa.hdfs.connect()
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-2-e0007ad7fa95> in <module>()
> ----> 1 fs = pa.hdfs.connect()
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in connect(host, port, user, kerb_ticket, driver, extra_conf)
>     205     fs = HadoopFileSystem(host=host, port=port, user=user,
>     206                           kerb_ticket=kerb_ticket, driver=driver,
> --> 207                           extra_conf=extra_conf)
>     208     return fs
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in __init__(self, host, port, user, kerb_ticket, driver, extra_conf)
>      36         _maybe_set_hadoop_classpath()
>      37
> ---> 38         self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>      39
>      40     def __reduce__(self):
> /usr/local/lib64/python2.7/site-packages/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem._connect()
>      72         if host is not None:
>      73             conf.host = tobytes(host)
> ---> 74         self.host = host
>      75
>      76         conf.port = port
> TypeError: Expected unicode, got str
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)