[ https://issues.apache.org/jira/browse/ARROW-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662071#comment-17662071 ]
Rok Mihevc commented on ARROW-5049:
-----------------------------------

This issue has been migrated to [issue #21543|https://github.com/apache/arrow/issues/21543] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in Spark
> ----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5049
>                 URL: https://issues.apache.org/jira/browse/ARROW-5049
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0, 0.12.1, 0.13.0
>            Reporter: Tiger068
>            Assignee: Tiger068
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> When I initialize a pyarrow FileSystem to connect to an HDFS cluster from Spark, libhdfs throws this error:
> {code:java}
> org/apache/hadoop/fs/FileSystem class not found
> {code}
> Printing the CLASSPATH shows that its value is in wildcard mode:
> {code:java}
> ../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars...
> {code}
> The value is set by Spark, but libhdfs must load classes from explicit jar files.
>
> Root cause: in hdfs.py we only check for the string "hadoop" in the CLASSPATH, not for jar files:
> {code:java}
> def _maybe_set_hadoop_classpath():
>     if 'hadoop' in os.environ.get('CLASSPATH', ''):
>         return
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
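To illustrate the root cause described above: a check for the substring "hadoop" passes on Spark's wildcard-style CLASSPATH even though libhdfs needs concrete jar paths. Below is a minimal sketch of the jar-aware check the report implies. The helper name `classpath_has_hadoop_jars` and the `hadoop-common` jar pattern are illustrative assumptions, not the exact code merged for the 0.14.0 fix.

{code:java}
import re

def classpath_has_hadoop_jars(classpath):
    # Hypothetical helper: succeed only if the CLASSPATH names an actual
    # Hadoop jar file (e.g. hadoop-common-*.jar), not merely a directory
    # or wildcard entry that happens to contain the word 'hadoop'.
    return re.search(r'hadoop-common[^/:;]*\.jar', classpath) is not None

# Spark-style wildcard CLASSPATH: contains 'hadoop', but no jar files,
# so the naive substring check would wrongly return early.
spark_cp = '../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars/*'

# CLASSPATH expanded to concrete jar files, as libhdfs requires.
jar_cp = '/opt/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar'

print(classpath_has_hadoop_jars(spark_cp))  # False
print(classpath_has_hadoop_jars(jar_cp))    # True
{code}

With a check like this, `_maybe_set_hadoop_classpath()` would fall through and build a proper jar-file CLASSPATH (e.g. via `hadoop classpath --glob`) instead of trusting Spark's wildcard value.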