[ https://issues.apache.org/jira/browse/ARROW-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662259#comment-17662259 ]

Rok Mihevc commented on ARROW-5236:
-----------------------------------

This issue has been migrated to [issue #16719|https://github.com/apache/arrow/issues/16719] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] hdfs.connect() is trying to load libjvm in windows
> -----------------------------------------------------------
>
>                 Key: ARROW-5236
>                 URL: https://issues.apache.org/jira/browse/ARROW-5236
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>         Environment: Windows 7 Enterprise, pyarrow 0.13.0
>            Reporter: Kamaraju
>            Priority: Major
>              Labels: hdfs
>
> This issue was originally reported at
> [https://github.com/apache/arrow/issues/4215]. Raising a Jira as per Wes
> McKinney's request.
>
> Summary:
> The following script
> {code}
> $ cat expt2.py
> import pyarrow as pa
>
> fs = pa.hdfs.connect()
> {code}
> tries to load libjvm on Windows 7, which is not expected.
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in <module>
>     fs = pa.hdfs.connect()
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
>     extra_conf=extra_conf)
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
> There is no libjvm file in the Windows Java installation.
> {noformat}
> $ echo $JAVA_HOME
> C:\Progra~1\Java\jdk1.8.0_141
> $ find $JAVA_HOME -iname '*libjvm*'
> <returns nothing.>
> {noformat}
> I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
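
The empty `find` result is consistent with how the JVM is packaged: the HotSpot JVM shared library is named `jvm.dll` on Windows (under `jre\bin\server\` in a JDK 8 layout), `libjvm.so` on Linux and `libjvm.dylib` on macOS, so a search for `*libjvm*` on Windows finds nothing even in a complete JDK. A minimal sketch of a platform-aware lookup follows. `jvm_library_name` and `find_jvm_library` are hypothetical helpers for illustration, not pyarrow APIs, and searching `JAVA_HOME` recursively is an assumption chosen so that both the JDK 8 (`jre/...`) layout and newer layouts are covered.

```python
import os
import sys
from pathlib import Path
from typing import Optional

# File name of the JVM shared library per sys.platform value (hypothetical table).
# Windows has no "libjvm" file at all, which is why the find above comes back empty.
JVM_LIBRARY_NAMES = {
    "win32": "jvm.dll",
    "darwin": "libjvm.dylib",
}
DEFAULT_JVM_LIBRARY_NAME = "libjvm.so"  # Linux and other Unix-like systems


def jvm_library_name(platform: str = sys.platform) -> str:
    """Return the expected JVM shared-library file name for a platform."""
    return JVM_LIBRARY_NAMES.get(platform, DEFAULT_JVM_LIBRARY_NAME)


def find_jvm_library(java_home: str, platform: str = sys.platform) -> Optional[Path]:
    """Search java_home recursively for the JVM library; return the first match or None.

    A JDK 8 keeps it under jre/bin/server (Windows) or jre/lib/<arch>/server
    (Linux); newer JDKs drop the "jre" level, so a recursive search covers both.
    """
    matches = sorted(Path(java_home).rglob(jvm_library_name(platform)))
    return matches[0] if matches else None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    print(find_jvm_library(java_home) if java_home else "JAVA_HOME is not set")
```

On the reporter's machine the equivalent check would be a search for `jvm.dll` rather than `libjvm` under `C:\Progra~1\Java\jdk1.8.0_141`.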
>
> Steps to reproduce the issue (with more details):
>
> Create the environment
> {noformat}
> $ cat scratch_py36_pyarrow.yml
> name: scratch_py36_pyarrow
> channels:
>   - defaults
> dependencies:
>   - python=3.6.8
>   - pyarrow
> {noformat}
> {noformat}
> $ conda env create -f scratch_py36_pyarrow.yml
> {noformat}
> Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do
> this because the Hadoop installation that comes with the MapR <[https://mapr.com/]>
> Windows client only has $HADOOP_HOME/bin/hadoop.cmd. There is no file named
> $HADOOP_HOME/bin/hadoop, so without this patch the subsequent
> subprocess.check_output call fails with FileNotFoundError.
> {noformat}
> $ cat ~/x/patch.txt
> 131c131
> < hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> ---
> > hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
> $ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
> patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
> {noformat}
> Activate the environment
> {noformat}
> $ source activate scratch_py36_pyarrow
> {noformat}
> Sample script
> {noformat}
> $ cat expt2.py
> import pyarrow as pa
>
> fs = pa.hdfs.connect()
> {noformat}
> Execute the script
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in <module>
>     fs = pa.hdfs.connect()
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
>     extra_conf=extra_conf)
>   File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
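
The manual edit to hdfs.py in the steps above amounts to choosing the Hadoop launcher script by platform. A minimal sketch of that choice follows, assuming the derived command is `hadoop classpath --glob` (a standard Hadoop CLI subcommand; whether pyarrow's `subprocess.check_output` call passes exactly these arguments is an assumption here). `hadoop_launcher`, `classpath_command` and `derive_classpath` are hypothetical names, not pyarrow APIs. Choosing the right launcher only gets past the FileNotFoundError; as the final traceback shows, the `Unable to load libjvm` error remains a separate problem.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def hadoop_launcher(hadoop_home: Optional[str], platform: str = sys.platform) -> str:
    """Return the Hadoop launcher to run (hypothetical helper).

    Windows clients such as the MapR one ship bin/hadoop.cmd instead of the
    POSIX shell script bin/hadoop, which is what the manual patch works around.
    Without HADOOP_HOME, return the bare script name so it is looked up on PATH.
    """
    name = "hadoop.cmd" if platform == "win32" else "hadoop"
    return str(Path(hadoop_home) / "bin" / name) if hadoop_home else name


def classpath_command(hadoop_home: Optional[str], platform: str = sys.platform) -> List[str]:
    """Build the (assumed) `hadoop classpath --glob` command line."""
    return [hadoop_launcher(hadoop_home, platform), "classpath", "--glob"]


def derive_classpath(platform: str = sys.platform) -> str:
    """Run the command and return the expanded classpath as text.

    subprocess raises FileNotFoundError if the launcher does not exist,
    matching the failure described when bin/hadoop is missing.
    """
    cmd = classpath_command(os.environ.get("HADOOP_HOME"), platform)
    return subprocess.check_output(cmd).decode("utf-8").strip()
```

Taking the platform as a parameter (defaulting to `sys.platform`) keeps the selection logic testable on any OS without a Hadoop installation.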