Kamaraju created ARROW-5236:
-------------------------------

             Summary: hdfs.connect() is trying to load libjvm in windows
                 Key: ARROW-5236
                 URL: https://issues.apache.org/jira/browse/ARROW-5236
             Project: Apache Arrow
          Issue Type: Bug
         Environment: Windows 7 Enterprise, pyarrow 0.13.0
            Reporter: Kamaraju
This issue was originally reported at [https://github.com/apache/arrow/issues/4215]. Raising a Jira as per Wes McKinney's request.

Summary: The following script
{code}
$ cat expt2.py
import pyarrow as pa
fs = pa.hdfs.connect()
{code}
tries to load libjvm on Windows 7, which is not expected.
{noformat}
$ python ./expt2.py
Traceback (most recent call last):
  File "./expt2.py", line 3, in <module>
    fs = pa.hdfs.connect()
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm
{noformat}

There is no libjvm file in a Windows Java installation:
{noformat}
$ echo $JAVA_HOME
C:\Progra~1\Java\jdk1.8.0_141
$ find $JAVA_HOME -iname '*libjvm*'
<returns nothing.>
{noformat}

I see the libjvm error with both the 0.11.1 and 0.13.0 versions of pyarrow.

Steps to reproduce the issue (with more details):

Create the environment:
{noformat}
$ cat scratch_py36_pyarrow.yml
name: scratch_py36_pyarrow
channels:
  - defaults
dependencies:
  - python=3.6.8
  - pyarrow
{noformat}
{noformat}
$ conda env create -f scratch_py36_pyarrow.yml
{noformat}

Apply the following patch to lib/site-packages/pyarrow/hdfs.py. I had to do this because the Hadoop installation that comes with the MapR <[https://mapr.com/]> Windows client only has $HADOOP_HOME/bin/hadoop.cmd. There is no file named $HADOOP_HOME/bin/hadoop, so the subsequent subprocess.check_output call fails with FileNotFoundError if this patch is not applied.
{noformat}
$ cat ~/x/patch.txt
131c131
< hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
---
> hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
$ patch /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py ~/x/patch.txt
patching file /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
{noformat}

Activate the environment:
{noformat}
$ source activate scratch_py36_pyarrow
{noformat}

Sample script:
{noformat}
$ cat expt2.py
import pyarrow as pa
fs = pa.hdfs.connect()
{noformat}

Execute the script:
{noformat}
$ python ./expt2.py
Traceback (most recent call last):
  File "./expt2.py", line 3, in <module>
    fs = pa.hdfs.connect()
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 183, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm
{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
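For context, the two Windows-specific differences this report runs into (Hadoop ships bin/hadoop.cmd rather than bin/hadoop, and the JVM shared library is jvm.dll rather than a libjvm file) could be sketched as below. This is a hypothetical illustration of the platform differences, not pyarrow's actual discovery logic; the function names hadoop_bin and jvm_library_name are invented for this sketch.

{code}
import os
import sys


def hadoop_bin(hadoop_home):
    """Pick the Hadoop launcher for the current platform.

    Windows Hadoop distributions (e.g. the MapR client described
    above) ship only bin/hadoop.cmd, not a bare bin/hadoop script.
    """
    name = "hadoop.cmd" if sys.platform == "win32" else "hadoop"
    return os.path.join(hadoop_home, "bin", name)


def jvm_library_name():
    """Return the platform-specific name of the JVM shared library.

    A Windows JDK has no file matching '*libjvm*' at all (hence the
    empty find above); its equivalent is jvm.dll.
    """
    if sys.platform == "win32":
        return "jvm.dll"
    if sys.platform == "darwin":
        return "libjvm.dylib"
    return "libjvm.so"
{code}

So on Windows any loading strategy that searches only for a literal "libjvm" file name is bound to fail with "Unable to load libjvm" regardless of whether Java is installed.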