Dear Maintainer,

My name is Joseph Hwang, and I am writing from South Korea. I need some advice 
about PyArrow.
 
I am trying to develop a Hadoop File System client application with PyArrow, 
but pa.hdfs.connect throws errors. My IDE is Eclipse with PyDev, and I work on 
two operating systems (Windows 10 64-bit and CentOS 8 64-bit); the problems 
occur on both.
 
My Python code is simple:
 
 
import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9000)
 
 
== Errors on Windows 10
 
Traceback (most recent call last):
  File "C:\eclipse-workspace\PythonFredProj\com\aaa\fred\hdfs3-test.py", line 14, in <module>
    fs = pa.hdfs.connect(host='localhost', port=9000)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 208, in connect
    fs = HadoopFileSystem(host=host, port=port, user=user,
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 38, in __init__
    _maybe_set_hadoop_classpath()
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 136, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 163, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid win32 application
 
I installed Visual C++ 2015, but the above error is still thrown. My reading of 
the traceback is that pyarrow spawns the `hadoop` launcher to compute the 
classpath, and Windows cannot execute that Unix shell script directly 
(WinError 193).
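
If I read pyarrow/hdfs.py correctly, _maybe_set_hadoop_classpath() returns 
early when CLASSPATH already contains the Hadoop jars, so I am considering 
pre-setting CLASSPATH myself before connecting. This is only a sketch: the 
installation paths are hypothetical, and I am assuming the Windows launcher 
bin\hadoop.cmd supports `classpath --glob`.

import os
import subprocess

import pyarrow as pa

# Hypothetical install locations; they depend on the local machine.
os.environ['JAVA_HOME'] = r'C:\Java\jdk1.8.0_251'
os.environ['HADOOP_HOME'] = r'C:\hadoop-3.2.1'

# Ask the Windows launcher (hadoop.cmd) for the classpath ourselves, so that
# pyarrow finds CLASSPATH already set and never spawns the Unix `hadoop`
# script that _winapi.CreateProcess rejects with WinError 193.
hadoop_cmd = os.path.join(os.environ['HADOOP_HOME'], 'bin', 'hadoop.cmd')
classpath = subprocess.check_output([hadoop_cmd, 'classpath', '--glob'])
os.environ['CLASSPATH'] = classpath.decode('utf-8').strip()

fs = pa.hdfs.connect(host='localhost', port=9000)

Is this a reasonable direction, or is there a supported way to point pyarrow 
at hadoop.cmd on Windows?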
 
 
== Errors on CentOS 8
 
Traceback (most recent call last):
  File "/home/jhwang/eclipse-workspace/BigDataPythonTest/com/aaa/etl/hdfs3-test.py", line 7, in <module>
    fs = pa.hdfs.connect(host='localhost', port=9000)
  File "/usr/python/anaconda3/lib/python3.8/site-packages/pyarrow/hdfs.py", line 208, in connect
    fs = HadoopFileSystem(host=host, port=port, user=user,
  File "/usr/python/anaconda3/lib/python3.8/site-packages/pyarrow/hdfs.py", line 40, in __init__
    self._connect(host, port, user, kerb_ticket, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory
 
On CentOS 8, the Python CLI does not throw these errors; they happen only 
inside the Eclipse IDE. I think my PyDev configuration in Eclipse has some 
problems, probably because environment variables set in my shell profile are 
not visible to Eclipse.
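
As a cross-check I plan to set the variables directly in the script, so the 
IDE environment no longer matters. This is only a sketch: the paths below are 
hypothetical, and I am assuming ARROW_LIBHDFS_DIR is the variable pyarrow 
consults to locate libhdfs.so.

import os

import pyarrow as pa

# Hypothetical install locations; adjust to the actual machine.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-1.8.0-openjdk'
os.environ['HADOOP_HOME'] = '/opt/hadoop-3.2.1'

# Point pyarrow at libhdfs.so explicitly; Hadoop distributions usually
# ship it under $HADOOP_HOME/lib/native.
os.environ['ARROW_LIBHDFS_DIR'] = os.path.join(
    os.environ['HADOOP_HOME'], 'lib', 'native')

fs = pa.hdfs.connect(host='localhost', port=9000)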
Kindly let me know how to fix these errors properly. Thanks in advance.
