Does either of these machines have a current Hadoop installation (and is that installation on the system PATH)?
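The Windows traceback below fails while pyarrow is shelling out to the hadoop launcher to build the Java classpath (_hadoop_classpath_glob runs "hadoop classpath --glob" through subprocess, as your traceback shows). WinError 193 usually means the "hadoop" that was found is the bash script rather than hadoop.cmd, which Windows' CreateProcess cannot execute. Here is a minimal sketch to check what pyarrow will see and to set CLASSPATH yourself so pyarrow can skip that subprocess call; nothing in it is specific to your setup, and it is worth verifying the skip behavior against the hdfs.py of your pyarrow version:

    import os
    import shutil
    import subprocess

    # Sanity-check the variables the libhdfs loader relies on.
    print("HADOOP_HOME :", os.environ.get("HADOOP_HOME"))
    print("JAVA_HOME   :", os.environ.get("JAVA_HOME"))
    print("hadoop bin  :", shutil.which("hadoop"))

    # On Windows, prefer hadoop.cmd; CreateProcess cannot run the
    # bash launcher script, which is one way to get WinError 193.
    hadoop = shutil.which("hadoop.cmd") or shutil.which("hadoop")
    if hadoop:
        # Same command pyarrow runs internally (see
        # _hadoop_classpath_glob in your Windows traceback).
        os.environ["CLASSPATH"] = subprocess.check_output(
            [hadoop, "classpath", "--glob"]).decode().strip()

    import pyarrow as pa
    fs = pa.hdfs.connect(host="localhost", port=9000)

If shutil.which prints a path in a terminal but None inside Eclipse, the IDE is not inheriting your shell environment, which also explains the CentOS symptom below.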
On Tue, Oct 20, 2020 at 9:53 AM 황세규 <gladiato...@naver.com> wrote:
>
> Dear Maintainer. My name is Joseph Hwang in South Korea. I need some advice
> about PyArrow.
>
> I am trying to develop a Hadoop File System client application with PyArrow,
> but some errors are thrown from hdfs connect.
> My IDE is Eclipse PyDev and I use two OSes (Windows 10 64-bit and CentOS 8
> 64-bit), but there are problems on both.
>
> My Python code is simple:
>
> import pyarrow as pa
> fs = pa.hdfs.connect(host='localhost', port=9000)
>
> == Errors on Windows 10
>
> Traceback (most recent call last):
>   File "C:\eclipse-workspace\PythonFredProj\com\aaa\fred\hdfs3-test.py", line 14, in <module>
>     fs = pa.hdfs.connect(host='localhost', port=9000)
>   File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 208, in connect
>     fs = HadoopFileSystem(host=host, port=port, user=user,
>   File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 38, in __init__
>     _maybe_set_hadoop_classpath()
>   File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 136, in _maybe_set_hadoop_classpath
>     classpath = _hadoop_classpath_glob(hadoop_bin)
>   File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 163, in _hadoop_classpath_glob
>     return subprocess.check_output(hadoop_classpath_args)
>   File "C:\Python-3.8.3-x64\lib\subprocess.py", line 411, in check_output
>     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
>   File "C:\Python-3.8.3-x64\lib\subprocess.py", line 489, in run
>     with Popen(*popenargs, **kwargs) as process:
>   File "C:\Python-3.8.3-x64\lib\subprocess.py", line 854, in __init__
>     self._execute_child(args, executable, preexec_fn, close_fds,
>   File "C:\Python-3.8.3-x64\lib\subprocess.py", line 1307, in _execute_child
>     hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
> OSError: [WinError 193] %1 is not a valid win32 application
>
> I installed Visual C++ 2015, but the above errors are still thrown.
>
> == Errors on CentOS 8
>
> Traceback (most recent call last):
>   File "/home/jhwang/eclipse-workspace/BigDataPythonTest/com/aaa/etl/hdfs3-test.py", line 7, in <module>
>     fs = pa.hdfs.connect(host='localhost', port=9000)
>   File "/usr/python/anaconda3/lib/python3.8/site-packages/pyarrow/hdfs.py", line 208, in connect
>     fs = HadoopFileSystem(host=host, port=port, user=user,
>   File "/usr/python/anaconda3/lib/python3.8/site-packages/pyarrow/hdfs.py", line 40, in __init__
>     self._connect(host, port, user, kerb_ticket, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: Unable to load libhdfs: ./libhdfs.so: cannot open shared object file: No such file or directory
>
> On CentOS 8, the Python CLI does not throw these errors; they occur only in
> the Eclipse IDE, so I think my PyDev configuration in Eclipse has a problem.
> Kindly let me know how to fix these errors. Thanks in advance.
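For the CentOS case: "Unable to load libhdfs" means the loader never found libhdfs.so. Since the plain python CLI works, your shell profile is almost certainly exporting variables that Eclipse does not inherit. As a sketch (every path below is a placeholder; substitute your actual Hadoop and JDK locations), setting the variables inside the script takes the IDE out of the equation:

    import os

    # Placeholder paths -- substitute your real install locations.
    os.environ["HADOOP_HOME"] = "/opt/hadoop"
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java"

    # pyarrow also honors ARROW_LIBHDFS_DIR as the directory holding
    # libhdfs.so; on most installs that is $HADOOP_HOME/lib/native.
    os.environ["ARROW_LIBHDFS_DIR"] = os.path.join(
        os.environ["HADOOP_HOME"], "lib", "native")

    import pyarrow as pa
    fs = pa.hdfs.connect(host="localhost", port=9000)

If that connects, move the same variables into the Environment tab of your PyDev run configuration and drop them from the script.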