Hi all,
I am seeing flaky Python tests from time to time, and, if I am not mistaken,
mostly on amp-jenkins-worker-05:
======================================================================
ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: cannot import name 'hashtable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
    pdf = df.filter("i < 0").toPandas()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
    import pandas as pd
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: 'hashtable' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
======================================================================
ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...
======================================================================
ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...
======================================================================
ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...
It sounds like an environment problem, apparently caused by the missing
'hashtable' C extension in pandas (which I believe should have been compiled
and be importable).
I can think of a few possibilities, such as a bug somewhere or an unsuccessful
manual build of pandas from source, but I have not been able to reproduce the
failure to check. So this is only my guess.
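If it helps whoever has access to the worker, a small check along these lines could show which pandas the Jenkins Python actually picks up and whether it imports cleanly. This is only a sketch I have not run on amp-jenkins-worker-05: the helper name diagnose_import is hypothetical, it uses only the standard library (importlib), and it assumes it is run with the same interpreter the PySpark tests use.

```python
import importlib
import importlib.util
import sys


def diagnose_import(module_name):
    """Report where module_name would be loaded from and whether importing it works.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    # find_spec locates the module without executing it; origin is the file
    # path (or e.g. 'built-in'), which shows whether some other copy of the
    # package (such as a source checkout) shadows the installed one.
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: broken C extensions may raise more than ImportError
        return {"module": module_name, "origin": origin, "ok": False,
                "error": "{}: {}".format(type(exc).__name__, exc)}
    return {"module": module_name, "origin": origin, "ok": True,
            "version": getattr(module, "__version__", None)}


if __name__ == "__main__":
    # Assumption: run this with the interpreter Jenkins uses for the tests
    # (the py3k Anaconda environment shown in the traceback).
    print("Python:", sys.executable, sys.version.split()[0])
    print(diagnose_import("pandas"))
```

If the import fails, the printed `origin` should show whether it points into the Anaconda site-packages directory or somewhere else, along with the exact error.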
Does anyone know whether this is an environment problem and, if so, how to fix it?