Hi all, I am seeing flaky Python tests from time to time and, if I am not mistaken, mostly on amp-jenkins-worker-05:

======================================================================
ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: cannot import name 'hashtable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
    pdf = df.filter("i < 0").toPandas()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
    import pandas as pd
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: 'hashtable' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

======================================================================
ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

It sounds like an environment problem, apparently due to the missing pandas 'hashtable' C extension (which I believe should have been compiled and be importable).
I suspect a few possibilities, such as a bug somewhere or an unsuccessful manual build of Pandas from source, but I am unable to reproduce this and check it. So, yes, this is only my guess. Does anyone know whether this is an environment problem and how to fix it?
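
In case it helps whoever has access to the worker, something like the following could be run with the same interpreter the tests use, to see whether pandas imports there and which installation it is picked up from. This is only a rough sketch: `check_import` is a hypothetical helper name of my own, and checking `pyarrow` alongside `pandas` is my assumption, since the failing tests are the Arrow ones.

```python
import importlib
import platform
import sys


def check_import(module_name):
    """Try to import a module and report where it came from, or why it failed."""
    info = {
        "module": module_name,
        "ok": False,
        "version": None,
        "path": None,
        "error": None,
    }
    try:
        mod = importlib.import_module(module_name)
    except Exception as exc:  # ImportError in the trace above; caught broadly for logging
        info["error"] = "{}: {}".format(type(exc).__name__, exc)
        return info
    info["ok"] = True
    # Not every module defines these attributes, so fall back to None.
    info["version"] = getattr(mod, "__version__", None)
    info["path"] = getattr(mod, "__file__", None)
    return info


if __name__ == "__main__":
    # Host name and interpreter, to compare workers against each other.
    print("host:", platform.node())
    print("python:", sys.executable, sys.version.split()[0])
    # Module list is an assumption: the two packages the Arrow tests rely on.
    for name in ("pandas", "pyarrow"):
        print(check_import(name))
```

If the reported `path` for pandas pointed at a source checkout rather than site-packages, or the import failed only on one worker, that would at least support the environment theory.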