Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Hyukjin Kwon Sat, 05 Aug 2017 00:41:36 -0700

Hi all,

I am seeing flaky Python tests from time to time and, if I am not mistaken,
mostly on amp-jenkins-worker-05:



======================================================================
ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: cannot import name 'hashtable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
    pdf = df.filter("i < 0").toPandas()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
    import pandas as pd
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: 'hashtable' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

======================================================================
ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...


This looks like an environment problem: the hashtable C extension is missing,
although I believe it should have been compiled and importable.

I can think of a few possible causes, such as a bug somewhere or an
unsuccessful manual build of pandas from source, but I am unable to
reproduce the failure and verify either. So this is only my guess.


Does anyone know if this is an environment problem and how to fix this?
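
If someone with access to the worker could check, something like the
sketch below might help narrow it down. It is only a rough idea: the
function name check_module is made up for illustration, and I am assuming
it would be run with the same py3k interpreter that the failing tests use.
It just tries the import and reports which interpreter and which file were
picked up.

```python
import importlib
import sys


def check_module(name):
    """Try to import `name` and report where it was loaded from.

    Hypothetical helper for illustration; not part of Spark or pandas.
    """
    report = {"module": name, "python": sys.executable}
    try:
        mod = importlib.import_module(name)
    except ImportError as exc:
        # A broken or partially built package ends up here, e.g. the
        # "C extension: 'hashtable' not built" error in the log above.
        report["ok"] = False
        report["error"] = str(exc)
    else:
        report["ok"] = True
        report["file"] = getattr(mod, "__file__", None)
        report["version"] = getattr(mod, "__version__", None)
    return report


if __name__ == "__main__":
    # pandas is the package failing in the log; pyarrow is checked too,
    # on the assumption that the Arrow tests need it as well.
    for name in ("pandas", "pyarrow"):
        print(check_module(name))
```

If the reported file for pandas points at a source checkout rather than
an installed package, that would be consistent with an incomplete manual
build, but again this is a guess.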
