Hi everyone,

It is either too late or too early for me to think straight, so please forgive me if this is something trivial. I am trying to add a test case extending SparkSessionTestCase to pyspark.ml.tests (example patch attached). If the test collects data, and another TestCase extending SparkSessionTestCase is executed before it, I get an AttributeError because _jsc is None:
======================================================================
ERROR: test_foo (pyspark.ml.tests.FooTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/spark/python/pyspark/ml/tests.py", line 1258, in test_foo
  File "/home/spark/python/pyspark/sql/dataframe.py", line 389, in collect
    with SCCallSiteSync(self._sc) as css:
  File "/home/spark/python/pyspark/traceback_utils.py", line 72, in __enter__
    self._context._jsc.setCallSite(self._call_site)
AttributeError: 'NoneType' object has no attribute 'setCallSite'
----------------------------------------------------------------------

If the TestCase is executed alone it seems to work just fine.

Can anyone reproduce this? Is there something obvious I am missing here?

--
Best,
Maciej
diff --git a/python/pyspark/ml/tests.py b/python/pyspark/ml/tests.py
index 3524160557..cc6e49d6cf 100755
--- a/python/pyspark/ml/tests.py
+++ b/python/pyspark/ml/tests.py
@@ -1245,6 +1245,17 @@ class ALSTest(SparkSessionTestCase):
         self.assertEqual(als.getFinalStorageLevel(), "DISK_ONLY")
         self.assertEqual(als._java_obj.getFinalStorageLevel(), "DISK_ONLY")
+        als.fit(df).userFactors.collect()
+
+
+class FooTest(SparkSessionTestCase):
+    def test_foo(self):
+        df = self.spark.createDataFrame(
+            [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 1, 1.0), (2, 2, 5.0)],
+            ["user", "item", "rating"])
+        als = ALS().setMaxIter(1).setRank(1)
+        als.fit(df).userFactors.collect()
+
 
 class DefaultValuesTests(PySparkTestCase):
     """