zhengruifeng opened a new pull request, #50621: URL: https://github.com/apache/spark/pull/50621
Reverts https://github.com/apache/spark/pull/50615.

Manually tested. On master, the test fails:

```
(spark_313) ➜ spark git:(master) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time']
python3 python_implementation is CPython
python3 version is: Python 3.13.3
Starting test(python3): pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/75f38dd9-30c8-4122-b88f-047c33bff8ee/python3__pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state_TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time__1ec968gn.log)
Running tests...
----------------------------------------------------------------------
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamDuration to Some(1s) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamDuration". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
  warnings.warn(warn)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamSize to Some(123) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamSize". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
  warnings.warn(warn)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.authenticate.token to Some(deadbeef) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.authenticate.token". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
  warnings.warn(warn)
test_transform_with_state_with_wmark_and_non_event_time (pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state.TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time) ... ERROR (4.823s)

======================================================================
ERROR [4.823s]: test_transform_with_state_with_wmark_and_non_event_time (pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state.TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/streaming/readwriter.py", line 587, in foreachBatch
    self._write_proto.foreach_batch.python_function.command = CloudPickleSerializer().dumps(
        func
    )
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1537, in dumps
    cp.dump(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/cloudpickle/cloudpickle.py", line 1303, in dump
    return super().dump(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/cloudpickle/cloudpickle.py", line 966, in _file_reduce
    raise pickle.PicklingError(
        "Cannot pickle files that are not opened for reading: %s" % obj.mode
    )
_pickle.PicklingError: Cannot pickle files that are not opened for reading: w

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py", line 624, in test_transform_with_state_with_wmark_and_non_event_time
    self._test_transform_with_state_in_pandas_event_time(
        MinEventTimeStatefulProcessor(), check_results, "None"
    )
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py", line 561, in _test_transform_with_state_in_pandas_event_time
    .foreachBatch(check_results)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/streaming/readwriter.py", line 591, in foreachBatch
    raise PySparkPicklingError(
    ...<2 lines>...
    )
pyspark.errors.exceptions.base.PySparkPicklingError: [STREAMING_CONNECT_SERIALIZATION_ERROR] Cannot serialize the function `foreachBatch`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`.

----------------------------------------------------------------------
Ran 1 test in 8.169s

FAILED (errors=1)

Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state.TransformWithStateInPandasParityTests-20250417173545.xml

Had test failures in pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time with python3; see logs.
```
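For context, the failure above is Spark Connect's standard restriction on `foreachBatch`: the callback is serialized with cloudpickle and shipped to the server, so it must not capture the client-side `SparkSession`, a DataFrame defined outside the function, or anything that transitively holds one (here the capture pulled in a write-mode file handle, hence the underlying `PicklingError`). Below is a minimal sketch of the pattern the error message recommends; it assumes a Spark Connect server at `sc://localhost`, and the rate source and callback body are illustrative, not the actual test code:

```python
from pyspark.sql import SparkSession

# Assumption: a Spark Connect server is reachable at this URL.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
df = spark.readStream.format("rate").load()

# Problematic: the callback would capture `spark` from the enclosing scope,
# forcing cloudpickle to serialize the client session and everything it
# holds -- the situation STREAMING_CONNECT_SERIALIZATION_ERROR complains about.
#
# def check_results(batch_df, batch_id):
#     spark.range(batch_id).show()  # closes over the outer session

# Allowed: reach the session through the first parameter instead.
def check_results(batch_df, batch_id):
    session = batch_df.sparkSession  # session attached to this micro-batch
    batch_df.collect()

query = df.writeStream.foreachBatch(check_results).start()
query.stop()
```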
After the revert, the test passes:

```
(spark_313) ➜ spark git:(test_test) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time']
python3 python_implementation is CPython
python3 version is: Python 3.13.3
Starting test(python3): pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/9c5f324b-d914-4a50-bdf6-20fcfa9474c7/python3__pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state_TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time__fkrm_xmh.log)
Finished test(python3): pyspark.sql.tests.connect.pandas.test_parity_pandas_transform_with_state TransformWithStateInPandasParityTests.test_transform_with_state_with_wmark_and_non_event_time (51s)

Tests passed in 51 seconds
```