HeartSaVioR commented on code in PR #50615:
URL: https://github.com/apache/spark/pull/50615#discussion_r2048618906
########## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ##########
@@ -612,11 +612,14 @@ def check_results(batch_df, batch_id):
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="4"),
                 }
-            else:
+            elif batch_id == 2:
                 # watermark for late event = 10 and min event = 2 with no filtering
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="2"),
                 }
+            else:
+                for q in self.spark.streams.active:

Review Comment:
@zhengruifeng Since you are looking into this, could you please try replacing `self.spark` with `batch_df.sparkSession`? It seems the serialization issue is caused by the reference to the outer Spark session.
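To illustrate the suspected failure mode, here is a minimal pure-Python sketch. It is an analogy, not PySpark code. It assumes, as the comment suggests, that the `check_results` function is serialized together with whatever it closes over, the way closure-serializing picklers such as cloudpickle (which PySpark bundles) work. `FakeSession`, `FakeBatch`, `FakeTest`, and `captured_state_is_picklable` are hypothetical names, and a `threading.Lock` stands in for whatever part of a live session cannot be pickled.

```python
import pickle
import threading


class FakeSession:
    # Hypothetical stand-in for a SparkSession. The lock makes it
    # unpicklable, mimicking a live session object.
    def __init__(self):
        self._lock = threading.Lock()
        self.active = ["query-1"]


class FakeBatch:
    # Hypothetical stand-in for the micro-batch DataFrame passed to the
    # batch function; it carries its own session reference.
    def __init__(self, session):
        self.sparkSession = session


class FakeTest:
    # Hypothetical stand-in for the test case that owns `self.spark`.
    def __init__(self):
        self.spark = FakeSession()

    def make_handlers(self):
        def check_results_outer(batch_df, batch_id):
            # Closes over `self`, so serializing this function drags in
            # the test object and therefore its session.
            return list(self.spark.active)

        def check_results_local(batch_df, batch_id):
            # Closes over nothing; the session comes from the argument,
            # which is the shape of the suggested change.
            return list(batch_df.sparkSession.active)

        return check_results_outer, check_results_local


def captured_state_is_picklable(fn):
    """Try to pickle everything `fn` closes over, roughly what a
    closure-serializing pickler has to ship along with the code."""
    cells = fn.__closure__ or ()
    try:
        pickle.dumps([cell.cell_contents for cell in cells])
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        # TypeError: the lock inside the captured session cannot be pickled.
        return False


if __name__ == "__main__":
    outer, local = FakeTest().make_handlers()
    print(captured_state_is_picklable(outer))  # False
    print(captured_state_is_picklable(local))  # True
```

The variant that reads the session from its `batch_df` argument captures no outer state at all (`local.__closure__` is `None`), so there is nothing unpicklable to serialize; whether this fully explains the failure in the real test is the hypothesis the comment asks to verify.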