Re: [PR] [SPARK-51758][SS][FOLLOWUP][TESTS] Fix flaky test around watermark due to additional batch causing empty df [spark]

via GitHub Thu, 17 Apr 2025 02:49:30 -0700


HeartSaVioR commented on code in PR #50615:
URL: https://github.com/apache/spark/pull/50615#discussion_r2048618906



##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -612,11 +612,14 @@ def check_results(batch_df, batch_id):
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="4"),
                 }
-            else:
+            elif batch_id == 2:
                 # watermark for late event = 10 and min event = 2 with no filtering
                 assert set(batch_df.sort("id").collect()) == {
                     Row(id="a", timestamp="2"),
                 }
+            else:
+                for q in self.spark.streams.active:

Review Comment:
   @zhengruifeng Since you are looking into this, could you please try 
replacing `self.spark` with `batch_df.sparkSession`? The serialization issue 
seems to be caused by the reference to the outer Spark session.
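
A minimal sketch of the suggested change, assuming the new `else` branch only needs the session to iterate over the active streaming queries. The diff is cut off after the `for` line, so the `q.stop()` body and the `batch_id > 2` guard are hypothetical placeholders, not the PR's actual code. The point is that the function takes the session from `batch_df` and no longer closes over `self`:

```python
def check_results(batch_df, batch_id):
    # Use the session attached to the micro-batch DataFrame instead of
    # `self.spark`, so the function holds no reference to the test case
    # (and through it, the outer Spark session).
    spark = batch_df.sparkSession

    # Assumption: batches 0-2 keep their existing assertions (omitted here);
    # only the trailing `else` branch from the diff is sketched.
    if batch_id > 2:
        for q in spark.streams.active:
            # Hypothetical loop body; the original diff does not show it.
            q.stop()
```

With this shape the function depends only on its arguments, which is presumably why it would avoid the serialization problem.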



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


