bogao007 commented on code in PR #48838:
URL: https://github.com/apache/spark/pull/48838#discussion_r1844724391
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
         timer_values: TimerValues
             Timer value for the current batch that process the input rows. Users can
             get the processing or event time timestamp from TimerValues.
+        """
+        return iter([])

Review Comment:
   Why do we change the `...` placeholder here?



##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
         timer_values: TimerValues
             Timer value for the current batch that process the input rows. Users can
             get the processing or event time timestamp from TimerValues.
+        """
+        return iter([])
+
+    def handleExpiredTimer(

Review Comment:
   Just double-checking: this method is not required for users to implement, correct?



##########
python/pyspark/sql/pandas/group_ops.py:
##########
@@ -573,7 +568,16 @@ def transformWithStateUDF(
                 statefulProcessorApiClient.set_handle_state(StatefulProcessorHandleState.CLOSED)
                 return iter([])

-            result = handle_data_with_timers(statefulProcessorApiClient, key, inputRows)
+            if timeMode != "none":
+                batch_timestamp = statefulProcessorApiClient.get_batch_timestamp()
+                watermark_timestamp = statefulProcessorApiClient.get_watermark_timestamp()
+            else:
+                batch_timestamp = -1
+                watermark_timestamp = -1

Review Comment:
   Can we abstract this into a separate method and share it between both UDFs to reduce the redundant code?
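On the two `stateful_processor.py` comments above, a minimal sketch of the pattern in question, assuming (not taken from the PR) that `handleInputRows` stays a required override while `handleExpiredTimer` gets a concrete no-op default. The class name, signatures, and parameter names below are illustrative assumptions:

```python
# Illustrative sketch only, not the PR's actual code; names and signatures are
# assumptions made for the example.
from abc import ABC, abstractmethod
from typing import Any, Iterator

import pandas as pd


class StatefulProcessorSketch(ABC):
    @abstractmethod
    def handleInputRows(
        self, key: Any, rows: Iterator[pd.DataFrame], timer_values: Any
    ) -> Iterator[pd.DataFrame]:
        # Required hook: every user-defined processor must override this, so a
        # bare `...` placeholder body is enough here.
        ...

    def handleExpiredTimer(
        self, key: Any, timer_values: Any, expired_timer_info: Any
    ) -> Iterator[pd.DataFrame]:
        # Optional hook with a concrete default: returning an empty iterator
        # means processors that never register timers can simply ignore it.
        return iter([])
```

For an abstract method the body is never meant to run, so `...` and `return iter([])` behave the same; a concrete default body only matters for a hook users are allowed to skip, which seems to be what the two questions above are probing.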
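For the `group_ops.py` comment, a sketch of the kind of helper the suggestion points at. The helper name `_get_timestamps` and its tuple return shape are assumptions for illustration; the `get_batch_timestamp()` / `get_watermark_timestamp()` calls and the `-1` sentinels come from the diff above:

```python
# Hypothetical helper sketching the suggested refactor; not the PR's actual code.
from typing import Any, Tuple


def _get_timestamps(stateful_processor_api_client: Any, time_mode: str) -> Tuple[int, int]:
    """Return (batch_timestamp, watermark_timestamp), or (-1, -1) when time_mode is "none"."""
    if time_mode != "none":
        batch_timestamp = stateful_processor_api_client.get_batch_timestamp()
        watermark_timestamp = stateful_processor_api_client.get_watermark_timestamp()
    else:
        batch_timestamp = -1
        watermark_timestamp = -1
    return batch_timestamp, watermark_timestamp
```

Both UDFs mentioned in the comment could then call `batch_timestamp, watermark_timestamp = _get_timestamps(statefulProcessorApiClient, timeMode)` instead of duplicating the branch.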