bogao007 commented on code in PR #48838:
URL: https://github.com/apache/spark/pull/48838#discussion_r1844724391
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
         timer_values: TimerValues
             Timer value for the current batch that process the input rows. Users can
             get the processing or event time timestamp from TimerValues.
+        """
+        return iter([])

Review Comment:
   Why do we change the `...` placeholder here?



##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -420,10 +411,27 @@ def handleInputRows(
         timer_values: TimerValues
             Timer value for the current batch that process the input rows. Users can
             get the processing or event time timestamp from TimerValues.
+        """
+        return iter([])
+
+    def handleExpiredTimer(

Review Comment:
   Just double-checking: this method is not required for users to implement, correct?



##########
python/pyspark/sql/pandas/group_ops.py:
##########
@@ -573,7 +568,16 @@ def transformWithStateUDF(
                 statefulProcessorApiClient.set_handle_state(StatefulProcessorHandleState.CLOSED)
                 return iter([])

-            result = handle_data_with_timers(statefulProcessorApiClient, key, inputRows)
+            if timeMode != "none":
+                batch_timestamp = statefulProcessorApiClient.get_batch_timestamp()
+                watermark_timestamp = statefulProcessorApiClient.get_watermark_timestamp()
+            else:
+                batch_timestamp = -1
+                watermark_timestamp = -1

Review Comment:
   Can we abstract this into a separate method and share it between both UDFs to reduce the redundant code?
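On the two `stateful_processor.py` comments above, a minimal sketch of the pattern in question, assuming (not taken from the PR) that `handleInputRows` stays a required override while `handleExpiredTimer` gets a concrete no-op default. The class name, signatures, and parameter names below are illustrative assumptions:

```python
# Illustrative sketch only, not the PR's actual code; names and signatures are
# assumptions made for the example.
from abc import ABC, abstractmethod
from typing import Any, Iterator

import pandas as pd


class StatefulProcessorSketch(ABC):
    @abstractmethod
    def handleInputRows(
        self, key: Any, rows: Iterator[pd.DataFrame], timer_values: Any
    ) -> Iterator[pd.DataFrame]:
        # Required hook: every user-defined processor must override this, so a
        # bare `...` placeholder body is enough here.
        ...

    def handleExpiredTimer(
        self, key: Any, timer_values: Any, expired_timer_info: Any
    ) -> Iterator[pd.DataFrame]:
        # Optional hook with a concrete default: returning an empty iterator
        # means processors that never register timers can simply ignore it.
        return iter([])
```

For an abstract method the body is never meant to run, so `...` and `return iter([])` behave the same; a concrete default body only matters for a hook users are allowed to skip, which seems to be what the two questions above are probing.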
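For the `group_ops.py` comment, a sketch of the kind of helper the suggestion points at. The helper name `_get_timestamps` and its tuple return shape are assumptions for illustration; the `get_batch_timestamp()` / `get_watermark_timestamp()` calls and the `-1` sentinels come from the diff above:

```python
# Hypothetical helper sketching the suggested refactor; not the PR's actual code.
from typing import Any, Tuple


def _get_timestamps(stateful_processor_api_client: Any, time_mode: str) -> Tuple[int, int]:
    """Return (batch_timestamp, watermark_timestamp), or (-1, -1) when time_mode is "none"."""
    if time_mode != "none":
        batch_timestamp = stateful_processor_api_client.get_batch_timestamp()
        watermark_timestamp = stateful_processor_api_client.get_watermark_timestamp()
    else:
        batch_timestamp = -1
        watermark_timestamp = -1
    return batch_timestamp, watermark_timestamp
```

Both UDFs mentioned in the comment could then call `batch_timestamp, watermark_timestamp = _get_timestamps(statefulProcessorApiClient, timeMode)` instead of duplicating the branch.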