bogao007 commented on code in PR #49277: URL: https://github.com/apache/spark/pull/49277#discussion_r1918936950
##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -1698,6 +1876,173 @@ def init(self, handle: StatefulProcessorHandle) -> None:
         self.list_state = handle.getListState("listState", key_schema)


+class BasicProcessor(StatefulProcessor):
+    # Schema definitions
+    state_schema = StructType(
+        [StructField("id", IntegerType(), True), StructField("name", StringType(), True)]
+    )
+
+    def init(self, handle):
+        self.state = handle.getValueState("state", self.state_schema)
+
+    def handleInputRows(self, key, rows, timer_values) -> Iterator[pd.DataFrame]:
+        for pdf in rows:

Review Comment:
   > @bogao007 Do we break the inputs of the same key into multiple Arrow batches, like we do with applyInPandasWithState? I roughly remember we do, but want to double confirm.

   @HeartSaVioR Yes, we do break large input into multiple Arrow batches, the same as in applyInPandasWithState.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
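The behavior confirmed in the thread above — a single key's input may be delivered as several Arrow batches — is why `handleInputRows` iterates over `rows` rather than assuming one DataFrame per key. A minimal pandas-only sketch of that accumulation pattern (the function name, column names, and data here are illustrative, not the PR's actual test code or the pyspark runtime):

```python
import pandas as pd
from typing import Iterator


def handle_input_rows(key: str, rows: Iterator[pd.DataFrame]) -> pd.DataFrame:
    # The runner may split one key's input across multiple Arrow batches,
    # so `rows` can yield more than one DataFrame; accumulate over all of them.
    count = 0
    for pdf in rows:
        count += len(pdf)
    return pd.DataFrame({"id": [key], "countAsString": [str(count)]})


# Two batches for the same key, as the runner might deliver for a large input.
batches = iter(
    [
        pd.DataFrame({"temperature": [10, 20, 30]}),  # first Arrow batch
        pd.DataFrame({"temperature": [40, 50]}),      # second Arrow batch
    ]
)
result = handle_input_rows("0", batches)
print(result["countAsString"].iloc[0])  # → "5"
```

A processor that only read the first DataFrame from `rows` would silently undercount whenever the input for a key exceeds one Arrow batch, which is the subtle point the reviewer was double-checking.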