bogao007 commented on code in PR #49277:
URL: https://github.com/apache/spark/pull/49277#discussion_r1918936950


##########
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py:
##########
@@ -1698,6 +1876,173 @@ def init(self, handle: StatefulProcessorHandle) -> None:
         self.list_state = handle.getListState("listState", key_schema)
 
 
+class BasicProcessor(StatefulProcessor):
+    # Schema definitions
+    state_schema = StructType(
+        [StructField("id", IntegerType(), True), StructField("name", StringType(), True)]
+    )
+
+    def init(self, handle):
+
+        self.state = handle.getValueState("state", self.state_schema)
+
+    def handleInputRows(self, key, rows, timer_values) -> Iterator[pd.DataFrame]:
+        for pdf in rows:

Review Comment:
   > @bogao007 Do we break the inputs of the same key into multiple Arrow batches like we do with applyInPandasWithState? I roughly remember we do, but to double confirm.
   
   @HeartSaVioR Yes, we do break large input into multiple Arrow batches, same as what we do in applyInPandasWithState.
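   A practical consequence of the batching behavior confirmed above: since one key's input may arrive as several Arrow batches, `handleInputRows` receives an *iterator* of pandas DataFrames and must fold over all of them rather than reading only the first. A minimal standalone sketch (no Spark required; the function name `count_rows_for_key` is hypothetical, for illustration only):

   ```python
   import pandas as pd
   from typing import Iterator

   def count_rows_for_key(rows: Iterator[pd.DataFrame]) -> int:
       # Consume every batch for the key; stopping after the
       # first batch would silently undercount large inputs.
       total = 0
       for pdf in rows:
           total += len(pdf)
       return total

   # Simulate one key's input arriving split across two Arrow batches.
   batches = iter([
       pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}),
       pd.DataFrame({"id": [3], "name": ["c"]}),
   ])
   print(count_rows_for_key(batches))  # 3
   ```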



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

