Re: [PR] fix(python/sedonadb): Ensure that Python UDFs executing with >1 batch do not cause deadlock [sedona-db]

via GitHub Fri, 30 Jan 2026 11:39:43 -0800


zhangfengcdt commented on code in PR #558:
URL: https://github.com/apache/sedona-db/pull/558#discussion_r2747699094



##########
python/sedonadb/python/sedonadb/dataframe.py:
##########
@@ -253,10 +253,9 @@ def to_arrow_table(self, schema: Any = None) -> "pyarrow.Table":
         import pyarrow as pa
         import geoarrow.pyarrow  # noqa: F401
 
-        if schema is None:
-            return pa.table(self)
-        else:
-            return pa.table(self, schema=pa.schema(schema))
+        # Collects all batches into an object that exposes __arrow_c_stream__()
+        batches = self._impl.to_batches(schema)
+        return pa.table(batches)

Review Comment:
   I assume we do not need to pass schema to pa.table anymore.



##########
python/sedonadb/src/dataframe.rs:
##########
@@ -289,3 +309,51 @@ impl InternalDataFrame {
         Ok(PyCapsule::new(py, ffi_stream, Some(stream_capsule_name))?)
     }
 }
+
+#[pyclass]
+pub struct Batches {
+    schema: SchemaRef,
+    batches: Vec<RecordBatch>,
+    count: usize,
+}
+
+#[pymethods]
+impl Batches {
+    fn __len__(&self) -> usize {
+        self.count

Review Comment:
   Is this the number of rows or the number of batches? It looks like __len__ 
is expected to be the latter.
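
   To illustrate the distinction raised above, here is a minimal Python sketch, not the PR's implementation. `RecordBatchStub`, `BatchesSketch`, `__iter__`, and `num_rows` are hypothetical names introduced only for illustration; the diff itself shows only `__len__` returning `self.count`. The sketch assumes the container is meant to behave like a sequence of batches, in which case `len()` would report the number of batches and the row total would live under an explicit name.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class RecordBatchStub:
    """Hypothetical stand-in for an Arrow RecordBatch; tracks only its row count."""

    num_rows: int


@dataclass
class BatchesSketch:
    """Hypothetical Python mirror of the Rust `Batches` struct shown in the diff."""

    batches: List[RecordBatchStub] = field(default_factory=list)

    def __len__(self) -> int:
        # Container convention: len() matches the number of items that
        # iteration yields, which here are batches rather than rows.
        return len(self.batches)

    def __iter__(self) -> Iterator[RecordBatchStub]:
        return iter(self.batches)

    @property
    def num_rows(self) -> int:
        # The row total gets an explicit name so it cannot be confused with len().
        return sum(batch.num_rows for batch in self.batches)


collected = BatchesSketch([RecordBatchStub(3), RecordBatchStub(5)])
print(len(collected), collected.num_rows)
```

   If `count` in the Rust struct is actually the row total, renaming it (for example to something like `num_rows`) would make the intent of `__len__` easier to read.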



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix(python/sedonadb): Ensure that Python UDFs executing with >1 batch do not cause deadlock [sedona-db]
