kylebarron commented on code in PR #825:
URL: https://github.com/apache/datafusion-python/pull/825#discussion_r1731868278


##########
src/dataframe.rs:
##########
@@ -539,3 +579,78 @@ fn print_dataframe(py: Python, df: DataFrame) -> PyResult<()> {
     print.call1((result,))?;
     Ok(())
 }
+
+fn project_schema(from_schema: Schema, to_schema: Schema) -> Result<Schema, ArrowError> {
+    let merged_schema = Schema::try_merge(vec![from_schema, to_schema.clone()])?;
+
+    let project_indices: Vec<usize> = to_schema
+        .fields
+        .iter()
+        .map(|field| field.name())
+        .filter_map(|field_name| merged_schema.index_of(field_name).ok())
+        .collect();
+
+    merged_schema.project(&project_indices)
+}
+
+fn record_batch_into_schema(

Review Comment:
   Well, there is `cast`. It works on struct arrays, so you could make a 
simple wrapper around `cast` to work on `RecordBatch` by creating a struct 
array from the record batch. [This is what I do in 
pyo3-arrow](https://github.com/kylebarron/arro3/blob/c1cdb51527cb769694a4a400a8064a5e5a47290a/pyo3-arrow/src/ffi/to_python/utils.rs#L85-L87).
   
   The main difference is that cast doesn't _also_ project. It's not clear to 
me whether the PyCapsule Interface intends to support projection or not. I 
don't think anyone has asked.
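
   To make the cast-versus-project distinction concrete, here is a minimal, dependency-free sketch. Everything in it (`Column`, `Kind`, `Batch`, `cast_batch`, `project_and_cast`) is a hypothetical toy stand-in, not the arrow-rs API. It also assumes that a cast keeps the column count and converts position by position, which is only one reading of "cast doesn't also project" and not a statement about how arrow's `cast` is implemented.

```rust
/// Toy column: a stand-in for an Arrow array (hypothetical, not arrow-rs).
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int(Vec<i64>),
    Float(Vec<f64>),
}

/// Toy data type tag for a target field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Int,
    Float,
}

/// Toy record batch: parallel vectors of field names and columns.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub names: Vec<String>,
    pub columns: Vec<Column>,
}

/// Convert one column to the requested kind; same-kind casts are a plain copy.
fn cast_column(column: &Column, kind: Kind) -> Column {
    match (column, kind) {
        (Column::Int(v), Kind::Float) => Column::Float(v.iter().map(|&x| x as f64).collect()),
        (Column::Float(v), Kind::Int) => Column::Int(v.iter().map(|&x| x as i64).collect()),
        (other, _) => other.clone(),
    }
}

/// Cast-like behaviour (assumed): the column count must match, and columns
/// are converted position by position. No column is dropped or reordered.
pub fn cast_batch(batch: &Batch, target: &[(&str, Kind)]) -> Result<Batch, String> {
    if batch.columns.len() != target.len() {
        return Err(format!(
            "cast cannot change column count: {} -> {}",
            batch.columns.len(),
            target.len()
        ));
    }
    let columns = batch
        .columns
        .iter()
        .zip(target)
        .map(|(column, (_, kind))| cast_column(column, *kind))
        .collect();
    let names = target.iter().map(|(name, _)| name.to_string()).collect();
    Ok(Batch { names, columns })
}

/// Projection plus cast: look up each target field by name, then convert it.
/// Columns that the target does not mention are dropped.
pub fn project_and_cast(batch: &Batch, target: &[(&str, Kind)]) -> Result<Batch, String> {
    let mut names = Vec::with_capacity(target.len());
    let mut columns = Vec::with_capacity(target.len());
    for (name, kind) in target {
        let index = batch
            .names
            .iter()
            .position(|existing| existing.as_str() == *name)
            .ok_or_else(|| format!("column `{name}` not found"))?;
        names.push(name.to_string());
        columns.push(cast_column(&batch.columns[index], *kind));
    }
    Ok(Batch { names, columns })
}
```

   In this toy model, asking `cast_batch` for a single-column target on a two-column batch is an error, while `project_and_cast` selects the named column and converts it. That gap is the extra behaviour `record_batch_into_schema` would need on top of a plain cast wrapper.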



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

