Re: [PR] chore: Prepare for DataFusion 45 (bump to DataFusion rev 5592834 + Arrow 54.0.0) [datafusion-comet]

via GitHub Wed, 29 Jan 2025 11:16:22 -0800


andygrove commented on code in PR #1332:
URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1934442772



##########
native/core/src/execution/operators/scan.rs:
##########
@@ -304,11 +304,7 @@ fn scan_schema(input_batch: &InputBatch, data_types: 
&[DataType]) -> SchemaRef {
                 .map(|(idx, c)| {
                     let datatype = 
ScanExec::unpack_dictionary_type(c.data_type());
                     // We don't use the field name. Put a placeholder.
-                    if matches!(datatype, DataType::Dictionary(_, _)) {
-                        Field::new_dict(format!("col_{}", idx), datatype, 
true, idx as i64, false)

Review Comment:
   There is no performance impact from this change. The original code stored a 
dictionary id in the metadata and the new code does not. This dictionary id is 
actually not used at all in FFI. It was used in Arrow IPC but is no longer used 
as of Arrow 54.0.0 because that feature is now removed and Arrow IPC manages 
its own dictionary ids. We do not use Arrow IPC now because we are using our 
own proprietary encoding.
   
   I will go ahead and run another TPC-H benchmark and post results here today, 
just to confirm there are no regressions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] chore: Prepare for DataFusion 45 (bump to DataFusion rev 5592834 + Arrow 54.0.0) [datafusion-comet]

Reply via email to