andygrove commented on code in PR #1332:
URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1934442772


##########
native/core/src/execution/operators/scan.rs:
##########
@@ -304,11 +304,7 @@ fn scan_schema(input_batch: &InputBatch, data_types: 
&[DataType]) -> SchemaRef {
                 .map(|(idx, c)| {
                     let datatype = 
ScanExec::unpack_dictionary_type(c.data_type());
                     // We don't use the field name. Put a placeholder.
-                    if matches!(datatype, DataType::Dictionary(_, _)) {
-                        Field::new_dict(format!("col_{}", idx), datatype, 
true, idx as i64, false)

Review Comment:
   There is no performance impact from this change. The original code stored a 
dictionary id in the metadata and the new code does not. This dictionary id is 
actually not used at all in FFI. It was used in Arrow IPC but is no longer used 
as of Arrow 54.0.0 because that feature is now removed and Arrow IPC manages 
its own dictionary ids. We do not use Arrow IPC now because we are using our 
own proprietary encoding.
   
   I will go ahead and run another TPC-H benchmark and post results here today, 
just to confirm there are no regressions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to