andygrove commented on code in PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#discussion_r1934442772
########## native/core/src/execution/operators/scan.rs: ########## @@ -304,11 +304,7 @@ fn scan_schema(input_batch: &InputBatch, data_types: &[DataType]) -> SchemaRef { .map(|(idx, c)| { let datatype = ScanExec::unpack_dictionary_type(c.data_type()); // We don't use the field name. Put a placeholder. - if matches!(datatype, DataType::Dictionary(_, _)) { - Field::new_dict(format!("col_{}", idx), datatype, true, idx as i64, false) Review Comment: There is no performance impact from this change. The original code stored a dictionary id in the metadata and the new code does not. This dictionary id is actually not used at all in FFI. It was used in Arrow IPC but is no longer used as of Arrow 54.0.0 because that feature is now removed and Arrow IPC manages its own dictionary ids. We do not use Arrow IPC now because we are using our own proprietary encoding. I will go ahead and run another TPC-H benchmark and post results here today, just to confirm there are no regressions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org