andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573676778
We'll still use ScanExec for the shuffle reader, though. The main reason for the initial batch scan is to determine whether strings are dictionary-encoded. We then cast all batches to match the first batch, either by unpacking dictionaries or by forcing dictionary encoding. Since we always unpack dictionaries (in CopyExec) before a Sort or a Join anyway, maybe we should just unpack them directly in ScanExec if there is no performance impact. I experimented with this before; I don't remember exactly what the performance impact was, but I think it was small.
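The first-batch rule described above can be modeled with a small, self-contained Python sketch. It contrasts the current behavior (the first batch decides whether the stream is dictionary-encoded or plain) with the proposed alternative (always unpack in the scan). `PlainStrings`, `DictStrings`, `normalize_to_first`, and `unpack_all` are hypothetical names for illustration only, not Comet or Arrow APIs; real Arrow dictionary arrays also carry null bitmaps and typed key widths, which this model omits.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union


@dataclass
class PlainStrings:
    values: List[str]  # one string per row


@dataclass
class DictStrings:
    keys: List[int]        # per-row index into `dictionary`
    dictionary: List[str]  # distinct values


Column = Union[PlainStrings, DictStrings]


def unpack(col: Column) -> PlainStrings:
    """Materialize dictionary-encoded strings into a plain column."""
    if isinstance(col, DictStrings):
        return PlainStrings([col.dictionary[k] for k in col.keys])
    return col


def dictionary_encode(col: Column) -> DictStrings:
    """Force dictionary encoding, assigning keys in first-seen order."""
    if isinstance(col, DictStrings):
        return col
    index = {}
    keys = []
    for v in col.values:
        if v not in index:
            index[v] = len(index)
        keys.append(index[v])
    return DictStrings(keys, list(index))  # dicts keep insertion order


def normalize_to_first(batches: Iterable[Column]) -> Iterator[Column]:
    """Current approach: peek at the first batch, then cast every batch to its encoding."""
    it = iter(batches)
    first = next(it, None)
    if first is None:
        return
    cast = dictionary_encode if isinstance(first, DictStrings) else unpack
    yield cast(first)
    for batch in it:
        yield cast(batch)


def unpack_all(batches: Iterable[Column]) -> List[PlainStrings]:
    """Proposed alternative: always unpack in the scan; no first-batch inspection."""
    return [unpack(b) for b in batches]
```

With the first-batch rule, a plain batch that arrives after a dictionary-encoded first batch has to be re-encoded, and the reverse case has to be unpacked. Unpacking unconditionally removes that branch and the need to inspect the first batch, at the cost of materializing every string value in the scan.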