andygrove commented on issue #1204:
URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573676778

We'll still use ScanExec for the shuffle reader, though. The main reason for the
initial batch scan is to determine whether strings are dictionary-encoded. We
then cast all subsequent batches to match the first batch (either unpacking
dictionaries or forcing dictionary encoding). We always unpack dictionaries (in
CopyExec) before a Sort or a Join anyway, so maybe we should just unpack them
directly in ScanExec if there is no performance impact. I experimented with this
before; I don't remember the exact performance impact, but I think it was small.
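To make the two strategies concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Comet's code: real batches are Arrow arrays handled in Rust, and the names `PlainStrings`, `DictStrings`, `normalize_batches` and `unpack_all` are all hypothetical. `normalize_batches` mirrors the current behavior (the first batch decides, and later batches are cast to match), while `unpack_all` mirrors the idea of always unpacking in ScanExec.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class PlainStrings:
    # Hypothetical: one string value stored per row.
    values: List[str]


@dataclass
class DictStrings:
    # Hypothetical: per-row keys that index into a list of distinct values.
    keys: List[int]
    dictionary: List[str]


StringColumn = Union[PlainStrings, DictStrings]


def unpack(col: StringColumn) -> PlainStrings:
    """Materialize a dictionary-encoded column into plain strings."""
    if isinstance(col, DictStrings):
        return PlainStrings([col.dictionary[k] for k in col.keys])
    return col


def force_dictionary(col: StringColumn) -> DictStrings:
    """Dictionary-encode a plain column (each batch gets its own dictionary)."""
    if isinstance(col, PlainStrings):
        index: dict = {}
        keys: List[int] = []
        for v in col.values:
            if v not in index:
                index[v] = len(index)
            keys.append(index[v])
        # Dicts keep insertion order, so this lists values by first appearance.
        return DictStrings(keys, list(index))
    return col


def normalize_batches(batches: List[StringColumn]) -> List[StringColumn]:
    """Current approach (modeled): the first batch decides the encoding."""
    if not batches:
        return []
    first_is_dict = isinstance(batches[0], DictStrings)
    conform = force_dictionary if first_is_dict else unpack
    return [conform(b) for b in batches]


def unpack_all(batches: List[StringColumn]) -> List[PlainStrings]:
    """Proposed alternative (modeled): always unpack in the scan."""
    return [unpack(b) for b in batches]
```

With `unpack_all`, no batch needs to be inspected up front and downstream operators always see one layout, which is roughly what CopyExec already guarantees before a Sort or Join. The cost is materializing repeated strings earlier, which is the performance impact the comment is unsure about.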

