viirya commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573695955
Okay. Then it seems we can get rid of the first batch fetch in ScanExec and assign
the scan schema from Spark. I will give it a try.
--
This is an automated message from the Apache
andygrove commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573676778
We'll still use ScanExec for the shuffle reader though. The main reason for the
initial batch scan is to determine if strings are dictionary-encoded or not. We
then cast all
andygrove commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573648804
With the new Parquet POC 1 & 2, we will use ParquetExec instead of the
current ScanExec, so at least for that case the schema will already be known and
we will no longer n
viirya commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573663473
If ScanExec will rarely be used and we would like to use ParquetExec
most of the time, maybe I can just add an internal cast to ScanExec if the schema is
different. Though it m
viirya commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566745420
It seems to be a fundamental rule in DataFusion physical plans. In many places,
the physical schema of child operators is used. So once any difference is found
between physical sche
viirya commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566732827
I will think about whether there is a way to work around the issue.
viirya commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566732654
Okay. I forgot this point. It should be the cause of many test failures in
the draft PR #1203. I think it is a good direction to go, not just for
performance but also for on
andygrove commented on issue #1204:
URL:
https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566726756
One of the challenges related to this is that we start executing `ScanExec`
during construction, as part of query planning. If we want to share one plan
per executor (whi
viirya opened a new issue, #1204:
URL: https://github.com/apache/datafusion-comet/issues/1204
### What is the problem the feature request solves?
Currently we create a native plan per task for a query. Even if those tasks are
on the same executor, they still have separate native plans. For t