Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573695955 Okay. Then it seems we can get rid of the first batch fetch in ScanExec and assign the scan schema from Spark. I will give it a try. -- This is an automated message from the Apache

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573676778 We'll still use ScanExec for the shuffle reader though. The main reason for the initial batch scan is to determine whether strings are dictionary-encoded. We then cast all

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573648804 With the new Parquet POC 1 & 2, we will use ParquetExec instead of the current ScanExec, so at least for that case the schema will already be known and we will no longer n

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2025-01-06 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2573663473 If ScanExec will rarely be used and we would like to use ParquetExec most of the time, maybe I can just add an internal cast to ScanExec if the schema is different. Though it m

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2024-12-31 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566745420 It seems to be a fundamental rule in DataFusion physical plans. In many places, the physical schema of child operators is used. So once any difference is found between physical sche

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2024-12-31 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566732827 I will think about whether there is a way to work around the issue.

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2024-12-31 Thread via GitHub
viirya commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566732654 Okay. I forgot this point. It should be the cause of many test failures in the draft PR #1203. I think it is a good direction to go, not just for performance but also for on

Re: [I] Only create one native plan for a query on an executor [datafusion-comet]

2024-12-31 Thread via GitHub
andygrove commented on issue #1204: URL: https://github.com/apache/datafusion-comet/issues/1204#issuecomment-2566726756 One of the challenges related to this is that we start executing `ScanExec` during construction, as part of query planning. If we want to share one plan per executor (whi

[I] Only create one native plan for a query on an executor [datafusion-comet]

2024-12-28 Thread via GitHub
viirya opened a new issue, #1204: URL: https://github.com/apache/datafusion-comet/issues/1204 ### What is the problem the feature request solves? Currently we create a native plan per task for a query. Even if those tasks are on the same executor, they still have separate native plans. For t