milenkovicm commented on issue #17297:
URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3228944531

   Ballista will create cache partitions locally on some of the executors. 
Handling the physical part of execution is specific to the implementor, such as 
Ballista in this case.
   
   Overall flow would be something like:
   
   1. Ballista receives `LogicalPlan::Cache`
   2. `LogicalPlan::Cache` is then converted to `BallistaCacheReadExec`; this 
part is handled by `BallistaQueryPlanner`. (There is a bit more to it: a job 
can be started which would create the cache with `BallistaCacheWriteExec` if it 
is not materialised already, the task will stay pending while the cache creation 
job is in progress, cache partitions can be re-created in case of node failures ...)  
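
The flow above can be sketched roughly as follows. This is only a toy model under stated assumptions: the `LogicalPlan`, `PhysicalPlan` and `BallistaQueryPlanner` types below are simplified stand-ins written for illustration, not the real DataFusion or Ballista types, and `CacheState`, `is_ready`, `mark_materialised` and `invalidate` are hypothetical names.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration; not the real DataFusion/Ballista types.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan(String),
    Cache { id: String, input: Box<LogicalPlan> },
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalPlan {
    ScanExec(String),
    BallistaCacheReadExec { id: String },
    BallistaCacheWriteExec { id: String, input: Box<PhysicalPlan> },
}

// Hypothetical bookkeeping for each cache known to the scheduler.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheState {
    Creating,
    Materialised,
}

#[derive(Default)]
struct BallistaQueryPlanner {
    caches: HashMap<String, CacheState>,
    // Cache-creation jobs that were started as a side effect of planning.
    submitted_jobs: Vec<PhysicalPlan>,
}

impl BallistaQueryPlanner {
    // Step 2: a Cache node always becomes a read; a write job is started
    // only if the cache is not known (not materialised and not in progress).
    fn plan(&mut self, plan: &LogicalPlan) -> PhysicalPlan {
        match plan {
            LogicalPlan::Scan(table) => PhysicalPlan::ScanExec(table.clone()),
            LogicalPlan::Cache { id, input } => {
                if !self.caches.contains_key(id) {
                    let child = self.plan(input);
                    self.submitted_jobs.push(PhysicalPlan::BallistaCacheWriteExec {
                        id: id.clone(),
                        input: Box::new(child),
                    });
                    self.caches.insert(id.clone(), CacheState::Creating);
                }
                PhysicalPlan::BallistaCacheReadExec { id: id.clone() }
            }
        }
    }

    // A task reading a cache stays pending until the cache is materialised.
    fn is_ready(&self, plan: &PhysicalPlan) -> bool {
        match plan {
            PhysicalPlan::ScanExec(_) => true,
            PhysicalPlan::BallistaCacheReadExec { id } => {
                self.caches.get(id) == Some(&CacheState::Materialised)
            }
            PhysicalPlan::BallistaCacheWriteExec { input, .. } => self.is_ready(input),
        }
    }

    // Called when the cache-creation job finishes.
    fn mark_materialised(&mut self, id: &str) {
        self.caches.insert(id.to_string(), CacheState::Materialised);
    }

    // Called on node failure: forgetting the cache means the next plan
    // that references it starts a new write job.
    fn invalidate(&mut self, id: &str) {
        self.caches.remove(id);
    }
}
```

In this sketch, planning the same `Cache` node twice starts only one write job, and calling `invalidate` models re-creating cache partitions after a node failure.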
   
   As the physical execution is tied to an external system (the cache 
implementer), I believe we do not need to bring `CachePhysicalExec` to 
DataFusion; we just need to provide `LogicalPlan::Cache`. 
   
   With the proposed solution we would be able to keep the current behaviour, 
or delegate cache handling to the external system if we wish. So if the user 
disables the local cache (`datafusion.execution.local_cache=false`), it would 
be up to them to provide a query planner which knows how to handle 
`LogicalPlan::Cache`.
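
A minimal sketch of that delegation, assuming a boolean `local_cache` setting mirrors the proposed config key. The `QueryPlanner` trait here is a simplified stand-in rather than DataFusion's actual trait, and `DefaultPlanner`, `ExternalCachePlanner`, `LocalCacheExec` and `ExternalCacheReadExec` are hypothetical names used only for illustration.

```rust
// Toy stand-ins for illustration; not the real DataFusion types.
#[derive(Debug, PartialEq)]
enum LogicalPlan {
    Scan,
    Cache(Box<LogicalPlan>),
}

// Simplified planner interface: returns a textual physical plan or an error.
trait QueryPlanner {
    fn plan(&self, plan: &LogicalPlan) -> Result<String, String>;
}

// Built-in planner: handles Cache itself only while the local cache is enabled.
struct DefaultPlanner {
    local_cache: bool,
}

impl QueryPlanner for DefaultPlanner {
    fn plan(&self, plan: &LogicalPlan) -> Result<String, String> {
        match plan {
            LogicalPlan::Scan => Ok("ScanExec".to_string()),
            LogicalPlan::Cache(input) if self.local_cache => {
                Ok(format!("LocalCacheExec({})", self.plan(input)?))
            }
            // With the local cache disabled, the user must supply a planner.
            LogicalPlan::Cache(_) => Err(
                "LogicalPlan::Cache needs a custom query planner when local_cache=false"
                    .to_string(),
            ),
        }
    }
}

// User-provided planner that delegates caching to an external system.
struct ExternalCachePlanner;

impl QueryPlanner for ExternalCachePlanner {
    fn plan(&self, plan: &LogicalPlan) -> Result<String, String> {
        match plan {
            LogicalPlan::Scan => Ok("ScanExec".to_string()),
            LogicalPlan::Cache(_) => Ok("ExternalCacheReadExec".to_string()),
        }
    }
}
```

The design choice illustrated is that DataFusion would own only the logical node, while the physical operator comes either from the built-in path or from whichever planner the user registers.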
   
   It would probably make more sense to name the option 
`datafusion.execution.external_cache=false` rather than 
`datafusion.execution.local_cache=false`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

