Hi Aljoscha,

I thought about relying on the failover mechanism to re-execute the whole graph when the cache doesn't exist. My only concern is that every job that uses the cache table in a per-job cluster would have to go through the following process: job submit -> job fails because the intermediate result doesn't exist -> fall back to the original DAG -> job resubmit. That might not be ideal. One way to avoid this is to let the CatalogManager probe the existence of the IntermediateResult up front, so that the planner can decide whether the cache table should be used.
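To make the idea concrete, here is a minimal sketch of the probe-before-plan approach. All names below (IntermediateResultStore, plan) are illustrative stand-ins, not actual Flink APIs; the real check would go through the CatalogManager and the cluster's intermediate-result tracking.

```java
import java.util.HashSet;
import java.util.Set;

public class CacheProbeSketch {
    /** Hypothetical stand-in for the CatalogManager's view of live intermediate results. */
    static class IntermediateResultStore {
        private final Set<String> cached = new HashSet<>();
        void register(String tableId) { cached.add(tableId); }
        boolean exists(String tableId) { return cached.contains(tableId); }
    }

    /**
     * Planner-side decision: reuse the cache only if the intermediate result
     * still exists. Probing up front avoids the
     * submit -> fail -> fall back -> resubmit round trip.
     */
    static String plan(IntermediateResultStore store, String tableId) {
        return store.exists(tableId) ? "READ_CACHE:" + tableId : "ORIGIN_DAG:" + tableId;
    }

    public static void main(String[] args) {
        IntermediateResultStore store = new IntermediateResultStore();
        // Cache missing (e.g. per-job mode): plan against the original DAG.
        System.out.println(plan(store, "t1"));
        store.register("t1");
        // Cache present (session mode, result materialized): read the cache.
        System.out.println(plan(store, "t1"));
    }
}
```

With this, per-job mode needs no special-casing in the execution environment: the probe simply never finds the result, and the planner falls back to the original DAG without a failed submission.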
Best,
Xuannan

On Sep 15, 2020, 3:28 PM +0800, Aljoscha Krettek <aljos...@apache.org>, wrote:
> On 15.09.20 07:00, Xuannan Su wrote:
> > Thanks for your comment. I agree that we should not introduce tight
> > coupling with PipelineExecutor to the execution environment. With that in
> > mind, to distinguish the per-job and session mode, we can introduce a new
> > method, named isPerJobModeExecutor, in the PipelineExecutorFactory or
> > PipelineExecutor so that the execution environment can recognize per-job
> > mode without instanceof checks on the PipelineExecutor. What do you think?
> > Any thoughts or suggestions are very welcome.
>
> I think this would just sidestep the problem. Can't we just ignore it?
> With per-job mode the cache will not be there and the program will fall
> back to re-execute the whole graph. That shouldn't be a problem.
>
> Best,
> Aljoscha