Hi Aljoscha,

I thought about relying on the failover mechanism to re-execute the whole graph 
when the cache doesn't exist. My only concern is that every job that uses the 
cache table in a per-job cluster would have to go through the following process:
job submit -> job fails because the intermediate result doesn't exist -> fall 
back to the original DAG -> job re-submit
which might not be ideal.
One way to solve this is to let the CatalogManager probe the existence of the 
IntermediateResult, so that the planner can decide whether the cache table 
should be used.
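To make the idea concrete, here is a minimal sketch of that probe-then-plan step. All names here (IntermediateResultRegistry, CatalogManager, planSource, register) are illustrative stand-ins, not actual Flink APIs; the point is only that the existence check happens before planning, so a per-job cluster never submits a plan that reads a cache that was never produced.

```java
// Hypothetical sketch -- none of these classes are real Flink APIs.
import java.util.HashSet;
import java.util.Set;

public class CacheProbeSketch {
    // Stand-in for a cluster-side registry of completed intermediate results.
    static class IntermediateResultRegistry {
        private final Set<String> completed = new HashSet<>();
        void register(String resultId) { completed.add(resultId); }
        boolean contains(String resultId) { return completed.contains(resultId); }
    }

    // Stand-in for the CatalogManager: before planning, probe whether the
    // cached intermediate result exists, so the planner can choose between
    // reading the cache and re-executing the original DAG.
    static class CatalogManager {
        private final IntermediateResultRegistry registry;
        CatalogManager(IntermediateResultRegistry registry) { this.registry = registry; }

        String planSource(String cachedResultId) {
            return registry.contains(cachedResultId)
                    ? "read-cache:" + cachedResultId
                    : "re-execute-origin-dag";
        }
    }

    public static void main(String[] args) {
        IntermediateResultRegistry registry = new IntermediateResultRegistry();
        CatalogManager catalogManager = new CatalogManager(registry);

        // Per-job cluster: the cache was never produced, so the planner
        // falls back to the original DAG up front, with no failed submission.
        System.out.println(catalogManager.planSource("result-1"));

        // Session cluster: a previous job registered the result; reuse it.
        registry.register("result-1");
        System.out.println(catalogManager.planSource("result-1"));
    }
}
```

With this shape, the fallback decision moves from runtime failover to planning time, which avoids the submit-fail-resubmit round trip described above.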

Best,
Xuannan
On Sep 15, 2020, 3:28 PM +0800, Aljoscha Krettek <aljos...@apache.org>, wrote:
> On 15.09.20 07:00, Xuannan Su wrote:
> > Thanks for your comment. I agree that we should not introduce tight 
> > coupling with PipelineExecutor to the execution environment. With that in 
> > mind, to distinguish the per-job and session mode, we can introduce a new 
> > method, e.g. isPerJobModeExecutor, in the PipelineExecutorFactory or 
> > PipelineExecutor, so that the execution environment can recognize per-job 
> > mode without instanceof checks on the PipelineExecutor. What do you think? 
> > Any thoughts or suggestions are very welcome.
>
> I think this would just sidestep the problem. Can't we just ignore it?
> With per-job mode the cache will not be there and the program will fall
> back to re-execute the whole graph. That shouldn't be a problem.
>
> Best,
> Aljoscha
>
