On 16/05/2019 11:34, Piotr Nowojski wrote:
Luckily it seems like those four issues/proposals could be
implemented/discussed independently or in stages.
I fully agree, and believe we should split this thread. We will end up
discussing too many issues at once.
Nevertheless,
On 16/05/2019 11:34, Piotr Nowojski wrote:
1. Do we currently account state restore as “RUNNING”? If yes, this might be
incorrect from your perspective.
I don't believe we do.
The Task state is set to running on the TM once the Invokable has been
instantiated, but at that point we aren't even on the Streaming API
level and hence haven't loaded anything. AFAIK this is all done in
StreamTask#invoke which is called afterwards.
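As a rough illustration of the ordering described above, here is a toy model. It is not Flink's actual Task or StreamTask code: the class, interface, and log messages are all hypothetical and only record the sequence of events as I understand it.

```java
import java.util.ArrayList;
import java.util.List;

class TaskLifecycleSketch {
    // Hypothetical stand-in for an invokable; not Flink's real interface.
    interface Invokable {
        void invoke(List<String> log);
    }

    // Records the order of events for a single task, as described in the mail.
    static List<String> runTask() {
        List<String> log = new ArrayList<>();

        // The invokable is instantiated first; nothing has been loaded yet.
        Invokable invokable = l -> {
            l.add("restore state");    // assumed to happen inside invoke()
            l.add("process records");
        };
        log.add("invokable instantiated");

        // The task state flips to RUNNING before invoke() is called.
        log.add("task state -> RUNNING");

        // Only now does the streaming-level work (including restore) start.
        invokable.invoke(log);
        return log;
    }

    public static void main(String[] args) {
        runTask().forEach(System.out::println);
    }
}
```

The only point of the sketch is the relative order of the "task state -> RUNNING" and "restore state" entries.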
On 16/05/2019 11:34, Piotr Nowojski wrote:
2a. This might be more tricky if various Tasks are in various stages. For
example, in streaming it should be safe to assume that the state of the job is the
“minimum” of its Tasks’ states, so the Job should be accounted as RUNNING only if
all of the Tasks are either RUNNING or COMPLETED.
2b. However in batch - including DataStream jobs running against bounded data
streams, like Blink SQL - this might be more tricky, since there are ongoing
efforts to schedule parts of the job graph in stages. For example, do not
schedule the probe side of the join until the build side is done/completed.
I have my doubts that there's anything we can/should do here. The job
state works the way it does; I'd rather not change it now with so much
work on the scheduler going on, nor would I want metrics to report
something that is not in line with what is logged.
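
For reference, the "minimum of the Tasks' states" rule from 2a could be sketched roughly as below. This is only a sketch of the proposed rule under stated assumptions, not how the job state is computed today: the TaskState enum is a simplified, hypothetical stand-in for the real task states, and treating a job with no tasks as not running is my own assumption.

```java
import java.util.List;

class JobStateSketch {
    // Simplified, hypothetical task states; not Flink's actual enum.
    enum TaskState { CREATED, DEPLOYING, RUNNING, COMPLETED, FAILED }

    // Rule from 2a: the job counts as RUNNING only if every task is
    // either RUNNING or COMPLETED.
    static boolean isJobRunning(List<TaskState> taskStates) {
        if (taskStates.isEmpty()) {
            return false; // assumption: a job without tasks is not running
        }
        return taskStates.stream()
                .allMatch(s -> s == TaskState.RUNNING || s == TaskState.COMPLETED);
    }

    public static void main(String[] args) {
        System.out.println(isJobRunning(
                List.of(TaskState.RUNNING, TaskState.COMPLETED)));
        System.out.println(isJobRunning(
                List.of(TaskState.RUNNING, TaskState.DEPLOYING)));
    }
}
```

Note that this covers only the streaming case; with staged scheduling as in 2b, tasks that are deliberately not yet scheduled would keep such a check false for most of the job's lifetime.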