Sam Whittle created BEAM-11400:
----------------------------------
Summary: StreamingDataflowWorker stuck commits logic triggers
exceptions if commits eventually complete
Key: BEAM-11400
URL: https://issues.apache.org/jira/browse/BEAM-11400
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Sam Whittle
Assignee: Sam Whittle
Commits that have not completed within a timeout are treated as stuck or lost
and are cancelled, which shows up in the logs as:
Detected key with sharding key -6893288510319386341 stuck in COMMITTING state,
completing it with error.
However, if the commit was not lost but merely very slow, the following error
occurs when it eventually does complete:
Exception while processing commit response {}
java.lang.NullPointerException
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
    at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)
The exception is raised on the commit stream, which finishes processing the
current batch of responses and then throws. The stream therefore completes
with an error, and all of the other commits on it are resent. If the stream
carried a large number of commits, progress is slow: only a couple complete
before everything is retried again. That slowdown can push further commits
past the timeout, creating a feedback loop.
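
The late-completion failure described above can be modelled in a few lines.
This is only a sketch under stated assumptions, not the actual
StreamingDataflowWorker code: ActiveWorkSketch and all of its methods are
hypothetical names, and the active-work bookkeeping is reduced to a map from
sharding key to work token. It contrasts a strict completion path, which throws
once the stuck-commit logic has already dropped the key, with one possible
tolerant path that ignores the late response.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for per-computation active-work tracking; not Beam's real class. */
final class ActiveWorkSketch {
  // Sharding key -> token of the work item currently in flight for that key.
  private final Map<Long, Long> activeWork = new HashMap<>();

  synchronized void activate(long shardingKey, long workToken) {
    activeWork.put(shardingKey, workToken);
  }

  /** Models the stuck-commit path: after the timeout the entry is dropped. */
  synchronized void failStuckCommit(long shardingKey) {
    activeWork.remove(shardingKey);
  }

  /** Strict completion: throws NullPointerException if the entry is already gone. */
  synchronized void completeWorkStrict(long shardingKey, long workToken) {
    Long active =
        Objects.requireNonNull(activeWork.get(shardingKey), "no active work for key");
    if (active == workToken) {
      activeWork.remove(shardingKey);
    }
  }

  /** Tolerant completion: a late response for an unknown or replaced key is ignored. */
  synchronized boolean completeWorkTolerant(long shardingKey, long workToken) {
    Long active = activeWork.get(shardingKey);
    if (active == null || active != workToken) {
      return false; // Already cancelled as stuck, or superseded by newer work.
    }
    activeWork.remove(shardingKey);
    return true;
  }
}
```

With the tolerant variant, a slow commit that completes after being cancelled
returns false instead of throwing, so in this model it would not need to fail
the rest of the batch. Whether that matches the eventual fix is not stated in
this issue.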