Sam Whittle created BEAM-11400:
----------------------------------

             Summary: StreamingDataflowWorker stuck commits logic triggers exceptions if commits eventually complete
                 Key: BEAM-11400
                 URL: https://issues.apache.org/jira/browse/BEAM-11400
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Sam Whittle
            Assignee: Sam Whittle


Commits that have not completed within a timeout are treated as stuck and 
lost, and are cancelled. This shows up in the logs as:
Detected key with sharding key -6893288510319386341 stuck in COMMITTING state, completing it with error.

However, if the commit was not lost but just very slow, the following error 
occurs when it eventually does complete:

Exception while processing commit response {}
java.lang.NullPointerException
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
        at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$ComputationState.completeWork(StreamingDataflowWorker.java:2246)

This occurs on the commit stream, which finishes processing the current batch 
of responses but then throws the error. The stream therefore completes with an 
error, and all of the other outstanding commits are resent. If there are a 
large number of commits on the stream, we make slow progress, completing only 
a few before retrying everything again. This slowdown can cause further 
commits to exceed the timeout, creating a feedback loop.
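
To illustrate the failure mode described above, here is a minimal sketch of 
one way a late commit response could be tolerated instead of triggering a 
NullPointerException. It is not the actual StreamingDataflowWorker code or the 
fix applied in Beam. The class TolerantCommitTracker and its methods 
startCommit, failStuckCommit and completeWork are hypothetical names invented 
for this example, and the per-key queue of work tokens is an assumed 
simplification of the worker's state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for per-key commit tracking; not Beam code. */
class TolerantCommitTracker {
  // Sharding key -> work tokens queued for that key; the head is being committed.
  private final Map<Long, Deque<Long>> activeWork = new HashMap<>();

  /** Records that a commit for (shardingKey, workToken) is in flight. */
  synchronized void startCommit(long shardingKey, long workToken) {
    activeWork.computeIfAbsent(shardingKey, k -> new ArrayDeque<>()).addLast(workToken);
  }

  /** Mimics the stuck-commit sweep: the key's state is dropped after the timeout. */
  synchronized void failStuckCommit(long shardingKey) {
    activeWork.remove(shardingKey);
  }

  /**
   * Handles a commit response. Returns true if it matched in-flight work, and false
   * if the work was already failed as stuck (or is otherwise unknown), rather than
   * throwing and failing the whole stream.
   */
  synchronized boolean completeWork(long shardingKey, long workToken) {
    Deque<Long> queue = activeWork.get(shardingKey);
    if (queue == null) {
      return false; // Late response for a key that was already cleaned up.
    }
    Long head = queue.peekFirst();
    if (head == null || head.longValue() != workToken) {
      return false; // Response does not match the work currently being committed.
    }
    queue.pollFirst();
    if (queue.isEmpty()) {
      activeWork.remove(shardingKey);
    }
    return true;
  }

  public static void main(String[] args) {
    TolerantCommitTracker tracker = new TolerantCommitTracker();
    tracker.startCommit(1L, 100L);
    tracker.failStuckCommit(1L);
    System.out.println(tracker.completeWork(1L, 100L)); // late response is ignored
    tracker.startCommit(2L, 200L);
    System.out.println(tracker.completeWork(2L, 200L)); // normal completion
  }
}
```

In this sketch the caller can log and skip a false result, so a single slow 
commit no longer fails the stream and forces the remaining commits to be resent.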



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
