That is quite strange. The timer ordering tests were quite stable on
DirectRunner. Prior to the fix it failed consistently. Dataflow on the
other hand seems to consistently pass.
On 10/31/19 6:28 PM, Kenneth Knowles wrote:
Hmm, classical Dataflow should fail.
- all user timers in a bundle processed first:
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L353
- processed in a loop that drains the StepContext:
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L451
- the context just feeds the iterable for the current bundle (no
priority queue of newly set timers):
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java#L550
Looks like we need some more tests.
Kenn
On Thu, Oct 31, 2019 at 10:06 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi,
just today I noticed failures on portable dataflow [1] [2].
"Classical" dataflow seems to pass.
Jan
[1] https://issues.apache.org/jira/browse/BEAM-8530
[2] https://github.com/apache/beam/pull/9951
On 10/31/19 5:29 PM, Reuven Lax wrote:
Have you seen these failures on Dataflow as well? From code
examination I would expect Dataflow to have some bugs in this
area as well (especially if a timer is set while processing a
bundle). If the tests are passing on Dataflow this might mean
that we need different tests (or it might mean that Dataflow is
"working" for some mysterious reason that is not obvious from the
code :) ).
On Wed, Oct 23, 2019 at 2:54 AM Jan Lukavský <je...@seznam.cz
<mailto:je...@seznam.cz>> wrote:
Hi,
as part of [1] a new set of validatesRunner tests has been
introduced.
These tests (currently marked as category
UsesStrictTimerOrdering)
verify that runners fire timers in increasing timestamp under
all
circumstances. After adding these validatesRunner tests,
Samza [2] and
Portable Flink [3] started to fail these tests. I have
created the
tracking issues for that, because that behavior should be
fixed (timers
in wrong order can cause erratic behavior and/or data loss).
I'm writing to anyone interested in solving these issues.
Cheers,
Jan
[1] https://issues.apache.org/jira/browse/BEAM-7520
[2] https://issues.apache.org/jira/browse/BEAM-8459
[3] https://issues.apache.org/jira/browse/BEAM-8460