That is quite strange. The timer ordering tests were quite stable on DirectRunner. Prior to the fix it failed consistently. Dataflow on the other hand seems to consistently pass.

On 10/31/19 6:28 PM, Kenneth Knowles wrote:
Hmm, classical Dataflow should fail.

 - all user timers in a bundle processed first: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L353  - processed in a loop that drains the StepContext: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L451  - the context just feeds the iterable for the current bundle (no priority queue of newly set timers): https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java#L550

Looks like we need some more tests.

Kenn

On Thu, Oct 31, 2019 at 10:06 AM Jan Lukavský <je...@seznam.cz <mailto:je...@seznam.cz>> wrote:

    Hi,

    just today I noticed failures on portable dataflow [1] [2].
    "Classical" dataflow seems to pass.

    Jan

    [1] https://issues.apache.org/jira/browse/BEAM-8530

    [2] https://github.com/apache/beam/pull/9951

    On 10/31/19 5:29 PM, Reuven Lax wrote:
    Have you seen these failures on Dataflow as well? From code
    examination I would expect Dataflow to have some bugs in this
    area as well (especially if a timer is set while processing a
    bundle). If the tests are passing on Dataflow this might mean
    that we need different tests (or it might mean that Dataflow is
    "working" for some mysterious reason that is not obvious from the
    code :) ).

    On Wed, Oct 23, 2019 at 2:54 AM Jan Lukavský <je...@seznam.cz
    <mailto:je...@seznam.cz>> wrote:

        Hi,

        as part of [1] a new set of validatesRunner tests has been
        introduced.
        These tests (currently marked as category
        UsesStrictTimerOrdering)
        verify that runners fire timers in increasing timestamp under
        all
        circumstances. After adding these validatesRunner tests,
        Samza [2] and
        Portable Flink [3] started to fail these tests. I have
        created the
        tracking issues for that, because that behavior should be
        fixed (timers
        in wrong order can cause erratic behavior and/or data loss).

        I'm writing to anyone interested in solving these issues.

        Cheers,

          Jan

        [1] https://issues.apache.org/jira/browse/BEAM-7520

        [2] https://issues.apache.org/jira/browse/BEAM-8459

        [3] https://issues.apache.org/jira/browse/BEAM-8460

Reply via email to