Hi Danny,
I created https://github.com/apache/beam/issues/23301 with PR
https://github.com/apache/beam/pull/23302. I'd be happy to review the
Python part.
Jan
On 9/20/22 13:26, Danny McCormick via dev wrote:
Coincidentally, I was looking at the same issue in Python yesterday
which also doesn't use a watermark estimator. For Python at least, the
same problem exists on Dataflow.
I agree that we should be providing a watermark estimate as part of
periodic sequences for both Java and Python. I was planning on adding
manual watermark estimation to Python once I worked through some other
issues with the transform, I'm happy to review a similar change in
Java (assuming nobody with more context has objections)
Thanks,
Danny
On Tue, Sep 20, 2022 at 5:15 AM Jan Lukavský <[email protected]> wrote:
Hi,
looking into the code of PeriodicSequence (which is used by
PeriodicImpulse) it seems it uses SDF, but does not update downstream
watermark using WatermarkEstimator in the @ProcessElement method.
That
seems to produce outputs only after the Pipeline terminates (at
least it
seems to work this way on Flink runner). Is this bug in the
PeriodicImpulse (would be fixed by adding ManualWatermarkEstimator,
possibly), or in the Flink's SDF implementation? I suspect it is the
first one, because I cannot imagine how would the runner infer the
watermark without being explicitly told by the SDF.
Should this be fixed in the PeriodicSequence?
Jan