Hello, I'm working on a POC project with Apache Beam. The rough pipeline reads from a checkout Kafka topic and generates hourly summary data across different dimensions. I think a fixed time window with a time-based trigger could handle this case. The event time is the checkout timestamp.
However, when the job or the source is down for some time, say several hours, recovery becomes a problem. Data will be dropped unless I set a large value for withAllowedLateness, and a large allowedLateness combined with accumulatingFiredPanes also keeps a lot of pane data in memory. Is this the right way to handle a recovery scenario? I'd appreciate any suggestions. Thank you! Mingmin
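To make the dropping concern concrete, here is a minimal plain-Java sketch of the rule as I understand it. It does not use the Beam API; it only models the idea that an element is discarded once the watermark has passed the end of its hourly window plus the allowed lateness. The class name, method names, and timestamps are made up for illustration, and the check is simplified (Beam compares against the window's maximum timestamp, which is one millisecond before the window end).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only: models hourly fixed windows and the
// late-data drop rule with plain java.time, not with Beam classes.
class LatenessSketch {
    static final Duration WINDOW_SIZE = Duration.ofHours(1);

    // Start of the hourly window that contains the given event time.
    public static Instant windowStart(Instant eventTime) {
        long sizeMs = WINDOW_SIZE.toMillis();
        long ts = eventTime.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    // Simplified rule: the element is dropped when
    // windowEnd + allowedLateness is already before the watermark.
    public static boolean isDropped(Instant eventTime, Instant watermark,
                                    Duration allowedLateness) {
        Instant windowEnd = windowStart(eventTime).plus(WINDOW_SIZE);
        return windowEnd.plus(allowedLateness).isBefore(watermark);
    }

    public static void main(String[] args) {
        // Assumed scenario: a checkout at 10:15 is only read after an
        // outage, when the watermark has already advanced to 15:00.
        Instant checkout = Instant.parse("2017-01-01T10:15:00Z");
        Instant watermark = Instant.parse("2017-01-01T15:00:00Z");

        // Window [10:00, 11:00) plus 1h lateness expires at 12:00.
        System.out.println(isDropped(checkout, watermark, Duration.ofHours(1))); // true
        // With 6h lateness the window stays open until 17:00.
        System.out.println(isDropped(checkout, watermark, Duration.ofHours(6))); // false
    }
}
```

Under these assumptions, the allowed lateness has to cover the longest outage I want to recover from, which is exactly what makes the retained pane state grow.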