Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
Hi, I have a Dataflow pipeline with a PubSub UnboundedSource. The pipeline transforms the data and writes to another PubSub topic. I have a question regarding exceptions in DoFns. If I choose to ignore an exception while processing an element, does it ACK the bundle? Also, if I were to just throw the excep
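For context on the semantics being asked about, here is a library-free Java sketch (my simplified model, not Dataflow's actual implementation): the Pub/Sub source acks messages only after the bundle containing them commits successfully, so swallowing an exception inside a DoFn lets the bundle commit and ack, while rethrowing fails the bundle so it is retried.

```java
import java.util.List;
import java.util.function.Consumer;

public class BundleAckSketch {
    // Simulates processing a bundle: the bundle is "acked" only if every
    // element is processed without an uncaught exception.
    public static boolean processBundle(List<String> bundle, Consumer<String> fn,
                                        boolean swallowExceptions) {
        for (String element : bundle) {
            try {
                fn.accept(element);
            } catch (RuntimeException e) {
                if (!swallowExceptions) {
                    return false; // bundle fails -> not acked, will be retried
                }
                // exception swallowed: element dropped, bundle continues
            }
        }
        return true; // bundle committed -> source messages acked
    }

    public static void main(String[] args) {
        Consumer<String> fn = s -> {
            if (s.equals("boom")) throw new RuntimeException("bad element");
        };
        List<String> bundle = List.of("a", "boom", "c");
        System.out.println(processBundle(bundle, fn, true));  // true: acked despite the bad element
        System.out.println(processBundle(bundle, fn, false)); // false: bundle would be retried
    }
}
```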

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
is a chance of data loss if you have a pipeline that needs to checkpoint data to disk and you shut down the pipeline without draining the data. In that case, messages may have been ack'd to pubsub and held durably only in the checkpoint state in Dataflow, so shutting down the pip

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
API affordances that Beam provides for exception handling. [0] https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/MapElements.html#exceptionsVia-org.apache.beam.sdk.transforms.InferableFunction- On Mon, Jun 1, 2020 at 1:56 PM KV 59 wrote:
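The API referenced above, MapElements#exceptionsVia, routes elements that throw into a separate failures collection instead of failing the bundle. A library-free sketch of that result-with-failures pattern (the names here are illustrative, not Beam's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ExceptionsViaSketch {
    // Successful outputs plus the inputs that threw, mirroring the
    // result-with-failures shape returned by exceptionsVia.
    public static class Result<I, O> {
        public final List<O> outputs = new ArrayList<>();
        public final List<I> failures = new ArrayList<>();
    }

    public static <I, O> Result<I, O> mapWithFailures(List<I> inputs, Function<I, O> fn) {
        Result<I, O> result = new Result<>();
        for (I in : inputs) {
            try {
                result.outputs.add(fn.apply(in));
            } catch (RuntimeException e) {
                result.failures.add(in); // route to a dead-letter collection
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result<String, Integer> r = mapWithFailures(List.of("1", "x", "3"), Integer::parseInt);
        System.out.println(r.outputs);  // [1, 3]
        System.out.println(r.failures); // [x]
    }
}
```

The failures collection can then be written to a dead-letter sink for inspection, rather than crashing the pipeline.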

Count based triggers and latency

2020-10-12 Thread KV 59
Hi All, I'm building a pipeline to process events as they come and do not really care about the event time and watermark. I'm more interested in not discarding the events and reducing the latency. The downstream pipeline has a stateful DoFn. I understand that the default window strategy is Global

Re: Count based triggers and latency

2020-10-12 Thread KV 59
ion for GBK is per key and window pane. Note that elementCountAtLeast means that the runner can buffer as many as it wants and can decide to offer a low latency pipeline by triggering often or better throughput through the use of buffering.
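The trigger under discussion, AfterPane.elementCountAtLeast(n), is a lower bound: the runner may fire as soon as n elements are buffered, or keep buffering for throughput. A library-free sketch of the "fire once at least n are buffered" behavior (illustrative, not the runner's code):

```java
import java.util.ArrayList;
import java.util.List;

public class CountTriggerSketch {
    private final int minCount; // analogue of elementCountAtLeast(n)
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> panes = new ArrayList<>();

    public CountTriggerSketch(int minCount) { this.minCount = minCount; }

    public void add(String element) {
        buffer.add(element);
        // A low-latency runner fires as soon as the bound is met; a
        // throughput-oriented runner is allowed to keep buffering longer.
        if (buffer.size() >= minCount) {
            panes.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public List<List<String>> panes() { return panes; }

    public static void main(String[] args) {
        CountTriggerSketch t = new CountTriggerSketch(2);
        for (String s : new String[] {"a", "b", "c", "d", "e"}) t.add(s);
        System.out.println(t.panes()); // [[a, b], [c, d]]  ("e" is still buffered)
    }
}
```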

AvroCoder works differently in DataflowRunner

2020-10-15 Thread KV 59
Hi, I'm using AvroCoder for my custom type. If I use the AvroCoder and perform these operations: Object1 -> encode -> bytes -> decode -> Object2, then Object1 and Object2 are not equal. The same round-trip works locally. What could be the problem? I'm using Beam 2.23.0, Avro 1.9.1 and Java 1.8. Do the versions
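One frequent cause of this symptom (an assumption about this specific case, since the thread is truncated) is that the custom type does not override equals(), so a structurally identical decoded copy fails the default reference-identity comparison. A coder-free Java sketch of that distinction:

```java
public class EqualitySketch {
    // No equals() override: comparison falls back to reference identity,
    // so a decoded copy never compares equal to the original.
    public static class NoEquals {
        public final int id;
        public NoEquals(int id) { this.id = id; }
    }

    // equals()/hashCode() based on fields: a decoded copy compares equal.
    public static class WithEquals {
        public final int id;
        public WithEquals(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof WithEquals && ((WithEquals) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    public static void main(String[] args) {
        // Simulate encode -> bytes -> decode by building a field-identical copy.
        System.out.println(new NoEquals(7).equals(new NoEquals(7)));     // false
        System.out.println(new WithEquals(7).equals(new WithEquals(7))); // true
    }
}
```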

Processing historical data

2021-08-24 Thread KV 59
Hi, I have a Beam streaming pipeline processing live data from PubSub using sliding windows on event timestamps. I want to recompute the metrics for historical data in BigQuery. What are my options? I have looked at https://stackoverflow.com/questions/56702878/how-to-use-apache-beam-to-process-hi

Re: Processing historical data

2021-08-24 Thread KV 59
in and restart so has the same problem while being more complicated. Thanks, Ankur On Tue, Aug 24, 2021 at 7:44 AM KV 59 wrote: Hi, I have a Beam streaming pipeline processing live data from PubSub using sliding windows on event ti

How Sliding Windows work

2021-09-28 Thread KV 59
Hi, I have a question about how windows are assigned to elements, or vice versa. (I thought I understood this all this while, but I'm a little confused now.) My understanding: for a sliding window configuration of size S and period P: 1. Every element E with timestamp T is assigned to each

Re: How Sliding Windows work

2021-09-28 Thread KV 59
Adding to my question: if there is no element E with a timestamp T, then there is no window [T-s, T). On Tue, Sep 28, 2021 at 6:25 AM KV 59 wrote: Hi, I have a question about how windows are assigned to elements or vice versa. (I thought I understood this all this while but

Re: How Sliding Windows work

2021-09-28 Thread KV 59
or equality (to see if two elements are in the same window) and it knows the timestamp when the window ends. Back to your question - your assessment is mostly correct, except that the windows are always being created each time. Reuven On Tue, Sep 28, 2021
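The arithmetic being described in this thread: for sliding windows of size S and period P, an element with timestamp T is assigned to every window [start, start + S) whose start is a multiple of P in the range (T - S, T] — that is, S/P windows when P divides S. A library-free sketch of that assignment (my reading of the semantics, not Beam's SlidingWindows code):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSketch {
    // Returns the start timestamps of all windows [start, start + size)
    // containing timestamp t, for sliding windows of the given size/period.
    public static List<Long> windowStartsFor(long t, long size, long period) {
        List<Long> starts = new ArrayList<>();
        long lastStart = t - Math.floorMod(t, period); // latest window containing t
        for (long start = lastStart; start > t - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Size 60, period 20: every element lands in 60/20 = 3 windows.
        System.out.println(windowStartsFor(65, 60, 20)); // [60, 40, 20]
    }
}
```

This also illustrates the point in the reply above: the window objects are derived from the timestamp each time, rather than existing independently of the elements.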

Google Dataflow job drain and correctness

2021-11-03 Thread KV 59
Hi, I have a question on how Dataflow drains a job. I have a job which reads from PubSub and uses sliding windows to compute aggregates for each window. When I update the job and the new job is not compatible, I have the option to either cancel or drain the job. I want to understand how the d
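For background on drain, a simplified model of the documented behavior (a sketch, not Dataflow's code): draining stops pulling new input and advances the watermark to +infinity, which forces every open window to fire with whatever data it has accumulated so far — which is why aggregates from partially filled windows can look incomplete after a drain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DrainSketch {
    private final Map<Long, List<String>> openWindows = new LinkedHashMap<>();
    private final List<String> emitted = new ArrayList<>();

    public void add(long windowEnd, String element) {
        openWindows.computeIfAbsent(windowEnd, k -> new ArrayList<>()).add(element);
    }

    // Normal operation: fire windows whose end is at or before the watermark.
    public void advanceWatermark(long watermark) {
        openWindows.entrySet().removeIf(e -> {
            if (e.getKey() <= watermark) {
                emitted.add(e.getKey() + ":" + e.getValue());
                return true;
            }
            return false;
        });
    }

    // Drain: stop reading input and advance the watermark to +infinity,
    // so every still-open window fires with its partial contents.
    public void drain() {
        advanceWatermark(Long.MAX_VALUE);
    }

    public List<String> emitted() { return emitted; }

    public static void main(String[] args) {
        DrainSketch job = new DrainSketch();
        job.add(100, "a");
        job.add(200, "b");
        job.advanceWatermark(150); // only the window ending at 100 fires
        job.drain();               // the window ending at 200 fires early
        System.out.println(job.emitted()); // [100:[a], 200:[b]]
    }
}
```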