Is this a Python or Java pipeline? I'm familiar with PubsubIO in Java, though I expect the behavior in Python is similar. PubsubIO ACKs messages at the first checkpoint step in the pipeline, so the behavior in your case depends on whether there is a GroupByKey operation before the point where the exception is thrown.
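To make that concrete, here is a rough Java sketch of the kind of pipeline you're describing: Pub/Sub in, one ParDo, Pub/Sub out, with the caught exception routed to a dead-letter topic instead of being dropped silently. The topic names, the TransformFn logic, and the dead-letter topic are placeholders I made up, not anything from your job:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class PubsubTransformPipeline {

  // Tags separating successfully transformed messages from failed ones.
  private static final TupleTag<String> TRANSFORMED = new TupleTag<String>() {};
  private static final TupleTag<String> FAILED = new TupleTag<String>() {};

  static class TransformFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      try {
        // Placeholder transform; your real logic (whatever might throw) goes here.
        c.output(c.element().toUpperCase());
      } catch (Exception e) {
        // Swallowing the exception means the bundle still succeeds and the
        // source messages get ACK'd, so send the element to a dead-letter
        // output instead of dropping it silently.
        c.output(FAILED, c.element());
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    PCollectionTuple results =
        p.apply("ReadFromPubsub",
                PubsubIO.readStrings().fromTopic("projects/my-project/topics/input"))
         .apply("Transform",
                ParDo.of(new TransformFn())
                     .withOutputTags(TRANSFORMED, TupleTagList.of(FAILED)));

    results.get(TRANSFORMED)
           .apply("WriteOutput",
                  PubsubIO.writeStrings().to("projects/my-project/topics/output"));

    results.get(FAILED)
           .apply("WriteDeadLetter",
                  PubsubIO.writeStrings().to("projects/my-project/topics/dead-letter"));

    p.run();
  }
}

The point of the extra output is that once the bundle succeeds the source messages are ACK'd, so anything you swallowed in the catch block is gone unless you have written it somewhere durable like that dead-letter topic.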
If there are no GroupByKey operations (so your pipeline is basically just transforming single messages and republishing to a new topic), then I would expect you are safe: messages will not be ACK'd unless the results have been published to the output topic. And yes, if an exception is thrown the whole bundle fails, so those messages would be picked up by another worker and retried.

There is a chance of data loss if your pipeline needs to checkpoint data to disk and you shut it down without draining. In that case, messages may already have been ACK'd to Pub/Sub and held durably only in Dataflow's checkpoint state, so shutting down the pipeline uncleanly would lose that data.

On Mon, Jun 1, 2020 at 1:09 PM KV 59 <kvajjal...@gmail.com> wrote:

> Hi,
>
> I have a Dataflow pipeline with a PubSub UnboundedSource. The pipeline
> transforms the data and writes to another PubSub topic. I have a question
> regarding exceptions in DoFns: if I choose to ignore an exception while
> processing an element, does it ACK the bundle?
>
> Also, if I were to just throw the exception, my understanding is that the
> Dataflow runner will fail the whole bundle and keep retrying until the
> whole bundle succeeds. Am I correct?
>
> Thanks for your response