Correct. If you ignore the error on the single element, the corresponding PubSub message will be ack'd just like everything else in the bundle. PubsubIO provides no handle for preventing acks per-message.
In practice, if you have some messages that cause errors that are not retryable, you may want to split those to a separate collection that publishes to a separate "error" topic that you can inspect and handle separately. See [0] for an intro to some basic API affordances that Beam provides for exception handling.

[0] https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/MapElements.html#exceptionsVia-org.apache.beam.sdk.transforms.InferableFunction-

On Mon, Jun 1, 2020 at 1:56 PM KV 59 <kvajjal...@gmail.com> wrote:

> Hi Jeff,
>
> Thanks for the response. Yes, I have a Java pipeline, and yes, it is a simple
> transformation. Since DoFns work on bundles: if a single element in the
> bundle fails and we ignore the error on that element, is the bundle still
> considered successfully processed? Then it would just ack everything in the
> bundle.
>
> Kishore
>
> On Mon, Jun 1, 2020 at 10:27 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> Is this a Python or Java pipeline?
>>
>> I'm familiar with PubsubIO in Java, though I expect the behavior in
>> Python is similar. It will ack messages at the first checkpoint step in the
>> pipeline, so the behavior in your case depends on whether there is a
>> GroupByKey operation happening before the exception is thrown.
>>
>> If there are no GroupByKey operations (so your pipeline is basically just
>> transforming single messages and republishing to a new topic), then I would
>> expect you are safe and messages will not be ack'd unless they have been
>> published to the output topic.
>>
>> And yes, if an exception is thrown, the whole bundle would fail, so those
>> messages would be picked up by another worker and retried.
>>
>> There is a chance of data loss if you have a pipeline that needs to
>> checkpoint data to disk and you shut down the pipeline without draining the
>> data. In that case, messages may have been ack'd to Pub/Sub and held durably
>> only in the checkpoint state in Dataflow, so shutting down the pipeline
>> uncleanly would lose data.
>>
>> On Mon, Jun 1, 2020 at 1:09 PM KV 59 <kvajjal...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a Dataflow pipeline with a PubSub UnboundedSource. The pipeline
>>> transforms the data and writes to another PubSub topic. I have a question
>>> regarding exceptions in DoFns. If I choose to ignore an exception while
>>> processing an element, does it ack the bundle?
>>>
>>> Also, if I were to just throw the exception, my understanding is that the
>>> Dataflow runner will fail the whole bundle and keep retrying until the
>>> whole bundle is successful. Am I correct?
>>>
>>> Thanks for your response
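[Editor's note: the "error topic" split described above is the dead-letter pattern. Beam's own API for this is MapElements#exceptionsVia, linked as [0] in the thread. The plain-Java sketch below only illustrates the idea of routing failing elements to a separate failures collection rather than dropping them; the class and method names are illustrative, not from the thread or from Beam.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the dead-letter split: each element is transformed inside a
// try/catch, successes go to an output collection, and failing elements are
// captured in a separate failures collection (which, in a real pipeline,
// would be published to an "error" topic for later inspection).
class DeadLetterSketch {

  static final class Result {
    final List<Integer> output = new ArrayList<>();
    final List<String> failures = new ArrayList<>();
  }

  // Parse each message; in Beam this role is played by
  // MapElements.via(...).exceptionsVia(...), which yields a
  // WithFailures.Result holding both collections.
  static Result splitOnError(List<String> messages) {
    Result result = new Result();
    for (String msg : messages) {
      try {
        result.output.add(Integer.parseInt(msg)); // may throw NumberFormatException
      } catch (NumberFormatException e) {
        result.failures.add(msg); // routed to the dead-letter collection
      }
    }
    return result;
  }
}
```

Because the failures come out as an ordinary collection, the downstream "error" path (publishing to a separate topic, logging, alerting) can be handled independently of the happy path, which is exactly the separation the thread recommends for non-retryable errors.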