Correct. If you ignore the error on the single element, the corresponding
PubSub message will be ack'd just like everything else in the bundle.
PubsubIO provides no handle for preventing acks per-message.

In practice, if some of your messages cause errors that are not
retryable, you may want to split those into a separate collection that
publishes to a separate "error" topic that you can inspect and handle
separately. See [0] for an intro to some basic API affordances that Beam
provides for exception handling.

[0]
https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/transforms/MapElements.html#exceptionsVia-org.apache.beam.sdk.transforms.InferableFunction-
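To make the idea concrete, here is a plain-Java sketch of that dead-letter
pattern (illustrative only; the class and method names are made up, and in a
real pipeline the exceptionsVia mechanism in [0] gives you the equivalent
split as two output PCollections rather than two lists):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch of the dead-letter pattern: apply a transform to each element,
// routing failures to a separate "error" output instead of letting one
// bad element fail (and force a retry of) the whole bundle.
class DeadLetterSketch {
    record Result(List<String> ok, List<String> errors) {}

    static Result process(List<String> bundle, Function<String, String> transform) {
        List<String> ok = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        for (String msg : bundle) {
            try {
                ok.add(transform.apply(msg));
            } catch (RuntimeException e) {
                // Capture the failing element plus the error, so it can be
                // published to an error topic and inspected later.
                errors.add(msg + ": " + e.getMessage());
            }
        }
        return new Result(ok, errors);
    }

    public static void main(String[] args) {
        Result r = process(
                List.of("1", "2", "x"),
                s -> String.valueOf(Integer.parseInt(s) * 2));
        System.out.println(r.ok());     // [2, 4]
        System.out.println(r.errors()); // the "x" element with its parse error
    }
}
```

Because every element ends up in one of the two outputs, the bundle as a whole
succeeds and is ack'd, while nothing is silently dropped.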


On Mon, Jun 1, 2020 at 1:56 PM KV 59 <kvajjal...@gmail.com> wrote:

> Hi Jeff,
>
> Thanks for the response. Yes I have a Java pipeline, and yes it is a simple
> transformation. Since DoFns work on bundles, if a single element in the
> bundle fails and we ignore the error on that element, is the bundle still
> considered successfully processed? Then it would just ACK everything in
> the bundle.
>
> Kishore
>
> On Mon, Jun 1, 2020 at 10:27 AM Jeff Klukas <jklu...@mozilla.com> wrote:
>
>> Is this a Python or Java pipeline?
>>
>> I'm familiar with PubsubIO in Java, though I expect the behavior in
>> Python is similar. It will ack messages at the first checkpoint step in the
>> pipeline, so the behavior in your case depends on whether there is a
>> GroupByKey operation happening before the exception is thrown.
>>
>> If there are no GroupByKey operations (so your pipeline is basically just
>> transforming single messages and republishing to a new topic), then I would
>> expect you are safe and messages will not be ACK'd unless they have been
>> published to the output topic.
>>
>> And yes, if an exception is thrown the whole bundle would fail, so those
>> messages would be picked up by another worker and retried.
>>
>> There is a chance of data loss if you have a pipeline that needs to
>> checkpoint data to disk and you shut down the pipeline without draining the
>> data. In that case, messages may have been ack'd to pubsub and held durably
>> only in the checkpoint state in Dataflow, so shutting down the pipeline
>> uncleanly would lose data.
>>
>> On Mon, Jun 1, 2020 at 1:09 PM KV 59 <kvajjal...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a Dataflow pipeline with PubSub UnboundedSource, The pipeline
>>> transforms the data and writes to another PubSub topic. I have a question
>>> regarding exceptions in DoFns. If I chose to ignore an exception processing
>>> an element, does it ACK the bundle?
>>>
>>> Also, if I were to just throw the exception, my understanding is that the
>>> Dataflow Runner will fail the whole bundle and keep retrying until the
>>> whole bundle succeeds. Am I correct?
>>>
>>> Thanks for your response
>>>
>>
