Hi Binh,

The Deduplicate transform uses the state API to do the de-duplication,
which should take care of what is needed to work across multiple
concurrent workers.
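
To illustrate why per-key state is enough, here is a minimal sketch in plain
Python. It is not Beam code and not the Deduplicate implementation: it only
simulates the idea that state is scoped per key, so all elements with the
same key end up at the same state cell and duplicates are seen one after
another rather than concurrently. The names route_by_key, dedup_on_worker and
deduplicate are hypothetical, the hash-based routing is an assumed stand-in
for whatever partitioning a runner actually does, and windows and state
expiry are ignored.

```python
import zlib
from collections import defaultdict


def route_by_key(elements, num_workers):
    """Assign each (key, value) to a worker by hashing the key.

    Assumption: stand-in for a runner's key partitioning; every element
    with a given key lands on the same worker.
    """
    per_worker = defaultdict(list)
    for key, value in elements:
        worker = zlib.crc32(key.encode("utf-8")) % num_workers
        per_worker[worker].append((key, value))
    return per_worker


def dedup_on_worker(keyed_elements):
    """Emit only the first element seen for each key on this worker."""
    seen = {}  # hypothetical per-key "seen" state cell
    output = []
    for key, value in keyed_elements:
        if not seen.get(key, False):
            seen[key] = True
            output.append((key, value))
    return output


def deduplicate(elements, num_workers=4):
    """Route by key, then de-duplicate independently on each worker."""
    per_worker = route_by_key(elements, num_workers)
    result = []
    for worker in sorted(per_worker):
        result.extend(dedup_on_worker(per_worker[worker]))
    return result


events = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
unique = deduplicate(events)
```

Because the two ("a", 1) elements share a key, they are always routed to the
same worker in this sketch, so the second one is dropped no matter how many
workers there are.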

Thanks,
Ankur

On Thu, 2 Mar 2023 at 10:18, Binh Nguyen Van <binhn...@gmail.com> wrote:

> Hi,
>
> I am writing a pipeline and want to apply deduplication. I looked at the
> Deduplicate transform that Beam provides and wonder about its usage. Do I
> need to shuffle the input collection by key before calling this
> transformation? I looked at its source code and it doesn’t do any shuffle,
> so I wonder how it works when, let’s say, there are duplicates and the
> duplicated elements are processed concurrently on multiple workers.
>
> Thank you
> -Binh
>
