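Hi Binh,

The Deduplicate transform uses the State API to do the de-duplication. Beam state is scoped per key (and window), and the runner partitions the input of a stateful DoFn by key, so all copies of the same element are processed against the same state even when many workers run concurrently. You should not need to shuffle the input yourself.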
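To illustrate why the keyed state is enough, here is a toy, single-process Python model. It is not Beam code: `route_by_key`, `route_round_robin` and `dedup_with_local_state` are hypothetical names made up for this sketch, and it assumes the runner routes every element with a given key to one worker before the stateful step runs. Routing copies round-robin instead (i.e. no keying) shows how duplicates would slip through.

```python
"""Toy model of keyed-state deduplication (hypothetical sketch, not Beam's code).

Assumption: like a runner feeding a stateful DoFn, we partition elements by key,
so all copies of a key reach the same worker and see the same 'seen' state.
"""
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def route_by_key(elements: Iterable[str], num_workers: int) -> Dict[int, List[str]]:
    """Model the runner's implicit shuffle: same key -> same worker."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for element in elements:
        # Stable hash so routing does not depend on PYTHONHASHSEED.
        worker = zlib.crc32(element.encode("utf-8")) % num_workers
        partitions[worker].append(element)
    return partitions


def route_round_robin(elements: Iterable[str], num_workers: int) -> Dict[int, List[str]]:
    """Model unkeyed processing: each element goes to whichever worker is next."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for i, element in enumerate(elements):
        partitions[i % num_workers].append(element)
    return partitions


def dedup_with_local_state(partitions: Dict[int, List[str]]) -> List[str]:
    """Each worker emits only the first occurrence of each key it sees."""
    output: List[str] = []
    for worker in sorted(partitions):
        # Per-worker store of per-key 'seen' flags (stand-in for Beam's per-key state).
        seen_keys = set()
        for element in partitions[worker]:
            if element not in seen_keys:
                seen_keys.add(element)
                output.append(element)
    return output


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "b", "a"]
    print(sorted(dedup_with_local_state(route_by_key(events, num_workers=3))))
    # ['a', 'b', 'c']
    print(sorted(dedup_with_local_state(route_round_robin(events, num_workers=3))))
    # ['a', 'a', 'b', 'c']  (the copies of 'a' split across workers survive)
```

The point of the model is that the state never has to be shared between workers: because routing is by key, a worker only ever needs the state for the keys it owns.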
Thanks,
Ankur

On Thu, 2 Mar 2023 at 10:18, Binh Nguyen Van <binhn...@gmail.com> wrote:
> Hi,
>
> I am writing a pipeline and want to apply deduplication. I looked at the
> Deduplicate transform that Beam provides and wonder about its usage. Do I
> need to shuffle the input collection by key before calling this transform?
> I looked at its source code and it doesn’t do any shuffle, so I wonder how it
> works when, let’s say, there are duplicates and the duplicated elements are
> processed concurrently on multiple workers.
>
> Thank you
> -Binh