Deduplicate usage

Binh Nguyen Van Thu, 02 Mar 2023 10:18:44 -0800

Hi,

I am writing a pipeline and want to deduplicate its elements. I looked at
the Deduplicate transform that Beam provides and have a question about its
usage. Do I need to shuffle the input collection by key before applying this
transform? I looked at its source code and it doesn’t appear to do any
shuffle, so I wonder how it works when, say, duplicate elements are
processed concurrently on multiple workers.


Thank you
-Binh
