Hi, I am writing a pipeline and want to apply deduplication. I looked at the Deduplicate transform that Beam provides and have a question about its usage. Do I need to shuffle the input collection by key before applying this transform? I looked at its source code and it doesn't appear to do any shuffle itself, so I wonder how it works when, say, there are duplicates and the duplicated elements are processed concurrently on multiple workers.
Thank you,
Binh