Hi Billy,
A CoGroup does not have any freedom in its execution strategy.
It requires that both inputs are partitioned on the grouping keys and
then performs a local sort-merge join, i.e., both inputs are sorted.
Existing partitioning or sort orders can be reused.
Since there is only one execut
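To make the partition-then-sort-merge behaviour described above concrete, here is a minimal plain-Java sketch. It is only an illustration and not Flink's actual implementation: the class CoGroupSketch and the methods partition, sortMergeCoGroup and coGroup are hypothetical names, and records are modelled as int[]{key, value} for brevity. It assumes a simple modulo hash partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Flink.
public class CoGroupSketch {

    // Hash-partition records on the key so that equal keys from either
    // input always end up in the partition with the same index.
    static List<List<int[]>> partition(List<int[]> input, int parallelism) {
        List<List<int[]>> parts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            parts.add(new ArrayList<>());
        }
        for (int[] rec : input) {
            parts.get(Math.floorMod(Integer.hashCode(rec[0]), parallelism)).add(rec);
        }
        return parts;
    }

    // Local sort-merge: sort BOTH sides on the key, then walk them with two
    // cursors and emit, per key, the left group and the right group.
    static Map<Integer, List<List<Integer>>> sortMergeCoGroup(List<int[]> left, List<int[]> right) {
        List<int[]> l = new ArrayList<>(left);
        List<int[]> r = new ArrayList<>(right);
        l.sort((a, b) -> Integer.compare(a[0], b[0]));
        r.sort((a, b) -> Integer.compare(a[0], b[0]));

        Map<Integer, List<List<Integer>>> out = new TreeMap<>();
        int i = 0;
        int j = 0;
        while (i < l.size() || j < r.size()) {
            // The next key is the smaller of the two cursor keys.
            int key;
            if (j >= r.size()) {
                key = l.get(i)[0];
            } else if (i >= l.size()) {
                key = r.get(j)[0];
            } else {
                key = Math.min(l.get(i)[0], r.get(j)[0]);
            }
            List<Integer> leftGroup = new ArrayList<>();
            List<Integer> rightGroup = new ArrayList<>();
            while (i < l.size() && l.get(i)[0] == key) {
                leftGroup.add(l.get(i++)[1]);
            }
            while (j < r.size() && r.get(j)[0] == key) {
                rightGroup.add(r.get(j++)[1]);
            }
            // Either group may be empty, as in a CoGroup.
            out.put(key, Arrays.asList(leftGroup, rightGroup));
        }
        return out;
    }

    // Partition both inputs the same way, then co-group each partition pair locally.
    static Map<Integer, List<List<Integer>>> coGroup(List<int[]> larger, List<int[]> small, int parallelism) {
        List<List<int[]>> largerParts = partition(larger, parallelism);
        List<List<int[]>> smallParts = partition(small, parallelism);
        Map<Integer, List<List<Integer>>> result = new TreeMap<>();
        for (int p = 0; p < parallelism; p++) {
            result.putAll(sortMergeCoGroup(largerParts.get(p), smallParts.get(p)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> larger = Arrays.asList(
                new int[]{2, 20}, new int[]{1, 10}, new int[]{2, 21}, new int[]{3, 30});
        List<int[]> small = Arrays.asList(new int[]{2, 200}, new int[]{4, 400});
        System.out.println(coGroup(larger, small, 2));
    }
}
```

Note that in this sketch both inputs go through the same two steps (hash partitioning, then sorting); neither side is treated as "the hashed one" or "the sorted one".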
We have a cogroup where sometimes we cogroup like this:
Dataset z = larger.coGroup(small).where...
The strategy is printed as a hash on one key and a sort asc on the other key.
Which is which? Naively, we'd want to hash the larger and sort the small? Or is
that wrong?
What factors would impact the perfo