Hi everybody,

I need to run a cogroup on sorted groups. To explain it better: I have two datasets of (key, value) pairs. I want to cogroup them on key and have both iterators in each group sorted by value. How can I get this? I know an iterator could be collected and then sorted, but I want to avoid that. What happens if I partition the two datasets separately by key, then sort each partition, and finally cogroup by key? Can I assume the sort order is kept within each key?
What are the drawbacks of doing this? I expect two data shuffles: one for the partitioning and one for the cogroup.

Thanks,
Best,
Michele
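
P.S. To make the result I am after concrete, here is a minimal sketch in plain Python. It is not tied to any framework API: the names `group_sorted` and `sorted_cogroup` are made up for illustration, and everything is materialized in memory, which is exactly what I would like to avoid in the real job. It only shows the shape of the output I expect: one entry per key, with both value lists sorted.

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(pairs):
    """Group (key, value) pairs by key, with each key's values in sorted order.

    Hypothetical helper: a single sort by (key, value) stands in for
    "partition by key, then sort the partition".
    """
    ordered = sorted(pairs)  # tuples compare by key first, then by value
    return {
        key: [value for _, value in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def sorted_cogroup(left, right):
    """Cogroup two datasets on key; both sides of every group are sorted by value."""
    left_groups = group_sorted(left)
    right_groups = group_sorted(right)
    all_keys = sorted(left_groups.keys() | right_groups.keys())
    # A key missing on one side gets an empty list for that side.
    return [
        (key, left_groups.get(key, []), right_groups.get(key, []))
        for key in all_keys
    ]


left = [("b", 3), ("a", 2), ("a", 1)]
right = [("a", 9), ("c", 5), ("a", 4)]

for key, left_values, right_values in sorted_cogroup(left, right):
    print(key, left_values, right_values)
```

For key "a" this should give [1, 2] on the left side and [4, 9] on the right side. Keys "b" and "c" each have an empty list on the side where they do not appear.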