We have a job that sometimes does a cogroup like this:

DataSet z = larger.coGroup(small).where...

The printed strategy shows a hash on one key and an ascending sort on the 
other key. Which is which? Naively, we'd want to hash the larger input and 
sort the small one. Or is that wrong?

What factors affect the performance of a cogroup? We use cogroup to 
calculate a new set of records for a key from the previously calculated set, 
with some modifications from (small). The use case, by the way, is temporally 
milestoning records.


Thanks



Billy Newport
Data Architecture, Goldman, Sachs & Co.
30 Hudson | 37th Floor | Jersey City, NJ
Tel:  +1 (212) 8557773 |  Cell:  +1 (507) 254-0134
Email: billy.newp...@gs.com<mailto:edward.new...@gs.com>, KD2DKQ