Re: Aggregate order semantics when spilling

2015-01-20 Thread Justin Uang
Hi Andrew, Thanks for your response! For our use case, we aren't actually grouping, but rather updating running aggregates. I just picked grouping because it made the example easier to write out. However, when we merge combiners, the combiners have to have data that are adjacent to each other in t

Re: Aggregate order semantics when spilling

2015-01-20 Thread Andrew Or
Hi Justin, I believe the intended semantics of groupByKey or cogroup is that the ordering *within a key *is not preserved if you spill. In fact, the test cases for the ExternalAppendOnlyMap only assert that the Set representation of the results is as expected (see this line

Aggregate order semantics when spilling

2015-01-20 Thread Justin Uang
Hi, I am trying to aggregate a key based on some timestamp, and I believe that spilling to disk is changing the order of the data fed into the combiner. I have some timeseries data that is of the form: ("key", "date", "other data") Partition 1 ("A", 2, ...) ("B", 4, ...) ("A", 1,