I think that roughly, an approach like the compacting hash table is the
right one.
Go ahead and take a stab at it, if you want, ping us if you run into
obstacles.

Here are a few thoughts on the hash-aggregator from discussions between
Fabian and me:

1) It may be worth to have a specialized implementations for aggregates of
constant-length values. Such as counts, or sum/max/min of types like
int/long/double. The value can be updated in place, there is no need ever
for compaction logic. The compacting logic is in the end more tricky than
it seems at the first glance, it took quite a few cycles to eliminate most
bugs.

2) I would try to tailor the hash aggregator to a "combiner" functionality
initially. That means no spilling on memory shortage, but eviction of
pre-aggregated results into the output stream. That is probably easier to
do and the most powerful improvement over the current capabilities.

Happy coding!

Greetings,
Stephan




On Thu, Oct 1, 2015 at 8:33 PM, Gábor Gévay <gga...@gmail.com> wrote:

> Hello,
>
> I would really like to see FLINK-2237 solved.
> I would implement this feature over the weekend, if the
> CompactingHashTable can be used to solve it (see my comment there).
> Could you please give me some advice on whether is this a viable
> approach, or you perhaps see some difficulties that I'm not aware of?
>
> Best,
> Gabor
>

Reply via email to