Re: Time-based frequency table at scale

2020-03-11 Thread Nicolas Paris
Hi, did you try exploding the arrays, then doing the aggregation/count, and at the end applying a UDF to add the 0 values? In my experience, working on arrays is usually a bad idea.

sakag writes:
> Hi all,
>
> We have a rather interesting use case, and are struggling to come up with an
> appr
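
To make the suggested order of operations concrete (explode, then aggregate/count, then add the 0 values), here is a minimal sketch in plain Python rather than Spark. The input shape (a key plus an array of time buckets), the names `rows`, `all_buckets` and `frequency_table`, and the idea that the full set of buckets is known up front are all assumptions for illustration; the original question is cut off above and may differ. In Spark, the first two steps would roughly correspond to `explode` followed by `groupBy(...).count()`.

```python
from collections import Counter
from itertools import product

# Hypothetical input: (key, array of observed time buckets) rows.
rows = [
    ("a", [0, 1, 1]),
    ("b", [2]),
    ("a", [1, 3]),
]
# Assumed: the complete set of buckets the table should contain.
all_buckets = range(4)


def frequency_table(rows, all_buckets):
    # Step 1, "explode": one (key, bucket) pair per array element.
    exploded = ((key, bucket) for key, values in rows for bucket in values)
    # Step 2, aggregate/count per (key, bucket) pair.
    counts = Counter(exploded)
    keys = sorted({key for key, _ in rows})
    # Step 3, add the 0 values for combinations that were never observed.
    return {
        (key, bucket): counts.get((key, bucket), 0)
        for key, bucket in product(keys, all_buckets)
    }


table = frequency_table(rows, all_buckets)
```

The point of the sketch is that the zero-filling happens once, on the already aggregated (and therefore much smaller) result, instead of carrying dense arrays through the whole computation.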

Re: Time-based frequency table at scale

2020-03-11 Thread Enrico Minack
An interesting puzzle indeed. What is your measure of "that scales"? Does not fail, does not spill, does not need a huge amount of memory/disk, is O(N), processes X records per second and core?

Enrico

On 11.03.20 at 16:59, sakag wrote:
Hi all, We have a rather interesting use case, an