For grouping by each period: look into grouping sets
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-multi-dimensional-aggregation.html
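For example, a rough sketch of the DataFrame rollup API doing this in one pass (the input path and column names here are just placeholders); nulls in the grouping columns mark the coarser-level rows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("active-users").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")
  .withColumn("year",  year($"event_date"))
  .withColumn("month", month($"event_date"))
  .withColumn("day",   dayofmonth($"event_date"))

// rollup computes (year, month, day), (year, month), (year) and the
// grand total in a single job.
val activeUsers = events
  .rollup($"year", $"month", $"day")
  .agg(countDistinct($"user_id").as("active_users"))

activeUsers.orderBy($"year", $"month", $"day").show()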
On Tue, 11 Jun 2019 at 06:09, Rishi Shah <rishishah.s...@gmail.com> wrote:
Thank you both for your input!
To calculate a moving average of active users, could you comment on whether
to go for an RDD-based implementation or a DataFrame? If a DataFrame, will a
window function work here?
In general, how would Spark behave when working with a dataframe with date,
week, month, quarter, and year columns?
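For illustration, a minimal sketch of the window-function route (the input path, column names and the 7-day window are assumptions): compute the daily active user counts first, then average them over a trailing window.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("moving-avg-dau").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")

// Daily active users first...
val dau = events
  .groupBy($"event_date")
  .agg(countDistinct($"user_id").as("dau"))

// ...then a 7-row trailing window over the daily counts. Without a
// partitionBy everything goes to one partition, which is acceptable
// when there is only one row per day.
val w = Window.orderBy($"event_date").rowsBetween(-6, 0)

dau.withColumn("dau_7d_avg", avg($"dau").over(w)).show()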
Depending on what accuracy is needed, HyperLogLog can be an interesting
alternative:
https://en.m.wikipedia.org/wiki/HyperLogLog
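In Spark this is exposed as approx_count_distinct, which is backed by HyperLogLog++; a small sketch (path and column names are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("approx-active-users").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")

// approx_count_distinct uses HyperLogLog++ under the hood; the second
// argument is the maximum allowed relative standard deviation (1% here).
val monthlyActive = events
  .groupBy(date_trunc("month", $"event_date").as("month"))
  .agg(approx_count_distinct($"user_id", 0.01).as("approx_active_users"))

monthlyActive.show()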
> On 09.06.2019 at 15:59, big data wrote:
In my opinion, a bitmap is the best solution for calculating active users. Most
other solutions are based on a count(distinct) calculation, which is slower.
If you have already implemented a bitmap solution, including how to build and
how to load the bitmaps, then a bitmap is the best choice.
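For reference, a minimal sketch of the bitmap idea using the RoaringBitmap library (integer user ids and toy data assumed; in practice the per-day bitmaps would be built and persisted by a daily job):

import org.roaringbitmap.RoaringBitmap

// Toy per-day bitmaps of integer user ids.
val mondayUsers  = RoaringBitmap.bitmapOf(1, 2, 3, 5)
val tuesdayUsers = RoaringBitmap.bitmapOf(2, 3, 7)

// Daily active users = cardinality of that day's bitmap.
val mondayDau = mondayUsers.getCardinality()          // 4

// Weekly (or monthly, ...) active users = cardinality of the OR of the
// daily bitmaps; the union deduplicates users without count(distinct).
val weekSoFar = RoaringBitmap.or(mondayUsers, tuesdayUsers)
val weeklyAu  = weekSoFar.getCardinality()            // 5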
Hi All,
Is there a best practice around calculating daily, weekly, monthly,
quarterly, yearly active users?
One approach is to create a window of daily bitmaps and aggregate them by
period later. However, I was wondering if anyone has a better approach to
tackling this problem.
--
Regards,