For grouping by each period: look into grouping sets
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-multi-dimensional-aggregation.html
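For example, a rough sketch of the DataFrame rollup API doing this in one pass (the input path and column names here are just placeholders); nulls in the grouping columns mark the coarser-level rows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("active-users").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")
  .withColumn("year",  year($"event_date"))
  .withColumn("month", month($"event_date"))
  .withColumn("day",   dayofmonth($"event_date"))

// rollup computes (year, month, day), (year, month), (year) and the
// grand total in a single job.
val activeUsers = events
  .rollup($"year", $"month", $"day")
  .agg(countDistinct($"user_id").as("active_users"))

activeUsers.orderBy($"year", $"month", $"day").show()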
On Tue, 11 Jun 2019 at 06:09, Rishi Shah <rishishah.s...@gmail.com> wrote:
Thank you both for your input!
To calculate a moving average of active users, could you comment on whether
to go for an RDD-based implementation or a DataFrame? If a DataFrame, will a
window function work here?
In general, how would Spark behave when working with a dataframe with date,
week, month, quarter, and year columns?
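For illustration, a minimal sketch of the window-function route (the input path, column names and the 7-day window are assumptions): compute the daily active user counts first, then average them over a trailing window.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("moving-avg-dau").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")

// Daily active users first...
val dau = events
  .groupBy($"event_date")
  .agg(countDistinct($"user_id").as("dau"))

// ...then a 7-row trailing window over the daily counts. Without a
// partitionBy everything goes to one partition, which is acceptable
// when there is only one row per day.
val w = Window.orderBy($"event_date").rowsBetween(-6, 0)

dau.withColumn("dau_7d_avg", avg($"dau").over(w)).show()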
Depending on what accuracy is needed, HyperLogLog can be an interesting
alternative:
https://en.m.wikipedia.org/wiki/HyperLogLog
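In Spark this is exposed as approx_count_distinct, which is backed by HyperLogLog++; a small sketch (path and column names are placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("approx-active-users").getOrCreate()
import spark.implicits._

// Placeholder input: one row per (user_id, event_date).
val events = spark.read.parquet("/data/events")

// approx_count_distinct uses HyperLogLog++ under the hood; the second
// argument is the maximum allowed relative standard deviation (1% here).
val monthlyActive = events
  .groupBy(date_trunc("month", $"event_date").as("month"))
  .agg(approx_count_distinct($"user_id", 0.01).as("approx_active_users"))

monthlyActive.show()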
> On 09.06.2019 at 15:59, big data wrote:
In my opinion, a bitmap is the best solution for calculating active users. Most
other solutions are based on a count(distinct) calculation, which is slower.
If you have already implemented a bitmap solution, including how to build and
how to load the bitmaps, then a bitmap is the best choice.
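For reference, a minimal sketch of the bitmap idea using the RoaringBitmap library (integer user ids and toy data assumed; in practice the per-day bitmaps would be built and persisted by a daily job):

import org.roaringbitmap.RoaringBitmap

// Toy per-day bitmaps of integer user ids.
val mondayUsers  = RoaringBitmap.bitmapOf(1, 2, 3, 5)
val tuesdayUsers = RoaringBitmap.bitmapOf(2, 3, 7)

// Daily active users = cardinality of that day's bitmap.
val mondayDau = mondayUsers.getCardinality()          // 4

// Weekly (or monthly, ...) active users = cardinality of the OR of the
// daily bitmaps; the union deduplicates users without count(distinct).
val weekSoFar = RoaringBitmap.or(mondayUsers, tuesdayUsers)
val weeklyAu  = weekSoFar.getCardinality()            // 5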
Hi All,
Is there a best practice around calculating daily, weekly, monthly,
quarterly, yearly active users?
One approach is to create a window of daily bitmaps and aggregate them by
period later. However, I was wondering if anyone has a better approach to
tackling this problem.
--
Regards,