Re: [Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread Sean Owen

You might be able to do this with multiple aggregations on avg(col("col1") == "cat1") etc., but how about pivoting the DataFrame first so that you get columns like "cat1" being 1 or 0? You would end up with (columns x categories) new columns if you want to count all categories in all cols. But then it

[Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread halil

Hello everyone, I am trying to apply a moving average to categorical data like the sample below, which is synthetic data I generated myself.

sqltimestamp,col1,col2,col3,col4,col5
1618574879,cat1,cat4,cat2,cat5,cat3
1618574880,cat1,cat3,cat4,cat2,cat5
1618574881,cat5,cat3,cat4,cat2,cat1
1618574882,cat2,
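For readers following the thread, the indicator idea from the reply (treat "col1 is cat1" as 1 or 0, then average it over a time window) can be sketched in plain Python on the three complete sample rows above. This is only an illustration of the arithmetic, not Spark code and not necessarily what either poster had in mind. The helper name `moving_share`, the trailing-window definition, and the 2-second window are assumptions made for the example, as is reading the first field as Unix epoch seconds.

```python
# The three complete sample rows from the question: (timestamp, [col1..col5]).
rows = [
    (1618574879, ["cat1", "cat4", "cat2", "cat5", "cat3"]),
    (1618574880, ["cat1", "cat3", "cat4", "cat2", "cat5"]),
    (1618574881, ["cat5", "cat3", "cat4", "cat2", "cat1"]),
]


def moving_share(rows, column_index, category, window_seconds):
    """Hypothetical helper: trailing moving average of a 0/1 indicator.

    For each row timestamp t, take the rows whose timestamp falls in
    (t - window_seconds, t] and return the fraction of them whose value
    in the given column equals `category`.
    """
    result = {}
    for t, _ in rows:
        # Rows inside the trailing window that ends at t.
        in_window = [vals for ts, vals in rows if t - window_seconds < ts <= t]
        # 1 where the column holds the category, 0 otherwise.
        indicators = [1 if vals[column_index] == category else 0 for vals in in_window]
        result[t] = sum(indicators) / len(indicators)
    return result


# Share of "cat1" in col1 over an assumed 2-second trailing window.
col1_cat1 = moving_share(rows, 0, "cat1", 2)
print(col1_cat1)
```

Repeating the call for every (column, category) pair gives the "columns x categories" set of averages mentioned in the reply.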