Re: Handling skew in window functions

2021-04-28 Thread Mich Talebzadeh
Hi Michael, I guess as ever your mileage varies. My suggestion is that you try salting and see whether it retains the ordering. The most significant column will be step_id, so I guess it will be OK. HTH Mich
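
For readers following the thread, here is a rough sketch of the idea being discussed: salt a skewed (user_id, group_id) key, collect values per salted bucket in step_id order, then merge the partial lists so the final list is still ordered by step_id. This is plain Python rather than Spark code, and the function name, the row layout (user_id, group_id, step_id, value), and the num_salts parameter are all assumptions made for illustration. It is not the code from the thread.

```python
import heapq
import random
from collections import defaultdict


def salted_collect_list(rows, num_salts=4, seed=0):
    """Collect values per (user_id, group_id), ordered by step_id, via salting.

    rows: iterable of (user_id, group_id, step_id, value) tuples (assumed layout).
    """
    rng = random.Random(seed)

    # Stage 1: spread each (user_id, group_id) key across up to num_salts
    # buckets, so that one hot key no longer lands in a single partition.
    buckets = defaultdict(list)
    for user_id, group_id, step_id, value in rows:
        salt = rng.randrange(num_salts)
        buckets[(user_id, group_id, salt)].append((step_id, value))

    # Each salted bucket is sorted independently by step_id.
    for part in buckets.values():
        part.sort(key=lambda pair: pair[0])

    # Stage 2: drop the salt and merge the sorted partial lists per key.
    # The merge on step_id is what restores the global ordering.
    parts_by_key = defaultdict(list)
    for (user_id, group_id, _salt), part in buckets.items():
        parts_by_key[(user_id, group_id)].append(part)

    return {
        key: [value for _, value in heapq.merge(*parts, key=lambda pair: pair[0])]
        for key, parts in parts_by_key.items()
    }


rows = [("u1", "g1", s, f"v{s}") for s in (5, 3, 1, 4, 2)] + [("u2", "g1", 1, "a")]
result = salted_collect_list(rows, num_salts=3)
```

The point of the sketch is that salting on its own only gives ordered partial lists. The ordering of the combined list comes from the explicit merge on step_id in the second stage, so that step is worth checking in any real implementation.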

Re: Handling skew in window functions

2021-04-28 Thread Michael Doo
Hi Mich, Thank you for the suggestions. I took a look at the other thread you mentioned. One feature of my code that I'm not sure would be affected by salting is the use of collect_list(). My understanding is that collect_list() will retain the row ordering of values. You can see in my Window defi

Re: Handling skew in window functions

2021-04-27 Thread Mich Talebzadeh
Hi, Let us go back and understand this behaviour. It sounds like your partitioning by (user_id, group_id) results in skewed data. We just had a similar skewed-data thread with the title "Tasks are skewed to one executor". Have a look at that one and see whether any of those suggestions like ad