Re: Handling skew in window functions

2021-04-28 Michael Doo

Handling skew in window functions

2021-04-27 Michael Doo

Hello! I have a data set that I'm trying to process in PySpark. The data (on disk as Parquet) contains user IDs, session IDs, and metadata related to each session. I'm adding a number of columns to my dataframe that are the result of aggregating over a window. The issue I'm running into is that al