Re: Sorting each partitions and writing to CSVs

2017-01-24 Thread Ivan Gozali
For those interested, after digging further, I was able to consistently reproduce the issue with a synthetic dataset. My findings are documented here: https://gist.github.com/igozali/d327a85646abe7ab10c2ae479bed431f

--
Regards,
Ivan Gozali
Lecida
Email: i...@lecida.com

On Wed, Jan 18, 2017 at

Sorting each partitions and writing to CSVs

2017-01-18 Thread Ivan Gozali
Hello, I have a use case that seems relatively simple to solve using Spark, but I can't seem to figure out a sure way to do it. I have a dataset which contains time series data for various users. All I'm looking to do is:
- partition this dataset by user ID
- sort the time series data for
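
To make the intended result concrete, here is a minimal plain-Python sketch (no Spark involved) of what the steps above and the thread subject appear to describe: group rows by user ID, sort each user's series by time, and write one CSV per user. The row layout `(user_id, timestamp, value)`, the function name `write_sorted_per_user`, and the `user_<id>.csv` file naming are all assumptions made for illustration; none of them come from the original message. In Spark itself, the DataFrame methods `repartition` and `sortWithinPartitions` are the usual starting points for this kind of job, but whether they fit the poster's exact setup is not something this snippet establishes.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_sorted_per_user(rows, out_dir):
    """Group rows by user ID, sort each group by timestamp, write one CSV per user.

    `rows` is an iterable of (user_id, timestamp, value) tuples (assumed layout).
    Returns a dict mapping each user ID to the path of the CSV written for it.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # "Partition" step: collect every user's records together.
    groups = defaultdict(list)
    for user_id, timestamp, value in rows:
        groups[user_id].append((timestamp, value))

    written = {}
    for user_id, series in groups.items():
        # "Sort" step: order each user's records by timestamp.
        series.sort(key=lambda record: record[0])

        # "Write" step: one CSV file per user (hypothetical naming scheme).
        path = out_dir / f"user_{user_id}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "value"])
            writer.writerows(series)
        written[user_id] = path
    return written
```

The sketch holds every group in memory, which is fine for a small synthetic dataset but is exactly the part a distributed engine such as Spark is meant to replace.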