For those interested, after digging further, I was able to consistently
reproduce the issue with a synthetic dataset. My findings are documented
here:
https://gist.github.com/igozali/d327a85646abe7ab10c2ae479bed431f
--
Regards,
Ivan Gozali
Lecida
Email: i...@lecida.com
On Wed, Jan 18, 2017 at
Hello,
I have a use case that seems relatively simple to solve using Spark, but I
can't seem to figure out a reliable way to do it.
I have a dataset that contains time series data for various users. All I'm
looking to do is:
- partition this dataset by user ID
- sort the time series data for