Re: How to get top N elements in a DataSet?

2017-01-24 Thread Ivan Mushketyk
> > > I think you can use MapPartition for that.
> >
> > So basically:
> >
> > dataset // assuming some partitioning that can be reused to avoid a shuffle
> >   .sortPartition(1, Order.DESCENDING)
> >   .mapPartition(new ReturnFirstTen())
> >   .
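
The quoted snippet is cut off, so the step that follows `mapPartition` is not visible here. As a rough illustration of the idea only, the sketch below models the pattern in plain Java rather than with the Flink API. Each partition is sorted by rating in descending order and cut to its first N elements, and the few surviving candidates are then merged and cut again. The class `TopNSketch`, the `Rating` type, both method names, and the final merge step are assumptions made for this example; they do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of "sort each partition, keep the first N, then merge".
// All names here are hypothetical; this is not Flink code.
class TopNSketch {

    /** Stand-in for the (id, rating) tuples described in the question. */
    static final class Rating {
        final long id;
        final double rating;

        Rating(long id, double rating) {
            this.id = id;
            this.rating = rating;
        }
    }

    /** Orders ratings from highest to lowest, like Order.DESCENDING on the rating field. */
    private static final Comparator<Rating> BY_RATING_DESC =
            Comparator.comparingDouble((Rating r) -> r.rating).reversed();

    /** Sorts a copy of one partition by rating and returns at most n leading elements. */
    static List<Rating> firstNOfSortedPartition(List<Rating> partition, int n) {
        List<Rating> sorted = new ArrayList<>(partition);
        sorted.sort(BY_RATING_DESC);
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    /**
     * Takes the local top n of every partition, then applies the same step
     * to the combined candidates (assumed final step, at most n per partition).
     */
    static List<Rating> topN(List<List<Rating>> partitions, int n) {
        List<Rating> candidates = new ArrayList<>();
        for (List<Rating> partition : partitions) {
            candidates.addAll(firstNOfSortedPartition(partition, n));
        }
        return firstNOfSortedPartition(candidates, n);
    }
}
```

The point of the two-stage shape is that only N elements per partition need to be brought together for the final pass, instead of the whole dataset.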

How to get top N elements in a DataSet?

2017-01-24 Thread Ivan Mushketyk
Hi, I have a dataset of tuples with two fields, ids and ratings, and I need to find the 10 elements with the highest rating in this dataset. I found a solution, but I think it is suboptimal and that there should be a better way to do it. The best thing that I came up with is to partition the dataset by r