Re: Need to order iterator values in spark dataframe

2020-04-01 Thread Ranjan, Abhinav
key, then sortWithinPartitions, then groupByKey. Since the data are already hash-partitioned by key, Spark should not shuffle the data nor change the sort within each partition: ds.repartition($"key").sortWithinPartitions($"code").groupBy($"key") Enrico On 26.03.20 at
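The idea in the reply can be illustrated without Spark. The sketch below is a plain-Python stand-in, not Spark's API: the function names (`repartition_by_key`, `sort_within_partitions`, `group_by_key`) are invented for illustration, mirroring `repartition`, `sortWithinPartitions`, and the grouping step. Because hash partitioning puts all rows of a key into one partition, a local sort per partition is enough to make each key's group come out ordered by code.

```python
# Plain-Python sketch of the repartition + sortWithinPartitions pattern.
# None of these functions are Spark APIs; they only mimic the data flow.
from collections import defaultdict

def repartition_by_key(rows, num_partitions):
    """Hash-partition rows by key, as repartition($"key") would."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row["key"]) % num_partitions].append(row)
    return partitions

def sort_within_partitions(partitions):
    """Sort each partition locally by code -- no cross-partition shuffle."""
    return [sorted(p, key=lambda r: r["code"]) for p in partitions]

def group_by_key(partitions):
    """Group rows by key; the per-partition sort order is preserved."""
    groups = defaultdict(list)
    for partition in partitions:
        for row in partition:
            groups[row["key"]].append(row)
    return dict(groups)

rows = [
    {"key": 1, "code": "c2", "code_value": 12},
    {"key": 1, "code": "c1", "code_value": 11},
    {"key": 2, "code": "c3", "code_value": 5},
]
grouped = group_by_key(sort_within_partitions(repartition_by_key(rows, 4)))
```

Each key's rows arrive in code order because sorting happened after all of that key's rows were co-located in one partition.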

Need to order iterator values in spark dataframe

2020-03-26 Thread Ranjan, Abhinav
Hi, I have a dataframe which has data like:

    key | code | code_value
    1   | c1   | 11
    1   | c2   | 12
    1   | c2   | 9
    1   | c3   |
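The ordering the question asks for can be sketched in plain Python (not Spark): sort once by (key, code), then group, so each group's iterator yields rows already ordered by code. Column names come from the thread's sample data; the c3 row's value is truncated in the thread, so the 7 below is only a placeholder.

```python
# Sketch: ordering values within each key group (plain Python, not Spark).
from itertools import groupby
from operator import itemgetter

rows = [
    (1, "c1", 11),
    (1, "c2", 12),
    (1, "c2", 9),
    (1, "c3", 7),  # code_value for c3 is truncated in the thread; 7 is a placeholder
]

# One global sort by (key, code), then a streaming group-by: within each
# group, rows come out ordered by code (ties keep input order; sort is stable).
rows.sort(key=itemgetter(0, 1))
grouped = {k: [(code, value) for _, code, value in g]
           for k, g in groupby(rows, key=itemgetter(0))}
```

`groupby` only merges adjacent rows, which is why the sort must happen first.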

override collect_list

2019-11-26 Thread Ranjan, Abhinav
Hi all, I want to collect some rows in a list by using Spark's collect_list function. However, the number of rows collected into the list overflows memory. Is there any way to force the collected rows onto disk rather than holding them in memory, or else, instead of collecting them as a list,
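One workaround for a group too large to hold in memory is to stream each group to disk instead of materializing it as a list. The sketch below is plain Python, not a Spark API; the sort is a stand-in for the shuffle that co-locates a key's rows, and the per-key file layout is purely illustrative.

```python
# Sketch of "spill to disk instead of collect_list" in plain Python.
# Not a Spark API; the per-key file layout is illustrative only.
import os
import tempfile
from itertools import groupby
from operator import itemgetter

def spill_groups_to_disk(rows, out_dir):
    """Write each key's values to its own file, streaming one row at a time,
    so no full group list is ever held in memory."""
    rows = sorted(rows, key=itemgetter(0))  # stand-in for a shuffle by key
    for key, group in groupby(rows, key=itemgetter(0)):
        path = os.path.join(out_dir, f"key_{key}.txt")
        with open(path, "w") as f:
            for _, value in group:
                f.write(f"{value}\n")  # one value at a time, O(1) extra memory
        yield key, path

out_dir = tempfile.mkdtemp()
paths = dict(spill_groups_to_disk([(1, "a"), (2, "b"), (1, "c")], out_dir))
```

Downstream consumers then read each key's file as a lazy stream rather than receiving an in-memory list.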