Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

2020-07-13 Thread Ivan Petrov
What do you mean "without conversion"?

def flatten(rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
  rdd.flatMap { nestedElement =>
    flatten(nestedElement) // returns List[MyCaseClass]
  }.toDS() // requires import spark.implicits._ in scope
}

Can it be better?

On Tue, Jul 14, 2020 at 01:13, Sean Owen wrote:
> Wouldn't toDS() do this withou
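A minimal, self-contained sketch of the snippet above, assuming Spark with Scala and a local SparkSession. The thread never shows NestedStructure, MyCaseClass or the per-element flatten, so the case classes and flattenOne below are hypothetical stand-ins. The one thing the original snippet leaves implicit is that toDS() only compiles with spark.implicits._ imported, because that import supplies both the RDD-to-Dataset syntax and the Encoder for the case class.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical shapes: the thread does not show the real NestedStructure or MyCaseClass.
final case class MyCaseClass(id: Long, value: String)
final case class NestedStructure(id: Long, values: Seq[String])

object FlattenExample {
  // Hypothetical per-element flattener standing in for flatten(nestedElement).
  def flattenOne(n: NestedStructure): List[MyCaseClass] =
    n.values.map(v => MyCaseClass(n.id, v)).toList

  // Same shape as the snippet: flatMap on the RDD, then toDS().
  def flatten(spark: SparkSession, rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
    import spark.implicits._ // brings toDS() and Encoder[MyCaseClass] into scope
    rdd.flatMap(flattenOne).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flatten").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(
      NestedStructure(1L, Seq("a", "b")),
      NestedStructure(2L, Seq("c"))
    ))
    flatten(spark, rdd).show() // an action: rows are encoded here, not at toDS()
    spark.stop()
  }
}
```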

Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance

2020-07-13 Thread Sean Owen
Wouldn't toDS() do this without conversion?

On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov wrote:
>
> Hi!
> I'm trying to understand the cost of RDD to Dataset conversion.
> It takes me 60 minutes to create RDD[MyCaseClass] with 500,000,000,000
> records
> It takes around 15 minutes to convert them
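To illustrate the point about toDS(): as far as I know, toDS() on an RDD is implicit syntax that delegates to SparkSession.createDataset, so the sketch below builds the same Dataset both ways from one RDD. MyCaseClass here is a hypothetical stand-in and the SparkSession is local. Either route still encodes every object through the implicit Encoder once an action runs; the sketch does not measure how long that takes.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type standing in for the thread's MyCaseClass.
final case class MyCaseClass(id: Long, value: String)

object ToDsExample {
  // Builds a Dataset[MyCaseClass] from the same RDD in two equivalent ways.
  def bothWays(spark: SparkSession, rdd: RDD[MyCaseClass]): (Dataset[MyCaseClass], Dataset[MyCaseClass]) = {
    import spark.implicits._ // supplies Encoder[MyCaseClass] and the toDS() syntax
    val viaToDS = rdd.toDS()                 // implicit syntax
    val viaCreate = spark.createDataset(rdd) // explicit call, same implicit Encoder
    (viaToDS, viaCreate)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("toDS").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(MyCaseClass(1L, "a"), MyCaseClass(2L, "b")))
    val (a, b) = bothWays(spark, rdd)
    // Both Datasets are lazy: rows are encoded only when an action like count() runs.
    println(s"toDS: ${a.count()}, createDataset: ${b.count()}")
    spark.stop()
  }
}
```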