What do you mean by "without conversion"? My code is:

    def flatten(rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
      // toDS() needs import spark.implicits._ in scope
      rdd
        .flatMap { nestedElement => flatten(nestedElement) /** List[MyCaseClass] */ }
        .toDS()
    }

Can it be done better?
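A minimal, self-contained sketch of the flatten-then-toDS() pattern from the reply above, assuming Spark SQL is on the classpath. NestedStructure is not shown in the thread, so its fields, the flattenElement helper, and the FlattenSketch object are hypothetical placeholders; only MyCaseClass follows the schema given in the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Case class matching the schema listed in the original question.
case class MyCaseClass(
    str01: String,
    str02: String,
    str03: String,
    str04: String,
    long01: Long,
    long02: Long,
    double01: Double,
    map: Map[String, Double])

// Hypothetical nested input: the real NestedStructure is not shown in the thread.
case class NestedStructure(id: String, entries: List[(String, Double)])

object FlattenSketch {
  // Hypothetical per-element flattening: one output row per (key, value) entry.
  def flattenElement(n: NestedStructure): List[MyCaseClass] =
    n.entries.map { case (k, v) =>
      MyCaseClass(n.id, k, "", "", 0L, 0L, v, Map(k -> v))
    }

  // Same shape as the snippet in the thread: flatMap on the RDD, then toDS().
  // toDS() is only available after importing spark.implicits._, which
  // derives the Encoder[MyCaseClass] used to build the Dataset.
  def flatten(spark: SparkSession, rdd: RDD[NestedStructure]): Dataset[MyCaseClass] = {
    import spark.implicits._
    rdd.flatMap(n => flattenElement(n)).toDS()
  }
}
```

As far as I understand the API, toDS() on an RDD still encodes every MyCaseClass object into Spark's internal row format through that Encoder, so it makes the conversion implicit rather than removing it.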
On Tue, 14 Jul 2020 at 01:13, Sean Owen <sro...@gmail.com> wrote:
> Wouldn't toDS() do this without conversion?
>
> On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov <capacyt...@gmail.com> wrote:
> >
> > Hi!
> > I'm trying to understand the cost of RDD to Dataset conversion.
> > It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000
> records.
> > It takes around 15 minutes to convert them to Dataset[MyCaseClass].
> > The schema of MyCaseClass is:
> > str01: String,
> > str02: String,
> > str03: String,
> > str04: String,
> > long01: Long,
> > long02: Long,
> > double01: Double,
> > map: Map[String, Double]
> >
> > What can I do to make it run faster?
>
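The conversion time discussed in this thread is spent encoding each MyCaseClass object into Spark's internal row format. A minimal sketch, assuming Spark SQL is on the classpath, that derives the product Encoder for the schema from the question and prints the Spark SQL schema it produces; the EncoderSketch object name is hypothetical.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Case class matching the schema listed in the original question.
case class MyCaseClass(
    str01: String,
    str02: String,
    str03: String,
    str04: String,
    long01: Long,
    long02: Long,
    double01: Double,
    map: Map[String, Double])

object EncoderSketch {
  // Product Encoder for the case class; spark.implicits._ derives an
  // equivalent one when toDS() is called on an RDD[MyCaseClass].
  val encoder: Encoder[MyCaseClass] = Encoders.product[MyCaseClass]

  def main(args: Array[String]): Unit =
    // Prints one column per field; the Map field becomes a map column.
    encoder.schema.printTreeString()
}
```

The map column is likely the most expensive part of each row to encode, since every entry is written out individually.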