Hi all,
For now it's possible to convert RDD of case class to DataFrame:
case class Person(name: String, age: Int)
val people: RDD[Person] = ...
val df = sqlContext.createDataFrame(people)
but backward conversion is not possible with existing API, so currently
code looks like this (example from documentation):
teenagers.map(t => "Name: " + t.getAs[String]("name"))
whereas it would be much more convenient to use RDD of case class:
teenagers.rdd[Person].map("Name: " + _.name)
I've implemented proof of concept library that allows to convert
DataFrame to typed RDD with "Pimp my library" pattern. It adds some
typesafety (conversion fails before running distributed operation if
some fields have incompatible types) and it's much more convenient when
working with nested rows, for example:
case class Room(number: Int, visitors: Seq[Person])
roomsDf.explode[Seq[Row], Person]("visitors", "visitor")(_.map(rowToPerson))
Would the community be interested in having this functionality in core?
Regards,
Vyacheslav