Hi all,

For now it's possible to convert RDD of case class to DataFrame:

case class Person(name: String, age: Int)

val people: RDD[Person] = ...
val df = sqlContext.createDataFrame(people)

but backward conversion is not possible with existing API, so currently code looks like this (example from documentation):

teenagers.map(t => "Name: " + t.getAs[String]("name"))

whereas it would be much more convenient to use RDD of case class:

teenagers.rdd[Person].map("Name: " + _.name)


I've implemented proof of concept library that allows to convert DataFrame to typed RDD with "Pimp my library" pattern. It adds some typesafety (conversion fails before running distributed operation if some fields have incompatible types) and it's much more convenient when working with nested rows, for example:

case class Room(number: Int, visitors: Seq[Person])

roomsDf.explode[Seq[Row], Person]("visitors", "visitor")(_.map(rowToPerson))

Would the community be interested in having this functionality in core?

Regards,
Vyacheslav

Reply via email to