Dear Spark developers,

I would like to create a DataFrame with a single column. However, the 
createDataFrame method requires the element type to be a subtype of Product:

val data = Seq(1.0, 2.0)
val rdd = sc.parallelize(data, 2)
val df = sqlContext.createDataFrame(rdd)
[fail]<console>:25: error: overloaded method value createDataFrame with 
alternatives:
 [A <: Product](data: Seq[A])(implicit evidence$2: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
  [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: 
reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.rdd.RDD[Double])
       val df = sqlContext.createDataFrame(rdd)

If I zip the RDD with an index, it works, at the cost of an extra column:
val df = sqlContext.createDataFrame(rdd.zipWithIndex)
[success]df: org.apache.spark.sql.DataFrame = [_1: double, _2: bigint]

Wrapping each value in a case class also seems to work:
case class Hack(x: Double)
val caseRDD = rdd.map( x => Hack(x))
val df = sqlContext.createDataFrame(caseRDD)
[success]df: org.apache.spark.sql.DataFrame = [x: double]
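
As far as I can tell, both workarounds succeed because tuples and case 
classes extend Product, while Double does not. Below is a minimal sketch of 
that reasoning in plain Scala, without Spark. The object ProductBoundSketch 
and the helper fieldCount are hypothetical names of mine; fieldCount only 
mimics the [A <: Product] bound from the error message and is not part of 
the Spark API.

```scala
object ProductBoundSketch {
  // Hypothetical stand-in: copies only the [A <: Product] bound that the
  // createDataFrame overloads declare. It reports how many fields the
  // first element has, or 0 for an empty sequence.
  def fieldCount[A <: Product](data: Seq[A]): Int =
    data.headOption.map(_.productArity).getOrElse(0)

  case class Hack(x: Double)

  val data = Seq(1.0, 2.0)

  // fieldCount(data)  // rejected by the compiler: Double is not a Product

  // Each wrapper below turns a Double into some subtype of Product.
  val viaTuple1: Int = fieldCount(data.map(x => Tuple1(x)))    // one field
  val viaCaseClass: Int = fieldCount(data.map(x => Hack(x)))   // one field
  val viaZip: Int = fieldCount(data.zipWithIndex)              // two fields

  def main(args: Array[String]): Unit = {
    println(s"Tuple1: $viaTuple1, case class: $viaCaseClass, zipWithIndex: $viaZip")
  }
}
```

By the same reasoning, I would expect rdd.map(x => Tuple1(x)) to satisfy the 
bound as well, although I have not checked that against Spark.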

What is the recommended way of creating a dataframe with one column?

Best regards, Alexander
