You could use `Tuple1(x)` instead of `Hack`.

On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
> Dear Spark developers,
>
> I would like to create a dataframe with one column. However, the
> createDataFrame method accepts at least a Product:
>
> val data = Seq(1.0, 2.0)
> val rdd = sc.parallelize(data, 2)
> val df = sqlContext.createDataFrame(rdd)
>
> [fail]<console>:25: error: overloaded method value createDataFrame with alternatives:
>   [A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
>   [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
> cannot be applied to (org.apache.spark.rdd.RDD[Double])
>        val df = sqlContext.createDataFrame(rdd)
>
> So, if I zip rdd with index, then it is OK:
>
> val df = sqlContext.createDataFrame(rdd.zipWithIndex)
> [success]df: org.apache.spark.sql.DataFrame = [_1: double, _2: bigint]
>
> Also, if I use the case class, it also seems to work:
>
> case class Hack(x: Double)
> val caseRDD = rdd.map( x => Hack(x))
> val df = sqlContext.createDataFrame(caseRDD)
> [success]df: org.apache.spark.sql.DataFrame = [x: double]
>
> What is the recommended way of creating a dataframe with one column?
>
> Best regards, Alexander
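
To make the `Tuple1` suggestion concrete, here is a minimal sketch. It assumes the `sc` and `sqlContext` values that spark-shell predefines, as in the quoted session. The names `tupleRDD` and `named`, and the `toDF("x")` rename, are illustrative additions and not part of the original reply.

```scala
// Assumes a spark-shell session where `sc` and `sqlContext` already exist.
val data = Seq(1.0, 2.0)
val rdd = sc.parallelize(data, 2)

// Tuple1 is a Product, so wrapping each Double satisfies the
// [A <: Product] bound on createDataFrame without a custom case class.
val tupleRDD = rdd.map(x => Tuple1(x))
val df = sqlContext.createDataFrame(tupleRDD)

// The single column takes the tuple field name `_1`.
// Optionally rename it (illustrative; any column name works here).
val named = df.toDF("x")
```

Compared with the `Hack` case class, this avoids declaring a throwaway type, at the cost of the generic `_1` column name unless the column is renamed afterwards.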