For an example, see the ml-feature word2vec user guide <https://spark.apache.org/docs/latest/ml-features.html#word2vec>
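
For reference, here is a minimal sketch of the `Tuple1` approach applied to the `RDD[Double]` from the original question. It assumes the Spark 1.x API (`SQLContext`), as used in this thread. The object name, app name, `local[2]` master, and the column name "x" are illustrative choices of mine, not something from the guide. In spark-shell, `sc` and `sqlContext` already exist, so only the last few lines are needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical standalone wrapper; in spark-shell skip the setup below.
object SingleColumnDataFrame {
  def main(args: Array[String]): Unit = {
    // Assumed local setup, for illustration only.
    val conf = new SparkConf().setAppName("single-column-df").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val rdd = sc.parallelize(Seq(1.0, 2.0), 2)

    // Tuple1 is a Product, so createDataFrame accepts RDD[Tuple1[Double]].
    // toDF("x") renames the default column `_1`; "x" is an arbitrary name.
    val df = sqlContext.createDataFrame(rdd.map(x => Tuple1(x))).toDF("x")

    df.printSchema()
    sc.stop()
  }
}
```

Without the `toDF("x")` call the single column keeps the default tuple field name `_1`, whereas the case-class approach takes the column name from the field.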

On Mon, Sep 14, 2015 at 11:03 AM, Feynman Liang <fli...@databricks.com> wrote:

> You could use `Tuple1(x)` instead of `Hack`
>
> On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
>> Dear Spark developers,
>>
>> I would like to create a dataframe with one column. However, the
>> createDataFrame method accepts at least a Product:
>>
>> val data = Seq(1.0, 2.0)
>> val rdd = sc.parallelize(data, 2)
>> val df = sqlContext.createDataFrame(rdd)
>>
>> [fail]<console>:25: error: overloaded method value createDataFrame with alternatives:
>>   [A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
>>   [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
>> cannot be applied to (org.apache.spark.rdd.RDD[Double])
>>        val df = sqlContext.createDataFrame(rdd)
>>
>> So, if I zip rdd with index, then it is OK:
>>
>> val df = sqlContext.createDataFrame(rdd.zipWithIndex)
>>
>> [success]df: org.apache.spark.sql.DataFrame = [_1: double, _2: bigint]
>>
>> Also, if I use the case class, it also seems to work:
>>
>> case class Hack(x: Double)
>> val caseRDD = rdd.map( x => Hack(x))
>> val df = sqlContext.createDataFrame(caseRDD)
>>
>> [success]df: org.apache.spark.sql.DataFrame = [x: double]
>>
>> What is the recommended way of creating a dataframe with one column?
>>
>> Best regards, Alexander