You don't have to turn your array into a tuple, but you do need a Product
(such as a case class) that wraps it; this is how the columns get their names.

case class MyData(data: Array[Double])
val df = Seq(MyData(Array(1.0, 2.0, 3.0, 4.0)), ...).toDF()
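
For reference, here is a minimal sketch of the full round trip (wrap the array
in a case class, convert, and write to Parquet, which was the stated end goal),
assuming a 1.5-style sqlContext in the shell; the output path is just an example:

import sqlContext.implicits._

// A Product wrapper such as a case class gives the column a name ("data")
// and lets toDF infer the schema.
case class MyData(data: Array[Double])

val rows = Seq(
  MyData(Array(1.0, 2.0, 3.0, 4.0)),
  MyData(Array(5.0, 6.0, 7.0, 8.0))
)
val df = sc.parallelize(rows).toDF()

// Expected schema:
// root
//  |-- data: array (nullable = true)
//  |    |-- element: double (containsNull = false)
df.printSchema()

// Store the single array<double> column in Parquet ("/tmp/mydata.parquet" is
// just an example path).
df.write.parquet("/tmp/mydata.parquet")

The reason the original RDD[Array[Double]] has no toDF is that the implicit
conversion in sqlContext.implicits is only defined for RDDs whose element type
is a Product (plus a few primitive types), so a bare Array[Double] does not
pick it up.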

On Mon, Dec 14, 2015 at 9:35 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Please use a tuple instead of an array (the element type must implement the
> trait Product if you want to convert an RDD to a DF).
>
> val testvec = Array( (1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0))
>
> On Tue, Dec 15, 2015 at 1:12 PM, AlexG <swift...@gmail.com> wrote:
>
>> In my attempts to create a dataframe of Array[Double], I get an error about
>> RDD[Array[Double]] not having a toDF function:
>>
>> import sqlContext.implicits._
>> val testvec = Array( Array(1.0, 2.0, 3.0, 4.0), Array(5.0, 6.0, 7.0, 8.0))
>> val testrdd = sc.parallelize(testvec)
>> testrdd.toDF
>>
>> gives
>>
>> <console>:29: error: value toDF is not a member of
>> org.apache.spark.rdd.RDD[Array[Double]]
>>               testrdd.toDF
>>
>> On the other hand, if I make the element type more complicated, e.g.
>> Tuple2[String, Array[Double]], the conversion goes through:
>>
>> val testvec = Array( ("row 1", Array(1.0, 2.0, 3.0, 4.0)), ("row 2",
>> Array(5.0, 6.0, 7.0, 8.0)) )
>> val testrdd = sc.parallelize(testvec)
>> testrdd.toDF
>>
>> gives
>> testrdd: org.apache.spark.rdd.RDD[(String, Array[Double])] =
>> ParallelCollectionRDD[1] at parallelize at <console>:29
>> res3: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]
>>
>> What's the cause of this, and how can I get around it to create a
>> dataframe of Array[Double]? My end goal is to store that dataframe in
>> Parquet (yes, I do want to store all the values in a single column, not
>> individual columns).
>>
>> I am using Spark 1.5.2
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
