I might be missing your point, but I don't get it.
My understanding is that I need an RDD containing Rows, but how do I get one?

I started with a DataFrame, ran a map on it, and got an
RDD[(String, String, String, String)]. Now I want to convert it back to a
DataFrame and it's failing.

Why?
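For anyone hitting the same error: a minimal sketch of the conversion, assuming jsonGzip is the RDD[(String, String, String, String)] from the message below, and reusing the field names (cty, hse, nm, yrs) from the schema quoted there. The tuple RDD first has to be mapped to an RDD[Row] so it matches the createDataFrame(rowRDD: RDD[Row], schema: StructType) overload Ted pointed at:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// A tuple is not a Row, so map each tuple into a Row first.
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }

val schema = StructType(
  StructField("cty", StringType, false) ::
  StructField("hse", StringType, false) ::
  StructField("nm",  StringType, false) ::
  StructField("yrs", StringType, false) :: Nil)

// Now the types line up with createDataFrame(RDD[Row], StructType).
val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)
```

Alternatively, for an RDD of tuples there is a shorter route via the SQLContext implicits, which names the columns directly without building a StructType: import sqlContext.implicits._ and then jsonGzip.toDF("cty", "hse", "nm", "yrs").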


On Sun, Dec 20, 2015 at 4:49 PM Ted Yu <yuzhih...@gmail.com> wrote:

> See the comment for createDataFrame(rowRDD: RDD[Row], schema: StructType)
> method:
>
>    * Creates a [[DataFrame]] from an [[RDD]] containing [[Row]]s using the
> given schema.
>    * It is important to make sure that the structure of every [[Row]] of
> the provided RDD matches
>    * the provided schema. Otherwise, there will be runtime exception.
>    * Example:
>    * {{{
>    *  import org.apache.spark.sql._
>    *  import org.apache.spark.sql.types._
>    *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>    *
>    *  val schema =
>    *    StructType(
>    *      StructField("name", StringType, false) ::
>    *      StructField("age", IntegerType, true) :: Nil)
>    *
>    *  val people =
>    *    sc.textFile("examples/src/main/resources/people.txt").map(
>    *      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
>    *  val dataFrame = sqlContext.createDataFrame(people, schema)
>    *  dataFrame.printSchema
>    *  // root
>    *  // |-- name: string (nullable = false)
>    *  // |-- age: integer (nullable = true)
>    *  }}}
>
> Cheers
>
> On Sun, Dec 20, 2015 at 6:31 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> Hi,
>>
>> I have an RDD
>> jsonGzip
>> res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
>> MapPartitionsRDD[8] at map at <console>:65
>>
>> which I want to convert to a DataFrame with schema
>> so I created a schema:
>>
>> val schema =
>>   StructType(
>>     StructField("cty", StringType, false) ::
>>       StructField("hse", StringType, false) ::
>>         StructField("nm", StringType, false) ::
>>           StructField("yrs", StringType, false) ::Nil)
>>
>> and called
>>
>> val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
>> <console>:36: error: overloaded method value createDataFrame with 
>> alternatives:
>>   (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
>> Class[_])org.apache.spark.sql.DataFrame <and>
>>   (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
>> Class[_])org.apache.spark.sql.DataFrame <and>
>>   (rowRDD: 
>> org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
>> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
>>   (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
>> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
>>  cannot be applied to (org.apache.spark.rdd.RDD[(String, String, String, 
>> String)], org.apache.spark.sql.types.StructType)
>>        val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
>>
>>
>> But as you see I don't have the right RDD type.
>>
>> So how can I get a DataFrame with the right column names?
>>
>>
>>
>