I might be missing your point, but I don't get it. My understanding is that I need an RDD containing Rows — but how do I get it?
I started with a DataFrame, ran a map on it, and got an RDD[(String, String, String, String)]. Now I want to convert it back to a DataFrame and am failing... Why?

On Sun, Dec 20, 2015 at 4:49 PM Ted Yu <yuzhih...@gmail.com> wrote:

> See the comment for the createDataFrame(rowRDD: RDD[Row], schema: StructType) method:
>
> * Creates a [[DataFrame]] from an [[RDD]] containing [[Row]]s using the given schema.
> * It is important to make sure that the structure of every [[Row]] of the provided RDD matches
> * the provided schema. Otherwise, there will be runtime exception.
> * Example:
> * {{{
> *  import org.apache.spark.sql._
> *  import org.apache.spark.sql.types._
> *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> *
> *  val schema =
> *    StructType(
> *      StructField("name", StringType, false) ::
> *      StructField("age", IntegerType, true) :: Nil)
> *
> *  val people =
> *    sc.textFile("examples/src/main/resources/people.txt").map(
> *      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
> *  val dataFrame = sqlContext.createDataFrame(people, schema)
> *  dataFrame.printSchema
> *  // root
> *  // |-- name: string (nullable = false)
> *  // |-- age: integer (nullable = true)
> * }}}
>
> Cheers
>
> On Sun, Dec 20, 2015 at 6:31 AM, Eran Witkon <eranwit...@gmail.com> wrote:
>
>> Hi,
>>
>> I have an RDD
>>
>> jsonGzip
>> res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
>>   MapPartitionsRDD[8] at map at <console>:65
>>
>> which I want to convert to a DataFrame with a schema,
>> so I created a schema:
>>
>> val schema =
>>   StructType(
>>     StructField("cty", StringType, false) ::
>>     StructField("hse", StringType, false) ::
>>     StructField("nm", StringType, false) ::
>>     StructField("yrs", StringType, false) :: Nil)
>>
>> and called
>>
>> val unzipJSON = sqlContext.createDataFrame(jsonGzip, schema)
>> <console>:36: error: overloaded method value createDataFrame with alternatives:
>>   (rdd: org.apache.spark.api.java.JavaRDD[_], beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
>>   (rdd: org.apache.spark.rdd.RDD[_], beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
>>   (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row], schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
>>   (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row], schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
>> cannot be applied to (org.apache.spark.rdd.RDD[(String, String, String, String)], org.apache.spark.sql.types.StructType)
>>        val unzipJSON = sqlContext.createDataFrame(jsonGzip, schema)
>>
>> But as you see I don't have the right RDD type.
>>
>> So how can I get a DataFrame with the right column names?
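For reference, a minimal sketch of the conversion the thread is asking about, written as a Spark 1.x spark-shell fragment (it assumes `sqlContext` is in scope and `jsonGzip` is the RDD[(String, String, String, String)] from above): createDataFrame(rowRDD, schema) needs an RDD[Row], so map each tuple into a Row first.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema =
  StructType(
    StructField("cty", StringType, false) ::
    StructField("hse", StringType, false) ::
    StructField("nm",  StringType, false) ::
    StructField("yrs", StringType, false) :: Nil)

// Convert RDD[(String, String, String, String)] to RDD[Row],
// which is what the (rowRDD, schema) overload accepts.
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }

val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)
```

Alternatively, since the elements are tuples, the implicit `toDF` conversion can name the columns directly and skip the explicit schema (columns come out as nullable strings in that case):

```scala
import sqlContext.implicits._

val df = jsonGzip.toDF("cty", "hse", "nm", "yrs")
```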