Hi,
I have an RDD
jsonGzip
res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
MapPartitionsRDD[8] at map at <console>:65
which I want to convert to a DataFrame with schema
so I created a schema:
val schema =
  StructType(
    StructField("cty", StringType, false) ::
    StructField("hse", StringType, false) ::
    StructField("nm", StringType, false) ::
    StructField("yrs", StringType, false) :: Nil)
and called
val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
<console>:36: error: overloaded method value createDataFrame with alternatives:
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass:
Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass:
Class[_])org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema:
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
<and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema:
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.rdd.RDD[(String, String,
String, String)], org.apache.spark.sql.types.StructType)
val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
But as you can see, I don't have the right RDD type: createDataFrame expects an RDD[Row], not an RDD of tuples.
So how can I get a DataFrame with the right column names?
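For reference, here is a sketch of what I have tried so far, mapping each tuple to a Row before calling createDataFrame, plus the toDF alternative via implicits (the sample data and local context here are stand-ins for my actual jsonGzip RDD):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local context as a stand-in for the real cluster setup
val conf = new SparkConf().setAppName("tuples-to-df").setMaster("local[1]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Hypothetical sample standing in for the real jsonGzip RDD[(String, String, String, String)]
val jsonGzip = sc.parallelize(Seq(("US", "h1", "alice", "1999")))

val schema = StructType(
  StructField("cty", StringType, false) ::
  StructField("hse", StringType, false) ::
  StructField("nm", StringType, false) ::
  StructField("yrs", StringType, false) :: Nil)

// createDataFrame wants an RDD[Row], so convert each tuple to a Row first
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }
val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)

// Alternative: skip the explicit schema and name the columns with toDF
import sqlContext.implicits._
val viaToDF = jsonGzip.toDF("cty", "hse", "nm", "yrs")
```

The toDF route infers StringType for every column from the tuple; the explicit schema route is the one to use if the nullability flags matter.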