Hi,
I have an RDD
jsonGzip
res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
MapPartitionsRDD[8] at map at <console>:65
which I want to convert to a DataFrame with schema
so I created a schema:
val schema =
  StructType(
    StructField("cty", StringType, false) ::
    StructField("hse", StringType, false) ::
    StructField("nm", StringType, false) ::
    StructField("yrs", StringType, false) :: Nil)
and called
val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
<console>:36: error: overloaded method value createDataFrame with alternatives:
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass:
Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass:
Class[_])org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema:
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
<and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema:
org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (org.apache.spark.rdd.RDD[(String, String,
String, String)], org.apache.spark.sql.types.StructType)
val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
But as you can see, I don't have the right RDD type: createDataFrame expects an RDD[Row], not an RDD of tuples.
So how can I get a DataFrame with the right column names?
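For reference, here is a sketch of what I have tried so far, mapping each tuple to a Row before calling createDataFrame, plus the toDF alternative via implicits (the sample data and local context here are stand-ins for my actual jsonGzip RDD):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local context as a stand-in for the real cluster setup
val conf = new SparkConf().setAppName("tuples-to-df").setMaster("local[1]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Hypothetical sample standing in for the real jsonGzip RDD[(String, String, String, String)]
val jsonGzip = sc.parallelize(Seq(("US", "h1", "alice", "1999")))

val schema = StructType(
  StructField("cty", StringType, false) ::
  StructField("hse", StringType, false) ::
  StructField("nm", StringType, false) ::
  StructField("yrs", StringType, false) :: Nil)

// createDataFrame wants an RDD[Row], so convert each tuple to a Row first
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }
val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)

// Alternative: skip the explicit schema and name the columns with toDF
import sqlContext.implicits._
val viaToDF = jsonGzip.toDF("cty", "hse", "nm", "yrs")
```

The toDF route infers StringType for every column from the tuple; the explicit schema route is the one to use if the nullability flags matter.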