Basically I'm attempting to convert a JSON stream to Parquet, and without the .values or .map(_._2) I get this error:
value saveAsNewAPIHadoopFile is not a member of
org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]

On Thu, Oct 9, 2014 at 10:15 PM, Sean Owen <so...@cloudera.com> wrote:
> Your RDD does not contain pairs, since you ".map(_._2)" (BTW that can
> just be ".values"). "Hadoop files" means "SequenceFiles", and those
> store key-value pairs. That's why the method only appears for
> RDD[(K,V)].
>
> On Fri, Oct 10, 2014 at 3:50 AM, Buntu Dev <buntu...@gmail.com> wrote:
> > Thanks Sean, but I'm importing
> > org.apache.spark.streaming.StreamingContext._
> >
> > Here are the Spark imports:
> >
> > import org.apache.spark.streaming._
> > import org.apache.spark.streaming.StreamingContext._
> > import org.apache.spark.streaming.kafka._
> > import org.apache.spark.SparkConf
> >
> > ....
> >
> > val stream = KafkaUtils.createStream(ssc, zkQuorum, group,
> >   topicpMap).map(_._2)
> > stream.saveAsNewAPIHadoopFile(destination,
> >   classOf[Void], classOf[Group], classOf[ExampleOutputFormat], conf)
> >
> > ....
> >
> > Anything else I might be missing?
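
For anyone hitting this later: saveAsNewAPIHadoopFile (singular) is defined on
pair RDDs via PairRDDFunctions, and the DStream counterpart is
saveAsNewAPIHadoopFiles (plural) on pair DStreams, which is why the singular
call doesn't compile on the stream regardless of pairing. One way through is
to drop down to the per-batch RDDs with foreachRDD, re-pair the records as
(Void, Group), and save each batch. What follows is a minimal sketch, not the
original poster's code: it assumes Spark 1.1-era APIs and the pre-Apache
parquet.* packages, and the schema string, topic map, ZooKeeper quorum, and
output path are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._   // pair-RDD implicits (required before Spark 1.3)
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import parquet.example.data.Group
import parquet.example.data.simple.SimpleGroup
import parquet.hadoop.example.{ExampleOutputFormat, GroupWriteSupport}
import parquet.schema.MessageTypeParser

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val zkQuorum = "localhost:2181"         // placeholder ZooKeeper quorum
    val group = "parquet-writer"            // placeholder consumer group
    val topicpMap = Map("events" -> 1)      // placeholder topic -> receiver threads (name follows the thread)
    val destination = "hdfs:///tmp/events"  // placeholder output root

    // Hypothetical one-column schema: each Kafka message stored as a UTF8 payload.
    val schemaString = "message json { required binary payload (UTF8); }"

    // ExampleOutputFormat reads its write schema from the Hadoop conf.
    val conf = new Configuration()
    GroupWriteSupport.setSchema(MessageTypeParser.parseMessageType(schemaString), conf)

    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToParquet"), Seconds(10))

    // createStream yields a DStream of (key, message) pairs; keep only the payload.
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)

    // Re-pair each record as (Void, Group) inside foreachRDD so that
    // PairRDDFunctions.saveAsNewAPIHadoopFile becomes available, then
    // write one Parquet directory per batch.
    messages.foreachRDD { (rdd, time) =>
      val pairs = rdd.map { msg =>
        val record = new SimpleGroup(MessageTypeParser.parseMessageType(schemaString))
        record.append("payload", msg)
        (null: Void, record: Group)
      }
      pairs.saveAsNewAPIHadoopFile(
        destination + "/batch-" + time.milliseconds,
        classOf[Void], classOf[Group], classOf[ExampleOutputFormat], conf)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

The schema is re-parsed per record only to keep the closure trivially
serializable; in a real job mapPartitions would amortize that cost.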