Basically I'm attempting to convert a JSON stream to Parquet, and without the .values or .map(_._2) I get this error:
value saveAsNewAPIHadoopFile is not a member of
org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]

On Thu, Oct 9, 2014 at 10:15 PM, Sean Owen <so...@cloudera.com> wrote:
> Your RDD does not contain pairs, since you ".map(_._2)" (BTW that can
> just be ".values"). "Hadoop files" means "SequenceFiles", and those
> store key-value pairs. That's why the method only appears for
> RDD[(K,V)].
>
> On Fri, Oct 10, 2014 at 3:50 AM, Buntu Dev <buntu...@gmail.com> wrote:
> > Thanks Sean, but I'm importing
> > org.apache.spark.streaming.StreamingContext._
> >
> > Here are the Spark imports:
> >
> > import org.apache.spark.streaming._
> > import org.apache.spark.streaming.StreamingContext._
> > import org.apache.spark.streaming.kafka._
> > import org.apache.spark.SparkConf
> >
> > ....
> >
> > val stream = KafkaUtils.createStream(ssc, zkQuorum, group,
> >   topicpMap).map(_._2)
> > stream.saveAsNewAPIHadoopFile(destination,
> >   classOf[Void], classOf[Group], classOf[ExampleOutputFormat], conf)
> >
> > ....
> >
> > Anything else I might be missing?
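
For anyone hitting this later: saveAsNewAPIHadoopFile (singular) is defined on
pair RDDs via PairRDDFunctions, and the DStream counterpart is
saveAsNewAPIHadoopFiles (plural) on pair DStreams, which is why the singular
call doesn't compile on the stream regardless of pairing. One way through is
to drop down to the per-batch RDDs with foreachRDD, re-pair the records as
(Void, Group), and save each batch. What follows is a minimal sketch, not the
original poster's code: it assumes Spark 1.1-era APIs and the pre-Apache
parquet.* packages, and the schema string, topic map, ZooKeeper quorum, and
output path are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._   // pair-RDD implicits (required before Spark 1.3)
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import parquet.example.data.Group
import parquet.example.data.simple.SimpleGroup
import parquet.hadoop.example.{ExampleOutputFormat, GroupWriteSupport}
import parquet.schema.MessageTypeParser

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val zkQuorum = "localhost:2181"         // placeholder ZooKeeper quorum
    val group = "parquet-writer"            // placeholder consumer group
    val topicpMap = Map("events" -> 1)      // placeholder topic -> receiver threads (name follows the thread)
    val destination = "hdfs:///tmp/events"  // placeholder output root

    // Hypothetical one-column schema: each Kafka message stored as a UTF8 payload.
    val schemaString = "message json { required binary payload (UTF8); }"

    // ExampleOutputFormat reads its write schema from the Hadoop conf.
    val conf = new Configuration()
    GroupWriteSupport.setSchema(MessageTypeParser.parseMessageType(schemaString), conf)

    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToParquet"), Seconds(10))

    // createStream yields a DStream of (key, message) pairs; keep only the payload.
    val messages = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)

    // Re-pair each record as (Void, Group) inside foreachRDD so that
    // PairRDDFunctions.saveAsNewAPIHadoopFile becomes available, then
    // write one Parquet directory per batch.
    messages.foreachRDD { (rdd, time) =>
      val pairs = rdd.map { msg =>
        val record = new SimpleGroup(MessageTypeParser.parseMessageType(schemaString))
        record.append("payload", msg)
        (null: Void, record: Group)
      }
      pairs.saveAsNewAPIHadoopFile(
        destination + "/batch-" + time.milliseconds,
        classOf[Void], classOf[Group], classOf[ExampleOutputFormat], conf)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

The schema is re-parsed per record only to keep the closure trivially
serializable; in a real job mapPartitions would amortize that cost.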