You can convert each RDD in this ReceiverInputDStream
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream>
into PairRDDFunctions
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions>
and call saveAsNewAPIHadoopFile on it.
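
For example, something along these lines (an untested sketch: toGroup is a
hypothetical function that turns a JSON string into a Parquet Group, and
destination/conf are your output path and Hadoop Configuration; you also
need import org.apache.spark.SparkContext._ in scope so the RDD picks up
the pair functions):

    // keep the Kafka (key, json) pairs instead of dropping the key, then
    // re-key each record as (Void, Group) so the types match
    // ExampleOutputFormat, and save each batch via foreachRDD
    val stream = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap)

    stream.map { case (_, json) => (null.asInstanceOf[Void], toGroup(json)) }
      .foreachRDD { rdd =>
        rdd.saveAsNewAPIHadoopFile(destination, classOf[Void],
          classOf[Group], classOf[ExampleOutputFormat], conf)
      }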

Thanks
Best Regards

On Fri, Oct 10, 2014 at 11:28 AM, Buntu Dev <buntu...@gmail.com> wrote:

> Basically I'm attempting to convert a JSON stream to Parquet and I get
> this error without the .values or .map(_._2):
>
>  value saveAsNewAPIHadoopFile is not a member of
> org.apache.spark.streaming.dstream.ReceiverInputDStream[(String, String)]
>
> On Thu, Oct 9, 2014 at 10:15 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Your RDD does not contain pairs, since you call ".map(_._2)" (BTW, that
>> can just be ".values"). "Hadoop files" means "SequenceFiles", and those
>> store key-value pairs. That's why the method only appears for
>> RDD[(K,V)].
>>
>> On Fri, Oct 10, 2014 at 3:50 AM, Buntu Dev <buntu...@gmail.com> wrote:
>> > Thanks Sean, but I'm importing
>> org.apache.spark.streaming.StreamingContext._
>> >
>> > Here are the spark imports:
>> >
>> > import org.apache.spark.streaming._
>> >
>> > import org.apache.spark.streaming.StreamingContext._
>> >
>> > import org.apache.spark.streaming.kafka._
>> >
>> > import org.apache.spark.SparkConf
>> >
>> > ....
>> >
>> >     val stream = KafkaUtils.createStream(ssc, zkQuorum, group,
>> >       topicpMap).map(_._2)
>> >
>> >     stream.saveAsNewAPIHadoopFile(destination, classOf[Void],
>> >       classOf[Group], classOf[ExampleOutputFormat], conf)
>> >
>> > ....
>> >
>> > Anything else I might be missing?
>>
>
>
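
PS: if you'd rather skip foreachRDD, the StreamingContext._ import you
already have also adds saveAsNewAPIHadoopFiles (note the plural) to
DStreams of pairs; it writes one output directory per batch. Again an
untested sketch, with prefix and suffix as hypothetical path pieces and
the same hypothetical toGroup as above:

    // save straight from the (Void, Group) DStream; each batch lands in
    // a directory named <prefix>-<batch time in ms>.<suffix>
    stream.map { case (_, json) => (null.asInstanceOf[Void], toGroup(json)) }
      .saveAsNewAPIHadoopFiles(prefix, suffix, classOf[Void],
        classOf[Group], classOf[ExampleOutputFormat], conf)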
