Forgot to paste the link... http://ramblings.azurewebsites.net/2016/01/26/save-parquet-rdds-in-apache-spark/
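For readers of the archive: the linked post covers writing RDDs out as Parquet. The gist in Spark 1.6 Scala would be something like the sketch below (untested here; the `Order` case class shape is hypothetical, invented only for illustration, since the thread never shows its fields):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical shape for illustration only; the real Order in this
// thread is an Avro-generated class whose fields are not shown.
case class Order(id: Int, item: String)

object SaveParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("save-parquet"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Turn an RDD of case classes into a DataFrame, then write Parquet.
    val orders = sc.parallelize(Seq(Order(1, "widget"), Order(2, "gadget")))
    orders.toDF().write.parquet("TempData/origin/")

    sc.stop()
  }
}
```

`toDF()` comes from `sqlContext.implicits._` and infers the schema from the case class; `write.parquet` is the standard 1.6 DataFrame writer.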
On Sat, 27 Aug 2016, 19:18 Sebastian Piu, <sebastian....@gmail.com> wrote:

> Hi Renato,
>
> Check here for how to do it. It is in Java, but you can translate it to
> Scala if that is what you need.
>
> Cheers
>
> On Sat, 27 Aug 2016, 14:24 Renato Marroquín Mogrovejo, <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Akhilesh,
>>
>> Thanks for your response.
>> I am using Spark 1.6.1, and what I am trying to do is ingest parquet
>> files into Spark Streaming, not in batch operations.
>>
>> val ssc = new StreamingContext(sc, Seconds(5))
>>
>> ssc.sparkContext.hadoopConfiguration.set("parquet.read.support.class",
>>   "parquet.avro.AvroReadSupport")
>>
>> val sqlContext = new SQLContext(sc)
>> import sqlContext.implicits._
>>
>> val oDStream = ssc.fileStream[Void, Order,
>>   ParquetInputFormat]("TempData/origin/")
>>
>> oDStream.foreachRDD(relation => {
>>   if (relation.count() == 0)
>>     println("Nothing received")
>>   else {
>>     val rDF = relation.toDF().as[Order]
>>     println(rDF.first())
>>   }
>> })
>>
>> But that doesn't work. Any ideas?
>>
>> Best,
>>
>> Renato M.
>>
>> 2016-08-27 9:01 GMT+02:00 Akhilesh Pathodia <pathodia.akhil...@gmail.com>:
>>
>>> Hi Renato,
>>>
>>> Which version of Spark are you using?
>>>
>>> If your Spark version is 1.3.0 or later, you can use SQLContext to read
>>> the parquet file, which will give you a DataFrame. Please follow the
>>> link below:
>>>
>>> https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#loading-data-programmatically
>>>
>>> Thanks,
>>> Akhilesh
>>>
>>> On Sat, Aug 27, 2016 at 3:26 AM, Renato Marroquín Mogrovejo <
>>> renatoj.marroq...@gmail.com> wrote:
>>>
>>>> Anybody? I think Rory also didn't get an answer from the list ...
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccac+fre14pv5nvqhtbvqdc+6dkxo73odazfqslbso8f94ozo...@mail.gmail.com%3E
>>>>
>>>> 2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <
>>>> renatoj.marroq...@gmail.com>:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to use parquet files as input for DStream operations, but
>>>>> I can't find any documentation or examples. The only thing I found
>>>>> was [1], but I also get the same error as in that post (Class
>>>>> parquet.avro.AvroReadSupport not found).
>>>>> Ideally I would like to have something like this:
>>>>>
>>>>> val oDStream = ssc.fileStream[Void, Order,
>>>>>   ParquetInputFormat[Order]]("data/")
>>>>>
>>>>> where Order is a case class and the files inside "data" are all
>>>>> parquet files.
>>>>> Any hints would be highly appreciated. Thanks!
>>>>>
>>>>> Best,
>>>>>
>>>>> Renato M.
>>>>>
>>>>> [1]
>>>>> http://stackoverflow.com/questions/35413552/how-do-i-read-in-parquet-files-using-ssc-filestream-and-what-is-the-nature