Forgot to paste the link... http://ramblings.azurewebsites.net/2016/01/26/save-parquet-rdds-in-apache-spark/
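For readers of the archive: the linked post covers writing RDDs out as Parquet. The gist in Spark 1.6 Scala would be something like the sketch below (untested here; the `Order` case class shape is hypothetical, invented only for illustration, since the thread never shows its fields):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical shape for illustration only; the real Order in this
// thread is an Avro-generated class whose fields are not shown.
case class Order(id: Int, item: String)

object SaveParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("save-parquet"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Turn an RDD of case classes into a DataFrame, then write Parquet.
    val orders = sc.parallelize(Seq(Order(1, "widget"), Order(2, "gadget")))
    orders.toDF().write.parquet("TempData/origin/")

    sc.stop()
  }
}
```

`toDF()` comes from `sqlContext.implicits._` and infers the schema from the case class; `write.parquet` is the standard 1.6 DataFrame writer.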
On Sat, 27 Aug 2016, 19:18 Sebastian Piu, <sebastian....@gmail.com> wrote:

> Hi Renato,
>
> Check here for how to do it. It is in Java, but you can translate it to
> Scala if that is what you need.
>
> Cheers
>
> On Sat, 27 Aug 2016, 14:24 Renato Marroquín Mogrovejo, <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Akhilesh,
>>
>> Thanks for your response.
>> I am using Spark 1.6.1, and what I am trying to do is ingest parquet
>> files into Spark Streaming, not in batch operations.
>>
>> val ssc = new StreamingContext(sc, Seconds(5))
>>
>> ssc.sparkContext.hadoopConfiguration.set("parquet.read.support.class",
>>   "parquet.avro.AvroReadSupport")
>>
>> val sqlContext = new SQLContext(sc)
>> import sqlContext.implicits._
>>
>> val oDStream = ssc.fileStream[Void, Order,
>>   ParquetInputFormat]("TempData/origin/")
>>
>> oDStream.foreachRDD(relation => {
>>   if (relation.count() == 0)
>>     println("Nothing received")
>>   else {
>>     val rDF = relation.toDF().as[Order]
>>     println(rDF.first())
>>   }
>> })
>>
>> But that doesn't work. Any ideas?
>>
>> Best,
>>
>> Renato M.
>>
>> 2016-08-27 9:01 GMT+02:00 Akhilesh Pathodia <pathodia.akhil...@gmail.com>:
>>
>>> Hi Renato,
>>>
>>> Which version of Spark are you using?
>>>
>>> If your Spark version is 1.3.0 or later, you can use SQLContext to read
>>> the parquet file, which will give you a DataFrame. Please follow the
>>> link below:
>>>
>>> https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#loading-data-programmatically
>>>
>>> Thanks,
>>> Akhilesh
>>>
>>> On Sat, Aug 27, 2016 at 3:26 AM, Renato Marroquín Mogrovejo <
>>> renatoj.marroq...@gmail.com> wrote:
>>>
>>>> Anybody? I think Rory also didn't get an answer from the list ...
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3ccac+fre14pv5nvqhtbvqdc+6dkxo73odazfqslbso8f94ozo...@mail.gmail.com%3E
>>>>
>>>> 2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <
>>>> renatoj.marroq...@gmail.com>:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to use parquet files as input for DStream operations, but
>>>>> I can't find any documentation or examples. The only thing I found
>>>>> was [1], but I also get the same error as in that post (Class
>>>>> parquet.avro.AvroReadSupport not found).
>>>>> Ideally I would like to have something like this:
>>>>>
>>>>> val oDStream = ssc.fileStream[Void, Order,
>>>>>   ParquetInputFormat[Order]]("data/")
>>>>>
>>>>> where Order is a case class and the files inside "data" are all
>>>>> parquet files.
>>>>> Any hints would be highly appreciated. Thanks!
>>>>>
>>>>> Best,
>>>>>
>>>>> Renato M.
>>>>>
>>>>> [1]
>>>>> http://stackoverflow.com/questions/35413552/how-do-i-read-in-parquet-files-using-ssc-filestream-and-what-is-the-nature