I am not looking for Spark SQL specifically. My use case is that I need to save an RDD as a Parquet file in HDFS at the end of a batch, then load it back and convert it into an RDD in the next batch. The RDD holds key/value pairs with a String key and a Long value.
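
For reference, here is a minimal sketch of that round trip. It assumes Spark 1.4 or later (for `DataFrame.write` and `SQLContext.read`) and a job launched with spark-submit. Note that it does go through the DataFrame API for the Parquet I/O, even though both ends are plain RDDs. The object name, the column names `key` and `value`, and the HDFS path are placeholders for illustration, not anything from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

object ParquetRoundTrip {

  // End of a batch: write the pair RDD to `path` as Parquet.
  // The default save mode fails if `path` already exists.
  def save(sqlContext: SQLContext, rdd: RDD[(String, Long)], path: String): Unit = {
    import sqlContext.implicits._ // brings toDF into scope for RDDs of tuples
    rdd.toDF("key", "value").write.parquet(path)
  }

  // Start of the next batch: read the Parquet data back and rebuild the pair RDD.
  def load(sqlContext: SQLContext, path: String): RDD[(String, Long)] =
    sqlContext.read.parquet(path)
      .select("key", "value") // fix the column order before reading by position
      .rdd
      .map(row => (row.getString(0), row.getLong(1)))

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("parquet-round-trip"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical output location; use one directory per batch.
    val path = "hdfs:///tmp/batch-state/batch-0001"

    val counts: RDD[(String, Long)] = sc.parallelize(Seq(("a", 1L), ("b", 2L)))
    save(sqlContext, counts, path)

    val restored: RDD[(String, Long)] = load(sqlContext, path)
    restored.collect().foreach(println)

    sc.stop()
  }
}
```

Writing each batch to its own directory avoids the "path already exists" failure of the default save mode; `write.mode("overwrite")` is an alternative if only the latest state matters.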

On Wed, Nov 4, 2015 at 11:52 PM, Stefano Baghino <[email protected]> wrote:

> What scenario would you like to optimize for? If you have something more
> specific regarding your use case, the mailing list can surely provide you
> with some very good advice.
>
> If you just want to save an RDD as Avro, you can use a module from
> Databricks (the README on GitHub
> <https://github.com/databricks/spark-avro> also gives some examples).
> Otherwise, Parquet is natively supported by Spark SQL, and the official
> documentation contains useful examples
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files>.
>
> On Thu, Nov 5, 2015 at 12:09 AM, swetha <[email protected]> wrote:
>
>> Hi,
>>
>> What is an efficient approach to save an RDD as a file in HDFS and
>> retrieve it back? I was deciding between Avro, Parquet and
>> SequenceFileFormat. We currently use SequenceFileFormat for one of
>> our use cases.
>>
>> Any example of how to store and retrieve an RDD in the Avro and
>> Parquet file formats would be of great help.
>>
>> Thanks,
>> Swetha
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
>
> --
> BR,
> Stefano Baghino
>
> Software Engineer @ Radicalbit
