Java or Scala? I think everything you need is in the DataFrames tutorial. E.g. if you have a DataFrame and are working from Java, use toJavaRDD() <https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#toJavaRDD()>
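
For the Scala side, here is a minimal sketch of reading the Parquet file back and getting an RDD out of it. It assumes Spark 1.4 or later (where sqlContext.read is available); the HDFS path and the object name are made up for illustration, so substitute your own.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

object ParquetToRdd {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ParquetToRdd")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical location: replace with the path your job wrote to.
    val path = "hdfs:///path/to/data.parquet"

    // Reading Parquet gives a DataFrame, not an RDD.
    val df: DataFrame = sqlContext.read.parquet(path)

    // .rdd exposes the underlying rows as an RDD[Row].
    val rows: RDD[Row] = df.rdd

    println(rows.count())
    sc.stop()
  }
}
```

From Java the last step is df.toJavaRDD() instead of df.rdd, which gives you a JavaRDD<Row>. Either way you get Row objects back, so you still need to map them to your own types.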

On 5 November 2015 at 21:13, swetha kasireddy <[email protected]> wrote:

> How to convert a parquet file that is saved in hdfs to an RDD after
> reading the file from hdfs?
>
> On Thu, Nov 5, 2015 at 10:02 AM, Igor Berman <[email protected]>
> wrote:
>
>> Hi,
>> we are using avro with compression (snappy). As soon as you have enough
>> partitions, the saving won't be a problem imho.
>> in general hdfs is pretty fast, s3 is less so
>> the issue with storing data is that you will lose your partitioner (even
>> though rdd has it) at loading moment. There is PR that tries to solve this.
>>
>>
>> On 5 November 2015 at 01:09, swetha <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> What is the efficient approach to save an RDD as a file in HDFS and
>>> retrieve it back? I was thinking between Avro, Parquet and SequenceFile
>>> format. We currently use SequenceFile format for one of our use cases.
>>>
>>> Any example on how to store and retrieve an RDD in an Avro and Parquet
>>> file formats would be of great help.
>>>
>>> Thanks,
>>> Swetha
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
