Spark uses the Hadoop InputFormat and OutputFormat classes, so you can simply 
create a JobConf to read the data and pass that to SparkContext.hadoopFile -- or, 
for InputFormats written against the newer org.apache.hadoop.mapreduce API 
(which Parquet's are), a Configuration passed to SparkContext.newAPIHadoopFile. 
There are some examples of Parquet usage here: 
http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/ and here: 
http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark.
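
For example, here is a rough sketch along the lines of those posts, using 
Parquet's Avro bindings (parquet-avro). The HDFS paths and the Person schema 
are just placeholders, and note that later Parquet releases move these classes 
from the parquet.* packages to org.apache.parquet.*:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericRecord}
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD save methods
    import parquet.avro.{AvroParquetOutputFormat, AvroReadSupport}
    import parquet.hadoop.ParquetInputFormat

    // Hypothetical schema for a two-column CSV file of (name, age).
    val schemaJson = """{"type": "record", "name": "Person", "fields": [
      {"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}"""

    val sc = new SparkContext("local", "csv-to-parquet")

    // Parse the CSV into Avro GenericRecords. The schema is re-parsed in
    // each partition because Avro's Schema class is not serializable.
    val records = sc.textFile("hdfs:///data/people.csv").mapPartitions { rows =>
      val schema = new Schema.Parser().parse(schemaJson)
      rows.map { line =>
        val Array(name, age) = line.split(",")
        val rec = new GenericData.Record(schema)
        rec.put("name", name)
        rec.put("age", age.trim.toInt)
        (null: Void, rec: GenericRecord)
      }
    }

    // Write: AvroParquetOutputFormat is a ParquetOutputFormat that already
    // knows how to translate Avro records, so only the schema must be set.
    val writeJob = new Job(sc.hadoopConfiguration)
    AvroParquetOutputFormat.setSchema(writeJob, new Schema.Parser().parse(schemaJson))
    records.saveAsNewAPIHadoopFile("hdfs:///data/people.parquet",
      classOf[Void], classOf[GenericRecord],
      classOf[AvroParquetOutputFormat], writeJob.getConfiguration)

    // Read back: ParquetInputFormat plus the Avro read support yields an
    // RDD[(Void, GenericRecord)] that can be processed as usual.
    val readJob = new Job(sc.hadoopConfiguration)
    ParquetInputFormat.setReadSupportClass(readJob, classOf[AvroReadSupport[GenericRecord]])
    val people = sc.newAPIHadoopFile("hdfs:///data/people.parquet",
      classOf[ParquetInputFormat[GenericRecord]],
      classOf[Void], classOf[GenericRecord], readJob.getConfiguration)
    people.map { case (_, rec) => rec.get("name").toString }.take(5).foreach(println)

The Thrift/Scrooge route in the second post follows the same pattern, with 
parquet-thrift's input/output support classes swapped in for the Avro ones.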

Matei

On Apr 27, 2014, at 11:41 PM, Sai Prasanna <ansaiprasa...@gmail.com> wrote:

> Hi All,
> 
> I want to store a CSV text file in Parquet format in HDFS and then do some 
> processing in Spark.
> 
> Somehow my search for a way to do this was futile; most of the material I 
> found covered Parquet with Impala.
> 
> Any guidance here? Thanks!!
> 
