Re: Parsing CSV files in Spark

2015-02-06 Thread Mohit Jaggi
As Sean said, this is just a few lines of code. You can see an example here: https://github.com/AyasdiOpenSource/bigdf/blob/master/src/main/scala/com/ayasdi/bigdf/DF.scala#L660 > On Feb 6, 201

Re: Parsing CSV files in Spark

2015-02-06 Thread Charles Feduke
I've been doing a bunch of work with CSVs in Spark, mostly saving them as a merged CSV (instead of the various part-n files). You might find the following links useful: - This article is about combining the part files and outputting a header as the first line in the merged results: http://jav

Re: Parsing CSV files in Spark

2015-02-06 Thread Sean Owen
You can do this manually without much trouble: get your files on a distributed store like HDFS, read them with textFile, filter out headers, parse with a CSV library like Commons CSV, select columns, format and store the result. That's tens of lines of code. However you probably want to start by l