Just an update on this - I found that the install script provided by Amazon was the culprit, though I'm not exactly sure why. When I installed Spark manually on the EMR cluster (and did all the EMR-specific configuration by hand), it worked fine.
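If anyone else hits this, a quick way to confirm a classpath mismatch is to ask the JVM where the Parquet class from the stack trace is actually being loaded from. A sketch, run from spark-shell on the EMR master:

    // Print the jar that actually provides the Parquet input-split class.
    // A NoSuchMethodError like the one quoted below usually means an older
    // parquet-hadoop jar on the cluster is shadowing the version Spark was
    // built against.
    val cls = Class.forName("parquet.hadoop.ParquetInputSplit")
    println(cls.getProtectionDomain.getCodeSource.getLocation)

If that prints a jar under the EMR-installed Hadoop directories rather than Spark's own assembly, the script-installed layout is pulling in the wrong Parquet version.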
On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore <dragoncu...@gmail.com> wrote:
> Hi all,
>
> I've just launched a new Amazon EMR cluster and used the script at:
>
>   s3://support.elasticmapreduce/spark/install-spark
>
> to install Spark (this script was upgraded to support 1.2).
>
> I know there are tools to launch a Spark cluster in EC2, but I want to use EMR.
>
> Everything installs fine; however, when I go to read from a Parquet file, I end up with (the main exception):
>
>   Caused by: java.lang.NoSuchMethodError: parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
>       at parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
>       ... 55 more
>
> It seems to me like a version mismatch somewhere. Where is the parquet-hadoop jar coming from? Is it built into a fat jar for Spark?
>
> Any help would be appreciated. Note that 1.1.1 worked fine with Parquet files.
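(For anyone finding this thread later: the failing read in the quoted mail amounts to roughly the following under Spark 1.2's SQL API - the S3 path here is hypothetical.)

    // spark-shell, Spark 1.2: reading any Parquet file trips the error above
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    val data = sqlContext.parquetFile("s3://some-bucket/some-data.parquet")  // hypothetical path
    data.count()  // fails with the NoSuchMethodError during split generation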