Just an update on this - I found that Amazon's install-spark script was the
culprit, though I'm not exactly sure why.  When I installed Spark manually
onto the EMR cluster (and did all the EMR configuration by hand), it worked
fine.
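
In case anyone else hits this, one quick sanity check (just a sketch to run
from the spark-shell; the class name is taken straight from the stack trace
below) is to ask the JVM which jar the Parquet class is actually being
loaded from:

  val clazz = Class.forName("parquet.hadoop.ParquetInputSplit")
  // Prints the on-disk location that supplied the class; if this is not
  // the Spark assembly jar, that jar is the likely source of the mismatch.
  println(clazz.getProtectionDomain.getCodeSource.getLocation)

If that points at an older parquet-hadoop jar shipped on the cluster rather
than at the Spark assembly, you've found the conflicting version.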

On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore <dragoncu...@gmail.com>
wrote:

> Hi all,
>
> I've just launched a new Amazon EMR cluster and used the script at:
>
> s3://support.elasticmapreduce/spark/install-spark
>
> to install Spark (this script was upgraded to support Spark 1.2).
>
> I know there are tools to launch a Spark cluster in EC2, but I want to use
> EMR.
>
> Everything installs fine; however, when I try to read from a Parquet file,
> I get the following (the main exception):
>
> Caused by: java.lang.NoSuchMethodError:
> parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
>         at
> parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
>         ... 55 more
>
> It seems to me like a version mismatch somewhere.  Where is the
> parquet-hadoop jar coming from?  Is it built into a fat jar for Spark?
>
> Any help would be appreciated.  Note that Spark 1.1.1 worked fine with
> Parquet files.
>
