Thank you for your question, Xiaochuan! You should ensure that *every machine in your cluster* has the S3A JARs on its YARN classpath. From your error, it looks like the machine you are running on does not have the JAR that provides *S3AFileSystem*.
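As a rough sketch of what that could look like (the host names and JAR versions below are placeholders; use the hadoop-aws JAR that matches your Hadoop release and the aws-java-sdk version it was built against), you can copy the JARs into a directory that is already covered by a wildcard in your "yarn classpath" output:

    # run from the machine that already has the JARs
    for host in nm-host-1 nm-host-2; do
      scp hadoop-aws-2.7.3.jar aws-java-sdk-1.7.4.jar \
          ec2-user@$host:/home/ec2-user/deploy/yarn/share/hadoop/common/lib/
    done

You can then repeat your "yarn org.apache.hadoop.fs.s3a.S3AFileSystem" check on each machine; seeing "Main method not found" instead of ClassNotFoundException tells you the class is loadable there.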
>> What's the right way to set this up? Should I just copy over the required
>> AWS jars to the Hadoop conf directory?

I'd lean on the side of simplicity, and the *scp* route seems to address
most of your needs.

>> Should I be editing run-job.sh or run-class.sh?

You should not have to edit either of these files. Once you fix your
classpath by copying the relevant JARs, it should just work.

Please let us know if you need more assistance.

--
Jagadish

On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <xiaochuan...@kik.com> wrote:
> Hi,
>
> I'm trying to deploy a Samza job using YARN and S3, where I upload the zip
> package to S3 and point yarn.package.path to it.
> Does anyone know what setup steps are required for this?
>
> What I've tried so far is to get Hello Samza to run this way in AWS.
>
> However, I ran into the following exception:
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> ...
>
> Running "$YARN_HOME/bin/yarn classpath" gives the following:
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/common/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> /contrib/capacity-scheduler/*.jar
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
>
> I manually copied the required AWS-related jars to
> /home/ec2-user/deploy/yarn/share/hadoop/common.
> I checked that the class is loadable by running "yarn
> org.apache.hadoop.fs.s3a.S3AFileSystem", which gives a "Main method not
> found" error instead of class not found.
>
> From the console output of run-job.sh I see the following on the class path:
> 1. All jars under the lib directory of the zip package
> 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
>
> The class path from run-job.sh seems to be missing the AWS-related jars
> required for S3AFileSystem.
> What's the right way to set this up?
> Should I just copy over the required AWS jars to the Hadoop conf directory
> (2.)?
> Should I be editing run-job.sh or run-class.sh?
>
> Thanks,
> Xiaochuan Yu

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University