Thank you for your question, Xiaochuan! You should ensure that *every machine in your cluster* has the S3A JARs on its YARN classpath. From your error, it looks like the machine you are running on does not have the JAR that provides *S3AFileSystem*.
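As a rough sketch of what that could look like (the host names and JAR versions below are placeholders; use the hadoop-aws JAR that matches your Hadoop release and the aws-java-sdk version it was built against), you can copy the JARs into a directory that is already covered by a wildcard in your "yarn classpath" output:

    # run from the machine that already has the JARs
    for host in nm-host-1 nm-host-2; do
      scp hadoop-aws-2.7.3.jar aws-java-sdk-1.7.4.jar \
          ec2-user@$host:/home/ec2-user/deploy/yarn/share/hadoop/common/lib/
    done

You can then repeat your "yarn org.apache.hadoop.fs.s3a.S3AFileSystem" check on each machine; seeing "Main method not found" instead of ClassNotFoundException tells you the class is loadable there.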
>> What's the right way to set this up? Should I just copy over the required
>> AWS jars to the Hadoop conf directory?

I'd lean on the side of simplicity, and the *scp* route seems to address
most of your needs.

>> Should I be editing run-job.sh or run-class.sh?

You should not have to edit either of these files. Once you fix your
classpath by copying the relevant JARs, it should just work.

Please let us know if you need more assistance.

--
Jagadish

On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <xiaochuan...@kik.com> wrote:
> Hi,
>
> I'm trying to deploy a Samza job using YARN and S3, where I upload the zip
> package to S3 and point yarn.package.path to it.
> Does anyone know what setup steps are required for this?
>
> What I've tried so far is to get Hello Samza to run this way in AWS.
>
> However, I ran into the following exception:
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> ...
>
> Running "$YARN_HOME/bin/yarn classpath" gives the following:
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/etc/hadoop
> /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/common/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> /contrib/capacity-scheduler/*.jar
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
>
> I manually copied the required AWS-related jars to
> /home/ec2-user/deploy/yarn/share/hadoop/common.
> I checked that the class is loadable by running "yarn
> org.apache.hadoop.fs.s3a.S3AFileSystem", which gives a "Main method not
> found" error instead of class not found.
>
> From the console output of run-job.sh I see the following on the class path:
> 1. All jars under the lib directory of the zip package
> 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
>
> The class path from run-job.sh seems to be missing the AWS-related jars
> required for S3AFileSystem.
> What's the right way to set this up?
> Should I just copy over the required AWS jars to the Hadoop conf directory
> (2.)?
> Should I be editing run-job.sh or run-class.sh?
>
> Thanks,
> Xiaochuan Yu

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University