I found that it was necessary to include "hadoop-aws" as part of the package submitted to YARN, similar to the instructions for deploying from HDFS <https://samza.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html>. However, due to a dependency conflict on the AWS SDK between our code and "hadoop-aws", we can't actually include it. We are now planning to use HTTP FS instead.
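
Concretely, we're thinking of serving the package tarball from a plain HTTP server and pointing yarn.package.path at it, along the lines of the sketch below. The host and tarball path are placeholders, and the fs.http.impl class name is the HttpFileSystem bundled with samza-yarn, if I remember the package correctly:

# Rough sketch only: serve the job package over plain HTTP so no
# hadoop-aws / AWS SDK jars are needed on the YARN classpath.
# Host, port, and tarball path below are placeholders.
yarn.package.path=http://my-http-server.example.com:8000/release/hello-samza-dist.tar.gz

# Map the "http" scheme to Samza's HTTP filesystem implementation
# (assumed class name, shipped with samza-yarn).
fs.http.impl=org.apache.samza.util.hadoop.HttpFileSystem

The appeal is that the HTTP filesystem class ships in the job package's own lib directory, so nothing extra should need to be added to each node's YARN classpath.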
On Fri, Sep 15, 2017 at 2:45 PM Jagadish Venkatraman <jagadish1...@gmail.com> wrote:

> Thank you Xiaochuan for your question!
>
> You should ensure that *every machine in your cluster* has the S3 jar file
> in its YARN class-path. From your error, it looks like the machine you are
> running on does not have the JAR file corresponding to *S3AFileSystem*.
>
> >> Whats the right way to set this up? Should I just copy over the required
> >> AWS jars to the Hadoop conf directory
>
> I'd lean on the side of simplicity and the *scp* route seems to address
> most of your needs.
>
> >> Should I be editing run-job.sh or run-class.sh?
>
> You should not have to edit any of these files. Once you fix your
> class-paths by copying those relevant JARs, it should just work.
>
> Please let us know if you need more assistance.
>
> --
> Jagdish
>
>
> On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <xiaochuan...@kik.com>
> wrote:
>
> > Hi,
> >
> > I'm trying to deploy a Samza job using YARN and S3 where I upload the zip
> > package to S3 and point yarn.package.path to it.
> > Does anyone know what kind of set up steps is required for this?
> >
> > What I've tried so far is to get Hello Samza to be run this way in AWS.
> >
> > However I ran into the following exception:
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.ClassNotFoundException: Class
> > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> > ...
> >
> > Running "$YARN_HOME/bin/yarn classpath" gives the following:
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/common/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> > /contrib/capacity-scheduler/*.jar
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> >
> > I manually copied the required AWS related jars to
> > /home/ec2-user/deploy/yarn/share/hadoop/common.
> > I checked that it is loadable by running "yarn
> > org.apache.hadoop.fs.s3a.S3AFileSystem" which gives the "Main method not
> > found" error instead of class not found.
> >
> > From the console output of run-job.sh I see the following in class path:
> > 1. All jars under the lib directory of the zip package
> > 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
> >
> > The class path from run-job.sh seem to be missing the AWS related jars
> > required for S3AFileSystem.
> > Whats the right way to set this up?
> > Should I just copy over the required AWS jars to the Hadoop conf directory (2.)?
> > Should I be editing run-job.sh or run-class.sh?
> >
> > Thanks,
> > Xiaochuan Yu
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
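
P.S. For anyone finding this thread in the archives later: the S3 variant we were originally attempting looked roughly like the sketch below (bucket name and tarball key are placeholders). As Jagadish notes above, it also needs the hadoop-aws jar and a matching aws-java-sdk on every node's YARN classpath (or bundled into the job package), which is exactly where the SDK version conflict bit us.

# Rough sketch of the S3-based config we originally attempted;
# bucket name and tarball key are placeholders.
yarn.package.path=s3a://my-bucket/release/hello-samza-dist.tar.gz

# Map the "s3a" scheme to Hadoop's S3A filesystem -- the class from the
# ClassNotFoundException above, which lives in the hadoop-aws jar.
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem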