Hi, I'm trying to deploy a Samza job on YARN with the package hosted on S3: I upload the zip package to S3 and point yarn.package.path at it. Does anyone know what setup steps are required for this?
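For reference, the relevant part of my job config looks roughly like this (the bucket and file names are placeholders, and I'm assuming the S3 credentials belong in Hadoop's core-site.xml as fs.s3a.access.key/fs.s3a.secret.key, though that part may also be wrong):

    # Samza job properties (bucket/file names are placeholders)
    yarn.package.path=s3a://my-bucket/hello-samza-dist.zip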
What I've tried so far is to get Hello Samza running this way in AWS. However, I ran into the following exception:

    Exception in thread "main" java.lang.RuntimeException:
    java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        ...

Running "$YARN_HOME/bin/yarn classpath" gives the following:

    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/common/*
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
    /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
    /contrib/capacity-scheduler/*.jar
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*

I manually copied the required AWS-related jars to /home/ec2-user/deploy/yarn/share/hadoop/common. I checked that the class is loadable by running "yarn org.apache.hadoop.fs.s3a.S3AFileSystem", which gives a "Main method not found" error instead of "class not found".

From the console output of run-job.sh I see the following on the classpath:

1. All jars under the lib directory of the zip package
2. /home/ec2-user/deploy/yarn/etc/hadoop (the Hadoop conf directory)

The classpath from run-job.sh seems to be missing the AWS-related jars required for S3AFileSystem. What's the right way to set this up? Should I just copy the required AWS jars to the Hadoop conf directory (item 2)? Should I be editing run-job.sh or run-class.sh?
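Since run-job.sh already puts everything under the package's lib directory on the classpath (item 1 above), one workaround I'm considering is bundling the AWS jars into the package before zipping it, roughly like this (jar locations and versions are from my Hadoop build and may differ; hadoop-aws and the AWS SDK ship under share/hadoop/tools/lib in the Hadoop distribution):

    # Sketch of the workaround I'm considering: copy the S3A dependencies
    # into the job package's lib directory before building the zip, so
    # run-job.sh/run-class.sh pick them up along with the other job jars.
    cp "$YARN_HOME"/share/hadoop/tools/lib/hadoop-aws-*.jar \
       "$YARN_HOME"/share/hadoop/tools/lib/aws-java-sdk-*.jar \
       target/hello-samza-dist/lib/

Would that be preferable to editing the scripts?

Thanks,
Xiaochuan Yu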