Hi, I'm trying to deploy a Samza job on YARN with the package hosted on S3: I upload the zip package to S3 and point yarn.package.path at it. Does anyone know what setup steps are required for this?
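For reference, the relevant part of my job config looks roughly like this (the bucket and file names are placeholders, and I'm assuming the S3 credentials belong in Hadoop's core-site.xml as fs.s3a.access.key/fs.s3a.secret.key, though that part may also be wrong):

    # Samza job properties (bucket/file names are placeholders)
    yarn.package.path=s3a://my-bucket/hello-samza-dist.zip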
What I've tried so far is to get Hello Samza running this way in AWS. However, I ran into the following exception:

    Exception in thread "main" java.lang.RuntimeException:
    java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        ...

Running "$YARN_HOME/bin/yarn classpath" gives the following:

    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/etc/hadoop
    /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/common/*
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
    /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
    /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
    /contrib/capacity-scheduler/*.jar
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
    /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*

I manually copied the required AWS-related jars to /home/ec2-user/deploy/yarn/share/hadoop/common. I checked that the class is loadable by running "yarn org.apache.hadoop.fs.s3a.S3AFileSystem", which gives a "Main method not found" error instead of "class not found".

From the console output of run-job.sh I see the following on the classpath:

1. All jars under the lib directory of the zip package
2. /home/ec2-user/deploy/yarn/etc/hadoop (the Hadoop conf directory)

The classpath from run-job.sh seems to be missing the AWS-related jars required for S3AFileSystem. What's the right way to set this up? Should I just copy the required AWS jars to the Hadoop conf directory (item 2)? Should I be editing run-job.sh or run-class.sh?
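Since run-job.sh already puts everything under the package's lib directory on the classpath (item 1 above), one workaround I'm considering is bundling the AWS jars into the package before zipping it, roughly like this (jar locations and versions are from my Hadoop build and may differ; hadoop-aws and the AWS SDK ship under share/hadoop/tools/lib in the Hadoop distribution):

    # Sketch of the workaround I'm considering: copy the S3A dependencies
    # into the job package's lib directory before building the zip, so
    # run-job.sh/run-class.sh pick them up along with the other job jars.
    cp "$YARN_HOME"/share/hadoop/tools/lib/hadoop-aws-*.jar \
       "$YARN_HOME"/share/hadoop/tools/lib/aws-java-sdk-*.jar \
       target/hello-samza-dist/lib/

Would that be preferable to editing the scripts?

Thanks,
Xiaochuan Yu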