I found that it was necessary to include "hadoop-aws" as part of the package submitted to YARN, similar to the instructions for deploying from HDFS <https://samza.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html>. However, due to a dependency conflict on the AWS SDK between our code and "hadoop-aws", we can't actually include it. We are now planning to use HTTP FS instead.
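
Concretely, we're thinking of serving the package tarball from a plain HTTP server and pointing yarn.package.path at it, along the lines of the sketch below. The host and tarball path are placeholders, and the fs.http.impl class name is the HttpFileSystem bundled with samza-yarn, if I remember the package correctly:

# Rough sketch only: serve the job package over plain HTTP so no
# hadoop-aws / AWS SDK jars are needed on the YARN classpath.
# Host, port, and tarball path below are placeholders.
yarn.package.path=http://my-http-server.example.com:8000/release/hello-samza-dist.tar.gz

# Map the "http" scheme to Samza's HTTP filesystem implementation
# (assumed class name, shipped with samza-yarn).
fs.http.impl=org.apache.samza.util.hadoop.HttpFileSystem

The appeal is that the HTTP filesystem class ships in the job package's own lib directory, so nothing extra should need to be added to each node's YARN classpath.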
On Fri, Sep 15, 2017 at 2:45 PM Jagadish Venkatraman <jagadish1...@gmail.com> wrote:

> Thank you Xiaochuan for your question!
>
> You should ensure that *every machine in your cluster* has the S3 jar file
> in its YARN class-path. From your error, it looks like the machine you are
> running on does not have the JAR file corresponding to *S3AFileSystem*.
>
> >> Whats the right way to set this up? Should I just copy over the required
> >> AWS jars to the Hadoop conf directory
>
> I'd lean on the side of simplicity and the *scp* route seems to address
> most of your needs.
>
> >> Should I be editing run-job.sh or run-class.sh?
>
> You should not have to edit any of these files. Once you fix your
> class-paths by copying those relevant JARs, it should just work.
>
> Please let us know if you need more assistance.
>
> --
> Jagdish
>
>
> On Fri, Sep 15, 2017 at 11:07 AM, XiaoChuan Yu <xiaochuan...@kik.com>
> wrote:
>
> > Hi,
> >
> > I'm trying to deploy a Samza job using YARN and S3 where I upload the zip
> > package to S3 and point yarn.package.path to it.
> > Does anyone know what kind of set up steps is required for this?
> >
> > What I've tried so far is to get Hello Samza to be run this way in AWS.
> >
> > However I ran into the following exception:
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.ClassNotFoundException: Class
> > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
> > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
> > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
> > ...
> >
> > Running "$YARN_HOME/bin/yarn classpath" gives the following:
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/etc/hadoop
> > /home/ec2-user/deploy/yarn/share/hadoop/common/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/common/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/hdfs/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/lib/*
> > /home/ec2-user/deploy/yarn/share/hadoop/mapreduce/*
> > /contrib/capacity-scheduler/*.jar
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/*
> > /home/ec2-user/deploy/yarn/share/hadoop/yarn/lib/*
> >
> > I manually copied the required AWS related jars to
> > /home/ec2-user/deploy/yarn/share/hadoop/common.
> > I checked that it is loadable by running "yarn
> > org.apache.hadoop.fs.s3a.S3AFileSystem" which gives the "Main method not
> > found" error instead of class not found.
> >
> > From the console output of run-job.sh I see the following in class path:
> > 1. All jars under the lib directory of the zip package
> > 2. /home/ec2-user/deploy/yarn/etc/hadoop (Hadoop conf directory)
> >
> > The class path from run-job.sh seem to be missing the AWS related jars
> > required for S3AFileSystem.
> > Whats the right way to set this up?
> > Should I just copy over the required AWS jars to the Hadoop conf directory (2.)?
> > Should I be editing run-job.sh or run-class.sh?
> >
> > Thanks,
> > Xiaochuan Yu
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
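
P.S. For anyone finding this thread in the archives later: the S3 variant we were originally attempting looked roughly like the sketch below (bucket name and tarball key are placeholders). As Jagadish notes above, it also needs the hadoop-aws jar and a matching aws-java-sdk on every node's YARN classpath (or bundled into the job package), which is exactly where the SDK version conflict bit us.

# Rough sketch of the S3-based config we originally attempted;
# bucket name and tarball key are placeholders.
yarn.package.path=s3a://my-bucket/release/hello-samza-dist.tar.gz

# Map the "s3a" scheme to Hadoop's S3A filesystem -- the class from the
# ClassNotFoundException above, which lives in the hadoop-aws jar.
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem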