Thank you! That definitely did the trick.
I was trying to use ZEPPELIN_CLASSPATH_OVERRIDES to load the jars and couldn’t
figure out why it wasn’t working. Also
mode.asInstanceOf[Hdfs].conf.get("tmpjars").split(",").foreach(println) was
exactly the command that I was looking for to diagnose this problem. Switching
over to loading jars through the args.string got everything working.
On Thu, Dec 22, 2016 at 3:37 PM Prasad Wagle <[email protected]> wrote:
Hi Paul,
It looks like the cascading jars are not distributed to the YARN cluster. Can
you please try adding "zeppelin/interpreter/scalding/*" to the args.string
property of the scalding interpreter?
Here's the args.string we use:
-libjars /home/zeppelin-user/zeppelin/interpreter/scalding/*,/home/zeppelin-user/deploy-bundle-201608111417/libs/* -Dscalding.reducer.estimator.classes=com.twitter.scalding.reducer_estimation.InputSizeReducerEstimator -Delephantbird.use.combine.input.format=true -Delephantbird.combine.split.size=134217728 --hdfs --repl
tmpjars contains jars that are distributed to the YARN cluster. You can see its
contents with the command below:
%scalding
mode.asInstanceOf[Hdfs].conf.get("tmpjars").split(",").foreach(println)
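If the list is long, it may help to filter it down to the jars you care about. Here is a minimal sketch, assuming the cascading jars have "cascading" in their file names (I have not checked this against your build). jarsMatching is a hypothetical helper, not something Scalding or Zeppelin provides:

```scala
// Hypothetical helper: takes the comma-separated "tmpjars" value and returns
// the entries whose file name contains the given fragment.
def jarsMatching(tmpjars: String, fragment: String): Seq[String] =
  Option(tmpjars).toSeq                // conf.get returns null when the key is unset
    .flatMap(_.split(",").toSeq)       // one entry per jar path
    .map(_.trim)
    .filter(_.nonEmpty)
    .filter { path =>
      // Compare against the file name only, not the directories above it.
      path.substring(path.lastIndexOf('/') + 1).contains(fragment)
    }

// In a %scalding paragraph this could be called as:
//   jarsMatching(mode.asInstanceOf[Hdfs].conf.get("tmpjars"), "cascading").foreach(println)
```

If nothing is printed, the cascading jars are probably not being shipped to the cluster.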
Thanks,
Prasad
On Thu, Dec 22, 2016 at 9:31 AM, Paul Brenner <[email protected]> wrote:
I'm trying to get Scalding working on Zeppelin while using YARN. I followed the
steps in the docs
https://share.polymail.io/v1/z/b/NTg1YzBkOTc5ZTkx/2oW5SQjbADW8zb9nS3JO5g421bQMXDLTC0FeCJ_WR7eecFsW9CWa-tzokB9aSLwG5t9yQ9B6QpcS8AmXjjFFxJ31Thy9lN7HSvilaEeoI6Az7C53CrnFmUoMnta-EYrRI5uEQhbztPSzTrQle-3E00nNiVc7M6poouix37ZlX2VacVqONwmxpu6FSMs2x-_t20QRzFz8S7lneRPUBtpzIyxBRLcRL4CMf1AeMxQIVl3FkoStgA==
to build the interpreter and set up the classpath override. When I run in
local mode, code executes properly. However, when I run on my cluster via
YARN, my jobs fail with:
Error: java.lang.ClassNotFoundException: cascading.CascadingException
or
Error: java.lang.ClassNotFoundException: cascading.tuple.TupleException
What is even stranger to me is that I can go into Zeppelin and execute:
import cascading.tuple.TupleException
import cascading.CascadingException
And both appear to have no problem finding those classes. It is only when I try
to actually use scalding (on YARN), like loading data into a typed pipe and
dumping that I get the ClassNotFoundException. Any ideas on how to debug or
what to fix?
(I already posted this on Stack Overflow with no luck:
https://share.polymail.io/v1/z/b/NTg1YzBkOTc5ZTkx/2oW5SQjbADW8zb9nS3JO5g421bQMXDLTC0FeCJ_WR7eecFsW9CWa-tzokB9aSLwG5t9yQ9B6QpcS8AmXjjFFxJ31Thy9lN7HSvilaEeoI6Az7C53CrnFmUoMnta-EYrRI5uEQhbztPSzB6ElJ-PAwFLHk1sne6d3tKW61fUlXHcQZkGEK0jyp-yaS91unqy7hlcPyFvqV_lgfB3NDNd4PGFMQK4UWIUMywUEsXGgXRQGgho1Hlj-0iyZtbcZrE3vMNtVzjO8HWJpq45DjiV-N210k6sXeYh5YrKTEiEpER4B
)