We use Geomesa on Accumulo with Spark and Zeppelin on a Kerberized cluster (HDP 3). We've had a number of issues, but that one doesn't look familiar. From memory, we had to:
- Build geomesa-spark with the Accumulo version matching our cluster, and libthrift matching Accumulo (and one other version change I forget right now).
- Build Zeppelin 0.8.1 with libthrift matching Hive (a different version to Accumulo's), or use the HWX Zeppelin.
- Sometimes specify the geomesa jar via SPARK_SUBMIT_OPTIONS in the config file rather than in the interpreter config; this was definitely an issue before 0.8. (A rough sketch of this is below the quoted message.)

On Thu, 23 May 2019, 21:53 Krentz, <cpkre...@gmail.com> wrote:

> Hello - I am looking for insight into an issue I have been having with our
> Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark jar to
> the Spark interpreter. The notebook paragraphs run fine until we try to
> access the data, at which point we get an "Unread Block Data" error from
> the Spark process. However, this error only occurs when the interpreter
> setting "zeppelin.spark.useNew" is set to true. If this parameter is set to
> false, the paragraph works just fine. Here is a paragraph that fails:
>
> %sql
> select linktype,count(linktype) from linkageview group by linktype
>
> The error we get as a result is this:
>
> java.lang.IllegalStateException: unread block data
>     at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> If I drill down and inspect the Spark job itself, I get an error saying
> "readObject can't find class org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit."
> The full stack trace is attached. We dug into and opened up the __spark_conf
> and __spark_libs files associated with the Spark job (under
> /user/root/.sparkStaging/application_<pid>/), but they did not contain the jar
> that provides this class. However, it was missing in both the
> spark.useNew=true case and the false case.
>
> Basically I am just trying to figure out why the spark.useNew option would
> cause this error when everything works fine with it turned off. We can move
> forward with it turned off for now, but I would like to get to the bottom of
> this issue in case there is something deeper going wrong.
>
> Thanks so much,
> Chris Krentz
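
For reference, the SPARK_SUBMIT_OPTIONS approach mentioned above looks roughly like this. This is only a sketch: it assumes the variable is set in conf/zeppelin-env.sh, and the jar name/path is a placeholder for whatever geomesa accumulo spark runtime jar your own build produces.

    # conf/zeppelin-env.sh
    # Placeholder path - point this at the geomesa spark runtime jar you built
    # against your cluster's Accumulo/libthrift versions.
    export SPARK_SUBMIT_OPTIONS="--jars /path/to/geomesa-accumulo-spark-runtime.jar"

Note that changes to zeppelin-env.sh only take effect after restarting the Zeppelin daemon, unlike interpreter-setting changes which only need an interpreter restart.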