I add the jar by editing the Spark interpreter on the Interpreters page and adding the path to the jar at the bottom. I am not familiar with the spark.jars method. Is there a guide for that somewhere? Could that explain why the behavior differs when zeppelin.spark.useNew is set to true versus false?
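For what it's worth, my tentative understanding is that spark.jars is just another Spark property I could add on that same interpreter settings page (or in spark-defaults.conf): a comma-separated list of jars that should be shipped to both the driver and the executors. A minimal sketch, using a hypothetical path to the GeoMesa runtime jar:

    spark.jars    /opt/geomesa/geomesa-accumulo-spark-runtime.jar

If that is the recommended route, it could plausibly matter here, since the failure is an executor-side deserialization of an Accumulo class.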
On Thu, May 23, 2019 at 9:16 PM Jeff Zhang <zjf...@gmail.com> wrote:

> >>> adding a Geomesa-Accumulo-Spark jar to the Spark interpreter.
>
> How do you add jars to the Spark interpreter? It is encouraged to add
> jars via spark.jars.
>
> Krentz <cpkre...@gmail.com> wrote on Fri, May 24, 2019 at 4:53 AM:
>
>> Hello - I am looking for insight into an issue I have been having with
>> our Zeppelin cluster for a while. We are adding a Geomesa-Accumulo-Spark
>> jar to the Spark interpreter. The notebook paragraphs run fine until we
>> try to access the data, at which point we get an "unread block data"
>> error from the Spark process. However, this error only occurs when the
>> interpreter setting zeppelin.spark.useNew is set to true. If this
>> parameter is set to false, the paragraph works just fine. Here is a
>> paragraph that fails:
>>
>> %sql
>> select linktype,count(linktype) from linkageview group by linktype
>>
>> The error we get as a result is this:
>>
>> java.lang.IllegalStateException: unread block data
>>     at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2783)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1605)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:258)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>>
>> If I drill down and inspect the Spark job itself, I get an error saying
>> "readObject can't find class
>> org.apache.accumulo.core.client.mapreduce.impl.BatchInputSplit." The full
>> stack trace is attached. We dug into the __spark_conf and __spark_libs
>> files associated with the Spark job (under
>> /user/root/.sparkStaging/application_<pid>/), but they did not contain
>> the jar that provides this class. That was the case in both the
>> spark.useNew=true and spark.useNew=false runs.
>>
>> Basically I am just trying to figure out why the spark.useNew option
>> would cause this error when everything works fine with it turned off. We
>> can move forward with it turned off for now, but I would like to get to
>> the bottom of this issue in case there is something deeper going wrong.
>>
>> Thanks so much,
>> Chris Krentz
>
>
> --
> Best Regards
>
> Jeff Zhang