I had a similar problem, and it took me a few days to resolve it.
The problem is not in Thrift, Zeppelin, etc.: some libraries/dependencies
that are present on your local machine (and somehow end up on your
classpath) are missing on the master server. In my case, it was the
FasterXML Jackson libraries. On my local machine, there was a clash
between Jackson 2.5.3 (used by Zeppelin?) and Jackson 2.3.1 or 2.2.1
(used by Spark?). So I removed Jackson 2.5.3 from the Zeppelin lib
folder, and my local Zeppelin worked perfectly.
Then I copied the Zeppelin installation to the cluster server and got
this error. When I put the Jackson library back, everything worked.
So some serialization/deserialization library for XML/JSON can't be
found by Zeppelin on the server (check the permissions too).
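To compare the two installations, a small Python sketch like the one below may help. It assumes you can get the jar file names from Zeppelin's lib folder on both machines (for example from a plain directory listing), and that unreadable_jars is run on the server as the user Zeppelin runs as. All three function names are hypothetical.

```python
import os


def jar_names(lib_dir):
    """Return the set of .jar file names directly inside lib_dir."""
    return {name for name in os.listdir(lib_dir) if name.endswith(".jar")}


def missing_on_server(local_jars, server_jars):
    """Jars present in the local listing but absent from the server listing, sorted."""
    return sorted(set(local_jars) - set(server_jars))


def unreadable_jars(lib_dir):
    """Jars in lib_dir that the current user cannot read (a permissions problem)."""
    return sorted(
        name
        for name in jar_names(lib_dir)
        if not os.access(os.path.join(lib_dir, name), os.R_OK)
    )
```

Collect jar_names on each side, pass the two sets to missing_on_server, and anything it returns (Jackson jars in my case) is a candidate for the missing dependency.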
On 12/09/2015 04:09 AM, Hoc Phan wrote:
Hi all,
I am using Cloudera 5.5 Express with Spark 1.5 installed across the
cluster. I have tested PySpark on the command line and it works, so my
cluster is fine.
However, when I use Zeppelin with the Spark cluster, I get the error
below just doing something as simple as:
%pyspark
print "abcd"
Error:
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:220)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:205)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:211)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:207)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:304)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
If I set the master to local[*], it works fine. If I set the master to
spark://cdhe1master.fbdl.local:7077, I get the error above.
I checked my master hostname and port; both are correct and working.
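One quick way to confirm that the master port is reachable from the machine running Zeppelin is a plain TCP connect. This minimal Python sketch (the function name master_reachable is hypothetical) only shows that something accepts connections on host:port. Since the trace fails inside Zeppelin's RemoteInterpreter Thrift client, a reachable master does not rule out a failure in the interpreter process itself.

```python
import socket


def master_reachable(host, port=7077, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection resolves the name and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, master_reachable("cdhe1master.fbdl.local", 7077) should return True from the Zeppelin host if the master is up and nothing blocks the port.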
I followed the instructions at
https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/spark.html and
have SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark
Any ideas?