Re: Works with local mode but not cluster mode (Cloudera 5.5 Spark)

tsh Wed, 09 Dec 2015 01:14:29 -0800

I had a similar problem, and I spent a few days on it before I resolved it.


The problem is not in Thrift, Zeppelin, etc.:

some libraries / dependencies that are present on your local machine (somehow found on your classpath) are missing on the master server. In my case, it was the FasterXML Jackson libraries.

On my local machine, there was a clash between Jackson 2.5.3 (Zeppelin uses it?) and Jackson 2.3.1 or 2.2.1 (Spark uses it?). So I removed Jackson 2.5.3 from the Zeppelin lib folder, and my local Zeppelin worked perfectly. Then I copied the Zeppelin installation to the cluster server and got this error. When I put the Jackson library back, everything worked.

So some serialization / deserialization library that works with XML / JSON can't be found by Zeppelin on the server (check permissions as well).
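To see whether you have the same kind of clash, something like the following might help. It is only a rough sketch, not anything Zeppelin or Spark ships: the function names are made up, it assumes jars are named `artifact-version.jar`, and the directories in the usage note are placeholders you would replace with your own Zeppelin lib folder and Spark lib folder.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumes the common "artifact-version.jar" naming, e.g. jackson-databind-2.5.3.jar
JAR_PATTERN = re.compile(
    r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)*(?:[.-][A-Za-z0-9]+)*)\.jar$"
)


def parse_jar_name(filename):
    """Split a jar file name into (artifact, version), or return None if it does not fit."""
    match = JAR_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def find_version_clashes(jar_paths, prefix="jackson-"):
    """Return {artifact: [versions]} for artifacts that appear in more than one version."""
    versions = defaultdict(set)
    for path in jar_paths:
        parsed = parse_jar_name(Path(path).name)
        if parsed and parsed[0].startswith(prefix):
            versions[parsed[0]].add(parsed[1])
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


def list_jars(*directories):
    """Collect every *.jar below the given directories (hypothetical helper)."""
    jars = []
    for directory in directories:
        jars.extend(str(p) for p in Path(directory).rglob("*.jar"))
    return jars
```

For example, `find_version_clashes(list_jars("/path/to/zeppelin/lib", "/path/to/spark/lib"))` would list any Jackson artifact present in two versions. Running it on both the local machine and the cluster server and comparing the results should show which jars differ between them.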




On 12/09/2015 04:09 AM, Hoc Phan wrote:

Hi all
I am using Cloudera 5.5 Express with Spark 1.5 installed across the cluster. I have tested PySpark on the command line and it works, so my cluster is fine. However, when I use Zeppelin with the Spark cluster, I get the error below just doing a simple thing like:
%pyspark
print "abcd"

Error:
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:220)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:205)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:211)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:207)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:304)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
If I set local[*], it is fine. If I set master as spark://cdhe1master.fbdl.local:7077, it gives the error above.
I checked my master hostname and port; all are correct and working.
I followed the instructions here https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/spark.html and have SPARK_HOME=/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark
Any idea?
