Yeah, it does look like the Guava issue. That thread is what has me currently adding com.datastax.spark:spark-cassandra-connector_2.10:1.6.0-M2 and com.google.guava:guava:16.0.1 to the Dependencies section of the Spark interpreter configuration. No dice, though; it still throws the error.
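As a next step, I want to verify whether that guava:16.0.1 dependency actually wins at runtime. Something like the following in a %spark paragraph should tell me (a rough sketch, untested in this exact setup): it asks the JVM which jar Guava's Futures class was loaded from, first on the driver and then inside a task on an executor. If the executor side reports a Hadoop/HDP guava-11.x jar instead of guava-16.0.1, the interpreter dependency is never reaching the executors' classpath.

import com.google.common.util.concurrent.Futures

// Which jar provides Guava's Futures on the driver?
println(classOf[Futures].getProtectionDomain.getCodeSource.getLocation)

// Same question on an executor: run a single one-element task and collect the answer.
sc.parallelize(Seq(1), 1).map { _ =>
  classOf[Futures].getProtectionDomain.getCodeSource.getLocation.toString
}.collect().foreach(println)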
On Tue, Apr 19, 2016 at 4:33 PM, Sanne de Roever <sanne.de.roe...@gmail.com> wrote:
> There might be some useful information in the thread "Guava 16.0 Cassandra
> Error using Zeppelin 0.60/Spark 1.6.1/Cassandra 3.4".
>
> On Tue, Apr 19, 2016 at 4:12 PM, George Webster <webste...@gmail.com> wrote:
>>
>> Hey guys,
>>
>> I am trying to get Zeppelin to work with my HDP cluster with Spark,
>> Cassandra, and YARN. Unfortunately, for the last three days I have been
>> trying multiple compilation options, configuration settings, adjustments
>> to pom.xml, etc., and I just cannot seem to get it working. Have any of
>> you been able to get Zeppelin to work with YARN/Spark/Cassandra? Any
>> help would be very much appreciated.
>>
>> Install command:
>> mvn clean package -Pbuild-distr -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
>>
>> Environment setup:
>> hdp-select status hadoop-client | sed 's/hadoop-client - \(.*\)/\1/' = 2.4.0.0-169
>> hadoop version = Hadoop 2.7.1.2.4.0.0-169
>> spark = 1.6.0
>> The HDP cluster has 9 nodes. I am only configuring the master node
>> (which hosts Zeppelin) and not using the Ambari Zeppelin service.
>>
>> Config, zeppelin-env.sh:
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>> export SPARK_HOME=/usr/hdp/current/spark-client
>> export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.4.0.0-169"
>>
>> Interpreter config:
>> "id": "2BGDFG2NR",
>> "name": "spark",
>> "group": "spark",
>> "properties": {
>>   "spark.executor.memory": "",
>>   "args": "",
>>   "zeppelin.spark.printREPLOutput": "true",
>>   "spark.cores.max": "",
>>   "zeppelin.dep.additionalRemoteRepository": "spark-packages,http://dl.bintray.com/spark-packages/maven,false;",
>>   "zeppelin.spark.sql.stacktrace": "false",
>>   "zeppelin.spark.concurrentSQL": "false",
>>   "zeppelin.spark.useHiveContext": "true",
>>   "zeppelin.pyspark.python": "python",
>>   "zeppelin.dep.localrepo": "local-repo",
>>   "spark.cassandra.auth.password": "[password]",
>>   "zeppelin.interpreter.localRepo": "/home/zeppelin/zeppelin_live/local-repo/2BGDFG2NR",
>>   "spark.cassandra.connection.host": "10.0.4.80",
>>   "spark.yarn.am.extraJavaOptions": "-Dhdp.version\u003d2.4.0.0-169",
>>   "zeppelin.spark.maxResult": "1000",
>>   "master": "yarn-client",
>>   "spark.app.name": "Zeppelin",
>>   "spark.cassandra.auth.username": "zeppelin",
>>   "spark.driver.extraJavaOptions": "-Dhdp.version\u003d2.4.0.0-169"
>> },
>> "interpreterGroup": [
>>   { "class": "org.apache.zeppelin.spark.SparkInterpreter", "name": "spark" },
>>   { "class": "org.apache.zeppelin.spark.PySparkInterpreter", "name": "pyspark" },
>>   { "class": "org.apache.zeppelin.spark.SparkSqlInterpreter", "name": "sql" },
>>   { "class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep" }
>> ],
>> "dependencies": [
>>   { "groupArtifactVersion": "com.datastax.spark:spark-cassandra-connector_2.10:1.6.0-M2", "local": false },
>>   { "groupArtifactVersion": "com.google.guava:guava:16.0.1", "local": false }
>> ],
>> "option": {
>>   "remote": true,
>>   "perNoteSession": false
>> }
>>
>> Notebook paragraphs and their results:
>>
>> 1) import com.datastax.spark.connector._
>> ----result----
>> import com.datastax.spark.connector._
>>
>> 2) val rdd = sc.cassandraTable("testing", "results")
>> ----result----
>> rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
>>
>> 3) rdd.count
>> ----result----
>> org.apache.spark.SparkException: Job aborted due to stage failure:
>> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3
>> in stage 1.0 (TID 21, vm-10-155-208-69.cloud.mwn.de):
>> java.io.IOException: Failed to open native connection to Cassandra at
>> {10.0.4.80, 10.0.4.81, 10.0.4.82}:9042
>> at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:162)
>> at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
>> at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148)
>> at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
>> at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
>> at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
>> at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:319)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.NoSuchMethodError:
>> com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
>> at com.datastax.driver.core.Connection.initAsync(Connection.java:177)
>> at com.datastax.driver.core.Connection$Factory.open(Connection.java:731)
>> at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:251)
>> at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:199)
>> at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
>> at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414)
>> at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
>> at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:155)
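That "Caused by" is the telltale part: Futures.withFallback(ListenableFuture, FutureFallback, Executor) only exists in Guava 14 and later, and the Hadoop classpath that HDP puts on the executors ships Guava 11.0.2, which has no FutureFallback at all. So even with guava:16.0.1 in the dependency list, the executor JVMs are apparently still resolving Hadoop's older Guava first. A quick check in a %spark paragraph, which should print true on Guava 14 through 19 and false on the Guava 11 that Hadoop 2.x bundles:

// true if the loaded Guava has any withFallback overload; false on Guava 11
classOf[com.google.common.util.concurrent.Futures].getMethods.exists(_.getName == "withFallback")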
>>
>> Looking at the logs:
>> INFO [2016-04-19 15:39:34,612] ({dispatcher-event-loop-5} Logging.scala[logInfo]:58) - Starting task 1.1 in stage 0.0 (TID 3, [master], partition 1,RACK_LOCAL, 4863 bytes)
>> INFO [2016-04-19 15:39:34,616] ({task-result-getter-1} Logging.scala[logInfo]:58) - Lost task 0.0 in stage 0.0 (TID 0) on executor [master]: java.io.IOException (Failed to open native connection to Cassandra at {10.0.4.80, 10.0.4.81, 10.0.4.82}:9042) [duplicate 1]
>> INFO [2016-04-19 15:39:36,279] ({dispatcher-event-loop-3} Logging.scala[logInfo]:58) - Starting task 0.1 in stage 0.0 (TID 4, [worker 1], partition 0,RACK_LOCAL, 4863 bytes)
>> INFO [2016-04-19 15:39:36,281] ({task-result-getter-2} Logging.scala[logInfo]:58) - Lost task 2.0 in stage 0.0 (TID 2) on executor [worker 1]: java.io.IOException (Failed to open native connection to Cassandra at {10.0.4.80, 10.0.4.81, 10.0.4.82}:9042) [duplicate 2]
>> INFO [2016-04-19 15:39:36,862] ({dispatcher-event-loop-2} Logging.scala[logInfo]:58) - Starting task 2.1 in stage 0.0 (TID 5, [master], partition 2,RACK_LOCAL, 4863 bytes)
>> INFO [2016-04-19 15:39:36,863] ({task-result-getter-3} Logging.scala[logInfo]:58) - Lost task 1.1 in stage 0.0 (TID 3) on executor [master]: java.io.IOException (Failed to open native connection to Cassandra at {10.0.4.80, 10.0.4.81, 10.0.4.82}:9042) [duplicate 3]
>>
>> and it just keeps going.
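If the executors really are picking up Hadoop's Guava first, the two workarounds I keep seeing for this clash are forcing user jars to the front of the executor classpath, or building an assembly jar with Guava shaded (relocated) so it cannot collide. For the first, the Spark 1.6 docs list an experimental property that might be worth adding to the interpreter properties above; untested here, and note that its driver-side twin, spark.driver.userClassPathFirst, only applies in cluster mode, while this setup is yarn-client:

"spark.executor.userClassPathFirst": "true",

Has anyone gotten that combination working on HDP?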