Which version of Zeppelin do you use?

On Wed, Sep 9, 2015 at 7:29 AM Sajeevan Achuthan <[email protected]> wrote:
Any help?

On 9 September 2015 at 00:57, Sajeevan Achuthan <[email protected]> wrote:

A similar bug was reported before: ZEPPELIN-108
<https://issues.apache.org/jira/browse/ZEPPELIN-108>

On 8 September 2015 at 14:33, Sajeevan Achuthan <[email protected]> wrote:

Hi Todd,
Thanks for the quick reply. I tried that option too and I got the error below. Any idea?

    <console>:102: error: overloaded method constructor StreamingContext with alternatives:
      (path: String, sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
      (path: String, hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
      (conf: org.apache.spark.SparkConf, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
      (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
    cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
           val ssc = new StreamingContext(sc, Milliseconds(2000))

On 8 September 2015 at 13:53, Todd Nist <[email protected]> wrote:

You are passing a new SparkConf to the StreamingContext, which will cause the creation of a new SparkContext:

    StreamingContext(conf: SparkConf, batchDuration: Duration)

(SparkConf: <https://spark.apache.org/docs/1.4.1/api/scala/org/apache/spark/SparkConf.html>, Duration: <https://spark.apache.org/docs/1.4.1/api/scala/org/apache/spark/streaming/Duration.html>)

"Create a StreamingContext by providing the configuration necessary for a new SparkContext."

Is there a reason you can not use the existing SparkContext created by Zeppelin? Then you can just do something like:

    val ssc = new StreamingContext(sc, Milliseconds(SparkStreamingBatchInterval))
    ssc.checkpoint(SparkCheckpointDir)
    ...

where "sc" is the Zeppelin-provided SparkContext.

-Todd
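For reference, a minimal sketch of that suggestion in a Zeppelin paragraph might look like the following; the 2-second batch interval and the "checkpoint" directory are placeholder values, and sc is assumed to be the SparkContext that Zeppelin already provides:

    // Reuse Zeppelin's SparkContext instead of constructing one from a new SparkConf
    import org.apache.spark.streaming.{Milliseconds, StreamingContext}

    val ssc = new StreamingContext(sc, Milliseconds(2000)) // sc is the Zeppelin-provided SparkContext
    ssc.checkpoint("checkpoint")                           // placeholder checkpoint directory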
On Tue, Sep 8, 2015 at 8:11 AM, Sajeevan Achuthan <[email protected]> wrote:

Hi,
The problem is that two SparkContexts end up being created in the same JVM; see the log below. Could you please let me know how to fix this problem?

    WARN [2015-09-08 13:09:01,191] ({pool-2-thread-5} Logging.scala[logWarning]:92) - Multiple running SparkContexts detected in the same JVM!
    org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
    org.apache.spark.SparkContext.<init>(SparkContext.scala:81)
    org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
    org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
    org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
    org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
    org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:277)
    org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    java.util.concurrent.FutureTask.run(FutureTask.java:266)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
        at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2083)
        at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2065)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2065)
        at org.apache.spark.SparkContext$.setActiveContext(SparkContext.scala:2151)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:2023)
        at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:834)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:80)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
        at $line58.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
        at $line58.$

/Saj

On 8 September 2015 at 12:25, Todd Nist <[email protected]> wrote:

I do not see that you are importing the following:

    import org.apache.spark.sql._

which I believe is where you will find the DataFrame.toDF function.

HTH.

-Todd
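Putting the suggestions together, a rough sketch of the streaming loop could look like the snippet below. It assumes sqlContext is the SQLContext that Zeppelin provides and that data is the DStream of CELL_KPIS records from the original post further down; in Spark 1.3/1.4 the implicit .toDF() conversion for an RDD of case-class instances typically comes from sqlContext.implicits._ (whether this works inside Zeppelin may still depend on the classloader issue tracked as ZEPPELIN-108):

    data.foreachRDD { accessLogs =>
      import sqlContext.implicits._              // brings the rdd.toDF() conversion into scope
      accessLogs.toDF().registerTempTable("RAS") // convert the RDD to a DataFrame before registering
    }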
On Mon, Sep 7, 2015 at 5:49 PM, Sajeevan Achuthan <[email protected]> wrote:

Hi Moon,
Thanks for the reply. Unfortunately, I tried that option too and I got this error:

    data: org.apache.spark.streaming.dstream.DStream[CELL_KPIS] = org.apache.spark.streaming.dstream.MappedDStream@5f3ea8bb
    <console>:49: error: value toDF is not a member of org.apache.spark.rdd.RDD[CELL_KPIS]
           accessLogs.toDF.registerTempTable("RAS")
                      ^

Any idea?

On 7 September 2015 at 17:30, moon soo Lee <[email protected]> wrote:

Hi,

I think you will need to convert the RDD to a data frame using .toDF(), like

    accessLogs.toDF.registerTempTable("RAS")

Thanks,
moon

On Mon, Sep 7, 2015 at 3:34 AM Sajeevan Achuthan <[email protected]> wrote:

Zeppelin is an excellent tool. I am trying to implement a streaming application, and I get an error while deploying it. See my code below:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.sql.SQLContext

    val sparkConf = new SparkConf().setAppName("PEPA").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")

    import org.apache.spark.streaming.kafka._
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")

    val topicMap = Map("incoming" -> 1)
    val record = KafkaUtils.createStream(ssc, "localhost", "1", topicMap).map(_._2)
    record.print()

    case class CELL_KPIS(ECELL_Name: String, CGI: String, Number_of_Times_Interf: Double, TAOF: Double, PHL: Double, NPCCHL: Double, LRSRP: Double, NC: Double)

    val data = record.map(s => s.split(",")).filter(s => s(0) != "\"ECELL_Name\"").map(
      s => CELL_KPIS(s(0), s(1), s(2).toDouble, s(3).toDouble, s(5).toDouble, s(6).toDouble, s(7).toDouble, s(8).toDouble)
    )

    data.foreachRDD { accessLogs =>
      import sqlContext.implicits._
      accessLogs.registerTempTable("RAS")
    }

    ssc.start()
    ssc.awaitTermination()

And I get this error:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.sql.SQLContext
    sparkConf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2e5779a
    import org.apache.spark.streaming.kafka._
    ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@48621ee1
    topicMap: scala.collection.immutable.Map[String,Int] = Map(incoming -> 1)
    record: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.MappedDStream@6290e75e
    defined class CELL_KPIS
    data: org.apache.spark.streaming.dstream.DStream[CELL_KPIS] = org.apache.spark.streaming.dstream.MappedDStream@4bda38c3
    <console>:55: error: value registerTempTable is not a member of org.apache.spark.rdd.RDD[CELL_KPIS]
           accessLogs.registerTempTable("RAS")

My configuration for Zeppelin:

    export MASTER=spark://localhost:7077
    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_05
    export ZEPPELIN_PORT=9090
    export ZEPPELIN_SPARK_CONCURRENTSQL=false
    export ZEPPELIN_SPARK_USEHIVECONTEXT=false
    #'export MASTER=local[*]
    export SPARK_HOME=/home/anauser/spark-1.3/spark-1.3.0-bin-cdh4

Interpreter configuration for spark:

    "2AW247KM7": {
      "id": "2AW247KM7",
      "name": "spark",
      "group": "spark",
      "properties": {
        "spark.cores.max": "",
        "spark.yarn.jar": "",
        "master": "local[*]",
        "zeppelin.spark.maxResult": "1000",
        "zeppelin.dep.localrepo": "local-repo",
        "spark.app.name": "APP3",
        "spark.executor.memory": "5G",
        "zeppelin.spark.useHiveContext": "false",
        "spark.driver.allowMultipleContexts": "true",
        "args": "",
        "spark.home": "/home/anauser/spark-1.3/spark-1.3.0-bin-cdh4",
        "zeppelin.spark.concurrentSQL": "true",
        "zeppelin.pyspark.python": "python"
      },
      "interpreterGroup": [
        { "class": "org.apache.zeppelin.spark.SparkInterpreter", "name": "spark" },
        { "class": "org.apache.zeppelin.spark.PySparkInterpreter", "name": "pyspark" },
        { "class": "org.apache.zeppelin.spark.SparkSqlInterpreter", "name": "sql" },
        { "class": "org.apache.zeppelin.spark.DepInterpreter", "name": "dep" }
      ],
      "option": { "remote": true }
    }

Is there any problem in my code or setup? Any help very much appreciated.
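Once the streaming paragraph has registered the temp table, it can in principle be queried from a separate %sql paragraph; this assumes the SQL interpreter shares the SQLContext used above and uses the table name RAS from the code in this thread:

    %sql
    select * from RAS limit 10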
