Ah... sorry, that was stupid of me. You are right. I moved

%sql select firstField, secondField, count(1) from lines group by firstField, secondField order by firstField, secondField
to the next paragraph and it works! What does that mean?

One more thing: the zeppelin-interpreter-spark interpreter log complains every second with:

WARN [2017-05-02 17:05:42,396] ({dispatcher-event-loop-16} ScriptBasedMapping.java[runResolveCommand]:254) - Exception running /etc/hadoop/conf.cloudera.yarn7/topology.py 17.134.172.218
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn7/topology.py" (in directory "/"): error=2, No such file or directory

Does Spark really need it? What am I supposed to do?

2017-05-02 15:16 GMT+02:00 David Howell <david.how...@zipmoney.com.au>:

> Hi Serega,
>
> I see this in the error log: "error: ';' expected but ',' found."
>
> Are you running the %sql in the same paragraph as the %spark? I don't
> think that is supported. I think you have to shift the %sql to a new
> paragraph; you can then run the spark and then the sql separately.
>
> From: Serega Sheypak <serega.shey...@gmail.com>
> Sent: Tuesday, 2 May 2017 9:58 PM
> To: users@zeppelin.apache.org
> Subject: Can't run simple example with scala and spark SQL. Some non
> obvious syntax error in SQL
>
> Here is my sample notebook:
>
> %spark
> val linesText = sc.textFile("hdfs://cluster/user/me/lines.txt")
>
> case class Line(id:Long, firstField:String, secondField:String)
>
> val lines = linesText.map{ line =>
>   val splitted = line.split(" ")
>   println("splitted => " + splitted)
>   Line(splitted(0).toLong, splitted(1), splitted(2))
> }
>
> lines.toDF().registerTempTable("lines")
>
> %sql select firstField, secondField, count(1) from lines group by
> firstField, secondField order by firstField, secondField
>
> 1. I can see that the Spark job was started on my YARN cluster.
>
> 2. It failed. The UI shows an exception; I can't understand what I'm doing wrong:
>
> %sql select firstField, secondField, count(1) from lines group by
> firstField, secondField order by firstField, secondField ^
>
> 3.
> There is suspicious output in the Zeppelin log:
>
> INFO [2017-05-02 11:50:02,846] ({pool-2-thread-8} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1493724118696_868476558 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
> ERROR [2017-05-02 11:50:18,809] ({qtp1286783232-166} NotebookServer.java[onMessage]:380) - Can't handle message
> java.lang.NullPointerException
>     at org.apache.zeppelin.socket.NotebookServer.addNewParagraphIfLastParagraphIsExecuted(NotebookServer.java:1713)
>     at org.apache.zeppelin.socket.NotebookServer.persistAndExecuteSingleParagraph(NotebookServer.java:1741)
>     at org.apache.zeppelin.socket.NotebookServer.runAllParagraphs(NotebookServer.java:1641)
>     at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:291)
>     at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
>     at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
>     at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
>     at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
>     at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
>     at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
>     at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
>     at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
>     at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
>     at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
>     at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
>     at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
>     at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>     at java.lang.Thread.run(Thread.java:745)
> INFO [2017-05-02 11:50:18,811] ({pool-2-thread-14} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1493724118696_868476558 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session1712472970
> INFO [2017-05-02 11:50:18,812] ({pool-2-thread-14} Paragraph.java[jobRun]:363) - run paragraph 20170502-112158_458502255 using spark org.apache.zeppelin.interpreter.LazyOpenInterpreter@24ca4045
> WARN [2017-05-02 11:50:19,810] ({pool-2-thread-14} NotebookServer.java[afterStatusChange]:2162) - Job 20170502-112158_458502255 is finished, status: ERROR, exception: null, result: %text linesText: org.apache.spark.rdd.RDD[String] = hdfs://path/to/my/file.txt MapPartitionsRDD[21] at textFile at <console>:27
> defined class Line
> lines: org.apache.spark.rdd.RDD[Line] = MapPartitionsRDD[22] at map at <console>:31
> warning: there was one deprecation warning; re-run with -deprecation for details
> <console>:1: error: ';' expected but ',' found.
> %sql select firstField, secondField, count(1) from lines group by
> firstField, secondField order by firstField, secondField
>
> 4.
> The interpreter log is also confusing:
>
> INFO [2017-05-02 11:41:52,706] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Warehouse location for Hive client (version 1.1.0) is file:/spark-warehouse
> INFO [2017-05-02 11:41:52,983] ({pool-2-thread-2} PerfLogger.java[PerfLogBegin]:122) - <PERFLOG method=create_database from=org.apache.hadoop.hive.metastore.RetryingHMSHandler>
> INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logInfo]:795) - 0: create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
> INFO [2017-05-02 11:41:52,984] ({pool-2-thread-2} HiveMetaStore.java[logAuditEvent]:388) - ugi=zblenessy ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:file:/spark-warehouse, parameters:{})
> ERROR [2017-05-02 11:41:52,992] ({pool-2-thread-2} RetryingHMSHandler.java[invokeInternal]:189) - AlreadyExistsException(message:Database default already exists)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:944)
>
> What should I do with the failed metastore DB creation? Is it fine?
>
> I'm stuck, can you give me some ideas please?
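In case it helps anyone who finds this thread later: the layout that works for me is two separate Zeppelin paragraphs, one per interpreter magic, instead of both magics in one paragraph. A sketch of the working note, using the same code, HDFS path, and field names as in the example above:

```
// Paragraph 1 (%spark): build the RDD and register the temp table
%spark
val linesText = sc.textFile("hdfs://cluster/user/me/lines.txt")

case class Line(id: Long, firstField: String, secondField: String)

val lines = linesText.map { line =>
  val splitted = line.split(" ")
  // splitted is an Array[String]; mkString gives readable debug output,
  // while printing the array directly only shows its type and hashcode
  println("splitted => " + splitted.mkString("[", ", ", "]"))
  Line(splitted(0).toLong, splitted(1), splitted(2))
}

lines.toDF().registerTempTable("lines")
```

```
// Paragraph 2 (%sql): query the registered table
%sql
select firstField, secondField, count(1)
from lines
group by firstField, secondField
order by firstField, secondField
```

By the way, the "one deprecation warning" in the interpreter log most likely comes from registerTempTable, which is deprecated on Spark 2.x; createOrReplaceTempView is its replacement.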