Tim Gautier created ZEPPELIN-3727:
-------------------------------------

             Summary: Spark commands execute correctly, but log extreme number 
of errors
                 Key: ZEPPELIN-3727
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3727
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.7.3
            Reporter: Tim Gautier


I'm running EMR 5.16.0 on AWS. If I run any Spark SQL query against my RDBMS 
using the Scala interpreter, it seems to execute just fine; however, the log 
file fills with this exception over and over again:
{noformat}
ERROR [2018-08-16 22:04:36,601] ({pool-2-thread-2} SparkInterpreter.java[getProgressFromStage_1_1x]:1503) - Error on getting progress information
java.lang.NoSuchMethodException: org.apache.zeppelin.spark.SparkInterpreter$1.stageIdToData()
       at java.lang.Class.getMethod(Class.java:1786)
       at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1487)
       at org.apache.zeppelin.spark.SparkInterpreter.getProgressFromStage_1_1x(SparkInterpreter.java:1510)
       at org.apache.zeppelin.spark.SparkInterpreter.getProgress(SparkInterpreter.java:1430)
       at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:117)
       at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:555)
       at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1762)
       at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:1747)
       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
{noformat}
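For what it's worth, the NoSuchMethodException suggests the interpreter is looking up the stageIdToData() method reflectively on its listener object and not finding it. A minimal standalone sketch (the Listener class here is hypothetical, just to reproduce the mechanism; the real lookup happens inside SparkInterpreter) of how Class.getMethod fails this way:

{code:java}
// Sketch: Class.getMethod throws NoSuchMethodException when the named
// method does not exist on the target class, which matches the log output.
public class ReflectionSketch {
    // Hypothetical stand-in for the interpreter's listener object.
    // Note: no stageIdToData() method is defined here.
    static class Listener {
    }

    // Returns true if the reflective lookup fails, mirroring the logged error.
    static boolean lookupFails() {
        try {
            Listener.class.getMethod("stageIdToData");
            return false;
        } catch (NoSuchMethodException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stageIdToData lookup fails: " + lookupFails());
    }
}
{code}

If that's what's happening, the interpreter probes for the method once per progress poll, which would explain why the exception repeats for the entire duration of a long-running command.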
This simple code triggers it (hitting my own database). I'm not convinced it 
has anything to do with Spark SQL specifically, though; it may instead be 
related to long-running commands.
{code:scala}
import org.apache.spark.sql._

val dbConnectionMap = Map(
  "url" -> "<redacted>",
  "driver" -> "com.mysql.jdbc.Driver"
)

val sql = """(select item_name from product_catalog) as product_catalog"""
val products = spark.read.format("jdbc").options(dbConnectionMap + ("dbtable" -> sql)).load.cache

products.count
{code}
This wouldn't be a big concern, since execution works, except that after a 
couple of hours of analyzing data I started getting filesystem errors. The 
cause turned out to be the log file consuming all the hard drive space: 33 GB!
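As a stopgap (not a fix for the underlying error), capping log growth with a size-based rolling appender might at least keep the disk from filling. A sketch for conf/log4j.properties, assuming the stock log4j 1.x setup; the appender name and size limits are my own choices:

{noformat}
# Sketch: replace the default file appender with a size-capped rolling one.
# RollingFileAppender, MaxFileSize, and MaxBackupIndex are standard log4j 1.x.
log4j.rootLogger=INFO, dailyfile
log4j.appender.dailyfile=org.apache.log4j.RollingFileAppender
log4j.appender.dailyfile.File=${zeppelin.log.file}
log4j.appender.dailyfile.MaxFileSize=100MB
log4j.appender.dailyfile.MaxBackupIndex=5
log4j.appender.dailyfile.layout=org.apache.log4j.PatternLayout
log4j.appender.dailyfile.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n
{noformat}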

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)