Philipp Dallig created ZEPPELIN-5737:
----------------------------------------
Summary: Deadlock during Interpreter Creation
Key: ZEPPELIN-5737
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5737
Project: Zeppelin
Issue Type: Bug
Components: Interpreters
Affects Versions: 0.10.1
Reporter: Philipp Dallig

I encountered the following deadlock when starting the Python interpreter. The deadlock is easy to trigger: stop the interpreter via the REST API while it is still starting.

{code}
Found one Java-level deadlock:
=============================
"Thread-29":
  waiting to lock monitor 0x00007fde240084d8 (object 0x00000000804c7120, a org.apache.zeppelin.interpreter.LazyOpenInterpreter),
  which is held by "FIFOScheduler-interpreter_1515166446-Worker-1"
"FIFOScheduler-interpreter_1515166446-Worker-1":
  waiting to lock monitor 0x00007fde20242928 (object 0x00000000804941e0, a org.apache.zeppelin.interpreter.InterpreterGroup),
  which is held by "pool-3-thread-8"
"pool-3-thread-8":
  waiting to lock monitor 0x00007fde202429d8 (object 0x00000000804c71b8, a org.apache.zeppelin.spark.PySparkInterpreter),
  which is held by "FIFOScheduler-interpreter_1515166446-Worker-1"

Java stack information for the threads listed above:
===================================================
"Thread-29":
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:63)
	- waiting to lock <0x00000000804c7120> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:118)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$2(RemoteInterpreterServer.java:950)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$$Lambda$2428/1999550584.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)
"FIFOScheduler-interpreter_1515166446-Worker-1":
	at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:293)
	- waiting to lock <0x00000000804941e0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
	at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:333)
	at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:57)
	- locked <0x00000000804bc9f8> (a org.apache.zeppelin.spark.IPySparkInterpreter)
	at org.apache.zeppelin.python.PythonInterpreter.open(PythonInterpreter.java:91)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:94)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	- locked <0x00000000804c71b8> (a org.apache.zeppelin.spark.PySparkInterpreter)
	- locked <0x00000000804c7120> (a org.apache.zeppelin.interpreter.LazyOpenInterpreter)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:861)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:769)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
	at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
	at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
	at org.apache.zeppelin.scheduler.FIFOScheduler$$Lambda$268/1225679228.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"pool-3-thread-8":
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.isOpen(LazyOpenInterpreter.java:100)
	- waiting to lock <0x00000000804c71b8> (a org.apache.zeppelin.spark.PySparkInterpreter)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:496)
	- locked <0x00000000804941e0> (a org.apache.zeppelin.interpreter.InterpreterGroup)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1757)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1736)
	at org.apache.zeppelin.shaded.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.zeppelin.shaded.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
	at org.apache.zeppelin.shaded.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.
{code}
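The dump reduces to a lock-order inversion. "FIFOScheduler-interpreter_1515166446-Worker-1" holds the PySparkInterpreter monitor (taken in LazyOpenInterpreter.open) and waits for the InterpreterGroup monitor, while "pool-3-thread-8" holds the InterpreterGroup monitor (taken in RemoteInterpreterServer.close) and waits for the PySparkInterpreter monitor. "Thread-29" (the cancel call) is not part of the cycle; it is only blocked behind it on the LazyOpenInterpreter monitor. The following is a minimal, self-contained sketch of that two-lock pattern, not Zeppelin code: the class name LockCycleDemo, the thread names "opener" and "closer", and the two plain Object monitors are hypothetical stand-ins for the InterpreterGroup and PySparkInterpreter locks. It confirms the cycle with the standard ThreadMXBean.findDeadlockedThreads() API, which reports the same kind of monitor cycle that jstack prints above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockCycleDemo {

    /**
     * Starts two daemon threads that acquire two monitors in opposite order and
     * returns true once the JVM reports both of them as deadlocked.
     */
    static boolean reproducesDeadlock() {
        // Hypothetical stand-ins for the two monitors in the reported cycle.
        final Object interpreterGroup = new Object(); // plays the InterpreterGroup locked in close()
        final Object pySpark = new Object();          // plays the PySparkInterpreter locked in open()
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // "open" path: interpreter monitor first, then the group monitor.
        Thread opener = new Thread(() -> {
            synchronized (pySpark) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (interpreterGroup) { /* never reached once deadlocked */ }
            }
        }, "opener");

        // "close" path: group monitor first, then the interpreter monitor.
        Thread closer = new Thread(() -> {
            synchronized (interpreterGroup) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (pySpark) { /* never reached once deadlocked */ }
            }
        }, "closer");

        // Daemon threads, so the stuck pair does not keep the JVM from exiting.
        opener.setDaemon(true);
        closer.setDaemon(true);
        opener.start();
        closer.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long openerId = opener.getId();
        long closerId = closer.getId();
        for (int attempt = 0; attempt < 50; attempt++) { // poll for up to about 5 seconds
            long[] ids = mx.findDeadlockedThreads();      // null when no cycle exists
            if (ids != null) {
                int ours = 0;
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info == null) {
                        continue; // thread ended between the two calls
                    }
                    long id = info.getThreadId();
                    if (id == openerId || id == closerId) {
                        ours++;
                        System.out.println("\"" + info.getThreadName() + "\" waiting to lock "
                                + info.getLockName() + ", held by \"" + info.getLockOwnerName() + "\"");
                    }
                }
                if (ours == 2) {
                    return true; // both demo threads are in the reported cycle
                }
            }
            sleepQuietly(100);
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock reproduced: " + reproducesDeadlock());
    }
}
```

The latch forces both threads to hold their first monitor before either requests its second, so the sketch deadlocks deterministically instead of depending on timing, much like stopping the interpreter while it is still in open(). Running main should print one "waiting to lock ... held by ..." line per stuck thread, followed by the result.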