Hi Jongyoul, I ran experiments below and found that we need multiple connections for parallel execution in Vertica.
*Same Connection* scala> val connection = DriverManager.getConnection(url, username, password) connection: java.sql.Connection = com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f scala> val thread1 = new Thread { | override def run { | val statement = connection.createStatement() | println("Thread1: before execute at " + Calendar.getInstance().getTime()) | val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 1 as col2") | resultSet.next() | val col1 = resultSet.getString("col1") | val col2 = resultSet.getString("col2") | resultSet.next() | println("Thread1: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime()) | } | } thread1: Thread = Thread[Thread-3,5,main] scala> val thread2 = new Thread { | override def run { | val statement = connection.createStatement() | println("Thread2: before execute at " + Calendar.getInstance().getTime()) | val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 2 as col2") | resultSet.next() | val col1 = resultSet.getString("col1") | val col2 = resultSet.getString("col2") | resultSet.next() | println("Thread2: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime()) | } | } thread2: Thread = Thread[Thread-4,5,main] scala> thread1.start scala> thread2.start Thread1: before execute at Sat Dec 05 12:38:59 PST 2015 Thread2: before execute at Sat Dec 05 12:38:59 PST 2015 Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015 Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015 *Two Connections* scala> val thread1 = new Thread { | override def run { | val connection = DriverManager.getConnection(url, username, password) | val statement = connection.createStatement() | println("Thread1: before execute at " + Calendar.getInstance().getTime()) | val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 1 as col2") | resultSet.next() | val col1 = resultSet.getString("col1") | val col2 = resultSet.getString("col2") | println("Thread1: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime()) | } | } thread1: Thread = Thread[Thread-3,5,main] scala> val thread2 = new Thread { | override def run { | val connection = DriverManager.getConnection(url, username, password) | val statement = connection.createStatement() | println("Thread2: before execute at " + Calendar.getInstance().getTime()) | val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 2 as col2") | resultSet.next() | val col1 = resultSet.getString("col1") | val col2 = resultSet.getString("col2") | println("Thread2: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime()) | } | } thread2: Thread = Thread[Thread-4,5,main] scala> thread1.start scala> thread2.start Thread2: before execute at Sat Dec 05 12:34:16 PST 2015 Thread1: before execute at Sat Dec 05 12:34:16 PST 2015 Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015 Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015 Does that make sense? Thanks for the great work! Prasad On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <jongy...@gmail.com> wrote: > Hi Prasad, > > First of all, Thanks for testing multiple executions and documenting the > bug. I'll fix the bug. Concerning the concurrent executions, that > interpreter make a two statements from one connection. I'm not good at > vertica, could you please confirm supporting multi statements from one > connection and verify running multiple statements from one connection? > Moreover, I appreciate the advice which is better strategy between multiple > statements and multiple connections. I expect that second query is started > before finishing the first one as you can see in your log, In your case, > > 1. the first query started > 2. the second query started > 3. the second query ended > 4. the first query ended > > I, however why those queries finished after 2 minutes. I guess it's about > vertica's behavior of jdbc, and it doesn't seems that vertica runs multiple > statements in one connection. with this situation, even though two > statements try to execute with parallel, sequentially executes internally. > If it's true, I'll change to use multiple connections. > > Regards, > Jongyoul > > On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <prasadwa...@gmail.com> > wrote: > >> Hi Jongyoul, >> >> Thanks for merging ZEPPELIN-440. I downloaded changes from >> https://github.com/apache/incubator-zeppelin/pull/455 and here's what I >> found. >> >> When I ran two Vertica queries, I got the error below: >> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) - >> Job failed >> java.lang.AbstractMethodError: >> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z >> at >> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163) >> at >> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196) >> at >> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208) >> at >> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292) >> at >> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >> at >> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >> at >> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300) >> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >> at >> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >> at >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >> at >> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:744) >> >> When I removed the connection.isValid check, the queries ran without any >> errors. >> if (connection.isClosed()) { // || !connection.isValid(10)) { >> >> However, I noticed that even though the jobs for the two queries (select >> sleep(60)) were started around the same time, they both finished 120 >> seconds after they were started. I also looked at the sessions in Vertica >> and found that 'select sleep(60), 4' was started after 'select sleep(60), >> 3' completed. Is this behavior expected? Our goal is to prevent one long >> Vertica query from delaying other Vertica queries. >> >> MacBook-Pro logs $ cat >> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log >> >> INFO [2015-12-04 09:28:44,545] ({Thread-0} >> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server >> on port 54257 >> >> INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3} >> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >> interpreter org.apache.zeppelin.hive.HiveInterpreter >> >> INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2} >> SchedulerFactory.java[jobStarted]:132) - Job >> remoteInterpretJob_1449250124861 started by scheduler >> org.apache.zeppelin.hive.HiveInterpreter1131923628 >> >> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2} >> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and >> hive.url is mandatory. >> >> INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2} >> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >> 'select sleep(60), 3' >> >> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >> SchedulerFactory.java[jobStarted]:132) - Job >> remoteInterpretJob_1449250133312 started by scheduler >> org.apache.zeppelin.hive.HiveInterpreter1131923628 >> >> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >> 'select sleep(60), 4' >> >> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3} >> SchedulerFactory.java[jobFinished]:138) - Job >> remoteInterpretJob_1449250133312 finished by scheduler >> org.apache.zeppelin.hive.HiveInterpreter1131923628 >> >> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2} >> SchedulerFactory.java[jobFinished]:138) - Job >> remoteInterpretJob_1449250124861 finished by scheduler >> org.apache.zeppelin.hive.HiveInterpreter1131923628 >> >> >> Thanks for your help! >> >> Prasad >> >> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <jongy...@gmail.com> wrote: >> >>> Hi Prasad, >>> >>> PostgresSqlInterpreter doesn't seems to support parallel execution. I >>> merged ZEPPELIN-440 for supporting it today. Could you please use this >>> interpreter? You can find docs and changes in >>> https://github.com/apache/incubator-zeppelin/pull/455 >>> >>> Regards, >>> Jongyoul >>> >>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <prasadwa...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> We are evaluating Zeppelin and are excited about its potential. Right >>>> now, we are using it to access Presto and Vertica. We downloaded >>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto >>>> interpreter and are using PostgreSqlInterpreter to access Vertica. >>>> >>>> The main issue we are facing is related to serial execution of queries. >>>> Since the default scheduler is FIFOScheduler, when a Vertica query is >>>> running in one notebook, all other Vertica queries issued from the same or >>>> different notes are pending. >>>> >>>> We changed the getScheduler method in PostgreSqlInterpreter to return >>>> ParallelScheduler. >>>> return SchedulerFactory.singleton().createOrGetParallelScheduler( >>>> PostgreSqlInterpreter.class.getName() + this.hashCode(), 2); >>>> >>>> But this resulted in the exception below. Is this expected behavior? >>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel >>>> execution of queries? >>>> >>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log: >>>> >>>> INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2} >>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter >>>> INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2} >>>> SchedulerFactory.java[jobStarted]:132) - Job >>>> remoteInterpretJob_1449167615939 started by scheduler >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection! >>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection! >>>> INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2} >>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC >>>> DriverName:/vertica.jar-sql.keywords >>>> INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3} >>>> SchedulerFactory.java[jobStarted]:132) - Job >>>> remoteInterpretJob_1449167617779 started by scheduler >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql >>>> connection >>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>> select sleep(30), 1;' >>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3} >>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>> select sleep(30), 2;' >>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3} >>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run >>>> select sleep(30), 2; >>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has been >>>> closed. >>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source) >>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source) >>>> at >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235) >>>> at >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>> at >>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>> at >>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>> at >>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>> at >>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>> at >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:744) >>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183) - >>>> Job failed >>>> org.apache.zeppelin.interpreter.InterpreterException: >>>> java.lang.NullPointerException >>>> at >>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61) >>>> at >>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>> at >>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>> at >>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>> at >>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>> at >>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> at >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> at java.lang.Thread.run(Thread.java:744) >>>> Caused by: java.lang.NullPointerException >>>> at >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262) >>>> at >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>> at >>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>> ... 11 more >>>> INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3} >>>> SchedulerFactory.java[jobFinished]:138) - Job >>>> remoteInterpretJob_1449167617779 finished by scheduler >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>> INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2} >>>> SchedulerFactory.java[jobFinished]:138) - Job >>>> remoteInterpretJob_1449167615939 finished by scheduler >>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>> >>>> >>>> Thanks, >>>> Prasad >>>> >>> >>> >>> >>> -- >>> 이종열, Jongyoul Lee, 李宗烈 >>> http://madeng.net >>> >> >> > > > -- > 이종열, Jongyoul Lee, 李宗烈 > http://madeng.net > -- Prasad Wagle 408.476.6261 | twitter.com/prasadwagle | prasadwa...@gmail.com | www.linkedin.com/in/prasadwagle