Hi Jongyoul,

I ran the experiments below and found that we need multiple connections for
parallel execution in Vertica. With a single shared connection, the two
sleep(10) queries took 20 seconds in total (i.e. they ran sequentially); with
one connection per thread, both finished after 10 seconds (i.e. they ran in
parallel).

*Same Connection*
scala> val connection = DriverManager.getConnection(url, username, password)
connection: java.sql.Connection = com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f

scala> val thread1 = new Thread {
     |   override def run {
     |     val statement = connection.createStatement()
     |     println("Thread1: before execute at " + Calendar.getInstance().getTime())
     |     val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 1 as col2")
     |     resultSet.next()
     |     val col1 = resultSet.getString("col1")
     |     val col2 = resultSet.getString("col2")
     |     resultSet.next()
     |     println("Thread1: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime())
     |   }
     | }
thread1: Thread = Thread[Thread-3,5,main]

scala> val thread2 = new Thread {
     |   override def run {
     |     val statement = connection.createStatement()
     |     println("Thread2: before execute at " + Calendar.getInstance().getTime())
     |     val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 2 as col2")
     |     resultSet.next()
     |     val col1 = resultSet.getString("col1")
     |     val col2 = resultSet.getString("col2")
     |     resultSet.next()
     |     println("Thread2: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime())
     |   }
     | }
thread2: Thread = Thread[Thread-4,5,main]

scala> thread1.start

scala> thread2.start
Thread1: before execute at Sat Dec 05 12:38:59 PST 2015
Thread2: before execute at Sat Dec 05 12:38:59 PST 2015
Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015
Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015

*Two Connections*

scala> val thread1 = new Thread {
     |   override def run {
     |     val connection = DriverManager.getConnection(url, username, password)
     |     val statement = connection.createStatement()
     |     println("Thread1: before execute at " + Calendar.getInstance().getTime())
     |     val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 1 as col2")
     |     resultSet.next()
     |     val col1 = resultSet.getString("col1")
     |     val col2 = resultSet.getString("col2")
     |     println("Thread1: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime())
     |   }
     | }
thread1: Thread = Thread[Thread-3,5,main]

scala> val thread2 = new Thread {
     |   override def run {
     |     val connection = DriverManager.getConnection(url, username, password)
     |     val statement = connection.createStatement()
     |     println("Thread2: before execute at " + Calendar.getInstance().getTime())
     |     val resultSet = statement.executeQuery("SELECT sleep(10) as col1, 2 as col2")
     |     resultSet.next()
     |     val col1 = resultSet.getString("col1")
     |     val col2 = resultSet.getString("col2")
     |     println("Thread2: col " + col1 + " " + col2 + " at " + Calendar.getInstance().getTime())
     |   }
     | }
thread2: Thread = Thread[Thread-4,5,main]

scala> thread1.start

scala> thread2.start

Thread2: before execute at Sat Dec 05 12:34:16 PST 2015
Thread1: before execute at Sat Dec 05 12:34:16 PST 2015
Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015
Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015
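
So for parallel execution, the interpreter probably needs one connection per
concurrently running statement (or a small connection pool) rather than a
single shared connection. Here is a rough, untested sketch of what I mean in
plain JDBC; executeSqlOnOwnConnection and the getJdbcUrl/getUser/getPassword
getters are illustrative names, not the interpreter's actual API:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  // Open a dedicated connection for each query so that a long-running query
  // does not serialize the others behind it on a shared connection.
  public String executeSqlOnOwnConnection(String sql) throws Exception {
    try (Connection connection =
             DriverManager.getConnection(getJdbcUrl(), getUser(), getPassword());
         Statement statement = connection.createStatement();
         ResultSet resultSet = statement.executeQuery(sql)) {
      StringBuilder result = new StringBuilder();
      int columnCount = resultSet.getMetaData().getColumnCount();
      while (resultSet.next()) {
        for (int i = 1; i <= columnCount; i++) {
          result.append(resultSet.getString(i));
          if (i < columnCount) {
            result.append('\t');
          }
        }
        result.append('\n');
      }
      return result.toString();
    }
  }

Opening a connection per query is the simplest way to get true parallelism; a
small connection pool would avoid the per-query connect cost if that turns out
to matter.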


Does that make sense?

Thanks for the great work!

Prasad

On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <jongy...@gmail.com> wrote:

> Hi Prasad,
>
> First of all, thanks for testing multiple executions and documenting the
> bug. I'll fix it. Concerning concurrent execution, the interpreter creates
> two statements from one connection. I'm not familiar with Vertica, so could
> you please confirm whether it supports multiple statements on one connection
> and verify running multiple statements from a single connection? I would
> also appreciate your advice on which strategy is better: multiple statements
> or multiple connections. I expected the second query to start before the
> first one finished, as you can see in your log. In your case,
>
> 1. the first query started
> 2. the second query started
> 3. the second query ended
> 4. the first query ended
>
> I wonder, however, why those queries finished only after 2 minutes. I guess
> it's related to Vertica's JDBC behavior, and it doesn't seem that Vertica
> runs multiple statements on one connection in parallel. In that situation,
> even though two statements try to execute in parallel, they execute
> sequentially internally. If that's true, I'll change the interpreter to use
> multiple connections.
>
> Regards,
> Jongyoul
>
> On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <prasadwa...@gmail.com>
> wrote:
>
>> Hi Jongyoul,
>>
>> Thanks for merging ZEPPELIN-440. I downloaded changes from
>> https://github.com/apache/incubator-zeppelin/pull/455 and here's what I
>> found.
>>
>> When I ran two Vertica queries, I got the error below:
>> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) -
>> Job failed
>> java.lang.AbstractMethodError:
>> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z
>>         at
>> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163)
>>         at
>> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196)
>>         at
>> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208)
>>         at
>> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292)
>>         at
>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>         at
>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>         at
>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
>>         at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>         at
>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>         at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>>
>> When I commented out the connection.isValid check, as below, the queries ran
>> without any errors:
>>       if (connection.isClosed()) { // || !connection.isValid(10)) {
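>>
>> A less invasive option might be to treat AbstractMethodError as "cannot
>> validate" instead of dropping the check entirely. This is only a sketch and
>> is not tested against the Vertica driver; isConnectionValid is an
>> illustrative helper name:
>>
>>   import java.sql.Connection;
>>   import java.sql.SQLException;
>>
>>   // Drivers compiled against JDBC 3.0 (apparently including this Vertica
>>   // driver) do not implement Connection.isValid and throw AbstractMethodError.
>>   private static boolean isConnectionValid(Connection connection, int timeoutSeconds) {
>>     try {
>>       return connection.isValid(timeoutSeconds);
>>     } catch (AbstractMethodError e) {
>>       // The driver cannot validate; assume the connection is still usable.
>>       return true;
>>     } catch (SQLException e) {
>>       return false;
>>     }
>>   }
>>
>>   // The existing check could then become:
>>   // if (connection.isClosed() || !isConnectionValid(connection, 10)) { ... }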
>>
>> However, I noticed that even though the jobs for the two queries (select
>> sleep(60)) were started around the same time, they both finished 120
>> seconds after they were started. I also looked at the sessions in Vertica
>> and found that 'select sleep(60), 4' was started after 'select sleep(60),
>> 3' completed. Is this behavior expected? Our goal is to prevent one long
>> Vertica query from delaying other Vertica queries.
>>
>> MacBook-Pro logs $ cat
>> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log
>>
>>  INFO [2015-12-04 09:28:44,545] ({Thread-0}
>> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server
>> on port 54257
>>
>>  INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3}
>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>> interpreter org.apache.zeppelin.hive.HiveInterpreter
>>
>>  INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2}
>> SchedulerFactory.java[jobStarted]:132) - Job
>> remoteInterpretJob_1449250124861 started by scheduler
>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>
>> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and
>> hive.url is mandatory.
>>
>>  INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>> 'select sleep(60), 3'
>>
>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>> SchedulerFactory.java[jobStarted]:132) - Job
>> remoteInterpretJob_1449250133312 started by scheduler
>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>
>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>> 'select sleep(60), 4'
>>
>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3}
>> SchedulerFactory.java[jobFinished]:138) - Job
>> remoteInterpretJob_1449250133312 finished by scheduler
>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>
>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2}
>> SchedulerFactory.java[jobFinished]:138) - Job
>> remoteInterpretJob_1449250124861 finished by scheduler
>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>
>>
>> Thanks for your help!
>>
>> Prasad
>>
>> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>
>>> Hi Prasad,
>>>
>>> PostgreSqlInterpreter doesn't seem to support parallel execution. I merged
>>> ZEPPELIN-440 today to support it. Could you please use this interpreter?
>>> You can find the docs and changes in
>>> https://github.com/apache/incubator-zeppelin/pull/455
>>>
>>> Regards,
>>> Jongyoul
>>>
>>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <prasadwa...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are evaluating Zeppelin and are excited about its potential. Right
>>>> now, we are using it to access Presto and Vertica. We downloaded
>>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto
>>>> interpreter and are using PostgreSqlInterpreter to access Vertica.
>>>>
>>>> The main issue we are facing is the serial execution of queries. Since the
>>>> default scheduler is FIFOScheduler, when a Vertica query is running in one
>>>> note, all other Vertica queries issued from the same or different notes
>>>> remain pending.
>>>>
>>>> We changed the getScheduler method in PostgreSqlInterpreter to return
>>>> ParallelScheduler.
>>>>     return SchedulerFactory.singleton().createOrGetParallelScheduler(
>>>>             PostgreSqlInterpreter.class.getName() + this.hashCode(), 2);
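>>>>
>>>> For reference, the full override looks roughly like this (a maxConcurrency
>>>> of 2 is just the value we tried):
>>>>
>>>>   // imports: org.apache.zeppelin.scheduler.Scheduler,
>>>>   //          org.apache.zeppelin.scheduler.SchedulerFactory
>>>>   @Override
>>>>   public Scheduler getScheduler() {
>>>>     return SchedulerFactory.singleton().createOrGetParallelScheduler(
>>>>         PostgreSqlInterpreter.class.getName() + this.hashCode(), 2);
>>>>   }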
>>>>
>>>> But this resulted in the exception below. Is this expected behavior?
>>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel
>>>> execution of queries?
>>>>
>>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log:
>>>>
>>>>  INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2}
>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter
>>>>  INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> remoteInterpretJob_1449167615939 started by scheduler
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection!
>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection!
>>>>  INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2}
>>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC
>>>> DriverName:/vertica.jar-sql.keywords
>>>>  INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3}
>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>> remoteInterpretJob_1449167617779 started by scheduler
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql 
>>>> connection
>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>> select sleep(30), 1;'
>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3}
>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>> select sleep(30), 2;'
>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3}
>>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run
>>>> select sleep(30), 2;
>>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has been
>>>> closed.
>>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source)
>>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source)
>>>> at
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235)
>>>> at
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>> at
>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>> at
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>> at
>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183) -
>>>> Job failed
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> java.lang.NullPointerException
>>>> at
>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61)
>>>> at
>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>> at
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>> at
>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> Caused by: java.lang.NullPointerException
>>>> at
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262)
>>>> at
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>> at
>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>> ... 11 more
>>>>  INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3}
>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>> remoteInterpretJob_1449167617779 finished by scheduler
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>  INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2}
>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>> remoteInterpretJob_1449167615939 finished by scheduler
>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>
>>>>
>>>> Thanks,
>>>> Prasad
>>>>
>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>



-- 
Prasad Wagle
408.476.6261  |  twitter.com/prasadwagle | prasadwa...@gmail.com |
www.linkedin.com/in/prasadwagle
