If we use the Streamer, then `happens-before` is always broken. This is OK, because the Streamer is meant for data loading, not for regular operation.
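The `happens-before` break being discussed can be illustrated with a toy model of an async, buffering loader in the spirit of `IgniteDataStreamer`: writes sit in a local buffer until a flush, so a read issued right after a write may not see the new value. All names here are illustrative; this is not Ignite code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of an async, buffering loader. Writes are buffered locally
 * and become visible only after flush() - so a "SELECT" right after an
 * "INSERT" in the same thread can miss the inserted value.
 */
class BufferedLoader {
    private final Map<Integer, String> store = new HashMap<>();        // "cluster" state
    private final Map<Integer, String> buffer = new LinkedHashMap<>(); // local buffer

    void addData(int key, String val) { buffer.put(key, val); } // buffered, not yet visible
    String get(int key) { return store.get(key); }              // sees only flushed data
    void flush() { store.putAll(buffer); buffer.clear(); }      // like streamer flush/close
}
```

A usage sketch: `addData(1, "x")` followed immediately by `get(1)` returns `null`; only after `flush()` does `get(1)` return `"x"`.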
We are not inventing any bicycles, just separating concerns: Batching and Streaming. My point here is that they should not depend on each other at all: Batching can work with or without Streaming, just as Streaming can work with or without Batching. Your proposal is a set of non-obvious rules for them to work together. I see no reason for these complications.

Sergi

2016-12-08 15:49 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Sergi,
>
> If a user calls a single *execute()* operation, then most likely it is not
> batching. We should not rely on the strange case where a user performs
> batching without using the standard and well-adopted JDBC batching API. The
> main problem with the streamer is that it is async and hence breaks
> happens-before guarantees in a single thread: a SELECT after an INSERT
> might not return the inserted value.
>
> Honestly, I do not really understand why we are trying to re-invent a
> bicycle here. There is a standard API - let's just use it and make it
> flexible enough to take advantage of IgniteDataStreamer if needed.
>
> Is there any use case which is not covered by this solution? Or let me ask
> from the opposite side - are there any well-known JDBC drivers which
> perform batching/streaming from non-batched update statements?
>
> Vladimir.
>
> On Thu, Dec 8, 2016 at 3:38 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
> wrote:
>
> > Vladimir,
> >
> > I see no reason to forbid Streamer usage from non-batched statement
> > execution. It is common that users already have their ETL tools, and you
> > can't be sure whether they use batching or not.
> >
> > Alex,
> >
> > I guess we have to decide on Streaming first and then we will discuss
> > Batching separately, OK? This decision may become important for the
> > batching implementation.
> >
> > Sergi
> >
> > 2016-12-08 15:31 GMT+03:00 Andrey Gura <ag...@apache.org>:
> >
> > > Alex,
> > >
> > > In most cases JdbcQueryTask should be executed locally on the client
> > > node started by the JDBC driver.
> > >
> > > JdbcQueryTask.QueryResult res = loc
> > >     ? qryTask.call()
> > >     : ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);
> > >
> > > Is this behavior still valid after introducing the DML functionality?
> > >
> > > In cases when a user wants to execute a query on a specific node, he
> > > should fully understand what he wants and what can go wrong.
> > >
> > > On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko
> > > <alexander.a.pasche...@gmail.com> wrote:
> > > > Sergi,
> > > >
> > > > JDBC batching might work quite differently from driver to driver.
> > > > Say, MySQL happily rewrites queries as I suggested at the beginning
> > > > of this thread (it's not the only strategy, but one of the possible
> > > > options) - and, BTW, I would like to hear at least an opinion about
> > > > it.
> > > >
> > > > On your first approach, the section before the streamer: you suggest
> > > > that we send a single statement and multiple parameter sets as a
> > > > single query task, am I right? (Just to make sure that I got you
> > > > properly.) If so, do you also mean that the API (namely
> > > > JdbcQueryTask) between server and client should also change? Or
> > > > should new API means be added to facilitate batching tasks?
> > > >
> > > > - Alex
> > > >
> > > > 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> > > >> Guys,
> > > >>
> > > >> I discussed this feature with Dmitriy, and we came to the conclusion
> > > >> that batching in JDBC and Data Streaming in Ignite have different
> > > >> semantics and performance characteristics. Thus they are independent
> > > >> features (they may work together or separately, but that is another
> > > >> story).
> > > >>
> > > >> Let me explain.
> > > >>
> > > >> This is how JDBC batching works:
> > > >> - Add N sets of parameters to a prepared statement.
> > > >> - Manually execute the prepared statement.
> > > >> - Repeat until all the data is loaded.
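The JDBC batching contract sketched in the steps above - parameter sets accumulate locally and nothing travels anywhere until an explicit execute - can be modeled with a small toy class. This is an illustration of the semantics only, not real `java.sql` or Ignite driver code; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal model of the JDBC batching contract: parameter sets are
 * buffered locally and are sent in one shot only when executeBatch()
 * is called explicitly, which then returns per-row update counts.
 */
class ToyBatchStatement {
    private final List<Object[]> pending = new ArrayList<>(); // buffered param sets
    private final List<Object[]> sent = new ArrayList<>();    // what the "server" received

    void addBatch(Object... params) {   // like PreparedStatement.addBatch()
        pending.add(params);            // buffered only; no network activity
    }

    int[] executeBatch() {              // like PreparedStatement.executeBatch()
        int[] counts = new int[pending.size()];
        for (int i = 0; i < counts.length; i++) {
            sent.add(pending.get(i));   // one shot: all rows travel together
            counts[i] = 1;              // one row updated per param set
        }
        pending.clear();
        return counts;
    }

    int sentCount() { return sent.size(); }
}
```

The key contrast with the streamer model is visible in the flush trigger: here nothing is sent until the explicit `executeBatch()`, whereas a streamer flushes on its own when per-node buffers fill up.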
> > > >>
> > > >> This is how the data streamer works:
> > > >> - Keep adding data.
> > > >> - The streamer will buffer the data and load buffered per-node
> > > >>   batches when they are big enough.
> > > >> - Close the streamer to make sure that everything is loaded.
> > > >>
> > > >> As you can see, we have a difference in the semantics of when we
> > > >> send data: if in our JDBC driver we allow sending batches to nodes
> > > >> without calling `execute` (and probably we would need to make
> > > >> `execute` a no-op here), then we are violating JDBC semantics; if we
> > > >> disallow this behavior, then this batching will underperform.
> > > >>
> > > >> Thus I suggest keeping these features (JDBC Batching and JDBC
> > > >> Streaming) separate.
> > > >>
> > > >> As I already said, they can work together: Batching will batch
> > > >> parameters, on `execute` they will go to the Streamer in one shot,
> > > >> and the Streamer will deal with the rest.
> > > >>
> > > >> Sergi
> > > >>
> > > >> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > > >>
> > > >>> Hi Alex,
> > > >>>
> > > >>> To my understanding there are two possible approaches to batching
> > > >>> in the JDBC layer:
> > > >>>
> > > >>> 1) Rely on the default batching API, specifically
> > > >>> *PreparedStatement.addBatch()* [1] and others. This is a nice and
> > > >>> clear API, users are used to it, and its adoption will minimize
> > > >>> user code changes when migrating from other JDBC sources. We simply
> > > >>> accumulate updates locally and then execute them all at once with
> > > >>> only a single network hop to the servers. *IgniteDataStreamer* can
> > > >>> be used underneath.
> > > >>>
> > > >>> 2) Or we can have a separate connection flag which will route all
> > > >>> INSERT/UPDATE/DELETE statements through the streamer.
> > > >>>
> > > >>> I prefer the first approach.
> > > >>>
> > > >>> Also, we need to keep in mind that the data streamer has poor
> > > >>> performance when adding single key-value pairs, due to high
> > > >>> overhead on concurrency and other bookkeeping. Instead, it is
> > > >>> better to pre-batch key-value pairs before giving them to the
> > > >>> streamer.
> > > >>>
> > > >>> Vladimir.
> > > >>>
> > > >>> [1]
> > > >>> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> > > >>>
> > > >>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> > > >>> alexander.a.pasche...@gmail.com> wrote:
> > > >>>
> > > >>> > Hello Igniters,
> > > >>> >
> > > >>> > One of the major improvements to DML has to be support for batch
> > > >>> > statements. I'd like to discuss its implementation. The suggested
> > > >>> > approach is to rewrite the given query, turning it from a few
> > > >>> > INSERTs into a single statement and processing the arguments
> > > >>> > accordingly. I suggest this because the whole point of batching
> > > >>> > is to make as few interactions with the cluster as possible and
> > > >>> > to make operations as condensed as possible, and in the case of
> > > >>> > Ignite this means that we should send as few JdbcQueryTasks as
> > > >>> > possible. And, as long as a query task holds a single query and
> > > >>> > its arguments, this approach will not require any changes to the
> > > >>> > current design and won't break any backward compatibility - all
> > > >>> > the dirty work of rewriting will be done by the JDBC driver.
> > > >>> > Without rewriting, we could introduce some new query task for
> > > >>> > batch operations, but that would make it impossible to send such
> > > >>> > requests from newer clients to older servers (say, servers of
> > > >>> > version 1.8.0, which do not know about batching, let alone older
> > > >>> > versions).
> > > >>> > I'd like to hear comments and suggestions from the community.
> > > >>> > Thanks!
> > > >>> >
> > > >>> > - Alex
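The query-rewriting idea from the original proposal (also what MySQL's driver does when rewriting batched statements) can be sketched as follows: collapse N batched executions of a single-row INSERT into one multi-row statement plus a flattened argument list, so only a single JdbcQueryTask has to travel to the cluster. The class name and the exact rewriting rules are illustrative, not the actual driver code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of rewriting a batched single-row INSERT into one multi-row INSERT. */
class BatchRewriter {
    /** Repeats the VALUES row placeholder once per batched parameter set. */
    static String rewrite(String singleRowInsert, int batchSize) {
        int valuesIdx = singleRowInsert.toUpperCase().lastIndexOf("VALUES");
        String head = singleRowInsert.substring(0, valuesIdx + "VALUES".length());
        String row = singleRowInsert.substring(valuesIdx + "VALUES".length()).trim();
        StringBuilder sb = new StringBuilder(head);
        for (int i = 0; i < batchSize; i++)
            sb.append(i == 0 ? " " : ", ").append(row);
        return sb.toString();
    }

    /** Flattens per-row argument arrays into one argument list for the query task. */
    static List<Object> flattenArgs(List<Object[]> rows) {
        List<Object> flat = new ArrayList<>();
        for (Object[] row : rows)
            for (Object arg : row)
                flat.add(arg);
        return flat;
    }
}
```

For example, `rewrite("INSERT INTO t(a, b) VALUES (?, ?)", 3)` yields `INSERT INTO t(a, b) VALUES (?, ?), (?, ?), (?, ?)`, to be executed with the flattened arguments of all three rows. A real implementation would also need to handle quoting, subqueries, and non-INSERT statements.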