Auto flush frequency is already there; I just forgot to mention it in the comments. Will add the rest today.
— Alex

On Dec 19, 2016 at 10:29 PM, "Denis Magda" <dma...@apache.org> wrote:

> Alexander,
>
> A couple of comments in regards to the streaming mode.
>
> I would rename the existing property to "ignite.jdbc.streaming" and add additional ones that will help to manage and tune the streaming behavior:
>
> ignite.jdbc.streaming.perNodeBufferSize
> ignite.jdbc.streaming.perNodeParallelOperations
> ignite.jdbc.streaming.autoFlushFrequency
>
> Any other thoughts?
>
> — Denis
>
>> On Dec 19, 2016, at 8:02 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
>>
>> OK folks, both data streamer support and batching support have been implemented.
>>
>> The resulting design fully conforms to what Dima suggested initially: these two concepts are separated.
>>
>> Streamed statements are turned on by a connection flag, and the stream auto flush timeout can be tuned the same way. These statements support INSERT and MERGE without a subquery, as well as fast key-bounded DELETE and UPDATE. Each prepared statement in streamed mode has its own streamer object, and their lifecycles coincide: on close, the statement closes its streamer. Streaming mode is available only in the "local" mode of connection between the JDBC driver and the Ignite client (the default mode, in which the JDBC driver creates an Ignite client node by itself), since there would be no sense in streaming if query arguments had to travel over the network.
>>
>> Batched statements are used via the conventional JDBC API (setArgs... addBatch... executeBatch...). They also support INSERT and MERGE without a subquery, as well as fast key- (and, optionally, value-) bounded DELETE and UPDATE. These work in a similar manner to non-batched statements and likewise rely on the traditional putAll/invokeAll routines. Essentially, batching is just a way to pass a bigger map to cache.putAll without writing a single very long query.
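Alex's point that "batching is just a way to pass a bigger map to cache.putAll" can be sketched with a toy model. This is not the actual driver code: a plain map stands in for the Ignite cache, and the class and method names are illustrative; only the shape (buffer locally, apply in one putAll) mirrors the described design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of JDBC batching: addBatch only buffers, executeBatch does one putAll. */
class BatchModel {
    private final Map<Integer, String> pending = new LinkedHashMap<>();
    private final Map<Integer, String> cache; // stands in for the Ignite cache

    BatchModel(Map<Integer, String> cache) { this.cache = cache; }

    /** Like PreparedStatement.addBatch(): buffers locally, no cluster interaction. */
    void addBatch(int key, String val) { pending.put(key, val); }

    /** Like executeBatch(): the whole batch goes out as one putAll instead of N single-row statements. */
    int executeBatch() {
        int n = pending.size();
        cache.putAll(pending); // the single "network hop" in the real driver
        pending.clear();
        return n;
    }
}
```

The design benefit is the same one discussed in the thread: N buffered rows cost one round trip instead of N.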
>> This works in local as well as "remote" Ignite JDBC connectivity mode.
>>
>> More info (details are in the comments):
>>
>> Batching - https://issues.apache.org/jira/browse/IGNITE-4269
>> Streaming - https://issues.apache.org/jira/browse/IGNITE-4169
>>
>> Regards,
>> Alex
>>
>> 2016-12-10 23:39 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>
>>> Alex,
>>>
>>> It seems to me that replace semantics can be implemented with a StreamReceiver, no?
>>>
>>> D.
>>>
>>> On Sat, Dec 10, 2016 at 2:54 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
>>>
>>>> Sorry, "no relation w/JDBC" in my previous message should read "no relation w/JDBC batching".
>>>>
>>>> — Alex
>>>> On Dec 10, 2016 at 1:52 PM, "Alexander Paschenko" <alexander.a.pasche...@gmail.com> wrote:
>>>>
>>>>> Dima,
>>>>>
>>>>> I would like to point out that data streamer support had already been implemented in the course of the work on DML in 1.8, exactly as you are suggesting now (turned on via a connection flag; allowed only MERGE, since the data streamer can't do putIfAbsent stuff, right?; absolutely no relation w/JDBC), *but* that patch was reverted on advice from Vlad, which I believe had been agreed with you, so it didn't make it into 1.8 after all.
>>>>>
>>>>> Also, while it's possible to maintain INSERT vs MERGE semantics using the streamer's allowOverwrite flag, I can't see how we could mimic UPDATE here: the streamer skips a put only when the key is present AND allowOverwrite is false, while UPDATE should not put anything when the key is *missing*. I.e., there's no way to emulate the cache's *replace* operation semantics with the streamer (update the value only if the key is present, otherwise do nothing).
>>>>>
>>>>> — Alex
>>>>> On Dec 9, 2016 at 10:00 PM, "Dmitriy Setrakyan" <dsetrak...@apache.org> wrote:
>>>>>
>>>>>> On Fri, Dec 9, 2016 at 12:45 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
>>>>>>
>>>>>>> I already expressed my concern - this is a counterintuitive approach, because without happens-before the pure streaming model can be applied only to independent chunks of data. It means that the mentioned ETL use case is not feasible - ETL always depends on implicit or explicit links between tables, and hence streaming is not applicable here. My question still stands - what products, except possibly Ignite, do this kind of JDBC streaming?
>>>>>>
>>>>>> Vova, we have 2 mechanisms in the product: IgniteCache.putAll() and DataStreamer.addData().
>>>>>>
>>>>>> JDBC batching and putAll() are absolutely identical. If you see it as counterintuitive, I would ask for a concrete example.
>>>>>>
>>>>>> As far as links between data, Ignite does not have foreign-key constraints, so the DataStreamer can insert data in any order (but again, not as part of a JDBC batch).
>>>>>>
>>>>>>> Another problem is that a connection-wide property doesn't fit well into the JDBC pooling model. Users will have to use different connections for streaming and non-streaming approaches.
>>>>>>
>>>>>> Using the DataStreamer is not possible within the JDBC batching paradigm, period. I wish we could drop the high-level-feels-good discussions altogether, because it seems like we are spinning wheels here.
>>>>>>
>>>>>> There is no way to use the streamer in a JDBC context unless we add a connection flag. Again, if you disagree, I would prefer to see a concrete example explaining why.
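The semantic gap Alex describes, that the streamer's allowOverwrite flag can express INSERT-like and MERGE-like writes but not UPDATE's replace-only behavior, can be made concrete with a map-based sketch. This is a model of the three semantics over a plain map, not streamer code; the method names are illustrative.

```java
import java.util.Map;

/** Models the three write semantics discussed in the thread over a plain map. */
class OverwriteSemantics {
    /** allowOverwrite == false: INSERT-like, never replaces an existing value. */
    static void addNoOverwrite(Map<String, Integer> m, String k, int v) { m.putIfAbsent(k, v); }

    /** allowOverwrite == true: MERGE-like, writes unconditionally. */
    static void addOverwrite(Map<String, Integer> m, String k, int v) { m.put(k, v); }

    /** UPDATE-like: write only if the key is already present; neither mode above behaves like this. */
    static void replaceOnly(Map<String, Integer> m, String k, int v) { m.replace(k, v); }
}
```

The first two methods cover the streamer's two modes; the third is what UPDATE needs, and it is not expressible as a combination of the other two, which is Alex's point.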
>>>>>>> Please see how Oracle did that; this is precisely what I am talking about:
>>>>>>> https://docs.oracle.com/cd/B28359_01/java.111/b31224/oraperf.htm#i1056232
>>>>>>>
>>>>>>> Two batching modes - one with explicit flush, another one with implicit flush, where Oracle decides on its own when it is better to communicate with the server. The batching mode can be declared globally or on a per-statement level. Simple and flexible.
>>>>>>>
>>>>>>> On Fri, Dec 9, 2016 at 4:40 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>>>>>
>>>>>>>> Gents,
>>>>>>>>
>>>>>>>> As Sergi suggested, batching and streaming are very different semantically.
>>>>>>>>
>>>>>>>> To use standard JDBC batching, all we need to do is convert it to a cache.putAll() method, as semantically a putAll(...) call is identical to a JDBC batch. Of course, if we see an UPDATE with a WHERE clause in between, then we may have to break a batch into several chunks and execute the update in between. The DataStreamer should not be used here.
>>>>>>>>
>>>>>>>> I believe that for streaming we need to add a special JDBC/ODBC connection flag. Whenever this flag is set to true, we should only allow INSERT or single-UPDATE operations and use the DataStreamer API internally. All operations other than INSERT or single-UPDATE should be prohibited.
>>>>>>>>
>>>>>>>> I think this design is semantically clear. Any objections?
>>>>>>>>
>>>>>>>> D.
>>>>>>>>
>>>>>>>> On Thu, Dec 8, 2016 at 5:02 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> If we use the Streamer, then we always have `happens-before` broken. This is OK, because the Streamer is for data loading, not for usual operation.
>>>>>>>>> We are not inventing any bicycles, just separating concerns: Batching and Streaming.
>>>>>>>>>
>>>>>>>>> My point here is that they should not depend on each other at all: Batching can work with or without Streaming, just as Streaming can work with or without Batching.
>>>>>>>>>
>>>>>>>>> Your proposal is a set of non-obvious rules for them to work. I see no reason for these complications.
>>>>>>>>>
>>>>>>>>> Sergi
>>>>>>>>>
>>>>>>>>> 2016-12-08 15:49 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>>>>>>
>>>>>>>>>> Sergi,
>>>>>>>>>>
>>>>>>>>>> If a user calls a single *execute()* operation, then most likely it is not batching. We should not rely on the strange case where a user performs batching without using the standard and well-adopted JDBC batching API. The main problem with the streamer is that it is async and hence breaks happens-before guarantees in a single thread: a SELECT after an INSERT might not return the inserted value.
>>>>>>>>>>
>>>>>>>>>> Honestly, I do not really understand why we are trying to re-invent a bicycle here. There is a standard API - let's just use it and make it flexible enough to take advantage of IgniteDataStreamer if needed.
>>>>>>>>>>
>>>>>>>>>> Is there any use case which is not covered by this solution? Or let me ask from the opposite side - are there any well-known JDBC drivers which perform batching/streaming from non-batched update statements?
>>>>>>>>>>
>>>>>>>>>> Vladimir.
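Vladimir's happens-before concern, that with an async streamer a SELECT issued right after an INSERT may not see the inserted row until a flush, can be sketched with a toy buffered writer. This is not IgniteDataStreamer itself; it is a synchronous model whose buffer-then-flush shape reproduces the visibility gap being discussed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy streamer: writes are buffered and reach the "cache" only on flush. */
class ToyStreamer {
    private final Map<Integer, String> cache = new HashMap<>();
    private final Map<Integer, String> buf = new LinkedHashMap<>();

    /** Like IgniteDataStreamer.addData(): asynchronous in the real thing, buffered here. */
    void addData(int key, String val) { buf.put(key, val); }

    /** A read goes straight to the cache and bypasses the buffer. */
    String select(int key) { return cache.get(key); }

    /** Buffered entries become visible only once flushed. */
    void flush() { cache.putAll(buf); buf.clear(); }
}
```

The read-after-write miss before flush is exactly why the thread treats streaming as a data-loading mode rather than a drop-in replacement for ordinary statements.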
>>>>>>>>>> On Thu, Dec 8, 2016 at 3:38 PM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Vladimir,
>>>>>>>>>>>
>>>>>>>>>>> I see no reason to forbid Streamer usage from non-batched statement execution. It is common that users already have their ETL tools, and you can't be sure whether they use batching or not.
>>>>>>>>>>>
>>>>>>>>>>> Alex,
>>>>>>>>>>>
>>>>>>>>>>> I guess we have to decide on Streaming first, and then we will discuss Batching separately, OK? Because this decision may become important for the batching implementation.
>>>>>>>>>>>
>>>>>>>>>>> Sergi
>>>>>>>>>>>
>>>>>>>>>>> 2016-12-08 15:31 GMT+03:00 Andrey Gura <ag...@apache.org>:
>>>>>>>>>>>
>>>>>>>>>>>> Alex,
>>>>>>>>>>>>
>>>>>>>>>>>> In most cases JdbcQueryTask should be executed locally on the client node started by the JDBC driver:
>>>>>>>>>>>>
>>>>>>>>>>>> JdbcQueryTask.QueryResult res =
>>>>>>>>>>>>     loc ? qryTask.call() :
>>>>>>>>>>>>     ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);
>>>>>>>>>>>>
>>>>>>>>>>>> Is this valid behavior after introducing the DML functionality?
>>>>>>>>>>>>
>>>>>>>>>>>> In cases when a user wants to execute a query on a specific node, he should fully understand what he wants and what can go wrong.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sergi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> JDBC batching might work quite differently from driver to driver.
>>>>>>>>>>>>> Say, MySQL happily rewrites queries as I suggested at the beginning of this thread (it's not the only strategy, but one of the possible options) - and, BTW, I would like to hear at least an opinion about it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On your first approach, the section before the streamer: you suggest that we send a single statement and multiple parameter sets as a single query task, am I right? (Just to make sure that I got you properly.) If so, do you also mean that the API (namely JdbcQueryTask) between server and client should also change? Or should new API means be added to facilitate batching tasks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Alex
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guys,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I discussed this feature with Dmitriy and we came to the conclusion that batching in JDBC and Data Streaming in Ignite have different semantics and performance characteristics. Thus they are independent features (they may work together or separately, but that is another story).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me explain.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is how JDBC batching works:
>>>>>>>>>>>>>> - Add N sets of parameters to a prepared statement.
>>>>>>>>>>>>>> - Manually execute the prepared statement.
>>>>>>>>>>>>>> - Repeat until all the data is loaded.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is how the data streamer works:
>>>>>>>>>>>>>> - Keep adding data.
>>>>>>>>>>>>>> - The streamer will buffer and load buffered per-node batches when they are big enough.
>>>>>>>>>>>>>> - Close the streamer to make sure that everything is loaded.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As you can see, we have a difference in the semantics of when we send data: if in our JDBC we allow sending batches to nodes without calling `execute` (and probably we would need to make `execute` a no-op here), then we are violating JDBC semantics; if we disallow this behavior, then this batching will underperform.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thus I suggest keeping these features (JDBC Batching and JDBC Streaming) separate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As I already said, they can work together: Batching will batch parameters, and on `execute` they will go to the Streamer in one shot, and the Streamer will deal with the rest.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sergi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To my understanding there are two possible approaches to batching in the JDBC layer:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Rely on the default batching API, specifically *PreparedStatement.addBatch()* [1] and others.
>>>>>>>>>>>>>>> This is a nice and clear API, users are used to it, and its adoption will minimize user code changes when migrating from other JDBC sources. We simply copy updates locally and then execute them all at once with only a single network hop to the servers. *IgniteDataStreamer* can be used underneath.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Or we can have a separate connection flag which will move all INSERT/UPDATE/DELETE statements through the streamer.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I prefer the first approach.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also we need to keep in mind that the data streamer has poor performance when adding single key-value pairs, due to high overhead on concurrency and other bookkeeping. Instead, it is better to pre-batch key-value pairs before giving them to the streamer.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Vladimir.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello Igniters,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One of the major improvements to DML has to be support of batch statements. I'd like to discuss its implementation.
>>>>>>>>>>>>>>>> The suggested approach is to rewrite the given query, turning it from a few INSERTs into a single statement, and to process the arguments accordingly. I suggest this because the whole point of batching is to make as few interactions with the cluster as possible and to make operations as condensed as possible, and in the case of Ignite that means we should send as few JdbcQueryTasks as possible. And, as long as a query task holds a single query and its arguments, this approach will not require any changes to the current design and won't break any backward compatibility - all the dirty work on rewriting will be done by the JDBC driver.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without rewriting, we could introduce some new query task for batch operations, but that would make it impossible to send such requests from newer clients to older servers (say, servers of version 1.8.0, which do not know about batching, let alone older versions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to hear comments and suggestions from the community. Thanks!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Alex
>>>>>>>
>>>>>>> --
>>>>>>> Vladimir Ozerov
>>>>>>> Senior Software Architect
>>>>>>> GridGain Systems
>>>>>>> www.gridgain.com
>>>>>>> +7 (960) 283 98 40
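The rewriting strategy Alex proposes, collapsing a batch of identical single-row INSERTs into one multi-row INSERT (the approach the MySQL driver takes with its rewriteBatchedStatements option), can be sketched as follows. The helper class is hypothetical, written for illustration only; it handles the happy path of a `... VALUES (...)` statement and does not deal with parameter flattening or edge cases a real driver would.

```java
/** Hypothetical sketch of batch-to-multi-row INSERT rewriting. */
class BatchRewriter {
    /**
     * Rewrites "INSERT INTO t (a, b) VALUES (?, ?)" batched N times into a single
     * multi-row statement: "INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...".
     */
    static String rewrite(String singleRowInsert, int batchSize) {
        int at = singleRowInsert.toUpperCase().indexOf("VALUES");
        String head = singleRowInsert.substring(0, at + "VALUES".length());
        String row = singleRowInsert.substring(at + "VALUES".length()).trim();
        StringBuilder sb = new StringBuilder(head);
        for (int i = 0; i < batchSize; i++)
            sb.append(i == 0 ? " " : ", ").append(row); // repeat the row template per batch entry
        return sb.toString();
    }
}
```

The resulting statement travels as one JdbcQueryTask, which is exactly the "as few interactions with the cluster as possible" goal stated above.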