[ https://issues.apache.org/jira/browse/IGNITE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409553#comment-15409553 ]

Alexander Paschenko edited comment on IGNITE-2294 at 8/5/16 2:59 PM:
---------------------------------------------------------------------

Following the work done to support modifying operations, the JDBC driver has 
been updated accordingly.

First, to check the type of query parsed from the SQL string against the one 
requested by the JDBC driver (i.e., requested by calling a particular operation, 
say *executeQuery* or *executeUpdate*), a helper class *JdbcSqlFieldsQuery* has 
been introduced that carries an additional flag for the expected operation type.
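
A minimal sketch of the idea (the real class lives in the driver's internals; 
field and method names below are illustrative, not necessarily the actual ones):

{code:java}
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Illustrative sketch only: names may differ from the actual internal class.
public class JdbcSqlFieldsQuery extends SqlFieldsQuery {
    /** Expected operation type: true for executeQuery (SELECT), false for executeUpdate (DML). */
    private final boolean isQry;

    public JdbcSqlFieldsQuery(String sql, boolean isQry) {
        super(sql);

        this.isQry = isQry;
    }

    /** @return Flag to validate against the actual type of the parsed statement. */
    public boolean isQuery() {
        return isQry;
    }
}
{code}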

Then, to distinguish update cursors from result set cursors, *QueryCursorImpl* 
received a result type flag as well.

The *executeUpdate* and *executeBatch* methods have been properly implemented in 
*JdbcStatement* and *JdbcPreparedStatement*; the latter also received its 
deserved specific handling of arguments. Currently batch statements are simply 
executed one by one, which most likely will have to be improved for bulk data 
loading.
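
From the user's perspective this is plain JDBC, e.g. (connection URL, cache and 
table names are made up for illustration):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcBatchExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite://localhost/PersonCache");
             PreparedStatement ps = conn.prepareStatement(
                 "insert into Person(_key, name, surname) values (?, ?, ?)")) {
            ps.setInt(1, 1); ps.setString(2, "John"); ps.setString(3, "Doe");
            ps.addBatch();

            ps.setInt(1, 2); ps.setString(2, "Jane"); ps.setString(3, "Roe");
            ps.addBatch();

            // Currently each batch item is executed as a separate statement.
            int[] updCnts = ps.executeBatch();
        }
    }
}
{code}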

With that in mind, and after having looked at MySQL and its JDBC driver, I would 
like to make a proposal about a possible use of the *data streamer* to optimize 
bulk inserts/merges.

First, let's distinguish batch statements in JDBC terms (a single query with 
many sets of parameters) from bulk statements (a multi-row INSERT query, for 
example). Currently *bulk* statements are handled efficiently with 
*IgniteCache.putAll*, since in that case the query arrives at the server in 
full, in one piece. However, the whole point of the data streamer is to handle 
an amount of data that is unknown in advance. Therefore I see *batch* JDBC 
statements as a nice abstraction over the data streamer, and the usage might be 
as follows:

When the user issues an *addBatch* call on the driver, we fire either an 
individual *IgniteDataStreamer.addData(K, V)* call or 
*IgniteDataStreamer.addData(Map)* (some splitting into chunks could be used 
here for efficiency). When the user calls *executeBatch*, we flush the data 
streamer.
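
A rough sketch of how the driver could route batch items into the streamer (the 
handler class and its chunking threshold are hypothetical; only the 
*IgniteDataStreamer* calls are the actual Ignite API):

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

// Hypothetical driver-side handler; only the addData/flush calls below
// are the real Ignite API.
class BatchStreamerHandler<K, V> {
    /** Illustrative chunking threshold. */
    private static final int CHUNK_SIZE = 1024;

    private final IgniteDataStreamer<K, V> streamer;

    private final Map<K, V> chunk = new HashMap<>();

    BatchStreamerHandler(Ignite ignite, String cacheName) {
        streamer = ignite.dataStreamer(cacheName);
    }

    /** Called on JDBC addBatch: buffer the row, push a chunk when full. */
    void onAddBatch(K key, V val) {
        chunk.put(key, val);

        if (chunk.size() >= CHUNK_SIZE) {
            streamer.addData(new HashMap<>(chunk));
            chunk.clear();
        }
    }

    /** Called on JDBC executeBatch: push the tail and flush the streamer. */
    void onExecuteBatch() {
        if (!chunk.isEmpty()) {
            streamer.addData(new HashMap<>(chunk));
            chunk.clear();
        }

        streamer.flush();
    }
}
{code}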

The use of the data streamer might be made optional, say, via a connection 
string parameter. If the user opts not to enable the data streamer and still 
uses batching, we could apply the technique that MySQL uses, namely *rewriting* 
multiple batch items into a *single bulk insert*.

For example, suppose we have a prepared statement with the query *insert into 
Person(_key, name, surname) values (?, ?, ?)*, and then we supply two sets of 
arguments via *addBatch*. Then, when *executeBatch* is fired, we turn it all 
into the *single* query *insert into Person(_key, name, surname) values 
(?, ?, ?), (?, ?, ?)* with a single set of arguments (3 x 2 = 6 items) that 
will be handled by *putAll*.
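
A sketch of that rewrite step (class and method names are made up for 
illustration; real code would also have to validate the shape of the original 
query):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not existing driver code.
class BatchRewriter {
    /** Turns "insert ... values (?, ?, ?)" plus N argument sets into one bulk query. */
    static String rewrite(String sql, int argsPerRow, int rows) {
        StringBuilder res = new StringBuilder(sql);

        // Build one extra "(?, ?, ?)" placeholder group.
        StringBuilder group = new StringBuilder(", (");

        for (int i = 0; i < argsPerRow; i++)
            group.append(i > 0 ? ", ?" : "?");

        group.append(')');

        // Append one group per additional row beyond the first.
        for (int r = 1; r < rows; r++)
            res.append(group);

        return res.toString();
    }

    /** Flattens per-row argument sets into a single argument array. */
    static Object[] flattenArgs(List<Object[]> batchArgs) {
        List<Object> flat = new ArrayList<>();

        for (Object[] row : batchArgs)
            for (Object arg : row)
                flat.add(arg);

        return flat.toArray();
    }
}
{code}

With the example above, rewrite(sql, 3, 2) yields exactly the two-row VALUES 
clause, and flattenArgs produces the single six-item argument set.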

Again, the example above is for the case when the data streamer is disabled; if 
it is enabled, we just feed batch items to it as described above. We could also 
keep one-by-one processing of batch items as one more option for this behavior.

To sum up: the proposal is to add a new connection string parameter with 2 or 3 
possible values that controls how batch statements are handled: items will 
either be processed individually, fed to the data streamer, or squashed into a 
single query and processed by *putAll*.
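
For instance (the *batchMode* parameter name and its values are purely 
hypothetical at this point):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class BatchModeExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical values: "individual" (one by one), "streamer" (data
        // streamer), "bulk" (rewrite into a single query handled by putAll).
        try (Connection conn = DriverManager.getConnection(
            "jdbc:ignite://localhost/PersonCache?batchMode=streamer")) {
            // Batch statements on this connection would go through the data streamer.
        }
    }
}
{code}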

The main downside I see to this approach is that this logic will have to be 
reimplemented in some form in Ignite's other SQL interfaces (.NET, ODBC, 
Visor (?)).



> Implement SQL DML (insert, update, delete) clauses.
> ---------------------------------------------------
>
>                 Key: IGNITE-2294
>                 URL: https://issues.apache.org/jira/browse/IGNITE-2294
>             Project: Ignite
>          Issue Type: Wish
>            Reporter: Sergi Vladykin
>            Assignee: Alexander Paschenko
>             Fix For: 1.7
>
>
> We need to add parsing for all the listed SQL commands and translate them 
> into respective cache operations (putIfAbsent, put, remove).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)