TX SQL: SELECT FOR UPDATE implementation

Alexander Paschenko Wed, 24 Jan 2018 07:23:57 -0800

Hello Igniters,

I'd like to bring up the discussion about implementation details of
https://issues.apache.org/jira/browse/IGNITE-6937 about
support of SELECT FOR UPDATE statements as a part of overall activity
on transactional SQL.


That kind of statements allows the user proactively obtain row level
locks for the records that they may (or may not) want to
update in further statements of the same transaction.

Suggested general approach to implementation is as follows:

- Perform two-step SELECT as usual (e.g. split, map, reduce stages), BUT
- Map nodes start giving away their first result pages for such query
only after they have traversed whole their result set
and have obtained locks for all keys corresponding to selected rows
(e.g. rows matching WHERE clause).

I've made some research on how some RDBMSs behave (MySQL, Oracle, Pg)
and found that they all also consider SELECT FOR UPDATE
finished and give away its result set only after all locks have been
acquired, so in general suggested strategy is similar to
what other databases do. Alas, everyone concerned is welcome to share
their views on how this feature could work in Ignite - as well as on
some stuff stated below. These are some caveats that I'd like to
discuss in addition to overall approach.

First: should we allow GROUP BY clause in SELECT FOR UPDATE? Oracle
and Pg do not support it in any way, while MySQL does.

Second: should we allow JOIN clause in SELECT FOR UPDATE? All
mentioned major vendors support it, but with Ignite implementation
likely
will not be straightforward - this has to do with pt. 3.

Third: in general case, how do we detect *keys* that we should lock?
Consider following statement: SELECT name FROM person FOR UPDATE.
This query does not mention table key at all. Moreover, if we have
JOIN clause in the query, it is concerned with more than one table
(=cache), thus making it even harder to detect which keys we should
lock. Suggested solution is as follows: on split stage,
append to MAP query key columns for all concerned tables (i.e. SELECT
a.name, b.surname from a join b will become
SELECT a.name, b.surname, a._key, b._key) - with that done, we'll
definitely be able to lock everything we need on MAP nodes.

Fourth: should we allow LIMIT clause in SELECT FOR UPDATE? In general,
RDBMS vendors do not recommend combining LIMIT with
SELECT FOR UPDATE. First, this is because LIMIT results are subject to
ordering, and in this case it's harder to predict what records
will be locked after all. Second, some vendors state that they lock
not just what LIMIT yields, but also *everything before it* - say,
if your SELECT returns 100 records and you apply LIMIT 50,60, then all
first 60 records will be locked even if you don't need first 50.
Moreover, in case of Ignite, any LIMIT will make query need merge
table, all data will have to be copied to reduce node first,
yadda yadda yadda, and we won't be able to actually lock anything
until all data is on reduce node. See pt. 5 for more on this.

Fifth: in fact, above question stands for all cases when we need merge
table: how things should work when we have to pull all data to
reduce node prior to actually filtering anything? Should we aim to
support such cases in SELECT FOR UPDATE?

Regards,
Alex

TX SQL: SELECT FOR UPDATE implementation

Reply via email to