Hello Igniters, I'd like to bring up the discussion about implementation details of https://issues.apache.org/jira/browse/IGNITE-6937 about support of SELECT FOR UPDATE statements as a part of overall activity on transactional SQL.
That kind of statements allows the user proactively obtain row level locks for the records that they may (or may not) want to update in further statements of the same transaction. Suggested general approach to implementation is as follows: - Perform two-step SELECT as usual (e.g. split, map, reduce stages), BUT - Map nodes start giving away their first result pages for such query only after they have traversed whole their result set and have obtained locks for all keys corresponding to selected rows (e.g. rows matching WHERE clause). I've made some research on how some RDBMSs behave (MySQL, Oracle, Pg) and found that they all also consider SELECT FOR UPDATE finished and give away its result set only after all locks have been acquired, so in general suggested strategy is similar to what other databases do. Alas, everyone concerned is welcome to share their views on how this feature could work in Ignite - as well as on some stuff stated below. These are some caveats that I'd like to discuss in addition to overall approach. First: should we allow GROUP BY clause in SELECT FOR UPDATE? Oracle and Pg do not support it in any way, while MySQL does. Second: should we allow JOIN clause in SELECT FOR UPDATE? All mentioned major vendors support it, but with Ignite implementation likely will not be straightforward - this has to do with pt. 3. Third: in general case, how do we detect *keys* that we should lock? Consider following statement: SELECT name FROM person FOR UPDATE. This query does not mention table key at all. Moreover, if we have JOIN clause in the query, it is concerned with more than one table (=cache), thus making it even harder to detect which keys we should lock. Suggested solution is as follows: on split stage, append to MAP query key columns for all concerned tables (i.e. SELECT a.name, b.surname from a join b will become SELECT a.name, b.surname, a._key, b._key) - with that done, we'll definitely be able to lock everything we need on MAP nodes. Fourth: should we allow LIMIT clause in SELECT FOR UPDATE? In general, RDBMS vendors do not recommend combining LIMIT with SELECT FOR UPDATE. First, this is because LIMIT results are subject to ordering, and in this case it's harder to predict what records will be locked after all. Second, some vendors state that they lock not just what LIMIT yields, but also *everything before it* - say, if your SELECT returns 100 records and you apply LIMIT 50,60, then all first 60 records will be locked even if you don't need first 50. Moreover, in case of Ignite, any LIMIT will make query need merge table, all data will have to be copied to reduce node first, yadda yadda yadda, and we won't be able to actually lock anything until all data is on reduce node. See pt. 5 for more on this. Fifth: in fact, above question stands for all cases when we need merge table: how things should work when we have to pull all data to reduce node prior to actually filtering anything? Should we aim to support such cases in SELECT FOR UPDATE? Regards, Alex