Agree. SELECT FOR UPDATE appears to be a command with the most loosly defined locking semantics in SQL world. So I would start with something safe, simple and straightforward. Let's add support for simple statements (no joins, not subqueries) first. Then add support for co-located complex queries. And at last add support for distributed joins*.
* Though, I think it doesn't make sense with current implementation of distributed joins, as it is pretty limited and require significant rethink from architectural perspective. Vladimir. On Thu, Feb 22, 2018 at 5:00 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote: > Alexander, thanks for the detailed explanation. I would start simple, > without JOINs. However, we should throw proper exceptions if unsupported > SQL is used. > > D. > > On Tue, Feb 20, 2018 at 9:45 AM, Alexander Paschenko < > alexander.a.pasche...@gmail.com> wrote: > > > Hello again, and here I go with some more thoughts on SELECT FOR UPDATE. > > After having talking to Vlad, we returned again to question about how > > SELECT FOR UPDATE should work with JOINs. > > > > It seems that distributed joins currently are better to be left aside > > as long as rework of distributed queries and joins is a separate big > > issue. > > However, probably we still should think about what we could make work > > after all with what we have in our hands now. And it seems that some > > scenarios are pretty doable - say, when there's an ordinary JOIN of > > few tables, we can simply append keys of those tables to list of > > selected columns, and that would work in case of collocated data. > > > > * Alas, major vendors (Postgres, MySQL, Oracle) all also honor > > subqueries in JOINs - say, like this: > > > > select p.id , c.id from person p inner join (select * from company) c > > on p.company_id = c.id where p.id > 3 and p.id < 10 for update; > > > > Obviously, this is a bit corner case as such JOIN is not much > > different from an ordinary multi-table one, but it gives overall idea > > of what I'm talking about - major databases lock what's been selected > > in subquery as well. > > > > That said, aforementioned case probably could also be processed as > > suggested above - that is, we take innermost query, append key to list > > of its selected columns, so that we can refer to that new column from > > outer query, etc, etc. In principle we could process even > > multiple-level nesting of queries with such approach. > > Also no major vendors, except probably MySQL, support GROUP BY with > > SELECT FOR UPDATE, and this simplifies our task a bit. :) > > > > If someone can think of an example of a query that cannot be processed > > in the way described above, please write about it. > > > > * Also it looks like no major vendors out of three mentioned above > > lock record from tables mentioned in subqueries in SELECT columns or > > in WHERE - say, this query > > > > select p.id, (select count(*) from company) from person p > > > > will lock only records in Person table. > > > > Thoughts? Should we attempt to implement JOINs support right away? Or > > would it be better to leave it be for (near) future and start with > > simple implementation that would ban JOINs altogether? > > > > Regards, > > Alex > > > > > > 2018-01-25 10:58 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>: > > > Alex, > > > > > > Thank you for detailed analysis. My 50 cents: > > > > > > 1) I would not allow GROUP BYs in the first place. These are good > > > candidates for further iterations IMO > > > 2) Same as p.1 > > > 3) In final TX SQL solution we woll lock rows, not keys > > > (GridCacheMapEntry). Can we simply lock every row returned from the > > query? > > > 4) Same as p.1 > > > 5) Yes, it should work the same way we would implement it for normal > > queries > > > > > > Also I am not quite understand why should we keep results on map node > > until > > > all keys are locked. This increases memory pressure on the server. > > Instead, > > > I would start sending batches to reducer immediately, but do not return > > the > > > very first result to the user until all results are collected. This way > > > pressure is moved to the client what increases cluster stability. Will > > that > > > work? > > > > > > On Wed, Jan 24, 2018 at 6:23 PM, Alexander Paschenko < > > > alexander.a.pasche...@gmail.com> wrote: > > > > > >> Hello Igniters, > > >> > > >> I'd like to bring up the discussion about implementation details of > > >> https://issues.apache.org/jira/browse/IGNITE-6937 about > > >> support of SELECT FOR UPDATE statements as a part of overall activity > > >> on transactional SQL. > > >> > > >> That kind of statements allows the user proactively obtain row level > > >> locks for the records that they may (or may not) want to > > >> update in further statements of the same transaction. > > >> > > >> Suggested general approach to implementation is as follows: > > >> > > >> - Perform two-step SELECT as usual (e.g. split, map, reduce stages), > BUT > > >> - Map nodes start giving away their first result pages for such query > > >> only after they have traversed whole their result set > > >> and have obtained locks for all keys corresponding to selected rows > > >> (e.g. rows matching WHERE clause). > > >> > > >> I've made some research on how some RDBMSs behave (MySQL, Oracle, Pg) > > >> and found that they all also consider SELECT FOR UPDATE > > >> finished and give away its result set only after all locks have been > > >> acquired, so in general suggested strategy is similar to > > >> what other databases do. Alas, everyone concerned is welcome to share > > >> their views on how this feature could work in Ignite - as well as on > > >> some stuff stated below. These are some caveats that I'd like to > > >> discuss in addition to overall approach. > > >> > > >> First: should we allow GROUP BY clause in SELECT FOR UPDATE? Oracle > > >> and Pg do not support it in any way, while MySQL does. > > >> > > >> Second: should we allow JOIN clause in SELECT FOR UPDATE? All > > >> mentioned major vendors support it, but with Ignite implementation > > >> likely > > >> will not be straightforward - this has to do with pt. 3. > > >> > > >> Third: in general case, how do we detect *keys* that we should lock? > > >> Consider following statement: SELECT name FROM person FOR UPDATE. > > >> This query does not mention table key at all. Moreover, if we have > > >> JOIN clause in the query, it is concerned with more than one table > > >> (=cache), thus making it even harder to detect which keys we should > > >> lock. Suggested solution is as follows: on split stage, > > >> append to MAP query key columns for all concerned tables (i.e. SELECT > > >> a.name, b.surname from a join b will become > > >> SELECT a.name, b.surname, a._key, b._key) - with that done, we'll > > >> definitely be able to lock everything we need on MAP nodes. > > >> > > >> Fourth: should we allow LIMIT clause in SELECT FOR UPDATE? In general, > > >> RDBMS vendors do not recommend combining LIMIT with > > >> SELECT FOR UPDATE. First, this is because LIMIT results are subject to > > >> ordering, and in this case it's harder to predict what records > > >> will be locked after all. Second, some vendors state that they lock > > >> not just what LIMIT yields, but also *everything before it* - say, > > >> if your SELECT returns 100 records and you apply LIMIT 50,60, then all > > >> first 60 records will be locked even if you don't need first 50. > > >> Moreover, in case of Ignite, any LIMIT will make query need merge > > >> table, all data will have to be copied to reduce node first, > > >> yadda yadda yadda, and we won't be able to actually lock anything > > >> until all data is on reduce node. See pt. 5 for more on this. > > >> > > >> Fifth: in fact, above question stands for all cases when we need merge > > >> table: how things should work when we have to pull all data to > > >> reduce node prior to actually filtering anything? Should we aim to > > >> support such cases in SELECT FOR UPDATE? > > >> > > >> Regards, > > >> Alex > > >> > > >