Agree. SELECT FOR UPDATE appears to be a command with the most loosly
defined locking semantics in SQL world. So I would start with something
safe, simple and straightforward. Let's add support for simple statements
(no joins, not subqueries) first. Then add support for co-located complex
queries. And at last add support for distributed joins*.

* Though, I think it doesn't make sense with current implementation of
distributed joins, as it is pretty limited and require significant rethink
from architectural perspective.

Vladimir.

On Thu, Feb 22, 2018 at 5:00 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
wrote:

> Alexander, thanks for the detailed explanation. I would start simple,
> without JOINs. However, we should throw proper exceptions if unsupported
> SQL is used.
>
> D.
>
> On Tue, Feb 20, 2018 at 9:45 AM, Alexander Paschenko <
> alexander.a.pasche...@gmail.com> wrote:
>
> > Hello again, and here I go with some more thoughts on SELECT FOR UPDATE.
> > After having talking to Vlad, we returned again to question about how
> > SELECT FOR UPDATE should work with JOINs.
> >
> > It seems that distributed joins currently are better to be left aside
> > as long as rework of distributed queries and joins is a separate big
> > issue.
> > However, probably we still should think about what we could make work
> > after all with what we have in our hands now. And it seems that some
> > scenarios are pretty doable - say, when there's an ordinary JOIN of
> > few tables, we can simply append keys of those tables to list of
> > selected columns, and that would work in case of collocated data.
> >
> > * Alas, major vendors (Postgres, MySQL, Oracle) all also honor
> > subqueries in JOINs - say, like this:
> >
> > select p.id , c.id from person p inner join (select * from company) c
> > on p.company_id = c.id where p.id > 3 and p.id < 10 for update;
> >
> > Obviously, this is a bit corner case as such JOIN is not much
> > different from an ordinary multi-table one, but it gives overall idea
> > of what I'm talking about - major databases lock what's been selected
> > in subquery as well.
> >
> > That said, aforementioned case probably could also be processed as
> > suggested above - that is, we take innermost query, append key to list
> > of its selected columns, so that we can refer to that new column from
> > outer query, etc, etc. In principle we could process even
> > multiple-level nesting of queries with such approach.
> > Also no major vendors, except probably MySQL, support GROUP BY with
> > SELECT FOR UPDATE, and this simplifies our task a bit. :)
> >
> > If someone can think of an example of a query that cannot be processed
> > in the way described above, please write about it.
> >
> > * Also it looks like no major vendors out of three mentioned above
> > lock record from tables mentioned in subqueries in SELECT columns or
> > in WHERE - say, this query
> >
> > select p.id, (select count(*) from company) from person p
> >
> > will lock only records in Person table.
> >
> > Thoughts? Should we attempt to implement JOINs support right away? Or
> > would it be better to leave it be for (near) future and start with
> > simple implementation that would ban JOINs altogether?
> >
> > Regards,
> > Alex
> >
> >
> > 2018-01-25 10:58 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > > Alex,
> > >
> > > Thank you for detailed analysis. My 50 cents:
> > >
> > > 1) I would not allow GROUP BYs in the first place. These are good
> > > candidates for further iterations IMO
> > > 2) Same as p.1
> > > 3) In final TX SQL solution we woll lock rows, not keys
> > > (GridCacheMapEntry). Can we simply lock every row returned from the
> > query?
> > > 4) Same as p.1
> > > 5) Yes, it should work the same way we would implement it for normal
> > queries
> > >
> > > Also I am not quite understand why should we keep results on map node
> > until
> > > all keys are locked. This increases memory pressure on the server.
> > Instead,
> > > I would start sending batches to reducer immediately, but do not return
> > the
> > > very first result to the user until all results are collected. This way
> > > pressure is moved to the client what increases cluster stability. Will
> > that
> > > work?
> > >
> > > On Wed, Jan 24, 2018 at 6:23 PM, Alexander Paschenko <
> > > alexander.a.pasche...@gmail.com> wrote:
> > >
> > >> Hello Igniters,
> > >>
> > >> I'd like to bring up the discussion about implementation details of
> > >> https://issues.apache.org/jira/browse/IGNITE-6937 about
> > >> support of SELECT FOR UPDATE statements as a part of overall activity
> > >> on transactional SQL.
> > >>
> > >> That kind of statements allows the user proactively obtain row level
> > >> locks for the records that they may (or may not) want to
> > >> update in further statements of the same transaction.
> > >>
> > >> Suggested general approach to implementation is as follows:
> > >>
> > >> - Perform two-step SELECT as usual (e.g. split, map, reduce stages),
> BUT
> > >> - Map nodes start giving away their first result pages for such query
> > >> only after they have traversed whole their result set
> > >> and have obtained locks for all keys corresponding to selected rows
> > >> (e.g. rows matching WHERE clause).
> > >>
> > >> I've made some research on how some RDBMSs behave (MySQL, Oracle, Pg)
> > >> and found that they all also consider SELECT FOR UPDATE
> > >> finished and give away its result set only after all locks have been
> > >> acquired, so in general suggested strategy is similar to
> > >> what other databases do. Alas, everyone concerned is welcome to share
> > >> their views on how this feature could work in Ignite - as well as on
> > >> some stuff stated below. These are some caveats that I'd like to
> > >> discuss in addition to overall approach.
> > >>
> > >> First: should we allow GROUP BY clause in SELECT FOR UPDATE? Oracle
> > >> and Pg do not support it in any way, while MySQL does.
> > >>
> > >> Second: should we allow JOIN clause in SELECT FOR UPDATE? All
> > >> mentioned major vendors support it, but with Ignite implementation
> > >> likely
> > >> will not be straightforward - this has to do with pt. 3.
> > >>
> > >> Third: in general case, how do we detect *keys* that we should lock?
> > >> Consider following statement: SELECT name FROM person FOR UPDATE.
> > >> This query does not mention table key at all. Moreover, if we have
> > >> JOIN clause in the query, it is concerned with more than one table
> > >> (=cache), thus making it even harder to detect which keys we should
> > >> lock. Suggested solution is as follows: on split stage,
> > >> append to MAP query key columns for all concerned tables (i.e. SELECT
> > >> a.name, b.surname from a join b will become
> > >> SELECT a.name, b.surname, a._key, b._key) - with that done, we'll
> > >> definitely be able to lock everything we need on MAP nodes.
> > >>
> > >> Fourth: should we allow LIMIT clause in SELECT FOR UPDATE? In general,
> > >> RDBMS vendors do not recommend combining LIMIT with
> > >> SELECT FOR UPDATE. First, this is because LIMIT results are subject to
> > >> ordering, and in this case it's harder to predict what records
> > >> will be locked after all. Second, some vendors state that they lock
> > >> not just what LIMIT yields, but also *everything before it* - say,
> > >> if your SELECT returns 100 records and you apply LIMIT 50,60, then all
> > >> first 60 records will be locked even if you don't need first 50.
> > >> Moreover, in case of Ignite, any LIMIT will make query need merge
> > >> table, all data will have to be copied to reduce node first,
> > >> yadda yadda yadda, and we won't be able to actually lock anything
> > >> until all data is on reduce node. See pt. 5 for more on this.
> > >>
> > >> Fifth: in fact, above question stands for all cases when we need merge
> > >> table: how things should work when we have to pull all data to
> > >> reduce node prior to actually filtering anything? Should we aim to
> > >> support such cases in SELECT FOR UPDATE?
> > >>
> > >> Regards,
> > >> Alex
> > >>
> >
>

Reply via email to