Alexander, thanks for the detailed explanation. I would start simple,
without JOINs. However, we should throw proper exceptions if unsupported
SQL is used.

D.

On Tue, Feb 20, 2018 at 9:45 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Hello again, and here I go with some more thoughts on SELECT FOR UPDATE.
> After having talking to Vlad, we returned again to question about how
> SELECT FOR UPDATE should work with JOINs.
>
> It seems that distributed joins currently are better to be left aside
> as long as rework of distributed queries and joins is a separate big
> issue.
> However, probably we still should think about what we could make work
> after all with what we have in our hands now. And it seems that some
> scenarios are pretty doable - say, when there's an ordinary JOIN of
> few tables, we can simply append keys of those tables to list of
> selected columns, and that would work in case of collocated data.
>
> * Alas, major vendors (Postgres, MySQL, Oracle) all also honor
> subqueries in JOINs - say, like this:
>
> select p.id , c.id from person p inner join (select * from company) c
> on p.company_id = c.id where p.id > 3 and p.id < 10 for update;
>
> Obviously, this is a bit corner case as such JOIN is not much
> different from an ordinary multi-table one, but it gives overall idea
> of what I'm talking about - major databases lock what's been selected
> in subquery as well.
>
> That said, aforementioned case probably could also be processed as
> suggested above - that is, we take innermost query, append key to list
> of its selected columns, so that we can refer to that new column from
> outer query, etc, etc. In principle we could process even
> multiple-level nesting of queries with such approach.
> Also no major vendors, except probably MySQL, support GROUP BY with
> SELECT FOR UPDATE, and this simplifies our task a bit. :)
>
> If someone can think of an example of a query that cannot be processed
> in the way described above, please write about it.
>
> * Also it looks like no major vendors out of three mentioned above
> lock record from tables mentioned in subqueries in SELECT columns or
> in WHERE - say, this query
>
> select p.id, (select count(*) from company) from person p
>
> will lock only records in Person table.
>
> Thoughts? Should we attempt to implement JOINs support right away? Or
> would it be better to leave it be for (near) future and start with
> simple implementation that would ban JOINs altogether?
>
> Regards,
> Alex
>
>
> 2018-01-25 10:58 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > Alex,
> >
> > Thank you for detailed analysis. My 50 cents:
> >
> > 1) I would not allow GROUP BYs in the first place. These are good
> > candidates for further iterations IMO
> > 2) Same as p.1
> > 3) In final TX SQL solution we woll lock rows, not keys
> > (GridCacheMapEntry). Can we simply lock every row returned from the
> query?
> > 4) Same as p.1
> > 5) Yes, it should work the same way we would implement it for normal
> queries
> >
> > Also I am not quite understand why should we keep results on map node
> until
> > all keys are locked. This increases memory pressure on the server.
> Instead,
> > I would start sending batches to reducer immediately, but do not return
> the
> > very first result to the user until all results are collected. This way
> > pressure is moved to the client what increases cluster stability. Will
> that
> > work?
> >
> > On Wed, Jan 24, 2018 at 6:23 PM, Alexander Paschenko <
> > alexander.a.pasche...@gmail.com> wrote:
> >
> >> Hello Igniters,
> >>
> >> I'd like to bring up the discussion about implementation details of
> >> https://issues.apache.org/jira/browse/IGNITE-6937 about
> >> support of SELECT FOR UPDATE statements as a part of overall activity
> >> on transactional SQL.
> >>
> >> That kind of statements allows the user proactively obtain row level
> >> locks for the records that they may (or may not) want to
> >> update in further statements of the same transaction.
> >>
> >> Suggested general approach to implementation is as follows:
> >>
> >> - Perform two-step SELECT as usual (e.g. split, map, reduce stages), BUT
> >> - Map nodes start giving away their first result pages for such query
> >> only after they have traversed whole their result set
> >> and have obtained locks for all keys corresponding to selected rows
> >> (e.g. rows matching WHERE clause).
> >>
> >> I've made some research on how some RDBMSs behave (MySQL, Oracle, Pg)
> >> and found that they all also consider SELECT FOR UPDATE
> >> finished and give away its result set only after all locks have been
> >> acquired, so in general suggested strategy is similar to
> >> what other databases do. Alas, everyone concerned is welcome to share
> >> their views on how this feature could work in Ignite - as well as on
> >> some stuff stated below. These are some caveats that I'd like to
> >> discuss in addition to overall approach.
> >>
> >> First: should we allow GROUP BY clause in SELECT FOR UPDATE? Oracle
> >> and Pg do not support it in any way, while MySQL does.
> >>
> >> Second: should we allow JOIN clause in SELECT FOR UPDATE? All
> >> mentioned major vendors support it, but with Ignite implementation
> >> likely
> >> will not be straightforward - this has to do with pt. 3.
> >>
> >> Third: in general case, how do we detect *keys* that we should lock?
> >> Consider following statement: SELECT name FROM person FOR UPDATE.
> >> This query does not mention table key at all. Moreover, if we have
> >> JOIN clause in the query, it is concerned with more than one table
> >> (=cache), thus making it even harder to detect which keys we should
> >> lock. Suggested solution is as follows: on split stage,
> >> append to MAP query key columns for all concerned tables (i.e. SELECT
> >> a.name, b.surname from a join b will become
> >> SELECT a.name, b.surname, a._key, b._key) - with that done, we'll
> >> definitely be able to lock everything we need on MAP nodes.
> >>
> >> Fourth: should we allow LIMIT clause in SELECT FOR UPDATE? In general,
> >> RDBMS vendors do not recommend combining LIMIT with
> >> SELECT FOR UPDATE. First, this is because LIMIT results are subject to
> >> ordering, and in this case it's harder to predict what records
> >> will be locked after all. Second, some vendors state that they lock
> >> not just what LIMIT yields, but also *everything before it* - say,
> >> if your SELECT returns 100 records and you apply LIMIT 50,60, then all
> >> first 60 records will be locked even if you don't need first 50.
> >> Moreover, in case of Ignite, any LIMIT will make query need merge
> >> table, all data will have to be copied to reduce node first,
> >> yadda yadda yadda, and we won't be able to actually lock anything
> >> until all data is on reduce node. See pt. 5 for more on this.
> >>
> >> Fifth: in fact, above question stands for all cases when we need merge
> >> table: how things should work when we have to pull all data to
> >> reduce node prior to actually filtering anything? Should we aim to
> >> support such cases in SELECT FOR UPDATE?
> >>
> >> Regards,
> >> Alex
> >>
>

Reply via email to