Thanks for details! 2021-12-13 11:14 GMT+03:00, Maksim Timonin <timoninma...@apache.org>: > Hi, Ivan! > >> Does IndexQuery has separate codebase? Does it share some code with > ScanQuery > > Yes, IndexQuery mostly shares processing with ScanQuery: > requests/responses, a remote filter. Reducer (merge of query results) on an > initiator node has its own implementation for IndexQuery (MergeSort), but > it's built into existing ScanQuery processing. > > On Mon, Dec 13, 2021 at 11:00 AM Ivan Pavlukhin <vololo...@gmail.com> > wrote: > >> > Then, if there are no objections, the short-term plan is: >> >> Sounds ok for me. >> >> > AFAIR Scan and SQL queries implementations are totally different. Could >> you tell me how Index queries fit there? >> >> I suppose my questions was misleading. Actually I would like to know >> how code is organized today. AFAIR SQL and scan queries has their own >> codebases (messages, merging results from different nodes and etc). >> Does IndexQuery has separate codebase? Does it share some code with >> ScanQuery on a "query" layer (higher than BTree layer)? >> >> 2021-12-10 13:25 GMT+03:00, Maksim Timonin <timoninma...@apache.org>: >> > Hi! >> > >> >> Existing users may already have some logic in place to handle >> > inconsistencies >> > >> > Pavel, I'm not aware of such users but your comment makes sense. So, I' >> OK >> > with adding an option for ScanQuery. Naming of the option is debatable. >> The >> > name "reservePartitions" looks good, but we actually already reserve a >> > partition when it is explicitly specified in ScanQuery. Then, it can be >> > a >> > bit misleading in the case of explicitly setting this param to `false` >> with >> > a specified partition. But we can just mention it in javadocs, that the >> > setting affects only full scan. >> > >> >> AFAIR Scan and SQL queries implementations are totally different. >> >> Could >> > you tell me how Index queries fit there? >> > >> > IndexQuery scans indexes like SQL does with few optimizations of >> traversing >> > BPlusTree, also there are some differences in query processing, also >> > IndexQuery provides sorted results by default. But I don't expect >> > inconsistency in results for SQL / IndexQuery for similar queries. >> > Actually, I should add tests for that and fix failures if any. >> > >> >> In an ideal world it would be great to have only one API >> > >> > I think it's possible to teach SQL to switch to cache queries for some >> > cases, or provide such an opportunity with hints or explicit functions. >> And >> > these queries might be parts of an SQL query plan. Or we can go even >> > deeper, maybe it could be like Apache Spark stages, when we can build >> > our >> > plan with different types of queries, and the same type provides to >> > users the opportunity to run only specific stages: DataFrame.sql() or >> > DataFrame.filter(inline=true/false). >> > >> >> Perhaps IndexQuery should also cover regular entry iteration cases >> > >> > It's possible too. IndexQuery provides an opportunity to scan the PK >> index, >> > then it can start ScanQuery under the hood for some cases. >> > >> > But anyway, to make them run through the single API, we should provide >> the >> > same guarantees. >> > >> > Then, if there are no objections, the short-term plan is: >> > 1. Implement partition reservation for IndexQuery. >> > 2. Add the option for ScanQuery. >> > 3. Make tests that show that IndexQuery / SQL / ScanQuery is >> > replaceable >> > for some types of queries in terms of result consistency. >> > 4. Then discuss again, how we can integrate them together. >> > >> > >> > On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin <vololo...@gmail.com> >> > wrote: >> > >> >> Actually I brought a point about using SQL queries instead of scan >> >> queries because I worry about inconsistency between different >> >> implementations. AFAIR Scan and SQL queries implementations are >> >> totally different. Could you tell me how Index queries fit there? >> >> >> >> My general ideas are as follows: >> >> 1. In an ideal world it would be great to have only one API (to rule >> >> them all) and implementation to cover all cases (i.e. scan, index, >> >> SQL). Also, I wonder how other vendors tackle this? >> >> 2. New IndexQuery might not mimic ScanQuery behavior but instead have >> >> a correct/expected one (including partition reservation). Perhaps >> >> IndexQuery should also cover regular entry iteration (scan) cases and >> >> become a new ground for generalized scan/index queries. >> >> >> >> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn <ptupit...@apache.org>: >> >> > IndexQuery is experimental, so we can indeed make it reserve >> partitions >> >> by >> >> > default. No objections here. >> >> > >> >> > But with ScanQuery, I think it's better to add a "reservePartitions" >> >> > property so that default behavior is not affected. >> >> > Existing users may already have some logic in place to handle >> >> > inconsistencies. So if we enable reservation by default, >> >> > they'll get an unnecessary performance degradation. >> >> > >> >> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin < >> timoninma...@apache.org> >> >> > wrote: >> >> > >> >> >> Hi, Ivan, Pavel! Thanks for your responses. >> >> >> >> >> >> > But is there a real need/benefit from using scan queries over >> >> primitive >> >> >> SQL queries today >> >> >> >> >> >> In addition to Pavel's response, I'd like to add about IndexQuery. >> >> >> This >> >> >> query is in experimental status, but it shows great performance and >> >> beats >> >> >> SQL for *some* type of queries. For IndexQuery we can leverage on >> >> >> understanding that we use only index and apply some optimizations: >> >> >> skip >> >> >> boundaries check (h2 Select.isConditionMet), apply filtering by >> inline >> >> IO >> >> >> instead of extracting fields from BinaryObjects, also we skip some >> >> >> H2 >> >> >> related stuff. Also it's not implemented yet, but for IndexQuery we >> >> >> can >> >> >> access data pages sequentially instead of randomly (by iterating >> >> >> with >> >> >> IndexRow cursor by batches), and I expect it may make performance >> >> >> better >> >> >> in >> >> >> some highload cases. >> >> >> >> >> >> > it is desirable that successful (but may be strange) scenarios >> >> continue >> >> >> to work after fix (and not fail e.g. due to some introduced >> exception) >> >> >> >> >> >> > Partition reservation should be opt-in if we decide to proceed. >> >> >> >> >> >> With the approach I proposed we can reach it. Without specified >> >> >> query >> >> >> timeout (and cache queries don't have a public API for setting >> >> >> timeout), >> >> >> we >> >> >> can retry our attempts to reserve partitions while we do not >> >> >> succeed. >> >> So, >> >> >> users may get some increase in query time execution under unstable >> >> >> topology. But in exchange users will get more stable results, and >> >> >> avoid >> >> >> exceptions like in the ticket I showed [1]. >> >> >> >> >> >> We can implement this logic for IndexQuery only (as it's only >> >> >> experimental >> >> >> since 2.12), and extend it on ScanQuery later after stabilizing >> >> >> with >> >> >> IndexQuery. WDYT? >> >> >> >> >> >> >> >> >> [1] https://issues.apache.org/jira/browse/IGNITE-12591 >> >> >> >> >> >> On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn >> >> >> <ptupit...@apache.org >> > >> >> >> wrote: >> >> >> >> >> >> > Agree with Ivan regarding compatibility. >> >> >> > Partition reservation should be opt-in if we decide to proceed. >> >> >> > >> >> >> > > is there a real need/benefit from using >> >> >> > > scan queries over primitive SQL queries today >> >> >> > >> >> >> > SQL requires additional configuration (QueryEntity) and involves >> >> memory >> >> >> and >> >> >> > CPU overhead to maintain the indexes. >> >> >> > Especially if the query is used rarely (maybe some maintenance >> >> >> > tasks), >> >> >> scan >> >> >> > is a better option. >> >> >> > >> >> >> > On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin < >> vololo...@gmail.com> >> >> >> > wrote: >> >> >> > >> >> >> > > Hi Maksim, >> >> >> > > >> >> >> > > Thank you for looking into this. While fixing wrong/surprising >> >> >> > > behavior is very important, I also have some concerns, let's >> >> >> > > say, >> >> >> > > from >> >> >> > > different angle of view: >> >> >> > > 1. From a first glance it seems that similar behavior of scan >> >> >> > > and >> >> SQL >> >> >> > > queries is a good idea. But is there a real need/benefit from >> >> >> > > using >> >> >> > > scan queries over primitive SQL queries today (e.g. SELECT * >> >> >> > > FROM >> >> >> > > table1)? >> >> >> > > 2. Also we need to carefully think about compatibility. It is >> >> >> > > desirable that successful (but may be strange) scenarios >> >> >> > > continue >> >> >> > > to >> >> >> > > work after fix (and not fail e.g. due to some introduced >> >> >> > > exception). >> >> >> > > >> >> >> > > Regarding partition reservation for long-time open cursors I >> guess >> >> >> > > some ideas might be found in LAZY mode for SQL queries [1]. >> >> >> > > >> >> >> > > [1] https://issues.apache.org/jira/browse/IGNITE-9171 >> >> >> > > >> >> >> > > 2021-12-08 20:11 GMT+03:00, Maksim Timonin >> >> >> > > <timoninma...@apache.org >> >> >: >> >> >> > > > Hi, Igniters! >> >> >> > > > >> >> >> > > > There is a known issue that ScanQuery on unstable topology >> >> >> > > > returns >> >> >> > > > incorrect results: duplicates data [1] or fails with an >> >> >> > > > exception >> >> >> [2]. >> >> >> > > The >> >> >> > > > reason is ScanQuery doesn't reserve partitions. >> >> >> > > > >> >> >> > > > IndexQuery shares the same query processing as ScanQuery, and >> >> >> > > > then >> >> >> > > > it >> >> >> > is >> >> >> > > > also affected by unstable topology. I want to fix it for >> >> >> > > > IndexQuery. >> >> >> > > > IndexQuery should provide the same stability as SQL queries >> >> >> > > > do >> - >> >> no >> >> >> > > > occasional failures or data duplication. >> >> >> > > > >> >> >> > > > I dived into the SQL query processing and found that we can >> >> >> > > > unify >> >> >> logic >> >> >> > > for >> >> >> > > > SQL queries and cache queries (ScanQuery, IndexQuery, >> >> >> > > > TextQuery): >> >> >> > > > >> >> >> > > > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` >> to >> >> >> > > > find >> >> >> > > nodes >> >> >> > > > to run a query. From the other side, SQL processing uses for >> the >> >> >> > > > same >> >> >> > > goal >> >> >> > > > `ReducePartitionMapper#nodesForPartitions`. It looks like >> >> >> > > > those >> >> >> methods >> >> >> > > > have some slight differences, but in the common, they do the >> >> >> > > > same. >> >> >> So, >> >> >> > I >> >> >> > > > propose to make a single method for both cases. >> >> >> > > > >> >> >> > > > 2. SQL processing uses `PartitionReservationManager` for >> reserve >> >> >> > > partitions >> >> >> > > > on the map side. I propose to move it to the ignite-core >> >> >> > > > module >> >> and >> >> >> > start >> >> >> > > > using it for the cache queries. >> >> >> > > > >> >> >> > > > 3. Implement retries for the cache queries in case we failed >> >> >> > > > to >> >> >> reserve >> >> >> > > > partitions on the map side. >> >> >> > > > >> >> >> > > > Currently, I see a downside of reserving partitions for cache >> >> >> queries: >> >> >> > > > cache queries are lazy. And the time of partition reservation >> >> >> > > > depends >> >> >> > on >> >> >> > > a >> >> >> > > > user's application code (how fast a cursor is iterated and >> >> closed). >> >> >> > > AFAIU, >> >> >> > > > it's not very good to have a partition in reserve too long. >> >> Please, >> >> >> > > correct >> >> >> > > > me if I'm wrong here. >> >> >> > > > >> >> >> > > > But from the other side, Ignite reserves partitions for >> >> >> > > > ScanQuery >> >> >> when >> >> >> > a >> >> >> > > > partition has been specified as a ScanQuery parameter, and >> >> >> > > > Ignite >> >> >> > > reserves >> >> >> > > > partitions for SQL with the flag lazy=true. Also: >> >> >> > > > - IndexQuery: I expect simple queries that will return a >> >> relatively >> >> >> > small >> >> >> > > > amount of data. Then partitions wouldn't be reserved too much >> >> time. >> >> >> > > > - the same is for TextQuery - it returns a limited amount of >> >> >> > > > data >> >> >> (due >> >> >> > to >> >> >> > > > the Lucene logic). >> >> >> > > > - full ScanQuery it's not in use much, AFAIK. So, it's by >> >> >> > > > default >> >> a >> >> >> > > pretty >> >> >> > > > heavy operation. >> >> >> > > > >> >> >> > > > So, I think it's safe to reserve partitions in any case. But >> >> >> > > > there >> >> >> > could >> >> >> > > be >> >> >> > > > an alternative optimistic approach, smth like that: >> >> >> > > > 1. Check that topology is stable and waiting while it's not >> >> >> stabilized. >> >> >> > > > 2. Run a query on a stable cluster. >> >> >> > > > 3. In cases when a cluster becomes unstable during query >> >> >> > > > execution >> >> >> > > > - >> >> >> > try >> >> >> > > to >> >> >> > > > reserve partitions at runtime (query should be aware of >> topology >> >> >> > changes) >> >> >> > > > and fail in case of reservation failure (if a user already >> >> >> > > > fetched >> >> >> some >> >> >> > > > data). >> >> >> > > > >> >> >> > > > I don't like the idea of this optimistic approach because in >> >> >> > > > case >> >> a >> >> >> > user >> >> >> > > > got some data, we don't have a better solution than to fail a >> >> query >> >> >> in >> >> >> > > case >> >> >> > > > of cluster instability and reservation failure. >> >> >> > > > >> >> >> > > > Igniters, WDYT? >> >> >> > > > >> >> >> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12591 >> >> >> > > > [2] https://issues.apache.org/jira/browse/IGNITE-16031 >> >> >> > > > >> >> >> > > >> >> >> > > >> >> >> > > -- >> >> >> > > >> >> >> > > Best regards, >> >> >> > > Ivan Pavlukhin >> >> >> > > >> >> >> > >> >> >> >> >> > >> >> >> >> >> >> -- >> >> >> >> Best regards, >> >> Ivan Pavlukhin >> >> >> > >> >> >> -- >> >> Best regards, >> Ivan Pavlukhin >> >
-- Best regards, Ivan Pavlukhin