Re: [DISCUSSION] Reserve partitions for CacheQueries

Ivan Pavlukhin Mon, 13 Dec 2021 02:15:24 -0800

Thanks for details!

2021-12-13 11:14 GMT+03:00, Maksim Timonin <timoninma...@apache.org>:
> Hi, Ivan!
>
>> Does IndexQuery has separate codebase? Does it share some code with
> ScanQuery
>
> Yes, IndexQuery mostly shares processing with ScanQuery:
> requests/responses, a remote filter. Reducer (merge of query results) on an
> initiator node has its own implementation for IndexQuery (MergeSort), but
> it's built into existing ScanQuery processing.
>
> On Mon, Dec 13, 2021 at 11:00 AM Ivan Pavlukhin <vololo...@gmail.com>
> wrote:
>
>> > Then, if there are no objections, the short-term plan is:
>>
>> Sounds ok for me.
>>
>> > AFAIR Scan and SQL queries implementations are totally different. Could
>> you tell me how Index queries fit there?
>>
>> I suppose my questions was misleading. Actually I would like to know
>> how code is organized today. AFAIR SQL and scan queries has their own
>> codebases (messages, merging results from different nodes and etc).
>> Does IndexQuery has separate codebase? Does it share some code with
>> ScanQuery on a "query" layer (higher than BTree layer)?
>>
>> 2021-12-10 13:25 GMT+03:00, Maksim Timonin <timoninma...@apache.org>:
>> > Hi!
>> >
>> >> Existing users may already have some logic in place to handle
>> > inconsistencies
>> >
>> > Pavel, I'm not aware of such users but your comment makes sense. So, I'
>> OK
>> > with adding an option for ScanQuery. Naming of the option is debatable.
>> The
>> > name "reservePartitions" looks good, but we actually already reserve a
>> > partition when it is explicitly specified in ScanQuery. Then, it can be
>> > a
>> > bit misleading in the case of explicitly setting this param to `false`
>> with
>> > a specified partition. But we can just mention it in javadocs, that the
>> > setting affects only full scan.
>> >
>> >> AFAIR Scan and SQL queries implementations are totally different.
>> >> Could
>> > you tell me how Index queries fit there?
>> >
>> > IndexQuery scans indexes like SQL does with few optimizations of
>> traversing
>> > BPlusTree, also there are some differences in query processing, also
>> > IndexQuery provides sorted results by default. But I don't expect
>> > inconsistency in results for SQL / IndexQuery for similar queries.
>> > Actually, I should add tests for that and fix failures if any.
>> >
>> >> In an ideal world it would be great to have only one API
>> >
>> > I think it's possible to teach SQL to switch to cache queries for some
>> > cases, or provide such an opportunity with hints or explicit functions.
>> And
>> > these queries might be parts of an SQL query plan. Or we can go even
>> > deeper, maybe it could be like Apache Spark stages, when we can build
>> > our
>> > plan with different types of queries, and the same type provides to
>> > users the opportunity to run only specific stages: DataFrame.sql() or
>> > DataFrame.filter(inline=true/false).
>> >
>> >> Perhaps IndexQuery should also cover regular entry iteration cases
>> >
>> > It's possible too. IndexQuery provides an opportunity to scan the PK
>> index,
>> > then it can start ScanQuery under the hood for some cases.
>> >
>> > But anyway, to make them run through the single API, we should provide
>> the
>> > same guarantees.
>> >
>> > Then, if there are no objections, the short-term plan is:
>> > 1. Implement partition reservation for IndexQuery.
>> > 2. Add the option for ScanQuery.
>> > 3. Make tests that show that IndexQuery / SQL / ScanQuery is
>> > replaceable
>> > for some types of queries in terms of result consistency.
>> > 4. Then discuss again, how we can integrate them together.
>> >
>> >
>> > On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin <vololo...@gmail.com>
>> > wrote:
>> >
>> >> Actually I brought a point about using SQL queries instead of scan
>> >> queries because I worry about inconsistency between different
>> >> implementations. AFAIR Scan and SQL queries implementations are
>> >> totally different. Could you tell me how Index queries fit there?
>> >>
>> >> My general ideas are as follows:
>> >> 1. In an ideal world it would be great to have only one API (to rule
>> >> them all) and implementation to cover all cases (i.e. scan, index,
>> >> SQL). Also, I wonder how other vendors tackle this?
>> >> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
>> >> a correct/expected one (including partition reservation). Perhaps
>> >> IndexQuery should also cover regular entry iteration (scan) cases and
>> >> become a new ground for generalized scan/index queries.
>> >>
>> >> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn <ptupit...@apache.org>:
>> >> > IndexQuery is experimental, so we can indeed make it reserve
>> partitions
>> >> by
>> >> > default. No objections here.
>> >> >
>> >> > But with ScanQuery, I think it's better to add a "reservePartitions"
>> >> > property so that default behavior is not affected.
>> >> > Existing users may already have some logic in place to handle
>> >> > inconsistencies. So if we enable reservation by default,
>> >> > they'll get an unnecessary performance degradation.
>> >> >
>> >> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin <
>> timoninma...@apache.org>
>> >> > wrote:
>> >> >
>> >> >> Hi, Ivan, Pavel! Thanks for your responses.
>> >> >>
>> >> >> > But is there a real need/benefit from using scan queries over
>> >> primitive
>> >> >> SQL queries today
>> >> >>
>> >> >> In addition to Pavel's response, I'd like to add about IndexQuery.
>> >> >> This
>> >> >> query is in experimental status, but it shows great performance and
>> >> beats
>> >> >> SQL for *some* type of queries. For IndexQuery we can leverage on
>> >> >> understanding that we use only index and apply some optimizations:
>> >> >> skip
>> >> >> boundaries check (h2 Select.isConditionMet), apply filtering by
>> inline
>> >> IO
>> >> >> instead of extracting fields from BinaryObjects, also we skip some
>> >> >> H2
>> >> >> related stuff. Also it's not implemented yet, but for IndexQuery we
>> >> >> can
>> >> >> access data pages sequentially instead of randomly (by iterating
>> >> >> with
>> >> >> IndexRow cursor by batches), and I expect it may make performance
>> >> >> better
>> >> >> in
>> >> >> some highload cases.
>> >> >>
>> >> >> > it is desirable that successful (but may be strange) scenarios
>> >> continue
>> >> >> to work after fix (and not fail e.g. due to some introduced
>> exception)
>> >> >>
>> >> >> > Partition reservation should be opt-in if we decide to proceed.
>> >> >>
>> >> >> With the approach I proposed we can reach it. Without specified
>> >> >> query
>> >> >> timeout (and cache queries don't have a public API for setting
>> >> >> timeout),
>> >> >> we
>> >> >> can retry our attempts to reserve partitions while we do not
>> >> >> succeed.
>> >> So,
>> >> >> users may get some increase in query time execution under unstable
>> >> >> topology. But in exchange users will get more stable results, and
>> >> >> avoid
>> >> >> exceptions like in the ticket I showed [1].
>> >> >>
>> >> >> We can implement this logic for IndexQuery only (as it's only
>> >> >> experimental
>> >> >> since 2.12), and extend it on ScanQuery later after stabilizing
>> >> >> with
>> >> >> IndexQuery. WDYT?
>> >> >>
>> >> >>
>> >> >> [1] https://issues.apache.org/jira/browse/IGNITE-12591
>> >> >>
>> >> >> On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn
>> >> >> <ptupit...@apache.org
>> >
>> >> >> wrote:
>> >> >>
>> >> >> > Agree with Ivan regarding compatibility.
>> >> >> > Partition reservation should be opt-in if we decide to proceed.
>> >> >> >
>> >> >> > > is there a real need/benefit from using
>> >> >> > > scan queries over primitive SQL queries today
>> >> >> >
>> >> >> > SQL requires additional configuration (QueryEntity) and involves
>> >> memory
>> >> >> and
>> >> >> > CPU overhead to maintain the indexes.
>> >> >> > Especially if the query is used rarely (maybe some maintenance
>> >> >> > tasks),
>> >> >> scan
>> >> >> > is a better option.
>> >> >> >
>> >> >> > On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin <
>> vololo...@gmail.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Hi Maksim,
>> >> >> > >
>> >> >> > > Thank you for looking into this. While fixing wrong/surprising
>> >> >> > > behavior is very important, I also have some concerns, let's
>> >> >> > > say,
>> >> >> > > from
>> >> >> > > different angle of view:
>> >> >> > > 1. From a first glance it seems that similar behavior of scan
>> >> >> > > and
>> >> SQL
>> >> >> > > queries is a good idea. But is there a real need/benefit from
>> >> >> > > using
>> >> >> > > scan queries over primitive SQL queries today (e.g. SELECT *
>> >> >> > > FROM
>> >> >> > > table1)?
>> >> >> > > 2. Also we need to carefully think about compatibility. It is
>> >> >> > > desirable that successful (but may be strange) scenarios
>> >> >> > > continue
>> >> >> > > to
>> >> >> > > work after fix (and not fail e.g. due to some introduced
>> >> >> > > exception).
>> >> >> > >
>> >> >> > > Regarding partition reservation for long-time open cursors I
>> guess
>> >> >> > > some ideas might be found in LAZY mode for SQL queries [1].
>> >> >> > >
>> >> >> > > [1] https://issues.apache.org/jira/browse/IGNITE-9171
>> >> >> > >
>> >> >> > > 2021-12-08 20:11 GMT+03:00, Maksim Timonin
>> >> >> > > <timoninma...@apache.org
>> >> >:
>> >> >> > > > Hi, Igniters!
>> >> >> > > >
>> >> >> > > > There is a known issue that ScanQuery on unstable topology
>> >> >> > > > returns
>> >> >> > > > incorrect results: duplicates data [1] or fails with an
>> >> >> > > > exception
>> >> >> [2].
>> >> >> > > The
>> >> >> > > > reason is ScanQuery doesn't reserve partitions.
>> >> >> > > >
>> >> >> > > > IndexQuery shares the same query processing as ScanQuery, and
>> >> >> > > > then
>> >> >> > > > it
>> >> >> > is
>> >> >> > > > also affected by unstable topology. I want to fix it for
>> >> >> > > > IndexQuery.
>> >> >> > > > IndexQuery should provide the same stability as SQL queries
>> >> >> > > > do
>> -
>> >> no
>> >> >> > > > occasional failures or data duplication.
>> >> >> > > >
>> >> >> > > > I dived into the SQL query processing and found that we can
>> >> >> > > > unify
>> >> >> logic
>> >> >> > > for
>> >> >> > > > SQL queries and cache queries (ScanQuery, IndexQuery,
>> >> >> > > > TextQuery):
>> >> >> > > >
>> >> >> > > > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes`
>> to
>> >> >> > > > find
>> >> >> > > nodes
>> >> >> > > > to run a query. From the other side, SQL processing uses for
>> the
>> >> >> > > > same
>> >> >> > > goal
>> >> >> > > > `ReducePartitionMapper#nodesForPartitions`. It looks like
>> >> >> > > > those
>> >> >> methods
>> >> >> > > > have some slight differences, but in the common, they do the
>> >> >> > > > same.
>> >> >> So,
>> >> >> > I
>> >> >> > > > propose to make a single method for both cases.
>> >> >> > > >
>> >> >> > > > 2. SQL processing uses `PartitionReservationManager` for
>> reserve
>> >> >> > > partitions
>> >> >> > > > on the map side. I propose to move it to the ignite-core
>> >> >> > > > module
>> >> and
>> >> >> > start
>> >> >> > > > using it for the cache queries.
>> >> >> > > >
>> >> >> > > > 3. Implement retries for the cache queries in case we failed
>> >> >> > > > to
>> >> >> reserve
>> >> >> > > > partitions on the map side.
>> >> >> > > >
>> >> >> > > > Currently, I see a downside of reserving partitions for cache
>> >> >> queries:
>> >> >> > > > cache queries are lazy. And the time of partition reservation
>> >> >> > > > depends
>> >> >> > on
>> >> >> > > a
>> >> >> > > > user's application code (how fast a cursor is iterated and
>> >> closed).
>> >> >> > > AFAIU,
>> >> >> > > > it's not very good to have a partition in reserve too long.
>> >> Please,
>> >> >> > > correct
>> >> >> > > > me if I'm wrong here.
>> >> >> > > >
>> >> >> > > > But from the other side, Ignite reserves partitions for
>> >> >> > > > ScanQuery
>> >> >> when
>> >> >> > a
>> >> >> > > > partition has been specified as a ScanQuery parameter, and
>> >> >> > > > Ignite
>> >> >> > > reserves
>> >> >> > > > partitions for SQL with the flag lazy=true. Also:
>> >> >> > > > - IndexQuery: I expect simple queries that will return a
>> >> relatively
>> >> >> > small
>> >> >> > > > amount of data. Then partitions wouldn't be reserved too much
>> >> time.
>> >> >> > > > - the same is for TextQuery - it returns a limited amount of
>> >> >> > > > data
>> >> >> (due
>> >> >> > to
>> >> >> > > > the Lucene logic).
>> >> >> > > > - full ScanQuery it's not in use much, AFAIK. So, it's by
>> >> >> > > > default
>> >> a
>> >> >> > > pretty
>> >> >> > > > heavy operation.
>> >> >> > > >
>> >> >> > > > So, I think it's safe to reserve partitions in any case. But
>> >> >> > > > there
>> >> >> > could
>> >> >> > > be
>> >> >> > > > an alternative optimistic approach, smth like that:
>> >> >> > > > 1. Check that topology is stable and waiting while it's not
>> >> >> stabilized.
>> >> >> > > > 2. Run a query on a stable cluster.
>> >> >> > > > 3. In cases when a cluster becomes unstable during query
>> >> >> > > > execution
>> >> >> > > > -
>> >> >> > try
>> >> >> > > to
>> >> >> > > > reserve partitions at runtime (query should be aware of
>> topology
>> >> >> > changes)
>> >> >> > > > and fail in case of reservation failure (if a user already
>> >> >> > > > fetched
>> >> >> some
>> >> >> > > > data).
>> >> >> > > >
>> >> >> > > > I don't like the idea of this optimistic approach because in
>> >> >> > > > case
>> >> a
>> >> >> > user
>> >> >> > > > got some data, we don't have a better solution than to fail a
>> >> query
>> >> >> in
>> >> >> > > case
>> >> >> > > > of cluster instability and reservation failure.
>> >> >> > > >
>> >> >> > > > Igniters, WDYT?
>> >> >> > > >
>> >> >> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12591
>> >> >> > > > [2] https://issues.apache.org/jira/browse/IGNITE-16031
>> >> >> > > >
>> >> >> > >
>> >> >> > >
>> >> >> > > --
>> >> >> > >
>> >> >> > > Best regards,
>> >> >> > > Ivan Pavlukhin
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >>
>> >> Best regards,
>> >> Ivan Pavlukhin
>> >>
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>



-- 

Best regards,
Ivan Pavlukhin

Re: [DISCUSSION] Reserve partitions for CacheQueries

Reply via email to