Re: [DISCUSS] IEP-71 Public API for secondary index search

Maksim Timonin Wed, 07 Apr 2021 01:18:08 -0700

Hi, Andrey!

Thanks for the review and your comments!


>> Is it possible to extend ScanQuery functionality to pass index condition
I investigated this approach and see some issues:
1. Querying an index is not actually a scan. It's a tree traversal (the
predicate operation is an exception; other operations like gt, lt, min, max
have explicit boundaries). An index query consists of conditions that match
the index structure. In general, for a multi-key index there can be multiple
conditions. The ScanQuery API takes a filter as a param, which for an index
query would have to be split into such conditions. That looks like a
non-trivial task.
2. Querying an index requires a sorted result, while ScanQuery makes no
ordering guarantees. So the iterator would behave differently for scanning a
cache and for querying indexes. It's not much to implement, I think, but it
can make ScanQuery unclear for a user.
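
To illustrate both points, here is a minimal sketch in plain Java. It is not
Ignite code: TraverseVsScanSketch, traverse() and scan() are hypothetical
names, a TreeMap stands in for the index tree and a HashMap stands in for the
cache. It only shows that a bounded traversal yields an ordered result, while
a filter scan visits every entry and returns them in storage order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Plain-Java illustration only; none of these names are Ignite API. */
public class TraverseVsScanSketch {
    /** Index-style traversal: jumps to the lower bound and walks the tree in key order. */
    static List<String> traverse(NavigableMap<Integer, String> index, int gt, int lt) {
        // subMap with exclusive bounds models the "gt" and "lt" conditions.
        return new ArrayList<>(index.subMap(gt, false, lt, false).values());
    }

    /** Scan-style filtering: visits every entry; result order follows the storage. */
    static List<String> scan(Map<Integer, String> cache, Predicate<Integer> filter) {
        List<String> res = new ArrayList<>();

        for (Map.Entry<Integer, String> e : cache.entrySet()) {
            if (filter.test(e.getKey()))
                res.add(e.getValue());
        }

        return res;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();

        for (int age : new int[] {42, 17, 35, 28, 61})
            cache.put(age, "person-" + age);

        // The "index" holds the same entries, sorted by the indexed key.
        NavigableMap<Integer, String> index = new TreeMap<>(cache);

        // Sorted by the indexed key: person-28, person-35, person-42.
        System.out.println(traverse(index, 20, 50));

        // Same elements, but in whatever order the HashMap iteration yields.
        System.out.println(scan(cache, age -> age > 20 && age < 50));
    }
}
```

With a single filter lambda there is nothing to derive the (20, 50) bounds
from, which is the splitting problem from point 1.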

Maybe it makes sense to separate traverse (gt, lt, in, etc.) and scan
(predicate) index operations into different APIs. So there will still be a
new query type for traversing.

But we could introduce some inheritors of ScanQuery, like TableScanQuery
and IndexScanQuery, for scan and filter. Then the question is about
ordering: Cache and Table scans aren't ordered, but Index scans are. So we
could introduce an optional param "order" for ScanQuery too.

WDYT?

>> Functional indices
>> This task looks like a huge one because the lifecycle of such classes
should be described first
I agree with you that this part should be investigated more deeply than I
did. So let's postpone the discussion about functional indexes for a while.
IEP-71 declares several phases; functional indexes are part of the 2nd
phase, but users will get new functionality already from the 1st phase. Then
I'll dig into the things you mentioned. Thanks for pointing them out.

>> IndexScan by the predicate is questionable
Also, in your comments to the IEP on Confluence you mentioned the
deserialization that is required to get an object for the predicate function.
Now I see it like this:
1. The predicate should operate only on indexed fields;
2. The user benefits from the predicate only if the index is inlined
properly (even if part of the rows aren't inlined due to varlen, it can
still be faster than running a ScanQuery);
3. Ignite creates a proxy object that is filled with the values that are
inlined. If a user tries to access a field that isn't inlined or isn't
indexed, then deserialization will start and Ignite will log.warn() about
that.
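
A rough sketch of what I mean by the proxy in point 3, in plain Java. It is
an assumption about the shape only, not existing Ignite code:
IndexRowProxySketch, field() and the deserializer supplier are hypothetical
names, and java.util.logging stands in for Ignite's logger.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Plain-Java illustration only; all names here are hypothetical, not Ignite API. */
public class IndexRowProxySketch {
    private static final Logger log = Logger.getLogger(IndexRowProxySketch.class.getName());

    /** Values available directly from the index page (inlined). */
    private final Map<String, Object> inlined;

    /** Stand-in for the expensive read and deserialization of the full row. */
    private final Supplier<Map<String, Object>> deserializer;

    /** Lazily materialized full row. */
    private Map<String, Object> full;

    /** Counts how many times the full row had to be deserialized. */
    int deserializations;

    IndexRowProxySketch(Map<String, Object> inlined, Supplier<Map<String, Object>> deserializer) {
        this.inlined = inlined;
        this.deserializer = deserializer;
    }

    /** Returns a field value, falling back to deserialization for non-inlined fields. */
    Object field(String name) {
        if (inlined.containsKey(name))
            return inlined.get(name);

        if (full == null) {
            log.warning("Field '" + name + "' is not inlined, deserializing the row");

            full = deserializer.get();
            deserializations++;
        }

        return full.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> inlined = new HashMap<>();
        inlined.put("age", 35);

        Map<String, Object> stored = new HashMap<>();
        stored.put("age", 35);
        stored.put("name", "Alice");

        IndexRowProxySketch row = new IndexRowProxySketch(inlined, () -> stored);

        System.out.println(row.field("age"));  // served from the inlined values
        System.out.println(row.field("name")); // logs a warning and deserializes
    }
}
```

A predicate that touches only "age" never pays for deserialization; one that
touches "name" pays for it once per row and gets a warning.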

So, I think it's a valid use case. Is there something I'm missing?





On Tue, Apr 6, 2021 at 6:21 PM Andrey Mashenkov <[email protected]>
wrote:

> Hi Maksim,
>
> Nice idea, I'd like to see this feature in Ignite.
> The motivation is clear to me, it would be nice to have fast scans and omit
> SQL overhead on planning, parsing and etc in some simple use-cases.
>
> I've left a few minor comments on the IEP, but I have the following
> questions, whose answers I failed to find in the IEP.
> 1. Is it possible to extend ScanQuery functionality to pass index condition
> as a hint/parameter rather than create a separate query type?
> This allows a user to run a query over the particular table (for
> multi-table per cache case) and use an index for some type of conditions.
>
> 2. Functional indices, as you wrote, should use Functions distributed via
> peerClassLoading mechanics.
> This means there will be no class with the function on the server side and
> such classes are not persistent. It seems they can't survive a grid restart.
> This task looks like a huge one because the lifecycle of such classes
> should be described first.
> Possible pitfalls are:
> * Durability. Function code MUST be persistent, to survive node restart as
> there can be no guaranteed classes available on the server-side.
> * Consistency. Server (and maybe clients) nodes MUST have the same class
> code at a time.
> * Code ownership. Would class code be shared or per-cache? If first, you
> can't just change class code by loading a new one, because other caches may
> use this function.
> If second, different caches may have different code/behavior, that may be
> non-obvious to end-user.
>
> 3. IndexScan by the predicate is questionable.
> Maybe it can be faster if there are multiple tables in a cache, but it looks
> similar to ScanQuery with a filter.
>
> Also, I believe we can have a common API (configuring, creating, using) for
> all types of Indices, but
> some types (e.g. functional) will be ignored in SQL due to limited support
> on H2 side,
> and other types will be shared and could be used by ScanQuery engine as
> well as by SQL engine.
>
> On Tue, Apr 6, 2021 at 4:14 PM Maksim Timonin <[email protected]>
> wrote:
>
> > Hi, Igniters!
> >
> > I'd like to propose a new feature: the ability to query and create indexes
> > from the public API.
> >
> > It will help in some cases, where:
> > 1. SQL is not applicable by design of user application;
> > 2. IndexScan is preferable to ScanQuery for performance reasons;
> > 3. Functional indexes are required.
> >
> > Also, it would be great to have transactional support for such queries,
> > like the "select for update" query provides. But I haven't dug into that
> > much. It will be the next step if this API is implemented.
> >
> > I've prepared an IEP-71 for that [1] with more details. Please share your
> > thoughts.
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-71+Public+API+for+secondary+index+search
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>
