Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Vladimir Ozerov Wed, 11 Dec 2019 06:12:35 -0800

Roman,

What is the advantage of the Phoenix approach then? BTW, it looks like the
Phoenix integration with Calcite never made it to production, did it?


Tue, 10 Dec 2019 at 19:50, Roman Kondakov <kondako...@mail.ru.invalid>:

> Hi Vladimir,
>
> from what I understand, Drill does not exploit collation of indexes. To
> be precise, it does not exploit index collation in the "natural" way where,
> say, we have a sorted TableScan and hence do not create a new Sort.
> Instead, Drill always creates a Sort operator, but if the TableScan can
> be replaced with an IndexScan, this Sort operator is removed by a
> dedicated rule.
>
> Let's consider an initial operator tree:
>
> Project
>   Sort
>     TableScan
>
> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>
> Project
>   Sort
>     IndexScan
>
> and finally, after applying DbScanSortRemovalRule we have:
>
> Project
>   IndexScan
>
> while with the Phoenix approach we would have two equivalent subsets in our
> planner:
>
> Project
>   Sort
>     TableScan
>
> and
>
> Project
>   IndexScan
>
> and most likely the last plan will be chosen as the best one.
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 10.12.2019 17:19, Vladimir Ozerov wrote:
> > Hi Roman,
> >
> > Why do you think that Drill-style will not let you exploit collation?
> > Collation should be propagated from the index scan in the same way as in
> > other sorted operators, such as merge join or streaming aggregate.
> > Provided that you use converter-hack (or any alternative solution to
> > trigger parent re-analysis).
> > In other words, propagation of collation from Drill-style indexes should
> > be no different from other sorted operators.
> >
> > Regards,
> > Vladimir.
> >
> > Tue, 10 Dec 2019 at 16:40, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
> >
> >>
> >> Roman, just a quick remark: Phoenix builds its approach on the
> >> already existing monolithic HBase architecture; in most cases it's just a
> >> stub for someone who wants to use secondary indexes with a database that
> >> has no native support for them. I don't think it's a good idea here.
> >>
> >>>
> >>>
> >>> ------- Forwarded message -------
> >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
> >>> To:  dev@ignite.apache.org
> >>> Cc:
> >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
> >>> planner
> >>> Date: Tue, 10 Dec 2019 15:55:52 +0300
> >>>
> >>> Hi all!
> >>>
> >>> As you may know, work on integrating the Apache Calcite query
> >>> optimizer into the Ignite codebase is being carried out [1],[2].
> >>>
> >>> One of the problems in this integration is the absence of
> >>> out-of-the-box support for secondary indexes in Apache Calcite. After
> >>> some research I came to the conclusion that this problem has a couple of
> >>> workarounds. Let's name them:
> >>> 1. Phoenix-style approach - representing secondary indexes as
> >>> materialized views, which are natively supported by the Calcite engine [3]
> >>> 2. Drill-style approach - pushing filters into the table scans and
> >>> choosing an appropriate index for lookups when possible [4]
> >>>
> >>> Both these approaches have advantages and disadvantages:
> >>>
> >>> Phoenix style pros:
> >>> - natural way of adding indexes as an alternative source of rows: index
> >>> can be considered as a kind of sorted materialized view.
> >>> - possibility of using index sortedness for stream aggregates,
> >>> deduplication (DISTINCT operator), merge joins, etc.
> >>> - ability to support other types of indexes (e.g. functional indexes).
> >>>
> >>> Phoenix style cons:
> >>> - polluting the optimizer's search space with extra table scans, hence
> >>> increasing the planning time.
> >>>
> >>> Drill style pros:
> >>> - easier to implement (although it's questionable).
> >>> - search space is not inflated.
> >>>
> >>> Drill style cons:
> >>> - missed opportunity to exploit sortedness.
> >>>
> >>> A good discussion about using both approaches can be found in [5].
> >>>
> >>> I made a small sketch [6] in order to demonstrate the applicability of
> >>> the Phoenix approach to Ignite. Key design concepts are:
> >>> 1. On creation, indexes are registered as tables in the Calcite schema.
> >>> This step is needed for Calcite's internal routines.
> >>> 2. On planner initialization we register these indexes as materialized
> >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
> >>> method.
> >>> 3. Right before the query execution Calcite selects all materialized
> >>> views (indexes) which can potentially be used in the query.
> >>> 4. During the query optimization indexes are registered by the planner as
> >>> usual TableScans and hence can be chosen by the optimizer if they have
> >>> lower cost.
> >>>
> >>> This sketch shows the ability to exploit index sortedness only. So
> >>> future work in this direction should focus on using indexes for
> >>> fast index lookups. At first glance, FilterableTable and
> >>> FilterTableScanRule are good starting points. We can push the Filter into
> >>> the TableScan and then use FilterableTable for fast index lookups,
> >>> instead of reading the whole index in the TableScan step and then
> >>> filtering its output in the Filter step.
> >>>
> >>> What do you think?
> >>>
> >>>
> >>>
> >>> [1] http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
> >>> [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
> >>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047
> >>> [4] https://issues.apache.org/jira/browse/DRILL-6381
> >>> [5] https://issues.apache.org/jira/browse/DRILL-3929
> >>> [6] https://github.com/apache/ignite/pull/7115
> >>
> >>
> >>
> >>
> >
>
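
The two-step Drill-style rewrite described in the quoted message (Project/Sort/TableScan, then Project/Sort/IndexScan, then Project/IndexScan) can be illustrated with a toy model. This is only a sketch in plain Java and does not use the Calcite or Drill APIs: the Node class, the satisfies helper and both rule methods are hypothetical names introduced here, and the rules are assumed to match on a simple column-prefix notion of collation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PlanRewriteSketch {
    /** Toy plan node (hypothetical, not a Calcite RelNode): operator name, collation, one optional input. */
    static final class Node {
        final String op;
        final List<String> collation; // for Sort: required sort keys; for scans: delivered order
        final Node input;             // null for leaf operators

        Node(String op, List<String> collation, Node input) {
            this.op = op;
            this.collation = collation;
            this.input = input;
        }

        String render() {
            return input == null ? op : op + "(" + input.render() + ")";
        }
    }

    /** True if the required sort keys are a prefix of the provided order. */
    static boolean satisfies(List<String> provided, List<String> required) {
        return required.size() <= provided.size()
            && provided.subList(0, required.size()).equals(required);
    }

    /** Step 1 (in the spirit of DbScanToIndexScanPrule): swap the TableScan under a Sort for a matching IndexScan. */
    static Node scanToIndexScan(Node n, List<String> indexColumns) {
        if (n == null) return null;
        if (n.op.equals("Sort") && n.input != null && n.input.op.equals("TableScan")
                && satisfies(indexColumns, n.collation)) {
            return new Node("Sort", n.collation, new Node("IndexScan", indexColumns, null));
        }
        return new Node(n.op, n.collation, scanToIndexScan(n.input, indexColumns));
    }

    /** Step 2 (in the spirit of DbScanSortRemovalRule): drop a Sort whose input is already ordered as required. */
    static Node removeRedundantSort(Node n) {
        if (n == null) return null;
        Node in = removeRedundantSort(n.input);
        if (n.op.equals("Sort") && in != null && satisfies(in.collation, n.collation)) {
            return in;
        }
        return new Node(n.op, n.collation, in);
    }

    /** Builds the initial tree: Project over Sort over TableScan. */
    static Node initialPlan(List<String> sortKeys) {
        Node scan = new Node("TableScan", Collections.<String>emptyList(), null);
        Node sort = new Node("Sort", sortKeys, scan);
        return new Node("Project", Collections.<String>emptyList(), sort);
    }

    public static void main(String[] args) {
        List<String> indexColumns = Arrays.asList("a", "b");
        Node plan = initialPlan(Arrays.asList("a"));
        System.out.println(plan.render());
        plan = scanToIndexScan(plan, indexColumns);
        System.out.println(plan.render());
        plan = removeRedundantSort(plan);
        System.out.println(plan.render());
    }
}
```

In this toy model the Sort is removed only because the IndexScan advertises a collation that covers the sort keys; with an index on an unrelated column neither rule fires and the original tree is kept.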

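The difference between filtering the output of a full index scan and pushing the filter into the scan as a range lookup, which the original proposal names as future work, can be shown with a small stand-alone example. It is a sketch under stated assumptions: a java.util.TreeMap stands in for a sorted secondary index, and the class IndexLookupSketch with its methods is hypothetical, not Calcite's FilterableTable or Ignite code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexLookupSketch {
    /** Toy sorted index (assumption: a TreeMap stands in for a real secondary index). */
    private final TreeMap<Integer, String> index = new TreeMap<>();

    /** Number of index entries touched by the most recent call. */
    int entriesVisited;

    void put(int key, String row) {
        index.put(key, row);
    }

    /** Filter on top of a scan: every index entry is read, then filtered. */
    List<String> scanThenFilter(int lo, int hi) {
        entriesVisited = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : index.entrySet()) {
            entriesVisited++;
            if (e.getKey() >= lo && e.getKey() <= hi) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    /** Filter pushed into the scan: only the matching key range is read. */
    List<String> rangeLookup(int lo, int hi) {
        entriesVisited = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : index.subMap(lo, true, hi, true).entrySet()) {
            entriesVisited++;
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        IndexLookupSketch idx = new IndexLookupSketch();
        for (int i = 0; i < 100; i++) {
            idx.put(i, "row" + i);
        }
        System.out.println(idx.scanThenFilter(10, 12) + " visited=" + idx.entriesVisited);
        System.out.println(idx.rangeLookup(10, 12) + " visited=" + idx.entriesVisited);
    }
}
```

Both methods return the same rows in the same order; only the number of visited entries differs, which is the cost difference a planner would need to account for when choosing between the two plans.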