Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov Wed, 11 Dec 2019 07:38:29 -0800

Vladimir,

the main advantage of the Phoenix approach I can see is the using of
Calcite's native materializations API. Calcite has advanced support for
materializations [1] and lattices [2]. Since secondary indexes can be
considered as materialized views (it's just a sorted representation of
the same table) we can seamlessly use views to simulate indexes behavior
for Calcite planner.



[1] https://calcite.apache.org/docs/materialized_views.html
[2] https://calcite.apache.org/docs/lattice.html

-- 
Kind Regards
Roman Kondakov


On 11.12.2019 17:11, Vladimir Ozerov wrote:
> Roman,
> 
> What is the advantage of Phoenix approach then? BTW, it looks like Phoenix
> integration with Calcite never made it to production, did it?
> 
> вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <kondako...@mail.ru.invalid>:
> 
>> Hi Vladimir,
>>
>> from what I understand, Drill does not exploit collation of indexes. To
>> be precise it does not exploit index collation in "natural" way where,
>> say, we a have sorted TableScan and hence we do not create a new Sort.
>> Instead of it Drill always create a Sort operator, but if TableScan can
>> be replaced with an IndexScan, this Sort operator is removed by the
>> dedicated rule.
>>
>> Lets consider initial an operator tree:
>>
>> Project
>>   Sort
>>     TableScan
>>
>> after applying rule DbScanToIndexScanPrule this tree will be converted to:
>>
>> Project
>>   Sort
>>     IndexScan
>>
>> and finally, after applying DbScanSortRemovalRule we have:
>>
>> Project
>>   IndexScan
>>
>> while for Phoenix approach we would have two equivalent subsets in our
>> planner:
>>
>> Project
>>   Sort
>>     TableScan
>>
>> and
>>
>> Project
>>   IndexScan
>>
>> and most likely the last plan  will be chosen as the best one.
>>
>> --
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 10.12.2019 17:19, Vladimir Ozerov wrote:
>>> Hi Roman,
>>>
>>> Why do you think that Drill-style will not let you exploit collation?
>>> Collation should be propagated from the index scan in the same way as in
>>> other sorted operators, such as merge join or streaming aggregate.
>> Provided
>>> that you use converter-hack (or any alternative solution to trigger
>> parent
>>> re-analysis).
>>> In other words, propagation of collation from Drill-style indexes should
>> be
>>> no different from other sorted operators.
>>>
>>> Regards,
>>> Vladimir.
>>>
>>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky
>> <arzamas...@mail.ru.invalid
>>>> :
>>>
>>>>
>>>> Roman just as fast remark, Phoenix builds their approach on
>>>> already existing monolith HBase architecture, most cases it`s just a
>> stub
>>>> for someone who wants use secondary indexes with a base with no
>>>> native support of it. Don`t think it`s good idea here.
>>>>
>>>>>
>>>>>
>>>>> ------- Forwarded message -------
>>>>> From: "Roman Kondakov" < kondako...@mail.ru.invalid >
>>>>> To:  dev@ignite.apache.org
>>>>> Cc:
>>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>>>> planner
>>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>>
>>>>> Hi all!
>>>>>
>>>>> As you may know there is an activity on integration of Apache Calcite
>>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>>
>>>>> One of a bunch of problems in this integration is the absence of
>>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>>> some research I came to conclusion that this problem has a couple of
>>>>> workarounds. Let's name them
>>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>>> materialized views which are natively supported by Calcite engine [3]
>>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>>> choose appropriate index for lookups when possible [4]
>>>>>
>>>>> Both these approaches have advantages and disadvantages:
>>>>>
>>>>> Phoenix style pros:
>>>>> - natural way of adding indexes as an alternative source of rows: index
>>>>> can be considered as a kind of sorted materialized view.
>>>>> - possibility of using index sortedness for stream aggregates,
>>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>>> - ability to support other types of indexes (i.e. functional indexes).
>>>>>
>>>>> Phoenix style cons:
>>>>> - polluting optimizer's search space extra table scans hence increasing
>>>>> the planning time.
>>>>>
>>>>> Drill style pros:
>>>>> - easier to implement (although it's questionable).
>>>>> - search space is not inflated.
>>>>>
>>>>> Drill style cons:
>>>>> - missed opportunity to exploit sortedness.
>>>>>
>>>>> There is a good discussion about using both approaches can be found in
>>>> [5].
>>>>>
>>>>> I made a small sketch [6] in order to demonstrate the applicability of
>>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>>>> step is needed for internal Calcite's routines.
>>>>> 2. On planner initialization we register these indexes as materialized
>>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>>> method.
>>>>> 3. Right before the query execution Calcite selects all materialized
>>>>> views (indexes) which can be potentially used in query.
>>>>> 4. During the query optimization indexes are registered by planner as
>>>>> usual TableScans and hence can be chosen by optimizer if they have
>> lower
>>>>> cost.
>>>>>
>>>>> This sketch shows the ability to exploit index sortedness only. So the
>>>>> future work in this direction should be focused on using indexes for
>>>>> fast index lookups. At first glance FilterableTable and
>>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>>> the TableScan and then use FilterableTable for fast index lookups
>>>>> avoiding reading the whole index on TableScan step and then filtering
>>>>> its output on the Filter step.
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>>> [2]
>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Reply via email to