Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Roman Kondakov Tue, 10 Dec 2019 08:58:27 -0800

Alexey,

from my point of view Drill's approach looks like somewhat a hack:
sortedness and index lookups added to a removed from the query plan by
the special rules (which look very messy and complicated). Compare it to
the Phoneix approach where index is added to optimizer as a sorted view
of a table.



-- 
Kind Regards
Roman Kondakov


On 10.12.2019 17:44, Alexey Zinoviev wrote:
> I'd like Drill approach, worked and debugged with something similar, it's
> more easy to support
> 
> 
> Buuut, you have an implemented prototype (it votes for Phoenix in my mind)
> 
> вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <[email protected]>:
> 
>> Hi Roman,
>>
>> Why do you think that Drill-style will not let you exploit collation?
>> Collation should be propagated from the index scan in the same way as in
>> other sorted operators, such as merge join or streaming aggregate. Provided
>> that you use converter-hack (or any alternative solution to trigger parent
>> re-analysis).
>> In other words, propagation of collation from Drill-style indexes should be
>> no different from other sorted operators.
>>
>> Regards,
>> Vladimir.
>>
>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[email protected]
>>> :
>>
>>>
>>> Roman just as fast remark, Phoenix builds their approach on
>>> already existing monolith HBase architecture, most cases it`s just a stub
>>> for someone who wants use secondary indexes with a base with no
>>> native support of it. Don`t think it`s good idea here.
>>>
>>>>
>>>>
>>>> ------- Forwarded message -------
>>>> From: "Roman Kondakov" < [email protected] >
>>>> To:  [email protected]
>>>> Cc:
>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite
>>>> planner
>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300
>>>>
>>>> Hi all!
>>>>
>>>> As you may know there is an activity on integration of Apache Calcite
>>>> query optimizer into Ignite codebase is being carried out [1],[2].
>>>>
>>>> One of a bunch of problems in this integration is the absence of
>>>> out-of-the-box support for secondary indexes in Apache Calcite. After
>>>> some research I came to conclusion that this problem has a couple of
>>>> workarounds. Let's name them
>>>> 1. Phoenix-style approach - representing secondary indexes as
>>>> materialized views which are natively supported by Calcite engine [3]
>>>> 2. Drill-style approach - pushing filters into the table scans and
>>>> choose appropriate index for lookups when possible [4]
>>>>
>>>> Both these approaches have advantages and disadvantages:
>>>>
>>>> Phoenix style pros:
>>>> - natural way of adding indexes as an alternative source of rows: index
>>>> can be considered as a kind of sorted materialized view.
>>>> - possibility of using index sortedness for stream aggregates,
>>>> deduplication (DISTINCT operator), merge joins, etc.
>>>> - ability to support other types of indexes (i.e. functional indexes).
>>>>
>>>> Phoenix style cons:
>>>> - polluting optimizer's search space extra table scans hence increasing
>>>> the planning time.
>>>>
>>>> Drill style pros:
>>>> - easier to implement (although it's questionable).
>>>> - search space is not inflated.
>>>>
>>>> Drill style cons:
>>>> - missed opportunity to exploit sortedness.
>>>>
>>>> There is a good discussion about using both approaches can be found in
>>> [5].
>>>>
>>>> I made a small sketch [6] in order to demonstrate the applicability of
>>>> the Phoenix approach to Ignite. Key design concepts are:
>>>> 1. On creating indexes are registered as tables in Calcite schema. This
>>>> step is needed for internal Calcite's routines.
>>>> 2. On planner initialization we register these indexes as materialized
>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization
>>>> method.
>>>> 3. Right before the query execution Calcite selects all materialized
>>>> views (indexes) which can be potentially used in query.
>>>> 4. During the query optimization indexes are registered by planner as
>>>> usual TableScans and hence can be chosen by optimizer if they have lower
>>>> cost.
>>>>
>>>> This sketch shows the ability to exploit index sortedness only. So the
>>>> future work in this direction should be focused on using indexes for
>>>> fast index lookups. At first glance FilterableTable and
>>>> FilterTableScanRule are good points to start. We can push Filter into
>>>> the TableScan and then use FilterableTable for fast index lookups
>>>> avoiding reading the whole index on TableScan step and then filtering
>>>> its output on the Filter step.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>>
>>>> [1]
>>>>
>>>
>> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none
>>>> [2]
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>>>> [3]  https://issues.apache.org/jira/browse/PHOENIX-2047
>>>> [4]  https://issues.apache.org/jira/browse/DRILL-6381
>>>> [5]  https://issues.apache.org/jira/browse/DRILL-3929
>>>> [6]  https://github.com/apache/ignite/pull/7115
>>>
>>>
>>>
>>>
>>
>

Re: Adding support for Ignite secondary indexes to Apache Calcite planner

Reply via email to