Alexey, from my point of view Drill's approach looks like somewhat a hack: sortedness and index lookups added to a removed from the query plan by the special rules (which look very messy and complicated). Compare it to the Phoneix approach where index is added to optimizer as a sorted view of a table.
-- Kind Regards Roman Kondakov On 10.12.2019 17:44, Alexey Zinoviev wrote: > I'd like Drill approach, worked and debugged with something similar, it's > more easy to support > > > Buuut, you have an implemented prototype (it votes for Phoenix in my mind) > > вт, 10 дек. 2019 г. в 17:19, Vladimir Ozerov <[email protected]>: > >> Hi Roman, >> >> Why do you think that Drill-style will not let you exploit collation? >> Collation should be propagated from the index scan in the same way as in >> other sorted operators, such as merge join or streaming aggregate. Provided >> that you use converter-hack (or any alternative solution to trigger parent >> re-analysis). >> In other words, propagation of collation from Drill-style indexes should be >> no different from other sorted operators. >> >> Regards, >> Vladimir. >> >> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky <[email protected] >>> : >> >>> >>> Roman just as fast remark, Phoenix builds their approach on >>> already existing monolith HBase architecture, most cases it`s just a stub >>> for someone who wants use secondary indexes with a base with no >>> native support of it. Don`t think it`s good idea here. >>> >>>> >>>> >>>> ------- Forwarded message ------- >>>> From: "Roman Kondakov" < [email protected] > >>>> To: [email protected] >>>> Cc: >>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite >>>> planner >>>> Date: Tue, 10 Dec 2019 15:55:52 +0300 >>>> >>>> Hi all! >>>> >>>> As you may know there is an activity on integration of Apache Calcite >>>> query optimizer into Ignite codebase is being carried out [1],[2]. >>>> >>>> One of a bunch of problems in this integration is the absence of >>>> out-of-the-box support for secondary indexes in Apache Calcite. After >>>> some research I came to conclusion that this problem has a couple of >>>> workarounds. Let's name them >>>> 1. Phoenix-style approach - representing secondary indexes as >>>> materialized views which are natively supported by Calcite engine [3] >>>> 2. Drill-style approach - pushing filters into the table scans and >>>> choose appropriate index for lookups when possible [4] >>>> >>>> Both these approaches have advantages and disadvantages: >>>> >>>> Phoenix style pros: >>>> - natural way of adding indexes as an alternative source of rows: index >>>> can be considered as a kind of sorted materialized view. >>>> - possibility of using index sortedness for stream aggregates, >>>> deduplication (DISTINCT operator), merge joins, etc. >>>> - ability to support other types of indexes (i.e. functional indexes). >>>> >>>> Phoenix style cons: >>>> - polluting optimizer's search space extra table scans hence increasing >>>> the planning time. >>>> >>>> Drill style pros: >>>> - easier to implement (although it's questionable). >>>> - search space is not inflated. >>>> >>>> Drill style cons: >>>> - missed opportunity to exploit sortedness. >>>> >>>> There is a good discussion about using both approaches can be found in >>> [5]. >>>> >>>> I made a small sketch [6] in order to demonstrate the applicability of >>>> the Phoenix approach to Ignite. Key design concepts are: >>>> 1. On creating indexes are registered as tables in Calcite schema. This >>>> step is needed for internal Calcite's routines. >>>> 2. On planner initialization we register these indexes as materialized >>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization >>>> method. >>>> 3. Right before the query execution Calcite selects all materialized >>>> views (indexes) which can be potentially used in query. >>>> 4. During the query optimization indexes are registered by planner as >>>> usual TableScans and hence can be chosen by optimizer if they have lower >>>> cost. >>>> >>>> This sketch shows the ability to exploit index sortedness only. So the >>>> future work in this direction should be focused on using indexes for >>>> fast index lookups. At first glance FilterableTable and >>>> FilterTableScanRule are good points to start. We can push Filter into >>>> the TableScan and then use FilterableTable for fast index lookups >>>> avoiding reading the whole index on TableScan step and then filtering >>>> its output on the Filter step. >>>> >>>> What do you think? >>>> >>>> >>>> >>>> [1] >>>> >>> >> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none >>>> [2] >>>> >>> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine >>>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047 >>>> [4] https://issues.apache.org/jira/browse/DRILL-6381 >>>> [5] https://issues.apache.org/jira/browse/DRILL-3929 >>>> [6] https://github.com/apache/ignite/pull/7115 >>> >>> >>> >>> >> >
