Vladimir, the main advantage of the Phoenix approach I can see is the using of Calcite's native materializations API. Calcite has advanced support for materializations [1] and lattices [2]. Since secondary indexes can be considered as materialized views (it's just a sorted representation of the same table) we can seamlessly use views to simulate indexes behavior for Calcite planner.
[1] https://calcite.apache.org/docs/materialized_views.html [2] https://calcite.apache.org/docs/lattice.html -- Kind Regards Roman Kondakov On 11.12.2019 17:11, Vladimir Ozerov wrote: > Roman, > > What is the advantage of Phoenix approach then? BTW, it looks like Phoenix > integration with Calcite never made it to production, did it? > > вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <kondako...@mail.ru.invalid>: > >> Hi Vladimir, >> >> from what I understand, Drill does not exploit collation of indexes. To >> be precise it does not exploit index collation in "natural" way where, >> say, we a have sorted TableScan and hence we do not create a new Sort. >> Instead of it Drill always create a Sort operator, but if TableScan can >> be replaced with an IndexScan, this Sort operator is removed by the >> dedicated rule. >> >> Lets consider initial an operator tree: >> >> Project >> Sort >> TableScan >> >> after applying rule DbScanToIndexScanPrule this tree will be converted to: >> >> Project >> Sort >> IndexScan >> >> and finally, after applying DbScanSortRemovalRule we have: >> >> Project >> IndexScan >> >> while for Phoenix approach we would have two equivalent subsets in our >> planner: >> >> Project >> Sort >> TableScan >> >> and >> >> Project >> IndexScan >> >> and most likely the last plan will be chosen as the best one. >> >> -- >> Kind Regards >> Roman Kondakov >> >> >> On 10.12.2019 17:19, Vladimir Ozerov wrote: >>> Hi Roman, >>> >>> Why do you think that Drill-style will not let you exploit collation? >>> Collation should be propagated from the index scan in the same way as in >>> other sorted operators, such as merge join or streaming aggregate. >> Provided >>> that you use converter-hack (or any alternative solution to trigger >> parent >>> re-analysis). >>> In other words, propagation of collation from Drill-style indexes should >> be >>> no different from other sorted operators. >>> >>> Regards, >>> Vladimir. >>> >>> вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky >> <arzamas...@mail.ru.invalid >>>> : >>> >>>> >>>> Roman just as fast remark, Phoenix builds their approach on >>>> already existing monolith HBase architecture, most cases it`s just a >> stub >>>> for someone who wants use secondary indexes with a base with no >>>> native support of it. Don`t think it`s good idea here. >>>> >>>>> >>>>> >>>>> ------- Forwarded message ------- >>>>> From: "Roman Kondakov" < kondako...@mail.ru.invalid > >>>>> To: dev@ignite.apache.org >>>>> Cc: >>>>> Subject: Adding support for Ignite secondary indexes to Apache Calcite >>>>> planner >>>>> Date: Tue, 10 Dec 2019 15:55:52 +0300 >>>>> >>>>> Hi all! >>>>> >>>>> As you may know there is an activity on integration of Apache Calcite >>>>> query optimizer into Ignite codebase is being carried out [1],[2]. >>>>> >>>>> One of a bunch of problems in this integration is the absence of >>>>> out-of-the-box support for secondary indexes in Apache Calcite. After >>>>> some research I came to conclusion that this problem has a couple of >>>>> workarounds. Let's name them >>>>> 1. Phoenix-style approach - representing secondary indexes as >>>>> materialized views which are natively supported by Calcite engine [3] >>>>> 2. Drill-style approach - pushing filters into the table scans and >>>>> choose appropriate index for lookups when possible [4] >>>>> >>>>> Both these approaches have advantages and disadvantages: >>>>> >>>>> Phoenix style pros: >>>>> - natural way of adding indexes as an alternative source of rows: index >>>>> can be considered as a kind of sorted materialized view. >>>>> - possibility of using index sortedness for stream aggregates, >>>>> deduplication (DISTINCT operator), merge joins, etc. >>>>> - ability to support other types of indexes (i.e. functional indexes). >>>>> >>>>> Phoenix style cons: >>>>> - polluting optimizer's search space extra table scans hence increasing >>>>> the planning time. >>>>> >>>>> Drill style pros: >>>>> - easier to implement (although it's questionable). >>>>> - search space is not inflated. >>>>> >>>>> Drill style cons: >>>>> - missed opportunity to exploit sortedness. >>>>> >>>>> There is a good discussion about using both approaches can be found in >>>> [5]. >>>>> >>>>> I made a small sketch [6] in order to demonstrate the applicability of >>>>> the Phoenix approach to Ignite. Key design concepts are: >>>>> 1. On creating indexes are registered as tables in Calcite schema. This >>>>> step is needed for internal Calcite's routines. >>>>> 2. On planner initialization we register these indexes as materialized >>>>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization >>>>> method. >>>>> 3. Right before the query execution Calcite selects all materialized >>>>> views (indexes) which can be potentially used in query. >>>>> 4. During the query optimization indexes are registered by planner as >>>>> usual TableScans and hence can be chosen by optimizer if they have >> lower >>>>> cost. >>>>> >>>>> This sketch shows the ability to exploit index sortedness only. So the >>>>> future work in this direction should be focused on using indexes for >>>>> fast index lookups. At first glance FilterableTable and >>>>> FilterTableScanRule are good points to start. We can push Filter into >>>>> the TableScan and then use FilterableTable for fast index lookups >>>>> avoiding reading the whole index on TableScan step and then filtering >>>>> its output on the Filter step. >>>>> >>>>> What do you think? >>>>> >>>>> >>>>> >>>>> [1] >>>>> >>>> >> http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none >>>>> [2] >>>>> >>>> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine >>>>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047 >>>>> [4] https://issues.apache.org/jira/browse/DRILL-6381 >>>>> [5] https://issues.apache.org/jira/browse/DRILL-3929 >>>>> [6] https://github.com/apache/ignite/pull/7115 >>>> >>>> >>>> >>>> >>> >> >