Roman, What is the advantage of Phoenix approach then? BTW, it looks like Phoenix integration with Calcite never made it to production, did it?
вт, 10 дек. 2019 г. в 19:50, Roman Kondakov <kondako...@mail.ru.invalid>: > Hi Vladimir, > > from what I understand, Drill does not exploit collation of indexes. To > be precise it does not exploit index collation in "natural" way where, > say, we a have sorted TableScan and hence we do not create a new Sort. > Instead of it Drill always create a Sort operator, but if TableScan can > be replaced with an IndexScan, this Sort operator is removed by the > dedicated rule. > > Lets consider initial an operator tree: > > Project > Sort > TableScan > > after applying rule DbScanToIndexScanPrule this tree will be converted to: > > Project > Sort > IndexScan > > and finally, after applying DbScanSortRemovalRule we have: > > Project > IndexScan > > while for Phoenix approach we would have two equivalent subsets in our > planner: > > Project > Sort > TableScan > > and > > Project > IndexScan > > and most likely the last plan will be chosen as the best one. > > -- > Kind Regards > Roman Kondakov > > > On 10.12.2019 17:19, Vladimir Ozerov wrote: > > Hi Roman, > > > > Why do you think that Drill-style will not let you exploit collation? > > Collation should be propagated from the index scan in the same way as in > > other sorted operators, such as merge join or streaming aggregate. > Provided > > that you use converter-hack (or any alternative solution to trigger > parent > > re-analysis). > > In other words, propagation of collation from Drill-style indexes should > be > > no different from other sorted operators. > > > > Regards, > > Vladimir. > > > > вт, 10 дек. 2019 г. в 16:40, Zhenya Stanilovsky > <arzamas...@mail.ru.invalid > >> : > > > >> > >> Roman just as fast remark, Phoenix builds their approach on > >> already existing monolith HBase architecture, most cases it`s just a > stub > >> for someone who wants use secondary indexes with a base with no > >> native support of it. Don`t think it`s good idea here. > >> > >>> > >>> > >>> ------- Forwarded message ------- > >>> From: "Roman Kondakov" < kondako...@mail.ru.invalid > > >>> To: dev@ignite.apache.org > >>> Cc: > >>> Subject: Adding support for Ignite secondary indexes to Apache Calcite > >>> planner > >>> Date: Tue, 10 Dec 2019 15:55:52 +0300 > >>> > >>> Hi all! > >>> > >>> As you may know there is an activity on integration of Apache Calcite > >>> query optimizer into Ignite codebase is being carried out [1],[2]. > >>> > >>> One of a bunch of problems in this integration is the absence of > >>> out-of-the-box support for secondary indexes in Apache Calcite. After > >>> some research I came to conclusion that this problem has a couple of > >>> workarounds. Let's name them > >>> 1. Phoenix-style approach - representing secondary indexes as > >>> materialized views which are natively supported by Calcite engine [3] > >>> 2. Drill-style approach - pushing filters into the table scans and > >>> choose appropriate index for lookups when possible [4] > >>> > >>> Both these approaches have advantages and disadvantages: > >>> > >>> Phoenix style pros: > >>> - natural way of adding indexes as an alternative source of rows: index > >>> can be considered as a kind of sorted materialized view. > >>> - possibility of using index sortedness for stream aggregates, > >>> deduplication (DISTINCT operator), merge joins, etc. > >>> - ability to support other types of indexes (i.e. functional indexes). > >>> > >>> Phoenix style cons: > >>> - polluting optimizer's search space extra table scans hence increasing > >>> the planning time. > >>> > >>> Drill style pros: > >>> - easier to implement (although it's questionable). > >>> - search space is not inflated. > >>> > >>> Drill style cons: > >>> - missed opportunity to exploit sortedness. > >>> > >>> There is a good discussion about using both approaches can be found in > >> [5]. > >>> > >>> I made a small sketch [6] in order to demonstrate the applicability of > >>> the Phoenix approach to Ignite. Key design concepts are: > >>> 1. On creating indexes are registered as tables in Calcite schema. This > >>> step is needed for internal Calcite's routines. > >>> 2. On planner initialization we register these indexes as materialized > >>> views in Calcite's optimizer using VolcanoPlanner#addMaterialization > >>> method. > >>> 3. Right before the query execution Calcite selects all materialized > >>> views (indexes) which can be potentially used in query. > >>> 4. During the query optimization indexes are registered by planner as > >>> usual TableScans and hence can be chosen by optimizer if they have > lower > >>> cost. > >>> > >>> This sketch shows the ability to exploit index sortedness only. So the > >>> future work in this direction should be focused on using indexes for > >>> fast index lookups. At first glance FilterableTable and > >>> FilterTableScanRule are good points to start. We can push Filter into > >>> the TableScan and then use FilterableTable for fast index lookups > >>> avoiding reading the whole index on TableScan step and then filtering > >>> its output on the Filter step. > >>> > >>> What do you think? > >>> > >>> > >>> > >>> [1] > >>> > >> > http://apache-ignite-developers.2346864.n4.nabble.com/New-SQL-execution-engine-tt43724.html#none > >>> [2] > >>> > >> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine > >>> [3] https://issues.apache.org/jira/browse/PHOENIX-2047 > >>> [4] https://issues.apache.org/jira/browse/DRILL-6381 > >>> [5] https://issues.apache.org/jira/browse/DRILL-3929 > >>> [6] https://github.com/apache/ignite/pull/7115 > >> > >> > >> > >> > > >