Hi Alessandro,

That is very fascinating.  Most of those situations won't apply to this
optimization though, since it requires the tables of the union to be
identical rather than different tables representing unions of different
ontologies, that is, this will only apply optimizations when an ontology is
used more than once in a union.  Unless I am mistaken.

--Justin

On Sun, Jan 23, 2022 at 6:49 AM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hello everyone,
> SQL queries automatically generated by semantic queries (see [1] for
> instance) are pretty generous on UNIONs, so I'd say that in this area, you
> will have many practical cases matching the patterns highlighted here.
>
> The main idea is that when you have a concept hierarchy in your ontology
> (isA relationship), where different concepts are generally mapped to
> different tables (or subparts of a single table selected by changing the
> filter condition), when you query an ancestor in that hierarchy (say:
> Person), in order to account for implicit facts, you generally UNION all
> the subtables (say, Employee, Customer, etc, which are also Person) in your
> final SQL query.
>
> Best regards,
> Alessandro
>
> [1] https://en.wikipedia.org/wiki/Ontology-based_data_integration
>
> On Sun, 23 Jan 2022 at 08:19, Yanjing Wang <zhuangzixiao...@gmail.com>
> wrote:
>
> > Thanks Albert.
> >
> >  you could probably refer to osdi paper "resin general sub query fusion"
> > > which has chapters talking about general union fusion that deals with
> the
> > > case Julian pointed out.
> > >
> >
> > The optimization seems very helpful, but it introduces new operators and
> > the rule for new operators may be more complex.
> > Hi, Community, What do you think?
> >
> > Btw, per my understanding cases of union fusion is not that many in
> > > practice.
> > >
> >
> > In particular situations such as VDT and BI tools which generates sql
> > according to a fixed pattern we will meet many such union all queries.
> >
> > Albert <zinki...@gmail.com> 于2022年1月23日周日 11:22写道:
> >
> > > On a more general fusion case, you could probably refer to osdi paper
> > > "resin general sub query fusion" which has chapters talking about
> general
> > > union fusion that deals with the case Julian pointed out.
> > >
> > > Btw, per my understanding cases of union fusion is not that many in
> > > practice. Curious to know the real cases
> > >
> > > 在 2022年1月21日星期五,Yanjing Wang <zhuangzixiao...@gmail.com> 写道:
> > >
> > > > Hi Julian, I have logged a jira CALCITE-4990
> > > > <https://issues.apache.org/jira/browse/CALCITE-4990> for this new
> > rule,
> > > > later I will submit a pr for this.
> > > >
> > > > Julian Hyde <jhyde.apa...@gmail.com> 于2022年1月21日周五 01:37写道:
> > > >
> > > > > Process logic sounds great. Please log a Jira case.
> > > > >
> > > > > There’s a lot here, so it’s probably wise to split into a simple PR
> > > that
> > > > > just does the simple case (UNION ALL, no Project) and extend later.
> > > > >
> > > > > > On Jan 20, 2022, at 7:07 AM, Yanjing Wang <
> > zhuangzixiao...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks Julian and Justin.
> > > > > >
> > > > > > What do you think the rule should be called?
> > > UnionFilterTransposeRule,
> > > > > >> perhaps?
> > > > > >>
> > > > > > SetOpFilterMergeRule?
> > > > > >
> > > > > > Maybe that problem does not occur when applied to UNION than when
> > > > applied
> > > > > >> to UNION ALL.
> > > > > >
> > > > > > Yes, This is very important.
> > > > > >
> > > > > > There would seem to be analogous rules for INTERSECT (combine the
> > > > > >> conditions using AND) and EXCEPT (combine the conditions using
> AND
> > > > NOT).
> > > > > >>
> > > > > > Excellent extensions, all the three operators process logic are:
> > > > > >
> > > > > > For UNION:
> > > > > > New Filter = left Filter *OR* right Filter.
> > > > > >
> > > > > > For INTERSECT:
> > > > > > New Filter = left Filter *AND* right Filter.
> > > > > >
> > > > > > For EXCEPT:
> > > > > > If left Filter, right Filter have no overlap, transform to left
> > child
> > > > > tree,
> > > > > > Otherwise
> > > > > > New Filter = left Filter *AND NOT *right Filter.
> > > > > >
> > > > > > For UNION ALL:
> > > > > > Add prerequisites:
> > > > > > left Filter, right Filter must have no overlap.
> > > > > >
> > > > > > For INTERSECT ALL:
> > > > > > Add prerequisites:
> > > > > > If left Filter, right Filter have no overlap, transform to empty
> > > > values.
> > > > > > Otherwise
> > > > > > the rule can't be applied.
> > > > > >
> > > > > > For EXCEPT ALL:
> > > > > > same for  EXCEPT.
> > > > > >
> > > > > >
> > > > > > work for N-way Union, not just 2-way Union.
> > > > > >>
> > > > > > Yes, I will add tests for this.
> > > > > >
> > > > > > And I think you should make it work whether or not a Project is
> > > > present.
> > > > > >
> > > > > > Ok, It seems I need construct several different operand match
> trees
> > > for
> > > > > > plan.
> > > > > >
> > > > > > Hi Julian, what do you think of the above process logic?
> > > > > >
> > > > > > Julian Hyde <jhyde.apa...@gmail.com <mailto:
> jhyde.apa...@gmail.com
> > >>
> > > > > 于2022年1月20日周四 10:18写道:
> > > > > >
> > > > > >> Justin,
> > > > > >>
> > > > > >> For planning table or index scans, I would recommend using a
> > single
> > > > > >> TableScan with a Filter that uses a Sarg, rather than using
> > multiple
> > > > > >> TableScans connected by a Union. So I think this rule will be
> > > useful.
> > > > > >>
> > > > > >> But I do agree that this proposed rule is not a “no brainer”. It
> > may
> > > > not
> > > > > >> do what people want/expect in all cases, and therefore it
> probably
> > > > > should
> > > > > >> not be enabled it by default.
> > > > > >>
> > > > > >> Julian
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>> On Jan 19, 2022, at 3:38 PM, Justin Swanhart <
> > greenl...@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> Note that this will negate the optimization that one usually is
> > > > looking
> > > > > >> for
> > > > > >>> when writing such queries:
> > > > > >>>
> > > > > >>> Select * from TAB where a = 1
> > > > > >>> UNION ALL
> > > > > >>> Select * from TAB where b = 1
> > > > > >>>
> > > > > >>> In a database with indexes (most databases) this will allow
> > indexes
> > > > to
> > > > > be
> > > > > >>> used on both the a column and the b column.
> > > > > >>> Databases with bitmap indexes or without indexes would benefit
> > from
> > > > the
> > > > > >>> rule.
> > > > > >>>
> > > > > >>> On Wed, Jan 19, 2022 at 4:32 PM Julian Hyde <
> > > jhyde.apa...@gmail.com
> > > > > >> <mailto:jhyde.apa...@gmail.com <mailto:jhyde.apa...@gmail.com
> >>>
> > > > wrote:
> > > > > >>>
> > > > > >>>> Can you log a Jira case for this?
> > > > > >>>>
> > > > > >>>> I think you should make your rule work for N-way Union, not
> just
> > > > 2-way
> > > > > >>>> Union. And I think you should make it work whether or not a
> > > Project
> > > > is
> > > > > >>>> present.
> > > > > >>>>
> > > > > >>>>> On Jan 19, 2022, at 1:25 PM, Julian Hyde <
> > jhyde.apa...@gmail.com
> > > > > <mailto:jhyde.apa...@gmail.com>>
> > > > > >> wrote:
> > > > > >>>>>
> > > > > >>>>> It sounds useful.
> > > > > >>>>>
> > > > > >>>>> What do you think the rule should be called?
> > > > > UnionFilterTransposeRule,
> > > > > >>>> perhaps?
> > > > > >>>>>
> > > > > >>>>> A challenge when writing the rule will be to ensure that all
> of
> > > the
> > > > > >>>> inputs to the Union are the same. The Volcano framework is not
> > > very
> > > > > >> good at
> > > > > >>>> that.
> > > > > >>>>>
> > > > > >>>>> You should be careful of the case that the conditions
> overlap.
> > > For
> > > > > >>>> example, the rewrite
> > > > > >>>>>
> > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30
> > > > > >>>>> UNION ALL
> > > > > >>>>> SELECT * FROM Emp WHERE deptno IN (25, 35, 45)
> > > > > >>>>>
> > > > > >>>>> to
> > > > > >>>>>
> > > > > >>>>> SELECT * FROM Emp WHERE deptno < 30 OR deptno IN (25, 35, 45)
> > > > > >>>>>
> > > > > >>>>> Is not valid, because rows with deptno = 25 will appear twice
> > in
> > > > the
> > > > > >>>> first query, once in the second. Maybe that problem does not
> > occur
> > > > > when
> > > > > >>>> applied to UNION than when applied to UNION ALL.
> > > > > >>>>>
> > > > > >>>>> There would seem to be analogous rules for INTERSECT (combine
> > the
> > > > > >>>> conditions using AND) and EXCEPT (combine the conditions using
> > AND
> > > > > NOT).
> > > > > >>>> Perhaps one rule could cover all set operations (see
> > > > > >>>> FilterSetOpTransposeRule).
> > > > > >>>>>
> > > > > >>>>> Julian
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>> On Jan 19, 2022, at 2:38 AM, Yanjing Wang <
> > > > > zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com>
> > > > > >>>> <mailto:zhuangzixiao...@gmail.com <mailto:
> > > zhuangzixiao...@gmail.com
> > > > >
> > > > > <mailto:zhuangzixiao...@gmail.com <mailto:
> zhuangzixiao...@gmail.com
> > > >>>>
> > > > > >> wrote:
> > > > > >>>>>>
> > > > > >>>>>> A simple example is converting SELECT a, b FROM t WHERE c =
> 1
> > > > UNION
> > > > > >> ALL
> > > > > >>>> SELECT a, b FROM t WHERE c = 2 to SELECT a, b FROM t WHERE c
> in
> > > (1,
> > > > 2)
> > > > > >>>>>>
> > > > > >>>>>> Yanjing Wang <zhuangzixiao...@gmail.com <mailto:
> > > > > zhuangzixiao...@gmail.com> <mailto:
> > > > > >> zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com>>
> > > > <mailto:
> > > > > >>>> zhuangzixiao...@gmail.com <mailto:zhuangzixiao...@gmail.com
> >>>
> > > > > >> 于2022年1月19日周三 18:35写道:
> > > > > >>>>>> Hi, community
> > > > > >>>>>>
> > > > > >>>>>> Here I recommend a new rule for converting UNION ALL sub
> plan
> > > to a
> > > > > >>>> single input with an OR filter, the following is its
> conversion
> > > > > diagram.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> The conversion prerequisites are
> > > > > >>>>>> 1. left filter range has no intersection with right filter
> > > range.
> > > > > >>>>>> 2. Project and Input Sub Tree must be identical.
> > > > > >>>>>>
> > > > > >>>>>> The rule will be used when Input Sub Tree is a
> > > computing-intensive
> > > > > or
> > > > > >>>> large IO operation.
> > > > > >>>>>>
> > > > > >>>>>> I don't know whether the community supports it or not, any
> > > > > suggestions
> > > > > >>>> will be appreciated.
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > ~~~~~~~~~~~~~~~
> > > no mistakes
> > > ~~~~~~~~~~~~~~~~~~
> > >
> >
>

Reply via email to