Hi Elias,

For your question 2: this is doable, i think it will be resolved in future
version of Flink.

Best,
Kurt


On Tue, Jan 15, 2019 at 10:35 PM Elias Saalmann <
es45g...@studserv.uni-leipzig.de> wrote:

> Hi there,
>
> I'm working on the Gradoop project at the University of Leipzig (
> https://github.com/dbs-leipzig/gradoop). Currently we're using the
> Batch-API - now we're investigating Table-API as an abstraction for
> Batch-API. I found 2 issues I want to discuss:
>
> 1. I get an error (Error while applying rule AggregateUnionAggregateRule)
> on compile time when having a DISTINCT on a result of a JOIN within an
> UNION, e.g.
>
> (
>   SELECT DISTINCT c
>   FROM a JOIN b ON a = b
> )
> UNION
> (
>   SELECT c
>   FROM c
> )
>
> Java example:
> https://gist.github.com/lordon/27fc5277b0d5abd58158f4ec40cda384
>
> 2. As we have large workflows, several parts of such a workflow are reused
> at differents point within the workflow. For example: Two datasets get
> scanned, INTERSECTED and JOINED to another dataset. The resulting dataset
> is used as JOIN partner for six other datasets. Using Table-API the
> resulting operator tree looks like:
> [image: Workflow]
>
> As you can see, the whole part of INTERSECTING and JOINING is executed for
> each reference. I guess this is because you decided to treat Flink Tables
> as VIEWs which get recalculated on each reference. In fact this doesn't
> make sense for our large workflows (note we're using the BatchEnvironment
> only). Is there any chance to avoid that behavior? Is there a possibility
> to allow Calcite to optimize/combine such common sub trees in the operator
> tree?
>
> Thanks in advance!
>
> Best,
> Elias
>

Reply via email to