Issues regarding Table-API

Elias Saalmann Tue, 15 Jan 2019 06:35:20 -0800

Hi there,

I'm working on the Gradoop project at the University of Leipzig (https://github.com/dbs-leipzig/gradoop). Currently we're using the Batch-API; now we're investigating the Table-API as an abstraction over the Batch-API. I found 2 issues I want to discuss:

1. I get an error (Error while applying rule AggregateUnionAggregateRule) at compile time when there is a DISTINCT on the result of a JOIN within a UNION, e.g.


(
  SELECT DISTINCT c
  FROM a JOIN b ON a = b
)
UNION
(
  SELECT c
  FROM c
)

Java example: https://gist.github.com/lordon/27fc5277b0d5abd58158f4ec40cda384

2. As we have large workflows, several parts of such a workflow are reused at different points within the workflow. For example: Two datasets get scanned, INTERSECTED and JOINED to another dataset. The resulting dataset is used as JOIN partner for six other datasets. Using the Table-API, the resulting operator tree looks like:


[Image: Workflow]

As you can see, the whole part of INTERSECTING and JOINING is executed for each reference. I guess this is because you decided to treat Flink Tables as VIEWs which get recalculated on each reference. In fact this doesn't make sense for our large workflows (note we're using the BatchEnvironment only). Is there any chance to avoid that behavior? Is there a possibility to allow Calcite to optimize/combine such common subtrees in the operator tree?
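
To make the concern concrete, here is a toy model in plain Java. It is not Flink or Calcite code, and the names SharedSubplanSketch and countEvaluations are made up for illustration. It only counts how often a shared, expensive sub-plan would be evaluated under view-like semantics (recomputed per reference) compared to a result that is computed once and reused:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only: models "view" vs. "materialized" reuse
// of a shared sub-plan. No Flink classes are involved.
class SharedSubplanSketch {

    // Returns how often the shared sub-plan is evaluated when it is
    // referenced `references` times by downstream operators.
    static int countEvaluations(boolean materialize, int references) {
        AtomicInteger evaluations = new AtomicInteger();

        // Stand-in for the expensive "scan + INTERSECT + JOIN" part.
        Supplier<List<Integer>> view = () -> {
            evaluations.incrementAndGet();
            List<Integer> result = new ArrayList<>();
            result.add(1);
            result.add(2);
            result.add(3);
            return result;
        };

        Supplier<List<Integer>> shared = view;
        if (materialize) {
            // Evaluate once up front and hand out the cached result.
            List<Integer> cached = view.get();
            shared = () -> cached;
        }

        // Each downstream JOIN references the shared result once.
        for (int i = 0; i < references; i++) {
            shared.get().size();
        }
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("view semantics: " + countEvaluations(false, 6));
        System.out.println("materialized:   " + countEvaluations(true, 6));
    }
}
```

With six references, the view-like variant evaluates the shared part six times, while the materialized variant evaluates it once. The second behavior is what we would like to get for common subtrees.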


Thanks in advance!

Best,
Elias
