Hi Niklas,
It would be interesting to know which planner causes the long runtime.
Could you use a debugger to find out more details? Is it really the
Flink Table API planner, or the underlying DataSet optimizer one level deeper?
An issue about the DataSet optimizer was recently closed [1]. Could that
fix solve your problem?
I will also loop in Fabian, who might know more.
Regards,
Timo
[1] https://issues.apache.org/jira/browse/FLINK-10566
On 07.01.19 at 14:05, Niklas Teichmann wrote:
Hi everybody,
I have a question concerning the planner for the Flink Table / Batch API.
At the moment I am trying to use a library called Cypher for Apache Flink
(CAPF, https://github.com/soerenreichardt/cypher-for-apache-flink), a
project that aims to implement the graph database query language Cypher
on Apache Flink.
The problem is that the planner seemingly takes a very long time to
plan and optimize the job created by CAPF. On a 24 GB data set, this
example job (in JSON format)
https://pastebin.com/J84grsjc
takes about 20 minutes to plan and about 5 minutes to run.
That seems very long for a job of this size.
Do you have any idea why this is the case?
Is there a way to give the planner hints to reduce the planning time?
Thanks in advance!
Niklas