Hi,

As a follow-up to the multiple discussions that happened during Flink Forward 
about how SQL should be supported by Flink, I would like to make a couple of 
proposals.
Disclaimer: I do not claim to have synthesized all the discussions, and quite 
a few things are probably still missing.

Why support SQL for Flink?

- A goal of supporting SQL for Flink should be to enable wider adoption of 
Flink, particularly by data scientists and data engineers who may not want to, 
or know how to, program against the existing APIs.

- The main implication, as I see it, is that SQL should serve as a tool that 
translates the data processing flow into a stream topology to be executed by 
Flink (see the sketch after this list).

- This would require supporting an SQL client for Flink rather soon.
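
To make the translation idea concrete, here is a minimal sketch of what such 
an entry point could look like from the user's side. Apart from 
StreamExecutionEnvironment, none of these classes or methods exist in Flink; 
the names (StreamSqlEnvironment, registerStream, executeQuery, OrderSource) 
are made up purely to illustrate the proposed translation step.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SqlTranslationSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical SQL entry point wrapping the streaming environment.
            StreamSqlEnvironment sqlEnv = StreamSqlEnvironment.create(env);

            // Register an existing stream under a table name (hypothetical call;
            // Order and OrderSource are placeholder user types).
            DataStream<Order> orders = env.addSource(new OrderSource());
            sqlEnv.registerStream("Orders", orders);

            // The query text is parsed and translated into a regular stream
            // topology, which Flink then executes like any DataStream program.
            sqlEnv.executeQuery("SELECT user, SUM(amount) FROM Orders GROUP BY user");

            env.execute("sql-translation-sketch");
        }
    }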

How many features should be supported?

- In order to enable (close to) the full benefit of Flink's processing 
capabilities, I believe most of the processing types should be supported; this 
includes all the different types of windows, aggregations, transformations, 
joins, and so on (see the query sketch after this list).

- I would propose that UDFs also be supported, so that one can easily add more 
complex computations when needed.

- In the spirit of the extensibility that Flink offers for operators, 
functions, and so on, such custom operators should be supported as 
replacements for the default implementations of the SQL logical operators.
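
As an illustration of that feature scope, the sketch below combines a windowed 
aggregation with a UDF in a single query, on top of the hypothetical sqlEnv 
from the previous sketch. Both the TUMBLE group-window syntax (borrowed from 
Calcite conventions) and the registerFunction call are assumptions, not 
existing Flink features.

    // Hypothetical UDF registration on the sketched SQL environment;
    // ExtractCountry is a placeholder user-defined scalar function.
    sqlEnv.registerFunction("extractCountry", new ExtractCountry());

    // One query exercising several of the features listed above: a scalar
    // UDF, an aggregation, and a tumbling window (Calcite-style syntax).
    sqlEnv.executeQuery(
        "SELECT extractCountry(ip) AS country, COUNT(*) AS clicks " +
        "FROM Clicks " +
        "GROUP BY extractCountry(ip), TUMBLE(rowtime, INTERVAL '1' MINUTE)");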

How much customization should be enabled?

- Customization could be provided through configuration files. Such a 
configuration can cover the policies for how triggers, evictors, parallelism, 
and so on are applied for the specific translation of the SQL query into Flink 
code (see the configuration sketch below).

- In order to support the integration of custom operators for specific SQL 
logical operators, users should also be able to provide translation RULES that 
replace the default ones. For example, a user who wants to define their own 
CUSTOM_TABLE_SCAN should be able to provide something like 
configuration.replaceRule(DataStreamScanRule.INSTANCE, 
CUSTOM_TABLE_SCAN_Rule.INSTANCE), or, if the selection of the new translation 
rule can be handled by the cost model, simply 
configuration.addRule(CUSTOM_TABLE_SCAN_Rule.INSTANCE) (see the rule sketch 
below).
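
For the configuration idea, a minimal sketch of how such translation policies 
could be expressed follows. SqlTranslationConfig and all of the keys and 
values are hypothetical names, chosen only to illustrate the kind of policies 
that could be externalized.

    // Hypothetical configuration object holding translation policies,
    // loaded from a file shipped alongside the query.
    SqlTranslationConfig config =
        SqlTranslationConfig.fromFile("sql-translation.properties");

    // Policies steering how the generated topology is parameterized.
    config.set("window.trigger", "count:1000");   // fire every 1000 elements
    config.set("window.evictor", "time:10s");     // keep only the last 10 seconds
    config.set("operator.parallelism", "16");     // parallelism of generated operators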
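
And for the rule replacement itself, here is a rough sketch of what a custom 
translation rule could look like, assuming the translator is built on Apache 
Calcite (as Flink's Table API is). ConverterRule, LogicalTableScan, and 
Convention.NONE are real Calcite classes; DataStreamConvention and 
CustomDataStreamScan are assumptions standing in for the physical convention 
and the user's custom operator.

    import org.apache.calcite.plan.Convention;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.convert.ConverterRule;
    import org.apache.calcite.rel.logical.LogicalTableScan;

    // Sketch of a custom rule that replaces the default table-scan translation.
    public class CustomTableScanRule extends ConverterRule {
        public static final CustomTableScanRule INSTANCE = new CustomTableScanRule();

        private CustomTableScanRule() {
            // Match logical scans and convert them into the (hypothetical)
            // DataStream physical convention.
            super(LogicalTableScan.class, Convention.NONE,
                  DataStreamConvention.INSTANCE, "CustomTableScanRule");
        }

        @Override
        public RelNode convert(RelNode rel) {
            LogicalTableScan scan = (LogicalTableScan) rel;
            // Emit the user's custom scan operator instead of the default one.
            return new CustomDataStreamScan(scan.getCluster(), scan.getTable());
        }
    }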

What do you think?


Dr. Radu Tudoran
Senior Research Engineer - Big Data Expert
IT R&D Division

HUAWEI TECHNOLOGIES Duesseldorf GmbH
European Research Center
Riesstrasse 25, 80992 München

E-mail: radu.tudo...@huawei.com
Mobile: +49 15209084330
Telephone: +49 891588344173
