Hi all,

With the continuous efforts of the community, Flink has been steadily improved and has attracted more and more users. Flink SQL is a canonical, widely used relational query language. However, there are still some scenarios where Flink SQL fails to meet user needs in terms of functionality and ease of use, such as:
- In terms of functionality
  Iteration, user-defined window, user-defined join, user-defined GroupReduce, etc. Users cannot express these with SQL.
- In terms of ease of use
  - Map - e.g. “dataStream.map(mapFun)”. Although “table.select(udf1(), udf2(), udf3()....)” can be used to accomplish the same thing, with a map() function returning 100 columns one has to define or call 100 UDFs when using SQL, which is quite involved.
  - FlatMap - e.g. “dataStream.flatMap(flatMapFun)”. Similarly, it can be implemented with “table.join(udtf).select()”. However, DataStream is clearly easier to use than SQL here.

For these two reasons, some users have to use the DataStream API or the DataSet API. But when they do that, they lose the unification of batch and streaming. They also lose the sophisticated optimizations from Flink SQL, such as codegen, aggregate join transpose, and multi-stage aggregation.

We believe that enhancing functionality and productivity is vital for the successful adoption of the Table API. To this end, the Table API still requires more effort from every contributor in the community. We see a great opportunity to improve our users’ experience through this work. Any feedback is welcome.

Regards,
Jincheng