Hi Biao,

Thanks for your comments!

> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?

Thanks for pointing it out. I think only DMLs would be executed via SQL Driver.
I'll add the scope to the FLIP.

> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily but for K8s, things can be more
> complicated as Shengkai said.

Your input is very informative. I’m thinking about using web submission,
but it requires exposing the JobManager port, which could also be a problem
on K8s.

Another approach is to explicitly require distributed storage to ship files,
but we may need a new deployment executor for that.

What do you think of these two approaches?

> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields
> like org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> may be worthwhile to add more details about the serialization part.

I agree. That’s a missing part. But if we use ExecNodeGraph as Shengkai
mentioned, do we eliminate the need for serialization of SessionState?

Best,
Paul Lam

> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
>
> Thanks Paul for the proposal! I believe it would be very useful for flink
> users.
> After reading the FLIP, I have some questions:
> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?
> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily but for K8s, things can be more
> complicated as Shengkai said.
> I have implemented a simple POC
> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
> based on SQL client before (i.e. consider the SQL client which supports
> executing a SQL file as the SQL driver in this FLIP). One problem I have
> met is how we ship SQL files (or Job Graph) to the K8s side. Without
> such support, users have to modify the initContainer or rebuild a new K8s
> image every time to fetch the SQL file. Like the flink k8s operator, one
> workaround is to utilize the flink config (transforming the SQL file to an
> escaped string like Weihua mentioned) which will be converted to a
> ConfigMap, but K8s has a size limit on ConfigMaps (no larger than 1MB
> <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not sure
> if we have better solutions.
> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields
> like org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> may be worthwhile to add more details about the serialization part.
>
> Best,
> Biao Geng
>
> On Wed, May 31, 2023 at 11:49, Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Weihua,
>>
>> Thanks a lot for your input! Please see my comments inline.
>>
>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not strong,
>>> the SQLDriver is fine for me)
>>
>> I’ve thought about SQL Runner but picked SQL Driver for the following
>> reasons FYI:
>>
>> 1. I have a PythonDriver doing the same job for PyFlink [1]
>> 2. A Flink program's main class is sort of like a Driver in JDBC, which
>> translates SQLs into database-specific languages.
>>
>> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>>
>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
>>> a SQL file in an image for Kubernetes application mode, which may be a bit
>>> cumbersome.
>>
>> Do you mean passing the SQL string as a configuration or a program argument?
>>
>> I thought it might be convenient for testing purposes, but not recommended
>> for production, because Flink SQLs could be complicated and involve lots of
>> characters that need escaping.
>>
>> WDYT?
>>
>>> - I noticed that we don't specify the SQLDriver jar in the "run-application"
>>> command. Does that mean we need to perform automatic detection in Flink?
>>
>> Yes! It’s like running a PyFlink job with the following command:
>>
>> ```
>> ./bin/flink run \
>> --pyModule table.word_count \
>> --pyFiles examples/python/table
>> ```
>>
>> The CLI determines whether it’s a SQL job and, if so, applies the SQL Driver
>> automatically.
>>
>>
>> [1]
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>>
>> Best,
>> Paul Lam
>>
>>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>>>
>>> Thanks Paul for the proposal.
>>>
>>> +1 for this. It is valuable in improving ease of use.
>>>
>>> I have a few questions.
>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not strong,
>>> the SQLDriver is fine for me)
>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare
>>> a SQL file in an image for Kubernetes application mode, which may be a bit
>>> cumbersome.
>>> - I noticed that we don't specify the SQLDriver jar in the "run-application"
>>> command. Does that mean we need to perform automatic detection in Flink?
>>>
>>>
>>> Best,
>>> Weihua
>>>
>>>
>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>>> Hi team,
>>>>
>>>> I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
>>>> driver as the default main class for Flink SQL jobs.
>>>>
>>>> Currently, Flink SQL could be executed out of the box either via SQL
>>>> Client/Gateway or embedded in a Flink Java/Python program.
>>>>
>>>> However, each one has its drawback:
>>>>
>>>> - SQL Client/Gateway doesn’t support the application deployment mode [2]
>>>> - Flink Java/Python program requires extra work to write a non-SQL program
>>>>
>>>> Therefore, I propose adding a SQL driver to act as the default main class
>>>> for SQL jobs.
>>>> Please see the FLIP docs for details and feel free to comment. Thanks!
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>>
>>>> Best,
>>>> Paul Lam
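
For illustration only: the ConfigMap workaround discussed in this thread (flattening a SQL file into an escaped single-line config value, which then has to fit under the ConfigMap size limit) could look roughly like the sketch below. The helper names and the JSON-style escaping are assumptions made for this sketch; neither the FLIP nor the Flink Kubernetes operator is claimed to work this way.

```python
import json

# Kubernetes documents a 1 MiB cap on the data stored in a ConfigMap.
CONFIGMAP_LIMIT_BYTES = 1024 * 1024


def escape_sql(sql: str) -> str:
    """Flatten a multi-line SQL script into one escaped line (hypothetical helper)."""
    # json.dumps escapes newlines, quotes, and backslashes; drop the outer quotes.
    return json.dumps(sql)[1:-1]


def unescape_sql(escaped: str) -> str:
    """Reverse escape_sql, e.g. on the driver side (hypothetical helper)."""
    return json.loads('"' + escaped + '"')


def fits_in_configmap(escaped: str, limit: int = CONFIGMAP_LIMIT_BYTES) -> bool:
    """Compare the UTF-8 size of the escaped value with the ConfigMap limit."""
    return len(escaped.encode("utf-8")) <= limit


if __name__ == "__main__":
    script = "INSERT INTO sink\nSELECT name, 'a\"b' FROM source;"
    escaped = escape_sql(script)
    print(escaped)
    print(unescape_sql(escaped) == script, fits_in_configmap(escaped))
```

Since the ConfigMap would also carry the rest of the Flink configuration, the check is only an upper bound on how large a script could be shipped this way.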