Hi Ferenc,

Sorry for my late reply.

> Is any active work happening on this FLIP? As far as I see there
> are blockers that need to be resolved first regarding artifact
> distribution.

You’re right. There’s a blocker in K8s application mode, but none in YARN
application mode. I’m doing a POC on YARN application mode before starting
a vote thread.

I’ve been busy lately, but the FLIP is active for sure. The situation will
change in a couple of weeks.

Thank you for reaching out! I’ll let you know when the POC is completed.

Best,
Paul Lam

> On Nov 21, 2023, at 06:01, Ferenc Csaky <ferenc.cs...@pm.me.INVALID> wrote:
>
> Hello devs,
>
> Is any active work happening on this FLIP? As far as I see there
> are blockers that need to be resolved first regarding artifact
> distribution.
>
> Is this work halted completely, or are some efforts going into
> resolving the blockers first?
>
> Our platform would benefit from this feature a lot. We have a kind of
> working custom implementation at the moment, but it is uniquely
> adapted to our app and platform.
>
> I could help out to move this forward.
>
> Best,
> Ferenc
>
>
> On Friday, June 30th, 2023 at 04:53, Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Jing,
>>
>> Thanks for your input!
>>
>>> Would you like to add
>>> one section to describe (better with a script/code example) how to use it in
>>> these two scenarios from users' perspective?
>>
>> OK. I’ll update the FLIP with the code snippet after I get the POC branch
>> done.
>>
>>> NIT: the pictures have a transparent background when readers click on them. It
>>> would be great if you could replace them with pictures with a white background.
>>
>> Fixed. Thanks for pointing that out :)
>>
>> Best,
>> Paul Lam
>>
>>> On Jun 27, 2023, at 06:51, Jing Ge <j...@ververica.com.INVALID> wrote:
>>>
>>> Hi Paul,
>>>
>>> Thanks for driving it and thank you all for the informative discussion! The
>>> FLIP is in good shape now.
>>> As described in the FLIP, SQL Driver will be mainly used to run Flink SQLs
>>> in two scenarios: 1. SQL client/gateway in application mode and 2. external
>>> system integration. Would you like to add one section to describe (better
>>> with a script/code example) how to use it in these two scenarios from
>>> users' perspective?
>>>
>>> NIT: the pictures have a transparent background when readers click on them.
>>> It would be great if you could replace them with pictures with a white
>>> background.
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Mon, Jun 26, 2023 at 1:31 PM Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>>> Hi Shengkai,
>>>>
>>>>> * How can we ship the json plan to the JobManager?
>>>>
>>>> The Flink K8s module should be responsible for file distribution. We could
>>>> introduce an option like `kubernetes.storage.dir`. For each Flink cluster,
>>>> there would be a dedicated subdirectory, with a pattern like
>>>> `${kubernetes.storage.dir}/${cluster-id}`.
>>>>
>>>> All resources-related options (e.g. pipeline jars, json plans) that are
>>>> configured with the scheme `file://` would be uploaded to the resource
>>>> directory and downloaded to the jobmanager, before SQL Driver accesses
>>>> the files with the original filenames.
>>>>
>>>>> * Classloading strategy
>>>>
>>>> We could directly specify the SQL Gateway jar as the jar file in
>>>> PackagedProgram. It would be treated like a normal user jar, and the
>>>> SQL Driver is loaded into the user classloader. WDYT?
>>>>
>>>>> * Option `$internal.sql-gateway.driver.sql-config` is string type
>>>>> I think it's better to use Map type here
>>>>
>>>> By Map-type configuration, do you mean a nested map that contains all
>>>> configurations?
>>>>
>>>> I hope I've explained myself well: it’s a file that contains the extra SQL
>>>> configurations, which would be shipped to the jobmanager.
>>>>
>>>>> * PoC branch
>>>>
>>>> Sure. I’ll let you know once I get the job done.
>>>>
>>>> Best,
>>>> Paul Lam
>>>>
>>>>> On Jun 26, 2023, at 14:27, Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>
>>>>> Hi, Paul.
>>>>>
>>>>> Thanks for your update. I have a few questions about the new design:
>>>>>
>>>>> * How can we ship the json plan to the JobManager?
>>>>>
>>>>> The current design only exposes an option about the URL of the json
>>>>> plan. It seems the gateway is responsible for uploading it to an external
>>>>> storage. Can we reuse PipelineOptions.JARS to ship it to the remote
>>>>> filesystem?
>>>>>
>>>>> * Classloading strategy
>>>>>
>>>>> Currently, the Driver is in the sql-gateway package. It means the Driver
>>>>> is not in the JM's classpath directly, because the sql-gateway jar is now
>>>>> in the opt directory rather than the lib directory. It may need to add
>>>>> the external dependencies as Python does [1]. BTW, I think it's better
>>>>> to move the Driver into the flink-table-runtime package, which is much
>>>>> easier to find (sorry for the wrong opinion before).
>>>>>
>>>>> * Option `$internal.sql-gateway.driver.sql-config` is string type
>>>>>
>>>>> I think it's better to use Map type here
>>>>>
>>>>> * PoC branch
>>>>>
>>>>> Because this FLIP involves many modules, do you have a PoC branch to
>>>>> verify it does work?
>>>>>
>>>>> Best,
>>>>> Shengkai
>>>>>
>>>>> [1] https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L940
>>>>>
>>>>> On Mon, Jun 19, 2023 at 14:09, Paul Lam <paullin3...@gmail.com> wrote:
>>>>> Hi Shengkai,
>>>>>
>>>>> Sorry for my late reply. It took me some time to update the FLIP.
>>>>>
>>>>> In the latest FLIP design, SQL Driver is placed in the flink-sql-gateway
>>>>> module. PTAL.
>>>>>
>>>>> The FLIP does not cover details about the K8s file distribution, but its
>>>>> general usage would be very much the same as in YARN setups. We could
>>>>> make follow-up discussions in the jira tickets.
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>>> On Jun 12, 2023, at 15:29, Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>
>>>>>>> If it’s the case, I’m good with introducing a new module and making
>>>>>>> SQL Driver an internal class that accepts JSON plans only.
>>>>>>
>>>>>> I rethought this again and again. I think it's better to move the
>>>>>> SqlDriver into the sql-gateway module because the sql client relies on
>>>>>> the sql-gateway to submit the sql, and the sql-gateway has the ability
>>>>>> to generate the ExecNodeGraph now. +1 to support accepting JSON plans only.
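The `$internal.sql-gateway.driver.sql-config` exchange above (a string option pointing at a shipped file vs. a Map-typed option) can be made concrete with a small sketch of the file-based variant. The class and method names below are illustrative assumptions, not Flink code: the gateway serializes the extra SQL configuration to `java.util.Properties` text that would be shipped to the jobmanager, and the driver parses it back; file I/O is elided so only the round trip is shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Illustrative sketch (not Flink code): the extra SQL configuration is
 * rendered to properties text on the gateway side, shipped to the
 * jobmanager as a file, and parsed back by the driver.
 */
public class SqlConfigSketch {

    /** Renders the configuration to the text that would be shipped. */
    static String render(Map<String, String> sqlConfig) {
        Properties props = new Properties();
        props.putAll(sqlConfig);
        StringWriter out = new StringWriter();
        try {
            props.store(out, "extra SQL configuration for the SQL Driver");
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for StringWriter
        }
        return out.toString();
    }

    /** Parses the shipped text back into a flat key/value map. */
    static Map<String, String> parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for StringReader
        }
        Map<String, String> config = new HashMap<>();
        props.forEach((k, v) -> config.put(k.toString(), v.toString()));
        return config;
    }
}
```

A Map-typed option would skip the file round trip entirely, at the cost of pushing a potentially large nested structure through the Flink configuration.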
>>>>>>
>>>>>> * Upload configuration through command line parameter
>>>>>>
>>>>>> ExecNodeGraph only contains the job's information, but it doesn't
>>>>>> contain the checkpoint dir, checkpoint interval, execution mode, and
>>>>>> so on. So I think we should also upload the configuration.
>>>>>>
>>>>>> * KubernetesClusterDescriptor and KubernetesApplicationClusterEntrypoint
>>>>>> are responsible for the jar upload/download
>>>>>>
>>>>>> +1 for the change.
>>>>>>
>>>>>> Could you update the FLIP about the current discussion?
>>>>>>
>>>>>> Best,
>>>>>> Shengkai
>>>>>>
>>>>>> On Mon, Jun 12, 2023 at 11:41, Yang Wang <wangyang0...@apache.org> wrote:
>>>>>> Sorry for the late reply. I am in favor of introducing such a built-in
>>>>>> resource localization mechanism based on Flink FileSystem. Then
>>>>>> FLINK-28915 [1] could be the second step, which will download the jars
>>>>>> and dependencies to the JobManager/TaskManager local directory before
>>>>>> working.
>>>>>>
>>>>>> The first step could be done in another ticket in Flink. Or some
>>>>>> external Flink jobs management system could also take care of this.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-28915
>>>>>>
>>>>>> Best,
>>>>>> Yang
>>>>>>
>>>>>> On Fri, Jun 9, 2023 at 17:39, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mason,
>>>>>>>
>>>>>>> I get your point. I'm increasingly feeling the need to introduce a
>>>>>>> built-in file distribution mechanism for the flink-kubernetes module,
>>>>>>> just like Spark does with `spark.kubernetes.file.upload.path` [1].
>>>>>>>
>>>>>>> I’m assuming the workflow is as follows:
>>>>>>>
>>>>>>> - KubernetesClusterDescriptor uploads all local resources to a remote
>>>>>>> storage via a Flink filesystem (skips if the resources are already
>>>>>>> remote).
>>>>>>> - KubernetesApplicationClusterEntrypoint downloads the resources
>>>>>>> and puts them on the classpath during startup.
>>>>>>>
>>>>>>> I wouldn't mind splitting it into another FLIP to ensure that
>>>>>>> everything is done correctly.
>>>>>>>
>>>>>>> cc'ed @Yang to gather more opinions.
>>>>>>>
>>>>>>> [1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On Jun 8, 2023, at 12:15, Mason Chen <mas.chen6...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> Thanks for your response!
>>>>>>>
>>>>>>> I agree that utilizing SQL Drivers in Java applications is equally
>>>>>>> important as employing them in SQL Gateway. WRT init containers, I
>>>>>>> think most users use them just as a workaround, for example, to wget
>>>>>>> a jar from the maven repo.
>>>>>>>
>>>>>>> We could implement the functionality in SQL Driver in a more graceful
>>>>>>> way, and the Flink-supported filesystem approach seems to be a
>>>>>>> good choice.
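The upload half of the two-step workflow above can be sketched roughly as follows. This is an illustrative sketch only; the class and method names are assumptions, not the actual flink-kubernetes code. Local `file://` resources are mapped to a per-cluster directory on remote storage, while resources that are already remote (`s3://`, `hdfs://`, ...) are left untouched, mirroring the "skip if already remote" rule.

```java
import java.net.URI;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the upload step: rewrite every local resource
 * URI to a path under the per-cluster remote storage directory, keeping
 * the original file name so the driver can refer to it unchanged after
 * download. Already-remote resources need no upload.
 */
public class ResourceLocalizationSketch {

    static List<String> resolveUploadTargets(
            List<String> resources, String storageDir, String clusterId) {
        List<String> resolved = new ArrayList<>();
        for (String resource : resources) {
            URI uri = URI.create(resource);
            boolean local = uri.getScheme() == null || "file".equals(uri.getScheme());
            if (local) {
                // Keep the original file name for the download side.
                String name = Paths.get(uri.getPath()).getFileName().toString();
                resolved.add(storageDir + "/" + clusterId + "/" + name);
            } else {
                resolved.add(resource); // already remote: skip the upload
            }
        }
        return resolved;
    }
}
```

The download half would be the inverse: the entrypoint lists the per-cluster directory and copies everything to a local directory on the classpath before the user code starts.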
>>>>>>>
>>>>>>> My main point is: can we solve the problem with a design agnostic of
>>>>>>> SQL and Stream API? I mentioned a use case where this ability is
>>>>>>> useful for Java or Stream API applications. Maybe this is even a
>>>>>>> non-goal for your FLIP since you are focusing on the driver entrypoint.
>>>>>>>
>>>>>>> Jark mentioned some optimizations:
>>>>>>>
>>>>>>> This allows SQLGateway to leverage some metadata caching and UDF JAR
>>>>>>> caching for better compiling performance.
>>>>>>>
>>>>>>> It would be great to see this even outside the SQLGateway (i.e. UDF
>>>>>>> JAR caching).
>>>>>>>
>>>>>>> Best,
>>>>>>> Mason
>>>>>>>
>>>>>>> On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <fskm...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi, Paul. Thanks for your update; it makes me understand the design
>>>>>>> much better.
>>>>>>>
>>>>>>> But I still have some questions about the FLIP.
>>>>>>>
>>>>>>> For SQL Gateway, only DMLs need to be delegated to the SQL Driver.
>>>>>>> I would think about the details and update the FLIP. Do you have
>>>>>>> some ideas already?
>>>>>>>
>>>>>>> If the application mode cannot support library mode, I think we should
>>>>>>> only execute INSERT INTO and UPDATE/DELETE statements in application
>>>>>>> mode. AFAIK, we cannot support ANALYZE TABLE and CALL PROCEDURE
>>>>>>> statements. The ANALYZE TABLE syntax needs to register the statistics
>>>>>>> with the catalog after the job finishes, and the CALL PROCEDURE
>>>>>>> statement doesn't generate the ExecNodeGraph.
>>>>>>>
>>>>>>> * Introduce storage via option `sql-gateway.application.storage-dir`
>>>>>>>
>>>>>>> If we cannot support submitting the jars through web submission, +1 to
>>>>>>> introduce the options to upload the files. However, I think the
>>>>>>> uploader should be responsible for removing the uploaded jars. Can we
>>>>>>> remove the jars while the job is running or when the gateway exits?
>>>>>>>
>>>>>>> * JobID is not available
>>>>>>>
>>>>>>> Can we use the rest client returned by ApplicationDeployer to query
>>>>>>> the job id? I am concerned that users won't know which job is related
>>>>>>> to the submitted SQL.
>>>>>>>
>>>>>>> * Do we need to introduce a new module named flink-table-sql-runner?
>>>>>>>
>>>>>>> It seems we need to introduce a new module. Will the new module be
>>>>>>> available in the distribution package? I agree with Jark that we
>>>>>>> don't need to introduce this for Table API users, as these users have
>>>>>>> their own main class. If we want to make writing jobs for the k8s
>>>>>>> operator easier, I think we should modify the k8s operator repo. If
>>>>>>> we don't need to support SQL files, can we make this jar only visible
>>>>>>> in the sql-gateway, like we do in the planner loader? [1]
>>>>>>>
>>>>>>> [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
>>>>>>>
>>>>>>> Best,
>>>>>>> Shengkai
>>>>>>>
>>>>>>> On Wed, Jun 7, 2023 at 10:52, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for updating the FLIP.
>>>>>>>
>>>>>>> I have two cents on the distribution of SQLs and resources.
>>>>>>>
>>>>>>> 1. Should we support a common file distribution mechanism for k8s
>>>>>>> application mode? I have seen some issues and requirements on the
>>>>>>> mailing list. In our production environment, we implement the download
>>>>>>> command in the CliFrontend and automatically add an init container to
>>>>>>> the POD for file downloading. The advantage of this is that we can use
>>>>>>> all Flink-supported file systems to store files.
>>>>>>>
>>>>>>> This needs more discussion. I would appreciate hearing more opinions.
>>>>>>>
>>>>>>> 2. In this FLIP, we distribute files in two different ways on YARN and
>>>>>>> Kubernetes. Can we combine them into one way? If we don't want to
>>>>>>> implement a common file distribution for k8s application mode, could
>>>>>>> we use the SQLDriver to download the files both on YARN and K8s? IMO,
>>>>>>> this can reduce the cost of code maintenance.
>>>>>>>
>>>>>>> Best,
>>>>>>> Weihua
>>>>>>>
>>>>>>> On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Mason,
>>>>>>>
>>>>>>> Thanks for your input!
>>>>>>>
>>>>>>> +1 for init containers or a more generalized way of obtaining
>>>>>>> arbitrary files. File fetching isn't specific to just SQL--it also
>>>>>>> matters for Java applications if the user doesn't want to rebuild a
>>>>>>> Flink image and just wants to modify the user application fat jar.
>>>>>>>
>>>>>>> I agree that utilizing SQL Drivers in Java applications is equally
>>>>>>> important as employing them in SQL Gateway. WRT init containers, I
>>>>>>> think most users use them just as a workaround, for example, to wget
>>>>>>> a jar from the maven repo.
>>>>>>>
>>>>>>> We could implement the functionality in SQL Driver in a more graceful
>>>>>>> way, and the Flink-supported filesystem approach seems to be a
>>>>>>> good choice.
>>>>>>>
>>>>>>> Also, what do you think about prefixing the config options with
>>>>>>> `sql-driver` instead of just `sql` to be more specific?
>>>>>>>
>>>>>>> LGTM, since SQL Driver is a public interface and the options are
>>>>>>> specific to it.
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On Jun 6, 2023, at 06:30, Mason Chen <mas.chen6...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> +1 for this feature and supporting SQL file + JSON plans. We get a
>>>>>>> lot of requests to just be able to submit a SQL file, but the JSON
>>>>>>> plan optimizations make sense.
>>>>>>>
>>>>>>> +1 for init containers or a more generalized way of obtaining
>>>>>>> arbitrary files. File fetching isn't specific to just SQL--it also
>>>>>>> matters for Java applications if the user doesn't want to rebuild a
>>>>>>> Flink image and just wants to modify the user application fat jar.
>>>>>>>
>>>>>>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>>>>> which should be required to run Flink in production, so I guess that
>>>>>>> would be acceptable for most users. WDYT?
>>>>>>>
>>>>>>> If you do go this route, it would be nice to support writing these
>>>>>>> files to S3/HDFS via Flink.
>>>>>>> This makes access control and policy management simpler.
>>>>>>>
>>>>>>> Also, what do you think about prefixing the config options with
>>>>>>> `sql-driver` instead of just `sql` to be more specific?
>>>>>>>
>>>>>>> Best,
>>>>>>> Mason
>>>>>>>
>>>>>>> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Jark,
>>>>>>>
>>>>>>> Thanks for your input! Please see my comments inline.
>>>>>>>
>>>>>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>>>> DataStream API also doesn't provide a default main class for users,
>>>>>>> why do we need to provide such one for SQL?
>>>>>>>
>>>>>>> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
>>>>>>> submitted via Flink CLI, which actually could be DataStream/Table jobs.
>>>>>>>
>>>>>>> I think a default main class would be user-friendly, as it eliminates
>>>>>>> the need for users to write a main class such as SqlRunner in the
>>>>>>> Flink K8s operator [1].
>>>>>>>
>>>>>>> I thought the proposed SqlDriver was a dedicated main class accepting
>>>>>>> SQL files, is that correct?
>>>>>>>
>>>>>>> Both JSON plans and SQL files are accepted. SQL Gateway should use
>>>>>>> JSON plans, while CLI users may use either JSON plans or SQL files.
>>>>>>>
>>>>>>> Please see the updated FLIP [2] for more details.
>>>>>>>
>>>>>>> Personally, I prefer the way of init containers, which doesn't depend
>>>>>>> on additional components.
>>>>>>> This can reduce the moving parts of a production environment.
>>>>>>> Depending on a distributed file system makes the testing, demo, and
>>>>>>> local setup harder than init containers.
>>>>>>>
>>>>>>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>>>>> which should be required to run Flink in production, so I guess that
>>>>>>> would be acceptable for most users. WDYT?
>>>>>>>
>>>>>>> WRT testing, demo, and local setups, I think we could support the
>>>>>>> local filesystem scheme, i.e. `file://`, as the state backends do [3].
>>>>>>> It works as long as SQL Gateway and the JobManager (or SQL Driver)
>>>>>>> can access the resource directory (specified via
>>>>>>> `sql-gateway.application.storage-dir`).
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> [1] https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>>>>>>> [3] https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On Jun 3, 2023, at 12:21, Jark Wu <imj...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Paul,
>>>>>>>
>>>>>>> Thanks for your reply. I left my comments inline.
>>>>>>>
>>>>>>> As the FLIP said, it’s good to have a default main class for Flink
>>>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>>>> DataStream jobs, or else users need to write their own main class.
>>>>>>>
>>>>>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>>>> DataStream API also doesn't provide a default main class for users,
>>>>>>> why do we need to provide such one for SQL?
>>>>>>>
>>>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>>>>> formats:
>>>>>>>
>>>>>>> No, ExecNodeGraph doesn't need to serialize SessionState.
I thought
>>>>>>> the proposed SqlDriver was a dedicated main class accepting SQL
>>>>>>> files, is that correct? If true, we have to ship the SessionState
>>>>>>> for this case, which is a large work. I think we just need a
>>>>>>> JsonPlanDriver, which is a main class that accepts a JsonPlan as
>>>>>>> the parameter.
>>>>>>>
>>>>>>> The common solutions I know are to use distributed file systems or
>>>>>>> to use init containers to localize the resources.
>>>>>>>
>>>>>>> Personally, I prefer the way of init containers, which doesn't depend
>>>>>>> on additional components.
>>>>>>> This can reduce the moving parts of a production environment.
>>>>>>> Depending on a distributed file system makes the testing, demo, and
>>>>>>> local setup harder than init containers.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> The FLIP is in the early phase and some details are not included, but
>>>>>>> fortunately, we got lots of valuable ideas from the discussion.
>>>>>>> Thanks to everyone who joined the discussion!
>>>>>>> @Weihua @Shanmon @Shengkai @Biao @Jark
>>>>>>>
>>>>>>> This weekend I’m gonna revisit and update the FLIP, adding more
>>>>>>> details. Hopefully, we can further align our opinions.
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On Jun 2, 2023, at 18:02, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Jark,
>>>>>>>
>>>>>>> Thanks a lot for your input!
>>>>>>>
>>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it still
>>>>>>> necessary to support SQL Driver?
>>>>>>>
>>>>>>> I think so. Apart from its usage in SQL Gateway, SQL Driver could
>>>>>>> simplify Flink SQL execution with Flink CLI.
>>>>>>>
>>>>>>> As the FLIP said, it’s good to have a default main class for Flink
>>>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>>>> DataStream jobs, or else users need to write their own main class.
>>>>>>>
>>>>>>> SQL Driver needs to serialize SessionState, which is very challenging
>>>>>>> but not covered in detail in the FLIP.
>>>>>>>
>>>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>>>>> formats:
>>>>>>>
>>>>>>> - SQL files for user-facing public usage
>>>>>>> - ExecNodeGraph for internal usage
>>>>>>>
>>>>>>> It’s kind of similar to the relationship between job jars and
>>>>>>> jobgraphs.
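The two accepted formats above suggest a driver entrypoint that dispatches on its input. A minimal sketch follows; the file-extension convention and all names here are illustrative assumptions, not the FLIP's actual design:

```java
/**
 * Illustrative sketch of how a SQL Driver main class might dispatch
 * between the two serialized formats: SQL scripts for public usage,
 * and serialized ExecNodeGraphs (JSON plans) for internal usage.
 */
public class SqlDriverDispatchSketch {

    enum InputKind {
        /** A plain SQL script, the user-facing public format. */
        SQL_SCRIPT,
        /** A serialized ExecNodeGraph (JSON plan), the internal format. */
        JSON_PLAN
    }

    static InputKind classify(String path) {
        if (path.endsWith(".sql")) {
            return InputKind.SQL_SCRIPT;
        }
        if (path.endsWith(".json")) {
            return InputKind.JSON_PLAN;
        }
        throw new IllegalArgumentException("Unsupported driver input: " + path);
    }
}
```

In the jar/jobgraph analogy, the SQL script plays the role of the user jar (compiled on the cluster), while the JSON plan plays the role of the jobgraph (compiled ahead of time by the gateway).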
>>>>>>> >>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that >>>>>>> >>>>>>> true? >>>>>>> >>>>>>> Is it >>>>>>> >>>>>>> possible to support it? >>>>>>> >>>>>>> Yes, K8s doesn’t distribute any files. It’s the users’ >>>>>>> >>>>>>> responsibility >>>>>>> >>>>>>> to >>>>>>> >>>>>>> make >>>>>>> >>>>>>> sure the resources are accessible in the containers. The common >>>>>>> >>>>>>> solutions >>>>>>> >>>>>>> I know is to use distributed file systems or use init containers >>>>>>> >>>>>>> to >>>>>>> >>>>>>> localize the >>>>>>> >>>>>>> resources. >>>>>>> >>>>>>> Now I lean toward introducing a fs to do the distribution job. >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> Best, >>>>>>> Paul Lam >>>>>>> >>>>>>> 2023年6月1日 20:33,Jark Wu <imj...@gmail.com <mailto:imj...@gmail.com> >>>>>>> mailto:imj...@gmail.com <mailto:imj...@gmail.com >>>>>>> mailto:imj...@gmail.com> >>>>>>> <mailto: >>>>>>> >>>>>>> imj...@gmail.com <mailto:imj...@gmail.com> mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com>> >>>>>>> >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com> <mailto: >>>>>>> imj...@gmail.com <mailto:imj...@gmail.com> mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com>>> >>>>>>> >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com> <mailto: >>>>>>> imj...@gmail.com <mailto:imj...@gmail.com> mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com>> <mailto: >>>>>>> >>>>>>> imj...@gmail.com <mailto:imj...@gmail.com> mailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com> >>>>>>> <mailto:imjark@gmail.commailto:imj...@gmail.com >>>>>>> <mailto:imj...@gmail.com mailto:imj...@gmail.com>>>>> >>>>>>> >>>>>>> 写道: >>>>>>> >>>>>>> Hi Paul, >>>>>>> >>>>>>> Thanks for starting this discussion. I like the proposal! 
This >>>>>>> >>>>>>> is >>>>>>> >>>>>>> a >>>>>>> >>>>>>> frequently requested feature! >>>>>>> >>>>>>> I agree with Shengkai that ExecNodeGraph as the submission >>>>>>> >>>>>>> object >>>>>>> >>>>>>> is a >>>>>>> >>>>>>> better idea than SQL file. To be more specific, it should be >>>>>>> >>>>>>> JsonPlanGraph >>>>>>> >>>>>>> or CompiledPlan which is the serializable representation. >>>>>>> >>>>>>> CompiledPlan >>>>>>> >>>>>>> is a >>>>>>> >>>>>>> clear separation between compiling/optimization/validation and >>>>>>> >>>>>>> execution. >>>>>>> >>>>>>> This can keep the validation and metadata accessing still on the >>>>>>> >>>>>>> SQLGateway >>>>>>> >>>>>>> side. This allows SQLGateway to leverage some metadata caching >>>>>>> >>>>>>> and >>>>>>> >>>>>>> UDF >>>>>>> >>>>>>> JAR >>>>>>> >>>>>>> caching for better compiling performance. >>>>>>> >>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it >>>>>>> >>>>>>> still >>>>>>> >>>>>>> necessary to support SQL Driver? Regarding non-interactive SQL >>>>>>> >>>>>>> jobs, >>>>>>> >>>>>>> users >>>>>>> >>>>>>> can use the Table API program for application mode. SQL Driver >>>>>>> >>>>>>> needs >>>>>>> >>>>>>> to >>>>>>> >>>>>>> serialize SessionState which is very challenging but not >>>>>>> >>>>>>> detailed >>>>>>> >>>>>>> covered >>>>>>> >>>>>>> in the FLIP. >>>>>>> >>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that >>>>>>> >>>>>>> true? >>>>>>> >>>>>>> Is it >>>>>>> >>>>>>> possible to support it? 
>>>>>>> Best,
>>>>>>> Jark
>>>>>>>
>>>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Weihua,
>>>>>>>
>>>>>>> You're right. Distributing the SQLs to the TMs is one of the
>>>>>>> challenging parts of this FLIP.
>>>>>>>
>>>>>>> Web submission is not enabled in application mode currently, as you
>>>>>>> said, but it could be changed if we have good reasons.
>>>>>>> What do you think about introducing a distributed storage for SQL
>>>>>>> Gateway? We could make use of Flink file systems [1] to distribute the
>>>>>>> SQL Gateway generated resources, which should solve the problem at its
>>>>>>> root cause. Users could specify Flink-supported file systems to ship
>>>>>>> files. It's only required when using SQL Gateway with K8s application
>>>>>>> mode.
>>>>>>>
>>>>>>> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
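[Editor's note] The layout proposed earlier in the thread, `${kubernetes.storage.dir}/${cluster-id}`, can be sketched as follows. The path logic is the thread's proposal; the function names and the sample bucket are illustrative:

```python
def resource_dir(storage_dir: str, cluster_id: str) -> str:
    # Each Flink cluster gets a dedicated subdirectory under the
    # proposed `kubernetes.storage.dir` option:
    # ${kubernetes.storage.dir}/${cluster-id}
    return storage_dir.rstrip("/") + "/" + cluster_id

def remote_path(storage_dir: str, cluster_id: str, local_file: str) -> str:
    # A file:// resource keeps its original filename after upload, so the
    # SQL Driver can refer to it by the same name after download.
    name = local_file.rsplit("/", 1)[-1]
    return resource_dir(storage_dir, cluster_id) + "/" + name

print(remote_path("s3://bucket/flink", "sql-job-1", "/opt/sql/job.sql"))
# s3://bucket/flink/sql-job-1/job.sql
```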
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On Jun 1, 2023, at 13:55, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Paul for your reply.
>>>>>>>
>>>>>>> SQLDriver looks good to me.
>>>>>>>
>>>>>>> 2. Do you mean to pass the SQL string as a configuration or a program
>>>>>>> argument?
>>>>>>>
>>>>>>> I brought this up because we were unable to pass the SQL file to Flink
>>>>>>> using Kubernetes mode. For DataStream/Python users, they need to
>>>>>>> prepare their images for the jars and dependencies. But for SQL users,
>>>>>>> they can use a common image to run different SQL queries if there are
>>>>>>> no other UDF requirements. It would be great if the SQL query and
>>>>>>> image were not bound.
>>>>>>>
>>>>>>> Using strings is a way to decouple these, but just as you mentioned,
>>>>>>> it's not easy to pass complex SQL.
>>>>>>>
>>>>>>> use web submission
>>>>>>>
>>>>>>> AFAIK, we can not use web submission in Application mode. Please
>>>>>>> correct me if I'm wrong.
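[Editor's note] The escaping concern ("not easy to pass complex SQL") can be made concrete. The statement below is just an example; the two escapings show what a multi-line query looks like once flattened into a single config value or shell argument:

```python
import json
import shlex

sql = """INSERT INTO sink
SELECT name, COUNT(*) AS cnt
FROM src
WHERE note = 'it''s complicated'
GROUP BY name"""

# Flattening the statement into a single config value forces every newline
# and quote to be escaped, which quickly becomes unreadable and error-prone.
as_json_value = json.dumps(sql)   # as it would appear in a JSON/YAML config
as_cli_arg = shlex.quote(sql)     # as it would appear as a shell argument

print(as_json_value)
print(as_cli_arg)
```

This is why strings may be convenient for testing but are a poor fit for production-sized SQL scripts.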
>>>>>>> Best,
>>>>>>> Weihua
>>>>>>>
>>>>>>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Biao,
>>>>>>>
>>>>>>> Thanks for your comments!
>>>>>>>
>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL
>>>>>>> jobs in Application mode? More specifically, if we use SQL
>>>>>>> client/gateway to execute some interactive SQLs like a SELECT query,
>>>>>>> can we ask Flink to use Application mode to execute those queries
>>>>>>> after this FLIP?
>>>>>>>
>>>>>>> Thanks for pointing it out. I think only DMLs would be executed via
>>>>>>> SQL Driver. I'll add the scope to the FLIP.
>>>>>>>
>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial
>>>>>>> as we can ship files via YARN's tool easily, but for K8s, things can
>>>>>>> be more complicated as Shengkai said.
>>>>>>>
>>>>>>> Your input is very informative.
>>>>>>> I'm thinking about using web submission, but it requires exposing the
>>>>>>> JobManager port, which could also be a problem on K8s.
>>>>>>>
>>>>>>> Another approach is to explicitly require a distributed storage to
>>>>>>> ship files, but we may need a new deployment executor for that.
>>>>>>>
>>>>>>> What do you think of these two approaches?
>>>>>>>
>>>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>>>> unserializable fields like
>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>>>> may be worthwhile to add more details about the serialization part.
>>>>>>>
>>>>>>> I agree. That's a missing part. But if we use ExecNodeGraph as
>>>>>>> Shengkai mentioned, do we eliminate the need for serialization of
>>>>>>> SessionState?
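[Editor's note] The SessionState serialization problem can be illustrated by analogy. This is a Python sketch, not Flink code: a lock stands in for the user ClassLoader, since both are live runtime resources that cannot be naively serialized:

```python
import pickle
import threading

class SessionState:
    # Analogous to Flink's SessionState: plain configuration serializes
    # fine, but a live runtime resource (here a lock; in Flink, the
    # ResourceManager's userClassLoader) does not.
    def __init__(self):
        self.config = {"catalog": "default"}
        self.user_class_loader = threading.Lock()  # unpicklable stand-in

try:
    pickle.dumps(SessionState())
except TypeError as e:
    print("cannot serialize SessionState:", e)

# Shipping a compiled, self-contained plan instead (as suggested with
# ExecNodeGraph/CompiledPlan) sidesteps the problem entirely:
plan = {"version": 1, "nodes": ["..."]}
assert pickle.loads(pickle.dumps(plan)) == plan
```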
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Paul for the proposal! I believe it would be very useful for
>>>>>>> Flink users. After reading the FLIP, I have some questions:
>>>>>>>
>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL
>>>>>>> jobs in Application mode? More specifically, if we use SQL
>>>>>>> client/gateway to execute some interactive SQLs like a SELECT query,
>>>>>>> can we ask Flink to use Application mode to execute those queries
>>>>>>> after this FLIP?
>>>>>>>
>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial
>>>>>>> as we can ship files via YARN's tool easily, but for K8s, things can
>>>>>>> be more complicated as Shengkai said.
>>>>>>> I have implemented a simple POC
>>>>>>> (https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133)
>>>>>>> based on SQL client before (i.e., consider the SQL client which
>>>>>>> supports executing a SQL file as the SQL driver in this FLIP). One
>>>>>>> problem I have met is how do we ship SQL files (or the Job Graph) to
>>>>>>> the K8s side.
>>>>>>> Without such support, users have to modify the initContainer or
>>>>>>> rebuild a new K8s image every time to fetch the SQL file. Like the
>>>>>>> Flink K8s operator, one workaround is to utilize the Flink config
>>>>>>> (transforming the SQL file to an escaped string like Weihua
>>>>>>> mentioned), which will be converted to a ConfigMap, but K8s has a size
>>>>>>> limit for ConfigMaps (no larger than 1MB, see
>>>>>>> https://kubernetes.io/docs/concepts/configuration/configmap/). Not
>>>>>>> sure if we have better solutions.
>>>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>>>> unserializable fields like
>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>>>> may be worthwhile to add more details about the serialization part.
>>>>>>>
>>>>>>> Best,
>>>>>>> Biao Geng
>>>>>>>
>>>>>>> On Wed, May 31, 2023 at 11:49, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Weihua,
>>>>>>>
>>>>>>> Thanks a lot for your input! Please see my comments inline.
>>>>>>>
>>>>>>> - Is SQLRunner the better name? We use this to run a SQL job. (Not
>>>>>>> strong, the SQLDriver is fine for me)
>>>>>>>
>>>>>>> I've thought about SQL Runner but picked SQL Driver for the following
>>>>>>> reasons FYI:
>>>>>>>
>>>>>>> 1. I have a PythonDriver doing the same job for PyFlink [1].
>>>>>>> 2. A Flink program's main class is sort of like a Driver in JDBC,
>>>>>>> which translates SQLs into database-specific languages.
>>>>>>>
>>>>>>> In general, I'm +1 for SQL Driver and +0 for SQL Runner.
>>>>>>>
>>>>>>> - Could we run SQL jobs using SQL in strings?
>>>>>>> Otherwise, we need to prepare a SQL file in an image for Kubernetes
>>>>>>> application mode, which may be a bit cumbersome.
>>>>>>>
>>>>>>> Do you mean to pass the SQL string as a configuration or a program
>>>>>>> argument?
>>>>>>>
>>>>>>> I thought it might be convenient for testing purposes, but not
>>>>>>> recommended for production, because Flink SQLs could be complicated
>>>>>>> and involve lots of characters that need to be escaped.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>> "run-application" command. Does that mean we need to perform automatic
>>>>>>> detection in Flink?
>>>>>>>
>>>>>>> Yes! It's like running a PyFlink job with the following command:
>>>>>>>
>>>>>>> ./bin/flink run \
>>>>>>>   --pyModule table.word_count \
>>>>>>>   --pyFiles examples/python/table
>>>>>>>
>>>>>>> The CLI determines if it's a SQL job, and if yes, applies the SQL
>>>>>>> Driver automatically.
>>>>>>> [1] https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>>
>>>>>>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Paul for the proposal.
>>>>>>>
>>>>>>> +1 for this. It is valuable in improving ease of use.
>>>>>>>
>>>>>>> I have a few questions.
>>>>>>> - Is SQLRunner the better name? We use this to run a SQL job. (Not
>>>>>>> strong, the SQLDriver is fine for me)
>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>>>>>> prepare a SQL file in an image for Kubernetes application mode, which
>>>>>>> may be a bit cumbersome.
>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>> "run-application" command. Does that mean we need to perform automatic
>>>>>>> detection in Flink?
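[Editor's note] The automatic detection in question can be sketched as follows. This is illustrative Python only: the PythonDriver class name comes from the linked source file, but the SQL driver class name and the `--sql-file` option are hypothetical placeholders, not real Flink flags:

```python
from typing import List, Optional

SQL_DRIVER_CLASS = "org.apache.flink.table.SqlDriver"  # hypothetical name
PYTHON_DRIVER_CLASS = "org.apache.flink.client.python.PythonDriver"

def select_main_class(args: List[str]) -> Optional[str]:
    # Mirrors how the CLI picks PythonDriver for --pyModule/--pyFiles jobs:
    # if SQL-specific options are present, default to the SQL Driver.
    if any(a in ("--pyModule", "--pyFiles") for a in args):
        return PYTHON_DRIVER_CLASS
    if any(a == "--sql-file" or a.endswith(".sql") for a in args):
        return SQL_DRIVER_CLASS
    return None  # fall back to the jar's own Main-Class

print(select_main_class(["--sql-file", "job.sql"]))
```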
>>>>>>> Best,
>>>>>>> Weihua
>>>>>>>
>>>>>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>> I'd like to start a discussion about FLIP-316 [1], which introduces a
>>>>>>> SQL driver as the default main class for Flink SQL jobs.
>>>>>>>
>>>>>>> Currently, Flink SQL could be executed out of the box either via SQL
>>>>>>> Client/Gateway or embedded in a Flink Java/Python program. However,
>>>>>>> each one has its drawback:
>>>>>>>
>>>>>>> - SQL Client/Gateway doesn't support the application deployment
>>>>>>> mode [2]
>>>>>>> - A Flink Java/Python program requires extra work to write a non-SQL
>>>>>>> program
>>>>>>>
>>>>>>> Therefore, I propose adding a SQL driver to act as the default main
>>>>>>> class for SQL jobs. Please see the FLIP docs for details and feel free
>>>>>>> to comment. Thanks!
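[Editor's note] A minimal driver along these lines might look like the sketch below. This is Python for shape only: the real driver would be Java and hand statements to a TableEnvironment, and Flink's statement splitting is far more robust than a naive split on `;`:

```python
def run_sql_file(path: str, execute) -> int:
    # Naive sketch of the driver's job: read the script, split it into
    # statements, and hand each non-empty one to the executor callback
    # (in Flink, roughly TableEnvironment.executeSql).
    with open(path) as f:
        script = f.read()
    count = 0
    for stmt in (s.strip() for s in script.split(";")):
        if stmt:
            execute(stmt)
            count += 1
    return count
```

Note the naive split breaks on semicolons inside string literals or comments, which is exactly why a production driver needs a proper SQL parser for this step.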
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>>>>>
>>>>>>> Best,
>>>>>>> Paul Lam