Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Paul Lam Mon, 12 Jun 2023 06:02:19 -0700

Hi Yang,

Thanks a lot for your input!


It’s great that FLINK-28915 has covered the file download part. I’ve created 
a ticket for the file upload part [1]. It's a prerequisite for supporting K8s 
application 
mode for SQL Gateway.

[1] https://issues.apache.org/jira/browse/FLINK-32315

Best,
Paul Lam

> 2023年6月12日 11:40，Yang Wang <[email protected]> 写道：
> 
> Sorry for the late reply. I am in favor of introducing such a built-in
> resource localization mechanism
> based on Flink FileSystem. Then FLINK-28915[1] could be the second step
> which will download
> the jars and dependencies to the JobManager/TaskManager local directory
> before working.
> 
> The first step could be done in another ticket in Flink. Or some external
> Flink jobs management system
> could also take care of this.
> 
> [1]. https://issues.apache.org/jira/browse/FLINK-28915
> 
> Best,
> Yang
> 
> Paul Lam <[email protected]> 于2023年6月9日周五 17:39写道：
> 
>> Hi Mason,
>> 
>> I get your point. I'm increasingly feeling the need to introduce a
>> built-in
>> file distribution mechanism for flink-kubernetes module, just like Spark
>> does with `spark.kubernetes.file.upload.path` [1].
>> 
>> I’m assuming the workflow is as follows:
>> 
>> - KubernetesClusterDescripter uploads all local resources to a remote
>>  storage via Flink filesystem (skips if the resources are already remote).
>> - KubernetesApplicationClusterEntrypoint downloads the resources
>>  and put them in the classpath during startup.
>> 
>> I wouldn't mind splitting it into another FLIP to ensure that everything is
>> done correctly.
>> 
>> cc'ed @Yang to gather more opinions.
>> 
>> [1]
>> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月8日 12:15，Mason Chen <[email protected]> 写道：
>> 
>> Hi Paul,
>> 
>> Thanks for your response!
>> 
>> I agree that utilizing SQL Drivers in Java applications is equally
>> important
>> 
>> as employing them in SQL Gateway. WRT init containers, I think most
>> users use them just as a workaround. For example, wget a jar from the
>> maven repo.
>> 
>> We could implement the functionality in SQL Driver in a more graceful
>> way and the flink-supported filesystem approach seems to be a
>> good choice.
>> 
>> 
>> My main point is: can we solve the problem with a design agnostic of SQL
>> and Stream API? I mentioned a use case where this ability is useful for
>> Java or Stream API applications. Maybe this is even a non-goal to your FLIP
>> since you are focusing on the driver entrypoint.
>> 
>> Jark mentioned some optimizations:
>> 
>> This allows SQLGateway to leverage some metadata caching and UDF JAR
>> caching for better compiling performance.
>> 
>> It would be great to see this even outside the SQLGateway (i.e. UDF JAR
>> caching).
>> 
>> Best,
>> Mason
>> 
>> On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <[email protected]> wrote:
>> 
>> Hi. Paul.  Thanks for your update and the update makes me understand the
>> design much better.
>> 
>> But I still have some questions about the FLIP.
>> 
>> For SQL Gateway, only DMLs need to be delegated to the SQL server
>> Driver. I would think about the details and update the FLIP. Do you have
>> 
>> some
>> 
>> ideas already?
>> 
>> 
>> If the applicaiton mode can not support library mode, I think we should
>> only execute INSERT INTO and UPDATE/ DELETE statement in the application
>> mode. AFAIK, we can not support ANALYZE TABLE and CALL PROCEDURE
>> statements. The ANALYZE TABLE syntax need to register the statistic to the
>> catalog after job finishes and the CALL PROCEDURE statement doesn't
>> generate the ExecNodeGraph.
>> 
>> * Introduce storage via option `sql-gateway.application.storage-dir`
>> 
>> If we can not support to submit the jars through web submission, +1 to
>> introduce the options to upload the files. While I think the uploader
>> should be responsible to remove the uploaded jars. Can we remove the jars
>> if the job is running or gateway exits?
>> 
>> * JobID is not avaliable
>> 
>> Can we use the returned rest client by ApplicationDeployer to query the job
>> id? I am concerned that users don't know which job is related to the
>> submitted SQL.
>> 
>> * Do we need to introduce a new module named flink-table-sql-runner?
>> 
>> It seems we need to introduce a new module. Will the new module is
>> available in the distribution package? I agree with Jark that we don't need
>> to introduce this for table-API users and these users have their main
>> class. If we want to make users write the k8s operator more easily, I think
>> we should modify the k8s operator repo. If we don't need to support SQL
>> files, can we make this jar only visible in the sql-gateway like we do in
>> the planner loader?[1]
>> 
>> [1]
>> 
>> 
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
>> 
>> Best,
>> Shengkai
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Weihua Hu <[email protected]> 于2023年6月7日周三 10:52写道：
>> 
>> Hi,
>> 
>> Thanks for updating the FLIP.
>> 
>> I have two cents on the distribution of SQLs and resources.
>> 1. Should we support a common file distribution mechanism for k8s
>> application mode?
>> I have seen some issues and requirements on the mailing list.
>> In our production environment, we implement the download command in the
>> CliFrontend.
>> And automatically add an init container to the POD for file
>> 
>> downloading.
>> 
>> The advantage of this
>> is that we can use all Flink-supported file systems to store files.
>> 
>> This need more discussion. I would appreciate hearing more opinions.
>> 
>> 2. In this FLIP, we distribute files in two different ways in YARN and
>> Kubernetes. Can we combine it in one way?
>> If we don't want to implement a common file distribution for k8s
>> application mode. Could we use the SQLDriver
>> to download the files both in YARN and K8S? IMO, this can reduce the
>> 
>> cost
>> 
>> of code maintenance.
>> 
>> Best,
>> Weihua
>> 
>> 
>> On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <[email protected]> wrote:
>> 
>> Hi Mason,
>> 
>> Thanks for your input!
>> 
>> +1 for init containers or a more generalized way of obtaining
>> 
>> arbitrary
>> 
>> files. File fetching isn't specific to just SQL--it also matters for
>> 
>> Java
>> 
>> applications if the user doesn't want to rebuild a Flink image and
>> 
>> just
>> 
>> wants to modify the user application fat jar.
>> 
>> 
>> I agree that utilizing SQL Drivers in Java applications is equally
>> important
>> as employing them in SQL Gateway. WRT init containers, I think most
>> users use them just as a workaround. For example, wget a jar from the
>> maven repo.
>> 
>> We could implement the functionality in SQL Driver in a more graceful
>> way and the flink-supported filesystem approach seems to be a
>> good choice.
>> 
>> Also, what do you think about prefixing the config options with
>> `sql-driver` instead of just `sql` to be more specific?
>> 
>> 
>> LGTM, since SQL Driver is a public interface and the options are
>> specific to it.
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月6日 06:30，Mason Chen <[email protected]> 写道：
>> 
>> Hi Paul,
>> 
>> +1 for this feature and supporting SQL file + JSON plans. We get a
>> 
>> lot
>> 
>> of
>> 
>> requests to just be able to submit a SQL file, but the JSON plan
>> optimizations make sense.
>> 
>> +1 for init containers or a more generalized way of obtaining
>> 
>> arbitrary
>> 
>> files. File fetching isn't specific to just SQL--it also matters for
>> 
>> Java
>> 
>> applications if the user doesn't want to rebuild a Flink image and
>> 
>> just
>> 
>> wants to modify the user application fat jar.
>> 
>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>> 
>> which
>> 
>> should
>> 
>> 
>> be required to run Flink in production, so I guess that would be
>> 
>> acceptable
>> 
>> for most
>> 
>> 
>> users. WDYT?
>> 
>> 
>> If you do go this route, it would be nice to support writing these
>> 
>> files
>> 
>> to
>> 
>> S3/HDFS via Flink. This makes access control and policy management
>> 
>> simpler.
>> 
>> 
>> Also, what do you think about prefixing the config options with
>> `sql-driver` instead of just `sql` to be more specific?
>> 
>> Best,
>> Mason
>> 
>> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <[email protected]
>> 
>> <mailto:
>> 
>> [email protected]>> wrote:
>> 
>> 
>> Hi Jark,
>> 
>> Thanks for your input! Please see my comments inline.
>> 
>> Isn't Table API the same way as DataSream jobs to submit Flink SQL?
>> DataStream API also doesn't provide a default main class for users,
>> why do we need to provide such one for SQL?
>> 
>> 
>> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
>> 
>> submitted
>> 
>> via Flink CLI which actually could be DataStream/Table jobs.
>> 
>> I think a default main class would be user-friendly which eliminates
>> 
>> the
>> 
>> need
>> for users to write a main class as SQLRunner in Flink K8s operator
>> 
>> [1].
>> 
>> 
>> I thought the proposed SqlDriver was a dedicated main class
>> 
>> accepting
>> 
>> SQL files, is
>> 
>> that correct?
>> 
>> 
>> Both JSON plans and SQL files are accepted. SQL Gateway should use
>> 
>> JSON
>> 
>> plans,
>> while CLI users may use either JSON plans or SQL files.
>> 
>> Please see the updated FLIP[2] for more details.
>> 
>> Personally, I prefer the way of init containers which doesn't
>> 
>> depend
>> 
>> on
>> 
>> additional components.
>> This can reduce the moving parts of a production environment.
>> Depending on a distributed file system makes the testing, demo, and
>> 
>> local
>> 
>> setup harder than init containers.
>> 
>> 
>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>> 
>> which
>> 
>> should
>> be required to run Flink in production, so I guess that would be
>> acceptable for most
>> users. WDYT?
>> 
>> WRT testing, demo, and local setups, I think we could support the
>> 
>> local
>> 
>> filesystem
>> scheme i.e. file://** as the state backends do. It works as long as
>> 
>> SQL
>> 
>> Gateway
>> and JobManager(or SQL Driver) can access the resource directory
>> 
>> (specified
>> 
>> via
>> `sql-gateway.application.storage-dir`).
>> 
>> Thanks!
>> 
>> [1]
>> 
>> 
>> 
>> 
>> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>> 
>> [2]
>> 
>> 
>> 
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>> 
>> [3]
>> 
>> 
>> 
>> 
>> https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>> 
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月3日 12:21，Jark Wu <[email protected]> 写道：
>> 
>> Hi Paul,
>> 
>> Thanks for your reply. I left my comments inline.
>> 
>> As the FLIP said, it’s good to have a default main class for Flink
>> 
>> SQLs,
>> 
>> which allows users to submit Flink SQLs in the same way as
>> 
>> DataStream
>> 
>> jobs, or else users need to write their own main class.
>> 
>> 
>> Isn't Table API the same way as DataSream jobs to submit Flink SQL?
>> DataStream API also doesn't provide a default main class for users,
>> why do we need to provide such one for SQL?
>> 
>> With the help of ExecNodeGraph, do we still need the serialized
>> SessionState? If not, we could make SQL Driver accepts two
>> 
>> serialized
>> 
>> formats:
>> 
>> 
>> No, ExecNodeGraph doesn't need to serialize SessionState. I thought
>> 
>> the
>> 
>> proposed SqlDriver was a dedicated main class accepting SQL files,
>> 
>> is
>> 
>> that correct?
>> If true, we have to ship the SessionState for this case which is a
>> 
>> large
>> 
>> work.
>> I think we just need a JsonPlanDriver which is a main class that
>> 
>> accepts
>> 
>> JsonPlan as the parameter.
>> 
>> 
>> The common solutions I know is to use distributed file systems or
>> 
>> use
>> 
>> init containers to localize the resources.
>> 
>> 
>> Personally, I prefer the way of init containers which doesn't
>> 
>> depend
>> 
>> on
>> 
>> additional components.
>> This can reduce the moving parts of a production environment.
>> Depending on a distributed file system makes the testing, demo, and
>> 
>> local
>> 
>> setup harder than init containers.
>> 
>> Best,
>> Jark
>> 
>> 
>> 
>> 
>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <[email protected]
>> 
>> <mailto:
>> 
>> [email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>>> wrote:
>> 
>> 
>> The FLIP is in the early phase and some details are not included,
>> 
>> but
>> 
>> fortunately, we got lots of valuable ideas from the discussion.
>> 
>> Thanks to everyone who joined the dissuasion!
>> @Weihua @Shanmon @Shengkai @Biao @Jark
>> 
>> This weekend I’m gonna revisit and update the FLIP, adding more
>> details. Hopefully, we can further align our opinions.
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月2日 18:02，Paul Lam <[email protected] <mailto:
>> 
>> [email protected]>> 写道：
>> 
>> 
>> Hi Jark,
>> 
>> Thanks a lot for your input!
>> 
>> If we decide to submit ExecNodeGraph instead of SQL file, is it
>> 
>> still
>> 
>> necessary to support SQL Driver?
>> 
>> 
>> I think so. Apart from usage in SQL Gateway, SQL Driver could
>> 
>> simplify
>> 
>> Flink SQL execution with Flink CLI.
>> 
>> As the FLIP said, it’s good to have a default main class for
>> 
>> Flink
>> 
>> SQLs,
>> 
>> which allows users to submit Flink SQLs in the same way as
>> 
>> DataStream
>> 
>> jobs, or else users need to write their own main class.
>> 
>> SQL Driver needs to serialize SessionState which is very
>> 
>> challenging
>> 
>> but not detailed covered in the FLIP.
>> 
>> 
>> With the help of ExecNodeGraph, do we still need the serialized
>> SessionState? If not, we could make SQL Driver accepts two
>> 
>> serialized
>> 
>> formats:
>> 
>> - SQL files for user-facing public usage
>> - ExecNodeGraph for internal usage
>> 
>> It’s kind of similar to the relationship between job jars and
>> 
>> jobgraphs.
>> 
>> 
>> Regarding "K8S doesn't support shipping multiple jars", is that
>> 
>> true?
>> 
>> Is it
>> 
>> possible to support it?
>> 
>> 
>> Yes, K8s doesn’t distribute any files. It’s the users’
>> 
>> responsibility
>> 
>> to
>> 
>> make
>> 
>> sure the resources are accessible in the containers. The common
>> 
>> solutions
>> 
>> I know is to use distributed file systems or use init containers
>> 
>> to
>> 
>> localize the
>> 
>> resources.
>> 
>> Now I lean toward introducing a fs to do the distribution job.
>> 
>> WDYT?
>> 
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月1日 20:33，Jark Wu <[email protected] <mailto:
>> 
>> [email protected]>
>> 
>> <mailto:[email protected] <mailto:[email protected]>>
>> 
>> <mailto:[email protected] <mailto:[email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>>>>
>> 
>> 写道：
>> 
>> 
>> Hi Paul,
>> 
>> Thanks for starting this discussion. I like the proposal! This
>> 
>> is
>> 
>> a
>> 
>> frequently requested feature!
>> 
>> I agree with Shengkai that ExecNodeGraph as the submission
>> 
>> object
>> 
>> is a
>> 
>> better idea than SQL file. To be more specific, it should be
>> 
>> JsonPlanGraph
>> 
>> or CompiledPlan which is the serializable representation.
>> 
>> CompiledPlan
>> 
>> is a
>> 
>> clear separation between compiling/optimization/validation and
>> 
>> execution.
>> 
>> This can keep the validation and metadata accessing still on the
>> 
>> SQLGateway
>> 
>> side. This allows SQLGateway to leverage some metadata caching
>> 
>> and
>> 
>> UDF
>> 
>> JAR
>> 
>> caching for better compiling performance.
>> 
>> If we decide to submit ExecNodeGraph instead of SQL file, is it
>> 
>> still
>> 
>> necessary to support SQL Driver? Regarding non-interactive SQL
>> 
>> jobs,
>> 
>> users
>> 
>> can use the Table API program for application mode. SQL Driver
>> 
>> needs
>> 
>> to
>> 
>> serialize SessionState which is very challenging but not
>> 
>> detailed
>> 
>> covered
>> 
>> in the FLIP.
>> 
>> Regarding "K8S doesn't support shipping multiple jars", is that
>> 
>> true?
>> 
>> Is it
>> 
>> possible to support it?
>> 
>> Best,
>> Jark
>> 
>> 
>> 
>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <[email protected]
>> 
>> <mailto:[email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>> <mailto:
>> 
>> [email protected] <mailto:[email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>>>> wrote:
>> 
>> 
>> Hi Weihua,
>> 
>> You’re right. Distributing the SQLs to the TMs is one of the
>> 
>> challenging
>> 
>> parts of this FLIP.
>> 
>> Web submission is not enabled in application mode currently as
>> 
>> you
>> 
>> said,
>> 
>> but it could be changed if we have good reasons.
>> 
>> What do you think about introducing a distributed storage for
>> 
>> SQL
>> 
>> Gateway?
>> 
>> 
>> We could make use of Flink file systems [1] to distribute the
>> 
>> SQL
>> 
>> Gateway
>> 
>> generated resources, that should solve the problem at its root
>> 
>> cause.
>> 
>> 
>> Users could specify Flink-supported file systems to ship files.
>> 
>> It’s
>> 
>> only
>> 
>> required when using SQL Gateway with K8s application mode.
>> 
>> [1]
>> 
>> 
>> 
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> <
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> 
>> <
>> 
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> <
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> 
>> 
>> <
>> 
>> 
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> <
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> 
>> <
>> 
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> <
>> 
>> 
>> 
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>> 
>> 
>> 
>> 
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年6月1日 13:55，Weihua Hu <[email protected] <mailto:
>> 
>> [email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>>> 写道：
>> 
>> 
>> Thanks Paul for your reply.
>> 
>> SQLDriver looks good to me.
>> 
>> 2. Do you mean a pass the SQL string a configuration or a
>> 
>> program
>> 
>> argument?
>> 
>> 
>> 
>> I brought this up because we were unable to pass the SQL file
>> 
>> to
>> 
>> Flink
>> 
>> using Kubernetes mode.
>> For DataStream/Python users, they need to prepare their images
>> 
>> for
>> 
>> the
>> 
>> jars
>> 
>> and dependencies.
>> But for SQL users, they can use a common image to run
>> 
>> different
>> 
>> SQL
>> 
>> queries
>> 
>> if there are no other udf requirements.
>> It would be great if the SQL query and image were not bound.
>> 
>> Using strings is a way to decouple these, but just as you
>> 
>> mentioned,
>> 
>> it's
>> 
>> not easy to pass complex SQL.
>> 
>> use web submission
>> 
>> AFAIK, we can not use web submission in the Application mode.
>> 
>> Please
>> 
>> correct me if I'm wrong.
>> 
>> 
>> Best,
>> Weihua
>> 
>> 
>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <
>> 
>> [email protected]
>> 
>> <mailto:[email protected]>
>> 
>> <mailto:[email protected] <mailto:[email protected]>>>
>> 
>> wrote:
>> 
>> 
>> Hi Biao,
>> 
>> Thanks for your comments!
>> 
>> 1. Scope: is this FLIP only targeted for non-interactive
>> 
>> Flink
>> 
>> SQL
>> 
>> jobs
>> 
>> in
>> 
>> Application mode? More specifically, if we use SQL
>> 
>> client/gateway
>> 
>> to
>> 
>> execute some interactive SQLs like a SELECT query, can we
>> 
>> ask
>> 
>> flink
>> 
>> to
>> 
>> use
>> 
>> Application mode to execute those queries after this FLIP?
>> 
>> 
>> Thanks for pointing it out. I think only DMLs would be
>> 
>> executed
>> 
>> via
>> 
>> SQL
>> 
>> Driver.
>> I'll add the scope to the FLIP.
>> 
>> 2. Deployment: I believe in YARN mode, the implementation is
>> 
>> trivial as
>> 
>> we
>> 
>> can ship files via YARN's tool easily but for K8s, things
>> 
>> can
>> 
>> be
>> 
>> more
>> 
>> complicated as Shengkai said.
>> 
>> 
>> 
>> Your input is very informative. I’m thinking about using web
>> 
>> submission,
>> 
>> but it requires exposing the JobManager port which could also
>> 
>> be
>> 
>> a
>> 
>> problem
>> 
>> on K8s.
>> 
>> Another approach is to explicitly require a distributed
>> 
>> storage
>> 
>> to
>> 
>> ship
>> 
>> files,
>> but we may need a new deployment executor for that.
>> 
>> What do you think of these two approaches?
>> 
>> 3. Serialization of SessionState: in SessionState, there are
>> 
>> some
>> 
>> unserializable fields
>> like
>> 
>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
>> 
>> It
>> 
>> may be worthwhile to add more details about the
>> 
>> serialization
>> 
>> part.
>> 
>> 
>> I agree. That’s a missing part. But if we use ExecNodeGraph
>> 
>> as
>> 
>> Shengkai
>> 
>> mentioned, do we eliminate the need for serialization of
>> 
>> SessionState?
>> 
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年5月31日 13:07，Biao Geng <[email protected] <mailto:
>> 
>> [email protected]> <mailto:
>> 
>> [email protected] <mailto:[email protected]>>> 写道：
>> 
>> 
>> Thanks Paul for the proposal!I believe it would be very
>> 
>> useful
>> 
>> for
>> 
>> flink
>> 
>> users.
>> After reading the FLIP, I have some questions:
>> 1. Scope: is this FLIP only targeted for non-interactive
>> 
>> Flink
>> 
>> SQL
>> 
>> jobs
>> 
>> in
>> 
>> Application mode? More specifically, if we use SQL
>> 
>> client/gateway
>> 
>> to
>> 
>> execute some interactive SQLs like a SELECT query, can we
>> 
>> ask
>> 
>> flink
>> 
>> to
>> 
>> use
>> 
>> Application mode to execute those queries after this FLIP?
>> 2. Deployment: I believe in YARN mode, the implementation is
>> 
>> trivial as
>> 
>> we
>> 
>> can ship files via YARN's tool easily but for K8s, things
>> 
>> can
>> 
>> be
>> 
>> more
>> 
>> complicated as Shengkai said. I have implemented a simple
>> 
>> POC
>> 
>> <
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
>> 
>> <
>> 
>> 
>> 
>> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
>> 
>> 
>> <
>> 
>> 
>> 
>> 
>> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
>> 
>> <
>> 
>> 
>> 
>> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
>> 
>> 
>> 
>> 
>> based on SQL client before(i.e. consider the SQL client
>> 
>> which
>> 
>> supports
>> 
>> executing a SQL file as the SQL driver in this FLIP). One
>> 
>> problem
>> 
>> I
>> 
>> have
>> 
>> met is how do we ship SQL files ( or Job Graph) to the k8s
>> 
>> side.
>> 
>> Without
>> 
>> such support, users have to modify the initContainer or
>> 
>> rebuild
>> 
>> a
>> 
>> new
>> 
>> K8s
>> 
>> image every time to fetch the SQL file. Like the flink k8s
>> 
>> operator,
>> 
>> one
>> 
>> workaround is to utilize the flink config(transforming the
>> 
>> SQL
>> 
>> file
>> 
>> to
>> 
>> a
>> 
>> escaped string like Weihua mentioned) which will be
>> 
>> converted
>> 
>> to a
>> 
>> ConfigMap but K8s has size limit of ConfigMaps(no larger
>> 
>> than
>> 
>> 1MB
>> 
>> <
>> 
>> https://kubernetes.io/docs/concepts/configuration/configmap/
>> 
>> <
>> 
>> https://kubernetes.io/docs/concepts/configuration/configmap/> <
>> 
>> https://kubernetes.io/docs/concepts/configuration/configmap/ <
>> 
>> https://kubernetes.io/docs/concepts/configuration/configmap/>>>).
>> 
>> Not
>> 
>> sure
>> 
>> if we have better solutions.
>> 3. Serialization of SessionState: in SessionState, there are
>> 
>> some
>> 
>> unserializable fields
>> like
>> 
>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
>> 
>> It
>> 
>> may be worthwhile to add more details about the
>> 
>> serialization
>> 
>> part.
>> 
>> 
>> Best,
>> Biao Geng
>> 
>> Paul Lam <[email protected] <mailto:
>> 
>> [email protected]
>> 
>> 
>> <mailto:[email protected] <mailto:[email protected]>>>
>> 
>> 于2023年5月31日周三 11:49写道：
>> 
>> 
>> Hi Weihua,
>> 
>> Thanks a lot for your input! Please see my comments inline.
>> 
>> - Is SQLRunner the better name? We use this to run a SQL
>> 
>> Job.
>> 
>> (Not
>> 
>> strong,
>> 
>> the SQLDriver is fine for me)
>> 
>> 
>> I’ve thought about SQL Runner but picked SQL Driver for the
>> 
>> following
>> 
>> reasons FYI:
>> 
>> 1. I have a PythonDriver doing the same job for PyFlink [1]
>> 2. Flink program's main class is sort of like Driver in
>> 
>> JDBC
>> 
>> which
>> 
>> translates SQLs into
>> databases specific languages.
>> 
>> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>> 
>> - Could we run SQL jobs using SQL in strings? Otherwise,
>> 
>> we
>> 
>> need
>> 
>> to
>> 
>> prepare
>> 
>> a SQL file in an image for Kubernetes application mode,
>> 
>> which
>> 
>> may
>> 
>> be
>> 
>> a
>> 
>> bit
>> 
>> cumbersome.
>> 
>> 
>> Do you mean a pass the SQL string a configuration or a
>> 
>> program
>> 
>> argument?
>> 
>> 
>> I thought it might be convenient for testing propose, but
>> 
>> not
>> 
>> recommended
>> 
>> for production,
>> cause Flink SQLs could be complicated and involves lots of
>> 
>> characters
>> 
>> that
>> 
>> need to escape.
>> 
>> WDYT?
>> 
>> - I noticed that we don't specify the SQLDriver jar in the
>> 
>> "run-application"
>> 
>> command. Does that mean we need to perform automatic
>> 
>> detection
>> 
>> in
>> 
>> Flink?
>> 
>> 
>> Yes! It’s like running a PyFlink job with the following
>> 
>> command:
>> 
>> 
>> ```
>> ./bin/flink run \
>> --pyModule table.word_count \
>> --pyFiles examples/python/table
>> ```
>> 
>> The CLI determines if it’s a SQL job, if yes apply the SQL
>> 
>> Driver
>> 
>> automatically.
>> 
>> 
>> [1]
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>> 
>> <
>> 
>> 
>> 
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>> 
>> 
>> <
>> 
>> 
>> 
>> 
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>> 
>> <
>> 
>> 
>> 
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>> 
>> 
>> 
>> 
>> Best,
>> Paul Lam
>> 
>> 2023年5月30日 21:56，Weihua Hu <[email protected]
>> 
>> <mailto:
>> 
>> [email protected]> <mailto:
>> 
>> [email protected]>> 写道：
>> 
>> 
>> Thanks Paul for the proposal.
>> 
>> +1 for this. It is valuable in improving ease of use.
>> 
>> I have a few questions.
>> - Is SQLRunner the better name? We use this to run a SQL
>> 
>> Job.
>> 
>> (Not
>> 
>> strong,
>> 
>> the SQLDriver is fine for me)
>> - Could we run SQL jobs using SQL in strings? Otherwise,
>> 
>> we
>> 
>> need
>> 
>> to
>> 
>> prepare
>> 
>> a SQL file in an image for Kubernetes application mode,
>> 
>> which
>> 
>> may
>> 
>> be
>> 
>> a
>> 
>> bit
>> 
>> cumbersome.
>> - I noticed that we don't specify the SQLDriver jar in the
>> 
>> "run-application"
>> 
>> command. Does that mean we need to perform automatic
>> 
>> detection
>> 
>> in
>> 
>> Flink?
>> 
>> 
>> 
>> Best,
>> Weihua
>> 
>> 
>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <
>> 
>> [email protected]
>> 
>> <mailto:[email protected] <mailto:[email protected]>>>
>> 
>> wrote:
>> 
>> 
>> Hi team,
>> 
>> I’d like to start a discussion about FLIP-316 [1], which
>> 
>> introduces
>> 
>> a
>> 
>> SQL
>> 
>> driver as the
>> default main class for Flink SQL jobs.
>> 
>> Currently, Flink SQL could be executed out of the box
>> 
>> either
>> 
>> via
>> 
>> SQL
>> 
>> Client/Gateway
>> or embedded in a Flink Java/Python program.
>> 
>> However, each one has its drawback:
>> 
>> - SQL Client/Gateway doesn’t support the application
>> 
>> deployment
>> 
>> mode
>> 
>> [2]
>> 
>> - Flink Java/Python program requires extra work to write
>> 
>> a
>> 
>> non-SQL
>> 
>> program
>> 
>> 
>> Therefore, I propose adding a SQL driver to act as the
>> 
>> default
>> 
>> main
>> 
>> class
>> 
>> for SQL jobs.
>> Please see the FLIP docs for details and feel free to
>> 
>> comment.
>> 
>> Thanks!
>> 
>> 
>> [1]
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>> 
>> <
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>> 
>> 
>> [2] https://issues.apache.org/jira/browse/FLINK-26541 <
>> https://issues.apache.org/jira/browse/FLINK-26541>
>> 
>> Best,
>> Paul Lam
>> 
>> 
>> 
>> 
>> 
>> 
>>

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Reply via email to