> If it’s the case, I’m good with introducing a new module and making SQL
> Driver an internal class that accepts JSON plans only.

I've thought this over again and again, and I believe it's better to move the
SqlDriver into the sql-gateway module, because the SQL Client relies on the
SQL Gateway to submit SQL and the SQL Gateway is now able to generate the
ExecNodeGraph. +1 to supporting JSON plans only.
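
To make that concrete, here is a minimal sketch of what an internal,
JSON-plan-only driver could look like. It is illustrative only: the package
and class layout are my assumptions rather than the final design; the only
fixed point is that the entrypoint loads and executes a compiled plan instead
of parsing SQL text.

```java
// Hypothetical sketch only -- package/class names are assumptions, not the FLIP's API.
package org.apache.flink.table.gateway.driver;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.PlanReference;
import org.apache.flink.table.api.TableEnvironment;

public final class SqlDriver {

    public static void main(String[] args) throws Exception {
        // args[0]: path to the JSON plan the SQL Gateway compiled from the DML.
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // No parsing, validation, or optimization happens here; the compiled
        // plan is simply loaded and executed on the cluster side.
        tEnv.loadPlan(PlanReference.fromFile(args[0])).execute();
    }
}
```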

* Upload configuration through command line parameter

The ExecNodeGraph only contains the job's logic; it doesn't carry the
checkpoint directory, checkpoint interval, execution mode, and so on. So I
think we should also upload the configuration.
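
For illustration, a small sketch of the settings the driver would have to
pick back up from the uploaded configuration. The upload directory and the
idea of shipping a plain flink-conf.yaml next to the plan are assumptions on
my side, not agreed details; the config keys themselves are existing Flink
options.

```java
// Sketch only: the config directory below is an assumed upload location.
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.CheckpointingOptions;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.ExecutionOptions;
import org.apache.flink.configuration.GlobalConfiguration;

public final class ShippedConfigExample {

    public static void main(String[] args) {
        // Load the configuration shipped alongside the JSON plan.
        Configuration shipped = GlobalConfiguration.loadConfiguration("/opt/flink/usr-conf");

        // Examples of settings an ExecNodeGraph alone does not contain:
        String checkpointDir = shipped.get(CheckpointingOptions.CHECKPOINTS_DIRECTORY);
        RuntimeExecutionMode mode = shipped.get(ExecutionOptions.RUNTIME_MODE);

        System.out.println("checkpoint dir = " + checkpointDir + ", mode = " + mode);
    }
}
```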

* KubernetesClusterDescriptor and KubernetesApplicationClusterEntrypoint are
responsible for the jar upload/download

+1 for the change.
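
To sketch the intent (the class and method names below are made up for
illustration, not the actual proposal): the descriptor side would write
through Flink's FileSystem abstraction so any supported scheme (s3://,
hdfs://, file://) works, and the entrypoint would later download from the
returned remote path before building the user classpath.

```java
// Illustrative sketch only -- the helper name and storage-dir handling are assumptions.
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.core.fs.Path;

import java.nio.file.Files;

final class ResourceUploader {

    /** Uploads a local jar to the configured remote storage dir and returns its remote path. */
    static Path upload(java.nio.file.Path localJar, String storageDir) throws Exception {
        Path target = new Path(storageDir, localJar.getFileName().toString());
        FileSystem fs = target.getFileSystem(); // resolves s3://, hdfs://, file://, ...
        try (FSDataOutputStream out = fs.create(target, WriteMode.OVERWRITE)) {
            Files.copy(localJar, out);
        }
        // KubernetesApplicationClusterEntrypoint would download from this path at startup.
        return target;
    }
}
```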

Could you update the FLIP to reflect the current discussion?

Best,
Shengkai






Yang Wang <wangyang0...@apache.org> wrote on Mon, Jun 12, 2023 at 11:41:

> Sorry for the late reply. I am in favor of introducing such a built-in
> resource localization mechanism
> based on Flink FileSystem. Then FLINK-28915[1] could be the second step
> which will download
> the jars and dependencies to the JobManager/TaskManager local directory
> before working.
>
> The first step could be done in another ticket in Flink. Or some external
> Flink jobs management system
> could also take care of this.
>
> [1]. https://issues.apache.org/jira/browse/FLINK-28915
>
> Best,
> Yang
>
> Paul Lam <paullin3...@gmail.com> wrote on Fri, Jun 9, 2023 at 17:39:
>
> > Hi Mason,
> >
> > I get your point. I'm increasingly feeling the need to introduce a
> > built-in
> > file distribution mechanism for flink-kubernetes module, just like Spark
> > does with `spark.kubernetes.file.upload.path` [1].
> >
> > I’m assuming the workflow is as follows:
> >
> > - KubernetesClusterDescriptor uploads all local resources to a remote
> >   storage via Flink filesystem (skips if the resources are already
> >   remote).
> > - KubernetesApplicationClusterEntrypoint downloads the resources
> >   and puts them in the classpath during startup.
> >
> > I wouldn't mind splitting it into another FLIP to ensure that everything
> > is done correctly.
> >
> > cc'ed @Yang to gather more opinions.
> >
> > [1]
> >
> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
> >
> > Best,
> > Paul Lam
> >
> > On Jun 8, 2023, at 12:15, Mason Chen <mas.chen6...@gmail.com> wrote:
> >
> > Hi Paul,
> >
> > Thanks for your response!
> >
> > I agree that utilizing SQL Drivers in Java applications is equally
> > important as employing them in SQL Gateway. WRT init containers, I think
> > most users use them just as a workaround. For example, wget a jar from
> > the maven repo.
> >
> > We could implement the functionality in SQL Driver in a more graceful
> > way and the flink-supported filesystem approach seems to be a
> > good choice.
> >
> > My main point is: can we solve the problem with a design agnostic of SQL
> > and Stream API? I mentioned a use case where this ability is useful for
> > Java or Stream API applications. Maybe this is even a non-goal to your
> > FLIP since you are focusing on the driver entrypoint.
> >
> > Jark mentioned some optimizations:
> >
> > This allows SQLGateway to leverage some metadata caching and UDF JAR
> > caching for better compiling performance.
> >
> > It would be great to see this even outside the SQLGateway (i.e. UDF JAR
> > caching).
> >
> > Best,
> > Mason
> >
> > On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >
> > Hi, Paul. Thanks for your update; it makes me understand the design much
> > better.
> >
> > But I still have some questions about the FLIP.
> >
> > For SQL Gateway, only DMLs need to be delegated to the SQL Driver. I
> > would think about the details and update the FLIP. Do you have some
> > ideas already?
> >
> > If the application mode cannot support library mode, I think we should
> > only execute INSERT INTO and UPDATE/DELETE statements in the application
> > mode. AFAIK, we cannot support ANALYZE TABLE and CALL PROCEDURE
> > statements. The ANALYZE TABLE syntax needs to register the statistics to
> > the catalog after the job finishes, and the CALL PROCEDURE statement
> > doesn't generate the ExecNodeGraph.
> >
> > * Introduce storage via option `sql-gateway.application.storage-dir`
> >
> > If we cannot submit the jars through web submission, +1 to introduce the
> > options to upload the files. However, I think the uploader should be
> > responsible for removing the uploaded jars. Can we remove the jars while
> > the job is running or when the gateway exits?
> >
> > * JobID is not available
> >
> > Can we use the rest client returned by the ApplicationDeployer to query
> > the job id? I am concerned that users don't know which job is related to
> > the submitted SQL.
> >
> > * Do we need to introduce a new module named flink-table-sql-runner?
> >
> > It seems we need to introduce a new module. Will the new module be
> > available in the distribution package? I agree with Jark that we don't
> > need to introduce this for Table API users, as these users have their
> > own main class. If we want to make it easier for users to write jobs for
> > the k8s operator, I think we should modify the k8s operator repo. If we
> > don't need to support SQL files, can we make this jar only visible in
> > the sql-gateway, like we do with the planner loader? [1]
> >
> > [1]
> > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> >
> >
> >
> >
> > Weihua Hu <huweihua....@gmail.com> wrote on Wed, Jun 7, 2023 at 10:52:
> >
> > Hi,
> >
> > Thanks for updating the FLIP.
> >
> > I have two cents on the distribution of SQLs and resources.
> > 1. Should we support a common file distribution mechanism for k8s
> > application mode?
> >  I have seen some issues and requirements on the mailing list.
> >  In our production environment, we implement the download command in the
> > CliFrontend, and automatically add an init container to the POD for file
> > downloading. The advantage of this is that we can use all Flink-supported
> > file systems to store files.
> >  This needs more discussion. I would appreciate hearing more opinions.
> >
> > 2. In this FLIP, we distribute files in two different ways in YARN and
> > Kubernetes. Can we combine them into one way?
> >  If we don't want to implement a common file distribution for k8s
> > application mode, could we use the SQLDriver to download the files both
> > in YARN and K8s? IMO, this can reduce the cost of code maintenance.
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <paullin3...@gmail.com> wrote:
> >
> > Hi Mason,
> >
> > Thanks for your input!
> >
> > +1 for init containers or a more generalized way of obtaining arbitrary
> > files. File fetching isn't specific to just SQL--it also matters for Java
> > applications if the user doesn't want to rebuild a Flink image and just
> > wants to modify the user application fat jar.
> >
> >
> > I agree that utilizing SQL Drivers in Java applications is equally
> > important
> > as employing them in SQL Gateway. WRT init containers, I think most
> > users use them just as a workaround. For example, wget a jar from the
> > maven repo.
> >
> > We could implement the functionality in SQL Driver in a more graceful
> > way and the flink-supported filesystem approach seems to be a
> > good choice.
> >
> > Also, what do you think about prefixing the config options with
> > `sql-driver` instead of just `sql` to be more specific?
> >
> >
> > LGTM, since SQL Driver is a public interface and the options are
> > specific to it.
> >
> > Best,
> > Paul Lam
> >
> > On Jun 6, 2023, at 06:30, Mason Chen <mas.chen6...@gmail.com> wrote:
> >
> > Hi Paul,
> >
> > +1 for this feature and supporting SQL file + JSON plans. We get a lot of
> > requests to just be able to submit a SQL file, but the JSON plan
> > optimizations make sense.
> >
> > +1 for init containers or a more generalized way of obtaining arbitrary
> > files. File fetching isn't specific to just SQL--it also matters for Java
> > applications if the user doesn't want to rebuild a Flink image and just
> > wants to modify the user application fat jar.
> >
> > Please note that we could reuse the checkpoint storage like S3/HDFS,
> > which should be required to run Flink in production, so I guess that
> > would be acceptable for most users. WDYT?
> >
> > If you do go this route, it would be nice to support writing these files
> > to S3/HDFS via Flink. This makes access control and policy management
> > simpler.
> >
> >
> > Also, what do you think about prefixing the config options with
> > `sql-driver` instead of just `sql` to be more specific?
> >
> > Best,
> > Mason
> >
> > On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > Hi Jark,
> >
> > Thanks for your input! Please see my comments inline.
> >
> > Isn't Table API the same way as DataStream jobs to submit Flink SQL?
> > DataStream API also doesn't provide a default main class for users,
> > why do we need to provide such one for SQL?
> >
> >
> > Sorry for the confusion I caused. By DataStream jobs, I mean jobs
> > submitted via Flink CLI which actually could be DataStream/Table jobs.
> >
> > I think a default main class would be user-friendly which eliminates the
> > need for users to write a main class as SQLRunner in Flink K8s operator
> > [1].
> >
> > I thought the proposed SqlDriver was a dedicated main class accepting
> > SQL files, is that correct?
> >
> > Both JSON plans and SQL files are accepted. SQL Gateway should use JSON
> > plans, while CLI users may use either JSON plans or SQL files.
> >
> > Please see the updated FLIP [2] for more details.
> >
> > Personally, I prefer the way of init containers which doesn't depend on
> > additional components.
> > This can reduce the moving parts of a production environment.
> > Depending on a distributed file system makes the testing, demo, and local
> > setup harder than init containers.
> >
> > Please note that we could reuse the checkpoint storage like S3/HDFS,
> > which should be required to run Flink in production, so I guess that
> > would be acceptable for most users. WDYT?
> >
> > WRT testing, demo, and local setups, I think we could support the local
> > filesystem scheme, i.e. file://**, as the state backends do. It works as
> > long as the SQL Gateway and JobManager (or SQL Driver) can access the
> > resource directory (specified via `sql-gateway.application.storage-dir`).
> >
> > Thanks!
> >
> > [1]
> > https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
> > [3]
> > https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
> >
> >
> > Best,
> > Paul Lam
> >
> > On Jun 3, 2023, at 12:21, Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Paul,
> >
> > Thanks for your reply. I left my comments inline.
> >
> > As the FLIP said, it’s good to have a default main class for Flink SQLs,
> > which allows users to submit Flink SQLs in the same way as DataStream
> > jobs, or else users need to write their own main class.
> >
> > Isn't Table API the same way as DataStream jobs to submit Flink SQL?
> > DataStream API also doesn't provide a default main class for users,
> > why do we need to provide such one for SQL?
> >
> > With the help of ExecNodeGraph, do we still need the serialized
> > SessionState? If not, we could make SQL Driver accept two serialized
> > formats:
> >
> > No, ExecNodeGraph doesn't need to serialize SessionState. I thought the
> > proposed SqlDriver was a dedicated main class accepting SQL files, is
> > that correct?
> > If true, we have to ship the SessionState for this case, which is a large
> > work.
> > I think we just need a JsonPlanDriver which is a main class that accepts
> > JsonPlan as the parameter.
> >
> > The common solutions I know are to use distributed file systems or use
> > init containers to localize the resources.
> >
> > Personally, I prefer the way of init containers which doesn't depend on
> > additional components.
> > This can reduce the moving parts of a production environment.
> > Depending on a distributed file system makes the testing, demo, and local
> > setup harder than init containers.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> > On Fri, 2 Jun 2023 at 18:10, Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > The FLIP is in the early phase and some details are not included,
> >
> > but
> >
> > fortunately, we got lots of valuable ideas from the discussion.
> >
> > Thanks to everyone who joined the dissuasion!
> > @Weihua @Shanmon @Shengkai @Biao @Jark
> >
> > This weekend I’m gonna revisit and update the FLIP, adding more
> > details. Hopefully, we can further align our opinions.
> >
> > Best,
> > Paul Lam
> >
> > On Jun 2, 2023, at 18:02, Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > Hi Jark,
> >
> > Thanks a lot for your input!
> >
> > If we decide to submit ExecNodeGraph instead of SQL file, is it still
> > necessary to support SQL Driver?
> >
> > I think so. Apart from usage in SQL Gateway, SQL Driver could simplify
> > Flink SQL execution with Flink CLI.
> >
> > As the FLIP said, it’s good to have a default main class for Flink SQLs,
> > which allows users to submit Flink SQLs in the same way as DataStream
> > jobs, or else users need to write their own main class.
> >
> > SQL Driver needs to serialize SessionState which is very challenging
> > but not covered in detail in the FLIP.
> >
> > With the help of ExecNodeGraph, do we still need the serialized
> > SessionState? If not, we could make SQL Driver accept two serialized
> > formats:
> >
> > - SQL files for user-facing public usage
> > - ExecNodeGraph for internal usage
> >
> > It’s kind of similar to the relationship between job jars and jobgraphs.
> >
> > Regarding "K8S doesn't support shipping multiple jars", is that true?
> > Is it possible to support it?
> >
> > Yes, K8s doesn’t distribute any files. It’s the users’ responsibility to
> > make sure the resources are accessible in the containers. The common
> > solutions I know are to use distributed file systems or use init
> > containers to localize the resources.
> >
> > Now I lean toward introducing a fs to do the distribution job. WDYT?
> >
> >
> > Best,
> > Paul Lam
> >
> > On Jun 1, 2023, at 20:33, Jark Wu <imj...@gmail.com> wrote:
> >
> >
> > Hi Paul,
> >
> > Thanks for starting this discussion. I like the proposal! This is a
> > frequently requested feature!
> >
> > I agree with Shengkai that ExecNodeGraph as the submission object is a
> > better idea than SQL file. To be more specific, it should be
> > JsonPlanGraph or CompiledPlan which is the serializable representation.
> > CompiledPlan is a clear separation between
> > compiling/optimization/validation and execution.
> > This can keep the validation and metadata accessing still on the
> > SQLGateway side. This allows SQLGateway to leverage some metadata caching
> > and UDF JAR caching for better compiling performance.
> >
> > If we decide to submit ExecNodeGraph instead of SQL file, is it still
> > necessary to support SQL Driver? Regarding non-interactive SQL jobs,
> > users can use the Table API program for application mode. SQL Driver
> > needs to serialize SessionState which is very challenging but not
> > covered in detail in the FLIP.
> >
> > Regarding "K8S doesn't support shipping multiple jars", is that true?
> > Is it possible to support it?
> >
> > Best,
> > Jark
> >
> >
> >
> > On Thu, 1 Jun 2023 at 16:58, Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > Hi Weihua,
> >
> > You’re right. Distributing the SQLs to the TMs is one of the challenging
> > parts of this FLIP.
> >
> > Web submission is not enabled in application mode currently as you said,
> > but it could be changed if we have good reasons.
> >
> > What do you think about introducing a distributed storage for SQL
> > Gateway?
> >
> > We could make use of Flink file systems [1] to distribute the SQL
> > Gateway generated resources; that should solve the problem at its root
> > cause.
> >
> > Users could specify Flink-supported file systems to ship files. It’s only
> > required when using SQL Gateway with K8s application mode.
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> >
> > Best,
> > Paul Lam
> >
> > On Jun 1, 2023, at 13:55, Weihua Hu <huweihua....@gmail.com> wrote:
> >
> >
> > Thanks Paul for your reply.
> >
> > SQLDriver looks good to me.
> >
> > 2. Do you mean to pass the SQL string as a configuration or a program
> > argument?
> >
> > I brought this up because we were unable to pass the SQL file to Flink
> > using Kubernetes mode.
> > For DataStream/Python users, they need to prepare their images for the
> > jars and dependencies.
> > But for SQL users, they can use a common image to run different SQL
> > queries if there are no other udf requirements.
> > It would be great if the SQL query and image were not bound.
> >
> > Using strings is a way to decouple these, but just as you mentioned, it's
> > not easy to pass complex SQL.
> >
> > use web submission
> >
> > AFAIK, we cannot use web submission in the Application mode. Please
> > correct me if I'm wrong.
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > Hi Biao,
> >
> > Thanks for your comments!
> >
> > 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs
> > in Application mode? More specifically, if we use SQL client/gateway to
> > execute some interactive SQLs like a SELECT query, can we ask flink to
> > use Application mode to execute those queries after this FLIP?
> >
> > Thanks for pointing it out. I think only DMLs would be executed via SQL
> > Driver.
> > I'll add the scope to the FLIP.
> >
> > 2. Deployment: I believe in YARN mode, the implementation is trivial as
> > we can ship files via YARN's tool easily but for K8s, things can be more
> > complicated as Shengkai said.
> >
> > Your input is very informative. I’m thinking about using web submission,
> > but it requires exposing the JobManager port which could also be a
> > problem on K8s.
> >
> > Another approach is to explicitly require a distributed storage to ship
> > files, but we may need a new deployment executor for that.
> >
> > What do you think of these two approaches?
> >
> > 3. Serialization of SessionState: in SessionState, there are some
> > unserializable fields like
> > org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> > may be worthwhile to add more details about the serialization part.
> >
> > I agree. That’s a missing part. But if we use ExecNodeGraph as Shengkai
> > mentioned, do we eliminate the need for serialization of SessionState?
> >
> >
> > Best,
> > Paul Lam
> >
> > On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
> >
> >
> > Thanks Paul for the proposal! I believe it would be very useful for
> > flink users.
> > After reading the FLIP, I have some questions:
> > 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs
> > in Application mode? More specifically, if we use SQL client/gateway to
> > execute some interactive SQLs like a SELECT query, can we ask flink to
> > use Application mode to execute those queries after this FLIP?
> > 2. Deployment: I believe in YARN mode, the implementation is trivial as
> > we can ship files via YARN's tool easily but for K8s, things can be more
> > complicated as Shengkai said. I have implemented a simple POC
> > <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
> > based on SQL client before (i.e. consider the SQL client which supports
> > executing a SQL file as the SQL driver in this FLIP). One problem I have
> > met is how do we ship SQL files (or Job Graph) to the k8s side. Without
> > such support, users have to modify the initContainer or rebuild a new K8s
> > image every time to fetch the SQL file. Like the flink k8s operator, one
> > workaround is to utilize the flink config (transforming the SQL file to
> > an escaped string like Weihua mentioned) which will be converted to a
> > ConfigMap, but K8s has a size limit on ConfigMaps (no larger than 1MB
> > <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not
> > sure if we have better solutions.
> > 3. Serialization of SessionState: in SessionState, there are some
> > unserializable fields like
> > org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> > may be worthwhile to add more details about the serialization part.
> >
> >
> > Best,
> > Biao Geng
> >
> > Paul Lam <paullin3...@gmail.com> wrote on Wed, May 31, 2023 at 11:49:
> >
> >
> > Hi Weihua,
> >
> > Thanks a lot for your input! Please see my comments inline.
> >
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> > strong, the SQLDriver is fine for me)
> >
> > I’ve thought about SQL Runner but picked SQL Driver for the following
> > reasons FYI:
> >
> > 1. I have a PythonDriver doing the same job for PyFlink [1]
> > 2. Flink program's main class is sort of like a Driver in JDBC which
> > translates SQLs into database-specific languages.
> >
> > In general, I’m +1 for SQL Driver and +0 for SQL Runner.
> >
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> > prepare a SQL file in an image for Kubernetes application mode, which may
> > be a bit cumbersome.
> >
> > Do you mean to pass the SQL string as a configuration or a program
> > argument?
> >
> > I thought it might be convenient for testing purposes, but not
> > recommended for production, because Flink SQLs could be complicated and
> > involve lots of characters that need to be escaped.
> >
> > WDYT?
> >
> > - I noticed that we don't specify the SQLDriver jar in the
> > "run-application" command. Does that mean we need to perform automatic
> > detection in Flink?
> >
> > Yes! It’s like running a PyFlink job with the following command:
> >
> > ```
> > ./bin/flink run \
> > --pyModule table.word_count \
> > --pyFiles examples/python/table
> > ```
> >
> > The CLI determines if it’s a SQL job; if yes, it applies the SQL Driver
> > automatically.
> >
> >
> > [1]
> > https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> >
> > Best,
> > Paul Lam
> >
> > On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
> >
> >
> > Thanks Paul for the proposal.
> >
> > +1 for this. It is valuable in improving ease of use.
> >
> > I have a few questions.
> > - Is SQLRunner the better name? We use this to run a SQL Job. (Not
> > strong, the SQLDriver is fine for me)
> > - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> > prepare a SQL file in an image for Kubernetes application mode, which may
> > be a bit cumbersome.
> > - I noticed that we don't specify the SQLDriver jar in the
> > "run-application" command. Does that mean we need to perform automatic
> > detection in Flink?
> >
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
> >
> >
> > Hi team,
> >
> > I’d like to start a discussion about FLIP-316 [1], which introduces a
> > SQL driver as the default main class for Flink SQL jobs.
> >
> > Currently, Flink SQL could be executed out of the box either via SQL
> > Client/Gateway or embedded in a Flink Java/Python program.
> >
> > However, each one has its drawback:
> >
> > - SQL Client/Gateway doesn’t support the application deployment mode [2]
> > - Flink Java/Python program requires extra work to write a non-SQL
> > program
> >
> > Therefore, I propose adding a SQL driver to act as the default main
> > class for SQL jobs.
> > Please see the FLIP docs for details and feel free to comment.
> >
> > Thanks!
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> > [2] https://issues.apache.org/jira/browse/FLINK-26541
> >
> > Best,
> > Paul Lam
> >
> >
> >
> >
> >
> >
> >
>
