Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Paul Lam Tue, 06 Jun 2023 19:18:20 -0700

Hi Mason,

Thanks for your input!


> +1 for init containers or a more generalized way of obtaining arbitrary
> files. File fetching isn't specific to just SQL--it also matters for Java
> applications if the user doesn't want to rebuild a Flink image and just
> wants to modify the user application fat jar.

I agree that using SQL Driver in Java applications is as important as
using it in SQL Gateway. WRT init containers, I think most users use
them only as a workaround, for example to wget a jar from the Maven
repo.

We could implement the functionality in SQL Driver in a more graceful
way, and the Flink-supported filesystem approach seems to be a good
choice.
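
To make the idea concrete, here is a rough sketch of the localization
step. It is not from the FLIP: `ResourceLocalizer` and its methods are
hypothetical names, and it uses plain java.nio and only handles the
file:// scheme so that it runs without any Flink dependency. A real
implementation would presumably go through Flink's FileSystem
abstraction so that S3/HDFS work the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of the FLIP): copies a resource into a local directory. */
public final class ResourceLocalizer {

    private ResourceLocalizer() {}

    /**
     * Copies the file behind {@code source} into {@code targetDir} and returns the local path.
     * Assumption: only file:// is supported in this sketch; other schemes are rejected.
     */
    public static Path localize(URI source, Path targetDir) {
        if (!"file".equals(source.getScheme())) {
            throw new IllegalArgumentException(
                    "Unsupported scheme in this sketch: " + source.getScheme());
        }
        try {
            Path src = Paths.get(source);
            Files.createDirectories(targetDir);
            Path dst = targetDir.resolve(src.getFileName().toString());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round trip: write a SQL file to a "storage dir", localize it, read it back. */
    public static String demo() {
        try {
            Path storageDir = Files.createTempDirectory("storage-dir");
            Path sqlFile = storageDir.resolve("job.sql");
            Files.write(sqlFile, "SELECT 1;".getBytes(StandardCharsets.UTF_8));

            Path workDir = Files.createTempDirectory("driver-workdir");
            Path local = localize(sqlFile.toUri(), workDir);
            return new String(Files.readAllBytes(local), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only that the driver resolves a URI to a
local path before it runs anything, so the same code path could serve
SQL files, JSON plans, and jars.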

> Also, what do you think about prefixing the config options with
> `sql-driver` instead of just `sql` to be more specific?

LGTM, since SQL Driver is a public interface and the options are
specific to it.
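
For illustration, the more specific prefix could be used like this. It
is only a sketch: the key `sql-driver.file` is hypothetical and not
taken from the FLIP, and the helper works on a plain Java map rather
than Flink's Configuration class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: picks out the options that belong to the SQL Driver. */
public final class SqlDriverOptions {

    /** Prefix suggested in this thread; the keys under it are made up for illustration. */
    public static final String PREFIX = "sql-driver.";

    private SqlDriverOptions() {}

    /** Returns only the entries whose key starts with the prefix, with the prefix stripped. */
    public static Map<String, String> extract(Map<String, String> flinkConf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flinkConf.entrySet()) {
            if (entry.getKey().startsWith(PREFIX)) {
                result.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("sql-driver.file", "file:///tmp/job.sql"); // hypothetical key
        conf.put("sql-gateway.application.storage-dir", "file:///tmp/storage");
        conf.put("parallelism.default", "2");
        // Only the sql-driver.* entry survives the filter.
        System.out.println(extract(conf));
    }
}
```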

Best,
Paul Lam

> On Jun 6, 2023, at 06:30, Mason Chen <mas.chen6...@gmail.com> wrote:
> 
> Hi Paul,
> 
> +1 for this feature and supporting SQL file + JSON plans. We get a lot of
> requests to just be able to submit a SQL file, but the JSON plan
> optimizations make sense.
> 
> +1 for init containers or a more generalized way of obtaining arbitrary
> files. File fetching isn't specific to just SQL--it also matters for Java
> applications if the user doesn't want to rebuild a Flink image and just
> wants to modify the user application fat jar.
> 
>> Please note that we could reuse the checkpoint storage like S3/HDFS, which
>> should be required to run Flink in production, so I guess that would be
>> acceptable for most users. WDYT?
> 
> If you do go this route, it would be nice to support writing these files to
> S3/HDFS via Flink. This makes access control and policy management simpler.
> 
> Also, what do you think about prefixing the config options with
> `sql-driver` instead of just `sql` to be more specific?
> 
> Best,
> Mason
> 
> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <paullin3...@gmail.com> wrote:
> 
>> Hi Jark,
>> 
>> Thanks for your input! Please see my comments inline.
>> 
>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>> DataStream API also doesn't provide a default main class for users,
>>> why do we need to provide such one for SQL?
>> 
>> Sorry for the confusion I caused. By DataStream jobs, I mean jobs submitted
>> via Flink CLI which actually could be DataStream/Table jobs.
>> 
>> I think a default main class would be user-friendly, since it eliminates the
>> need for users to write a main class like SqlRunner in the Flink K8s
>> operator [1].
>> 
>>> I thought the proposed SqlDriver was a dedicated main class accepting
>>> SQL files, is that correct?
>> 
>> Both JSON plans and SQL files are accepted. SQL Gateway should use JSON
>> plans, while CLI users may use either JSON plans or SQL files.
>> 
>> Please see the updated FLIP[2] for more details.
>> 
>>> Personally, I prefer the way of init containers which doesn't depend on
>>> additional components.
>>> This can reduce the moving parts of a production environment.
>>> Depending on a distributed file system makes the testing, demo, and local
>>> setup harder than init containers.
>> 
>> Please note that we could reuse the checkpoint storage like S3/HDFS, which
>> should be required to run Flink in production, so I guess that would be
>> acceptable for most users. WDYT?
>> 
>> WRT testing, demo, and local setups, I think we could support the local
>> filesystem scheme, i.e. file://**, as the state backends do. It works as
>> long as SQL Gateway and JobManager (or SQL Driver) can access the resource
>> directory (specified via `sql-gateway.application.storage-dir`).
>> 
>> Thanks!
>> 
>> [1]
>> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>> [3]
>> https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>> 
>> Best,
>> Paul Lam
>> 
>>> On Jun 3, 2023, at 12:21, Jark Wu <imj...@gmail.com> wrote:
>>> 
>>> Hi Paul,
>>> 
>>> Thanks for your reply. I left my comments inline.
>>> 
>>>> As the FLIP said, it’s good to have a default main class for Flink SQLs,
>>>> which allows users to submit Flink SQLs in the same way as DataStream
>>>> jobs, or else users need to write their own main class.
>>> 
>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>> DataStream API also doesn't provide a default main class for users,
>>> why do we need to provide such one for SQL?
>>> 
>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>> formats:
>>> 
>>> No, ExecNodeGraph doesn't need to serialize SessionState. I thought the
>>> proposed SqlDriver was a dedicated main class accepting SQL files, is
>>> that correct?
>>> If true, we have to ship the SessionState for this case which is a large
>>> work.
>>> I think we just need a JsonPlanDriver which is a main class that accepts
>>> JsonPlan as the parameter.
>>> 
>>> 
>>>> The common solutions I know of are to use distributed file systems or to
>>>> use init containers to localize the resources.
>>> 
>>> Personally, I prefer the way of init containers which doesn't depend on
>>> additional components.
>>> This can reduce the moving parts of a production environment.
>>> Depending on a distributed file system makes the testing, demo, and local
>>> setup harder than init containers.
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> 
>>> 
>>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <paullin3...@gmail.com> wrote:
>>> 
>>>> The FLIP is in the early phase and some details are not included, but
>>>> fortunately, we got lots of valuable ideas from the discussion.
>>>> 
>>>> Thanks to everyone who joined the discussion!
>>>> @Weihua @Shanmon @Shengkai @Biao @Jark
>>>> 
>>>> This weekend I’m gonna revisit and update the FLIP, adding more
>>>> details. Hopefully, we can further align our opinions.
>>>> 
>>>> Best,
>>>> Paul Lam
>>>> 
>>>>> On Jun 2, 2023, at 18:02, Paul Lam <paullin3...@gmail.com> wrote:
>>>>> 
>>>>> Hi Jark,
>>>>> 
>>>>> Thanks a lot for your input!
>>>>> 
>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it still
>>>>>> necessary to support SQL Driver?
>>>>> 
>>>>> I think so. Apart from usage in SQL Gateway, SQL Driver could simplify
>>>>> Flink SQL execution with Flink CLI.
>>>>> 
>>>>> As the FLIP said, it’s good to have a default main class for Flink SQLs,
>>>>> which allows users to submit Flink SQLs in the same way as DataStream
>>>>> jobs, or else users need to write their own main class.
>>>>> 
>>>>>> SQL Driver needs to serialize SessionState which is very challenging
>>>>>> but not covered in detail in the FLIP.
>>>>> 
>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>>> formats:
>>>>> 
>>>>> - SQL files for user-facing public usage
>>>>> - ExecNodeGraph for internal usage
>>>>> 
>>>>> It’s kind of similar to the relationship between job jars and
>>>>> jobgraphs.
>>>>> 
>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that true?
>>>>>> Is it possible to support it?
>>>>> 
>>>>> Yes, K8s doesn’t distribute any files. It’s the users’ responsibility to
>>>>> make sure the resources are accessible in the containers. The common
>>>>> solutions I know of are to use distributed file systems or to use init
>>>>> containers to localize the resources.
>>>>> 
>>>>> Now I lean toward introducing a fs to do the distribution job. WDYT?
>>>>> 
>>>>> Best,
>>>>> Paul Lam
>>>>> 
>>>>>> On Jun 1, 2023, at 20:33, Jark Wu <imj...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Paul,
>>>>>> 
>>>>>> Thanks for starting this discussion. I like the proposal! This is a
>>>>>> frequently requested feature!
>>>>>> 
>>>>>> I agree with Shengkai that ExecNodeGraph as the submission object is a
>>>>>> better idea than SQL file. To be more specific, it should be JsonPlanGraph
>>>>>> or CompiledPlan which is the serializable representation. CompiledPlan is a
>>>>>> clear separation between compiling/optimization/validation and execution.
>>>>>> This can keep the validation and metadata accessing still on the SQLGateway
>>>>>> side. This allows SQLGateway to leverage some metadata caching and UDF JAR
>>>>>> caching for better compiling performance.
>>>>>> 
>>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it still
>>>>>> necessary to support SQL Driver? Regarding non-interactive SQL jobs, users
>>>>>> can use the Table API program for application mode. SQL Driver needs to
>>>>>> serialize SessionState which is very challenging but not covered in detail
>>>>>> in the FLIP.
>>>>>> 
>>>>>> Regarding "K8S doesn't support shipping multiple jars", is that true?
>>>>>> Is it possible to support it?
>>>>>> 
>>>>>> Best,
>>>>>> Jark
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Weihua,
>>>>>>> 
>>>>>>> You’re right. Distributing the SQLs to the TMs is one of the challenging
>>>>>>> parts of this FLIP.
>>>>>>>
>>>>>>> Web submission is not enabled in application mode currently as you said,
>>>>>>> but it could be changed if we have good reasons.
>>>>>>>
>>>>>>> What do you think about introducing a distributed storage for SQL Gateway?
>>>>>>>
>>>>>>> We could make use of Flink file systems [1] to distribute the SQL Gateway
>>>>>>> generated resources, which should solve the problem at its root.
>>>>>>>
>>>>>>> Users could specify Flink-supported file systems to ship files. It’s only
>>>>>>> required when using SQL Gateway with K8s application mode.
>>>>>>> 
>>>>>>> [1]
>>>>>>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>>>>>>> 
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>> 
>>>>>>>> On Jun 1, 2023, at 13:55, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Thanks Paul for your reply.
>>>>>>>> 
>>>>>>>> SQLDriver looks good to me.
>>>>>>>> 
>>>>>>>> 2. Do you mean passing the SQL string as a configuration or a program
>>>>>>>> argument?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I brought this up because we were unable to pass the SQL file to Flink
>>>>>>>> using Kubernetes mode.
>>>>>>>> For DataStream/Python users, they need to prepare their images for the
>>>>>>>> jars and dependencies.
>>>>>>>> But for SQL users, they can use a common image to run different SQL
>>>>>>>> queries if there are no other udf requirements.
>>>>>>>> It would be great if the SQL query and image were not bound.
>>>>>>>>
>>>>>>>> Using strings is a way to decouple these, but just as you mentioned, it's
>>>>>>>> not easy to pass complex SQL.
>>>>>>>> 
>>>>>>>>> use web submission
>>>>>>>> AFAIK, we can not use web submission in the Application mode. Please
>>>>>>>> correct me if I'm wrong.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Weihua
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Biao,
>>>>>>>>> 
>>>>>>>>> Thanks for your comments!
>>>>>>>>> 
>>>>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs
>>>>>>>>>> in Application mode? More specifically, if we use SQL client/gateway to
>>>>>>>>>> execute some interactive SQLs like a SELECT query, can we ask flink to
>>>>>>>>>> use Application mode to execute those queries after this FLIP?
>>>>>>>>>
>>>>>>>>> Thanks for pointing it out. I think only DMLs would be executed via SQL
>>>>>>>>> Driver.
>>>>>>>>> I'll add the scope to the FLIP.
>>>>>>>>>
>>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial as
>>>>>>>>>> we can ship files via YARN's tool easily but for K8s, things can be more
>>>>>>>>>> complicated as Shengkai said.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Your input is very informative. I’m thinking about using web submission,
>>>>>>>>> but it requires exposing the JobManager port which could also be a
>>>>>>>>> problem on K8s.
>>>>>>>>>
>>>>>>>>> Another approach is to explicitly require a distributed storage to ship
>>>>>>>>> files, but we may need a new deployment executor for that.
>>>>>>>>>
>>>>>>>>> What do you think of these two approaches?
>>>>>>>>>
>>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>>>>>>> unserializable fields like
>>>>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>>>>>>> may be worthwhile to add more details about the serialization part.
>>>>>>>>>
>>>>>>>>> I agree. That’s a missing part. But if we use ExecNodeGraph as Shengkai
>>>>>>>>> mentioned, do we eliminate the need for serialization of SessionState?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Paul Lam
>>>>>>>>> 
>>>>>>>>>> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks Paul for the proposal! I believe it would be very useful for
>>>>>>>>>> Flink users.
>>>>>>>>>> After reading the FLIP, I have some questions:
>>>>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs
>>>>>>>>>> in Application mode? More specifically, if we use SQL client/gateway to
>>>>>>>>>> execute some interactive SQLs like a SELECT query, can we ask flink to
>>>>>>>>>> use Application mode to execute those queries after this FLIP?
>>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial as
>>>>>>>>>> we can ship files via YARN's tool easily but for K8s, things can be more
>>>>>>>>>> complicated as Shengkai said. I have implemented a simple POC
>>>>>>>>>> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
>>>>>>>>>> based on SQL client before (i.e. consider the SQL client which supports
>>>>>>>>>> executing a SQL file as the SQL driver in this FLIP). One problem I have
>>>>>>>>>> met is how we ship SQL files (or Job Graph) to the K8s side. Without
>>>>>>>>>> such support, users have to modify the initContainer or rebuild a new K8s
>>>>>>>>>> image every time to fetch the SQL file. Like the Flink K8s operator, one
>>>>>>>>>> workaround is to utilize the Flink config (transforming the SQL file to
>>>>>>>>>> an escaped string like Weihua mentioned) which will be converted to a
>>>>>>>>>> ConfigMap, but K8s has a size limit on ConfigMaps (no larger than 1MB
>>>>>>>>>> <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not sure
>>>>>>>>>> if we have better solutions.
>>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>>>>>>> unserializable fields like
>>>>>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>>>>>>> may be worthwhile to add more details about the serialization part.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Biao Geng
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 31, 2023 at 11:49, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Weihua,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot for your input! Please see my comments inline.
>>>>>>>>>>> 
>>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not
>>>>>>>>>>>> strong, the SQLDriver is fine for me)
>>>>>>>>>>> 
>>>>>>>>>>> I’ve thought about SQL Runner but picked SQL Driver for the following
>>>>>>>>>>> reasons FYI:
>>>>>>>>>>>
>>>>>>>>>>> 1. I have a PythonDriver doing the same job for PyFlink [1]
>>>>>>>>>>> 2. Flink program's main class is sort of like Driver in JDBC which
>>>>>>>>>>> translates SQLs into database-specific languages.
>>>>>>>>>>> 
>>>>>>>>>>> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>>>>>>>>>>> 
>>>>>>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>>>>>>>>>>> prepare a SQL file in an image for Kubernetes application mode, which
>>>>>>>>>>>> may be a bit cumbersome.
>>>>>>>>>>> 
>>>>>>>>>>> Do you mean passing the SQL string as a configuration or a program
>>>>>>>>>>> argument?
>>>>>>>>>>>
>>>>>>>>>>> I thought it might be convenient for testing purposes, but not recommended
>>>>>>>>>>> for production, because Flink SQLs could be complicated and involve lots
>>>>>>>>>>> of characters that need to be escaped.
>>>>>>>>>>> 
>>>>>>>>>>> WDYT?
>>>>>>>>>>> 
>>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>>>>>>> "run-application" command. Does that mean we need to perform automatic
>>>>>>>>>>>> detection in Flink?
>>>>>>>>>>> 
>>>>>>>>>>> Yes! It’s like running a PyFlink job with the following command:
>>>>>>>>>>> 
>>>>>>>>>>> ```
>>>>>>>>>>> ./bin/flink run \
>>>>>>>>>>>  --pyModule table.word_count \
>>>>>>>>>>>  --pyFiles examples/python/table
>>>>>>>>>>> ```
>>>>>>>>>>> 
>>>>>>>>>>> The CLI determines whether it’s a SQL job and, if so, applies the SQL
>>>>>>>>>>> Driver automatically.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Paul Lam
>>>>>>>>>>> 
>>>>>>>>>>>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Paul for the proposal.
>>>>>>>>>>>> 
>>>>>>>>>>>> +1 for this. It is valuable in improving ease of use.
>>>>>>>>>>>> 
>>>>>>>>>>>> I have a few questions.
>>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not
>>>>>>>>>>>> strong, the SQLDriver is fine for me)
>>>>>>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>>>>>>>>>>> prepare a SQL file in an image for Kubernetes application mode, which
>>>>>>>>>>>> may be a bit cumbersome.
>>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>>>>>>>> "run-application" command. Does that mean we need to perform automatic
>>>>>>>>>>>> detection in Flink?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Weihua
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi team,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I’d like to start a discussion about FLIP-316 [1], which introduces a
>>>>>>>>>>>>> SQL driver as the default main class for Flink SQL jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, Flink SQL could be executed out of the box either via SQL
>>>>>>>>>>>>> Client/Gateway or embedded in a Flink Java/Python program.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, each one has its drawback:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - SQL Client/Gateway doesn’t support the application deployment mode [2]
>>>>>>>>>>>>> - Flink Java/Python program requires extra work to write a non-SQL program
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, I propose adding a SQL driver to act as the default main class
>>>>>>>>>>>>> for SQL jobs.
>>>>>>>>>>>>> Please see the FLIP docs for details and feel free to comment. Thanks!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Paul Lam
