Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Paul Lam Wed, 31 May 2023 06:37:28 -0700

Hi Biao,

Thanks for your comments!


> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?

Thanks for pointing it out. I think only DML statements would be executed via the SQL Driver, so interactive queries such as SELECT would not be covered.
I'll add the scope to the FLIP.
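To make the intended scope more concrete, here is a rough sketch. It assumes a naive keyword check on each statement. The class `DmlScope` and its list of prefixes are hypothetical and are neither part of the FLIP nor Flink's API. A real implementation would rely on Flink's own SQL parser:

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical illustration of the proposed scope: only DML statements would be
 * submitted through the SQL Driver, while queries such as SELECT would not.
 * The keyword check is deliberately naive; it is not how Flink parses SQL.
 */
public final class DmlScope {

    // Leading keywords treated as DML in this sketch only (INSERT INTO / INSERT OVERWRITE,
    // EXECUTE STATEMENT SET, and the SQL client's BEGIN STATEMENT SET).
    private static final List<String> DML_PREFIXES =
            List.of("INSERT ", "EXECUTE STATEMENT SET", "BEGIN STATEMENT SET");

    public static boolean isDml(String statement) {
        // Collapse whitespace (including newlines) and ignore case before matching.
        String normalized = statement.strip().replaceAll("\\s+", " ").toUpperCase(Locale.ROOT);
        for (String prefix : DML_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> statements = List.of(
                "INSERT INTO sink SELECT * FROM src",
                "SELECT * FROM src",
                "EXECUTE STATEMENT SET BEGIN INSERT INTO sink SELECT * FROM src; END;");
        for (String statement : statements) {
            System.out.println((isDml(statement) ? "SQL Driver: " : "not in scope: ") + statement);
        }
    }
}
```

Under this split, the INSERT and the statement set would go through the SQL Driver, while the SELECT would not be handled by it.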

> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily but for K8s, things can be more
> complicated as Shengkai said.


Your input is very informative. I’m thinking about using web submission,
but it requires exposing the JobManager port, which could also be a problem
on K8s.

Another approach is to explicitly require distributed storage to ship files,
but we may need a new deployment executor for that.

What do you think of these two approaches?

> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields like
> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> may be worthwhile to add more details about the serialization part.

I agree, that’s a missing part. But if we use the ExecNodeGraph as Shengkai
mentioned, would that eliminate the need to serialize SessionState?
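For reference, one common way to handle a non-serializable member like the user class loader is to ship only a serializable description of it (here, jar URLs) and rebuild it on the receiving side. The sketch below assumes that approach; `SessionSnapshot` is a hypothetical, much-simplified stand-in and not Flink's SessionState or ResourceManager:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for session state shipped to the SQL Driver (NOT Flink's SessionState).
 * The class loader is transient and rebuilt from jar URLs after deserialization.
 */
public final class SessionSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    // Serializable description of the user resources (e.g. UDF jars).
    private final List<String> jarUrls;
    // java.lang.ClassLoader is not Serializable, so the loader itself is never shipped.
    private transient URLClassLoader userClassLoader;

    public SessionSnapshot(List<String> jarUrls) {
        this.jarUrls = new ArrayList<>(jarUrls);
    }

    public List<String> jarUrls() {
        return jarUrls;
    }

    /** Lazily (re)creates the user class loader, e.g. on the JobManager after deserialization. */
    public URLClassLoader classLoader() {
        if (userClassLoader == null) {
            try {
                URL[] urls = new URL[jarUrls.size()];
                for (int i = 0; i < urls.length; i++) {
                    urls[i] = URI.create(jarUrls.get(i)).toURL();
                }
                userClassLoader = new URLClassLoader(urls, SessionSnapshot.class.getClassLoader());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return userClassLoader;
    }

    /** Java-serializes the snapshot and reads it back, simulating shipping it to the cluster. */
    public static SessionSnapshot roundTrip(SessionSnapshot snapshot) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(snapshot);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SessionSnapshot) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SessionSnapshot client = new SessionSnapshot(List.of("file:///tmp/udf.jar"));
        client.classLoader(); // the loader exists on the client side...
        SessionSnapshot driver = roundTrip(client); // ...but is not part of the serialized form
        System.out.println(driver.jarUrls() + " -> " + driver.classLoader().getURLs().length + " URL(s)");
    }
}
```

The trade-off is that everything the rebuilt loader depends on (the jar URLs here) must be reachable from the JobManager, which ties back to the file-shipping question above.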

Best,
Paul Lam

> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
> 
> Thanks Paul for the proposal! I believe it would be very useful for Flink
> users.
> After reading the FLIP, I have some questions:
> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?
> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily but for K8s, things can be more
> complicated as Shengkai said. I have implemented a simple POC
> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
> based on the SQL client before (i.e., consider the SQL client, which supports
> executing a SQL file, as the SQL driver in this FLIP). One problem I have
> met is how to ship SQL files (or the JobGraph) to the K8s side. Without
> such support, users have to modify the initContainer or rebuild a new K8s
> image every time to fetch the SQL file. Like the Flink K8s operator, one
> workaround is to use the Flink config (transforming the SQL file into an
> escaped string, as Weihua mentioned), which will be converted to a
> ConfigMap, but K8s limits the size of a ConfigMap (no larger than 1 MiB
> <https://kubernetes.io/docs/concepts/configuration/configmap/>). I'm not sure
> if we have better solutions.
> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields like
> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
> may be worthwhile to add more details about the serialization part.
> 
> Best,
> Biao Geng
> 
> On Wed, May 31, 2023 at 11:49, Paul Lam <paullin3...@gmail.com> wrote:
> 
>> Hi Weihua,
>> 
>> Thanks a lot for your input! Please see my comments inline.
>> 
>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not
>>> strong, the SQLDriver is fine for me)
>> 
>> I’ve thought about SQL Runner but picked SQL Driver for the following
>> reasons, FYI:
>> 
>> 1. We already have a PythonDriver doing the same job for PyFlink [1].
>> 2. A Flink program's main class is sort of like a driver in JDBC, which
>>    translates SQL into database-specific languages.
>> 
>> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>> 
>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>> prepare a SQL file in an image for Kubernetes application mode, which
>>> may be a bit cumbersome.
>> 
>> Do you mean passing the SQL string as a configuration option or as a program argument?
>> 
>> I think it might be convenient for testing purposes, but it's not
>> recommended for production, because Flink SQL can be complicated and
>> involve lots of characters that need to be escaped.
>> 
>> WDYT?
>> 
>>> - I noticed that we don't specify the SQLDriver jar in the
>>> "run-application" command. Does that mean we need to perform automatic
>>> detection in Flink?
>> 
>> Yes! It’s like running a PyFlink job with the following command:
>> 
>> ```
>> ./bin/flink run \
>>      --pyModule table.word_count \
>>      --pyFiles examples/python/table
>> ```
>> 
>> The CLI determines whether it’s a SQL job and, if so, applies the SQL Driver
>> automatically.
>> 
>> 
>> [1]
>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>> 
>> Best,
>> Paul Lam
>> 
>>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>>> 
>>> Thanks Paul for the proposal.
>>> 
>>> +1 for this. It is valuable in improving ease of use.
>>> 
>>> I have a few questions.
>>> - Is SQLRunner the better name? We use this to run a SQL Job. (Not
>>> strong, the SQLDriver is fine for me)
>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>> prepare a SQL file in an image for Kubernetes application mode, which
>>> may be a bit cumbersome.
>>> - I noticed that we don't specify the SQLDriver jar in the
>>> "run-application" command. Does that mean we need to perform automatic
>>> detection in Flink?
>>> 
>>> 
>>> Best,
>>> Weihua
>>> 
>>> 
>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
>>> 
>>>> Hi team,
>>>> 
>>>> I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
>>>> driver as the default main class for Flink SQL jobs.
>>>> 
>>>> Currently, Flink SQL can be executed out of the box either via the SQL
>>>> Client/Gateway or embedded in a Flink Java/Python program.
>>>> 
>>>> However, each option has its drawback:
>>>> 
>>>> - SQL Client/Gateway doesn’t support the application deployment mode [2]
>>>> - A Flink Java/Python program requires the extra work of writing a non-SQL
>>>>   program
>>>> 
>>>> Therefore, I propose adding a SQL driver to act as the default main class
>>>> for SQL jobs.
>>>> Please see the FLIP docs for details and feel free to comment. Thanks!
>>>> 
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>> 
>>>> Best,
>>>> Paul Lam
>> 
>>
