Thanks Paul for your reply. SQLDriver looks good to me.
2. Do you mean passing the SQL string as a configuration value or as a program argument? I brought this up because we were unable to pass the SQL file to Flink in Kubernetes mode. DataStream/Python users need to prepare their own images with the jars and dependencies, but SQL users can run different SQL queries with a common image if there are no extra UDF requirements. It would be great if the SQL query were not bound to the image. Passing the SQL as a string is one way to decouple them, but as you mentioned, it's not easy to pass complex SQL that way.

> use web submission

AFAIK, we cannot use web submission in Application mode. Please correct me if I'm wrong.

Best,
Weihua

On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com> wrote:

> Hi Biao,
>
> Thanks for your comments!
>
> > 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in Application mode? More specifically, if we use SQL client/gateway to execute some interactive SQLs like a SELECT query, can we ask Flink to use Application mode to execute those queries after this FLIP?
>
> Thanks for pointing it out. I think only DMLs would be executed via SQL Driver. I'll add the scope to the FLIP.
>
> > 2. Deployment: I believe in YARN mode, the implementation is trivial as we can ship files via YARN's tool easily, but for K8s, things can be more complicated as Shengkai said.
>
> Your input is very informative. I'm thinking about using web submission, but it requires exposing the JobManager port, which could also be a problem on K8s.
>
> Another approach is to explicitly require a distributed storage to ship the files, but we may need a new deployment executor for that.
>
> What do you think of these two approaches?
>
> > 3. Serialization of SessionState: in SessionState, there are some unserializable fields like org.apache.flink.table.resource.ResourceManager#userClassLoader. It may be worthwhile to add more details about the serialization part.
>
> I agree. That's a missing part. But if we use ExecNodeGraph as Shengkai mentioned, do we eliminate the need for serialization of SessionState?
>
> Best,
> Paul Lam
>
> > On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
> >
> > Thanks Paul for the proposal! I believe it would be very useful for Flink users.
> > After reading the FLIP, I have some questions:
> > 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in Application mode? More specifically, if we use SQL client/gateway to execute some interactive SQLs like a SELECT query, can we ask Flink to use Application mode to execute those queries after this FLIP?
> > 2. Deployment: I believe in YARN mode, the implementation is trivial as we can ship files via YARN's tool easily, but for K8s, things can be more complicated as Shengkai said. I have implemented a simple POC <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133> based on the SQL client before (i.e. consider the SQL client, which supports executing a SQL file, as the SQL driver in this FLIP). One problem I have met is how we ship SQL files (or the JobGraph) to the K8s side. Without such support, users have to modify the initContainer or rebuild a new K8s image every time to fetch the SQL file.
> > Like the Flink K8s operator, one workaround is to utilize the Flink config (transforming the SQL file into an escaped string, like Weihua mentioned), which will be converted into a ConfigMap, but K8s has a size limit for ConfigMaps (no larger than 1MB <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not sure if we have better solutions.
> > 3. Serialization of SessionState: in SessionState, there are some unserializable fields like org.apache.flink.table.resource.ResourceManager#userClassLoader. It may be worthwhile to add more details about the serialization part.
> >
> > Best,
> > Biao Geng
> >
> > Paul Lam <paullin3...@gmail.com> wrote on Wed, May 31, 2023 at 11:49:
> >
> >> Hi Weihua,
> >>
> >> Thanks a lot for your input! Please see my comments inline.
> >>
> >>> - Is SQLRunner a better name? We use this to run a SQL job. (Not a strong opinion, SQLDriver is fine for me.)
> >>
> >> I've thought about SQL Runner but picked SQL Driver for the following reasons, FYI:
> >>
> >> 1. I have a PythonDriver doing the same job for PyFlink [1].
> >> 2. A Flink program's main class is sort of like a Driver in JDBC, which translates SQL into database-specific languages.
> >>
> >> In general, I'm +1 for SQL Driver and +0 for SQL Runner.
> >>
> >>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare a SQL file in an image for Kubernetes application mode, which may be a bit cumbersome.
> >>
> >> Do you mean passing the SQL string as a configuration value or as a program argument?
> >>
> >> I thought it might be convenient for testing purposes, but not recommended for production, because Flink SQL could be complicated and involve lots of characters that need escaping.
> >>
> >> WDYT?
> >>
> >>> - I noticed that we don't specify the SQLDriver jar in the "run-application" command. Does that mean we need to perform automatic detection in Flink?
> >>
> >> Yes! It's like running a PyFlink job with the following command:
> >>
> >> ```
> >> ./bin/flink run \
> >>   --pyModule table.word_count \
> >>   --pyFiles examples/python/table
> >> ```
> >>
> >> The CLI determines whether it's a SQL job, and if so, applies the SQL Driver automatically.
> >>
> >> [1] https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
> >>>
> >>> Thanks Paul for the proposal.
> >>>
> >>> +1 for this. It is valuable in improving ease of use.
> >>>
> >>> I have a few questions.
> >>> - Is SQLRunner a better name? We use this to run a SQL job. (Not a strong opinion, SQLDriver is fine for me.)
> >>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to prepare a SQL file in an image for Kubernetes application mode, which may be a bit cumbersome.
> >>> - I noticed that we don't specify the SQLDriver jar in the "run-application" command. Does that mean we need to perform automatic detection in Flink?
> >>>
> >>> Best,
> >>> Weihua
> >>>
> >>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
> >>>
> >>>> Hi team,
> >>>>
> >>>> I'd like to start a discussion about FLIP-316 [1], which introduces a SQL driver as the default main class for Flink SQL jobs.
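For illustration only, here is a minimal sketch of what such a SQL driver's entry point could look like: a main class that reads a SQL file passed as a program argument and executes its statements through the Table API. The class name, argument handling, and naive statement splitting are assumptions made for this sketch, not details taken from the FLIP.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

/** Hypothetical SQL driver sketch: executes the statements of a SQL file passed as args[0]. */
public final class SqlDriverSketch {

    public static void main(String[] args) throws Exception {
        // The SQL file is expected as the first program argument; how it gets shipped
        // to the cluster (YARN tooling, ConfigMap, DFS) is the open question in this thread.
        String script = new String(Files.readAllBytes(Paths.get(args[0])));

        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Naive split on ';' for illustration only; a real driver would need proper SQL
        // parsing to handle semicolons inside string literals and comments.
        for (String statement : script.split(";")) {
            if (!statement.trim().isEmpty()) {
                tableEnv.executeSql(statement.trim());
            }
        }
    }
}
```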
> >>>>
> >>>> Currently, Flink SQL can be executed out of the box either via the SQL Client/Gateway or embedded in a Flink Java/Python program.
> >>>>
> >>>> However, each approach has its drawback:
> >>>>
> >>>> - The SQL Client/Gateway doesn't support the application deployment mode [2]
> >>>> - A Flink Java/Python program requires extra work to write a non-SQL program
> >>>>
> >>>> Therefore, I propose adding a SQL driver to act as the default main class for SQL jobs. Please see the FLIP docs for details and feel free to comment. Thanks!
> >>>>
> >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> >>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
> >>>>
> >>>> Best,
> >>>> Paul Lam
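As context for the second drawback quoted above: without a SQL driver, running even a pure-SQL pipeline in application mode means writing and packaging a Java program like the following, whose only job is to hand hard-coded SQL to the Table API. The table definitions are made up for illustration; only the Table API calls themselves are the standard Flink API.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

/** Status-quo boilerplate: a Java program written only to submit SQL statements. */
public final class EmbeddedSqlJob {

    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // The SQL is baked into the program (and thus into the image), which is exactly
        // the coupling between query and image that Weihua's reply would like to avoid.
        tableEnv.executeSql(
                "CREATE TABLE source_t (word STRING) WITH ('connector' = 'datagen')");
        tableEnv.executeSql(
                "CREATE TABLE sink_t (word STRING) WITH ('connector' = 'blackhole')");
        tableEnv.executeSql("INSERT INTO sink_t SELECT word FROM source_t");
    }
}
```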