Hey! My point regarding the credentials wasn't about lazy/non-lazy catalog initialization. The problem is that JM may run in an env where you don't necessarily want to have the same credentials at all.
> I agree we need 2 different runners or designs here. But I don't think we
> should only support the json plan in the sql-gateway, because the json plan
> is much more limited than a script, even though we support shipping the
> temporary object definitions:

It seems like there is a lot of overlapping / conflicting design here with
FLIP-316. It would be great to have non-overlapping scopes for these FLIPs,
because it's going to be strange to implement FLIP-480 first and then
completely separate logic in FLIP-316 for the same thing.

I think this FLIP (FLIP-480) should not cover the SQL Gateway and probably
should not change it in any way. The gateway-related changes should be
consolidated in FLIP-316 and implemented as part of that for Flink 2.0.

Cheers,
Gyula

On Mon, Nov 11, 2024 at 4:26 AM Shengkai Fang <fskm...@gmail.com> wrote:

> Hi, Gyula.
>
> Thanks a lot for your response.
>
> > In a production environment the gateway would hold the credentials to the
> > different catalogs and may even contain temporary tables etc.
>
> In FLIP-295[1], the SQL Gateway already supports using catalog stores to
> initialize catalogs lazily. I think we can fetch the required credentials
> on demand rather than put all credentials into the JobManager.
>
> I agree we need 2 different runners or designs here. But I don't think we
> should only support the json plan in the sql-gateway, because the json plan
> is much more limited than a script, even though we support shipping the
> temporary object definitions:
>
> 1. A json plan can only contain one job, but a script can submit multiple
> jobs. Running multiple jobs in a cluster is useful in batch mode because
> users usually run a simple query as broadcast variables to speed up a later
> job. You can refer to the Spark documentation for more details[2].
> In Flink, users may run ANALYZE TABLE[3] to calculate statistics and then
> use the collected statistics to get a better execution plan for a later job.
>
> 2. The json plan does not work for some DMLs. For example, the CREATE TABLE
> AS SELECT syntax doesn't have a valid json plan, because the planner doesn't
> support getting a json plan if the target table is not present.
>
> Many frameworks support compiling scripts in the cluster, e.g. Spark
> (Driver) or Trino (Coordinator). I don't think the Flink JM is limited to
> managing jobs only.
>
> Finally, FLIP-480 does not conflict with FLIP-316, and we can support both
> in the sql-gateway. In our internal implementation, we use the script to
> initialize the table environment (register the required temporary objects)
> and then use the json plan to create the job. I think we can continue our
> discussion about the compiled plan in FLIP-316.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> [2]
> https://medium.com/@ARishi/broadcast-variables-in-spark-work-like-they-sound-c08097b80ac4
> [3]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/analyze/
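For concreteness, Shengkai's first point above (a script that runs ANALYZE TABLE as one job and then a statement that benefits from the collected statistics as a second job) could look roughly like the sketch below. The table names and the batch TableEnvironment setup are assumptions made for the illustration, not anything defined by FLIP-480.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Illustration of a two-job batch script: ANALYZE TABLE collects statistics
// as its own job, and the following INSERT is planned with those statistics.
// The tables "orders" and "daily_revenue" are assumed to be registered.
public class MultiJobScriptExample {

    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Job 1: compute table and column statistics for the source table.
        tEnv.executeSql("ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL COLUMNS");

        // Job 2: the actual query; the planner can now use the collected
        // statistics when it chooses the execution plan.
        tEnv.executeSql(
                        "INSERT INTO daily_revenue "
                                + "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date")
                .await();
    }
}
```

A single JSON plan describes exactly one of these jobs, which is the limitation point 1 is getting at.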
> On Fri, Nov 8, 2024 at 18:11, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> > Hey!
> >
> > I am a bit concerned about the design of the gateway-based submission
> > model.
> >
> > I think it's not a good model to access catalog information on the
> > JobManager side. In a production environment the gateway would hold the
> > credentials to the different catalogs and may even contain temporary
> > tables etc. The JobManager (user environment) would then need to get
> > access to all the catalog credentials and whatnot, and we would also need
> > to ship temporary tables from the user's current gateway session somehow
> > as part of the script.
> >
> > I think we should leverage the table environment compilePlan feature to
> > generate the plan on the gateway side and execute that on the JM; then
> > catalogs wouldn't need to be accessed from the job.
> >
> > In other words, I think we need 2 different runners/designs. The direct
> > app submission should be able to run the script as is, but in the gateway
> > I think we need a runner based on the compiled SQL plan instead.
> >
> > Cheers,
> > Gyula
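The compilePlan split Gyula describes above maps onto the Table API's compiled-plan support. Below is a minimal sketch, assuming a streaming INSERT INTO statement and tables that are already registered in the gateway's session; the method names, the table names, and the gateway/JobManager split are illustrative only, not the actual design of FLIP-480 or FLIP-316.

```java
import org.apache.flink.table.api.CompiledPlan;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.PlanReference;
import org.apache.flink.table.api.TableEnvironment;

public class CompiledPlanHandoffSketch {

    // Gateway side: the session has the catalogs and credentials, so it can
    // plan the statement and hand over only the serialized plan.
    static String compileOnGateway(TableEnvironment gatewaySessionEnv) {
        CompiledPlan plan =
                gatewaySessionEnv.compilePlanSql(
                        "INSERT INTO sink_table SELECT * FROM source_table");
        return plan.asJsonString();
    }

    // JobManager side: no catalog access is needed, because the plan already
    // contains the resolved tables and their connector options.
    static void executeOnJobManager(String planJson) {
        TableEnvironment jmEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        jmEnv.loadPlan(PlanReference.fromJsonString(planJson)).execute();
    }
}
```

This is also where the CREATE TABLE AS SELECT caveat from Shengkai's earlier mail shows up: compiling a plan requires the target table to already exist, so a CTAS statement has no plan to hand over in this fashion.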
> > On Thu, Nov 7, 2024 at 2:49 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >
> > > Hi, everyone.
> > >
> > > The discussion hasn't received any response for a while. I will close
> > > the discussion and start the vote tomorrow if the discussion doesn't
> > > receive any response today. Thanks for all your responses!
> > >
> > > Best,
> > > Shengkai
> > >
> > > On Tue, Nov 5, 2024 at 14:00, Shengkai Fang <fskm...@gmail.com> wrote:
> > >
> > > > Hi, Lincoln. Thanks for your response.
> > > >
> > > > > since both scriptPath and script(statements) can be null, we need to
> > > > > clarify the behavior when both are empty, such as throwing an error
> > > >
> > > > Yes, you are correct. I have updated the FLIP about this. When these
> > > > fields are both empty, the server throws an exception to notify users.
> > > >
> > > > > use a unified name for script vs statements, like 'script'?
> > > >
> > > > Updated.
> > > >
> > > > > Regarding Python UDFs, should we change it to a description of
> > > > > "Additional Python resources," corresponding to "Additional Jar
> > > > > Resources"
> > > >
> > > > Updated.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > > On Tue, Nov 5, 2024 at 11:17, Lincoln Lee <lincoln.8...@gmail.com> wrote:
> > > >
> > > > > Thanks Shengkai for driving this! Overall, it looks good!
> > > > >
> > > > > I have two minor questions:
> > > > >
> > > > > 1. Regarding the interface parameters (including the REST API
> > > > > & Java interfaces), since both scriptPath and script(statements)
> > > > > can be null, we need to clarify the behavior when both are
> > > > > empty, such as throwing an error?
> > > > > Also, use a unified name for script vs statements, like 'script'?
> > > > >
> > > > > 2. Regarding Python UDFs, should we change it to a
> > > > > description of "Additional Python resources," corresponding
> > > > > to "Additional Jar Resources"?
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > > On Tue, Nov 5, 2024 at 10:16, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > >
> > > > > > Hi, Ferenc.
> > > > > >
> > > > > > Thanks for your clarification. We can hard-code these different
> > > > > > options in the sql-gateway module. I have updated the FLIP and the
> > > > > > PoC branch for this part. But I think we should provide a unified
> > > > > > API to ship artifacts to the different deployments.
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > > > > On Mon, Nov 4, 2024 at 21:05, Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:
> > > > > >
> > > > > > > Hi Shengkai,
> > > > > > >
> > > > > > > Thank you for driving this FLIP! I think this is a good way to
> > > > > > > close this gap in the short term until FLIP-316 can be finished.
> > > > > > >
> > > > > > > I would only like to add one thing: YARN has a `yarn.ship-files`
> > > > > > > config option that ships local or DFS files/directories to the
> > > > > > > YARN cluster [1].
> > > > > > >
> > > > > > > Best,
> > > > > > > Ferenc
> > > > > > >
> > > > > > > [1]
> > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/config/
> > > > > > >
> > > > > > > On Monday, November 4th, 2024 at 10:11, Xuyang <xyzhong...@163.com> wrote:
> > > > > > >
> > > > > > > > Hi, Shengkai.
> > > > > > > >
> > > > > > > > Thank you for your answer. I have no further questions.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best!
> > > > > > > > Xuyang
> > > > > > > >
> > > > > > > > At 2024-11-04 10:00:32, "Shengkai Fang" <fskm...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Xuyang. Thanks a lot for your response!
> > > > > > > > >
> > > > > > > > > > Does that mean we will support multiple DMLs, multiple DQLs,
> > > > > > > > > > and mixed DMLs & DQLs in one SQL script?
> > > > > > > > >
> > > > > > > > > According to the doc[1], application mode only supports one
> > > > > > > > > job in HA mode[2]. If users submit multiple jobs, the
> > > > > > > > > dispatcher throws a DuplicateJobSubmissionException to notify
> > > > > > > > > users.
> > > > > > > > >
> > > > > > > > > In non-HA mode, application mode doesn't have a job number
> > > > > > > > > limitation. The SQL driver runs statements one by one, which
> > > > > > > > > is similar to submitting jobs to a session cluster. But just
> > > > > > > > > as the doc says, when any of multiple running jobs in
> > > > > > > > > Application Mode (submitted for example using executeAsync())
> > > > > > > > > gets cancelled, all jobs will be stopped and the JobManager
> > > > > > > > > will shut down.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#application-mode
> > > > > > > > > [2]
> > > > > > > > > https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L218
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Shengkai
> > > > > > > > >
> > > > > > > > > On Thu, Oct 31, 2024 at 17:10, Xuyang <xyzhong...@163.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Shengkai.
> > > > > > > > > >
> > > > > > > > > > Thanks for driving this great work. LGTM overall, I just
> > > > > > > > > > have one question.
> > > > > > > > > >
> > > > > > > > > > IIUC, application mode supports running multiple executions
> > > > > > > > > > in a single `main` function[1]. Does that mean we will
> > > > > > > > > > support multiple DMLs, multiple DQLs, and mixed DMLs & DQLs
> > > > > > > > > > in one SQL script? If yes, can you explain a little about
> > > > > > > > > > how they work?
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#application-mode
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best!
> > > > > > > > > > Xuyang
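The "runs statements one by one" behaviour Shengkai describes above can be pictured as a loop like the one below. This is a hypothetical sketch, not the driver class proposed in FLIP-480; statement splitting, execution mode selection, and error handling are all simplified away.

```java
import java.util.List;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

// Hypothetical sketch of a sequential script runner. In practice the
// execution mode and the statement list would come from the shipped script
// and its configuration.
public class ScriptRunnerSketch {

    public static void run(List<String> statements) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());

        for (String statement : statements) {
            TableResult result = tEnv.executeSql(statement);
            // Print the statement's result ("OK" for DDL, affected row counts
            // for DML, the actual rows for a SELECT). In application mode this
            // output ends up in the JobManager log.
            result.print();
            // A real driver would also wait for submitted jobs to finish
            // before moving on (e.g. via result.await()) and handle failures.
        }
    }
}
```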
> > > > > > > > > > At 2024-10-31 10:18:13, "Ron Liu" <ron9....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Shengkai
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your quick response. It looks good to me.
> > > > > > > > > > >
> > > > > > > > > > > Best
> > > > > > > > > > > Ron
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Oct 31, 2024 at 10:08, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Ron!
> > > > > > > > > > > >
> > > > > > > > > > > > > I noticed that you say this FLIP focuses on supporting
> > > > > > > > > > > > > the deployment of SQL scripts to the application
> > > > > > > > > > > > > cluster, does it mean that it only supports the
> > > > > > > > > > > > > non-interactive gateway mode?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes. This FLIP only supports deploying a script in
> > > > > > > > > > > > non-interactive mode.
> > > > > > > > > > > >
> > > > > > > > > > > > > Whether all SQL commands such as DDL & DML & SELECT
> > > > > > > > > > > > > are supported.
> > > > > > > > > > > >
> > > > > > > > > > > > We support all SQL commands, and the execution results
> > > > > > > > > > > > are visible in the JM log. But an application cluster
> > > > > > > > > > > > has the limitation that only one job is allowed to run
> > > > > > > > > > > > in the dedicated cluster.
> > > > > > > > > > > >
> > > > > > > > > > > > > How to dynamically download the JAR specified by the
> > > > > > > > > > > > > user when submitting the SQL script, and whether it is
> > > > > > > > > > > > > possible to specify a local jar?
> > > > > > > > > > > >
> > > > > > > > > > > > This is a good question. I think it's totally up to the
> > > > > > > > > > > > deployment API. For example, the Kubernetes deployment
> > > > > > > > > > > > provides the option
> > > > > > > > > > > > `kubernetes-artifacts-local-upload-enabled`[1] to upload
> > > > > > > > > > > > the artifact to the DFS, but the YARN deployment doesn't
> > > > > > > > > > > > support shipping artifacts to the DFS in application
> > > > > > > > > > > > mode. If the runtime API can provide a unified
> > > > > > > > > > > > interface, I think we can use that unified API to upload
> > > > > > > > > > > > local artifacts. Alternatively, we can provide a special
> > > > > > > > > > > > service that allows the sql-gateway to support pulling
> > > > > > > > > > > > jars. You can read the future work section for more
> > > > > > > > > > > > details.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-artifacts-local-upload-enabled
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Oct 31, 2024 at 09:30, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi, Feng!
> > > > > > > > > > > > >
> > > > > > > > > > > > > > if only clusterID is available, it may not be very
> > > > > > > > > > > > > > convenient to connect to this application later on.
> > > > > > > > > > > > >
> > > > > > > > > > > > > If FLIP-479 is accepted, I think we can just adapt the
> > > > > > > > > > > > > sql-gateway behaviour to the behaviour that FLIP-479
> > > > > > > > > > > > > mentions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Shengkai
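On the artifact question, the two mechanisms mentioned in the thread (the `yarn.ship-files` option Ferenc points out and the Kubernetes local-upload option Shengkai links to) are plain configuration options. The sketch below shows how a client might set them; the option keys are the ones referenced in the linked configuration docs, and the jar paths and upload target are invented for the example, so verify both against the Flink version you actually deploy.

```java
import org.apache.flink.configuration.Configuration;

// Sketch only: keys and values should be checked against the deployment
// configuration documentation linked in the thread.
public class ArtifactShippingConfigSketch {

    static Configuration forKubernetesApplicationMode() {
        Configuration conf = new Configuration();
        // Extra jars the job needs on its classpath (example path).
        conf.setString("pipeline.jars", "file:///opt/flink/usrlib/my-udfs.jar");
        // Upload local artifacts to a DFS target instead of baking them into
        // the image. The linked doc anchor spells the key with dashes; the
        // config key itself appears to be the dotted form shown here.
        conf.setString("kubernetes.artifacts.local-upload-enabled", "true");
        conf.setString("kubernetes.artifacts.local-upload-target", "s3://my-bucket/flink-artifacts/");
        return conf;
    }

    static Configuration forYarnApplicationMode() {
        Configuration conf = new Configuration();
        // Ship extra local or DFS files/directories to the YARN cluster.
        conf.setString("yarn.ship-files", "/path/to/udf-jars;/path/to/conf-dir");
        return conf;
    }
}
```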