Hi Shammon,

Thank you for your answer and explanation. My latest experiment was a SELECT
query and my assumptions were based on that; INSERT works as described.

Regarding the state of FLIP-295, I just checked out the recently created Jira
tickets [1], and if I can help out with any part, please let me know.

Cheers,
F

[1] https://issues.apache.org/jira/browse/FLINK-32427


------- Original Message -------
On Tuesday, June 27th, 2023 at 13:39, Shammon FY <zjur...@gmail.com> wrote:


> Hi Ferenc,
> 
> If I understand correctly, there will be two types of jobs in sql-gateway:
> `SELECT` and `NON-SELECT` such as `INSERT`.
> 
> 1. `SELECT` jobs need to collect results from the Flink cluster in a
> corresponding SQL Gateway session, and when the session is closed, the
> job should be canceled. These jobs are generally short, OLAP-like queries,
> so I think this behavior is acceptable.
> 
> 2. `NON-SELECT` jobs may be batch or streaming jobs; once they are
> submitted successfully, they won't be killed or canceled even if the
> session or the SQL Gateway is closed. After such jobs are submitted, their
> lifecycle is no longer managed by the SQL Gateway (a rough sketch follows
> below).
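>
> To make the difference concrete, a rough, self-contained sketch (using the
> plain `TableEnvironment` API rather than the gateway itself, with made-up
> table names backed by the built-in datagen/blackhole connectors) of the two
> lifecycles from the client's point of view:
>
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.TableEnvironment;
> import org.apache.flink.table.api.TableResult;
> import org.apache.flink.types.Row;
> import org.apache.flink.util.CloseableIterator;
>
> public class TwoJobTypesSketch {
>     public static void main(String[] args) throws Exception {
>         TableEnvironment tEnv =
>                 TableEnvironment.create(EnvironmentSettings.inStreamingMode());
>
>         // Made-up example tables, just so the sketch runs on its own.
>         tEnv.executeSql(
>                 "CREATE TABLE source_table (id INT) WITH ("
>                         + "'connector' = 'datagen', 'rows-per-second' = '1')");
>         tEnv.executeSql(
>                 "CREATE TABLE sink_table (id INT) WITH ('connector' = 'blackhole')");
>
>         // Case 1: SELECT. The client keeps pulling results, so the query
>         // stays bound to the caller; closing the iterator (think: closing
>         // the session) cancels the job.
>         TableResult select = tEnv.executeSql("SELECT id FROM source_table");
>         try (CloseableIterator<Row> it = select.collect()) {
>             for (int i = 0; i < 3 && it.hasNext(); i++) {
>                 System.out.println(it.next());
>             }
>         } // <- the SELECT job is canceled here
>
>         // Case 2: NON-SELECT. Once submitted to the cluster, the job runs
>         // detached; the client going away does not cancel it.
>         TableResult insert =
>                 tEnv.executeSql("INSERT INTO sink_table SELECT id FROM source_table");
>         insert.getJobClient()
>                 .ifPresent(c -> System.out.println("submitted job " + c.getJobID()));
>     }
> }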
> 
> I don't know if it covers your usage scenario. Could you describe yours for
> us to test and confirm?
> 
> Best,
> Shammon FY
> 
> 
> On Tue, Jun 27, 2023 at 6:43 PM Ferenc Csaky ferenc.cs...@pm.me.invalid wrote:
> 
> > Hi Jark,
> > 
> > In the current implementation, any job submitted via the SQL Gateway has
> > to go through a session, because all the operations are grouped under
> > sessions.
> > 
> > Starting from there, if I close a session, that will close the
> > "SessionContext", which closes the "OperationManager" [1], and the
> > "OperationManager" closes all submitted operations tied to that session
> > [2], which results in closing all the jobs executed in the session.
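> >
> > To illustrate, a toy model of that cascade (not the actual Flink classes,
> > just the shape of the close chain described above):
> >
> > import java.util.ArrayList;
> > import java.util.List;
> >
> > public class CloseCascadeSketch {
> >
> >     // Stands in for a submitted operation; closing it cancels its job.
> >     static class Operation implements AutoCloseable {
> >         final String jobId;
> >         Operation(String jobId) { this.jobId = jobId; }
> >         @Override
> >         public void close() { System.out.println("cancel job " + jobId); }
> >     }
> >
> >     // Stands in for the OperationManager: closing it closes every
> >     // operation submitted in the session.
> >     static class OperationManager implements AutoCloseable {
> >         final List<Operation> submitted = new ArrayList<>();
> >         Operation submit(String jobId) {
> >             Operation op = new Operation(jobId);
> >             submitted.add(op);
> >             return op;
> >         }
> >         @Override
> >         public void close() { submitted.forEach(Operation::close); }
> >     }
> >
> >     // Stands in for the SessionContext: closing the session closes the
> >     // OperationManager, and with it all jobs of the session.
> >     static class SessionContext implements AutoCloseable {
> >         final OperationManager operationManager = new OperationManager();
> >         @Override
> >         public void close() { operationManager.close(); }
> >     }
> >
> >     public static void main(String[] args) {
> >         SessionContext session = new SessionContext();
> >         session.operationManager.submit("job-1");
> >         session.operationManager.submit("job-2");
> >         session.close(); // -> "cancel job job-1", "cancel job job-2"
> >     }
> > }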
> > 
> > Maybe I am missing something, but my experience is that the jobs I submit
> > via the SQL Gateway are getting cleaned up on gateway session close.
> > 
> > WDYT?
> > 
> > Cheers,
> > F
> > 
> > [1]
> > https://github.com/apache/flink/blob/149a5e34c1ed8d8943c901a98c65c70693915811/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/context/SessionContext.java#L204
> > [2]
> > https://github.com/apache/flink/blob/149a5e34c1ed8d8943c901a98c65c70693915811/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/operation/OperationManager.java#L194
> > 
> > ------- Original Message -------
> > On Tuesday, June 27th, 2023 at 04:37, Jark Wu imj...@gmail.com wrote:
> > 
> > > Hi Ferenc,
> > > 
> > > But the job lifecycle isn't tied to the SQL Gateway session.
> > > Even if the session is closed, the running jobs are not affected.
> > > 
> > > Best,
> > > Jark
> > > 
> > > On Tue, 27 Jun 2023 at 04:14, Ferenc Csaky ferenc.cs...@pm.me.invalid wrote:
> > > 
> > > > Hi Jark,
> > > > 
> > > > Thank you for pointing out FLIP-295 about catalog persistence, I was
> > > > not aware of its current state. Although, as far as I can see,
> > > > persistent catalogs are necessary, but not sufficient, to achieve a
> > > > "persistent gateway".
> > > > 
> > > > The current implementation ties the job lifecycle to the SQL Gateway
> > > > session, so if the session gets closed, all the jobs are canceled.
> > > > Addressing that would be the next step, I think. Is there any work or
> > > > thought regarding this aspect? We are definitely willing to help out
> > > > on this front.
> > > > 
> > > > Cheers,
> > > > F
> > > > 
> > > > ------- Original Message -------
> > > > On Sunday, June 25th, 2023 at 06:23, Jark Wu imj...@gmail.com wrote:
> > > > 
> > > > > Hi Ferenc,
> > > > > 
> > > > > Making the SQL Gateway an easy-to-use platform infrastructure for
> > > > > Flink SQL is one of the important roadmap items [1].
> > > > > 
> > > > > The persistence ability of the SQL Gateway is a major piece of work
> > > > > in the 1.18 release. One of the persistence demands is that the
> > > > > registered catalogs are currently kept only in memory and are lost
> > > > > when the Gateway restarts. There is an accepted FLIP (FLIP-295) [2]
> > > > > targeting this issue, which will allow the Gateway to persist the
> > > > > registered catalog information into files or databases.
> > > > > 
> > > > > I'm not sure whether this is something you are looking for?
> > > > > 
> > > > > Best,
> > > > > Jark
> > > > > 
> > > > > [2]:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-295%3A+Support+lazy+initialization+of+catalogs+and+persistence+of+catalog+configurations
> > > > >
> > > > > On Fri, 23 Jun 2023 at 00:25, Ferenc Csaky ferenc.cs...@pm.me.invalid wrote:
> > > > > 
> > > > > > Hello devs,
> > > > > > 
> > > > > > I would like to open a discussion about persistence possibilities
> > > > > > for the SQL Gateway. At Cloudera, we are happy to see the work
> > > > > > already done on this project and are looking for ways to utilize
> > > > > > it on our platform as well, but currently it lacks some features
> > > > > > that would be essential in our case, and those are areas where we
> > > > > > could help out.
> > > > > > 
> > > > > > I am not sure whether any thought has gone into gateway
> > > > > > persistence specifics already, and this feature could be
> > > > > > implemented in fundamentally different ways, so I think the first
> > > > > > step could be to agree on the basics.
> > > > > > 
> > > > > > First, in my opinion, persistence should be an optional feature
> > > > > > of the gateway that can be enabled if desired. There are a lot of
> > > > > > implementation details to decide, but there are some major
> > > > > > directions to follow:
> > > > > > 
> > > > > > - Utilize the Hive catalog: The Hive catalog can already be used
> > > > > > to have persistent meta-objects, so the crucial thing that would
> > > > > > be missing in this case is other catalogs. Personally, I would not
> > > > > > pursue this option, because in my opinion it would limit the
> > > > > > usability of this feature too much.
> > > > > > - Serialize the session as is: Save the whole session (or its
> > > > > > context) [1] as is to durable storage, so it can be kept and
> > > > > > picked up again.
> > > > > > - Serialize the required elements (catalogs, tables, functions,
> > > > > > etc.), not necessarily as a whole: The main point here would be to
> > > > > > serialize a different object, so the persistent data will not be
> > > > > > that sensitive to changes of the session (or its context). There
> > > > > > can be numerous factors here, like trying to keep the model close
> > > > > > to the session itself, so the boilerplate required for the mapping
> > > > > > stays minimal, or focusing on saving only what is actually
> > > > > > necessary, making the persistent storage more portable (a rough
> > > > > > sketch of this direction follows below).
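> > > > > >
> > > > > > To make the third direction a bit more concrete, a deliberately
> > > > > > simplified sketch (class name, file format, and the example
> > > > > > options are all made up): persist only the catalog configuration
> > > > > > (name -> options) instead of the live session objects, so the
> > > > > > stored data does not depend on internal session classes.
> > > > > >
> > > > > > import java.io.IOException;
> > > > > > import java.io.Reader;
> > > > > > import java.io.Writer;
> > > > > > import java.nio.file.Files;
> > > > > > import java.nio.file.Path;
> > > > > > import java.util.LinkedHashMap;
> > > > > > import java.util.Map;
> > > > > > import java.util.Properties;
> > > > > >
> > > > > > public class CatalogStoreSketch {
> > > > > >
> > > > > >     // One properties file per catalog, e.g. "my_catalog.properties".
> > > > > >     static void saveCatalog(Path dir, String name, Map<String, String> options)
> > > > > >             throws IOException {
> > > > > >         Properties props = new Properties();
> > > > > >         props.putAll(options);
> > > > > >         try (Writer out = Files.newBufferedWriter(dir.resolve(name + ".properties"))) {
> > > > > >             props.store(out, "catalog: " + name);
> > > > > >         }
> > > > > >     }
> > > > > >
> > > > > >     static Map<String, String> loadCatalog(Path dir, String name) throws IOException {
> > > > > >         Properties props = new Properties();
> > > > > >         try (Reader in = Files.newBufferedReader(dir.resolve(name + ".properties"))) {
> > > > > >             props.load(in);
> > > > > >         }
> > > > > >         Map<String, String> options = new LinkedHashMap<>();
> > > > > >         props.forEach((k, v) -> options.put(k.toString(), v.toString()));
> > > > > >         return options;
> > > > > >     }
> > > > > >
> > > > > >     public static void main(String[] args) throws IOException {
> > > > > >         Path dir = Files.createTempDirectory("catalog-store");
> > > > > >         Map<String, String> options = new LinkedHashMap<>();
> > > > > >         options.put("type", "jdbc");
> > > > > >         options.put("base-url", "jdbc:postgresql://example:5432");
> > > > > >         saveCatalog(dir, "my_catalog", options);
> > > > > >         System.out.println(loadCatalog(dir, "my_catalog"));
> > > > > >     }
> > > > > > }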
> > > > > > 
> > > > > > WDYT?
> > > > > > 
> > > > > > Cheers,
> > > > > > F
> > > > > > 
> > > > > > [1]
> > > > > > https://github.com/apache/flink/blob/master/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/service/session/Session.java
