Hi Ferenc

Could you please help to create a FLIP page if everything looks fine from
the above discussion/google doc. Thanks in advance.

Regards
Lajith K

On Thu, Nov 7, 2024 at 12:02 PM Lajith Koova <lajith...@gmail.com> wrote:

> Thank you so much, Ferenc ,  and regarding your observation , FLIP covers
> the changes in FlinkDeployment CR which defines Flink Application and
> Session cluster deployments, hence it is referred to as FlinkDeployment.
> Thank you
>
> Regards
> Lajith
>
> On Thu, Nov 7, 2024 at 12:46 AM Ferenc Csaky <ferenc.cs...@pm.me.invalid>
> wrote:
>
>> Hi,
>>
>> I can help to create a FLIP page, from the gdoc, but one thing
>> that I noteced is under "Session mode" both the text and the code
>> snippets refer to "FlinkDeployment". I believe that should be
>> "FlinkSessionJob".
>>
>> Best,
>> Ferenc
>>
>>
>>
>> On Wednesday, November 6th, 2024 at 17:33, David Radley <
>> david_rad...@uk.ibm.com> wrote:
>>
>> >
>> >
>> > Hi lajith,
>> > Yes I like the simplicity of the current proposal.
>> >
>> > Hi Gyula,
>> > The next stage is to assign a Flip number and move the content of the
>> google doc into the flip wiki. Unfortunately, as we are not committers, we
>> are not authorized to do either of these activities. Are you able to copy
>> this over or get another committer to do this please; so we can get this
>> moving.
>> >
>> > Kind regards, David.
>> >
>> > From: Lajith Koova lajith...@gmail.com
>> >
>> > Date: Monday, 14 October 2024 at 08:52
>> > To: dev@flink.apache.org dev@flink.apache.org
>> >
>> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink
>> CRD
>> > Thank you all for the valuable feedback .
>> >
>> >
>> > Following the procedure outlined on the Flink Improvement Proposal
>> >
>> > Confluence page [1], we kindly ask the PMC/Committers to transfer the
>> >
>> > content from the Add K8S conditions to CRD's Status [2] and assign a
>> >
>> > FLIP Number for us, which we will use for voting.
>> >
>> >
>> > [1]
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process
>> >
>> > [2]
>> >
>> >
>> https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit?tab=t.0
>> >
>> >
>> > Thanks
>> >
>> > Lajith
>> >
>> > On Mon, Sep 23, 2024 at 11:54 PM Gyula Fóra gyula.f...@gmail.com wrote:
>> >
>> > > Hey!
>> > >
>> > > I think the proposal is now simple enough :
>> > > - Running condition for Applications / SessionJobs
>> > > - Ready condition for Session clusters
>> > >
>> > > I think we should formalize this into a Flip page and start the vote
>> on
>> > > this from my side.
>> > > The next step to consider is having an independent condition that
>> captures
>> > > the upgrade process itself (if a resource is fully upgraded /
>> reconciled)
>> > >
>> > > Cheers,
>> > > Gyula
>> > >
>> > > On Mon, Sep 23, 2024 at 12:16 PM David Radley david_rad...@uk.ibm.com
>> > > wrote:
>> > >
>> > > > Hi Lajith,
>> > > > The updated document is much more detailed and looks good. As you
>> say the
>> > > > only situation that is not handled currently is when there are
>> multiple
>> > > > Flink jobs running in Application Mode.
>> > > >
>> > > > As discussed , you are looking to test this situation so we know
>> how it
>> > > > will perform.
>> > > >
>> > > > When you say “During transition of Job state, there will be only one
>> > > > condition for the
>> > > > Flink Deployment in application mode.”. I am not sure I understand.
>> > > >
>> > > > * I thought we have 1 condition per Flink job state, so I assume we
>> > > > have one true condition and potentially other historical false ones.
>> > > > * When you say during transition, are you thinking of some small
>> time
>> > > > window between states. I am not sure what you are saying here.
>> > > >
>> > > > Kind regards , David
>> > > >
>> > > > From: Lajith Koova lajith...@gmail.com
>> > > > Date: Wednesday, 11 September 2024 at 03:01
>> > > > To: dev@flink.apache.org dev@flink.apache.org
>> > > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to
>> Flink
>> > > > CRD
>> > > > Hi,
>> > > >
>> > > > Here is the updated Proposal doc
>> > > > <
>> > >
>> > >
>> https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit#heading=h.cz8x5nsncuwb
>> > >
>> > > > .
>> > > >
>> > > > *Summary : *
>> > > >
>> > > > Session Mode:
>> > > >
>> > > > Status conditions will be populated with status of Job manager.
>> > > >
>> > > > Application Mode:
>> > > >
>> > > > 1. In application mode , status conditions will be populated with
>> status
>> > > > of
>> > > > Job running in the cluster.
>> > > >
>> > > > 2. Each Flink Job state will have one condition associated with.
>> > > >
>> > > > 3. During transition of Job state, there will be only one condition
>> for
>> > > > the
>> > > > Flink Deployment in application mode.
>> > > >
>> > > > 4. If there are multiple Jobs in application, how to handle them in
>> > > > populating the condition status?. does condition status should
>> contain
>> > > > information about multiple jobs?.
>> > > >
>> > > > Please let me know your inputs and suggestions.
>> > > >
>> > > > Regards
>> > > >
>> > > > Lajith
>> > > >
>> > > > On Fri, Jun 7, 2024 at 10:25 AM Lajith Koova lajith...@gmail.com
>> > > > wrote:
>> > > >
>> > > > > Thank you Gyula for the feedback.
>> > > > >
>> > > > > From the above proposed conditions, so will be having two
>> conditions
>> > > > > as
>> > > > > below
>> > > > >
>> > > > > status:
>> > > > > conditions:
>> > > > > - type: JobReady
>> > > > > message: The Job is running
>> > > > > reason: running
>> > > > > status: 'True'
>> > > > > lastTransitionTime: ''
>> > > > > - type: ReconciliationSucceed
>> > > > > message: The resource deployment is considered to be stable and
>> won’t
>> > > > > be
>> > > > > rolled back
>> > > > > reason: stable
>> > > > > status: 'True'
>> > > > > lastTransitionTime: ''
>> > > > >
>> > > > > Condition JobReady is derived from JobStatus and Condition
>> > > > > ReconciliationSucceed
>> > > > > derived from LifecycleState.
>> > > > >
>> > > > > Please correct me if I missed anything.
>> > > > >
>> > > > > Thanks
>> > > > > Lajith K
>> > > > >
>> > > > > On Thu, May 30, 2024 at 2:23 PM Gyula Fóra gyula.f...@gmail.com
>> > > > > wrote:
>> > > > >
>> > > > > > David,
>> > > > > >
>> > > > > > The problem is exactly that ResourceLifecycleStates do not
>> correspond
>> > > > > > to
>> > > > > > specific Job statuses (JobReady condition) in most cases. Let
>> me give
>> > > > > > you
>> > > > > > a
>> > > > > > concrete example:
>> > > > > >
>> > > > > > ResourceLifecycleState.STABLE means that app/job defined in the
>> spec
>> > > > > > has
>> > > > > > been successfully deployed and was observed running, and this
>> spec is
>> > > > > > now
>> > > > > > considered to be stable (won't be rolled back). Once a resource
>> > > > > > (FlinkDeployment) reached STABLE state, it won't change unless
>> the
>> > > > > > user
>> > > > > > changes the spec. At the same time, this doesn't really say
>> anything
>> > > > > > about
>> > > > > > job health/readiness at any given future time. 10 minutes later
>> the
>> > > > > > job
>> > > > > > can
>> > > > > > go in an unrecoverable failure loop and never reach a running
>> status,
>> > > > > > the
>> > > > > > ResourceLifecycleState will remain STABLE.
>> > > > > >
>> > > > > > This is actually not a problem with the ResourceLifecycleState
>> but
>> > > > > > more
>> > > > > > with the understanding of it. It's called
>> ResourceLifecycleState and
>> > > > > > not
>> > > > > > JobState exactly because it refers to the
>> upgrade/rollback/suspend etc
>> > > > > > lifecycle of the FlinkDeployment/FlinkSessionJob resource and
>> not the
>> > > > > > underlying flink job itself.
>> > > > > >
>> > > > > > But this is a crucial detail here that we need to consider
>> otherwise
>> > > > > > the
>> > > > > > "Ready" condition that we may create will be practically
>> useless.
>> > > > > >
>> > > > > > This is the reason why @morh...@apache.org morh...@apache.org
>> and
>> > > > > > I suggest separating this to at least 2 independent conditions.
>> One
>> > > > > > could
>> > > > > > be the UpgradeCompleted/ReconciliationCompleted or something
>> along
>> > > > > > these
>> > > > > > lines computed based on LifecycleState (as described in your
>> proposal
>> > > > > > but
>> > > > > > with a different name). The other should be JobReady which could
>> > > > > > initially
>> > > > > > work based on the JobStatus.state field but ideally would be
>> user
>> > > > > > configurable ready condition such as (job running at least 10
>> minutes,
>> > > > > > running and have taken checkpoints etcetc).
>> > > > > >
>> > > > > > These 2 conditions should be enough to start with and would
>> actually
>> > > > > > provide a tangible value to users. We can probably leave out
>> > > > > > ClusterReady
>> > > > > > on a second thought.
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Gyula
>> > > > > >
>> > > > > > On Wed, May 29, 2024 at 5:16 PM David Radley <
>> david_rad...@uk.ibm.com
>> > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Gyula,
>> > > > > > > Thank you for the quick response and confirmation we need a
>> Flip. I
>> > > > > > > am
>> > > > > > > not
>> > > > > > > an expert at K8s, Lajith will answer in more detail. Some
>> questions
>> > > > > > > I
>> > > > > > > had
>> > > > > > > anyway:
>> > > > > > >
>> > > > > > > I assume each of the ResourceLifecycleState do have a
>> corresponding
>> > > > > > > jobReady status. You point out some mistakes in the table, for
>> > > > > > > example
>> > > > > > > that
>> > > > > > > STABLE should be NotReady; thankyou. If we put a reason
>> mentioning
>> > > > > > > the
>> > > > > > > stable state, this would help us understand the jobStatus.
>> > > > > > >
>> > > > > > > I guess the jobReady is one perspective that we know is
>> useful (with
>> > > > > > > corrected mappings from ResourceLifecycleState and with
>> reasons).
>> > > > > > > Can I
>> > > > > > > check that the 2 proposed conditions would also be useful
>> > > > > > > additions?
>> > > > > > > I
>> > > > > > > assume that in your proposal when jobReady is true, then
>> > > > > > > UpgradeCompleted
>> > > > > > > condition would not be present and ClusterReady would always
>> be
>> > > > > > > true?
>> > > > > > > I
>> > > > > > > know conditions do not need to be orthogonal, but I wanted to
>> check
>> > > > > > > what
>> > > > > > > your thoughts are.
>> > > > > > >
>> > > > > > > Kind regards, David.
>> > > > > > >
>> > > > > > > From: Gyula Fóra gyula.f...@gmail.com
>> > > > > > > Date: Wednesday, 29 May 2024 at 15:28
>> > > > > > > To: dev@flink.apache.org dev@flink.apache.org
>> > > > > > > Cc: morh...@apache.org morh...@apache.org
>> > > > > > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions
>> to
>> > > > > > > Flink
>> > > > > > > CRD
>> > > > > > > Hi David!
>> > > > > > >
>> > > > > > > This change definitely warrants a FLIP even if the code
>> change is
>> > > > > > > not
>> > > > > > > huge,
>> > > > > > > there are quite some implications going forward.
>> > > > > > >
>> > > > > > > Looping in @morh...@apache.org morh...@apache.org for this
>> > > > > > > discussion.
>> > > > > > >
>> > > > > > > I have some questions / suggestions regarding the condition's
>> > > > > > > meaning
>> > > > > > > and
>> > > > > > > naming.
>> > > > > > >
>> > > > > > > In your proposal you have:
>> > > > > > > - Ready (True/False) -> This condition is intended for
>> resources
>> > > > > > > which
>> > > > > > > are
>> > > > > > > fully ready and operational
>> > > > > > > - Error (True) -> This condition can be used in scenarios
>> where any
>> > > > > > > exception/error during resource reconcile process
>> > > > > > >
>> > > > > > > The problem with the above is that the implementation does
>> not well
>> > > > > > > reflect
>> > > > > > > this. ResourceLifecycleState STABLE/ROLLED_BACK does not
>> actually
>> > > > > > > mean
>> > > > > > > the
>> > > > > > > job is running, it just means that the resource is fully
>> reconciled
>> > > > > > > and
>> > > > > > > it
>> > > > > > > will not be rolled back (so the current pending upgrade is
>> > > > > > > completed).
>> > > > > > > This
>> > > > > > > is mainly a fault of the ResourceLifecycleState as it doesn't
>> > > > > > > capture
>> > > > > > > the
>> > > > > > > job status but one could argue that it was "designed" this
>> way.
>> > > > > > >
>> > > > > > > I think we should probably have more condition types to
>> capture the
>> > > > > > > difference:
>> > > > > > > - JobReady (True/False) -> Flink job is running (Basically job
>> > > > > > > status
>> > > > > > > but
>> > > > > > > with transition time)
>> > > > > > > - ClusterReady (True/False) -> Session / Application cluster
>> is
>> > > > > > > deployed
>> > > > > > > (Basically JM deployment status but with transition time)
>> > > > > > > - UpgradeCompleted (True/False) -> Similar to what you call
>> Ready
>> > > > > > > now
>> > > > > > > which should correspond to the STABLE/ROLLED_BACK states and
>> mostly
>> > > > > > > tracks
>> > > > > > > in-progress CR updates
>> > > > > > >
>> > > > > > > This is my best idea at the moment, not great as it feels a
>> little
>> > > > > > > redundant with the current status fields. But maybe thats not
>> a
>> > > > > > > problem
>> > > > > > > or
>> > > > > > > a way to eliminate the old fields later?
>> > > > > > >
>> > > > > > > I am not so sure of the Error status and what this means in
>> > > > > > > practice.
>> > > > > > > Why
>> > > > > > > do we want to track the last error in 2 places? It's already
>> in the
>> > > > > > > status.
>> > > > > > >
>> > > > > > > What do you think?
>> > > > > > > Gyula
>> > > > > > >
>> > > > > > > On Wed, May 29, 2024 at 3:55 PM David Radley <
>> > > > > > > david_rad...@uk.ibm.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > > Thanks Lajith for raising this discussion thread under the
>> Flip
>> > > > > > > > title.
>> > > > > > > >
>> > > > > > > > To summarise the concerns from the other discussion thread.
>> > > > > > > >
>> > > > > > > > “
>> > > > > > > > - I echo Gyula that including some examples and further
>> > > > > > > > explanations
>> > > > > > > > might
>> > > > > > > > ease reader's work. With the current version, the FLIP is a
>> bit
>> > > > > > > > hard
>> > > > > > > > to
>> > > > > > > > follow. - Will the usage of Conditions be enabled by
>> default? Or
>> > > > > > > > will
>> > > > > > > > there
>> > > > > > > > be any disadvantages for Flink users? If Conditions with
>> the same
>> > > > > > > > type
>> > > > > > > > already exist in the Status Conditions
>> > > > > > > >
>> > > > > > > > - Do you think we should have clear rules about handling
>> rules for
>> > > > > > > > how
>> > > > > > > > these Conditions should be managed, especially when multiple
>> > > > > > > > Conditions
>> > > > > > > > of
>> > > > > > > > the same type are present? For example, resource has
>> multiple
>> > > > > > > > causes
>> > > > > > > > for
>> > > > > > > > the same condition (e.g., Error due to network and Error
>> due to
>> > > > > > > > I/O).
>> > > > > > > > Then,
>> > > > > > > > overriding the old condition with the new one is not the
>> best
>> > > > > > > > approach
>> > > > > > > > no?
>> > > > > > > > Please correct me if I misunderstood.
>> > > > > > > > “
>> > > > > > > >
>> > > > > > > > I see the Google doc link has been reformatted to match the
>> Flip
>> > > > > > > > template.
>> > > > > > > >
>> > > > > > > > To explicitly answer the questions from Jeyhun and Gyula:
>> > > > > > > > - “Will the usage of Conditions be enabled by default?”
>> Yes, but
>> > > > > > > > this
>> > > > > > > > is
>> > > > > > > > just making the status content useful, whereas before it
>> was not
>> > > > > > > > useful.
>> > > > > > > > - in terms of examples, I am not sure what you would like
>> to see,
>> > > > > > > > the
>> > > > > > > > table Lajith provided shows the status for various
>> > > > > > > > ResourceLifecycleStates.
>> > > > > > > > How the operator gets into these states is the current
>> behaviour.
>> > > > > > > > The
>> > > > > > > > change just shows the appropriate corresponding high level
>> status
>> > > > > > > > –
>> > > > > > > > that
>> > > > > > > > could be shown on the User Interfaces.
>> > > > > > > > - “will there be any disadvantages for Flink users?” None ,
>> there
>> > > > > > > > is
>> > > > > > > > just
>> > > > > > > > more information in the status, without this it is more
>> difficult
>> > > > > > > > to
>> > > > > > > > work
>> > > > > > > > out the status of the job.
>> > > > > > > > - Multiple conditions question. The status is showing
>> whether the
>> > > > > > > > job
>> > > > > > > > is
>> > > > > > > > ready or not, so as long as the last condition is the one
>> that is
>> > > > > > > > shown -
>> > > > > > > > all is as expected. I don’t think this needs rules for
>> precedence
>> > > > > > > > and
>> > > > > > > > the
>> > > > > > > > like.
>> > > > > > > > - The condition’s Reason is going to be more specific.
>> > > > > > > >
>> > > > > > > > Gyula and Jeyhun, is the google doc clear enough for you
>> now? Do
>> > > > > > > > you
>> > > > > > > > feel
>> > > > > > > > you feedback has been addressed? Lajith and I are happy to
>> provide
>> > > > > > > > more
>> > > > > > > > details.
>> > > > > > > >
>> > > > > > > > I wonder whether this change is big enough to warrant a
>> Flip, as
>> > > > > > > > it
>> > > > > > > > is so
>> > > > > > > > small. We could do this in an issue. WDYT?
>> > > > > > > >
>> > > > > > > > Kind regards, David.
>> > > > > > > >
>> > > > > > > > From: Lajith Koova lajith...@gmail.com
>> > > > > > > > Date: Wednesday, 29 May 2024 at 13:41
>> > > > > > > > To: dev@flink.apache.org dev@flink.apache.org
>> > > > > > > > Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Add K8S conditions
>> to Flink
>> > > > > > > > CRD
>> > > > > > > > Hello ,
>> > > > > > > >
>> > > > > > > > Discussion thread here:
>> > > > > > > >
>> https://lists.apache.org/thread/dvy8w17pyjv68c3t962w49frl9odoz4z
>> > > > > > > > to
>> > > > > > > > discuss a proposal to add Conditions field in the CR status
>> of
>> > > > > > > > Flink
>> > > > > > > > Deployment and FlinkSessionJob.
>> > > > > > > >
>> > > > > > > > Note : Starting this new thread as discussion thread title
>> has
>> > > > > > > > been
>> > > > > > > > modified to follow the FLIP process.
>> > > > > > > >
>> > > > > > > > Thank you.
>> > > > > > > >
>> > > > > > > > Unless otherwise stated above:
>> > > > > > > >
>> > > > > > > > IBM United Kingdom Limited
>> > > > > > > > Registered in England and Wales with number 741598
>> > > > > > > > Registered office: PO Box 41, North Harbour, Portsmouth,
>> Hants.
>> > > > > > > > PO6
>> > > > > > > > 3AU
>> > > > > > >
>> > > > > > > Unless otherwise stated above:
>> > > > > > >
>> > > > > > > IBM United Kingdom Limited
>> > > > > > > Registered in England and Wales with number 741598
>> > > > > > > Registered office: PO Box 41, North Harbour, Portsmouth,
>> Hants. PO6
>> > > > > > > 3AU
>> > > >
>> > > > Unless otherwise stated above:
>> > > >
>> > > > IBM United Kingdom Limited
>> > > > Registered in England and Wales with number 741598
>> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6
>> 3AU
>> >
>> >
>> > Unless otherwise stated above:
>> >
>> > IBM United Kingdom Limited
>> > Registered in England and Wales with number 741598
>> > Registered office: Building C, IBM Hursley Office, Hursley Park Road,
>> Winchester, Hampshire SO21 2JN
>>
>

Reply via email to