Thank you so much, Ferenc. Regarding your observation: the FLIP covers the changes to the FlinkDeployment CR, which defines both Flink Application and Session cluster deployments; hence it is referred to as FlinkDeployment.

Thank you
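To illustrate how a single FlinkDeployment CR covers both modes under the simplified proposal (a Running condition for applications, a Ready condition for session clusters), here is a hedged sketch of what the two status blocks might look like. Only the `conditions` entries are the proposed addition; the `reason`, `message`, and timestamp values are hypothetical placeholders, not values defined by the FLIP or the operator.

```yaml
# Sketch only: conditions are the proposed addition; reason/message/timestamps are hypothetical.
# Application mode: FlinkDeployment with a spec.job section
status:
  jobManagerDeploymentStatus: READY
  jobStatus:
    state: RUNNING
  conditions:
    - type: Running                 # derived from the job status
      status: 'True'
      reason: running
      message: The Job is running
      lastTransitionTime: '2024-11-07T10:00:00Z'
---
# Session mode: FlinkDeployment without a spec.job section
status:
  jobManagerDeploymentStatus: READY
  conditions:
    - type: Ready                   # derived from the JobManager deployment status
      status: 'True'
      reason: ready
      message: The JobManager deployment is ready
      lastTransitionTime: '2024-11-07T10:00:00Z'
```

A FlinkDeployment without a `spec.job` section is deployed as a session cluster, which is why the same CR kind appears in both snippets.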
Regards
Lajith

On Thu, Nov 7, 2024 at 12:46 AM Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:

Hi,

I can help to create a FLIP page from the gdoc, but one thing that I noticed is that under "Session mode" both the text and the code snippets refer to "FlinkDeployment". I believe that should be "FlinkSessionJob".

Best,
Ferenc

On Wednesday, November 6th, 2024 at 17:33, David Radley <david_rad...@uk.ibm.com> wrote:

Hi Lajith,
Yes, I like the simplicity of the current proposal.

Hi Gyula,
The next stage is to assign a FLIP number and move the content of the Google doc into the FLIP wiki. Unfortunately, as we are not committers, we are not authorized to do either of these activities. Are you able to copy this over, or get another committer to do so, so we can get this moving?

Kind regards, David.

From: Lajith Koova lajith...@gmail.com
Date: Monday, 14 October 2024 at 08:52
To: dev@flink.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

Thank you all for the valuable feedback.

Following the procedure outlined on the Flink Improvement Proposals Confluence page [1], we kindly ask the PMC/committers to transfer the content from "Add K8S conditions to CRD's Status" [2] and assign a FLIP number for us, which we will use for voting.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process
[2] https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit?tab=t.0

Thanks
Lajith

On Mon, Sep 23, 2024 at 11:54 PM Gyula Fóra gyula.f...@gmail.com wrote:

Hey!
I think the proposal is now simple enough:
- Running condition for Applications / SessionJobs
- Ready condition for Session clusters

From my side, I think we should formalize this into a FLIP page and start the vote.
The next step to consider is having an independent condition that captures the upgrade process itself (whether a resource is fully upgraded / reconciled).

Cheers,
Gyula

On Mon, Sep 23, 2024 at 12:16 PM David Radley david_rad...@uk.ibm.com wrote:

Hi Lajith,
The updated document is much more detailed and looks good. As you say, the only situation that is not handled currently is when there are multiple Flink jobs running in Application Mode.

As discussed, you are looking to test this situation so we know how it will perform.

When you say “During transition of Job state, there will be only one condition for the Flink Deployment in application mode.”, I am not sure I understand.

* I thought we have 1 condition per Flink job state, so I assume we have one true condition and potentially other historical false ones.
* When you say "during transition", are you thinking of some small time window between states? I am not sure what you are saying here.

Kind regards, David

From: Lajith Koova lajith...@gmail.com
Date: Wednesday, 11 September 2024 at 03:01
To: dev@flink.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

Hi,

Here is the updated proposal doc <https://docs.google.com/document/d/12wlJCL_Vq2KZnABzK7OR7gAd1jZMmo0MxgXQXqtWODs/edit#heading=h.cz8x5nsncuwb>.
Summary:

Session Mode:
Status conditions will be populated with the status of the JobManager.

Application Mode:
1. In application mode, status conditions will be populated with the status of the job running in the cluster.
2. Each Flink job state will have one condition associated with it.
3. During a transition of job state, there will be only one condition for the FlinkDeployment in application mode.
4. If there are multiple jobs in the application, how should they be handled when populating the condition status? Should the condition status contain information about multiple jobs?

Please let me know your inputs and suggestions.

Regards
Lajith

On Fri, Jun 7, 2024 at 10:25 AM Lajith Koova lajith...@gmail.com wrote:

Thank you, Gyula, for the feedback.

Based on the conditions proposed above, we will have two conditions, as below:

status:
  conditions:
    - type: JobReady
      message: The Job is running
      reason: running
      status: 'True'
      lastTransitionTime: ''
    - type: ReconciliationSucceed
      message: The resource deployment is considered to be stable and won’t be rolled back
      reason: stable
      status: 'True'
      lastTransitionTime: ''

Condition JobReady is derived from JobStatus, and condition ReconciliationSucceed is derived from LifecycleState.

Please correct me if I missed anything.
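A hedged sketch of how these two conditions might sit next to the existing status fields they are derived from, using the scenario from Gyula's 30 May message: the spec has reached `STABLE`, but the job has since gone into a restart loop. The `lifecycleState` and `jobStatus.state` fields already exist in the operator's status; the `conditions` entries, their `reason`/`message` values, and the timestamps are illustrative assumptions.

```yaml
# Illustrative FlinkDeployment status (application mode).
# The conditions list is the proposed addition; reason/message/timestamp values are hypothetical.
status:
  lifecycleState: STABLE            # spec fully reconciled, won't be rolled back
  jobStatus:
    state: RESTARTING               # the job has since entered a restart loop
  conditions:
    - type: JobReady                # derived from jobStatus.state
      status: 'False'
      reason: restarting
      message: The Job is restarting
      lastTransitionTime: '2024-06-07T10:15:00Z'
    - type: ReconciliationSucceed   # derived from lifecycleState
      status: 'True'
      reason: stable
      message: The resource deployment is considered to be stable and won't be rolled back
      lastTransitionTime: '2024-06-07T10:00:00Z'
```

Under the usual Kubernetes convention, `lastTransitionTime` changes only when a condition's `status` flips, so a client watching `JobReady` can see when the job stopped running while `ReconciliationSucceed` still reports the completed upgrade.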
Thanks
Lajith K

On Thu, May 30, 2024 at 2:23 PM Gyula Fóra gyula.f...@gmail.com wrote:

David,

The problem is exactly that ResourceLifecycleStates do not correspond to specific job statuses (the JobReady condition) in most cases. Let me give you a concrete example:

ResourceLifecycleState.STABLE means that the app/job defined in the spec has been successfully deployed and was observed running, and this spec is now considered to be stable (it won't be rolled back). Once a resource (FlinkDeployment) has reached the STABLE state, it won't change unless the user changes the spec. At the same time, this doesn't really say anything about job health/readiness at any given future time. 10 minutes later the job can go into an unrecoverable failure loop and never reach a running status, while the ResourceLifecycleState remains STABLE.

This is actually not a problem with the ResourceLifecycleState but more with the understanding of it. It's called ResourceLifecycleState and not JobState exactly because it refers to the upgrade/rollback/suspend etc. lifecycle of the FlinkDeployment/FlinkSessionJob resource and not the underlying Flink job itself.

But this is a crucial detail that we need to consider; otherwise the "Ready" condition that we may create will be practically useless.

This is the reason why @morh...@apache.org and I suggest separating this into at least 2 independent conditions.
One could be UpgradeCompleted/ReconciliationCompleted or something along these lines, computed based on LifecycleState (as described in your proposal but with a different name). The other should be JobReady, which could initially work based on the JobStatus.state field but ideally would be a user-configurable ready condition (such as: job running for at least 10 minutes, running and has taken checkpoints, etc.).

These 2 conditions should be enough to start with and would actually provide tangible value to users. On second thought, we can probably leave out ClusterReady.

Cheers,
Gyula

On Wed, May 29, 2024 at 5:16 PM David Radley <david_rad...@uk.ibm.com> wrote:

Hi Gyula,
Thank you for the quick response and the confirmation that we need a FLIP. I am not an expert in K8s; Lajith will answer in more detail. Some questions I had anyway:

I assume each of the ResourceLifecycleStates does have a corresponding jobReady status. You point out some mistakes in the table, for example that STABLE should be NotReady; thank you. If we put a reason mentioning the stable state, this would help us understand the jobStatus.

I guess jobReady is one perspective that we know is useful (with corrected mappings from ResourceLifecycleState and with reasons). Can I check that the 2 proposed conditions would also be useful additions?
I assume that in your proposal, when jobReady is true, the UpgradeCompleted condition would not be present and ClusterReady would always be true? I know conditions do not need to be orthogonal, but I wanted to check what your thoughts are.

Kind regards, David.

From: Gyula Fóra gyula.f...@gmail.com
Date: Wednesday, 29 May 2024 at 15:28
To: dev@flink.apache.org
Cc: morh...@apache.org
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

Hi David!

This change definitely warrants a FLIP: even if the code change is not huge, there are quite a few implications going forward.

Looping in @morh...@apache.org for this discussion.

I have some questions / suggestions regarding the conditions' meaning and naming.

In your proposal you have:
- Ready (True/False) -> This condition is intended for resources which are fully ready and operational
- Error (True) -> This condition can be used in scenarios where any exception/error occurs during the resource reconcile process

The problem with the above is that the implementation does not reflect this well.
ResourceLifecycleState STABLE/ROLLED_BACK does not actually mean the job is running; it just means that the resource is fully reconciled and will not be rolled back (so the current pending upgrade is completed). This is mainly a fault of the ResourceLifecycleState, as it doesn't capture the job status, but one could argue that it was "designed" this way.

I think we should probably have more condition types to capture the difference:
- JobReady (True/False) -> Flink job is running (basically the job status but with transition time)
- ClusterReady (True/False) -> Session / Application cluster is deployed (basically the JM deployment status but with transition time)
- UpgradeCompleted (True/False) -> Similar to what you call Ready now, which should correspond to the STABLE/ROLLED_BACK states and mostly tracks in-progress CR updates

This is my best idea at the moment. It's not great, as it feels a little redundant with the current status fields, but maybe that's not a problem, or maybe it's a way to eliminate the old fields later?

I am not so sure about the Error status and what it means in practice. Why do we want to track the last error in 2 places? It's already in the status.

What do you think?
Gyula

On Wed, May 29, 2024 at 3:55 PM David Radley <david_rad...@uk.ibm.com> wrote:

Hi,
Thanks, Lajith, for raising this discussion thread under the FLIP title.

To summarise the concerns from the other discussion thread:

“
- I echo Gyula that including some examples and further explanations might ease the reader's work. With the current version, the FLIP is a bit hard to follow.
- Will the usage of Conditions be enabled by default? Or will there be any disadvantages for Flink users? If Conditions with the same type already exist in the Status Conditions

- Do you think we should have clear rules for how these Conditions should be managed, especially when multiple Conditions of the same type are present? For example, a resource has multiple causes for the same condition (e.g., Error due to network and Error due to I/O). Then, overriding the old condition with the new one is not the best approach, no? Please correct me if I misunderstood.
”

I see the Google doc link has been reformatted to match the FLIP template.
To explicitly answer the questions from Jeyhun and Gyula:
- “Will the usage of Conditions be enabled by default?” Yes, but this just makes the status content useful, whereas before it was not.
- In terms of examples, I am not sure what you would like to see; the table Lajith provided shows the status for various ResourceLifecycleStates. How the operator gets into these states is the current behaviour. The change just shows the appropriate corresponding high-level status, which could be shown in user interfaces.
- “Will there be any disadvantages for Flink users?” None; there is just more information in the status. Without it, it is more difficult to work out the status of the job.
- Multiple conditions question: the status shows whether the job is ready or not, so as long as the last condition is the one that is shown, all is as expected. I don’t think this needs rules for precedence and the like.
- The condition’s Reason is going to be more specific.

Gyula and Jeyhun, is the Google doc clear enough for you now? Do you feel your feedback has been addressed? Lajith and I are happy to provide more details.

I wonder whether this change is big enough to warrant a FLIP, as it is so small.
We could do this in an issue. WDYT?

Kind regards, David.

From: Lajith Koova lajith...@gmail.com
Date: Wednesday, 29 May 2024 at 13:41
To: dev@flink.apache.org
Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD

Hello,

Discussion thread here: https://lists.apache.org/thread/dvy8w17pyjv68c3t962w49frl9odoz4z to discuss a proposal to add a Conditions field to the CR status of FlinkDeployment and FlinkSessionJob.

Note: Starting this new thread as the discussion thread title has been modified to follow the FLIP process.

Thank you.

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: Building C, IBM Hursley Office, Hursley Park Road, Winchester, Hampshire SO21 2JN