From a procedural point of view, we shouldn't make FLIPs sub-tasks of existing FLIPs that have already been voted on or released. That will only cause confusion down the line. A new FLIP should take existing functionality (like FLIP-304) into account and propose how to improve on what the original FLIP introduced, or how it will leverage what's already there.

On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <ramvasu.fl...@gmail.com> wrote:

> Hi Gyula and Ahmed,
>
> I totally agree that there is an overlap in the final goal that both the
> FLIPs are achieving here and in fact FLIP-304 is more comprehensive for job
> failures.
>
> But as a proposal to move forward can we make Swathi's FLIP/JIRA as a sub
> task for FLIP-304 and continue with the PR since the main aim is to get the
> cluster failure pushed to the termination log for K8s based deployments.
> And once it is completed we can work to make FLIP-304 to support job
> failure propagation to termination log?
>
> Regards
> Ram
>
> On Thu, Apr 18, 2024 at 10:07 PM Swathi C <swathi.c.apa...@gmail.com>
> wrote:
>
> > Hi Gyula and Ahmed,
> >
> > Thanks for reviewing this.
> >
> > @gyula.f...@gmail.com <gyula.f...@gmail.com> , currently since our aim as
> > part of this FLIP was only to fail the cluster when job manager/flink has
> > issues such that the cluster would no longer be usable, hence, we proposed
> > only related to that.
> > You're right, that it covers only job main class errors, job manager run time
> > failures, if the Job manager wants to write any metadata to any other
> > system ( ABFS, S3 , ... ) and the job failures will not be covered.
> >
> > FLIP-304 is mainly used to provide Failure enrichers for job failures.
> > Since, this FLIP is mainly for flink Job manager failures, let us know if
> > we can leverage the goodness of both and try to extend FLIP-304 and add our
> > plugin implementation to cover the job level issues ( propagate this info
> > to the /dev/termination-log such that, the container status reports it for
> > flink on K8S by implementing Failure Enricher interface and
> > processFailure() to do this ) and use this FLIP proposal for generic flink
> > cluster (Job manager/cluster ) failures.
> >
> > Regards,
> > Swathi C
> >
> > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
> >
> > > Hi Swathi!
> > > Thanks for the proposal.
> > > Could you please elaborate what this FLIP offers more than FLIP-304[1]?
> > > FLIP-304 proposes a pluggable mechanism for enriching job failures. If I am
> > > not mistaken this proposal looks like a subset of it.
> > >
> > > 1-
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > >
> > > Best Regards
> > > Ahmed Hamdy
> > >
> > >
> > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > >
> > > > Hi Swathi!
> > > >
> > > > Thank you for creating this proposal. I really like the general idea of
> > > > increasing the K8s native observability of Flink job errors.
> > > >
> > > > I took a quick look at your reference PR, the termination log related logic
> > > > is contained completely in the ClusterEntrypoint. What type of errors will
> > > > this actually cover?
> > > >
> > > > To me this seems to cover only:
> > > > - Job main class errors (ie startup errors)
> > > > - JobManager failures
> > > >
> > > > Would regular job errors (that cause only job failover but not JM errors)
> > > > be reported somehow with this plugin?
> > > >
> > > > Thanks
> > > > Gyula
> > > >
> > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C <swathi.c.apa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing Flink
> > > > > Failure Management in Kubernetes with Dynamic Termination Log Integration.
> > > > >
> > > > > https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > > >
> > > > > This FLIP proposes an improvement plugin and focuses mainly on Flink on
> > > > > K8S but can be used as a generic plugin and add further enhancements.
> > > > >
> > > > > Looking forward to everyone's feedback and suggestions. Thank you !!
> > > > >
> > > > > Best Regards,
> > > > > Swathi Chandrashekar