I would prefer a separate FLIP.

On Wed, Apr 24, 2024 at 3:25 PM Swathi C <swathi.c.apa...@gmail.com> wrote:

> Sure Ahmed and Martijn.
> Fetching the Flink job-specific failure and adding this logic to the termination log is definitely a sub-task of the pluggable enricher, as we can leverage the pluggable enricher to achieve this.
> But CRUD-level failures, which are mainly used to notify if the job manager failed, might not use the pluggable enricher. So, let us know whether that needs to be a separate FLIP or whether we can combine it under the pluggable enricher as well (by adding another sub-task)?
>
> Regards,
> Swathi C
>
> On Wed, Apr 24, 2024 at 3:46 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
>
> > Hi,
> > I agree with Martijn. We can reformulate the FLIP to introduce the termination log as a supported pluggable enricher. If you believe the scope of work is a subset (further implementation), we can just add a Jira ticket for it. IMO this will also help with implementation, taking the existing enrichers as a reference.
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Tue, 23 Apr 2024 at 15:23, Martijn Visser <martijnvis...@apache.org> wrote:
> >
> > > From a procedural point of view, we shouldn't make FLIPs sub-tasks for existing FLIPs that have been voted/are released. That will only cause confusion down the line. A new FLIP should take existing functionality (like FLIP-304) into account, and propose how to improve on what that original FLIP has introduced or how you're going to leverage what's already there.
> > >
> > > On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <ramvasu.fl...@gmail.com> wrote:
> > >
> > > > Hi Gyula and Ahmed,
> > > >
> > > > I totally agree that there is an overlap in the final goal that both FLIPs are achieving here, and in fact FLIP-304 is more comprehensive for job failures.
> > > >
> > > > But as a proposal to move forward, can we make Swathi's FLIP/JIRA a sub-task of FLIP-304 and continue with the PR, since the main aim is to get the cluster failure pushed to the termination log for K8s-based deployments? And once it is completed, we can work to make FLIP-304 support job failure propagation to the termination log?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Thu, Apr 18, 2024 at 10:07 PM Swathi C <swathi.c.apa...@gmail.com> wrote:
> > > >
> > > > > Hi Gyula and Ahmed,
> > > > >
> > > > > Thanks for reviewing this.
> > > > >
> > > > > @gyula.f...@gmail.com <gyula.f...@gmail.com>, our aim as part of this FLIP was only to fail the cluster when the job manager/Flink has issues such that the cluster would no longer be usable; hence, we proposed only what relates to that.
> > > > > You're right that it covers only job main class errors, job manager runtime failures, and cases where the job manager wants to write any metadata to any other system (ABFS, S3, ...), and that job failures will not be covered.
> > > > >
> > > > > FLIP-304 is mainly used to provide failure enrichers for job failures.
> > > > > Since this FLIP is mainly for Flink job manager failures, let us know if we can leverage the goodness of both: extend FLIP-304 and add our plugin implementation to cover the job-level issues (propagate this info to /dev/termination-log so that the container status reports it for Flink on K8s, by implementing the Failure Enricher interface and processFailure()), and use this FLIP proposal for generic Flink cluster (job manager/cluster) failures.
> > > > >
> > > > > Regards,
> > > > > Swathi C
> > > > >
> > > > > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
> > > > >
> > > > > > Hi Swathi!
> > > > > > Thanks for the proposal.
> > > > > > Could you please elaborate on what this FLIP offers beyond FLIP-304[1]?
> > > > > > FLIP-304 proposes a pluggable mechanism for enriching job failures. If I am not mistaken, this proposal looks like a subset of it.
> > > > > >
> > > > > > 1- https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > > > > >
> > > > > > Best Regards
> > > > > > Ahmed Hamdy
> > > > > >
> > > > > >
> > > > > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Swathi!
> > > > > > >
> > > > > > > Thank you for creating this proposal. I really like the general idea of increasing the K8s native observability of Flink job errors.
> > > > > > >
> > > > > > > I took a quick look at your reference PR; the termination-log-related logic is contained completely in the ClusterEntrypoint. What type of errors will this actually cover?
> > > > > > >
> > > > > > > To me this seems to cover only:
> > > > > > > - Job main class errors (i.e. startup errors)
> > > > > > > - JobManager failures
> > > > > > >
> > > > > > > Would regular job errors (that cause only job failover but not JM errors) be reported somehow with this plugin?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Gyula
> > > > > > >
> > > > > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C <swathi.c.apa...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing Flink Failure Management in Kubernetes with Dynamic Termination Log Integration.
> > > > > > > >
> > > > > > > > https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > > > > > >
> > > > > > > > This FLIP proposes an improvement plugin and focuses mainly on Flink on K8S but can be used as a generic plugin and add further enhancements.
> > > > > > > >
> > > > > > > > Looking forward to everyone's feedback and suggestions. Thank you !!
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Swathi Chandrashekar