Sure, Ahmed and Martijn.
Capturing Flink job-specific failures and writing them to the
termination log is definitely a sub-task of the pluggable enricher, since
we can leverage the pluggable enricher to achieve this.
However, CRUD-level failures, which are mainly used to signal that the
JobManager has failed, might not go through the pluggable enricher. So,
please let us know whether that should be a separate FLIP or whether we
can combine it under the pluggable enricher as well (by adding another
sub-task).
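For illustration, here is a minimal sketch of the job-level part: an enricher that writes a short failure summary to the termination log. It assumes a simplified stand-in interface, `SimpleFailureEnricher`, loosely modelled on FLIP-304's `FailureEnricher` (declared output keys plus an asynchronous `processFailure()` that returns labels). It is not Flink's actual interface, and `TerminationLogEnricher` and the `terminationLog` label key are hypothetical names. `/dev/termination-log` is the Kubernetes default termination message path, and the kubelet truncates messages longer than 4096 bytes, so the sketch truncates up front.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, simplified stand-in for FLIP-304's FailureEnricher (not Flink's real API). */
interface SimpleFailureEnricher {
    /** Label keys this enricher may emit. */
    Set<String> getOutputKeys();

    /** Inspects a failure asynchronously and returns labels describing it. */
    CompletableFuture<Map<String, String>> processFailure(Throwable cause);
}

/** Hypothetical enricher that writes a one-line failure summary to the termination log. */
class TerminationLogEnricher implements SimpleFailureEnricher {
    /** The kubelet truncates termination messages longer than this. */
    static final int MAX_BYTES = 4096;
    static final String KEY = "terminationLog";

    private final Path logPath;

    TerminationLogEnricher(Path logPath) {
        this.logPath = logPath;
    }

    /** Uses the Kubernetes default terminationMessagePath. */
    TerminationLogEnricher() {
        this(Paths.get("/dev/termination-log"));
    }

    @Override
    public Set<String> getOutputKeys() {
        return Collections.singleton(KEY);
    }

    @Override
    public CompletableFuture<Map<String, String>> processFailure(Throwable cause) {
        return CompletableFuture.supplyAsync(() -> {
            String summary = cause.getClass().getName() + ": " + cause.getMessage();
            byte[] bytes = summary.getBytes(StandardCharsets.UTF_8);
            // Truncate to the kubelet limit ourselves (may split a multi-byte character).
            byte[] truncated = Arrays.copyOf(bytes, Math.min(bytes.length, MAX_BYTES));
            try {
                // Overwrite any previous message so only the latest failure is reported.
                Files.write(logPath, truncated,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.TRUNCATE_EXISTING,
                        StandardOpenOption.WRITE);
                return Collections.singletonMap(KEY, "written");
            } catch (IOException e) {
                // Do not let enrichment itself fail; report the problem as a label instead.
                return Collections.singletonMap(KEY, "failed: " + e.getMessage());
            }
        });
    }
}
```

In a real deployment such an enricher would be loaded through FLIP-304's plugin mechanism and use the default path; the sketch keeps the path injectable so it can be exercised against a temporary file.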

Regards,
Swathi C

On Wed, Apr 24, 2024 at 3:46 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:

> Hi,
> I agree with Martijn. We can reformulate the FLIP to introduce the
> termination log as a supported pluggable enricher. If you believe the
> scope of work is a subset (further implementation), we can just add a
> Jira ticket for it. IMO this will also help the implementation, since the
> existing enrichers can be used as a reference.
> Best Regards
> Ahmed Hamdy
>
>
> On Tue, 23 Apr 2024 at 15:23, Martijn Visser <martijnvis...@apache.org>
> wrote:
>
> > From a procedural point of view, we shouldn't make FLIPs sub-tasks of
> > existing FLIPs that have been voted on or released. That will only cause
> > confusion down the line. A new FLIP should take existing functionality
> > (like FLIP-304) into account, and propose how to improve on what that
> > original FLIP introduced or how you're going to leverage what's already
> > there.
> >
> > On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <
> > ramvasu.fl...@gmail.com> wrote:
> >
> > > Hi Gyula and Ahmed,
> > >
> > > I totally agree that there is an overlap in the final goal that both
> > > FLIPs are trying to achieve, and in fact FLIP-304 is more
> > > comprehensive for job failures.
> > >
> > > But as a proposal to move forward, can we make Swathi's FLIP/JIRA a
> > > sub-task of FLIP-304 and continue with the PR, since the main aim is to
> > > push cluster failures to the termination log for K8s-based
> > > deployments? Once that is completed, we can work on making FLIP-304
> > > support propagating job failures to the termination log.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Apr 18, 2024 at 10:07 PM Swathi C <swathi.c.apa...@gmail.com>
> > > wrote:
> > >
> > > > Hi Gyula and Ahmed,
> > > >
> > > > Thanks for reviewing this.
> > > >
> > > > @gyula.f...@gmail.com <gyula.f...@gmail.com>, since the aim of this
> > > > FLIP was only to fail the cluster when the JobManager/Flink has issues
> > > > that leave the cluster unusable, we proposed only changes related to
> > > > that.
> > > > You're right that it covers only job main class errors, JobManager
> > > > runtime failures, and failures when the JobManager writes metadata to
> > > > another system (ABFS, S3, ...); job failures will not be covered.
> > > >
> > > > FLIP-304 mainly provides failure enrichers for job failures, whereas
> > > > this FLIP mainly targets Flink JobManager failures. Let us know if we
> > > > can leverage the strengths of both: extend FLIP-304 with our plugin
> > > > implementation to cover job-level issues (propagating this info to
> > > > /dev/termination-log so that the container status reports it for
> > > > Flink on K8s, by implementing the FailureEnricher interface and its
> > > > processFailure() method), and use this FLIP proposal for generic
> > > > Flink cluster (JobManager/cluster) failures.
> > > >
> > > > Regards,
> > > > Swathi C
> > > >
> > > > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy <hamdy10...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Swathi!
> > > > > Thanks for the proposal.
> > > > > Could you please elaborate on what this FLIP offers beyond
> > > > > FLIP-304[1]? FLIP-304 proposes a pluggable mechanism for enriching
> > > > > job failures; if I am not mistaken, this proposal looks like a
> > > > > subset of it.
> > > > >
> > > > > 1-
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > > > >
> > > > > Best Regards
> > > > > Ahmed Hamdy
> > > > >
> > > > >
> > > > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra <gyula.f...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Swathi!
> > > > > >
> > > > > > Thank you for creating this proposal. I really like the general
> > > > > > idea of increasing the K8s-native observability of Flink job
> > > > > > errors.
> > > > > >
> > > > > > I took a quick look at your reference PR; the termination-log
> > > > > > logic is contained entirely in the ClusterEntrypoint. What types
> > > > > > of errors will this actually cover?
> > > > > >
> > > > > > To me this seems to cover only:
> > > > > >  - Job main class errors (i.e. startup errors)
> > > > > >  - JobManager failures
> > > > > >
> > > > > > Would regular job errors (those that cause only a job failover,
> > > > > > not JM errors) be reported somehow by this plugin?
> > > > > >
> > > > > > Thanks
> > > > > > Gyula
> > > > > >
> > > > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C <
> > swathi.c.apa...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I would like to start a discussion on FLIP-XXX: [Plugin]
> > > > > > > Enhancing Flink Failure Management in Kubernetes with Dynamic
> > > > > > > Termination Log Integration.
> > > > > > >
> > > > > > > https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > > > > >
> > > > > > > This FLIP proposes an improvement plugin. It focuses mainly on
> > > > > > > Flink on K8s, but it can be used as a generic plugin and extended
> > > > > > > with further enhancements.
> > > > > > >
> > > > > > > Looking forward to everyone's feedback and suggestions. Thank
> > > > > > > you!
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Swathi Chandrashekar