Re: [hackathon] health checks

Andrei Dulvac Wed, 19 Sep 2018 01:38:26 -0700

Hi guys.

So first of all I acknowledge that conceptually there is some overlap - the
concepts of health, readiness, liveness themselves overlap.
When we wrote systemready, we did know of the Sling HCs (at least I did)
and how they're used. And that was one of the reasons why we decided not to
use them.


They're currently used, as Justin put it, for a much broader scope. A
system can fail a HC and it doesn't mean it's not ready. In one of
Bertrand's adapt.to presentations from 2013 [0], a security checklist is
mentioned explicitly - which we use in AEM. It's one of those things that
requires manual input to turn healthy. The docs also mention a lot of
stuff, including using them as server-side JUnit tests [1]. And all those
things are great.

> Now the system readyness framework was mostly created to have something
> on Felix level and the capabilities of the Sling Health Checks weren’t
> known.

Not entirely accurate. We knew of the Sling HCs and initially we wanted to
donate systemready to Sling; but it's definitely good it went into Felix.

> The dependencies of Sling HC to Sling are minimal today already: It’s
> Sling thread pool (a felix pendant or just a plain java one can be used)
> and Sling Scheduler (also this can easily be replaced by the standard java
> mechanism).

In my opinion, that's A LOT. And they're prefixed by "Sling-". Systemready
has two dependencies: javax.servlet and the OSGi API. And it can
technically run on any framework. The deps were another reason why we didn't
use the HCs. Of course, those might grow as it becomes more mature.


> What would make sense is a bridge where a subset of health checks could
> be fed into the readyness framework (i.e. if these X health checks pass,
> the system is considered "ready" and/or "alive").

> (you just create two tags for readiness and liveness each).

These don't seem to contradict each other.
Stefan, did you mean that the SystemReady checks would also become some
tagged HCs or the other way around? That some tagged HCs would be fed into
systemready?
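
To make the second direction concrete, here is a rough sketch of what I
have in mind. It is plain Java and deliberately uses neither the real Sling
HC API nor the systemready API: TaggedCheck and passes() are made-up names,
and the two example checks are invented for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

class TaggedCheckBridge {

    /** Hypothetical stand-in for a health check: a name, some tags, a pass/fail probe. */
    static final class TaggedCheck {
        final String name;
        final Set<String> tags;
        final BooleanSupplier probe;

        TaggedCheck(String name, BooleanSupplier probe, String... tags) {
            this.name = name;
            this.probe = probe;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /**
     * True if every check carrying the tag passes. Checks without the tag are
     * ignored, and a tag that matches no check counts as passing.
     */
    static boolean passes(List<TaggedCheck> checks, String tag) {
        return checks.stream()
                .filter(c -> c.tags.contains(tag))
                .allMatch(c -> c.probe.getAsBoolean());
    }

    public static void main(String[] args) {
        List<TaggedCheck> checks = Arrays.asList(
                new TaggedCheck("repository", () -> true, "readiness", "liveness"),
                // Fails until someone signs it off manually, but is not tagged for readiness.
                new TaggedCheck("securityChecklist", () -> false, "security"));

        System.out.println("ready: " + passes(checks, "readiness"));
        System.out.println("alive: " + passes(checks, "liveness"));
    }
}
```

The point being that a failing check outside the two tags (like the
security checklist) would not affect readiness or liveness at all.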

So I'm game for unifying a bit at the felix level and hopefully we don't go
overboard. I alone just don't have a solution yet that I can say I love
100%.

BTW, Sorry I couldn't make it to the hackathon, it would have been great to
be part of the discussion.

- Andrei




---
[0] https://adapt.to/2013/en/schedule/18_healthcheck.html
[1] https://sling.apache.org/documentation/bundles/sling-health-check-tool.html#health-checks-as-server-side-junit-tests


On Wed, Sep 19, 2018 at 1:15 AM Justin Edelson <jus...@justinedelson.com>
wrote:

> Hi Georg,
> Great. It looks like I misread Stefan's notes as being more dramatic than
> they actually were intended to be :)
>
> Regards,
> Justin
>
> On Tue, Sep 18, 2018 at 4:48 PM Georg Henzler <slin...@ghenzler.de> wrote:
>
> > Hi Justin,
> >
> > there was quite some discussion at adaptTo() around this topic already.
> > So as it stands all requirements to run Sling-based applications in
> > Kubernetes are met already by Sling Health Checks (you just create two
> > tags for readiness and liveness each). HCs were developed from the first
> > day with the goal to have them used by load balancers (and not only
> > manual). Also Sling HCs are more mature in terms of parallel execution,
> > timeout handling, response customizing and special handling like
> > asynchronous checks.
> >
> > Now the system readyness framework was mostly created to have something
> > on Felix level and the capabilities of the Sling Health Checks weren’t
> > known.
> >
> > I do agree that it would make sense to have it on Felix level though
> > (more visible to the non-Sling world, as a low level mechanism maybe best
> > located at the lowest framework level). The dependencies of Sling HC to
> > Sling are minimal today already: It’s Sling thread pool (a felix pendant
> > or just a plain java one can be used) and Sling Scheduler (also this can
> > easily be replaced by the standard java mechanism).
> >
> > > It might make more sense to invert this and identify what the readyness
> > > framework does (mostly in its OOTB checks and servlets)
> > > and merge that functionality into Sling Health Checks and then move
> > > Sling Health Checks (or solid chunks of it) to Felix.
> >
> > This was the intention, but let’s wait for the feedback from Andrei and
> > Christian.
> >
> > -Georg
> >
> > Sent from my iPhone
> >
> > > On 18. Sep 2018, at 16:31, Justin Edelson <jus...@justinedelson.com>
> > > wrote:
> > >
> > > Hi,
> > > After reviewing the presentation, this seems like kind of a stretch to
> > > me. IIUC, the System Readyness Framework is (as its name would suggest)
> > > solely concerned with "readyness" and "liveness" (as seen in the example
> > > use cases on slide 3) and the API is explicitly designed for this
> > > purpose without any opportunity for namespace extension (i.e. you can
> > > extend how "readyness" and "liveness" are determined but you can't add
> > > new categories). Sling Health Checks is concerned with a broader concept
> > > of "health" with no restrictions on namespacing. There are all kinds of
> > > reasons why a system may be considered "ready" but still fails specific
> > > health checks. In other words, I'm doubtful that there is an overlap
> > > here at a framework level. What would make sense is a bridge where a
> > > subset of health checks could be fed into the readyness framework (i.e.
> > > if these X health checks pass, the system is considered "ready" and/or
> > > "alive"). But I'd strongly suggest that the gamut of expression possible
> > > with the health check framework goes far beyond the scope of what the
> > > readyness framework is designed to do. It might make more sense to
> > > invert this and identify what the readyness framework does (mostly in
> > > its OOTB checks and servlets) and merge that functionality into Sling
> > > Health Checks and then move Sling Health Checks (or solid chunks of it)
> > > to Felix.
> > >
> > > Or perhaps I've misunderstood the intention of this email/F2F
> > > discussion. But the way this looks is that we are going to take
> > > something with a decent install base and replace it with something a few
> > > months old and a much smaller functional scope. Just doesn't make sense
> > > to me.
> > >
> > > Regards,
> > > Justin
> > >
> > > On Thu, Sep 13, 2018 at 1:03 PM Stefan Seifert <sseif...@pro-vision.de>
> > > wrote:
> > >
> > >> - currently there is some overlap between sling health checks and the
> > >> new felix system readyness framework presented [1]
> > >> - the idea is to bring this together within felix
> > >> - provide a facade for the sling healthcheck API for backwards
> > >> compatibility
> > >>
> > >> stefan
> > >>
> > >> [1]
> > >> https://adapt.to/2018/en/schedule/system-readiness-framework-makes-deployment-automation-a-breeze.html
