Thanks for the important clarification - reading up on UCX, it makes sense to
implement a health check abstraction. A couple follow up musings:
1. "Hosting" vs "data plane" communication protocols. Clearly this isn't
something worth bundling in right now, but it seems to me that health checks
fall under a class of server behavior that is specific to the hosting
environment as opposed to transporting data to the client efficiently. Going
further, a simple non-GRPC HTTP health probe function would provide even more
broad support for the server across multiple hosting platforms (eg, K8s without
an experimental change) without changing any client facing semantics. Of
course, this has added burden to the team maintaining the implementation, but
something to consider/clarify as goals for a reference server implementation.
2. To make sure I understand the compatibility concern, I assume this is only
for the case of applications that have already embedded a "sibling" health
check RPC today due to the lack of built-in FSB implementation and not for
existing FSB derivations that can simply use a built-in implementation on
"upgrade", correct? Given the downsides of the existing workarounds
(essentially the flight server can't easily be taken into consideration when
deciding on OK response), I think it makes sense to have it opt-in within a
minor version to ease transition, though I would think the next major version
would flip the default. It's possible I didn't fully understand this
complication, however, and I also don't feel very strongly (some hosting
environments also have optional health checks, so perhaps it aligns well) due
to being uninformed.
3. RE: not exposing the full breadth, did you mean that of the 2 messages in
the gRPC proto (Check vs Watch) and 2 "branches" in the specified
implementation (empty + specified "service name") there is a clear abstraction
for users to customize the output/response for Check {service=""} vs the other
forms? Or did you mean that the contract itself specified by grpc health check
wouldn't be fulfilled by the server implementation? I think Check {service=""}
is the most impactful by far certainly, but I'm not aware of downsides when
partially implementing a check protocol like that (not sure what hosting
environments might assume the presence of Watch, for example). If you meant
consumers of FSB are only exposed to an abstraction that allows customizing
Check (maybe subset) responses only, but the server implements the full grpc
health protocol with trivial implementations for the rest, I think that makes
total sense.
AK
-----Original Message-----
From: David Li <[email protected]>
Sent: Monday, October 3, 2022 2:45 PM
To: [email protected]
Subject: [EXTERNAL] Re: [C++][Python]Built-in GRPC health checks in
FlightServerBase
[You don't often get email from [email protected]. Learn why this is
important at https://aka.ms/LearnAboutSenderIdentification ]
Hi Akshaya,
Thanks for chiming in. At first glance, I think this is a reasonable use case
to cover. However, Flight does try to be independent of gRPC. So maybe the
approach would be to define a "Flight health check" that would map to gRPC
health check when using gRPC, and something else when not (for UCX, presumably
a custom method). And the only other complication would be making this opt-in
to avoid conflicting with existing applications.
That does mean we wouldn't be exposing the full breadth of the gRPC health
check API. But Python would be covered and we could avoid having to redefine
this all the time.
-David
On Mon, Oct 3, 2022, at 17:39, Akshaya Annavajhala (AK) wrote:
> Hello, I was helpfully redirected by David from [ARROW-17920] Built-in
> GRPC health checks in FlightServerBase - ASF JIRA
> (apache.org)<https://nam06.safelinks.protection.outlook.com/?url=https
> %3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-17920&data=05%7
> C01%7Cakannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72
> f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=SJGSCFvKBOcfjC%2BBjzVQ%2F7XR4AcMjZ1MJKlcH7fP%2Bmg%3D&reserved=0>
> to the mailing list to discuss how a consumer of FlightServerBase might
> meaningfully implement probe endpoints/health checks for production
> environments more easily.
>
> The issue links to a related previous item that adds a C++ example for
> running multiple GRPC servers behind the same
> port<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
> Fgithub.com%2Fapache%2Farrow%2Fpull%2F11524&data=05%7C01%7Cakannav
> a%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141a
> f91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d
> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> 3000%7C%7C%7C&sdata=GXKLrLbhUNciFcgBptEbcbsxbpIhlq2iaWDjNdu1GAY%3D
> &reserved=0> which I can translate simply into a side-by-side
> health service and FSB in the same application. However, the
> issue<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%
> 2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-14440&data=05%7C01%7Ca
> kannava%40microsoft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf8
> 6f141af91ab2d7cd011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> %3D%7C3000%7C%7C%7C&sdata=%2Fz62q8KzcdQoNOf0zDNXM3gM%2BpLoTtffWwa4
> wlo1UYU%3D&reserved=0> notes this same technique is unavailable to
> python (implementations of
> FlightServerBase) and the separate servers implies a lack of
> comingling logic related to the flight server itself and the health
> check, effectively making the health probes very dumb logic/check box
> implementation.
>
> In the issue I opened, I proposed built-in gRPC health check
> protocol<https://nam06.safelinks.protection.outlook.com/?url=https%3A%
> 2F%2Fgithub.com%2Fgrpc%2Fgrpc%2Fblob%2Fmaster%2Fdoc%2Fhealth-checking.
> md%23grpc-health-checking-protocol&data=05%7C01%7Cakannava%40micro
> soft.com%7C198f81361dae42167d3b08daa588a73c%7C72f988bf86f141af91ab2d7c
> d011db47%7C1%7C0%7C638004303595147983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7
> C%7C&sdata=vkF3vUK6nijaT7xiV%2B5%2F%2BsY5mc5ZbbbmPEJ%2BzBElXjU%3D&
> amp;reserved=0> in the native FSB that would thus apply to a "simple"
> python derivation without modifications as well as perhaps in the
> future allowing the python space to override the implementation
> (similar to do_*) for more advanced use cases that might require
> extensive python space initialization, etc.
>
> I'd love to hear others' thoughts on the high-level
> direction/alternatives on this topic!
>
> Regards,
> AK