Hey Vladimir, Thanks for raising this thread. I'm also reluctant to add this to the application layer. We would also need to support this with the other clients that are out there. Did you give JB's suggestion around the PoolingHttpClientConnectionManager a try?
Kind regards,
Fokko

On Tue, Dec 17, 2024 at 23:37, Vladimir Ozerov <voze...@querifylabs.com> wrote:

> Hi Jean,
>
> Thanks for the response, I agree with all points.
>
> For reference, you mentioned Apache Ignite - I worked on it for many
> years, and used to be an active committer/PMC member there. This project is
> a very good example of how multiple failures to keep the complexity under
> control significantly slowed down adoption and development pace over time.
> So bringing in unnecessary complexity is the last thing I’d like to
> advocate for.
>
> My questions about HA appeared because my colleagues and I developed a
> REST catalog for our needs (hence some other REST related spam from myself
> on the dev list). Then we started working on its integration into several
> hardware appliances that have strict fault-tolerance requirements for all
> software and hardware components. We started thinking about various
> solutions, from simple active-active instances like in HMS to a
> fully-fledged synchronous replication with RAFT, etc. But irrespective of
> the approach, the same blocker appeared over and over again - how to let
> the client application know that multiple catalog instances exist? As
> explained above, a separate proxy in front of a catalog client is not a
> desired solution, because it requires separate HA considerations.
>
> Regarding your questions - it is true that many additional considerations
> may appear over time, but the Iceberg community does not have to accept
> them all. My proposal is no different - if we think this idea is not
> appropriate at this point, it is rejected.
>
> Speaking of concrete behaviour, I think that HMS is a good reference
> point. It is the most popular catalog in the world, powering all sorts of
> critical analytical infrastructure around the globe for more than a decade.
> Yet, it offers only two configuration properties for HA: (1) the list of
> URIs, and (2) URI selection strategy - random or sequential.
> This implies that throughout all these years there was not sufficient
> business demand to introduce more sophisticated config options. Business
> demand is not always aligned with real user expectations, but still.
>
> That said, there is a good chance that just bringing these two small
> pieces of config to the official Iceberg library (so that all engines can
> use it seamlessly) can cover most of the practical cases for many years.
>
> Moreover, if there are some HMS-like stateless catalog implementations
> already (i.e., just delegating to storage, no caching, proper JWT
> management, whatsoever), their users will be able to add HA right away with
> minimal effort and no catalog code changes. Though, I am not sure whether
> popular catalogs like Polaris or Unity fall into this category.
>
> If there is a hint of consensus that this feature is at least worth
> trying, I can create and demonstrate a prototype.
>
> Regards,
> *Vladimir Ozerov*
>
> On Tue, Dec 17, 2024 at 16:16, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> Hi Vladimir
>>
>> As I said in my previous email, I can already "inject" the
>> PoolingHttpClientConnectionManager in the client. So, technically
>> speaking, I think it's do-able.
>> So, we can always document how to use that with several endpoints.
>>
>> I understand your points and they make sense. However, implementing
>> several endpoints could quickly become a little complex:
>> - are you using ordered list endpoint selection (like the first one in
>> the list doesn't work, you try the second one) ?
>> - are you using round-robin or random on the endpoints list ?
>> - are you using weight-based balancing or priority backup ?
>> - do you want to use exponential backoff when an endpoint is not
>> available before selecting the next available one ?
>> - static discovery (list) is the first option but some might want to
>> have other discovery mechanisms (powered by Apache Ignite, Hazelcast,
>> K8S pods, ...)
>> The endpoint selection logic can be very different depending on the needs.
>> A possible approach is to ship an HA wrapper around the REST client (like a
>> sidecar): the Iceberg REST client connects to localhost:xx and
>> localhost:xx is actually proxying to the endpoints (something similar
>> to https://karaf.apache.org/manual/latest/webcontainer, see "HTTP
>> Proxy" and balancing sections).
>>
>> All to say that there's not really complex infrastructure to build around
>> the Iceberg REST client, we can wrap/enrich it. Don't get me wrong:
>> I'm not against having a "simple" Iceberg REST client plugin to deal
>> with several endpoints. My concern is that it can evolve from "simple"
>> to "complex" (depending on the user needs).
>>
>> Regards
>> JB
>>
>> On Tue, Dec 17, 2024 at 11:45 AM Vladimir Ozerov
>> <voze...@querifylabs.com> wrote:
>> >
>> > Hi,
>> >
>> > Thank you for the feedback. I understand the concerns about adding more
>> > and more features to the protocol, especially if they might be
>> > implemented elsewhere. And every added bit of complexity should have a
>> > clear cost/benefit ratio.
>> >
>> > Iceberg is becoming the de-facto standard for multiple workloads around
>> > the world. In 2018 it was about huge companies. Now even relatively small
>> > businesses may benefit from it. Their infrastructure may differ
>> > dramatically: from fully-fledged clouds to small on-premises deployments.
>> > Thus lowering the bar for product adoption is important.
>> >
>> > Catalog is an integral part of Iceberg, even though we suppose that it
>> > is a third-party application. Unlike metadata layers in monolithic
>> > systems, catalog unavailability in lakehouses means that all your data
>> > workloads are effectively paralysed: no ETL from Spark, no ad-hoc
>> > analytics from Trino, no streaming ingestion from Kafka. Catalog becomes
>> > a huge single point of failure, so it becomes an important question - who
>> > is responsible for catalog HA now?
>> >
>> > It is true that the problem can be solved elsewhere. E.g., one can set up
>> > a load balancer in front of catalog instances. But now you need HA for
>> > the load balancer itself, which often requires additional infrastructure.
>> > So, yes, HA can be moved to another layer, but it doesn’t imply that this
>> > is a preferable approach.
>> >
>> > Hive Metastore is the most popular catalog in the world. It doesn’t
>> > treat HA as somebody else's problem. At the client level you can provide
>> > multiple metastore URLs and also define the URL selection strategy
>> > (RANDOM, SEQUENTIAL). See [1]. This allows engines to work with HA HMS
>> > without intermediaries. And from the client perspective the configuration
>> > is trivial.
>> >
>> > My proposal is to look for a simple yet sufficient solution at the
>> > Iceberg library level to allow for REST catalog HA without relying on
>> > other products. Looks like this could be done pretty much similarly to
>> > the HMS client.
>> >
>> > The minimal required change is to allow multiple URLs for REST Catalog,
>> > and a relatively small change to the RESTSessionCatalog to switch between
>> > URLs in case a special error is returned. In its simplest form it could
>> > be the receipt of a 301 response code or so.
>> >
>> > This doesn’t seem like a serious complication, but will let REST
>> > catalog developers build implementations that are on par with HMS in
>> > terms of HA.
>> >
>> > WDYT?
>> >
>> > [1]
>> > https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-cdh/topics/hive-hms-ha-configuration.html
>> >
>> > Vladimir Ozerov
>> >
>> > On Tue, Dec 10, 2024 at 00:57, Yufei Gu <flyrain...@gmail.com> wrote:
>> >>
>> >> Load balancing operates at a different layer than APIs, with various
>> >> implementations available, such as etcd and ZooKeeper. I’d prefer to
>> >> avoid introducing additional complexity at the web service API level.
>> >>
>> >>
>> >> Yufei
>> >>
>> >>
>> >> On Mon, Dec 9, 2024 at 8:35 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>>
>> >>> Hi Vladimir,
>> >>>
>> >>> As you said, today, it's possible to use a LB in front of multiple
>> >>> instances (using nginx, ELB, ...).
>> >>> I think it's pretty easy to set up, and it stays at the "infrastructure" level.
>> >>>
>> >>> As it's possible to plug the HTTP5 client in Iceberg REST client, I
>> >>> think it's possible to inject PoolingHttpClientConnectionManager with
>> >>> multiple routes/host.
>> >>>
>> >>> I still prefer to define this at "infra" level (to decouple
>> >>> endpoints definition from the pure client).
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>>
>> >>> On Mon, Dec 9, 2024 at 5:10 PM Vladimir Ozerov <voze...@querifylabs.com> wrote:
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > Catalog is a critical part of Iceberg infrastructure and may
>> >>> > require a highly available setup. In similar services (e.g., HMS, etc)
>> >>> > this is often done as follows:
>> >>> >
>> >>> > Start several service instances
>> >>> > Decide which one is coordinator via etcd, ZooKeeper, Ratis, etc
>> >>> > Expose HA endpoint to a client: multiple endpoints OR a single
>> >>> > endpoint via proxy
>> >>> >
>> >>> > Currently, there is no way to expose multiple endpoints to a REST
>> >>> > server. This may work in some cases if you hide multiple REST server
>> >>> > instances behind a proxy/balancer. But this proxy requires its own HA
>> >>> > setup, which complicates the overall deployment.
>> >>> >
>> >>> > I'd like to ask the community whether we can extend the REST
>> >>> > specification with multiple endpoints to support HA REST catalog
>> >>> > without proxies. This extension could contain two essential parts:
>> >>> >
>> >>> > How to provide multiple endpoints to RESTSessionCatalog.
>> >>> > This could be encoded into URL or as additional property
>> >>> > Some additional headers and/or error codes to allow a REST server
>> >>> > instance to communicate peer endpoints and the current coordinator
>> >>> >
>> >>> > A very rough example of how this can work in practice. Configuration:
>> >>> >
>> >>> > mycatalog.endpoint=https://host1
>> >>> > mycatalog.endpoints=https://host1,https://host2,https://host3
>> >>> >
>> >>> > REST server response headers with additional endpoints:
>> >>> >
>> >>> > X-HA-Coordinator: https://host2
>> >>> > X-HA-Endpoints: https://host1,https://host2,https://host3
>> >>> >
>> >>> > WDYT?
>> >>> >
>> >>> > Regards,
>> >>> > Vladimir
>> >
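
For readers following the thread, the HMS-style behaviour discussed above (a comma-separated list of endpoints plus a RANDOM or SEQUENTIAL selection strategy) could be sketched roughly as below. This is only an illustration in Python, not Iceberg or HMS code: `parse_endpoints`, `order_endpoints`, `call_with_failover` and the `send` callback are hypothetical names, and treating `ConnectionError` as the signal to move on to the next endpoint is an assumption. The thread does not settle which errors or status codes should trigger a switch.

```python
import random
from typing import Callable, List, Optional, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every configured endpoint was tried and none answered."""


def parse_endpoints(value: str) -> List[str]:
    # Split a property such as "https://host1,https://host2" into a list,
    # ignoring surrounding whitespace and empty entries.
    return [item.strip() for item in value.split(",") if item.strip()]


def order_endpoints(endpoints: Sequence[str], strategy: str = "SEQUENTIAL",
                    rng: Optional[random.Random] = None) -> List[str]:
    # SEQUENTIAL keeps the configured order; RANDOM shuffles a copy.
    ordered = list(endpoints)
    if strategy == "RANDOM":
        (rng or random).shuffle(ordered)
    elif strategy != "SEQUENTIAL":
        raise ValueError(f"unknown selection strategy: {strategy}")
    return ordered


def call_with_failover(endpoints: Sequence[str],
                       send: Callable[[str], str],
                       strategy: str = "SEQUENTIAL",
                       rng: Optional[random.Random] = None) -> str:
    # Try each endpoint in the chosen order and return the first success.
    # Assumption: a ConnectionError means "this instance is unavailable".
    failures = []
    for endpoint in order_endpoints(endpoints, strategy, rng):
        try:
            return send(endpoint)
        except ConnectionError as exc:
            failures.append((endpoint, exc))
    raise AllEndpointsFailed(failures)


if __name__ == "__main__":
    configured = parse_endpoints("https://host1,https://host2,https://host3")

    def fake_send(url: str) -> str:
        # Stand-in for a real HTTP call; host1 is pretended to be down.
        if url == "https://host1":
            raise ConnectionError("host1 is down")
        return f"ok from {url}"

    print(call_with_failover(configured, fake_send))
```

Even this minimal version leaves open the questions JB raised (backoff, weights, discovery), which is where the "simple" to "complex" concern comes from.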