* Running a containerized SolrCloud with embedded ZooKeepers seems to be impossible.
Most certainly not true, but yes, it does require some engineering effort. We (Autotrader UK) run Solr and ZooKeeper on Kubernetes and have done for circa 7 years without any major issues. We are a high read and write setup, which is challenging, but we’ve managed it.

For what it’s worth, we run on GCP and use Hyperdisks so that we can scale the IOPS and throughput of our disks independently of their capacity. We also run on ARM nodes (C4As). We wrote our own Helm chart rather than using off-the-shelf charts or operators, which gives us complete control over all aspects of the deployment.

We run 9 Solr replicas and 5 ZooKeeper replicas per cluster. Make sure you have anti-affinity rules so that pods don’t run on the same nodes, and use pod topology spread constraints to ensure distribution across the available zones. As others have said, run ZK on different nodes to Solr. We also have a PDB in place that spans both Solr and ZK, so that only one pod across the whole lot can be offline at any time. This makes rollouts slower, but safer. Our CI/CD pipelines also deploy ZK first, then Solr.

When it comes to ZK hostnames and IPs, you need to ensure the IPs don’t change. To do this, create a Service for each pod (let’s say you have 5 ZK nodes) and use the StatefulSet pod-name label as the selector: statefulset.kubernetes.io/pod-name: zookeeper-X. That way you always have zookeeper-X.namespace.svc.cluster.local, which resolves to a static IP (the Service) that routes to the pod (dynamic). You can then predictably template your ZK_HOSTS.

Hope this helps you to get going; it’s completely possible.

From: Gus Heck <gus.h...@gmail.com>
Date: Tuesday, 22 July 2025 at 00:53
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Annoying problem when running SolrCloud fully containerized.

At present embedded zookeeper is "supported" only for initial testing, basically to ease the running of tutorials.
It's not designed to form clusters that provide redundancy, nor is there much thought put into facilitating management of its data store or securing it from unwanted access. There are some folks who would like to move it to a more scalable level of support, but that has not come to pass yet (see https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper).

The strong recommendation is that you use an external zookeeper in production. In a kubernetes context, this should mean maintaining a stateful set for a Zk cluster, and maintaining a separate stateful set for the Solr cluster. The Solr nodes should be provisioned with the current zookeeper string on start. How to communicate that string is a concern dictated by how you manage your systems, and ideally for ease of management it will be similar to the way you communicate the location of solr (or your database) to your user facing applications or other downstream systems. The good news is that you can usually provision smaller, cheaper instances for your zookeeper cluster, so the cost of doing this is not high (relative to the cost of hardware/disk for Solr), just don't completely ignore the zookeeper docs regarding performance, and of course monitor it like any other system.

There certainly are folks who've run small single node systems using embedded zk successfully, but usually not in mission critical or near zero down time scenarios.
In a non-kubernetes environment I've also known folks who were less concerned about uptime, and not facing strong performance requirements, to put Zk on some of the machines used by solr, but this is not a high performance, highly available configuration. Neither Zk nor Solr like competition from other services for disk or memory. I definitely wouldn't try this in kubernetes.

Some places that use zookeeper for other things (e.g. kafka) have larger, beefier existing zookeeper clusters that just get a Solr zk root added to them. The usual caveats about system interdependence and making sure competing apps can't stomp on each other apply, of course.

-Gus

--
http://www.needhamsoftware.com/ (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
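For anyone wanting to try the per-pod Service approach from the reply at the top of this thread, here is a rough sketch of how the manifests and the ZK_HOSTS string could be templated. It is only an illustration, not the Autotrader chart: the function names, the "search" namespace, the "/solr" chroot and the choice of ports (2181 client, 2888/3888 for quorum and election, which are the conventional ZooKeeper defaults) are all assumptions. The only details taken from the thread are the statefulset.kubernetes.io/pod-name selector and the zookeeper-X.namespace.svc.cluster.local naming.

```python
from typing import Any


def pod_service(name: str, index: int, namespace: str) -> dict:
    """Build a Service manifest (as a dict) that targets exactly one StatefulSet pod.

    Hypothetical helper: the port list below assumes ZooKeeper's conventional
    default ports and may not match your ensemble's configuration.
    """
    pod_name = f"{name}-{index}"
    ports: list[dict[str, Any]] = [
        {"name": "client", "port": 2181, "targetPort": 2181},
        {"name": "follower", "port": 2888, "targetPort": 2888},
        {"name": "election", "port": 3888, "targetPort": 3888},
    ]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": {
            # The StatefulSet controller sets this label on every pod it
            # creates, so the selector matches a single pod.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            "ports": ports,
        },
    }


def zk_hosts(name: str, replicas: int, namespace: str,
             port: int = 2181, chroot: str = "") -> str:
    """Join the per-pod Service DNS names into a ZooKeeper connection string.

    An optional chroot (for example "/solr") is appended once, after the
    last host, which is how ZooKeeper connection strings express it.
    """
    hosts = [
        f"{name}-{i}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]
    return ",".join(hosts) + chroot


if __name__ == "__main__":
    # Assumed values for illustration: 5 ZK pods in a namespace called "search".
    services = [pod_service("zookeeper", i, "search") for i in range(5)]
    print(len(services), "Service manifests generated")
    print(zk_hosts("zookeeper", 5, "search", chroot="/solr"))
```

The dicts can be dumped to YAML or fed into whatever templating you already use. Because each Service keeps its cluster IP across pod restarts, the generated connection string stays valid while pods are rescheduled.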