Re: Annoying problem when running SolrCloud fully containerized.

Karl Stoney Mon, 21 Jul 2025 23:24:46 -0700

* Running a containerized SolrCloud with embedded ZooKeepers seems to be
impossible.


Most certainly not true, but yes, it does require some engineering effort.  We
(Autotrader UK) have run Solr and ZooKeeper on Kubernetes for circa 7 years
without any major issues.  Ours is a high-read, high-write setup, which is
challenging, but we’ve managed it.  For what it’s worth, we run on GCP and use
Hyperdisks so that we can scale the IOPS and throughput of our disks
independently of their capacity.  We also run on ARM nodes (C4As).

We wrote our own Helm chart, though; we do not use off-the-shelf charts or
operators, which gives us complete control over every aspect of the deployment.
We run 9 Solr replicas and 5 ZooKeeper replicas per cluster.  Ensure you have
anti-affinity rules so that pods don’t run on the same nodes, and use pod
topology spread constraints to ensure distribution across the available zones.
As others have said, run ZK on different nodes from Solr.  We also have a single
PDB that spans both Solr and ZK, so that only one pod across the whole lot can
be offline at any time.  This makes rollouts slower, but safer.  Also, our CI/CD
pipelines deploy ZK first, then Solr.
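For illustration, here is a minimal Python sketch of the scheduling pieces described above, emitted as Kubernetes objects (kubectl accepts JSON as well as YAML). It is not the chart described in this message: the `app` label values, the shared `app.kubernetes.io/part-of: search` label and the PDB name are hypothetical. It combines hard pod anti-affinity on `kubernetes.io/hostname` (no two Solr/ZK pods on one node), a topology spread constraint on `topology.kubernetes.io/zone`, and one `policy/v1` PodDisruptionBudget with `maxUnavailable: 1` whose selector matches the pods of both StatefulSets.

```python
import json

# Hypothetical shared label: both the Solr and ZK pod templates carry it,
# so one PodDisruptionBudget can select pods from both StatefulSets.
SHARED_LABEL = {"app.kubernetes.io/part-of": "search"}


def pod_labels(app: str) -> dict:
    """Labels to put on a pod template for the given app ("solr" or "zookeeper")."""
    return {"app": app, **SHARED_LABEL}


def scheduling_rules(app: str) -> dict:
    """Pod-spec fragment: hard node anti-affinity plus zone spreading for one app."""
    return {
        "affinity": {
            "podAntiAffinity": {
                # Never schedule this pod onto a node already running any Solr or ZK pod.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {
                            "matchExpressions": [
                                {"key": "app", "operator": "In",
                                 "values": ["solr", "zookeeper"]}
                            ]
                        },
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
        "topologySpreadConstraints": [
            {
                # Keep this app's pods balanced across zones (difference of at most 1).
                "maxSkew": 1,
                "topologyKey": "topology.kubernetes.io/zone",
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": {"app": app}},
            }
        ],
    }


def shared_pdb(name: str = "search-pdb") -> dict:
    """One PodDisruptionBudget over Solr and ZK: at most one pod voluntarily down."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "maxUnavailable": 1,
            "selector": {"matchLabels": dict(SHARED_LABEL)},
        },
    }


if __name__ == "__main__":
    print(json.dumps(scheduling_rules("zookeeper"), indent=2))
    print(json.dumps(shared_pdb(), indent=2))
```

Because the anti-affinity rule here is a hard (`required...`) rule, the cluster needs at least as many schedulable nodes as Solr plus ZK pods (14 for a 9 + 5 layout); `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative if that is not available.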

When it comes to ZK hostnames and IPs, you need to ensure the IPs don’t
change.  To do this, create a Service for each pod (let’s say you have 5 ZK
nodes) that uses the StatefulSet pod label as its selector:
statefulset.kubernetes.io/pod-name: zookeeper-X.  That way you always have
zookeeper-X.namespace.svc.cluster.local, which resolves to a static IP (the
Service) that routes to the pod (whose IP is dynamic).  You can then
predictably template your ZK_HOSTS.
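A minimal sketch of the per-pod Service approach, again in Python emitting Kubernetes objects as dicts. The StatefulSet name `zookeeper`, the namespace `search` and the port list are assumptions: 2181, 2888 and 3888 are ZooKeeper's conventional client, quorum and leader-election ports, and the peer ports only matter if the ensemble's own server list also points at these Services. Setting `publishNotReadyAddresses` is likewise an assumption, so that peers can reach a pod before its readiness probe passes. The hypothetical `zk_hosts` helper builds the comma-separated connection string from the stable DNS names.

```python
import json
from typing import Dict, List

# Assumed ZooKeeper defaults: client, quorum (follower-to-leader) and leader-election ports.
ZK_PORTS = {"client": 2181, "quorum": 2888, "election": 3888}


def per_pod_services(statefulset: str, namespace: str, replicas: int) -> List[Dict]:
    """One ClusterIP Service per StatefulSet pod, giving each pod a stable virtual IP."""
    services = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"  # StatefulSet pods are named <name>-<ordinal>
        services.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": pod, "namespace": namespace},
            "spec": {
                # The StatefulSet controller labels every pod with its own name,
                # so this selector matches exactly one pod.
                "selector": {"statefulset.kubernetes.io/pod-name": pod},
                # Assumption: route to the pod even before it reports ready,
                # so peers can reach each other while the ensemble is forming.
                "publishNotReadyAddresses": True,
                "ports": [
                    {"name": name, "port": port, "targetPort": port}
                    for name, port in ZK_PORTS.items()
                ],
            },
        })
    return services


def zk_hosts(statefulset: str, namespace: str, replicas: int) -> str:
    """Comma-separated client connection string built from the per-pod Service names."""
    return ",".join(
        f"{statefulset}-{ordinal}.{namespace}.svc.cluster.local:{ZK_PORTS['client']}"
        for ordinal in range(replicas)
    )


if __name__ == "__main__":
    print(json.dumps(per_pod_services("zookeeper", "search", 5)[0], indent=2))
    print(zk_hosts("zookeeper", "search", 5))
```

Because the Service names depend only on the StatefulSet name, namespace and replica count, the same three values can drive both the Service manifests and the connection string handed to Solr, so the two cannot drift apart.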

Hope this helps you get going; it’s completely possible.

From: Gus Heck <gus.h...@gmail.com>
Date: Tuesday, 22 July 2025 at 00:53
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Annoying problem when running SolrCloud fully containerized.

At present embedded zookeeper is "supported" only for initial testing,
basically to ease the running of tutorials. It's not designed to form
clusters that provide redundancy, nor is there much thought put into
facilitating management of its data store or securing it from unwanted
access.

There are some folks who would like to move it to a more scalable level of
support, but that has not come to pass yet (see
https://cwiki.apache.org/confluence/display/SOLR/SIP-14+Embedded+Zookeeper).

The strong recommendation is that you use an external zookeeper in
production.

In a Kubernetes context, this should mean maintaining a StatefulSet for a
ZK cluster, and maintaining a separate StatefulSet for the Solr cluster.
The Solr nodes should be provisioned with the current ZooKeeper string on
start. How to communicate that string is a concern dictated by how you
manage your systems, and ideally, for ease of management, it will be similar
to the way you communicate the location of Solr (or your database) to your
user-facing applications or other downstream systems. The good news is that
you can usually provision smaller, cheaper instances for your ZooKeeper
cluster, so the cost of doing this is not high (relative to the cost of
hardware/disk for Solr). Just don't completely ignore the ZooKeeper docs
regarding performance, and of course monitor it like any other system.

There certainly are folks who've run small single-node systems using
embedded ZK successfully, but usually not in mission-critical or
near-zero-downtime scenarios.

In a non-Kubernetes environment, I've also seen folks who were less concerned
about uptime, and not facing strong performance requirements, put ZK on some of
the machines used by Solr, but this is not a high-performance, highly
available configuration. Neither ZK nor Solr likes competition from
other services for disk or memory. I definitely wouldn't try this in
Kubernetes.

Some places that use ZooKeeper for other things (e.g. Kafka) have larger,
beefier existing ZooKeeper clusters that just get a Solr ZK root (chroot) added
to them. The usual caveats about system interdependence, and making sure
competing apps can't stomp on each other, apply of course.

-Gus

--
http://www.needhamsoftware.com/ (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)


Unless expressly stated otherwise in this email, this e-mail is sent on behalf 
of Auto Trader Limited Registered Office: 1 Tony Wilson Place, Manchester, 
Lancashire, M15 4FN (Registered in England No. 03909628). Auto Trader Limited 
is part of the Auto Trader Group Plc group. This email and any files 
transmitted with it are confidential and may be legally privileged, and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error please notify the sender. 
This email message has been swept for the presence of computer viruses.
