Hi all,

I'm currently migrating some Spark applications to use the latest SolrJ
client version, and I’m wondering what the best way is to configure it.

We currently have several SolrCloud clusters and we’ve historically been
using ZooKeeper hosts to connect to them.
After reading some documentation and previous discussions, it seems that
connecting directly to ZooKeeper is now somewhat discouraged for security
reasons.
Instead, the client can use Solr URLs and create an Http2ClusterStateProvider
from them, which appears to work quite well so far.
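For reference, here is roughly how I'm building the client in both cases (a
minimal sketch, assuming SolrJ 9.x; the ZooKeeper hosts, Solr URLs, and the
"my_collection" name are placeholders):

import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudHttp2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        // Historical approach: bootstrap cluster state from ZooKeeper.
        try (CloudHttp2SolrClient zkClient =
                new CloudHttp2SolrClient.Builder(
                        List.of("zk1:2181", "zk2:2181", "zk3:2181"),
                        Optional.empty() /* no chroot */)
                    .build()) {
            QueryResponse rsp = zkClient.query("my_collection", new SolrQuery("*:*"));
            System.out.println("via ZK: " + rsp.getResults().getNumFound());
        }

        // URL-based approach: the client builds its view of the cluster state
        // over HTTP (Http2ClusterStateProvider) from one or more Solr base URLs.
        try (CloudHttp2SolrClient urlClient =
                new CloudHttp2SolrClient.Builder(
                        List.of("http://solr1:8983/solr", "http://solr2:8983/solr"))
                    .build()) {
            QueryResponse rsp = urlClient.query("my_collection", new SolrQuery("*:*"));
            System.out.println("via URLs: " + rsp.getResults().getNumFound());
        }
    }
}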

I have a few questions:
- Will this approach provide the same performance? (The documentation
mentions that “the ZooKeeper-based connection is the most reliable and
performant means for CloudSolrClient to work.”)
- If my ZooKeeper cluster is not publicly exposed and is network-restricted
to my usage only, is there any real benefit in switching to Solr URLs
instead of ZooKeeper hosts?
- If using Solr URLs is indeed preferred, should I configure all Solr URLs
in the client (which might quickly increase the Spark app parameter list),
or is a subset sufficient (e.g. 2 or 3 nodes per AZ)?

Thanks in advance for your help!

Best regards,

-- 
Guillaume
