Hi all,

I'm currently migrating some Spark applications to use the latest SolrJ client version, and I'm wondering what the best way to configure it is. We currently have several SolrCloud clusters, and we've historically been using ZooKeeper hosts to connect to them.

After reading some documentation and previous discussions, it seems that connecting directly to ZooKeeper is now somewhat discouraged for security reasons. Instead, the client can be given Solr URLs and fetch the cluster state over HTTP (via Http2ClusterStateProvider), which appears to work quite well so far.
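For reference, here is roughly how I'm building the client in both modes. This is only a minimal sketch: the host names are placeholders, and my understanding is that the ZooKeeper-based builder requires the separate solrj-zookeeper artifact in recent 9.x releases.

import java.util.List;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudHttp2SolrClient;

// URL-based: cluster state is fetched over HTTP from the listed nodes
// (Http2ClusterStateProvider under the hood). Placeholder hosts.
CloudHttp2SolrClient urlClient =
    new CloudHttp2SolrClient.Builder(
            List.of("http://solr-1:8983/solr", "http://solr-2:8983/solr"))
        .build();

// ZooKeeper-based: what we use today. Placeholder ZK ensemble, no chroot.
// As far as I can tell, this needs the solrj-zookeeper module on the
// classpath with recent SolrJ versions.
CloudHttp2SolrClient zkClient =
    new CloudHttp2SolrClient.Builder(
            List.of("zk-1:2181", "zk-2:2181", "zk-3:2181"), Optional.empty())
        .build();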
My questions are:

- Will this approach provide the same performance? The documentation mentions that “the ZooKeeper-based connection is the most reliable and performant means for CloudSolrClient to work.”
- If my ZooKeeper cluster is not publicly exposed and is network-restricted to my usage only, is there any real benefit in switching to Solr URLs instead of ZooKeeper hosts?
- If Solr URLs are indeed preferred, should I configure all Solr URLs in the client (which might quickly inflate the Spark app parameter list), or is a subset sufficient (e.g. two or three nodes per AZ)?

Thanks in advance for your help!

Best regards,
--
Guillaume
