Re: Solr Cloud Architecture Recommendations

Satya Nand Thu, 08 Sep 2022 23:32:47 -0700

Hi Matthew,

> In my experience sharding really slows you down because of all the
> extra network chatter.

Yes, we have faced the same. But it is not only about SolrCloud: we could
never match the response time of our old Solr (6.5) with the upgraded
versions (8.7, 8.10), even without SolrCloud. Response time on 6.5 was
always lower, probably because of how some graph queries were
re-implemented in Solr 8.5.
https://lists.apache.org/thread/kbjgztckqdody9859knq05swvx5xj20f

However, SolrCloud has helped us bring down response times above the 85th
percentile, which has reduced timeouts.
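The difference between average and tail latency is easy to check from raw per-request timings. Below is a minimal Python sketch that assumes you already have a list of request latencies in milliseconds (for example, pulled from access logs). The sample values are made up for illustration. It computes a nearest-rank percentile and the share of requests slower than a timeout threshold:

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


def share_slower_than(latencies_ms, timeout_ms):
    """Fraction of requests that took longer than timeout_ms (i.e. would time out)."""
    return sum(1 for x in latencies_ms if x > timeout_ms) / len(latencies_ms)


# Hypothetical sample; real numbers would come from your own request logs.
samples = [120, 95, 180, 250, 400, 130, 110, 90, 1500, 210]
print("p50:", percentile(samples, 50))
print("p85:", percentile(samples, 85))
print("over 1 s:", share_slower_than(samples, 1000))
```

Comparing p50 with p85 and higher before and after a migration shows whether a change helped the tail even when the mean got worse.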

> Do you index continuously or nightly or what?  You should never need
> to optimize.

Our application involves a lot of daily data updates. We regularly update
approximately 40-50% of our documents (~50 million), and we index
continuously with a 15-minute commit interval. Earlier, with standalone
Solr, we used to optimize to reduce response time.
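Continuous indexing with a fixed commit window is usually configured in one of two ways: with autoCommit/autoSoftCommit in solrconfig.xml, or per request with Solr's `commitWithin` update parameter (in milliseconds). The sketch below shows only the `commitWithin` variant as an illustration. It is not necessarily how this cluster is set up. The base URL, the collection name `products`, and the document are hypothetical, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# 15 minutes expressed in milliseconds, the unit Solr's commitWithin expects.
FIFTEEN_MINUTES_MS = 15 * 60 * 1000


def build_update_request(base_url, collection, docs, commit_within_ms=FIFTEEN_MINUTES_MS):
    """Build (but do not send) a JSON update request asking Solr to commit within the window."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url.rstrip('/')}/{collection}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, collection and document.
req = build_update_request("http://localhost:8983/solr", "products",
                           [{"id": "1", "title": "example"}])
# urllib.request.urlopen(req) would actually send it to a running Solr.
```

A longer commit window means fewer new searchers, and therefore fewer cache invalidations and autowarming cycles. The cost is that search results are staler.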

> Check out your cache performance (in JMX or the solr ui) and increase
> those if you index infrequently.  Ideally your entire index should be
> landing in memory.

Here are some cache stats from a randomly chosen node in the cluster (8 GB
heap). Let me know if you see anything clearly wrong. We took the same
cache configuration from our standalone Solr (6.5).

*queryResultCache*
class: org.apache.solr.search.CaffeineCache
description: Caffeine Cache(maxSize=30000, initialSize=1000, autowarmCount=100,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@477e8951)
stats:
CACHE.searcher.queryResultCache.lookups:18315
CACHE.searcher.queryResultCache.cumulative_lookups:14114139
CACHE.searcher.queryResultCache.ramBytesUsed:453880928
CACHE.searcher.queryResultCache.inserts:12747
CACHE.searcher.queryResultCache.warmupTime:11397
CACHE.searcher.queryResultCache.hitratio:0.3576303576303576
CACHE.searcher.queryResultCache.maxRamMB:-1
CACHE.searcher.queryResultCache.cumulative_inserts:9995188
CACHE.searcher.queryResultCache.evictions:0
CACHE.searcher.queryResultCache.cumulative_evictions:83119
CACHE.searcher.queryResultCache.size:11836
CACHE.searcher.queryResultCache.cumulative_hitratio:0.34904764647705394
CACHE.searcher.queryResultCache.cumulative_hits:4926507
CACHE.searcher.queryResultCache.hits:6550





*filterCache*
class: org.apache.solr.search.CaffeineCache
description: Caffeine Cache(maxSize=1000, initialSize=300, autowarmCount=100,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@4b97c627)
stats:
CACHE.searcher.filterCache.hits:254221
CACHE.searcher.filterCache.cumulative_evictions:18495260
CACHE.searcher.filterCache.size:1000
CACHE.searcher.filterCache.maxRamMB:-1
CACHE.searcher.filterCache.hitratio:0.8998527506601443
CACHE.searcher.filterCache.warmupTime:4231
CACHE.searcher.filterCache.evictions:27376
CACHE.searcher.filterCache.cumulative_hitratio:0.9034759627596836
CACHE.searcher.filterCache.lookups:282514
CACHE.searcher.filterCache.cumulative_hits:187752452
CACHE.searcher.filterCache.cumulative_inserts:20058521
CACHE.searcher.filterCache.ramBytesUsed:192294056
CACHE.searcher.filterCache.inserts:28293
CACHE.searcher.filterCache.cumulative_lookups:207811231




*documentCache*
class: org.apache.solr.search.CaffeineCache
description: Caffeine Cache(maxSize=25000, initialSize=512, autowarmCount=512, regenerator=null)
stats:
CACHE.searcher.documentCache.evictions:341795
CACHE.searcher.documentCache.hitratio:0.5356143571564221
CACHE.searcher.documentCache.ramBytesUsed:60603608
CACHE.searcher.documentCache.cumulative_hitratio:0.5356143571564221
CACHE.searcher.documentCache.lookups:789850
CACHE.searcher.documentCache.hits:423055
CACHE.searcher.documentCache.cumulative_hits:423055
CACHE.searcher.documentCache.cumulative_evictions:341795
CACHE.searcher.documentCache.maxRamMB:-1
CACHE.searcher.documentCache.cumulative_lookups:789850
CACHE.searcher.documentCache.size:25000
CACHE.searcher.documentCache.inserts:366795
CACHE.searcher.documentCache.warmupTime:0
CACHE.searcher.documentCache.cumulative_inserts:366795
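A quick way to read these numbers is to compare two ratios per cache: hits against lookups, and evictions against inserts. Both use the current searcher's counters; the non-cumulative counters reset whenever a commit opens a new searcher. The Python sketch below parses `CACHE.searcher.<cache>.<stat>:<value>` lines as pasted above. The two ratios are only a heuristic, not an official Solr metric.

```python
def parse_cache_stats(text):
    """Parse 'CACHE.searcher.<cache>.<stat>:<value>' lines into {cache: {stat: value}}."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("CACHE.searcher."):
            continue  # skip headers, 'stats:' labels and blank lines
        key, _, raw = line.partition(":")
        _, _, cache, stat = key.split(".", 3)
        stats.setdefault(cache, {})[stat] = float(raw)
    return stats


def summarize(stats):
    """Hit ratio and evictions per insert for the current searcher, per cache."""
    summary = {}
    for cache, s in stats.items():
        lookups = s.get("lookups", 0.0)
        inserts = s.get("inserts", 0.0)
        summary[cache] = {
            "hit_ratio": s.get("hits", 0.0) / lookups if lookups else 0.0,
            "evictions_per_insert": s.get("evictions", 0.0) / inserts if inserts else 0.0,
        }
    return summary


pasted = """
CACHE.searcher.filterCache.hits:254221
CACHE.searcher.filterCache.lookups:282514
CACHE.searcher.filterCache.inserts:28293
CACHE.searcher.filterCache.evictions:27376
CACHE.searcher.queryResultCache.hits:6550
CACHE.searcher.queryResultCache.lookups:18315
CACHE.searcher.queryResultCache.inserts:12747
CACHE.searcher.queryResultCache.evictions:0
"""
for name, ratios in summarize(parse_cache_stats(pasted)).items():
    print(name, {k: round(v, 2) for k, v in ratios.items()})
```

On the numbers above, the filterCache is full (size 1000 = maxSize). It evicts almost one entry per insert (27376 of 28293) while still hitting about 90% of the time. The documentCache shows a similar eviction pattern (341795 of 366795) with a hit ratio of about 54%. That pattern may mean a larger maxSize is worth testing. However, each filterCache entry here averages roughly 190 KB (ramBytesUsed / size), so growing it costs heap on an 8 GB node.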

On Fri, Sep 9, 2022 at 1:43 AM matthew sporleder <[email protected]>
wrote:

> In my experience sharding really slows you down because of all the
> extra network chatter.
>
> Do you index continuously or nightly or what?  You should never need
> to optimize.
>
> Check out your cache performance (in JMX or the solr ui) and increase
> those if you index infrequently.  Ideally your entire index should be
> landing in memory.
>
> On Thu, Sep 8, 2022 at 1:59 AM Satya Nand
> <[email protected]> wrote:
> >
> > Hi All,
> >
> > We have recently moved from Solr 6.5 to SolrCloud 8.10.
> >
> >
> > *Earlier architecture:* We were using a master-slave architecture with
> > 4 slaves (14 CPUs, 96 GB RAM, 20 GB heap, 110 GB index). We used to
> > optimize and replicate nightly.
> >
> > *Now:*
> > We didn't have a clear direction on the number of shards, so we ran a
> > POC with different shard counts. With 8 shards we got close to our
> > earlier response time without using too much infrastructure.
> > Based on our queries we couldn't find a routing parameter, so all
> > queries are now broadcast to every shard.
> >
> > Now we have a 9-node cluster (8+1). One indexing node holds all 8 NRT
> > primary shards; this is where all indexing happens. The other 8 nodes
> > (each with 10 CPUs, 42 GB RAM, 8 GB heap, ~23 GB index) host one PULL
> > replica of each primary shard. For querying, we set
> > *shards.preference=replica.type:PULL* so that all queries are served
> > from PULL replicas.
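Matthew's point about the index landing in memory can be checked with simple arithmetic. RAM not taken by the JVM heap is largely available to the OS page cache. Lucene reads index files through memory mapping by default on 64-bit JVMs, so those files are served from that cache. The sketch below uses the node sizes quoted in this thread. The 2 GB reserve for the OS and off-heap JVM memory is a made-up assumption, so treat the result as a rough estimate.

```python
def page_cache_headroom_gb(ram_gb, heap_gb, index_gb, os_reserve_gb=2.0):
    """RAM left over after heap, an assumed OS/off-heap reserve, and the whole index.

    A positive result suggests the full index can stay in the OS page cache;
    a negative result is roughly how much of it cannot.
    """
    page_cache_gb = ram_gb - heap_gb - os_reserve_gb
    return page_cache_gb - index_gb


# Node sizes from this thread; os_reserve_gb is an assumption.
print("old slave:", page_cache_headroom_gb(ram_gb=96, heap_gb=20, index_gb=110))
print("query node:", page_cache_headroom_gb(ram_gb=42, heap_gb=8, index_gb=23))
```

Under that assumption, each query node leaves about 9 GB of headroom. By the same estimate, the old 110 GB index could not fit entirely in a 96 GB slave's page cache.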
> >
> > Our thought process was that we should have the indexing layer and query
> > layer separate so one does not affect the other.
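For reference, the query-side setting described above is Solr's `shards.preference` request parameter. With the value `replica.type:PULL`, Solr prefers PULL replicas when it picks one replica per shard, and falls back to other replica types if no PULL replica is available. The sketch below only builds the URL. The host, the collection name `products`, and the query are hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, q, rows=10):
    """Build a /select URL that prefers PULL replicas for every shard it fans out to."""
    params = {
        "q": q,
        "rows": rows,
        # A preference, not a filter: other replica types are used if no PULL replica is available.
        "shards.preference": "replica.type:PULL",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and query.
print(build_select_url("http://localhost:8983/solr", "products", "title:shoes"))
```

Because it is only a preference, this parameter cannot by itself stop queries from reaching the NRT replicas on the indexing node when a PULL replica is down.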
> >
> > We went live this week. It didn't reduce response time; in fact, average
> > response time increased. However, response times above the 85th
> > percentile improved substantially, so timeouts dropped significantly.
> >
> > *Now I have a few questions for everyone using SolrCloud, to help me
> > understand and improve the stability of my cluster:*
> >
> > 1. Were we right to separate the indexing and query layers? Is it a good
> > idea, or could something else have been done better? Right now it could
> > affect cluster stability: if a replica node becomes unavailable, queries
> > will start going to the indexing node, which is very weak and could
> > choke the whole cluster.
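One way to watch for the failure mode in question 1 is to alert whenever a shard has no active PULL replica on a live node, because that is when queries can fall back to the indexing node. The sketch below works on the JSON returned by the Collections API (`/solr/admin/collections?action=CLUSTERSTATUS`), already parsed into a dict. The field names (`live_nodes`, `shards`, `replicas`, `type`, `state`, `node_name`) are assumed to match the Solr 8.x response shape and should be checked against your own output. The collection and node names in the example are hypothetical.

```python
def shards_without_live_pull(cluster_status, collection):
    """Return names of shards that have no active PULL replica on a live node."""
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    shards = cluster["collections"][collection]["shards"]
    missing = []
    for shard_name, shard in sorted(shards.items()):
        has_live_pull = any(
            replica.get("type") == "PULL"
            and replica.get("state") == "active"
            and replica.get("node_name") in live_nodes
            for replica in shard.get("replicas", {}).values()
        )
        if not has_live_pull:
            missing.append(shard_name)
    return missing


# Hypothetical, trimmed-down CLUSTERSTATUS response for a 2-shard collection.
status = {
    "cluster": {
        "live_nodes": ["idx:8983_solr", "q1:8983_solr"],
        "collections": {
            "products": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"type": "NRT", "state": "active", "node_name": "idx:8983_solr"},
                        "core_node2": {"type": "PULL", "state": "active", "node_name": "q1:8983_solr"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"type": "NRT", "state": "active", "node_name": "idx:8983_solr"},
                        "core_node4": {"type": "PULL", "state": "down", "node_name": "q2:8983_solr"},
                    }},
                }
            }
        },
    }
}
print(shards_without_live_pull(status, "products"))
```

To use this, fetch the response periodically (for example with `urllib.request` and `json.load`) from whatever monitoring you already run, and alert on any non-empty result.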
> >
> > 2. Is there any guideline for the number of shards and shard size?
> >
> > 3. How do we decide the ideal number of CPUs per node? Is there a metric
> > we can follow, like load or CPU usage? What should the ideal CPU usage
> > and load average be, relative to the number of CPUs? Our response time
> > rises sharply with traffic: from 250 ms to 400 ms in peak hours. Peak
> > traffic stays around 2000 requests per minute, with CPU usage at 55%
> > and a load average of ~6 (10 CPUs).
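The traffic numbers in question 3 can be turned into a rough concurrency estimate with Little's law: average requests in flight = arrival rate × average time in the system. The sketch below plugs in the figures quoted here. It makes two simplifying assumptions: it uses peak-hour latency instead of a true average, and it treats the whole cluster as one queue.

```python
def in_flight_requests(requests_per_minute, latency_ms):
    """Little's law: average concurrency = arrival rate (req/s) x time in system (s)."""
    return (requests_per_minute / 60.0) * (latency_ms / 1000.0)


def load_per_core(load_average, cpu_count):
    """Load average per core; near or above 1.0 suggests a saturated run queue.

    On Linux, load also counts tasks blocked on I/O, not just runnable ones.
    """
    return load_average / cpu_count


# Figures quoted in this thread (cluster-level traffic, per-node load).
for latency in (250, 400):
    print(f"{latency} ms -> ~{in_flight_requests(2000, latency):.1f} top-level requests in flight")
print("load per core:", load_per_core(6, 10))
```

Every top-level query is broadcast to all shards and becomes at least one sub-request per shard. The number of shard-level requests in flight is therefore a multiple of this figure. A load of about 0.6 per core at 55% CPU suggests the cores are busy but not saturated on average, so the peak-hour slowdown may also involve factors other than raw CPU.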
> >
> > 4. How do we decide the number of nodes: based on shards or on some other
> > metric? Should we add nodes, or add CPUs to existing nodes?
> >
> > 5. How should we handle dev and staging environments? Should we have
> > separate, smaller clusters, or is there another approach?
> >
> > 6. Did your infrastructure requirements also increase when moving from
> > standalone to SolrCloud? If so, by how much?
> >
> > 7. How do you manage versioning of configs in ZooKeeper?
> > 8. Any performance issues you faced, or any other recommendations?
>
