actually not using HDFSDirectory, it’s a leftover in the config from some previous tests.
I don’t see anything in the logs related to maxWarmingSearchers, and no other errors or warnings show up there.

I tried reducing maxWarmingSearchers to 3 and increasing the hard commit maxTime to 2 minutes. The results improved significantly, from ~350ms p99 to ~210ms p99, which is still higher than the NRT result but better than it was.

I also tried with only TLOG replicas, and the results are more or less the same: ~340ms p99 and ~110ms p95. So both setups are slower, TLOG + PULL and TLOG only.

> On 11. Jun 2021, at 5:28 PM, Mike Drob <md...@apache.org> wrote:
>
> Are you using HDFSDirectory to serve your indices? I noticed that
> tlogDfsReplication is set, so that's why I'm asking.
>
> 8 maxWarmingSearchers is very high, typically that value is 2 or maybe 4, but
> you would know if this was an issue by looking at your logs.
>
> I'm assuming that you had 30 NRT replicas before? If you had fewer, then your
> tail latencies might be higher because you're seeing cache misses on the
> queries. Do you have metrics on the response times for TLOG v PULL? Are they
> both slower, or just one?
>
> Mike
>
> On 2021/06/11 12:55:31, Nick Vladiceanu <vladicean...@gmail.com> wrote:
>> hello,
>> I’m facing some performance issues when moving from NRT replica types to
>> TLOG + PULL. We’re constantly indexing new data and heavily querying (~2k
>> rps).
>>
>> - index size is ~ 2.5Gi;
>> - number of docs ~4.6M;
>> - 2 shards;
>> - 7 cores and 14Gi of memory
>> - 30 instances
>> - JVM Heap is 12Gi
>>
>> When running on NRT only, the response time in avg is ~150ms p99 and 40ms
>> p95. When changing to TLOG (6 tlog replicas) + 24 PULL, the response time
>> grows to ~350ms p99 and 120ms p95.
>>
>> Here are some fragments from our solrconfig:
>>
>>
>>> <updateHandler class="solr.DirectUpdateHandler2">
>>>   <updateLog>
>>>     <str name="dir">${solr.data.dir:}</str>
>>>     <int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
>>>   </updateLog>
>>>
>>>   <autoCommit>
>>>     <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>>     <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
>>>     <openSearcher>true</openSearcher>
>>>   </autoCommit>
>>>
>>>   <autoSoftCommit>
>>>     <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
>>>   </autoSoftCommit>
>>> </updateHandler>
>>
>>> <query>
>>>   <maxBooleanClauses>1000</maxBooleanClauses>
>>>   <filterCache class="solr.CaffeineCache"
>>>                size="${filterCache.size:32768}"
>>>                initialSize="${filterCache.initialSize:32768}"
>>>                autowarmCount="20%"/>
>>>
>>>   <queryResultCache class="solr.CaffeineCache"
>>>                     size="${queryResultCache.size:32768}"
>>>                     initialSize="${queryResultCache.initialSize:32768}"
>>>                     autowarmCount="0%"/>
>>>
>>>   <documentCache class="solr.CaffeineCache"
>>>                  size="${documentCache.size:150000}"
>>>                  initialSize="${documentCache.initialSize:150000}"
>>>                  autowarmCount="0%"/>
>>>
>>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>   <useFilterForSortedQuery>true</useFilterForSortedQuery>
>>>
>>>   <queryResultWindowSize>160</queryResultWindowSize>
>>>   <queryResultMaxDocsCached>300</queryResultMaxDocsCached>
>>>
>>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>>   </listener>
>>>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>   </listener>
>>>
>>>   <useColdSearcher>false</useColdSearcher>
>>>   <maxWarmingSearchers>8</maxWarmingSearchers>
>>> </query>
>>
>> One of my assumption was to reduce the maxWarmingSearchers and to increase
>> the autoCommit maxTime, since the softCommit isn’t available anymore in TLOG
>> replicas. Is that valid?
>>
>> I couldn’t find any documents with the differences/considerations we need to
>> take into account between NRT and TLOG, could you please help? Thanks a lot
>> in advance. Please let me know if there is anything else required.
>>
>> Best regards,
>> Nick Vladiceanu
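
For anyone following along, here is a rough sketch of how the adjustment described at the top of this message might look in solrconfig.xml. Only two values come from the message itself: maxWarmingSearchers lowered to 3, and the hard commit maxTime raised to 2 minutes (120000 ms). Everything else is copied from the quoted fragments. Whether the change was made through the property default, as shown here, or through a system property is an assumption. This is not the poster's actual file.

```xml
<!-- Sketch only. The two changed values are taken from the message above;
     the surrounding elements mirror the quoted solrconfig fragments. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Hard commit interval raised from 60000 ms to 120000 ms (2 minutes). -->
    <maxTime>${solr.autoCommit.maxTime:120000}</maxTime>
    <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

<query>
  <useColdSearcher>false</useColdSearcher>
  <!-- Lowered from 8 to 3. -->
  <maxWarmingSearchers>3</maxWarmingSearchers>
</query>
```

Both elements sit inside the top-level config element of a real solrconfig.xml, alongside the settings left unchanged from the quoted fragments.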