Re: Migration from NRT to TLOG performance issues

Nick Vladiceanu Fri, 11 Jun 2021 08:38:11 -0700

Actually, I’m not using HDFSDirectory; it’s a leftover in the config from some 
previous tests.


I don’t see anything in the logs related to maxWarmingSearchers, nor any other 
errors or warnings. I reduced maxWarmingSearchers to 3 and increased the hard 
commit (autoCommit) maxTime to 2 minutes. The results improved significantly, 
from ~350ms p99 to ~210ms p99, which is still higher than the NRT result 
(~150ms p99), but better than it was.
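
As a rough sketch, the change described above would correspond to something 
like the following in solrconfig.xml. Only the 120000 ms maxTime and the 
maxWarmingSearchers value of 3 come from the description above; the property 
names are reused from the original config, and leaving maxDocs and 
openSearcher unchanged is an assumption.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
    <!-- updateLog and autoSoftCommit omitted; assumed unchanged -->
    <autoCommit>
        <!-- hard commit interval raised from 60000 ms to 2 minutes -->
        <maxTime>${solr.autoCommit.maxTime:120000}</maxTime>
        <!-- assumption: maxDocs and openSearcher left as in the original config -->
        <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
        <openSearcher>true</openSearcher>
    </autoCommit>
</updateHandler>

<query>
    <!-- caches and listeners omitted; assumed unchanged -->
    <useColdSearcher>false</useColdSearcher>
    <!-- reduced from 8 -->
    <maxWarmingSearchers>3</maxWarmingSearchers>
</query>
```

Note that if solr.autoCommit.maxTime is also set as a system property, that 
value overrides the default written after the colon.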

I also tried with only TLOG replicas, and the results are more or less the 
same as with TLOG + PULL: ~340ms p99 and ~110ms p95. So both setups, TLOG + 
PULL and TLOG only, are slower than NRT.



> On 11. Jun 2021, at 5:28 PM, Mike Drob <md...@apache.org> wrote:
> 
> Are you using HDFSDirectory to serve your indices? I noticed that 
> tlogDfsReplication is set, so that's why I'm asking.
> 
> 8 maxWarmingSearchers is very high, typically that value is 2 or maybe 4, but 
> you would know if this was an issue by looking at your logs.
> 
> I'm assuming that you had 30 NRT replicas before? If you had fewer, then your 
> tail latencies might be higher because you're seeing cache misses on the 
> queries. Do you have metrics on the response times for TLOG v PULL? Are they 
> both slower, or just one?
> 
> Mike
> 
> On 2021/06/11 12:55:31, Nick Vladiceanu <vladicean...@gmail.com> wrote: 
>> hello,
>> I’m facing some performance issues when moving from NRT replica types to 
>> TLOG + PULL. We’re constantly indexing new data and heavily querying (~2k 
>> rps).
>> 
>> - index size is ~ 2.5Gi;
>> - number of docs ~4.6M;
>> - 2 shards;
>> - 7 cores and 14Gi of memory
>> - 30 instances
>> - JVM Heap is 12Gi
>> 
>> When running on NRT only, the response time is on average ~150ms p99 and 
>> ~40ms p95. After changing to 6 TLOG + 24 PULL replicas, the response time 
>> grows to ~350ms p99 and ~120ms p95.
>> 
>> Here are some fragments from our solrconfig:
>> 
>> 
>>>    <updateHandler class="solr.DirectUpdateHandler2">
>>>        <updateLog>
>>>            <str name="dir">${solr.data.dir:}</str>
>>>            <int name="tlogDfsReplication">${solr.ulog.tlogDfsReplication:3}</int>
>>>        </updateLog>
>>> 
>>>        <autoCommit>
>>>            <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>>>            <maxDocs>${solr.autoCommit.maxDocs:10000}</maxDocs>
>>>            <openSearcher>true</openSearcher>
>>>        </autoCommit>
>>> 
>>>        <autoSoftCommit>
>>>            <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
>>>        </autoSoftCommit>
>>>    </updateHandler>
>> 
>>>     <query>
>>>        <maxBooleanClauses>1000</maxBooleanClauses>
>>>        <filterCache class="solr.CaffeineCache"
>>>                     size="${filterCache.size:32768}"
>>>                     initialSize="${filterCache.initialSize:32768}"
>>>                     autowarmCount="20%"/>
>>> 
>>>        <queryResultCache class="solr.CaffeineCache"
>>>                          size="${queryResultCache.size:32768}"
>>>                          initialSize="${queryResultCache.initialSize:32768}"
>>>                          autowarmCount="0%"/>
>>> 
>>>        <documentCache class="solr.CaffeineCache"
>>>                       size="${documentCache.size:150000}"
>>>                       initialSize="${documentCache.initialSize:150000}"
>>>                       autowarmCount="0%"/>
>>> 
>>>        <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>>        <useFilterForSortedQuery>true</useFilterForSortedQuery>
>>> 
>>>        <queryResultWindowSize>160</queryResultWindowSize>
>>>        <queryResultMaxDocsCached>300</queryResultMaxDocsCached>
>>> 
>>>        <listener event="newSearcher" class="solr.QuerySenderListener">
>>>        </listener>
>>>        <listener event="firstSearcher" class="solr.QuerySenderListener">
>>>        </listener>
>>> 
>>>        <useColdSearcher>false</useColdSearcher>
>>>        <maxWarmingSearchers>8</maxWarmingSearchers> 
>>>    </query>
>> 
>> One of my assumptions was that I should reduce maxWarmingSearchers and 
>> increase the autoCommit maxTime, since soft commits aren’t available anymore 
>> on TLOG replicas. Is that valid? 
>> 
>> I couldn’t find any documentation on the differences between NRT and TLOG 
>> or on the considerations we need to take into account when switching. Could 
>> you please help? Thanks a lot in advance. Please let me know if anything 
>> else is required.
>> 
>> Best regards,
>> Nick Vladiceanu
