I felt the need to follow up on this for others reading. We experimented with switching `ioThrottle` to `false` in the CMS (ConcurrentMergeScheduler) settings and, at face value, all our problems have gone away. Merges are significantly faster (average minor merge time down from 9 mins to 2) and we're consistently using more of our disks' available IO. As a result, we have fewer concurrent ongoing merges at any given time.
My concern was that this would impact search performance; however, we rely heavily on caches (filterCache and IO cache), so apart from just after a softCommit (every 15 mins) we have very little disk activity at all. Subsequently, this has somehow improved our p95 search latencies significantly during periods of heavy indexing (previously we would see a 30-50% increase at p95 during heavy update periods, and now we're seeing 5% at most).

I'd love anyone else with experience of running Solr in cloud who has used this option to speak up, as I can't help but feel I'm missing something critical here; it almost seems too good to be true.

Thanks
Karl

On 09/04/2021, 10:49, "Karl Stoney" <karl.sto...@autotrader.co.uk> wrote:

We've reduced our autoCommit frequency from every 1 min to every 3 min and that's helped quite a lot with the number of small segments being constantly merged, and has lowered the overall load on Solr. Will continue to monitor. We're tempted to go to 5 minutes, but the size of the tlogs would then be a bit uncomfortable (at 3 mins under peak write load they're about 1.5gb each).

On 08/04/2021, 19:31, "Karl Stoney" <karl.sto...@autotrader.co.uk> wrote:

The documents are pretty large, yes: 650 fields, circa 20kb/document, so at peak (300/sec) that's circa 6meg/sec. ramBufferSizeMB is 512, so we'd be averaging 1 segment every 90 seconds (ish)?

> This means you never explicitly commit from the client? But you autoCommit openSearcher=false every minute to flush transLog, and then autoSoftCommit every 15min to make changes visible?

Yes, this is correct.

> Have you considered using PULL replicas for reading? Then you could tailor the HW on those servers to only serve reads, and they would replicate index from leader. You'd have e.g. 2xNRT + 7xPULL.

I did, but we unfortunately rely on RealTimeGet so need NRT.

On 08/04/2021, 15:10, "Jan Høydahl" <jan....@cominvent.com> wrote:

> - Update rate, and how you do commits?
>
> Update rates vary throughout the day, but range from 20ops/sec to 300ops/sec.
> Commits are done using autoCommit on a 1 min interval, softCommit on a 15 min interval.

This means you never explicitly commit from the client? But you autoCommit with openSearcher=false every minute to flush the transLog, and then autoSoftCommit every 15 min to make changes visible?

I cannot see why these settings would cause constant merging, unless you commit more frequently? Your documents are large, so the RAM buffer will fill up quickly and cause a flush; perhaps increasing ramBufferSizeMB will lower the number of flushes and merges.

Have you considered using PULL replicas for reading? Then you could tailor the HW on those servers to only serve reads, and they would replicate the index from the leader. You'd have e.g. 2xNRT + 7xPULL.

Jan
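
For anyone wanting to sanity-check the figures quoted in this thread (circa 20kb/document, 300 docs/sec at peak, ramBufferSizeMB of 512), here is a rough back-of-envelope sketch. The function names are made up for illustration, and it assumes decimal units (1 MB = 1000 kB) and that the size of a document on the wire approximates what it occupies in the RAM buffer, which is a simplification.

```python
# Back-of-envelope check of the numbers quoted in the thread.
# Assumptions (not from Solr itself): decimal units, and document size
# on the wire roughly equals its footprint in the indexing RAM buffer.

DOC_SIZE_KB = 20          # "circa 20kb/document"
PEAK_DOCS_PER_SEC = 300   # "at peak (300/sec)"
RAM_BUFFER_MB = 512       # ramBufferSizeMB


def ingest_mb_per_sec(doc_size_kb: float, docs_per_sec: float) -> float:
    """Approximate raw ingest rate in MB/s."""
    return doc_size_kb * docs_per_sec / 1000


def seconds_per_flush(buffer_mb: float, rate_mb_per_sec: float) -> float:
    """Approximate time to fill the RAM buffer, i.e. time between flushes."""
    return buffer_mb / rate_mb_per_sec


rate = ingest_mb_per_sec(DOC_SIZE_KB, PEAK_DOCS_PER_SEC)
flush_interval = seconds_per_flush(RAM_BUFFER_MB, rate)

print(f"peak ingest: {rate:.1f} MB/s")
print(f"approx. one flush every {flush_interval:.0f} s")
```

Under those assumptions this gives 6.0 MB/s and a flush roughly every 85 seconds at peak, which lines up with the "circa 6meg/sec" and "every 90 seconds (ish)" estimates above.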