I felt the need to follow up on this for others reading.
We experimented with switching `ioThrottle` to `false` in the CMS 
(ConcurrentMergeScheduler) settings and, at face value, all our problems have 
gone away.
Merges are significantly faster (average minor merge time down from 9 mins to 
2) and we're consistently using more of our disks' available IO.  As a result 
we have fewer concurrent merges in progress at any given time.
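
For anyone wanting to try the same thing, this is roughly what the change 
looks like in solrconfig.xml.  Treat it as a minimal sketch: only the 
`ioThrottle` line reflects what is described above, and any other merge 
scheduler settings you already have (thread and merge counts, for example) 
are deliberately left out rather than guessed at.

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Disable Lucene's automatic merge IO throttling (it is on by default) -->
    <bool name="ioThrottle">false</bool>
  </mergeScheduler>
</indexConfig>
```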

My concern was that this would impact search performance; however, we rely 
heavily on caches (filterCache and IO cache), so apart from just after a 
softCommit (every 15 mins) we have very little disk activity at all.  Somehow, 
the change has actually improved our p95 search latencies significantly during 
periods of heavy indexing (previously we would see a 30-50% increase at p95 
during heavy update periods, and now we're seeing 5% at most).

I'd love anyone else with experience of running Solr in cloud, and who has 
used this option, to speak up, as I can't help but feel I'm missing something 
critical here; it almost seems too good to be true.

Thanks
Karl


On 09/04/2021, 10:49, "Karl Stoney" <karl.sto...@autotrader.co.uk> wrote:

    We've raised our autoCommit interval from 1 min to 3 min and that's helped 
quite a lot with the number of small segments being constantly merged, and has 
lowered the overall load on Solr.  Will continue to monitor.  We're tempted to 
go to 5 minutes, but the size of the tlogs would then be a bit uncomfortable 
(at 3 mins under peak write load they're about 1.5 GB each).
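
    As a sketch of the commit settings discussed in this thread (3 min hard 
commit with openSearcher=false, 15 min soft commit), the relevant part of 
solrconfig.xml would look something like the following.  The values are taken 
from the messages here; the rest of the updateHandler section is assumed to be 
unchanged and is omitted.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes segments and rolls the tlog, does not open a searcher -->
  <autoCommit>
    <maxTime>180000</maxTime> <!-- 3 minutes, in milliseconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes changes visible to searches -->
  <autoSoftCommit>
    <maxTime>900000</maxTime> <!-- 15 minutes, in milliseconds -->
  </autoSoftCommit>
</updateHandler>
```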



    On 08/04/2021, 19:31, "Karl Stoney" <karl.sto...@autotrader.co.uk> wrote:

        The documents are pretty large, yes: 650 fields, circa 20 KB/document, 
so at peak (300/sec) that's circa 6 MB/sec.  ramBufferSizeMB is 512, so we'd be 
averaging 1 segment every 90 seconds (ish)?
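
        As a rough check on that estimate: 300 docs/sec x 20 KB is about 
6 MB/sec, and 512 MB / 6 MB/sec is about 85 seconds.  This assumes the 
in-memory size of the buffered documents matches their raw size, which will 
not hold exactly.  A hard autoCommit also flushes the buffer to a segment, so 
with a 1 min autoCommit a segment would be written at least that often 
regardless of buffer size.  The buffer itself is set in the indexConfig 
section of solrconfig.xml; a sketch with the value quoted above:

```xml
<indexConfig>
  <!-- Flush buffered documents to a new segment once roughly this much RAM is used -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
</indexConfig>
```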

        >    This means you never explicitly commit from the client? But you 
autoCommit with openSearcher=false every minute to flush the transaction log, 
and then autoSoftCommit every 15 min to make changes visible?

        Yes, this is correct

        >    Have you considered using PULL replicas for reading? Then you 
could tailor the HW on those servers to only serve reads, and they would 
replicate index from leader. You'd have e.g. 2xNRT + 7xPULL.

        I did, but we unfortunately rely on RealTimeGet so need NRT.



        On 08/04/2021, 15:10, "Jan Høydahl" <jan....@cominvent.com> wrote:

            >    - Update rate, and how you do commits?
            >
            > Update rates vary throughout the day, but range from 20 ops/sec 
to 300 ops/sec.  Commits are done using autoCommit on a 1 min interval, 
softCommit on a 15 min interval.

            This means you never explicitly commit from the client? But you 
autoCommit with openSearcher=false every minute to flush the transaction log, 
and then autoSoftCommit every 15 min to make changes visible?

            I cannot see why these settings would cause constant merging, 
unless you commit more frequently? Your documents are large, so the RAM buffer 
will fill up quickly and cause a flush; perhaps increasing ramBufferSizeMB 
will lower the number of flushes and merges.

            Have you considered using PULL replicas for reading? Then you could 
tailor the HW on those servers to only serve reads, and they would replicate 
index from leader. You'd have e.g. 2xNRT + 7xPULL.

            Jan



