Re: 8.11.2 Performance degradation

Richard Goodman Tue, 13 Dec 2022 02:21:01 -0800

Hi Alessandro,

Apologies for the delay. Yes, that's correct: the performance
degradation still persisted, and the metrics seemed very similar to the
parallel 8.11.2 cluster that was using http2.


I appreciate that it's going to be difficult to work out what is causing
the degradation. For extra information, this cluster is used solely for
querying, with faceting used quite often and to varying degrees of
heaviness.

I couldn't see much mention of garbage collection changing *(except the
default type)* in v8+, but one metric that is distinctly different is
that in 7.7.2 Old Gen was the most heavily used pool on average (around
275GB), whereas with our 8.11.2 cluster it sits at around 90GB, and Eden
Space appears to be churning quite a lot *(it's also worth mentioning that
both clusters use G1GC)*.
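
For anyone wanting to compare the same pools on their own clusters, here is
a rough sketch of how the per-pool numbers could be pulled together. It is
only an illustration under several assumptions: that each node's JVM metrics
were fetched from the Solr Metrics API (something like
/solr/admin/metrics?group=jvm&prefix=memory.pools), that the response is a
JSON map keyed by names such as "memory.pools.G1-Old-Gen.used", and that the
275GB and 90GB figures above are cluster-wide totals across the 72 instances
with 16Gi of heap each (treating GB and Gi as the same unit for a rough
comparison). The helper names are hypothetical.

```python
# Hypothetical helpers for summarising G1 pool usage across Solr nodes.
# Assumption: each input dict is one node's parsed Metrics API response and
# uses keys like "memory.pools.G1-Eden-Space.used" under "solr.jvm".

G1_POOLS = ("G1-Eden-Space", "G1-Survivor-Space", "G1-Old-Gen")


def pool_usage(metrics, pools=G1_POOLS):
    """Return {pool: used_bytes} for the pools present in one node's metrics."""
    jvm = metrics.get("metrics", {}).get("solr.jvm", {})
    usage = {}
    for pool in pools:
        value = jvm.get(f"memory.pools.{pool}.used")
        if value is not None:
            usage[pool] = int(value)
    return usage


def cluster_totals(per_node_usage):
    """Sum per-node {pool: used_bytes} dicts into cluster-wide totals."""
    totals = {}
    for usage in per_node_usage:
        for pool, used in usage.items():
            totals[pool] = totals.get(pool, 0) + used
    return totals


def heap_share(used, nodes, heap_per_node):
    """Fraction of the total configured heap (nodes * heap_per_node) in use."""
    return used / (nodes * heap_per_node)


if __name__ == "__main__":
    # Assumed cluster-wide figures from this thread: 72 nodes, 16Gi heap each.
    print(f"7.7.2 old gen:  {heap_share(275, 72, 16):.1%} of total heap")
    print(f"8.11.2 old gen: {heap_share(90, 72, 16):.1%} of total heap")
```

Under those assumptions, Old Gen works out to roughly 24% of the total
configured heap on 7.7.2 versus roughly 8% on 8.11.2.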

If you need more information on the types of queries being run, I can try
to get examples or a more detailed description.

Cheers,

On Fri, 2 Dec 2022 at 16:31, Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi Richard,
> when you mention "In particular which sparked interest, and so we spun up a
> parallel cluster with -Dsolr.http1=true, and there was no difference in
> performance.", do you mean that you still see the degradation in
> performance?
>
> I will probably state the obvious but normally you would require a detailed
> deep investigation to understand your issue.
> I suspect that without putting our hands on your
> cluster/config/architecture it is going to be difficult to give meaningful
> suggestions.
>
> Especially with no reference to what you are currently using in Solr,
> e.g. do you see the degradation in:
> - indexing? indexing how? indexing what? The extent of the degradation
> - searching? what kind of queries? faceting? reranking?...
>
> That would definitely help but I suspect it's not going to be an easy one.
>
> Cheers
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Fri, 2 Dec 2022 at 13:15, Richard Goodman <richa...@brandwatch.com>
> wrote:
>
> > Hi Charlie,
> >
> > Gah, thanks for informing me of that, here is a link to the images:
> > <https://imgur.com/a/yEmBGuv>
> >
> > Cheers,
> >
> >
> > On Tue, 29 Nov 2022 at 13:23, Charlie Hull <
> > ch...@opensourceconnections.com>
> > wrote:
> >
> > > Hey Richard,
> > >
> > > Attachments are stripped by this list so you might want to upload them
> > > somewhere and link to them.
> > >
> > > Cheers
> > >
> > > Charlie
> > >
> > > On 25/11/2022 17:33, Richard Goodman wrote:
> > > > Hi there,
> > > >
> > > > We have a cluster spread over 72 instances on k8s hosting around 12.5
> > > > billion documents (made up of 30 collections, each collection having 12
> > > > shards). We were originally using 7.7.2 and performance was okay enough
> > > > for us for our business needs. We then recently upgraded our cluster to
> > > > v8.11.2, and have noticed a drop in performance. I appreciate that there
> > > > have been a lot of changes from 7.7.2 to 8.11.2, but I have been
> > > > collecting metrics, and although the configuration (instance type and
> > > > resource allocation, start up opts) are the same, we are completely at a
> > > > loss as to why it's performing worse, and was wondering if anyone had any
> > > > guidance?
> > > >
> > > > I recently stumbled across the tickets;
> > > >
> > > >     - SOLR-15840 <https://issues.apache.org/jira/browse/SOLR-15840> -
> > > >     Performance degradation with http2
> > > >     - SOLR-16099 <https://issues.apache.org/jira/browse/SOLR-16099> -
> > > >     HTTP Client threads can hang
> > > >
> > > > In particular which sparked interest, and so we spun up a parallel
> > > > cluster with -Dsolr.http1=true, and there was no difference in
> > > > performance. We're testing a couple of other ideas, such as a different
> > > > DirectoryFactory *(as I saw a message from someone in the Solr Slack
> > > > about there being an issue with the MMap directory and
> > > > vm.max_map_count)*, and some GC settings, but are really open to any
> > > > suggestions. We're also happy, if it'll help with any performance
> > > > related topics, to use this cluster to test patches at a large scale
> > > > to see if they help with performance *(more specifically for the two
> > > > Solr tickets listed above)*.
> > > >
> > > > I thought it would be useful to show some metrics I collected where we
> > > > had 2 clusters spun up, 1 being 7.7.2 and 1 being 8.11.2 where the
> > > > 8.11.2 cluster was the active, and all traffic was being shadow loaded
> > > > into the 7.7.2 cluster to compare against. It's important to note that
> > > > both clusters had the same configuration, here is a list to name a few:
> > > >
> > > >     - G1GC garbage collector
> > > >     - TLOG replication
> > > >     - 27Gi Memory per instance
> > > >     - 16Gi assigned to -Xmx and -Xms
> > > >     - 16 cores
> > > >     - -XX:G1HeapRegionSize=4m
> > > >     - -XX:G1ReservePercent=20
> > > >     - -XX:InitiatingHeapOccupancyPercent=35
> > > >
> > > > One metric that did stand out, was that 8.11.2 was churning through *a
> > > > lot* of eden space in the heap, which can be seen in some of the
> > > > screenshots of metrics below;
> > > >
> > > > [Screenshots stripped by the list, each shown for 7.7.2 and 8.11.2:
> > > > Total Memory Usage; Total Used G1 Pools; overall thread pool]
> > > >
> > > > Any guidance or requests to test for performance wise would be
> > > > appreciated.
> > > >
> > > > Thanks,
> > > >
> > > > Richard
> > > >
> > > --
> > > Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > > Founding member of The Search Network <http://www.thesearchnetwork.com>
> > > and co-author of Searching the Enterprise
> > > <https://opensourceconnections.com/wp-content/uploads/2020/08/ES_book_final_journal_version.pdf>
> > > tel/fax: +44 (0)8700 118334
> > > mobile: +44 (0)7767 825828
> > >
> > > OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> > > Amtsgericht Charlottenburg | HRB 230712 B
> > > Geschäftsführer: John M. Woodell | David E. Pugh
> > > Finanzamt: Berlin Finanzamt für Körperschaften II
> >
> >
> >
> > --
> >
> > Richard Goodman (he/him)   |    Senior Data Infrastructure engineer
> >
> > richa...@brandwatch.com
> >
> >
> > NEW YORK   |   BOSTON   |   CHICAGO   |   TORONTO   |   *BRIGHTON*   |
> > LONDON   |   COPENHAGEN   |    BERLIN   |   STUTTGART   |   FRANKFURT   |
> > PARIS  |   BUDAPEST   |   SOFIA  |   CHENNAI   |    SINGAPORE   |   SYDNEY
> > |   MELBOURNE
> >
>


-- 

Richard Goodman (he/him)   |    Senior Data Infrastructure engineer

richa...@brandwatch.com


NEW YORK   |   BOSTON   |   CHICAGO   |   TORONTO   |   *BRIGHTON*   |
LONDON   |   COPENHAGEN   |    BERLIN   |   STUTTGART   |   FRANKFURT   |
PARIS  |   BUDAPEST   |   SOFIA  |   CHENNAI   |    SINGAPORE   |   SYDNEY
|   MELBOURNE
