On Mon, Jan 23, 2023 at 19:50, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/23/23 07:38, Dominique Bejean wrote:
> > On a SolrCloud 7.7 environment with 14 servers, we have one collection with
> > 1 billion documents.
> > Sharding is 7 shards x 2 replicas (TLOG)
> > Each solr server hosts one replica.
> >
> > Indexing and searching are permanent.
>
> No idea what "permanent" could mean here.

The average indexing rate is 100 docs per second, with autoCommit every 5
minutes and autoSoftCommit every minute.
The average search rate is 1000 queries per second.
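
To put these averages in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted in this thread; the function name and the two assumptions marked in the comments (each query fans out to one replica of every shard, and load is spread evenly across nodes) are mine and may not match the real traffic.

```python
# Rough load estimate from the averages given in this thread.
# Everything here is illustrative arithmetic, not a measurement.

INDEX_RATE_DOCS_PER_S = 100       # average indexing rate
SOFT_COMMIT_INTERVAL_S = 60       # autoSoftCommit every minute
HARD_COMMIT_INTERVAL_S = 5 * 60   # autoCommit every 5 minutes
QUERY_RATE_PER_S = 1000           # average search rate
TOTAL_DOCS = 1_000_000_000        # documents in the collection
NUM_SHARDS = 7
NUM_NODES = 14                    # 7 shards x 2 replicas, one replica per node


def estimate_load():
    """Return derived figures (hypothetical helper, not part of Solr)."""
    docs_per_soft_commit = INDEX_RATE_DOCS_PER_S * SOFT_COMMIT_INTERVAL_S
    docs_per_hard_commit = INDEX_RATE_DOCS_PER_S * HARD_COMMIT_INTERVAL_S
    docs_per_shard = TOTAL_DOCS // NUM_SHARDS

    # Assumption: a distributed query sends one request to one replica
    # of every shard.
    shard_requests_per_s = QUERY_RATE_PER_S * NUM_SHARDS

    # Assumption: those shard requests are balanced evenly over all nodes.
    shard_requests_per_node_per_s = shard_requests_per_s / NUM_NODES

    return {
        "docs_per_soft_commit": docs_per_soft_commit,
        "docs_per_hard_commit": docs_per_hard_commit,
        "docs_per_shard": docs_per_shard,
        "shard_requests_per_s": shard_requests_per_s,
        "shard_requests_per_node_per_s": shard_requests_per_node_per_s,
    }


if __name__ == "__main__":
    for name, value in estimate_load().items():
        print(f"{name}: {value}")
```

Under those assumptions each node would see the same shard-request rate, which is part of why a single node climbing to 100% user CPU looks odd.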



>
> > Suddenly one of the servers has its CPU usage growing for 30 minutes.
> > Sometimes, for a few minutes, the CPU usage decreases on this node and
> > increases on other nodes.
> > Here is a screenshot of CPU monitoring
> > https://drive.google.com/file/d/1Fp9oiZ8Sl7hb97utN2JRIm7dJKh0St3H/view?usp=share_link
> >
> What CPU characteristic do each of those colors represent?  Especially
> the dark purple.  The image doesn't have that info.
>
Each color represents the user CPU of one Solr server.
The servers run Linux and are dedicated to Solr.


> > WARN logs do not provide any relevant information
> > Customer did not generate thread dump.
>
> How about ERROR logs?  Or any other severity?  Have you looked through
> the solr.log to see what requests were being handled at the time the
> problem started and/or ended?  Is there software other than Solr on the
> same machine?  Did you get a look at process performance info on the
> machine while it was happening ... something like top for *NIX, or
> resource monitor on Windows?

I mean the log level is WARN, and no WARN log line provides relevant
information. There is no ERROR log line in the log.


>
> > Any idea of what tasks can generate this kind of CPU behaviour?
> >
> > A huge merge on a shard leader wouldn't take so long, and only one node
> > would have to synchronize, not all.
>
> Have you asked them what they started doing between 10:40 and 10:50?  Do
> you have other performance graphs like number of queries per second,
> number of update requests per second, disk utilization, Java memory
> characteristics, and so on?

The customer says nothing special was started. Only regular indexing and
searching were occurring.



>
> It's difficult to say what the problem might be from just a CPU graph.
>
> Does the problem recur?  If not, and that CPU graph is all you have from
> the event, it might not be possible to get to the root cause.

It is the only graph I have at this time.
What is strange is that only one server has its user CPU grow to 100% for
30 minutes, and sometimes, for 1 or 2 minutes, its CPU goes down while at the
same time the other servers' CPU goes up. Without more information, my
question was: "Has anyone already encountered such a user CPU monitoring
pattern and have an idea of the scenario causing it?"


>
> Thanks,
> Shawn
>
