On Mon, Jan 23, 2023 at 19:50, Shawn Heisey <apa...@elyograg.org> wrote:
> On 1/23/23 07:38, Dominique Bejean wrote:
> > On a SolrCloud 7.7 environment with 14 servers, we have one collection with
> > 1 billion documents.
> > Sharding is 7 shards x 2 replicas (TLOG)
> > Each solr server hosts one replica.
> >
> > Indexing and searching are permanent.
>
> No idea what "permanent" could mean here.

The average indexing rate is 100 docs per second, with autoCommit every 5 minutes and autoSoftCommit every minute.
The average search rate is 1000 queries per second.

> > Suddenly one of the server has CPU usage growing during 30 minutes.
> > Sometimes during a few minutes the CPU usage decreases on this node and
> > increases on other nodes.
> > Here is a screenshot of CPU monitoring
> > https://drive.google.com/file/d/1Fp9oiZ8Sl7hb97utN2JRIm7dJKh0St3H/view?usp=share_link
>
> What CPU characteristic do each of those colors represent? Especially
> the dark purple. The image doesn't have that info.

Each color represents the user CPU of one Solr server.
The servers run Linux and are dedicated to Solr.

> > WARN logs do not provide any relevant information
> > Customer did not generate thread dump.
>
> How about ERROR logs? Or any other severity? Have you looked through
> the solr.log to see what requests were being handled at the time the
> problem started and/or ended? Is there software other than Solr on the
> same machine? Did you get a look at process performance info on the
> machine while it was happening ... something like top for *NIX, or
> resource monitor on Windows?

I mean that the log level is WARN and no WARN log line provides relevant information. There are no ERROR lines in the log.

> > Any idea of what tasks can generate this kind of CPU behaviour ?
> >
> > Huge merge on a shard leader won't be so long and only one node will have
> > to synchronize, not all.
>
> Have you asked them what they started doing between 10:40 and 10:50? Do
> you have other performance graphs like number of queries per second,
> number of update requests per second, disk utilization, Java memory
> characteristics, and so on?

The customer says nothing special was started. Only regular indexing and searching were occurring.

> It's difficult to say what the problem might be from just a CPU graph.
>
> Does the problem recur? If not, and that CPU graph is all you have from
> the event, it might not be possible to get to the root cause.

It is the only graph I have at this time.

What is strange is that only one server has its user CPU grow to 100% for 30 minutes, and sometimes, for 1 or 2 minutes, its CPU goes down while the other servers' CPU goes up at the same time.

Without more information, my question was: "Has anyone already encountered such a user CPU monitoring pattern, and do they have an idea of the scenario causing it?"

> Thanks,
> Shawn
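
If the problem recurs, a thread dump taken together with per-thread CPU figures would show which Solr threads are consuming the user CPU. The following is only a minimal sketch of one common way to combine the two on Linux, not something taken from this thread: it assumes the text of `top -H -b -n 1 -p <solr pid>` and of a `jstack <solr pid>` dump have already been captured, and that the dump prints native thread IDs in hexadecimal as `nid=0x...` (some newer JDKs may print them differently). The function names are hypothetical.

```python
import re

# Assumption: each thread header in the dump looks like
#   "thread name" ... nid=0x1a2b ...
NID_RE = re.compile(r'^"(?P<name>[^"]*)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)', re.MULTILINE)


def parse_top_threads(top_output):
    """Return [(thread_id, cpu_percent)] from `top -H -b` text, busiest first."""
    rows = []
    cpu_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The table header tells us which column holds %CPU.
        if fields[0] == "PID" and "%CPU" in fields:
            cpu_col = fields.index("%CPU")
            continue
        if cpu_col is not None and fields[0].isdigit() and len(fields) > cpu_col:
            # Some locales print a decimal comma instead of a point.
            cpu = float(fields[cpu_col].replace(",", "."))
            rows.append((int(fields[0]), cpu))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def thread_names_by_tid(jstack_output):
    """Map decimal native thread IDs to the thread names found in the dump."""
    return {
        int(match.group("nid"), 16): match.group("name")
        for match in NID_RE.finditer(jstack_output)
    }


def hottest_threads(top_output, jstack_output, limit=5):
    """Join both captures: [(thread_id, cpu_percent, thread_name)], busiest first."""
    names = thread_names_by_tid(jstack_output)
    return [
        (tid, cpu, names.get(tid, "<not in dump>"))
        for tid, cpu in parse_top_threads(top_output)[:limit]
    ]
```

Thread names alone often hint at the cause: request handler threads point at queries or updates, while merge or GC threads point elsewhere. The stack traces of the busiest threads in the same dump would give more detail.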