[
https://issues.apache.org/jira/browse/SOLR-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072991#comment-16072991
]
RAHAT BHALLA commented on SOLR-10987:
-------------------------------------
Thanks, Yago. How do I post it to the mailing list? That would really help me.
> Solr Cloud overseer node becomes unreachable. Issue Started Recently
> --------------------------------------------------------------------
>
> Key: SOLR-10987
> URL: https://issues.apache.org/jira/browse/SOLR-10987
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 6.1
> Environment: *The following is the usage on each of the Solr Nodes:*
> Tasks: 254 total, 1 running, 252 sleeping, 0 stopped, 1 zombie
> %Cpu(s): 0.4 us, 0.3 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> KiB Mem : 20392276 total, 4169296 free, 2917012 used, 13305968 buff/cache
> KiB Swap: 5111804 total, 5111636 free, 168 used. 16058184 avail Mem
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 21250 solr 20 0 23.599g 1.184g 228440 S 2.0 6.1 59:55.91 java
> *Solr is running on 5 machines with similar configuration:*
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 4
> On-line CPU(s) list: 0-3
> Thread(s) per core: 1
> Core(s) per socket: 2
> Socket(s): 2
> NUMA node(s): 1
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 62
> Model name: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> Stepping: 4
> CPU MHz: 2799.033
> BogoMIPS: 5600.00
> Hypervisor vendor: VMware
> Virtualization type: full
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 256K
> L3 cache: 25600K
> NUMA node0 CPU(s): 0-3
> Reporter: RAHAT BHALLA
> Labels: assistance, critical, customer, impacting, issue, need,
> production
>
> We host a SolrCloud cluster of 5 Solr nodes, with 3 ZooKeeper nodes to
> maintain the cloud. We have over 70 million docs spread across 13 collections,
> with 40K more documents added every day in near real time, at intervals
> of 5 to 6 minutes.
> The system worked as expected and as required for the last 7 months,
> until we suddenly saw the following exception and all of our instances went
> offline. We restarted the instances, and the cloud ran smoothly for three days
> before it crashed again.
> *The exception logged before it goes down is as follows:*
> 3542285 ERROR (OverseerCollectionConfigSetProcessor-98221003671470081-prod-solr-node01:9080_solr-n_0000000106) [ ] o.a.s.c.OverseerTaskProcessor
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:348)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:345)
> at org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:384)
> at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:191)
> at java.lang.Thread.run(Unknown Source)