Hello everyone,

I am following up in the hope of getting an answer to the email below.
We are still struggling with this problem and have not found a solution.

Thanks in advance,

Marco

-----Original Message-----
From: Matteo Diarena <m.diar...@volocom.it> 
Sent: Monday, 5 September 2022 11:02
To: users@solr.apache.org
Subject: Re: SolrCloud node fail to connect to another node in the cluster

Sorry, my fault. Here is my email again, rewritten without images:

I’m experiencing a strange behaviour with a SolrCloud cluster.

Cluster description
I have a cluster with a total of 38 nodes. All nodes share the following
setup:
        -  OS: Debian GNU/Linux 9.13 (stretch)
        -  JRE: openjdk version "11.0.6" 2020-01-14
        -  Apache Solr: Apache Solr 8.11.2

The cluster nodes are divided as follows:

Nodes used for indexing
solrindex-01
solrindex-02

Nodes used for queries
solrquery-01
solrquery-02

Cluster nodes with collections
solrnode-01
…
solrnode-34

Configuration of the collection
The cluster hosts a collection (e.g. testcollection) that is split across the
nodes into shards, one shard per month (e.g. shard_202201, shard_202202, ...).

Problem
From time to time the solrquery-01 node can no longer query the entire
collection: it is unable to contact some replicas of the collection hosted on
other nodes of the cluster. The problem does not resolve itself; we have to
restart the Apache Solr service on the solrquery-01 node.

In particular:
If I query a specific replica from the solrquery-01 node, the request hangs
until it times out.

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/

Response
{
  "response":{"numFound":0,"start":0,"numFoundExact":true,"docs":[]},
  "debug":{
    "track":{
      "rid":"solrquery-01.volo.local-232528",
      "EXECUTE_QUERY":{
        "http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/":{
          "Exception":"Timeout occured while waiting response from server at: http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/select"}}}}
}
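
To make a failing replica easy to spot in a debug=track response like the one
above, the per-shard entries can be scanned for an "Exception" key. This is
only a minimal sketch: the function name failed_shards and the sample string
are assumptions made for illustration, and the layout of the "track" section
is taken from the response shown in this email, not from a documented schema.

```python
import json


def failed_shards(response_text):
    """Return {shard_url: exception_message} for shards that reported an Exception."""
    body = json.loads(response_text)
    track = body.get("debug", {}).get("track", {})
    failures = {}
    for phase, shards in track.items():
        # "rid" maps to a plain string; only phase entries such as
        # EXECUTE_QUERY map to a dict keyed by shard URL.
        if not isinstance(shards, dict):
            continue
        for shard_url, info in shards.items():
            if isinstance(info, dict) and "Exception" in info:
                failures[shard_url] = info["Exception"]
    return failures


# Sample copied from the timed-out response in this email.
REPLICA = ("http://solrnode-24.volo.local:8080/solr/"
           "volocomapi_search_shard_201501_replica_n575/")
MESSAGE = ("Timeout occured while waiting response from server at: "
           + REPLICA + "select")
sample = json.dumps({
    "response": {"numFound": 0, "start": 0, "numFoundExact": True, "docs": []},
    "debug": {"track": {
        "rid": "solrquery-01.volo.local-232528",
        "EXECUTE_QUERY": {REPLICA: {"Exception": MESSAGE}},
    }},
})

for url, message in failed_shards(sample).items():
    print(url, "->", message)
```

A response in which every shard answered yields an empty dict, so the same
function can be run on the output of each querying node and the results
compared.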

Executing the same query from another node (e.g. solrnode-01) succeeds.

Query
http://solrnode-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/

Response:
{
  "response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]},
  "debug":{
    "track":{
      "rid":"solrnode-01.volo.local-1849853",
      "EXECUTE_QUERY":{
        "http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/":{
          "QTime":"0",
          "ElapsedTime":"28",
          "RequestPurpose":"GET_TOP_IDS,SET_TERM_STATS",
          "NumFound":"0",
          "Response":"{responseHeader={zkConnected=true,status=0,QTime=0},response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}"}}}}
}

The query also succeeds if I run it from the solrquery-01 node against a
different replica.

Query
http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=track&shards=http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/

Response
{
  "response":{"numFound":0,"start":0,"maxScore":0.0,"numFoundExact":true,"docs":[]},
  "debug":{
    "track":{
      "rid":"solrquery-01.volo.local-232531",
      "EXECUTE_QUERY":{
        "http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/":{
          "QTime":"0",
          "ElapsedTime":"88",
          "RequestPurpose":"GET_TOP_IDS,SET_TERM_STATS",
          "NumFound":"0",
          "Response":"{responseHeader={zkConnected=true,status=0,QTime=0},response={numFound=0,numFoundExact=true,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}"}}}}
}
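
The three queries above differ only in the node that receives the request and
in the replica named in the shards parameter. As a rough sketch, the same
matrix of probe URLs can be generated rather than typed by hand. The helper
name probe_url is an assumption for illustration; host names, port and
collection come from this email, and the document ID is shortened to a
placeholder. Note that urlencode percent-encodes the shards value, which is a
different spelling of the same request as the URLs shown above.

```python
from urllib.parse import urlencode


def probe_url(coordinator, collection, query, replica_url, port=8080):
    """Build a select URL that asks `coordinator` to query only `replica_url`."""
    params = urlencode({"q": query, "debug": "track", "shards": replica_url})
    return f"http://{coordinator}:{port}/solr/{collection}/select?{params}"


coordinators = ["solrquery-01", "solrnode-01"]
replicas = [
    "http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/",
    "http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/",
]

# One URL per (coordinator, replica) pair, ready to paste into curl or a browser.
for coordinator in coordinators:
    for replica in replicas:
        print(probe_url(coordinator, "volocomapi_search",
                        "UniqueReference:DOC_PLACEHOLDER", replica))
```

Running every pair shows whether a failure follows the querying node, the
target replica, or only one specific combination of the two.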


Checking the network traffic with tcpdump shows no connection at all from the
solrquery-01 machine, whereas the solrnode-01 machine shows the expected
traffic to solrnode-24.

tcpdump from the solrquery-01 machine
 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes


tcpdump on the solrnode-01 machine

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
10:57:10.979736 IP solrnode-01.volo.local.39888 > solrnode-24.volo.local.http-alt: Flags [P.], seq 881884455:881885148, ack 1974049136, win 364, options [nop,nop,TS val 561210041 ecr 561833498], length 693: HTTP
10:57:11.008007 IP solrnode-01.volo.local.39888 > solrnode-24.volo.local.http-alt: Flags [.], ack 132, win 364, options [nop,nop,TS val 561210048 ecr 561835614], length 0
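
Since the capture on solrquery-01 shows no packets at all, one possible
follow-up is to open a plain TCP connection to the replica's host and port
from that machine, outside of Solr. If this succeeds while Solr's own request
still times out, the operating system and network are less likely to be at
fault than the Solr process. This is only a sketch under that assumption; the
function name can_connect is hypothetical and the check says nothing about
Solr's internal HTTP client.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call, to be run on solrquery-01 (host and port taken from this email):
#   can_connect("solrnode-24.volo.local", 8080)
```

Running tcpdump at the same time should show the handshake packets if the
connection attempt actually leaves the machine.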

Question
Do you have any suggestions on how to investigate this issue further? 
Suggestions on possible solutions?

Thank you in advance,
Matteo


Matteo Diarena
Innovation Director


-----Original Message-----
From: Vincenzo D'Amore <v.dam...@gmail.com>
Sent: 05 September 2022 00:34
To: users@solr.apache.org
Subject: Re: SolrCloud node fail to connect to another node in the cluster

Hi Matteo, FYI, the images have been removed from your email.
The mailing list ate them. You'll need to give us text, not images.

On Thu, 1 Sep 2022 at 16:35, Matteo Diarena <m.diar...@volocom.it> wrote:

> Dear all,
>
> I’m experiencing a strange behaviour with a SolrCloud cluster.
>
> *Cluster description*
>
> I have a cluster with a total of 38 nodes. All nodes are installed
> with the following features:
>
>    - *OS*: Debian GNU/Linux 9.13 (stretch)
>    - JRE: openjdk version "11.0.6" 2020-01-14
>    - Apache Solr: Apache Solr 8.11.2
>
> The cluster nodes are divided as follows:
>
> *Nodes used for indexing*
>
> solrindex-01
> solrindex-02
>
> *Nodes used for queries*
>
> solrquery-01
> solrquery-02
>
> *Cluster nodes with collections*
>
> solrnode-01
> …
> solrnode-34
>
> *Configuration of the collection*
>
> In the cluster I have a collection (i.e testcollection) divided on the
> various nodes through different shards (one shard for each month, i.e.
> shard_202201, shard_202202, ...)
>
> *Problem*
>
> From time to time the solrquery-01 node is no longer able to query the
> entire collection and in particular it is unable to contact some
> replicas of the collection present on the other nodes of the cluster.
> The problem does not resolve itself but it is necessary to restart the
> Apache Solr service on the solrquery-01 node.
>
> In particular:
>
> If I try to query a specific replica from the solrquery-01 node, the
> request remains pending until it times out
>
> Query
>
> http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/
>
> Response
>
> By executing the same query from another node (eg: solrnode-01) the
> query is successful.
>
> Query
>
> http://solrnode-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-24.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n575/
>
> Response:
>
> The same happens if I try to run the query to a different replica
>
> Query
>
> http://solrquery-01:8080/solr/volocomapi_search/select?q=UniqueReference:DOC_EBF3D4C11F1239852490280F583D052FC214A10D6E716BD98C19CBC599E5EFED&debug=true&shards=http://solrnode-23.volo.local:8080/solr/volocomapi_search_shard_201501_replica_n573/
>
> Response
>
> Checking the network traffic with tcpdump on the solrquery-01 machine
> does not show any connection as it does on the solrnode-01 machine
>
> *tcpdump from the solrquery-01 machine*
>
> *tcpdump on the solrnode-01 machine*
>
> *Question*
>
> Do you have any suggestions on how to investigate this issue further?
> Suggestions on possible solutions?
>
> Thank you in advance,
>
> Matteo
>
--
Vincenzo D'Amore
