Hi team,
DNS resolve expiration causes the ovsdb-server SB relay to fail at startup. The relay is started as:
```
ovsdb-server --relay-leader-only \
    --remote=db:OVN_Southbound,SB_Global,connections \
    -vconsole:info -vsyslog:off -vfile:off \
    --log-file=/var/log/ovn/ovn-ovsdb-sb.log \
    --pidfile=/var/run/ovn/ovnsb_db.pid \
    --unixctl=/var/run/ovn/ovnsb_db.ctl \
    relay:OVN_Southbound:tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642,tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642,tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642
```
At startup the SB relay needs to connect to sb-0/1/2, but those connections can fail because sb-0/1/2 may not be serving yet. In that case the relay logs:
```
2025-06-25T21:25:46Z|00021|memory|INFO|atoms:13 cells:20 monitors:0 n-weak-refs:0
2025-06-25T21:26:39Z|00022|stream_tcp|WARN|Dropped 11 log messages in last 63 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:26:39Z|00023|stream_tcp|WARN|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:27:43Z|00024|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:27:43Z|00025|stream_tcp|WARN|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:28:47Z|00026|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:28:47Z|00027|stream_tcp|WARN|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:29:51Z|00028|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:29:51Z|00029|stream_tcp|WARN|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:30:55Z|00030|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:30:55Z|00031|stream_tcp|WARN|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:31:59Z|00032|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:31:59Z|00033|stream_tcp|WARN|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:33:03Z|00034|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:33:03Z|00035|stream_tcp|WARN|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
2025-06-25T21:34:07Z|00036|stream_tcp|WARN|Dropped 7 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2025-06-25T21:34:07Z|00037|stream_tcp|WARN|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
```
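As a side note, the confusing "connect: Address family not supported by protocol" text appears to come from the DNS failure itself rather than from a real socket() call: inet_parse_active() in lib/socket-util.c seems to report a failed parse/resolution as EAFNOSUPPORT, and the stream code then renders that errno with strerror(). A one-liner confirming the errno-to-message mapping (a standalone sketch, not OVS code):
```
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* EAFNOSUPPORT is the errno we believe inet_parse_active() returns
     * when host parsing/resolution fails; glibc renders it as the
     * message seen in the relay logs above. */
    printf("connect: %s\n", strerror(EAFNOSUPPORT));
    return 0;
}
```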
After that, the reconnect state machine keeps cycling between the CONNECTING and BACKOFF phases:
```
2025-06-27T07:37:25Z|01023|reconnect|DBG|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:37:25Z|01024|reconnect|DBG|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:37:27Z|01025|poll_loop|DBG|wakeup due to 1982-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:37:29Z|01026|poll_loop|DBG|wakeup due to 2501-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:37:32Z|01027|poll_loop|DBG|wakeup due to 2502-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:37:33Z|01028|poll_loop|DBG|wakeup due to 1015-ms timeout at lib/reconnect.c:677 (0% CPU usage)
2025-06-27T07:37:33Z|01029|reconnect|DBG|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:37:33Z|01030|reconnect|DBG|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:37:34Z|01031|poll_loop|DBG|wakeup due to 1487-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:37:37Z|01032|poll_loop|DBG|wakeup due to 2502-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:37:39Z|01033|poll_loop|DBG|wakeup due to 2502-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:37:41Z|01034|poll_loop|DBG|wakeup due to 1511-ms timeout at lib/reconnect.c:677 (0% CPU usage)
2025-06-27T07:37:41Z|01035|reconnect|DBG|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:37:41Z|01036|reconnect|DBG|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:37:49Z|01054|poll_loop|DBG|wakeup due to 2007-ms timeout at lib/reconnect.c:677 (0% CPU usage)
2025-06-27T07:37:49Z|01055|reconnect|DBG|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:37:49Z|01056|reconnect|DBG|tcp:ovn-ovsdb-sb-1.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:37:49Z|01057|poll_loop|DBG|wakeup due to 492-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:37:52Z|01058|poll_loop|DBG|wakeup due to 2503-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:37:54Z|01059|poll_loop|DBG|wakeup due to 2502-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:37:57Z|01077|reconnect|DBG|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:37:57Z|01078|reconnect|DBG|tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:37:59Z|01079|poll_loop|DBG|wakeup due to 2501-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:38:02Z|01080|poll_loop|DBG|wakeup due to 2503-ms timeout at ovsdb/ovsdb-server.c:401 (0% CPU usage)
2025-06-27T07:38:04Z|01081|poll_loop|DBG|wakeup due to 2502-ms timeout at ovsdb/ovsdb-server.c:401
2025-06-27T07:38:05Z|01082|poll_loop|DBG|wakeup due to 494-ms timeout at lib/reconnect.c:677
2025-06-27T07:38:05Z|01083|reconnect|DBG|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering CONNECTING
2025-06-27T07:38:05Z|01084|reconnect|DBG|tcp:ovn-ovsdb-sb-0.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642: entering BACKOFF
2025-06-27T07:38:05Z|01085|poll_loop|DBG|wakeup due to [POLLIN] on fd 19 (0.0.0.0:6642<->) at lib/stream-fd.c:274
```
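For context, the cadence visible in these DBG logs (a new attempt roughly every 8 seconds, and the same server roughly every 24 seconds) is what OVS's capped exponential backoff produces with the defaults from lib/reconnect.h (1 s minimum, 8 s maximum) combined with round-robin rotation over the three remotes. A toy model of that schedule (a standalone sketch, not OVS code; the constants are the assumed defaults):
```
#include <stdio.h>

#define MIN_BACKOFF_MS 1000   /* assumed RECONNECT_DEFAULT_MIN_BACKOFF */
#define MAX_BACKOFF_MS 8000   /* assumed RECONNECT_DEFAULT_MAX_BACKOFF */

int main(void)
{
    const char *remotes[] = { "sb-0", "sb-1", "sb-2" };
    long long now_ms = 0;
    int backoff_ms = MIN_BACKOFF_MS;

    /* Every failed attempt rotates to the next remote and doubles the
     * backoff until it hits the 8 s cap, so in steady state any single
     * server is retried only once every 3 * 8 = 24 seconds. */
    for (int attempt = 0; attempt < 12; attempt++) {
        printf("t=%5llds: try %s\n", now_ms / 1000, remotes[attempt % 3]);
        now_ms += backoff_ms;
        backoff_ms *= 2;
        if (backoff_ms > MAX_BACKOFF_MS) {
            backoff_ms = MAX_BACKOFF_MS;
        }
    }
    return 0;
}
```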
We attached gdb, set a breakpoint in the resolver, and captured the stack:
```
bash-4.4# gdb attach 17
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.es8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 17
[New LWP 18]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f0520055a41 in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-151.es8_1.x86_64 libevent-2.1.8-5.es8.x86_64 libgcc-8.4.1-1.es8.x86_64 libibverbs-32.0-4.es8.x86_64 numactl-libs-2.0.12-11.es8.x86_64 openssl-libs-1.1.1k-12.es8.x86_64 python3-libs-3.6.8-39.es8.rocky.0.x86_64 zlib-1.2.11-17.es8.x86_64
(gdb) b resolve_check_valid__
Breakpoint 1 at 0x7f052074d899: file lib/dns-resolve.c, line 234.
(gdb) c
Continuing.

Thread 1 "ovsdb-server" hit Breakpoint 1, resolve_check_valid__ (req=0x5616829ce740) at lib/dns-resolve.c:234
234    lib/dns-resolve.c: No such file or directory.
(gdb) bt
#0  resolve_check_valid__ (req=0x5616829ce740) at lib/dns-resolve.c:234
#1  dns_resolve (name=<optimized out>, addr=addr@entry=0x7ffeaec04b20) at lib/dns-resolve.c:173
#2  0x00007f05206ed73f in parse_sockaddr_components_dns (s=0x56168269d1f0 "ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", default_port=65535, port_s=0x56168269c0c2 "6642",
    host_s=<optimized out>, ss=0x7ffeaec04bf0) at lib/socket-util.c:510
#3  parse_sockaddr_components (ss=ss@entry=0x7ffeaec04bf0, host_s=<optimized out>, port_s=0x56168269c0c2 "6642", default_port=default_port@entry=65535,
    s=s@entry=0x56168269d1f0 "ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", resolve_host=resolve_host@entry=true, dns_failure=<optimized out>) at lib/socket-util.c:510
#4  0x00007f05206ede71 in inet_parse_active (target_=target_@entry=0x56168269d1f0 "ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", default_port=default_port@entry=-1,
    ss=ss@entry=0x7ffeaec04bf0, resolve_host=resolve_host@entry=true, dns_failure=dns_failure@entry=0x0) at lib/socket-util.c:559
#5  0x00007f05206eea22 in inet_open_active (style=style@entry=1, target=0x56168269d1f0 "ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", default_port=default_port@entry=-1,
    ssp=ssp@entry=0x0, fdp=fdp@entry=0x7ffeaec04cc4, dscp=<optimized out>) at lib/socket-util.c:597
#6  0x00007f05206f3e5e in tcp_open (name=0x5616829ce870 "tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", suffix=<optimized out>, streamp=0x7ffeaec04d10,
    dscp=<optimized out>) at lib/stream-tcp.c:56
#7  0x00007f05206f47be in stream_open (name=name@entry=0x5616829ce870 "tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", streamp=streamp@entry=0x5616826c5c70,
    dscp=dscp@entry=48 '0') at lib/stream.c:233
#8  0x00007f05206f5037 in stream_open_with_default_port (name_=<optimized out>, name_@entry=0x5616826c6020 "tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642",
    default_port=default_port@entry=6640, streamp=streamp@entry=0x5616826c5c70, dscp=<optimized out>) at lib/stream.c:749
#9  0x00007f052063f074 in jsonrpc_stream_open (name=name@entry=0x5616826c6020 "tcp:ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local:6642", streamp=streamp@entry=0x5616826c5c70,
    dscp=<optimized out>) at lib/jsonrpc.c:72
#10 0x00007f0520640980 in jsonrpc_session_connect (s=0x5616826c5c40) at lib/jsonrpc.c:994
#11 jsonrpc_session_run (s=0x5616826c5c40) at lib/jsonrpc.c:1088
#12 0x00007f05206c6a2e in ovsdb_cs_run (cs=0x5616826c5e00, events=events@entry=0x7ffeaec04e70) at lib/ovsdb-cs.c:621
#13 0x00007f0520aacf21 in ovsdb_relay_run () at ovsdb/relay.c:387
#14 0x0000561657a07bdb in main_loop (exiting=0x7ffeaec04fad, run_process=0x0, remotes=0x7ffeaec05070, unixctl=0x5616826e94a0, all_dbs=0x7ffeaec05010, jsonrpc=0x5616826bca10, config=0x7ffeaec050d0)
    at ovsdb/ovsdb-server.c:334
#15 main (argc=<optimized out>, argv=<optimized out>) at ovsdb/ovsdb-server.c:895
(gdb) p req
$1 = (struct resolve_request *) 0x5616829ce740
(gdb) p req->name
$2 = 0x56168269bb30 "ovn-ovsdb-sb-2.ovn-ovsdb-sb-discovery.openstack.svc.cluster.local"
(gdb) p req->addr
$3 = 0x5616829cd360 "10.232.4.170"
(gdb) p req->state
$4 = RESOLVE_GOOD
(gdb) p req->time
$5 = 55695
(gdb) p req->ub_result->ttl
$6 = 5
(gdb)
```
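The fields above confirm the cache side of the problem: the entry is RESOLVE_GOOD and holds 10.232.4.170, but the unbound answer's TTL is only 5 seconds. Judging from the breakpoint location, resolve_check_valid__() is roughly equivalent to the following sketch (simplified around the fields printed above; not the exact upstream code, and the enum value is a placeholder):
```
#include <stdbool.h>
#include <stdio.h>

enum resolve_state { RESOLVE_GOOD = 1 };   /* placeholder value for this sketch */

/* Simplified stand-in for struct resolve_request: only the fields printed
 * in the gdb session (state, time, ub_result->ttl) are modeled. */
struct resolve_request {
    enum resolve_state state;
    long long time;     /* msec timestamp when the answer was cached */
    int ttl;            /* stands in for req->ub_result->ttl, in seconds */
};

/* A cached answer is usable only while its TTL has not elapsed; once it
 * expires, dns_resolve() kicks off a fresh resolution and returns to the
 * caller without an address. */
static bool
resolve_check_valid(const struct resolve_request *req, long long now_msec)
{
    return req->state == RESOLVE_GOOD
           && now_msec - req->time <= req->ttl * 1000LL;
}

int main(void)
{
    /* Values from the gdb session: cached at t=55695 ms with a 5 s TTL. */
    struct resolve_request req = { RESOLVE_GOOD, 55695, 5 };

    printf("valid 3 s later:  %d\n", resolve_check_valid(&req, 58000));  /* 1 */
    printf("valid 24 s later: %d\n", resolve_check_valid(&req, 80000));  /* 0 */
    return 0;
}
```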
Based on the above info and code analysis:
The SB relay retries a connection every 8 seconds (based on RECONNECT_DEFAULT_MAX_BACKOFF). There are three SB servers and the session rotates through them, so any given server is retried only once every 24 seconds. DNS uses synchronous resolving with a TTL of 5 seconds. For example:

- The first time, the SB relay resolves sb-0 synchronously; the resolution succeeds, but no address is returned to the connect path.
- 24 seconds later, when the SB relay asks for sb-0's address again, it finds the cached entry has already expired (24 s > 5 s TTL). It performs synchronous resolving again, and again no address is returned.
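Putting the numbers together, the 24-second retry cadence can never land inside the 5-second TTL window, so every attempt sees an expired cache entry. A tiny simulation of that loop (a standalone sketch; the constants are the values observed above):
```
#include <stdbool.h>
#include <stdio.h>

#define TTL_MS            5000    /* req->ub_result->ttl == 5 s       */
#define RETRY_INTERVAL_MS 24000   /* 3 remotes x 8 s max backoff      */

int main(void)
{
    long long cached_at = -1;     /* -1: no resolution kicked off yet */

    for (long long now = 0; now <= 4 * RETRY_INTERVAL_MS;
         now += RETRY_INTERVAL_MS) {
        bool valid = cached_at >= 0 && now - cached_at <= TTL_MS;
        if (!valid) {
            /* Expired (or first call): kick off a new resolution and
             * return without an address, mirroring the behavior the
             * relay exhibits. */
            cached_at = now;      /* the fresh answer lands shortly after */
            printf("t=%3llds: cache expired -> re-resolve, no addr returned\n",
                   now / 1000);
        } else {
            printf("t=%3llds: cache valid -> connect to 10.232.4.170\n",
                   now / 1000);
        }
    }
    return 0;   /* every iteration hits the "expired" branch */
}
```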
As a result, the SB relay repeats the second step continuously and never manages to connect.