> On 10 May 2023, at 17:15, Vladislav Odintsov <odiv...@gmail.com> wrote:
> 
> Hi all,
> 
>> On 3 May 2023, at 15:11, Ilya Maximets <i.maxim...@ovn.org> wrote:
>> 
>> On 5/3/23 12:47, Vladislav Odintsov wrote:
>>> Thanks Ilya for your inputs.
>>> 
>>>> On 2 May 2023, at 21:49, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>> 
>>>> On 5/2/23 19:22, Ilya Maximets wrote:
>>>>> On 5/2/23 19:04, Vladislav Odintsov via discuss wrote:
>>>>>> I ran perf record -F99 -p $(ovsdb-server) -- sleep 30 on ovsdb-server 
>>>>>> process during CPU spike. perf report result:
>>>>>> 
>>>>> 
>>>>> Could you run it for a couple of minutes during that 5-6 minute window?
>>> 
>>> Sure, here it is (this report was collected during ~3 minutes while 
>>> ovsdb-server was under 100% CPU load):
>>> 
>>> # To display the perf.data header info, please use --header/--header-only 
>>> options.
>>> #
>>> #
>>> # Total Lost Samples: 0
>>> #
>>> # Samples: 12K of event 'cpu-clock'
>>> # Event count (approx.): 130030301730
>>> #
>>> # Overhead  Command       Shared Object                Symbol
>>> # ........  ............  ...........................  
>>> ......................................
>>> #
>>>     21.20%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>>>     10.49%  ovsdb-server  libc-2.17.so                 [.] 
>>> malloc_consolidate
>>>     10.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>> ovsdb_clause_evaluate
>>>      9.40%  ovsdb-server  libc-2.17.so                 [.] _int_malloc
>>>      6.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>>>      4.36%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>> ovsdb_atom_compare_3way
>>>      3.29%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>> json_serialize_string
>>>      3.23%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>> ovsdb_condition_match_any_clause
>>>      3.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>>>      2.60%  ovsdb-server  [kernel.kallsyms]            [k] clear_page_c_e
>>>      1.87%  ovsdb-server  libc-2.17.so                 [.] 
>>> __memcpy_ssse3_back
>>>      1.80%  ovsdb-server  libc-2.17.so                 [.] free
>>>      1.67%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>> json_serialize_object_member
>>>      1.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>> ovsdb_atom_is_default
>>>      1.47%  ovsdb-server  libc-2.17.so                 [.] vfprintf
>>>      1.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>>>      1.12%  ovsdb-server  libc-2.17.so                 [.] _int_free
>>>      1.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>> ovsdb_atom_compare_3way@plt
>>>      1.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>> 
>> Thanks!  Yeah, the conditional monitoring appears to be the
>> main issue here.
>> 
>> <snip>
>> 
>>> 
>>>>> Also, is it a single 5-6 minute poll interval or a several shorter ones
>>>>> (from the log)?
>>> 
>>> I see from monitoring that during these 5-6 minutes ovsdb-server utilizes 
>>> 100% of one core. With top command it is also constantly shown as 100%.
>>> 
>>> Worth to add that I see next log warnings in ovsdb-server relay (long poll 
>>> interval for 84 seconds):
>>> 
>>> 2023-05-03T10:21:53.928Z|11522|timeval|WARN|Unreasonably long 84348ms poll 
>>> interval (84270ms user, 14ms system)
>>> 2023-05-03T10:21:53.931Z|11523|timeval|WARN|context switches: 0 voluntary, 
>>> 229 involuntary
>>> 2023-05-03T10:21:53.933Z|11524|coverage|INFO|Skipping details of duplicate 
>>> event coverage for hash=580d57f8
>>> 2023-05-03T10:21:53.935Z|11525|poll_loop|INFO|wakeup due to [POLLIN] on fd 
>>> 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
>>> 2023-05-03T10:21:54.094Z|11526|stream_ssl|WARN|SSL_write: system error 
>>> (Broken pipe)
>>> 2023-05-03T10:21:54.096Z|11527|jsonrpc|WARN|ssl:x.x.x.x:46894: send error: 
>>> Broken pipe
>>> 2023-05-03T10:21:54.120Z|11528|stream_ssl|WARN|SSL_accept: unexpected SSL 
>>> connection close
>>> 2023-05-03T10:21:54.120Z|11529|jsonrpc|WARN|ssl:x.x.x.x:46950: receive 
>>> error: Protocol error
>>> 2023-05-03T10:21:54.122Z|11530|poll_loop|INFO|wakeup due to [POLLIN] on fd 
>>> 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
>>> 
>>>>> 
>>>>> And you seem to miss some debug symbols for libovsdb.
>>> 
>>> Thanks. Now I’ve installed openvswitch-debuginfo package.
>>> 
>>>>> 
>>>>> One potentially quick fix for your setup would be to disable conditional
>>>>> monitoring, i.e. set ovn-monitor-all=true.  You can see the condition
>>>>> comparison functions are high in the perf output.  It doesn't scale well.
>>> 
>>> I’ve looked the docs about ovn-monitor-all and it seems that it is what I 
>>> need, because it says:
>>> 
>>>    …  Typically, set it to true for environments that all workloads need to 
>>> be reachable from each other.
>>> 
>>> In my case these servers handle centralized NAT service and need logical 
>>> connection with most hosts.
>> 
>> Yeah.  In general, setting it to true mostly affects memory usage
>> of the ovn-controller.  Might slightly increase CPU usage as well,
>> but that should not be critical.  It dramatically reduces load on
>> the SB DB server though.  So, that's a trade off.
>> 
>> FWIW, the man page seems to be incorrect.  It should say 'OVN database'
>> instead of 'ovs-database', as this option affects SB connection and not
>> the connection to a local OVS database.
>> 
>>> 
>>>> 
>>>> FWIW, conditional monitoring inefficiencies are not all related to
>>>> the database server.  There are significant inefficiencies in a way
>>>> ovn-controller creates condition clauses:
>>>>  https://bugzilla.redhat.com/show_bug.cgi?id=2139194
>>>> Since you're removing a port in your setup, that likely triggers
>>>> condition change requests from all controllers for which this port
>>>> is local.  Assuming you have N ports total in the cluster and
>>>> M ports per node, conditions will contain M * 2 clauses and the
>>>> server will have to perform N * M * 2 comparisons per controller
>>>> on a condition change request.
>>> 
>>> Gonna switch ovn-controllers which connected to one ovsdb relay to 
>>> ovn-monitor-all=true and compare load after this change with other relays, 
>>> whose same clients left intact.
>>> 
>>> I’ll return with results...
>> 
>> Looking forward to see them!
> 
> So after more than one and a half weeks of work of one ovsdb-server, which 
> acts as a dedicated relay for ovn-controllers, which serve centralized NAT 
> services, I can say that ovn-monitor-all=true option totally helps: the load 
> spikes on ovsdb-server process has gone.
> This is confirmed by monitoring graphs from another ovsdb-server relays, 
> where I see spikes of CPU load and don’t see on a server with re-configured 
> clients.
> 
> Also, I’ve recompiled OVS with jemalloc (thnx @Felix Hüttner for the hint). 
> In a short summary, I see that there is almost no work with memory 
> management, which took significant amount of time with glibc implementation. 
> It was ~10.5% in malloc_consolidate plus ~9.4% in _int_malloc.

Forgot to add: memory footprint for ovsdb-server with jemalloc reduced by ~22% 
comparing to glibc after start and didn’t grow during this test period.

> Now I’ve got next perf report from ovsdb-server, which has 6 ovn-controllers 
> connected with disabled ovn-monitor-all. Compiled with jemalloc:
> 
> # Total Lost Samples: 0
> #
> # Samples: 21K of event 'cpu-clock'
> # Event count (approx.): 220989896780
> #
> # Overhead  Command       Shared Object                Symbol
> # ........  ............  ...........................  
> .............................................
> #
>     44.57%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_clause_evaluate
>     13.84%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>     13.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_atom_compare_3way
>      9.87%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_condition_match_any_clause
>      3.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> uuid_compare_3way@plt
>      1.11%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>      1.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> json_serialize_string
>      0.98%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_datum_compare_3way
>      0.79%  ovsdb-server  [kernel.kallsyms]            [k] clear_page_c_e
>      0.73%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>      0.68%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>      0.63%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_atom_is_default
>      0.59%  ovsdb-server  libc-2.17.so                 [.] vfprintf
>      0.51%  ovsdb-server  libc-2.17.so                 [.] __memcpy_ssse3_back
>      0.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_clone
>      0.38%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>      0.28%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_changes_update
>      0.28%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> json_serialize_object_member
>      0.27%  ovsdb-server  libjemalloc.so.2             [.] free
>      0.22%  ovsdb-server  libc-2.17.so                 [.] _IO_default_xsputn
>      0.21%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>      0.21%  ovsdb-server  [kernel.kallsyms]            [k] __do_softirq
>      0.18%  ovsdb-server  libjemalloc.so.2             [.] malloc
>      0.16%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hash_bytes
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_add_nocopy__
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_destroy
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_equals
>      0.12%  ovsdb-server  [kernel.kallsyms]            [k] 
> audit_filter_rules.isra.8
>      0.12%  ovsdb-server  libjemalloc.so.2             [.] 0x00000000000733c0
>      0.12%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_atom_compare_3way@plt
>      0.11%  ovsdb-server  [kernel.kallsyms]            [k] 
> system_call_after_swapgs
>      0.10%  ovsdb-server  [kernel.kallsyms]            [k] 
> audit_filter_syscall
>      0.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_datum_compare_3way@plt
>      0.10%  ovsdb-server  libc-2.17.so                 [.] __strncmp_sse42
>      0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_row_skip_update.part.0
>      0.09%  ovsdb-server  libc-2.17.so                 [.] _itoa_word
>      0.09%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014d7a
>      0.09%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000007341b
>      0.08%  ovsdb-server  libc-2.17.so                 [.] __strchrnul
>      0.08%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b81d
>      0.08%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_get_update
>      0.07%  ovsdb-server  [kernel.kallsyms]            [k] 
> get_page_from_freelist
>      0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xvasprintf
>      0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_compose_row_update2
>      0.07%  ovsdb-server  libc-2.17.so                 [.] __memset_sse2
>      0.06%  ovsdb-server  libc-2.17.so                 [.] 
> _IO_str_init_static_internal
>      0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> json_string_create_nocopy
>      0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_datum_is_default
>      0.05%  ovsdb-server  [kernel.kallsyms]            [k] 
> copy_user_enhanced_fast_string
>      0.05%  ovsdb-server  libc-2.17.so                 [.] 
> __strlen_sse2_pminub
>      0.05%  ovsdb-server  libc-2.17.so                 [.] __vsnprintf_chk
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ds_put_format_valist
>      0.05%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b810
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_uninit
>      0.05%  ovsdb-server  [kernel.kallsyms]            [k] __d_lookup_rcu
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_swap
>      0.05%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_get_initial
>      0.05%  ovsdb-server  libpthread-2.17.so           [.] __libc_accept
>      0.05%  ovsdb-server  libpthread-2.17.so           [.] __read_nocancel
>      0.04%  ovsdb-server  [kernel.kallsyms]            [k] 
> _raw_spin_unlock_irqrestore
>      0.04%  ovsdb-server  [kernel.kallsyms]            [k] finish_task_switch
>      0.04%  ovsdb-server  ld-2.17.so                   [.] __tls_get_addr
>      0.04%  ovsdb-server  libc-2.17.so                 [.] __strcmp_sse42
>      0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create
>      0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> ovsdb_atom_to_json__
>      0.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_compose_update
>      0.03%  ovsdb-server  libc-2.17.so                 [.] __xstat64
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_buffer
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_create
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc
>      0.03%  ovsdb-server  libpthread-2.17.so           [.] __write_nocancel
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_integer_create
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_put
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>      0.03%  ovsdb-server  ovsdb-server                 [.] main
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] __do_page_fault
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] audit_filter_inodes
>      0.02%  ovsdb-server  [vdso]                       [.] 
> __vdso_clock_gettime
>      0.02%  ovsdb-server  libc-2.17.so                 [.] __poll_nocancel
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014d1a
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b82a
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_init
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create_2
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc__
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] fget_light
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000016379
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b899
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>      0.02%  ovsdb-server  libpthread-2.17.so           [.] 
> pthread_mutex_trylock
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> __audit_syscall_exit
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> __mem_cgroup_uncharge_common
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __memcpy
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __pollwait
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] free_pages_prepare
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] inode_permission
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] iput
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] kmem_cache_alloc
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] run_timer_softirq
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_current_mss
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_recvmsg
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_sendmsg
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] vfs_read
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_old_init
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_setb
>      0.01%  ovsdb-server  libc-2.17.so                 [.] __strchr_sse42
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014dd6
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000001633c
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000001636f
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_reserve
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hexit_value
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_parser_feed
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
> jsonrpc_session_recv_wait
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_to_json
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] time_msec
>      0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
> ovsdb_monitor_change_set_destroy
>      0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_row_clone
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __audit_inode
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> __audit_syscall_entry
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __dev_queue_xmit
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> __mem_cgroup_commit_charge
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __slab_alloc
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __split_huge_page
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] bond_handle_frame
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] generic_permission
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> get_pageblock_flags_group
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] lock_sock_nested
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] mlx5e_poll_tx_cq
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_poll
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] 
> tcp_stream_memory_free
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] unix_poll
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] unix_release
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] xfs_vn_getattr
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_no_init
>      0.01%  ovsdb-server  libc-2.17.so                 [.] __clock_gettime
>      0.01%  ovsdb-server  libc-2.17.so                 [.] 
> __memmove_ssse3_back
>      0.01%  ovsdb-server  libc-2.17.so                 [.] read_int
>      0.01%  ovsdb-server  libcrypto.so.1.0.2k          [.] gcm_ghash_avx
> <…cut...>
> 
> Thanks everybody, who helped with that! :)
> 
>> 
>>> 
>>>> 
>>>>> 
>>>>> Best regards, Ilya Maximets.
>>>>> 
>>>>>> Samples: 2K of event 'cpu-clock', Event count (approx.): 29989898690
>>>>>> Overhead  Command       Shared Object                Symbol
>>>>>>   58.71%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> uuid_compare_3way
>>>>>>    7.61%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011058
>>>>>>    6.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_atom_compare_3way
>>>>>>    5.93%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> ovsdb_condition_match_any_clause
>>>>>>    2.26%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011070
>>>>>>    2.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> uuid_compare_3way@plt
>>>>>>    2.16%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010ffd
>>>>>>    1.68%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fe8
>>>>>>    1.65%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x00000000000110ec
>>>>>>    1.38%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011084
>>>>>>    1.31%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001100e
>>>>>>    1.25%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011063
>>>>>>    1.08%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_compare_3way
>>>>>>    1.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fe0
>>>>>>    0.67%  ovsdb-server  libc-2.17.so                 [.] _int_malloc
>>>>>>    0.54%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001105a
>>>>>>    0.51%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> ovsdb_monitor_get_update
>>>>>>    0.47%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>>>>>>    0.30%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>>>>>>    0.24%  ovsdb-server  libc-2.17.so                 [.] 
>>>>>> malloc_consolidate
>>>>>>    0.24%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>>>>>>    0.20%  ovsdb-server  [kernel.kallsyms]            [k] __do_softirq
>>>>>>    0.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>>>>>>    0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_destroy
>>>>>>    0.13%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fe5
>>>>>>    0.10%  ovsdb-server  libc-2.17.so                 [.] __memset_sse2
>>>>>>    0.10%  ovsdb-server  libc-2.17.so                 [.] 
>>>>>> __strlen_sse2_pminub
>>>>>>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fea
>>>>>>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011020
>>>>>>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011067
>>>>>>    0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_compare_3way@plt
>>>>>>    0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_equals
>>>>>>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010ff1
>>>>>>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000011025
>>>>>>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001108b
>>>>>>    0.03%  ovsdb-server  [kernel.kallsyms]            [k] __netif_schedule
>>>>>>    0.03%  ovsdb-server  libc-2.17.so                 [.] __strncmp_sse42
>>>>>>    0.03%  ovsdb-server  libc-2.17.so                 [.] _int_free
>>>>>>    0.03%  ovsdb-server  libc-2.17.so                 [.] free
>>>>>>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_clone
>>>>>>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> ovsdb_datum_from_json
>>>>>>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find
>>>>>>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> 0x000000000018f7e2
>>>>>>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 
>>>>>> 0x000000000018f7e8
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> ovsdb_condition_from_json
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fe1
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x0000000000010fff
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001105e
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x00000000000110c3
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x00000000000110f2
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c007
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c030
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c05b
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c45f
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c535
>>>>>>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 
>>>>>> 0x000000000001c644
>>>>>> 
>>>>>>> On 2 May 2023, at 19:45, Ilya Maximets via discuss 
>>>>>>> <ovs-discuss@openvswitch.org> wrote:
>>>>>>> 
>>>>>>> On 5/2/23 16:45, Vladislav Odintsov wrote:
>>>>>>>> Hi Ilya,
>>>>>>>> 
>>>>>>>> let me jump into this thread.
>>>>>>>> 
>>>>>>>> Right now I’m debugging the behaviour of ovn (22.09.x) and 
>>>>>>>> ovsdb-server 3.1.0 where one ovsdb update3 request makes ovsdb-server, 
>>>>>>>> which acts as a relay for OVN Southbound DB with only 5-6 clients 
>>>>>>>> connected to it (ovn-controllers, acting as a central chassis for 
>>>>>>>> external access with enabled ha_group for edge LRs), utilize 100% CPU 
>>>>>>>> during 5-6 minutes.
>>>>>>>> During this time ovsdb relay failes to answer ovsdb inactivity probes 
>>>>>>>> and then clients and even upstream ovsdb-servers disconnect this ovsdb 
>>>>>>>> relay because of ping probe timeout of 60s. All the probe intervals 
>>>>>>>> configured to 60 seconds value (ovsdb-server SB cluster <-> ovsdb SB 
>>>>>>>> relay <-> ovn-controller). Earlier I’ve posted a long-read with some 
>>>>>>>> problems listed [1].
>>>>>>>> 
>>>>>>>> IIUC, this update is generated by ovn-northd after one LS with only 
>>>>>>>> one LSP type router and attached LB is removed.
>>>>>>>> You can see the request json here: [2]
>>>>>>>> Such updates appear not only if LS/LB is removed but also in some 
>>>>>>>> other operations, this is just an example.
>>>>>>>> So it seems like ovn-northd re-creates a big dp group and such update 
>>>>>>>> for some reason is difficult to handle for ovsdb relay (actually 
>>>>>>>> ovn-controllers  also utilize 100% cpu).
>>>>>>>> 
>>>>>>>> Have you seen such behaviour? Maybe you’ve got any suggestion about 
>>>>>>>> the reason and a possible fix for such huge load from one update3?
>>>>>>> 
>>>>>>> Such big updates are typically triggered by the fact that
>>>>>>> northd doesn't modify datapath groups.  It re-creates them
>>>>>>> instead.  I'm actually working on a patch for this since
>>>>>>> last week and hope to post it before the soft freeze for 23.06.
>>>>>>> 
>>>>>>> In your case, however, it seems like you have a new datapath
>>>>>>> group created for a load balancer.  And it looks like you have
>>>>>>> about 1000 switches in the setup, which is a lot, but...
>>>>>>> 
>>>>>>> Looking at the dump [2], it's big for a human, but it's only
>>>>>>> about 100 KB in size.  Database shouldn't have a lot of trouble
>>>>>>> processing that, especially in 3.1.  So, I'm not sure what
>>>>>>> exactly is going on in your case.  If you can profile the relay
>>>>>>> server and see what it is doing all that time, that would be
>>>>>>> helpful.
>>>>>>> 
>>>>>>> Best regards, Ilya Maximets.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> 1: 
>>>>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403699.html
>>>>>>>> 2: https://gist.github.com/odivlad/bba4443e589a268a0f389c2972511df3
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 2 May 2023, at 14:49, Ilya Maximets via discuss 
>>>>>>>>> <ovs-discuss@openvswitch.org> wrote:
>>>>>>>>> 
>>>>>>>>> Form my side, the first option would be increasing the inactivity
>>>>>>>>> probe on the ovn-controller side and see if that resolves the issue.
>>>>>>>>> Deployments typically have 60+ seconds set, just in case.
>>>>>>>>> 
>>>>>>>>> Also, if you're not already using latest versions of OVS/OVN, upgrade
>>>>>>>>> may resolve the issue as well.  For example, OVS 2.17 provides a big
>>>>>>>>> performance improvement over previous versions and 3.0 and 3.1 give
>>>>>>>>> even more on top.  And with new OVN releases, the southbound database
>>>>>>>>> size usually goes down significantly reducing the load on the OVSDB
>>>>>>>>> server.  I'd suggest to use releases after OVN 22.09 for large scale
>>>>>>>>> deployments.
>>>>>>>>> 
>>>>>>>>> However, if your setup have only one switch with 250 ports and you
>>>>>>>>> have an issue, that should not really be related to scale and you
>>>>>>>>> need to investigate further on what exactly is happening.
>>>>>>>>> 
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>> 
>>>>>>>>> On 5/2/23 08:58, Felix Hüttner via discuss wrote:
>>>>>>>>>> Hi Gavin,
>>>>>>>>>> 
>>>>>>>>>> we saw similar issues after reaching a certain number of 
>>>>>>>>>> hypervisors. This happened because our ovsdb processes ran at 100% 
>>>>>>>>>> cpu utilization (and they are not multithreaded).
>>>>>>>>>> 
>>>>>>>>>> Our solutions where:
>>>>>>>>>> 
>>>>>>>>>> 1. If you use ssl on your north-/southbound db. Disable it and add a 
>>>>>>>>>> tls terminating reverse proxy (like traefik) in front
>>>>>>>>>> 2. Increase the inactivity probe significantly (you might need to 
>>>>>>>>>> change it on the ovn-controller and ovsdb side, not sure anymore)
>>>>>>>>>> 3. Introduce ovsdb relays and connect the ovn-controllers there.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> 
>>>>>>>>>> Felix Huettner
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> *From:* discuss <ovs-discuss-boun...@openvswitch.org> *On Behalf Of 
>>>>>>>>>> *Gavin McKee via discuss
>>>>>>>>>> *Sent:* Monday, May 1, 2023 9:20 PM
>>>>>>>>>> *To:* ovs-discuss <ovs-discuss@openvswitch.org>
>>>>>>>>>> *Subject:* [ovs-discuss] CPU pinned at 100% , ovn-controller to 
>>>>>>>>>> ovnsb_db unstable
>>>>>>>>>> 
>>>>>>>>>> Hi ,
>>>>>>>>>> 
>>>>>>>>>> I'm having a pretty bad issue with OVN controller on the hypervisors 
>>>>>>>>>> being unable to connect to the OVS SB DB ,
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: no response to inactivity probe after 5 
>>>>>>>>>> seconds, disconnecting
>>>>>>>>>> 2023-05-01T19:13:33.969Z|00542|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connection dropped
>>>>>>>>>> 2023-05-01T19:13:43.043Z|00543|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connected
>>>>>>>>>> 2023-05-01T19:13:56.115Z|00544|reconnect|ERR|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: no response to inactivity probe after 5 
>>>>>>>>>> seconds, disconnecting
>>>>>>>>>> 2023-05-01T19:13:56.115Z|00545|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connection dropped
>>>>>>>>>> 2023-05-01T19:14:36.177Z|00546|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connected
>>>>>>>>>> 2023-05-01T19:14:44.996Z|00547|jsonrpc|WARN|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: receive error: Connection reset by peer
>>>>>>>>>> 2023-05-01T19:14:44.996Z|00548|reconnect|WARN|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connection dropped (Connection reset by 
>>>>>>>>>> peer)
>>>>>>>>>> 2023-05-01T19:15:44.131Z|00549|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connected
>>>>>>>>>> 2023-05-01T19:15:54.137Z|00550|reconnect|ERR|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: no response to inactivity probe after 5 
>>>>>>>>>> seconds, disconnecting
>>>>>>>>>> 2023-05-01T19:15:54.137Z|00551|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connection dropped
>>>>>>>>>> 2023-05-01T19:16:02.184Z|00552|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connected
>>>>>>>>>> 2023-05-01T19:16:14.488Z|00553|reconnect|ERR|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: no response to inactivity probe after 5 
>>>>>>>>>> seconds, disconnecting
>>>>>>>>>> 2023-05-01T19:16:14.488Z|00554|reconnect|INFO|tcp:10.193.1.2:6642 
>>>>>>>>>> <http://10.193.1.2:6642/>: connection dropped
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> This happened after pushing a configuration to north db for around 
>>>>>>>>>> 250 logical switch ports.
>>>>>>>>>> 
>>>>>>>>>> Once I turn on the VM's everything goes bad very quickly,
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> 2023-05-01T04:27:09.294Z|01947|poll_loop|INFO|wakeup due to 
>>>>>>>>>> [POLLOUT] on fd 66 (10.193.200.6:6642 
>>>>>>>>>> <http://10.193.200.6:6642/><->10.193.0.102:48794 
>>>>>>>>>> <http://10.193.0.102:48794/>) at ../lib/stream-fd.c:153 (100% CPU 
>>>>>>>>>> usage)
>>>>>>>>>> 
>>>>>>>>>>  
>>>>>>>>>> 
>>>>>>>>>> Can anyone provide any guidance how to run down an issue like this ?
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> discuss mailing list
>>>>>>>>> disc...@openvswitch.org
>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Vladislav Odintsov
>>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> discuss mailing list
>>>>>>> disc...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>> 
>>>>>> 
>>>>>> Regards,
>>>>>> Vladislav Odintsov
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> disc...@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>> 
>>>> 
>>> 
>>> 
>>> Regards,
>>> Vladislav Odintsov
> 
> 
> Regards,
> Vladislav Odintsov


Regards,
Vladislav Odintsov

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to