On Wed, 10 May 2023 at 11:55, Vladislav Odintsov via discuss <
ovs-discuss@openvswitch.org> wrote:

>
>
> On 10 May 2023, at 17:15, Vladislav Odintsov <odiv...@gmail.com> wrote:
>
> Hi all,
>
> On 3 May 2023, at 15:11, Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/3/23 12:47, Vladislav Odintsov wrote:
>
> Thanks Ilya for your inputs.
>
> On 2 May 2023, at 21:49, Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/2/23 19:22, Ilya Maximets wrote:
>
> On 5/2/23 19:04, Vladislav Odintsov via discuss wrote:
>
> I ran perf record -F99 -p $(pidof ovsdb-server) -- sleep 30 on the
> ovsdb-server process during the CPU spike. perf report result:
>
>
> Could you run it for a couple of minutes during that 5-6 minute window?
>
>
> Sure, here it is (this report was collected during ~3 minutes while
> ovsdb-server was under 100% CPU load):
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 12K of event 'cpu-clock'
> # Event count (approx.): 130030301730
> #
> # Overhead  Command       Shared Object                Symbol
> # ........  ............  ...........................  ......................................
> #
>     21.20%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>     10.49%  ovsdb-server  libc-2.17.so                 [.] malloc_consolidate
>     10.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_clause_evaluate
>      9.40%  ovsdb-server  libc-2.17.so                 [.] _int_malloc
>      6.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>      4.36%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>      3.29%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_string
>      3.23%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_condition_match_any_clause
>      3.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>      2.60%  ovsdb-server  [kernel.kallsyms]            [k] clear_page_c_e
>      1.87%  ovsdb-server  libc-2.17.so                 [.] __memcpy_ssse3_back
>      1.80%  ovsdb-server  libc-2.17.so                 [.] free
>      1.67%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_object_member
>      1.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_is_default
>      1.47%  ovsdb-server  libc-2.17.so                 [.] vfprintf
>      1.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>      1.12%  ovsdb-server  libc-2.17.so                 [.] _int_free
>      1.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_atom_compare_3way@plt
>      1.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>
>
> Thanks!  Yeah, the conditional monitoring appears to be the
> main issue here.
>
> <snip>
>
>
> Also, is it a single 5-6 minute poll interval or several shorter ones
> (from the log)?
>
>
> I see from monitoring that during these 5-6 minutes ovsdb-server utilizes
> 100% of one core. The top command also constantly shows it at 100%.
>
> It's worth adding that I see the following log warnings in the ovsdb-server
> relay (a long poll interval of 84 seconds):
>
> 2023-05-03T10:21:53.928Z|11522|timeval|WARN|Unreasonably long 84348ms poll interval (84270ms user, 14ms system)
> 2023-05-03T10:21:53.931Z|11523|timeval|WARN|context switches: 0 voluntary, 229 involuntary
> 2023-05-03T10:21:53.933Z|11524|coverage|INFO|Skipping details of duplicate event coverage for hash=580d57f8
> 2023-05-03T10:21:53.935Z|11525|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
> 2023-05-03T10:21:54.094Z|11526|stream_ssl|WARN|SSL_write: system error (Broken pipe)
> 2023-05-03T10:21:54.096Z|11527|jsonrpc|WARN|ssl:x.x.x.x:46894: send error: Broken pipe
> 2023-05-03T10:21:54.120Z|11528|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
> 2023-05-03T10:21:54.120Z|11529|jsonrpc|WARN|ssl:x.x.x.x:46950: receive error: Protocol error
> 2023-05-03T10:21:54.122Z|11530|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
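
For anyone who wants to track these warnings over time, here is a minimal sketch that pulls the timing figures out of a "long poll interval" line. The regular expression and the function name `parse_long_poll` are illustrative assumptions and not part of OVS; the sample line is the first warning quoted above.

```python
import re

# Matches the timeval warning format seen in the relay log above.
POLL_RE = re.compile(
    r"Unreasonably long (?P<total>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def parse_long_poll(line):
    """Return total/user/system milliseconds as ints, or None if no match."""
    match = POLL_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "2023-05-03T10:21:53.928Z|11522|timeval|WARN|Unreasonably long "
    "84348ms poll interval (84270ms user, 14ms system)"
)
info = parse_long_poll(line)
print(info)

# Probe interval used in this setup, per the thread (60 s), in milliseconds.
PROBE_INTERVAL_MS = 60_000
print("exceeds probe interval:", info["total"] > PROBE_INTERVAL_MS)
```

Almost all of the 84 s is user time, which fits a single-threaded process busy in userspace, and a single iteration that long is already past the 60 s probe interval mentioned in this thread.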
>
>
> And you seem to be missing some debug symbols for libovsdb.
>
>
> Thanks. Now I’ve installed the openvswitch-debuginfo package.
>
>
> One potentially quick fix for your setup would be to disable conditional
> monitoring, i.e. set ovn-monitor-all=true.  You can see the condition
> comparison functions are high in the perf output.  It doesn't scale well.
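
As a rough sketch of what that change looks like on a chassis: ovn-controller reads this option from `external_ids` in the local Open_vSwitch table, so it can be set with `ovs-vsctl`. The helper name `monitor_all_command` is hypothetical, and the snippet only builds and prints the command rather than running it.

```python
import shlex


def monitor_all_command(enabled=True):
    """Build the ovs-vsctl argv that toggles ovn-monitor-all on a chassis."""
    value = "true" if enabled else "false"
    return [
        "ovs-vsctl", "set", "open_vswitch", ".",
        f"external_ids:ovn-monitor-all={value}",
    ]


cmd = monitor_all_command(True)
# Print it for review; pass `cmd` to subprocess.run() on the chassis to apply.
print(shlex.join(cmd))
```

Building the argv as a list avoids shell quoting issues if this is later driven from a deployment script across several chassis.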
>
>
> I’ve looked at the docs about ovn-monitor-all and it seems to be what I
> need, because it says:
>
>    …  Typically, set it to true for environments that all workloads need
> to be reachable from each other.
>
> In my case these servers handle a centralized NAT service and need a
> logical connection with most hosts.
>
>
> Yeah.  In general, setting it to true mostly affects memory usage
> of the ovn-controller.  Might slightly increase CPU usage as well,
> but that should not be critical.  It dramatically reduces load on
> the SB DB server though.  So, that's a trade off.
>
> FWIW, the man page seems to be incorrect.  It should say 'OVN database'
> instead of 'ovs-database', as this option affects SB connection and not
> the connection to a local OVS database.
>
>
>
> FWIW, conditional monitoring inefficiencies are not all related to
> the database server.  There are significant inefficiencies in the way
> ovn-controller creates condition clauses:
>  https://bugzilla.redhat.com/show_bug.cgi?id=2139194
> Since you're removing a port in your setup, that likely triggers
> condition change requests from all controllers for which this port
> is local.  Assuming you have N ports total in the cluster and
> M ports per node, conditions will contain M * 2 clauses and the
> server will have to perform N * M * 2 comparisons per controller
> on a condition change request.
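
To put numbers on the estimate above, here is a small sketch of the N * M * 2 arithmetic. The port counts are hypothetical and were not measured in this deployment; only the formula and the figure of 6 controllers per relay come from the thread.

```python
def condition_comparisons(total_ports, ports_per_node, controllers=1):
    """Estimate clause comparisons for one condition change request.

    total_ports    -- N, ports in the whole cluster
    ports_per_node -- M, ports local to one node (M * 2 clauses)
    controllers    -- controllers sending a condition change
    """
    clauses = ports_per_node * 2
    return total_ports * clauses * controllers


# Hypothetical sizes, chosen only to illustrate the growth.
per_controller = condition_comparisons(total_ports=10_000, ports_per_node=50)
six_controllers = condition_comparisons(10_000, 50, controllers=6)
print(per_controller, six_controllers)
```

The cost grows with the product of cluster size and per-node port count, so doubling both quadruples the work per request.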
>
>
> I'm going to switch the ovn-controllers connected to one ovsdb relay to
> ovn-monitor-all=true and compare the load after this change with the other
> relays, whose clients are left intact.
>
> I’ll return with results...
>
>
> Looking forward to seeing them!
>
>
> So after more than a week and a half of running one ovsdb-server as a
> dedicated relay for the ovn-controllers that serve centralized NAT
> services, I can say that the ovn-monitor-all=true option definitely helps:
> the load spikes on the ovsdb-server process are gone.
> This is confirmed by monitoring graphs from the other ovsdb-server relays,
> where I still see CPU load spikes, while I don’t see them on the server
> with re-configured clients.
>
> Also, I’ve recompiled OVS with jemalloc (thanks to @Felix Hüttner for the
> hint). In short, I see that almost no time is now spent on memory
> management, which took a significant amount of time with the glibc
> implementation: ~10.5% in malloc_consolidate plus ~9.4% in
> _int_malloc.
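
A quick way to check that kind of figure is to sum the allocator rows of a perf report. The sketch below does this for a few rows copied from the glibc report quoted earlier in this thread. The set of symbols counted as "allocator" is my assumption, and it also includes free and _int_free, so the total is a bit higher than the two symbols named above.

```python
# Rows copied from the glibc perf report quoted earlier in this thread.
GLIBC_REPORT = """
21.20%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
10.49%  ovsdb-server  libc-2.17.so                 [.] malloc_consolidate
10.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_clause_evaluate
 9.40%  ovsdb-server  libc-2.17.so                 [.] _int_malloc
 1.80%  ovsdb-server  libc-2.17.so                 [.] free
 1.12%  ovsdb-server  libc-2.17.so                 [.] _int_free
"""

# Assumption: these symbols are treated as memory-allocator work.
ALLOCATOR_SYMBOLS = {"malloc", "free", "malloc_consolidate",
                     "_int_malloc", "_int_free"}


def allocator_overhead(report):
    """Sum the overhead percentages of allocator symbols in a perf report."""
    total = 0.0
    for line in report.splitlines():
        fields = line.split()
        # Expect: overhead, command, shared object, [.] or [k], symbol.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        if fields[-1] in ALLOCATOR_SYMBOLS:
            total += float(fields[0].rstrip("%"))
    return total


glibc_share = allocator_overhead(GLIBC_REPORT)
print(f"allocator share: {glibc_share:.2f}%")
```

The same function can be pointed at the jemalloc report, but unresolved libjemalloc addresses would need to be matched by shared object rather than by symbol name.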
>
>
> Forgot to add: the memory footprint of ovsdb-server with jemalloc was ~22%
> lower than with glibc after start and didn’t grow during this test period.
>

Hi Vladislav,

Thanks for bringing this up! This is a long-known performance improvement
from the jemalloc library.

Last year, during his talk at OVScon, I asked Ilya whether he could also
confirm the performance improvement with jemalloc, and why this library is
not linked by default in release builds.

I don't know if it's being used on the Red Hat side, but on the Canonical
side maybe Frode could have a position on this. On that note, I found a
problem with jemalloc and the linker in the build scripts for Ubuntu Jammy:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2015748

Best regards,


>
> Now I’ve got the following perf report from an ovsdb-server that has 6
> ovn-controllers connected with ovn-monitor-all disabled, compiled with
> jemalloc:
>
> # Total Lost Samples: 0
> #
> # Samples: 21K of event 'cpu-clock'
> # Event count (approx.): 220989896780
> #
> # Overhead  Command       Shared Object                Symbol
> # ........  ............  ...........................  .............................................
> #
>     44.57%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_clause_evaluate
>     13.84%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>     13.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>      9.87%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_condition_match_any_clause
>      3.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way@plt
>      1.11%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>      1.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_string
>      0.98%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way
>      0.79%  ovsdb-server  [kernel.kallsyms]            [k] clear_page_c_e
>      0.73%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>      0.68%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>      0.63%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_is_default
>      0.59%  ovsdb-server  libc-2.17.so                 [.] vfprintf
>      0.51%  ovsdb-server  libc-2.17.so                 [.] __memcpy_ssse3_back
>      0.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_clone
>      0.38%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>      0.28%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_changes_update
>      0.28%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_object_member
>      0.27%  ovsdb-server  libjemalloc.so.2             [.] free
>      0.22%  ovsdb-server  libc-2.17.so                 [.] _IO_default_xsputn
>      0.21%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>      0.21%  ovsdb-server  [kernel.kallsyms]            [k] __do_softirq
>      0.18%  ovsdb-server  libjemalloc.so.2             [.] malloc
>      0.16%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hash_bytes
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_add_nocopy__
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_destroy
>      0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_equals
>      0.12%  ovsdb-server  [kernel.kallsyms]            [k] audit_filter_rules.isra.8
>      0.12%  ovsdb-server  libjemalloc.so.2             [.] 0x00000000000733c0
>      0.12%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_atom_compare_3way@plt
>      0.11%  ovsdb-server  [kernel.kallsyms]            [k] system_call_after_swapgs
>      0.10%  ovsdb-server  [kernel.kallsyms]            [k] audit_filter_syscall
>      0.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way@plt
>      0.10%  ovsdb-server  libc-2.17.so                 [.] __strncmp_sse42
>      0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_row_skip_update.part.0
>      0.09%  ovsdb-server  libc-2.17.so                 [.] _itoa_word
>      0.09%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014d7a
>      0.09%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000007341b
>      0.08%  ovsdb-server  libc-2.17.so                 [.] __strchrnul
>      0.08%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b81d
>      0.08%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_get_update
>      0.07%  ovsdb-server  [kernel.kallsyms]            [k] get_page_from_freelist
>      0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xvasprintf
>      0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_compose_row_update2
>      0.07%  ovsdb-server  libc-2.17.so                 [.] __memset_sse2
>      0.06%  ovsdb-server  libc-2.17.so                 [.] _IO_str_init_static_internal
>      0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string_create_nocopy
>      0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_is_default
>      0.05%  ovsdb-server  [kernel.kallsyms]            [k] copy_user_enhanced_fast_string
>      0.05%  ovsdb-server  libc-2.17.so                 [.] __strlen_sse2_pminub
>      0.05%  ovsdb-server  libc-2.17.so                 [.] __vsnprintf_chk
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_format_valist
>      0.05%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b810
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_uninit
>      0.05%  ovsdb-server  [kernel.kallsyms]            [k] __d_lookup_rcu
>      0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_swap
>      0.05%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_get_initial
>      0.05%  ovsdb-server  libpthread-2.17.so           [.] __libc_accept
>      0.05%  ovsdb-server  libpthread-2.17.so           [.] __read_nocancel
>      0.04%  ovsdb-server  [kernel.kallsyms]            [k] _raw_spin_unlock_irqrestore
>      0.04%  ovsdb-server  [kernel.kallsyms]            [k] finish_task_switch
>      0.04%  ovsdb-server  ld-2.17.so                   [.] __tls_get_addr
>      0.04%  ovsdb-server  libc-2.17.so                 [.] __strcmp_sse42
>      0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create
>      0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_to_json__
>      0.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_compose_update
>      0.03%  ovsdb-server  libc-2.17.so                 [.] __xstat64
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_buffer
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_create
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc
>      0.03%  ovsdb-server  libpthread-2.17.so           [.] __write_nocancel
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_integer_create
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_put
>      0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>      0.03%  ovsdb-server  ovsdb-server                 [.] main
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] __do_page_fault
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] audit_filter_inodes
>      0.02%  ovsdb-server  [vdso]                       [.] __vdso_clock_gettime
>      0.02%  ovsdb-server  libc-2.17.so                 [.] __poll_nocancel
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014d1a
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b82a
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_init
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create_2
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc__
>      0.02%  ovsdb-server  [kernel.kallsyms]            [k] fget_light
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000016379
>      0.02%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000004b899
>      0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>      0.02%  ovsdb-server  libpthread-2.17.so           [.] pthread_mutex_trylock
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __audit_syscall_exit
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __mem_cgroup_uncharge_common
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __memcpy
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __pollwait
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] free_pages_prepare
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] inode_permission
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] iput
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] kmem_cache_alloc
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] run_timer_softirq
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_current_mss
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_recvmsg
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_sendmsg
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] vfs_read
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_old_init
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_setb
>      0.01%  ovsdb-server  libc-2.17.so                 [.] __strchr_sse42
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x0000000000014dd6
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000001633c
>      0.01%  ovsdb-server  libjemalloc.so.2             [.] 0x000000000001636f
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_reserve
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hexit_value
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_parser_feed
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] jsonrpc_session_recv_wait
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_to_json
>      0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] time_msec
>      0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_change_set_destroy
>      0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_row_clone
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __audit_inode
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __audit_syscall_entry
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __dev_queue_xmit
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __mem_cgroup_commit_charge
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __slab_alloc
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] __split_huge_page
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] bond_handle_frame
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] generic_permission
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] get_pageblock_flags_group
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] lock_sock_nested
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] mlx5e_poll_tx_cq
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_poll
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] tcp_stream_memory_free
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] unix_poll
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] unix_release
>      0.01%  ovsdb-server  [kernel.kallsyms]            [k] xfs_vn_getattr
>      0.01%  ovsdb-server  libc-2.17.so                 [.] _IO_no_init
>      0.01%  ovsdb-server  libc-2.17.so                 [.] __clock_gettime
>      0.01%  ovsdb-server  libc-2.17.so                 [.] __memmove_ssse3_back
>      0.01%  ovsdb-server  libc-2.17.so                 [.] read_int
>      0.01%  ovsdb-server  libcrypto.so.1.0.2k          [.] gcm_ghash_avx
> <…cut...>
>
> Thanks to everybody who helped with that! :)
>
>
>
>
>
> Best regards, Ilya Maximets.
>
> Samples: 2K of event 'cpu-clock', Event count (approx.): 29989898690
> Overhead  Command       Shared Object                Symbol
>   58.71%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>    7.61%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011058
>    6.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>    5.93%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_condition_match_any_clause
>    2.26%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011070
>    2.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way@plt
>    2.16%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010ffd
>    1.68%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fe8
>    1.65%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x00000000000110ec
>    1.38%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011084
>    1.31%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001100e
>    1.25%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011063
>    1.08%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way
>    1.04%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fe0
>    0.67%  ovsdb-server  libc-2.17.so                 [.] _int_malloc
>    0.54%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001105a
>    0.51%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_monitor_get_update
>    0.47%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>    0.30%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>    0.24%  ovsdb-server  libc-2.17.so                 [.] malloc_consolidate
>    0.24%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>    0.20%  ovsdb-server  [kernel.kallsyms]            [k] __do_softirq
>    0.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>    0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_destroy
>    0.13%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fe5
>    0.10%  ovsdb-server  libc-2.17.so                 [.] __memset_sse2
>    0.10%  ovsdb-server  libc-2.17.so                 [.] __strlen_sse2_pminub
>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fea
>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011020
>    0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011067
>    0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way@plt
>    0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_equals
>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010ff1
>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000011025
>    0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001108b
>    0.03%  ovsdb-server  [kernel.kallsyms]            [k] __netif_schedule
>    0.03%  ovsdb-server  libc-2.17.so                 [.] __strncmp_sse42
>    0.03%  ovsdb-server  libc-2.17.so                 [.] _int_free
>    0.03%  ovsdb-server  libc-2.17.so                 [.] free
>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_clone
>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_from_json
>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find
>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 0x000000000018f7e2
>    0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 0x000000000018f7e8
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] ovsdb_condition_from_json
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fe1
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x0000000000010fff
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001105e
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x00000000000110c3
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x00000000000110f2
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c007
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c030
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c05b
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c45f
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c535
>    0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0        [.] 0x000000000001c644
>
> On 2 May 2023, at 19:45, Ilya Maximets via discuss <
> ovs-discuss@openvswitch.org> wrote:
>
> On 5/2/23 16:45, Vladislav Odintsov wrote:
>
> Hi Ilya,
>
> let me jump into this thread.
>
> Right now I’m debugging the behaviour of OVN (22.09.x) and ovsdb-server
> 3.1.0, where a single ovsdb update3 request makes ovsdb-server, which acts
> as a relay for the OVN Southbound DB with only 5-6 clients connected to it
> (ovn-controllers acting as central chassis for external access with
> ha_group enabled for edge LRs), utilize 100% CPU for 5-6 minutes.
> During this time the ovsdb relay fails to answer ovsdb inactivity probes,
> and then clients and even upstream ovsdb-servers drop their connections to
> this ovsdb relay because of the 60 s probe timeout. All the probe intervals
> are configured to 60 seconds (ovsdb-server SB cluster <-> ovsdb SB relay
> <-> ovn-controller). Earlier I posted a long read with some of the problems
> listed [1].
>
> IIUC, this update is generated by ovn-northd after one LS with only one
> LSP of type router and an attached LB is removed.
> You can see the request JSON here: [2]
> Such updates appear not only when an LS/LB is removed but also during some
> other operations; this is just an example.
> So it seems that ovn-northd re-creates a big datapath group, and for some
> reason such an update is difficult for the ovsdb relay to handle (the
> ovn-controllers also utilize 100% CPU, actually).
>
> Have you seen such behaviour? Do you have any suggestions about the
> cause and a possible fix for such a huge load from one update3?
>
>
> Such big updates are typically triggered by the fact that
> northd doesn't modify datapath groups.  It re-creates them
> instead.  I'm actually working on a patch for this since
> last week and hope to post it before the soft freeze for 23.06.
>
> In your case, however, it seems like you have a new datapath
> group created for a load balancer.  And it looks like you have
> about 1000 switches in the setup, which is a lot, but...
>
> Looking at the dump [2], it's big for a human, but it's only
> about 100 KB in size.  Database shouldn't have a lot of trouble
> processing that, especially in 3.1.  So, I'm not sure what
> exactly is going on in your case.  If you can profile the relay
> server and see what it is doing all that time, that would be
> helpful.
>
> Best regards, Ilya Maximets.
>
>
> Thanks.
>
> 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403699.html
> 2: https://gist.github.com/odivlad/bba4443e589a268a0f389c2972511df3
>
>
> On 2 May 2023, at 14:49, Ilya Maximets via discuss <
> ovs-discuss@openvswitch.org> wrote:
>
> From my side, the first option would be increasing the inactivity
> probe on the ovn-controller side and see if that resolves the issue.
> Deployments typically have 60+ seconds set, just in case.
>
> Also, if you're not already using latest versions of OVS/OVN, upgrade
> may resolve the issue as well.  For example, OVS 2.17 provides a big
> performance improvement over previous versions and 3.0 and 3.1 give
> even more on top.  And with new OVN releases, the southbound database
> size usually goes down significantly reducing the load on the OVSDB
> server.  I'd suggest using releases after OVN 22.09 for large-scale
> deployments.
>
> However, if your setup has only one switch with 250 ports and you
> have an issue, that should not really be related to scale and you
> need to investigate further on what exactly is happening.
>
> Best regards, Ilya Maximets.
>
> On 5/2/23 08:58, Felix Hüttner via discuss wrote:
>
> Hi Gavin,
>
> we saw similar issues after reaching a certain number of hypervisors. This
> happened because our ovsdb processes ran at 100% CPU utilization (and they
> are not multithreaded).
>
> Our solutions were:
>
> 1. If you use SSL on your north-/southbound DB, disable it and add a
> TLS-terminating reverse proxy (like traefik) in front.
> 2. Increase the inactivity probe significantly (you might need to change
> it on both the ovn-controller and ovsdb side; I'm not sure anymore).
> 3. Introduce ovsdb relays and connect the ovn-controllers to them.
>
> --
>
> Felix Huettner
>
>
>
> From: discuss <ovs-discuss-boun...@openvswitch.org> On Behalf Of Gavin McKee via discuss
> Sent: Monday, May 1, 2023 9:20 PM
> To: ovs-discuss <ovs-discuss@openvswitch.org>
> Subject: [ovs-discuss] CPU pinned at 100% , ovn-controller to ovnsb_db unstable
>
> Hi,
>
> I'm having a pretty bad issue with the OVN controller on the hypervisors
> being unable to connect to the OVS SB DB:
>
>
>
> 2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:13:33.969Z|00542|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:13:43.043Z|00543|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:13:56.115Z|00544|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:13:56.115Z|00545|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:14:36.177Z|00546|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:14:44.996Z|00547|jsonrpc|WARN|tcp:10.193.1.2:6642: receive error: Connection reset by peer
> 2023-05-01T19:14:44.996Z|00548|reconnect|WARN|tcp:10.193.1.2:6642: connection dropped (Connection reset by peer)
> 2023-05-01T19:15:44.131Z|00549|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:15:54.137Z|00550|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:15:54.137Z|00551|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:16:02.184Z|00552|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:16:14.488Z|00553|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:16:14.488Z|00554|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
>
>
>
> This happened after pushing a configuration to the north DB for around 250
> logical switch ports.
>
> Once I turn on the VMs, everything goes bad very quickly:
>
>
>
>
>
> 2023-05-01T04:27:09.294Z|01947|poll_loop|INFO|wakeup due to [POLLOUT] on fd 66 (10.193.200.6:6642<->10.193.0.102:48794) at ../lib/stream-fd.c:153 (100% CPU usage)
>
>
>
> Can anyone provide any guidance on how to run down an issue like this?
>
>
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
>
>
>
> Regards,
> Vladislav Odintsov


