On Wed, May 10, 2023 at 11:55, Vladislav Odintsov via discuss <ovs-discuss@openvswitch.org> wrote:
> On 10 May 2023, at 17:15, Vladislav Odintsov <odiv...@gmail.com> wrote:
>
> Hi all,
>
> On 3 May 2023, at 15:11, Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/3/23 12:47, Vladislav Odintsov wrote:
>
> Thanks Ilya for your inputs.
>
> On 2 May 2023, at 21:49, Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> On 5/2/23 19:22, Ilya Maximets wrote:
>
> On 5/2/23 19:04, Vladislav Odintsov via discuss wrote:
>
> I ran perf record -F99 -p $(pidof ovsdb-server) -- sleep 30 on the
> ovsdb-server process during a CPU spike. perf report result:
>
>
> Could you run it for a couple of minutes during that 5-6 minute window?
>
>
> Sure, here it is (this report was collected during ~3 minutes while
> ovsdb-server was under 100% CPU load):
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 12K of event 'cpu-clock'
> # Event count (approx.): 130030301730
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  ............  ...........................  ......................................
> #
> 21.20%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
> 10.49%  ovsdb-server  libc-2.17.so  [.] malloc_consolidate
> 10.04%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_clause_evaluate
>  9.40%  ovsdb-server  libc-2.17.so  [.] _int_malloc
>  6.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>  4.36%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>  3.29%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_string
>  3.23%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_condition_match_any_clause
>  3.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>  2.60%  ovsdb-server  [kernel.kallsyms]  [k] clear_page_c_e
>  1.87%  ovsdb-server  libc-2.17.so  [.] __memcpy_ssse3_back
>  1.80%  ovsdb-server  libc-2.17.so  [.] free
>  1.67%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_object_member
>  1.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_is_default
>  1.47%  ovsdb-server  libc-2.17.so  [.] vfprintf
>  1.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>  1.12%  ovsdb-server  libc-2.17.so  [.] _int_free
>  1.10%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way@plt
>  1.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>
>
> Thanks! Yeah, the conditional monitoring appears to be the
> main issue here.
>
> <snip>
>
>
> Also, is it a single 5-6 minute poll interval or several shorter ones
> (from the log)?
>
>
> I see from monitoring that during these 5-6 minutes ovsdb-server utilizes
> 100% of one core. With the top command it is also constantly shown as 100%.
>
> It is worth adding that I see the following log warnings in the
> ovsdb-server relay (long poll interval of 84 seconds):
>
> 2023-05-03T10:21:53.928Z|11522|timeval|WARN|Unreasonably long 84348ms poll interval (84270ms user, 14ms system)
> 2023-05-03T10:21:53.931Z|11523|timeval|WARN|context switches: 0 voluntary, 229 involuntary
> 2023-05-03T10:21:53.933Z|11524|coverage|INFO|Skipping details of duplicate event coverage for hash=580d57f8
> 2023-05-03T10:21:53.935Z|11525|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
> 2023-05-03T10:21:54.094Z|11526|stream_ssl|WARN|SSL_write: system error (Broken pipe)
> 2023-05-03T10:21:54.096Z|11527|jsonrpc|WARN|ssl:x.x.x.x:46894: send error: Broken pipe
> 2023-05-03T10:21:54.120Z|11528|stream_ssl|WARN|SSL_accept: unexpected SSL connection close
> 2023-05-03T10:21:54.120Z|11529|jsonrpc|WARN|ssl:x.x.x.x:46950: receive error: Protocol error
> 2023-05-03T10:21:54.122Z|11530|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (0.0.0.0:6642<->) at lib/stream-ssl.c:978 (99% CPU usage)
>
>
> And you seem to miss some debug symbols for libovsdb.
>
>
> Thanks. Now I’ve installed the openvswitch-debuginfo package.
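The "Unreasonably long ... poll interval" warnings quoted above are a handy signal to track over time. A minimal Python sketch for pulling them out of a log, assuming the line layout is exactly as in the lines quoted in this thread; the function name `long_poll_intervals` and its `min_ms` threshold are made up for illustration:

```python
import re

# Pattern modelled on the timeval warning quoted in this thread; the exact
# layout in other OVS versions is an assumption.
LONG_POLL_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|timeval\|WARN\|"
    r"Unreasonably long (?P<total>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def long_poll_intervals(lines, min_ms=0):
    """Return (timestamp, total_ms, user_ms, system_ms) for matching lines."""
    found = []
    for line in lines:
        match = LONG_POLL_RE.match(line.strip())
        if match and int(match["total"]) >= min_ms:
            found.append((
                match["ts"],
                int(match["total"]),
                int(match["user"]),
                int(match["system"]),
            ))
    return found


# Two of the log lines quoted above; only the first is a poll interval warning.
sample = [
    "2023-05-03T10:21:53.928Z|11522|timeval|WARN|Unreasonably long 84348ms "
    "poll interval (84270ms user, 14ms system)",
    "2023-05-03T10:21:53.931Z|11523|timeval|WARN|context switches: 0 "
    "voluntary, 229 involuntary",
]
for ts, total, user, system in long_poll_intervals(sample, min_ms=60000):
    print(f"{ts}: {total} ms total, {user} ms user, {system} ms system")
```

In the quoted warning nearly all of the interval is user time (84270 of 84348 ms), which is consistent with the process being busy on the CPU rather than waiting on I/O.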
>
> One potentially quick fix for your setup would be to disable conditional
> monitoring, i.e. set ovn-monitor-all=true. You can see the condition
> comparison functions are high in the perf output. It doesn't scale well.
>
>
> I’ve looked at the docs about ovn-monitor-all and it seems that it is what I
> need, because it says:
>
> … Typically, set it to true for environments that all workloads need
> to be reachable from each other.
>
> In my case these servers handle a centralized NAT service and need logical
> connection with most hosts.
>
>
> Yeah. In general, setting it to true mostly affects memory usage
> of the ovn-controller. Might slightly increase CPU usage as well,
> but that should not be critical. It dramatically reduces load on
> the SB DB server though. So, that's a trade-off.
>
> FWIW, the man page seems to be incorrect. It should say 'OVN database'
> instead of 'ovs-database', as this option affects the SB connection and not
> the connection to a local OVS database.
>
>
> FWIW, conditional monitoring inefficiencies are not all related to
> the database server. There are significant inefficiencies in the way
> ovn-controller creates condition clauses:
> https://bugzilla.redhat.com/show_bug.cgi?id=2139194
> Since you're removing a port in your setup, that likely triggers
> condition change requests from all controllers for which this port
> is local. Assuming you have N ports total in the cluster and
> M ports per node, conditions will contain M * 2 clauses and the
> server will have to perform N * M * 2 comparisons per controller
> on a condition change request.
>
>
> I'm going to switch the ovn-controllers connected to one ovsdb relay to
> ovn-monitor-all=true and compare the load after this change with the other
> relays, whose clients are left intact.
>
> I’ll return with results...
>
>
> Looking forward to seeing them!
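The estimate quoted above (M * 2 clauses per controller, N * M * 2 comparisons per controller on a condition change) is easy to put numbers on. A small Python sketch; the port counts are hypothetical and not taken from this deployment, and the function name is only illustrative:

```python
def condition_comparisons(total_ports, ports_per_node, controllers=1):
    """Estimate clause comparisons for one round of condition changes.

    Follows the estimate quoted above: each controller's condition holds
    ports_per_node * 2 clauses, which costs total_ports * clauses
    comparisons per controller.
    """
    clauses = ports_per_node * 2
    return total_ports * clauses * controllers


# Hypothetical sizes, chosen only to illustrate the growth.
print(condition_comparisons(10_000, 50))                 # 1000000
print(condition_comparisons(10_000, 50, controllers=6))  # 6000000
```

The product grows with both the cluster size and the number of local ports, which is why a change touching many controllers at once can be expensive for the server under this estimate.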
>
> So after more than one and a half weeks of running one ovsdb-server, which
> acts as a dedicated relay for the ovn-controllers that serve centralized NAT
> services, I can say that the ovn-monitor-all=true option totally helps: the
> load spikes on the ovsdb-server process have gone.
> This is confirmed by monitoring graphs from the other ovsdb-server relays,
> where I still see spikes of CPU load, while I don’t see them on the server
> with re-configured clients.
>
> Also, I’ve recompiled OVS with jemalloc (thanks @Felix Hüttner for the
> hint). In short, I see that there is almost no work on memory
> management, which took a significant amount of time with the glibc
> implementation. It was ~10.5% in malloc_consolidate plus ~9.4% in
> _int_malloc.
>
>
> Forgot to add: the memory footprint of ovsdb-server with jemalloc was reduced
> by ~22% compared to glibc after start and didn’t grow during this test period.
>

Hi Vladislav,

Thanks for bringing this up! This is a long-known performance improvement
with the jemalloc library. Last year, during the talk at ovscon, I asked Ilya
whether he could also confirm the performance improvement with jemalloc, and I
also asked why this library is not linked by default in the release builds.

I don't know if it's being used on the Red Hat side, but on the Canonical
side maybe Frode could have a position on this. At this point, I have found a
problem with jemalloc and the linker in the build scripts for Ubuntu Jammy.

https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2015748

Best regards,

>
> Now I’ve got the following perf report from an ovsdb-server which has 6
> ovn-controllers connected with ovn-monitor-all disabled. Compiled with
> jemalloc:
>
> # Total Lost Samples: 0
> #
> # Samples: 21K of event 'cpu-clock'
> # Event count (approx.): 220989896780
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  ............  ...........................  .............................................
> #
> 44.57%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.]
> ovsdb_clause_evaluate
> 13.84%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
> 13.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>  9.87%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_condition_match_any_clause
>  3.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way@plt
>  1.11%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>  1.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_string
>  0.98%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way
>  0.79%  ovsdb-server  [kernel.kallsyms]  [k] clear_page_c_e
>  0.73%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize
>  0.68%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>  0.63%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_is_default
>  0.59%  ovsdb-server  libc-2.17.so  [.] vfprintf
>  0.51%  ovsdb-server  libc-2.17.so  [.] __memcpy_ssse3_back
>  0.42%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_clone
>  0.38%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find__
>  0.28%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_changes_update
>  0.28%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_serialize_object_member
>  0.27%  ovsdb-server  libjemalloc.so.2  [.] free
>  0.22%  ovsdb-server  libc-2.17.so  [.] _IO_default_xsputn
>  0.21%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] resize
>  0.21%  ovsdb-server  [kernel.kallsyms]  [k] __do_softirq
>  0.18%  ovsdb-server  libjemalloc.so.2  [.] malloc
>  0.16%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hash_bytes
>  0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_add_nocopy__
>  0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_destroy
>  0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_equals
>  0.12%  ovsdb-server  [kernel.kallsyms]  [k] audit_filter_rules.isra.8
>  0.12%  ovsdb-server  libjemalloc.so.2  [.] 0x00000000000733c0
>  0.12%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way@plt
>  0.11%  ovsdb-server  [kernel.kallsyms]  [k] system_call_after_swapgs
>  0.10%  ovsdb-server  [kernel.kallsyms]  [k] audit_filter_syscall
>  0.10%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way@plt
>  0.10%  ovsdb-server  libc-2.17.so  [.] __strncmp_sse42
>  0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_row_skip_update.part.0
>  0.09%  ovsdb-server  libc-2.17.so  [.] _itoa_word
>  0.09%  ovsdb-server  libjemalloc.so.2  [.] 0x0000000000014d7a
>  0.09%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000007341b
>  0.08%  ovsdb-server  libc-2.17.so  [.] __strchrnul
>  0.08%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000004b81d
>  0.08%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_get_update
>  0.07%  ovsdb-server  [kernel.kallsyms]  [k] get_page_from_freelist
>  0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xvasprintf
>  0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_compose_row_update2
>  0.07%  ovsdb-server  libc-2.17.so  [.] __memset_sse2
>  0.06%  ovsdb-server  libc-2.17.so  [.] _IO_str_init_static_internal
>  0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string_create_nocopy
>  0.06%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_is_default
>  0.05%  ovsdb-server  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
>  0.05%  ovsdb-server  libc-2.17.so  [.] __strlen_sse2_pminub
>  0.05%  ovsdb-server  libc-2.17.so  [.] __vsnprintf_chk
>  0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_format_valist
>  0.05%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000004b810
>  0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_uninit
>  0.05%  ovsdb-server  [kernel.kallsyms]  [k] __d_lookup_rcu
>  0.05%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_swap
>  0.05%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_get_initial
>  0.05%  ovsdb-server  libpthread-2.17.so  [.] __libc_accept
>  0.05%  ovsdb-server  libpthread-2.17.so  [.] __read_nocancel
>  0.04%  ovsdb-server  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>  0.04%  ovsdb-server  [kernel.kallsyms]  [k] finish_task_switch
>  0.04%  ovsdb-server  ld-2.17.so  [.] __tls_get_addr
>  0.04%  ovsdb-server  libc-2.17.so  [.] __strcmp_sse42
>  0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create
>  0.04%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_to_json__
>  0.04%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_compose_update
>  0.03%  ovsdb-server  libc-2.17.so  [.] __xstat64
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_put_buffer
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_create
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc
>  0.03%  ovsdb-server  libpthread-2.17.so  [.] __write_nocancel
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_integer_create
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_object_put
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>  0.03%  ovsdb-server  ovsdb-server  [.] main
>  0.02%  ovsdb-server  [kernel.kallsyms]  [k] __do_page_fault
>  0.02%  ovsdb-server  [kernel.kallsyms]  [k] audit_filter_inodes
>  0.02%  ovsdb-server  [vdso]  [.] __vdso_clock_gettime
>  0.02%  ovsdb-server  libc-2.17.so  [.] __poll_nocancel
>  0.02%  ovsdb-server  libjemalloc.so.2  [.] 0x0000000000014d1a
>  0.02%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000004b82a
>  0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hmap_init
>  0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_array_create_2
>  0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] xmalloc__
>  0.02%  ovsdb-server  [kernel.kallsyms]  [k] fget_light
>  0.02%  ovsdb-server  libjemalloc.so.2  [.] 0x0000000000016379
>  0.02%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000004b899
>  0.02%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>  0.02%  ovsdb-server  libpthread-2.17.so  [.] pthread_mutex_trylock
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __audit_syscall_exit
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __mem_cgroup_uncharge_common
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __memcpy
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __pollwait
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] free_pages_prepare
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] inode_permission
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] iput
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] kmem_cache_alloc
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] run_timer_softirq
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] tcp_current_mss
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] tcp_recvmsg
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] tcp_sendmsg
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] vfs_read
>  0.01%  ovsdb-server  libc-2.17.so  [.] _IO_old_init
>  0.01%  ovsdb-server  libc-2.17.so  [.] _IO_setb
>  0.01%  ovsdb-server  libc-2.17.so  [.] __strchr_sse42
>  0.01%  ovsdb-server  libjemalloc.so.2  [.] 0x0000000000014dd6
>  0.01%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000001633c
>  0.01%  ovsdb-server  libjemalloc.so.2  [.] 0x000000000001636f
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ds_reserve
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] hexit_value
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_parser_feed
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] jsonrpc_session_recv_wait
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_to_json
>  0.01%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] time_msec
>  0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_change_set_destroy
>  0.01%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_row_clone
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __audit_inode
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __audit_syscall_entry
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __dev_queue_xmit
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __slab_alloc
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] __split_huge_page
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] bond_handle_frame
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] generic_permission
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] get_pageblock_flags_group
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] lock_sock_nested
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] mlx5e_poll_tx_cq
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] tcp_poll
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] tcp_stream_memory_free
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] unix_poll
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] unix_release
>  0.01%  ovsdb-server  [kernel.kallsyms]  [k] xfs_vn_getattr
>  0.01%  ovsdb-server  libc-2.17.so  [.] _IO_no_init
>  0.01%  ovsdb-server  libc-2.17.so  [.] __clock_gettime
>  0.01%  ovsdb-server  libc-2.17.so  [.] __memmove_ssse3_back
>  0.01%  ovsdb-server  libc-2.17.so  [.] read_int
>  0.01%  ovsdb-server  libcrypto.so.1.0.2k  [.] gcm_ghash_avx
> <…cut...>
>
> Thanks everybody, who helped with that! :)
>
>
> Best regards, Ilya Maximets.
>
> Samples: 2K of event 'cpu-clock', Event count (approx.): 29989898690
> Overhead  Command  Shared Object  Symbol
> 58.71%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way
>  7.61%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011058
>  6.60%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_compare_3way
>  5.93%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_condition_match_any_clause
>  2.26%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011070
>  2.19%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] uuid_compare_3way@plt
>  2.16%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010ffd
>  1.68%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fe8
>  1.65%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x00000000000110ec
>  1.38%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011084
>  1.31%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001100e
>  1.25%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011063
>  1.08%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way
>  1.04%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fe0
>  0.67%  ovsdb-server  libc-2.17.so  [.] _int_malloc
>  0.54%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001105a
>  0.51%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_monitor_get_update
>  0.47%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_hash
>  0.30%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_destroy__
>  0.24%  ovsdb-server  libc-2.17.so  [.] malloc_consolidate
>  0.24%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_atom_hash
>  0.20%  ovsdb-server  [kernel.kallsyms]  [k] __do_softirq
>  0.17%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] json_string
>  0.13%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_destroy
>  0.13%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fe5
>  0.10%  ovsdb-server  libc-2.17.so  [.] __memset_sse2
>  0.10%  ovsdb-server  libc-2.17.so  [.] __strlen_sse2_pminub
>  0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fea
>  0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011020
>  0.10%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011067
>  0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_compare_3way@plt
>  0.07%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_equals
>  0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010ff1
>  0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000011025
>  0.07%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001108b
>  0.03%  ovsdb-server  [kernel.kallsyms]  [k] __netif_schedule
>  0.03%  ovsdb-server  libc-2.17.so  [.] __strncmp_sse42
>  0.03%  ovsdb-server  libc-2.17.so  [.] _int_free
>  0.03%  ovsdb-server  libc-2.17.so  [.] free
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_clone
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] ovsdb_datum_from_json
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] shash_find
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 0x000000000018f7e2
>  0.03%  ovsdb-server  libopenvswitch-3.1.so.0.0.0  [.] 0x000000000018f7e8
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] ovsdb_condition_from_json
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fe1
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x0000000000010fff
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001105e
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x00000000000110c3
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x00000000000110f2
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c007
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c030
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c05b
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c45f
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c535
>  0.03%  ovsdb-server  libovsdb-3.1.so.0.0.0  [.] 0x000000000001c644
>
> On 2 May 2023, at 19:45, Ilya Maximets via discuss <ovs-discuss@openvswitch.org> wrote:
>
> On 5/2/23 16:45, Vladislav Odintsov wrote:
>
> Hi Ilya,
>
> let me jump into this thread.
>
> Right now I’m debugging the behaviour of ovn (22.09.x) and ovsdb-server
> 3.1.0 where one ovsdb update3 request makes ovsdb-server, which acts as a
> relay for OVN Southbound DB with only 5-6 clients connected to it
> (ovn-controllers, acting as a central chassis for external access with
> enabled ha_group for edge LRs), utilize 100% CPU during 5-6 minutes.
> During this time the ovsdb relay fails to answer ovsdb inactivity probes, and
> then clients and even upstream ovsdb-servers disconnect this ovsdb relay
> because of the ping probe timeout of 60s. All the probe intervals are
> configured to 60 seconds (ovsdb-server SB cluster <-> ovsdb SB relay <->
> ovn-controller). Earlier I’ve posted a long-read with some problems listed
> [1].
>
> IIUC, this update is generated by ovn-northd after one LS with only one
> LSP of type router and an attached LB is removed.
> You can see the request json here: [2]
> Such updates appear not only if an LS/LB is removed but also in some other
> operations; this is just an example.
> So it seems like ovn-northd re-creates a big dp group and such an update for
> some reason is difficult to handle for the ovsdb relay (actually
> ovn-controllers also utilize 100% cpu).
>
> Have you seen such behaviour? Maybe you’ve got any suggestion about the
> reason and a possible fix for such a huge load from one update3?
>
>
> Such big updates are typically triggered by the fact that
> northd doesn't modify datapath groups. It re-creates them
> instead. I'm actually working on a patch for this since
> last week and hope to post it before the soft freeze for 23.06.
>
> In your case, however, it seems like you have a new datapath
> group created for a load balancer. And it looks like you have
> about 1000 switches in the setup, which is a lot, but...
>
> Looking at the dump [2], it's big for a human, but it's only
> about 100 KB in size. Database shouldn't have a lot of trouble
> processing that, especially in 3.1. So, I'm not sure what
> exactly is going on in your case. If you can profile the relay
> server and see what it is doing all that time, that would be
> helpful.
>
> Best regards, Ilya Maximets.
>
>
> Thanks.
>
> 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-April/403699.html
> 2: https://gist.github.com/odivlad/bba4443e589a268a0f389c2972511df3
>
>
> On 2 May 2023, at 14:49, Ilya Maximets via discuss <ovs-discuss@openvswitch.org> wrote:
>
> From my side, the first option would be increasing the inactivity
> probe on the ovn-controller side and see if that resolves the issue.
> Deployments typically have 60+ seconds set, just in case.
>
> Also, if you're not already using the latest versions of OVS/OVN, an upgrade
> may resolve the issue as well. For example, OVS 2.17 provides a big
> performance improvement over previous versions and 3.0 and 3.1 give
> even more on top. And with new OVN releases, the southbound database
> size usually goes down significantly, reducing the load on the OVSDB
> server. I'd suggest using releases after OVN 22.09 for large scale
> deployments.
>
> However, if your setup has only one switch with 250 ports and you
> have an issue, that should not really be related to scale and you
> need to investigate further what exactly is happening.
>
> Best regards, Ilya Maximets.
>
> On 5/2/23 08:58, Felix Hüttner via discuss wrote:
>
> Hi Gavin,
>
> we saw similar issues after reaching a certain number of hypervisors. This
> happened because our ovsdb processes ran at 100% cpu utilization (and they
> are not multithreaded).
>
> Our solutions were:
>
> 1. If you use SSL on your north-/southbound DB, disable it and add a TLS
>    terminating reverse proxy (like traefik) in front
> 2. Increase the inactivity probe significantly (you might need to change
>    it on the ovn-controller and ovsdb side, not sure anymore)
> 3. Introduce ovsdb relays and connect the ovn-controllers there.
>
> --
>
> Felix Huettner
>
>
> *From:* discuss <ovs-discuss-boun...@openvswitch.org> *On Behalf Of* Gavin McKee via discuss
> *Sent:* Monday, May 1, 2023 9:20 PM
> *To:* ovs-discuss <ovs-discuss@openvswitch.org>
> *Subject:* [ovs-discuss] CPU pinned at 100% , ovn-controller to ovnsb_db unstable
>
> Hi,
>
> I'm having a pretty bad issue with OVN controller on the hypervisors being
> unable to connect to the OVS SB DB,
>
> 2023-05-01T19:13:33.969Z|00541|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:13:33.969Z|00542|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:13:43.043Z|00543|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:13:56.115Z|00544|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:13:56.115Z|00545|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:14:36.177Z|00546|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:14:44.996Z|00547|jsonrpc|WARN|tcp:10.193.1.2:6642: receive error: Connection reset by peer
> 2023-05-01T19:14:44.996Z|00548|reconnect|WARN|tcp:10.193.1.2:6642: connection dropped (Connection reset by peer)
> 2023-05-01T19:15:44.131Z|00549|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:15:54.137Z|00550|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:15:54.137Z|00551|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
> 2023-05-01T19:16:02.184Z|00552|reconnect|INFO|tcp:10.193.1.2:6642: connected
> 2023-05-01T19:16:14.488Z|00553|reconnect|ERR|tcp:10.193.1.2:6642: no response to inactivity probe after 5 seconds, disconnecting
> 2023-05-01T19:16:14.488Z|00554|reconnect|INFO|tcp:10.193.1.2:6642: connection dropped
>
> This happened after pushing a configuration to the north db for around 250
> logical switch ports.
>
> Once I turn on the VMs everything goes bad very quickly,
>
> 2023-05-01T04:27:09.294Z|01947|poll_loop|INFO|wakeup due to [POLLOUT] on fd 66 (10.193.200.6:6642<->10.193.0.102:48794) at ../lib/stream-fd.c:153 (100% CPU usage)
>
> Can anyone provide any guidance on how to run down an issue like this?
>
>
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>
> Regards,
> Vladislav Odintsov