Hi Felix,
The local leader sends this append message before sending the reply (the
reply is the next log message below):
2025-02-25T14:03:47.776Z|00764|jsonrpc|DBG|ssl:170.168.0.X:39452: send
notification, method="append_request",
params=[{"cluster":"56b3aab6-476f-4ce1-96b9-1588dd4176c9","comment":"heartbeat","from":"11a8329d-bb6f-4e76-849b-090be09c030d","leader_commit":57,"log":[],"prev_log_index":57,"prev_log_term":2,"term":2,"to":"3967f0d3-ed57-4433-861b-e1548d78639f"}]
And here is the beginning of the reply message that carries the whole database as the answer:
2025-02-25T14:03:53.469Z|00765|jsonrpc|DBG|ssl:170.168.0.X:51420: send
reply,
result=[false,"0e044970-54c2-4918-8496-0ad6bb3d5f45",{"ACL":{"001cc1e8-2b1c-4935-addf-6ca20ad45e21":{"initial":{"action":"allow-related",
Could this be something worth investigating?
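If it helps, I can also dump what each member has in its on-disk raft log
with something like the command below (the database path is the default one
from my install and may differ):

# ovsdb-tool show-log -m /etc/ovn/ovnnb_db.db | head -40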
Regards,
Tiago Pires
On Tue, Feb 25, 2025 at 10:26 AM Tiago Pires <[email protected]> wrote:
>
> On Tue, Feb 25, 2025 at 7:21 AM Felix Huettner
> <[email protected]> wrote:
> >
> > On Mon, Feb 24, 2025 at 05:44:02PM -0300, Tiago Pires via discuss wrote:
> > > Hi all,
> > >
> > > I have an OVN Central cluster where the leader of the ovsdb NB started
> > > to run at 100% CPU load most of the time:
> > >
> > >   206 root  20  0  11.6g  4.7g  7172 R 106.7  0.3  2059:59 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log
> > >
> > > While the CPU is at 100%, read and write operations on the NB cluster
> > > are impacted. Debugging during one of these CPU spikes, I can see a
> > > jsonrpc reply to a cluster member with a size of 460MB, almost the
> > > same size as the NB database. I set up an ovn-fake-multinode cluster
> > > and imported this database there, and the behavior is still the same.
> > > At least the leader is not changing frequently, since the election
> > > timer is set to 60 seconds.
> > > I have also already tested with OVN 24.03, with no luck: same behavior.
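> > > For reference, jsonrpc debug output like the one mentioned above can be
> > > captured by raising the log level of the jsonrpc module, roughly along
> > > these lines (the ctl socket path is the default one and may differ):
> > >
> > > # ovn-appctl -t /var/run/ovn/ovnnb_db.ctl vlog/set jsonrpc:file:dbg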
> >
> > Hi Tiago,
> >
> > So if I understand that correctly, a non-leader member of the raft
> > cluster regularly requests the whole database content.
> > How often does that happen, and can you correlate it with anything on
> > that non-leader member? Maybe that member crashes or gets restarted for
> > some reason?
> >
> > Note that the OVN version does not necessarily say anything about the
> > OVS version, and the OVS version is what provides the ovsdb-server
> > code, so that version would be interesting as well.
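> > You can read it straight from the running ovsdb-server with something
> > like the command below (ctl socket path taken from your earlier
> > commands):
> >
> > # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl version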
> >
>
> Hi Felix,
>
> You got it right; in this scenario it is both non-leaders of the raft
> cluster.
> On the leader, the jsonrpc reply can go to either non-leader, and it
> happens roughly every 10 seconds.
> I checked the non-leaders and their ovsdb processes are not crashing
> or getting restarted.
> The OVS version tested is 3.3.4.
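> To double-check for connection flaps between the members, I can also grep
> the followers' logs for reconnect/raft messages, something along these
> lines (log path taken from the process command line above):
>
> # grep -E 'reconnect|raft' /var/log/ovn/ovsdb-server-nb.log | tail -50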
>
> > >
> > > The coverage figures are not very clear to me:
> > > # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl coverage/show
> > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=6087dcfb:
> > > raft_entry_serialize       0.0/sec       0.000/sec          0.0000/sec   total: 59
> > > hmap_pathological          5.8/sec       3.667/sec          3.5750/sec   total: 585411
> > > hmap_expand            79729.0/sec   53153.200/sec      51825.3172/sec   total: 8484601546
> > > hmap_reserve               0.0/sec       0.000/sec          0.0000/sec   total: 48
> > > lockfile_lock              0.0/sec       0.000/sec          0.0000/sec   total: 1
> > > poll_create_node           3.6/sec       4.317/sec          4.4372/sec   total: 3587083
> > > poll_zero_timeout          0.6/sec       0.150/sec          0.1286/sec   total: 105735
> > > seq_change                 0.6/sec       0.417/sec          0.4158/sec   total: 375960
> > > pstream_open               0.0/sec       0.000/sec          0.0000/sec   total: 4
> > > stream_open                0.0/sec       0.000/sec          0.0000/sec   total: 3
> > > unixctl_received           0.0/sec       0.017/sec          0.0003/sec   total: 11
> > > unixctl_replied            0.0/sec       0.017/sec          0.0003/sec   total: 11
> > > util_xalloc          3427998.6/sec 2285349.950/sec    1035236.3394/sec   total: 364876387809
> > > 100 events never hit
> > >
> > > Do you guys have any other way to debug it?
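> > > (If it helps, I can also collect memory usage details from the leader
> > > with something like the command below; I would expect it to show things
> > > like the number of cells and monitors the server is tracking.)
> > >
> > > # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl memory/show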
> >
> > Can you share the cluster status of both the leader and the node that
> > always requests the database? Maybe that helps.
> >
> Below is the cluster status from each node:
>
> #leader
> # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
> 9944
> Name: OVN_Northbound
> Cluster ID: a2dc (a2dcce53-a807-4708-bc9d-d0b2470c7ec5)
> Server ID: 9944 (99443341-5656-464d-b242-85bb16338570)
> Address: ssl:170.168.0.4:6643
> Status: cluster member
> Role: leader
> Term: 9
> Leader: self
> Vote: self
>
> Last Election started 66169932 ms ago, reason: leadership_transfer
> Last Election won: 66169930 ms ago
> Election timer: 60000
> Log: [66, 67]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->0000 <-6aee <-7b89 ->7b89
> Disconnections: 1
> Servers:
> 9944 (9944 at ssl:170.168.0.4:6643) (self) next_index=66 match_index=66
> 6aee (6aee at ssl:170.168.0.2:6643) next_index=67 match_index=66
> last msg 7857 ms ago
> 7b89 (7b89 at ssl:170.168.0.3:6643) next_index=67 match_index=66
> last msg 7857 ms ago
>
> #non-leader 1
> # ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
> 6aee
> Name: OVN_Northbound
> Cluster ID: a2dc (a2dcce53-a807-4708-bc9d-d0b2470c7ec5)
> Server ID: 6aee (6aee85c6-bd3e-45d5-896e-264ed7eaec00)
> Address: ssl:170.168.0.2:6643
> Status: cluster member
> Role: follower
> Term: 9
> Leader: 9944
> Vote: 9944
>
> Last Election started 66336770 ms ago, reason: leadership_transfer
> Last Election won: 66336767 ms ago
> Election timer: 60000
> Log: [67, 67]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: <-7b89 ->7b89 <-9944 ->9944
> Disconnections: 0
> Servers:
> 9944 (9944 at ssl:170.168.0.4:6643) last msg 11573 ms ago
> 6aee (6aee at ssl:170.168.0.2:6643) (self)
> 7b89 (7b89 at ssl:170.168.0.3:6643) last msg 66173010 ms ago
>
> #non-leader 2
> # ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
> 7b89
> Name: OVN_Northbound
> Cluster ID: a2dc (a2dcce53-a807-4708-bc9d-d0b2470c7ec5)
> Server ID: 7b89 (7b892543-9c2f-43bc-b62c-2941491dbe56)
> Address: ssl:170.168.0.3:6643
> Status: cluster member
> Role: follower
> Term: 9
> Leader: 9944
> Vote: 9944
>
> Election timer: 60000
> Log: [66, 67]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->0000 <-6aee ->9944 <-9944
> Disconnections: 1
> Servers:
> 9944 (9944 at ssl:170.168.0.4:6643) last msg 32288 ms ago
> 6aee (6aee at ssl:170.168.0.2:6643) last msg 66228156 ms ago
> 7b89 (7b89 at ssl:170.168.0.3:6643) (self)
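> If it helps, I can also keep an eye on how next_index/match_index evolve
> on the leader over time with something along these lines:
>
> # watch -n 10 'ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound'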
>
> Thank you
>
> Regards,
>
> Tiago Pires
>
> >
> > Thanks a lot,
> > Felix
> >
> > >
> > > Regards,
> > >
> > > Tiago Pires
> > >