Hi,
well...
On 05/04/2012 09:38 PM, Ben Pfaff wrote:
On Fri, May 04, 2012 at 09:34:27PM +0200, Oliver Francke wrote:
… showed the following:
root@fcmsnode10:~# ovs-dpctl show
system@vmbr1:
lookups: hit:263209087 missed:904392 lost:0
flows: 5
port 0: vmbr1 (internal)
port 1: eth1
port 4: vlan10 (internal)
port 7: tap410i1d0
port 13: tap433i1d0
port 15: tap377i1d0
port 16: tap416i1d0
port 18: tap287i1d0
port 19: tap451i1d0
port 21: tap822i1d0
port 23: tap160i1d0
port 24: tap376i1d0
port 27: tap1084i1d0
port 28: tap1085i1d0
port 30: tap1113i1d0
port 31: tap339i1d0
port 38: tap760i1d0
system@vmbr0:
lookups: hit:11883603451 missed:6262740342 lost:114647219
flows: 1295
port 0: vmbr0 (internal)
port 1: vlan146 (internal)
port 2: eth0
port 4: tap266i0d0
port 8: tap323i0d0
port 13: tap283i0d0
port 31: tap410i0d0
port 41: tap134i0d0
and roughly 140 more ports
Hmm, vmbr0 has a pretty high flow count and far too many lost packets.
I suggest, first, upgrading to OVS 1.4.1, which should reduce the lost
packet count, and then setting vmbr0's flow eviction threshold
significantly higher (which should reduce CPU usage) with:
ovs-vsctl set bridge vmbr0 other-config:flow-eviction-threshold=10000
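To put numbers on the lost-packet observation, the "lookups:" lines from
the ovs-dpctl output quoted above can be reduced to two ratios. This is only
a minimal sketch, assuming the line format shown in that output; the
function name lookup_ratios is made up for illustration and is not part of
OVS. It treats "lost" as roughly the share of missed packets that never
reached userspace.

```python
import re

# Assumed format, taken from the ovs-dpctl output quoted in this thread:
#   lookups: hit:<n> missed:<n> lost:<n>
LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def lookup_ratios(line):
    """Return (miss_ratio, lost_ratio) for one 'lookups:' line.

    miss_ratio = missed / (hit + missed)  share of lookups with no kernel flow
    lost_ratio = lost / missed            rough share of misses that were dropped
    """
    m = LOOKUPS_RE.search(line)
    if m is None:
        raise ValueError("not a lookups line: %r" % (line,))
    hit, missed, lost = (int(g) for g in m.groups())
    total = hit + missed
    miss_ratio = missed / total if total else 0.0
    lost_ratio = lost / missed if missed else 0.0
    return miss_ratio, lost_ratio


if __name__ == "__main__":
    # Values copied from the output above.
    samples = {
        "vmbr1": "lookups: hit:263209087 missed:904392 lost:0",
        "vmbr0": "lookups: hit:11883603451 missed:6262740342 lost:114647219",
    }
    for name, line in samples.items():
        miss, lost = lookup_ratios(line)
        print("%s: %.2f%% missed, %.2f%% of misses lost"
              % (name, 100 * miss, 100 * lost))
```

With the counters above, about 35% of lookups on vmbr0 miss the kernel
flow table, versus well under 1% on vmbr1.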
I updated all 4 nodes to 1.4.1, which went seamlessly. And now to the
section with the "but" in it:
But: the load is still high, and syslog is being flooded (not constantly,
but every so often) with:
--- 8-< ---
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139643|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139644|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139645|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139646|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139647|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139648|timeval|WARN|52 ms poll
interval (10 ms user, 10 ms system) is over 19 times the weighted mean
interval 3 ms (31293432 samples)
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139649|timeval|WARN|context
switches: 0 voluntary, 136 involuntary
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139650|coverage|INFO|Event
coverage (epoch 31293432/entire run), hash=0a8403eb:
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139651|coverage|INFO|ofproto_dpif_xlate 30 / 418241133
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139652|coverage|INFO|flow_extract 15 / 119894932
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139653|coverage|INFO|hmap_pathological 6 / 187853714
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139654|coverage|INFO|hmap_expand 275 / 1030432349
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139655|coverage|INFO|netdev_get_stats 117 / 1275136
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139656|coverage|INFO|poll_fd_wait 15 / 469400883
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139657|coverage|INFO|util_xalloc 21740 / 85658737570
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139658|coverage|INFO|netdev_ethtool 234 / 2550522
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139659|coverage|INFO|netlink_received 486 / 444888366
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139660|coverage|INFO|netlink_sent 264 / 361913061
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139661|coverage|INFO|bridge_reconfigure 0 / 6
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139662|coverage|INFO|ofproto_flush 0 / 2
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139663|coverage|INFO|ofproto_update_port 0 / 131
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139664|coverage|INFO|facet_revalidate 0 / 157765
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139665|coverage|INFO|facet_unexpected 0 / 1
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139666|coverage|INFO|dpif_port_add 0 / 2
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139667|coverage|INFO|dpif_port_del 0 / 2
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139668|coverage|INFO|dpif_flow_flush 0 / 4
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139669|coverage|INFO|dpif_flow_put 0 / 445
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139670|coverage|INFO|dpif_flow_del 0 / 119661491
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139671|coverage|INFO|dpif_purge 0 / 2
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139672|coverage|INFO|mac_learning_learned 0 / 6111
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139673|coverage|INFO|mac_learning_expired 0 / 5598
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139674|coverage|INFO|poll_zero_timeout 0 / 6190
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139675|coverage|INFO|pstream_open 0 / 4
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139676|coverage|INFO|stream_open 0 / 1
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139677|coverage|INFO|netdev_set_policing 0 / 706
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139678|coverage|INFO|netdev_get_ifindex 0 / 123
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139679|coverage|INFO|netdev_get_hwaddr 0 / 125
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139680|coverage|INFO|nln_changed 0 / 137
May 8 06:30:15 fcmsnode0 ovs-vswitchd:
139681|coverage|INFO|netlink_recv_jumbo 0 / 16397628
May 8 06:30:15 fcmsnode0 ovs-vswitchd: 139682|coverage|INFO|47 events
never hit
May 8 06:30:16 fcmsnode0 ovs-vswitchd: 139683|poll_loop|WARN|Dropped
216 log messages in last 1 seconds (most recently, 1 seconds ago) due to
excessive rate
May 8 06:30:16 fcmsnode0 ovs-vswitchd: 139684|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:16 fcmsnode0 ovs-vswitchd: 139685|poll_loop|WARN|wakeup due
to [POLLIN] on fd 36 (unknown anon_inode:[eventpoll]) at
lib/dpif-linux.c:1197 (52% CPU usage)
May 8 06:30:17 fcmsnode0 ovs-vswitchd: 139686|poll_loop|WARN|Dropped
480 log messages in last 1 seconds (most recently, 1 seconds ago) due to
excessive rate
--- 8-< ---
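One way to make the coverage block above easier to scan is to pull out the
"name epoch / total" pairs and sort them by the entire-run total. This is a
rough sketch, assuming the log format shown above (first number for the
current epoch, second for the entire run, as the "Event coverage" header
says); parse_coverage and busiest are hypothetical helper names, not OVS
tools.

```python
import re
import sys

# Matches e.g. "139669|coverage|INFO|dpif_flow_put 0 / 445".
# The counter name must start with a letter or underscore, so the
# "47 events never hit" summary line is skipped.
COVERAGE_RE = re.compile(
    r"\|coverage\|INFO\|([A-Za-z_]\w*)[ \t]+(\d+)[ \t]*/[ \t]*(\d+)"
)


def parse_coverage(text):
    """Map counter name -> (epoch_count, total_count) from syslog text."""
    counters = {}
    for name, epoch, total in COVERAGE_RE.findall(text):
        # A later coverage dump overwrites an earlier one for the same name.
        counters[name] = (int(epoch), int(total))
    return counters


def busiest(counters, n=5):
    """Return the n counters with the largest entire-run totals."""
    ranked = sorted(counters.items(), key=lambda kv: kv[1][1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Reads log text from stdin and prints the top counters.
    for name, (epoch, total) in busiest(parse_coverage(sys.stdin.read())):
        print("%-24s %8d / %d" % (name, epoch, total))
```

Feeding the syslog excerpt to this script on stdin prints the five counters
with the highest totals, which may help narrow down where ovs-vswitchd is
spending its time.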
Perhaps there is already a hint in the stats... If not, how can I dig into
it further?
Thanks in advance,
Oliver.
The latter will probably become unnecessary with OVS 1.7, but that's
not released yet.
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
--
Oliver Francke
filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh
Managing directors: S.Grewing | J.Rehpöhler | C.Kunz
Follow us on Twitter: http://twitter.com/filoogmbh