Hi,

While finishing up the valgrind memory leak tests, I noticed another tool that ships with valgrind, called helgrind, which can detect pthread synchronization errors:
http://valgrind.org/docs/manual/hg-manual.html

I ran helgrind on the OVS testsuite and saw a couple of errors. Before digging into them, I'm not sure whether they are worth fixing or are false positives. Any feedback or experience with helgrind is welcome.

An example error reported by helgrind on OVS involves two locks:
  (1) 0x7dd0c0 inside data symbol "netdev_class_mutex"
  (2) 0x7dab60 inside data symbol "route_table_mutex"

Helgrind detects an inconsistent lock acquisition order: at one point we lock (1) and then (2), and at another point we lock (2) and then (1). If two threads took these two paths at the same time, a deadlock could occur. Details are below:

==128325== Thread #1: lock order "0x7DAB60 before 0x7DD0C0" violated
==128325==
==128325== Observed (incorrect) order is: acquisition of lock at 0x7DD0C0
==128325==    at 0x4C2CDE7: mutex_lock_WRK (hg_intercepts.c:901)
==128325==    by 0x4C30C3B: pthread_mutex_lock (hg_intercepts.c:917)
==128325==    by 0x4BB757: ovs_mutex_lock_at (ovs-thread.c:76)
==128325==    by 0x47B2F4: netdev_run (netdev.c:180)
==128325==    by 0x406263: main (ovs-vswitchd.c:122)
==128325==
==128325==  followed by a later acquisition of lock at 0x7DAB60
==128325==    at 0x4C2CDE7: mutex_lock_WRK (hg_intercepts.c:901)
==128325==    by 0x4C30C3B: pthread_mutex_lock (hg_intercepts.c:917)
==128325==    by 0x4BB757: ovs_mutex_lock_at (ovs-thread.c:76)
==128325==    by 0x4FC282: route_table_run (route-table.c:124)
==128325==    by 0x47993E: netdev_vport_run (netdev-vport.c:375)
==128325==    by 0x47B379: netdev_run (netdev.c:183)
==128325==    by 0x406263: main (ovs-vswitchd.c:122)
==128325==
==128325== Required order was established by acquisition of lock at 0x7DAB60
==128325==    at 0x4C2CDE7: mutex_lock_WRK (hg_intercepts.c:901)
==128325==    by 0x4C30C3B: pthread_mutex_lock (hg_intercepts.c:917)
==128325==    by 0x4BB757: ovs_mutex_lock_at (ovs-thread.c:76)
==128325==    by 0x4FC142: route_table_init (route-table.c:94)
==128325==    by 0x45FC57: dp_initialize (dpif.c:126)
==128325==    by 0x45FE8D: dp_enumerate_types (dpif.c:246)
==128325==    by 0x418894: ofproto_enumerate_types (ofproto.c:470)
==128325==    by 0x409C84: bridge_run__ (bridge.c:2877)
==128325==    by 0x40F4E3: bridge_run (bridge.c:2940)
==128325==    by 0x406254: main (ovs-vswitchd.c:120)
==128325==
==128325==  followed by a later acquisition of lock at 0x7DD0C0
==128325==    at 0x4C2CDE7: mutex_lock_WRK (hg_intercepts.c:901)
==128325==    by 0x4C30C3B: pthread_mutex_lock (hg_intercepts.c:917)
==128325==    by 0x4BB757: ovs_mutex_lock_at (ovs-thread.c:76)
==128325==    by 0x47B77A: netdev_open (netdev.c:366)
==128325==    by 0x4DE8B9: insert_ipdev (tnl-ports.c:355)
==128325==    by 0x4DF0FA: tnl_port_map_insert_ipdev (tnl-ports.c:423)
==128325==    by 0x4BAD6C: ovs_router_insert__ (ovs-router.c:150)
==128325==    by 0x4FC0E4: route_table_handle_msg (route-table.c:301)
==128325==    by 0x4FC0E4: route_table_reset (route-table.c:186)
==128325==    by 0x4FC1F2: route_table_init (route-table.c:113)
==128325==    by 0x45FC57: dp_initialize (dpif.c:126)
==128325==    by 0x45FE8D: dp_enumerate_types (dpif.c:246)
==128325==    by 0x418894: ofproto_enumerate_types (ofproto.c:470)
==128325==    by 0x409C84: bridge_run__ (bridge.c:2877)
==128325==    by 0x40F4E3: bridge_run (bridge.c:2940)
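
To make the pattern in the report easier to see, here is a minimal sketch of the same kind of lock order inversion, stripped of everything OVS-specific. This is not OVS code: lock_a, lock_b, take_b_then_a() and take_a_then_b() are hypothetical names standing in for the two mutexes and the two call paths in the report. As in the report, both paths run on a single thread, so nothing actually deadlocks; the inconsistency would only become a real hazard if two threads could run the two paths concurrently.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the two mutexes named in the report. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* like (1) */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* like (2) */

/* Data touched while both locks are held. */
static int shared_counter;

/* First path: takes lock_b, then lock_a. This is the order a tool
 * like helgrind would record as the "required" one, because it is
 * seen first. */
int take_b_then_a(void)
{
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    int seen = ++shared_counter;
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return seen;
}

/* Second path: takes lock_a, then lock_b, i.e. the opposite order.
 * On one thread this completes normally, but it contradicts the
 * order used by take_b_then_a(). */
int take_a_then_b(void)
{
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    int seen = ++shared_counter;
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return seen;
}
```

Calling take_b_then_a() and then take_a_then_b() from main(), building with -pthread, and running the result under "valgrind --tool=helgrind" should be enough to check whether helgrind flags this single-threaded case in the same way. If it does, the report above says the two orders are inconsistent, not that a deadlock was observed, and whether it matters depends on whether any other thread can take either path.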

_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss