On Thu, Sep 18, 2014 at 08:07:31AM +0000, Wodkowski, PawelX wrote:
> > > +int
> > > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > > +		uint8_t slave_pos)
> > > +{
> > > +	struct bond_dev_private *internals = bond_dev->data->dev_private;
> > > +	struct mode8023ad_data *data = &internals->mode4;
> > > +	struct port *port;
> > > +	uint8_t i;
> > > +
> > > +	bond_mode_8023ad_stop(bond_dev);
> > > +
> > > +	/* Exclude slave from transmit policy. If this slave is an aggregator
> > > +	 * make all aggregated slaves unselected to force sellection logic
> > > +	 * to select suitable aggregator for this port */
> > > +	for (i = 0; i < internals->active_slave_count; i++) {
> > > +		port = &data->port_list[slave_pos];
> > > +		if (port->used_agregator_idx == slave_pos) {
> > > +			port->selected = UNSELECTED;
> > > +			port->actor_state &= ~(STATE_SYNCHRONIZATION | STATE_DISTRIBUTING |
> > > +				STATE_COLLECTING);
> > > +
> > > +			/* Use default aggregator */
> > > +			port->used_agregator_idx = i;
> > > +		}
> > > +	}
> > > +
> > > +	port = &data->port_list[slave_pos];
> > > +	timer_cancel(&port->current_while_timer);
> > > +	timer_cancel(&port->periodic_timer);
> > > +	timer_cancel(&port->wait_while_timer);
> > > +	timer_cancel(&port->tx_machine_timer);
> > > +
> > These all seem rather racy. Alarm callbacks are executed with the alarm list
> > locks not held. So there is every possibility that you could execute these (or
> > any timer_cancel calls in this PMD in parallel with the internal state machine
> > timer callback, and leave either with a corrupted timer list (resulting from a
> > double free between here, and the actual callback site),
>
> I don't think so. Yes, callbacks are executed with alarm list locks not held, but
> this is not the issue because access to list itself is guarded by lock and
> ap->executing variable. So list will not be trashed. Check source of
> eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().
>
Yes, you're right, the list is probably safe with the executing bit.

> > or a timer that is
> > actually still pending when a slave is removed.
> >
> This is not the issue also, but problem might be similar. I assumed that alarms
> are atomic but when I looked at rte alarms closer I saw a race condition
> between and rte_eal_alarm_cancel() from bond_mode_8023ad_stop()
> and rte_eal_alarm_set() from state machines callback. This need to be
> reworked in some way.

Yes, this is what I was referring to:

CPU0                            CPU1
rte_eal_alarm_callback          bond_8023ad_deactivate_slave
-bond_8023_ad_periodic_cb       timer_cancel
 timer_set

If those timer functions operate on the same timer, the result is that you can
leave the stop/deactivate slave paths with a timer function for that slave still
pending. The bonding mode needs some internal state to serialize those
operations and determine if the timer should be reactivated.

Neil
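
For illustration only, here is a rough sketch of the kind of internal state
described above: a per-timer lock plus a "stopping" flag that the callback
checks before it re-arms. All names here (sketch_timer, sketch_timer_set,
sketch_timer_stop) are hypothetical and come from neither the patch nor the
DPDK API. The armed flag merely stands in for a pending alarm, and a pthread
mutex stands in for whatever lock the PMD would really use.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-port timer state; not the struct from the patch. */
struct sketch_timer {
	pthread_mutex_t lock;
	bool armed;	/* stands in for "an alarm is pending" */
	bool stopping;	/* set by the stop/deactivate path */
};

static void
sketch_timer_init(struct sketch_timer *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->armed = false;
	t->stopping = false;
}

/* Called from the state machine callback when it wants to re-arm.
 * Returns true if the timer was armed, false if a stop got there first. */
static bool
sketch_timer_set(struct sketch_timer *t)
{
	bool ok;

	pthread_mutex_lock(&t->lock);
	ok = !t->stopping;
	if (ok)
		t->armed = true;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

/* Called from the stop/deactivate slave path. */
static void
sketch_timer_stop(struct sketch_timer *t)
{
	pthread_mutex_lock(&t->lock);
	t->stopping = true;	/* later sketch_timer_set() calls refuse */
	t->armed = false;	/* stands in for cancelling the alarm */
	pthread_mutex_unlock(&t->lock);
}
```

With this ordering, a callback that runs after sketch_timer_stop() sees the
flag and does not re-arm, and one that re-armed just before the stop has its
timer cleared by the stop path. Where the real alarm set and cancel calls
would sit relative to the lock depends on how the alarm code treats an
executing callback, so that part is left out of the sketch.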