Thanks Kyle, More comments inline.
Salvatore On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmest...@cisco.com>wrote: > On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorla...@nicira.com> > wrote: > > > > Hi, > > > > I've been recently debugging some issues I've had with the OVS agent, > and I found out that in many cases (possibly every case) the code just > logs errors from ovs-vsctl and ovs-ofctl without taking any action in the > control flow. > > > > For instance, the routine which should do the wiring for a port, > port_bound [1], does not react in any way if it fails to configure the > local vlan, which I guess means the port would not be able to send/receive > any data. > > > > I'm pretty sure there's a good reason for this which I'm missing at the > moment. I am asking because I see a pretty large number of ALARM_CLOCK > errors returned by OVS commands in gate logs (see bug [2]), and I'm not > sure whether it's ok to handle them as the OVS agent is doing nowadays. > > > Thanks for bringing this up Salvatore. It looks like the underlying > run_vstcl [1] provides an ability to raise exceptions on errors, but this > is not used by most of the callers of run_vsctl. Do you think we should be > returning the exceptions back up the stack to callers to handle? I think > that may be a good first step. > I think it makes sense to start to handle errors; as they often happen in the agent's rpc loop simply raising will probably just cause the agent to crash. I looked again at the code and it really seems it's silently ignoring errors from ovs command. This actually makes sense in some cases. For instance the l3 agent might remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its iteration. There are however cases in which the exception must be handled. In cases like the ALARM_CLOCK error, either a retry mechanism or marking the port for re-syncing at the next iteration might make sense. Other error cases might be unrecoverable; for instance when a port disappears. In that case it seems reasonable to put the relevant neutron port in ERROR state, so that the user is aware that the port anymore. > > Thanks, > Kyle > > [1] > https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52 > > > Regards, > > Salvatore > > > > [1] > https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599 > > [2] https://bugs.launchpad.net/neutron/+bug/1254520 > > _______________________________________________ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev