On Nov 25, 2013, at 12:36 PM, Salvatore Orlando <sorla...@nicira.com> wrote: > > Thanks Kyle, > > More comments inline. > > Salvatore > > > On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmest...@cisco.com> wrote: > On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorla...@nicira.com> wrote: > > > > Hi, > > > > I've been recently debugging some issues I've had with the OVS agent, and I > > found out that in many cases (possibly every case) the code just logs > > errors from ovs-vsctl and ovs-ofctl without taking any action in the > > control flow. > > > > For instance, the routine which should do the wiring for a port, port_bound > > [1], does not react in any way if it fails to configure the local vlan, > > which I guess means the port would not be able to send/receive any data. > > > > I'm pretty sure there's a good reason for this which I'm missing at the > > moment. I am asking because I see a pretty large number of ALARM_CLOCK > > errors returned by OVS commands in gate logs (see bug [2]), and I'm not > > sure whether it's ok to handle them as the OVS agent is doing nowadays. > > > Thanks for bringing this up Salvatore. It looks like the underlying run_vstcl > [1] provides an ability to raise exceptions on errors, but this is not used > by most of the callers of run_vsctl. Do you think we should be returning the > exceptions back up the stack to callers to handle? I think that may be a good > first step. > > I think it makes sense to start to handle errors; as they often happen in the > agent's rpc loop simply raising will probably just cause the agent to crash. > I looked again at the code and it really seems it's silently ignoring errors > from ovs command. > This actually makes sense in some cases. For instance the l3 agent might > remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its > iteration. > > There are however cases in which the exception must be handled. > In cases like the ALARM_CLOCK error, either a retry mechanism or marking the > port for re-syncing at the next iteration might make sense. > Other error cases might be unrecoverable; for instance when a port > disappears. In that case it seems reasonable to put the relevant neutron port > in ERROR state, so that the user is aware that the port anymore. > I think it makes sense to address these things. Want me to file a bug?
> Thanks, > Kyle > > [1] > https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52 > > > Regards, > > Salvatore > > > > [1] > > https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599 > > [2] https://bugs.launchpad.net/neutron/+bug/1254520 > > _______________________________________________ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > _______________________________________________ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev _______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev