On Nov 25, 2013, at 12:36 PM, Salvatore Orlando <sorla...@nicira.com> wrote:
> 
> Thanks Kyle,
> 
> More comments inline.
> 
> Salvatore
> 
> 
> On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmest...@cisco.com> wrote:
> On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
> >
> > Hi,
> >
> > I've been recently debugging some issues I've had with the OVS agent, and I 
> > found out that in many  cases (possibly every case) the code just logs 
> > errors from ovs-vsctl and ovs-ofctl without taking any action in the 
> > control flow.
> >
> > For instance, the routine which should do the wiring for a port, port_bound 
> > [1], does not react in any way if it fails to configure the local vlan, 
> > which I guess means the port would not be able to send/receive any data.
> >
> > I'm pretty sure there's a good reason for this which I'm missing at the 
> > moment. I am asking because I see a pretty large number of ALARM_CLOCK 
> > errors returned by OVS commands in gate logs (see bug [2]), and I'm not 
> > sure whether it's ok to handle them as the OVS agent is doing nowadays.
> >
> Thanks for bringing this up Salvatore. It looks like the underlying run_vstcl 
> [1] provides an ability to raise exceptions on errors, but this is not used 
> by most of the callers of run_vsctl. Do you think we should be returning the 
> exceptions back up the stack to callers to handle? I think that may be a good 
> first step.
> 
> I think it makes sense to start to handle errors; as they often happen in the 
> agent's rpc loop simply raising will probably just cause the agent to crash.
> I looked again at the code and it really seems it's silently ignoring errors 
> from ovs command.
> This actually makes sense in some cases. For instance the l3 agent might 
> remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its 
> iteration.
> 
> There are however cases in which the exception must be handled.
> In cases like the ALARM_CLOCK error, either a retry mechanism or marking the 
> port for re-syncing at the next iteration might make sense.
> Other error cases might be unrecoverable; for instance when a port 
> disappears. In that case it seems reasonable to put the relevant neutron port 
> in ERROR state, so that the user is aware that the port anymore.
> 
I think it makes sense to address these things. Want me to file a bug?

> Thanks,
> Kyle
> 
> [1] 
> https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52
> 
> > Regards,
> > Salvatore
> >
> > [1] 
> > https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599
> > [2] https://bugs.launchpad.net/neutron/+bug/1254520
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to