There have been no updates for a long time, so I'm closing this RFE for now.
** Changed in: neutron
Status: New => Opinion
** Tags removed: rfe
** Tags added: rfe-postponed
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1817872
Title:
[RFE] neutron resource health check
Status in neutron:
Opinion
Bug description:
Problem Description
===================
How do we troubleshoot when a VM loses its connection? How do we find out
why a floating IP is not reachable?
There is no easy way. Cloud operators need to dump the flows or iptables rules
and then find out which parts were not set properly. When there are huge
amounts of flows or rules, the dump is not human-readable, so how do we find
out what happened to a given port? When there are plenty of iptables rules, how
do we find out why a floating IP is not reachable? When many routers are hosted
on the same agent node, how do we find out why a router is not up?
Each of these tasks is unfriendly to humans, and people make mistakes. But we
have the resource processing procedure, so we can follow that workflow to let
the machine do the status check, troubleshooting, and recovery for us.
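To illustrate the manual step described above, here is a minimal sketch that
filters an iptables-save style dump for the rules that mention one IP address.
The function name rules_for_ip and the sample dump are made up for
illustration; a real router namespace will contain many more rules and may use
different chains.

```python
import ipaddress


def rules_for_ip(dump, ip):
    """Return the '-A' rule lines of an iptables-save style dump that mention ip.

    Hypothetical helper, not part of Neutron. Addresses are compared exactly,
    so searching for 172.24.4.1 does not match 172.24.4.10.
    """
    target = ipaddress.ip_address(ip)
    matches = []
    for line in dump.splitlines():
        line = line.strip()
        if not line.startswith("-A "):
            continue  # skip table names, chain policies, COMMIT, comments
        for token in line.split():
            candidate = token.split("/")[0]  # drop a /32 style prefix length
            try:
                if ipaddress.ip_address(candidate) == target:
                    matches.append(line)
                    break
            except ValueError:
                continue  # token is not an IP address
    return matches


# Made-up sample data, for illustration only.
SAMPLE_DUMP = """\
*nat
:PREROUTING ACCEPT [0:0]
-A neutron-l3-agent-PREROUTING -d 172.24.4.10/32 -j DNAT --to-destination 10.0.0.5
-A neutron-l3-agent-PREROUTING -d 172.24.4.11/32 -j DNAT --to-destination 10.0.0.6
-A neutron-l3-agent-float-snat -s 10.0.0.5/32 -j SNAT --to-source 172.24.4.10
COMMIT
"""

for rule in rules_for_ip(SAMPLE_DUMP, "172.24.4.10"):
    print(rule)
```

Even with such a filter, the operator still has to know which rules should
exist, which is the part this RFE proposes to automate.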
Proposed Change
===============
This aims at the community goal "Service-side health checks":
http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000558.html
We already have a troubleshooting BP:
https://blueprints.launchpad.net/neutron/+spec/troubleshooting
but it seems we have not made much progress on it.
Overview
--------
Add APIs, CLI tools, and agent-side functions to check resource status.
Basic plan:
1. On the agent side, add functions to detect the status of a single
resource.
For instance, check router iptables rules and router route rules; for
ports, check the basic flow status, the OpenFlow security group, l2 pop,
ARP, etc.
2. Bulk check: ports for a tenant, ports from one subnet, or routers for a
tenant.
3. Check the resources of one entire agent.
4. API extensions for the related resources, such as router_check and
port_check.
For some automated scenarios, cloud operators may not want to log in to the
neutron-server host, so the API can be a good way to call these check methods.
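The bulk check in item 2 could be layered on top of the single-resource check
from item 1. The following is only a sketch under stated assumptions: the Port
record, bulk_check, and the stub check function are hypothetical names, not
existing Neutron code, and a real check would inspect flows or rules on the
agent instead of returning canned results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """Hypothetical, simplified port record used only for this sketch."""
    id: str
    tenant_id: str
    subnet_id: str


def bulk_check(ports, check, tenant_id=None, subnet_id=None):
    """Run a single-port check over every port matching the given filters.

    check is any callable that takes a Port and returns a list of problem
    strings (an empty list means the port looks healthy). The result maps
    port id to that list.
    """
    report = {}
    for port in ports:
        if tenant_id is not None and port.tenant_id != tenant_id:
            continue
        if subnet_id is not None and port.subnet_id != subnet_id:
            continue
        report[port.id] = check(port)
    return report


def stub_check(port):
    # Stand-in for a real agent-side check; pretends that port "p2" is broken.
    return ["basic flows missing"] if port.id == "p2" else []


ports = [
    Port("p1", "tenant-a", "subnet-1"),
    Port("p2", "tenant-a", "subnet-2"),
    Port("p3", "tenant-b", "subnet-1"),
]
print(bulk_check(ports, stub_check, tenant_id="tenant-a"))
```

Keeping the per-port check as a plain callable means the same loop could serve
the per-tenant, per-subnet, and per-agent cases by changing only the filter.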
Implementation plan:
1. Add functions to detect the status of a single resource.
For instance, following the router processing procedure, add a check method
for each step: check_router_gateway, check_nat_rules, check_route_rules,
check_qos_rules, check_meta_proxy, and so on.
2. A CLI tool (cloud admin only; it needs to run on the neutron-server host
with direct access to the DB) to check the resources of one entire agent.
For instance, check the routers of one l3 agent.
3. API extensions for the related resources: check_router, check_port.
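A rough sketch of how the per-step router checks in item 1 might be chained.
The step names come from the list above, but their bodies, the router dict
layout, and every key in it are assumptions made for illustration. Each step
compares what the router should have with what is said to be applied on the
agent node.

```python
def check_router_gateway(router):
    # Assumed keys: "gateway_ip" (or None) and "applied_addresses".
    gateway_ip = router["gateway_ip"]
    if gateway_ip is None:
        return []  # no external gateway configured, nothing to verify
    if gateway_ip not in router["applied_addresses"]:
        return ["gateway address %s not configured" % gateway_ip]
    return []


def check_nat_rules(router):
    # Assumed keys: "floating_ips" (expected) and "applied_nat" (found).
    missing = sorted(set(router["floating_ips"]) - set(router["applied_nat"]))
    return ["missing NAT rule for %s" % ip for ip in missing]


def check_route_rules(router):
    # Assumed keys: "routes" (expected) and "applied_routes" (found).
    missing = sorted(set(router["routes"]) - set(router["applied_routes"]))
    return ["missing route %s" % route for route in missing]


# Run in the same order as the router processing steps named above.
ROUTER_CHECKS = (check_router_gateway, check_nat_rules, check_route_rules)


def check_router(router):
    """Run every step and return {step name: list of problems found}."""
    return {step.__name__: step(router) for step in ROUTER_CHECKS}


# Made-up router state: one floating IP has no NAT rule applied.
router = {
    "gateway_ip": "172.24.4.2",
    "applied_addresses": ["172.24.4.2"],
    "floating_ips": ["172.24.4.10", "172.24.4.11"],
    "applied_nat": ["172.24.4.10"],
    "routes": ["10.1.0.0/24"],
    "applied_routes": ["10.1.0.0/24"],
}
print(check_router(router))
```

Reporting results per step tells the operator which stage of the router
processing failed, rather than only that the router is unhealthy.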
---------------
to be continued...