Are the controller and the amphora using the same version of Octavia? We had a python3 issue where we had to change the HMAC digest used. If your controller is running an older version of Octavia than your amphora images, it may not have the compatibility code to support the new format. The compatibility code is here: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/health_daemon/status_message.py#L56
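
To illustrate the kind of mismatch involved, here is a minimal sketch. It is not the Octavia code: the helper names (wrap, unwrap_strict, unwrap_compat), the zlib-compressed JSON payload, the example key, and the message fields are all assumptions made for illustration. It shows a sender that appends a hex-encoded HMAC-SHA256 trailer, a receiver that only understands a raw 32-byte trailer (and so rejects the packet), and a receiver that tries both formats. The last lines decode the "msg hmac" value from the log in this thread; it turns out to be 32 ASCII hex characters, which would be consistent with a receiver slicing 32 bytes off a longer hex trailer.

```python
import hashlib
import hmac
import json
import zlib

HASH_LEN = hashlib.sha256().digest_size  # 32 raw bytes
HEX_HASH_LEN = HASH_LEN * 2              # 64 ASCII hex characters


def wrap(obj, key, hex_digest):
    """Build payload + HMAC-SHA256 trailer (hypothetical helper)."""
    # Assumption: the payload is zlib-compressed JSON.
    payload = zlib.compress(json.dumps(obj).encode("utf-8"))
    mac = hmac.new(key.encode("utf-8"), payload, hashlib.sha256)
    trailer = mac.hexdigest().encode("utf-8") if hex_digest else mac.digest()
    return payload + trailer


def _check(envelope, key, trailer_len, hex_digest):
    """Return the decoded object if the trailer verifies, else None."""
    payload, received = envelope[:-trailer_len], envelope[-trailer_len:]
    mac = hmac.new(key.encode("utf-8"), payload, hashlib.sha256)
    expected = mac.hexdigest().encode("utf-8") if hex_digest else mac.digest()
    if hmac.compare_digest(expected, received):
        return json.loads(zlib.decompress(payload).decode("utf-8"))
    return None


def unwrap_strict(envelope, key):
    """Receiver that only understands the raw 32-byte trailer."""
    return _check(envelope, key, HASH_LEN, hex_digest=False)


def unwrap_compat(envelope, key):
    """Receiver that tries the raw trailer first, then the hex trailer."""
    result = _check(envelope, key, HASH_LEN, hex_digest=False)
    if result is None:
        result = _check(envelope, key, HEX_HASH_LEN, hex_digest=True)
    return result


if __name__ == "__main__":
    key = "insecure-example-key"  # placeholder standing in for heartbeat_key
    msg = {"id": "example-amphora", "seq": 1}  # made-up fields

    hex_envelope = wrap(msg, key, hex_digest=True)
    print(unwrap_strict(hex_envelope, key))  # None: packet would be dropped
    print(unwrap_compat(hex_envelope, key))  # the original message

    # "msg hmac" value copied from the log in this thread.
    logged = "6137613337316432636365393832376431343337306537353066626130653261"
    print(bytes.fromhex(logged).decode("ascii"))
    # prints a7a371d2cce9827d14370e750fba0e2a
```

If both sides share the same key and the strict receiver still rejects the packet, a format difference like this one is a more likely cause than a wrong heartbeat_key.
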
There is also a release note about the issue here: https://docs.openstack.org/releasenotes/octavia/rocky.html#upgrade-notes

If that is not the issue, I would double check the heartbeat_key in the health manager configuration files and inside one of the amphorae. Note that this key is only used for health heartbeats and stats; it is not used for the controller to amphora communication on port 9443.

Also, load balancers cannot get "stuck" in PENDING_* states unless someone has killed the controller process that was actively working on that load balancer. By killed I mean a non-graceful shutdown of the process while it was in the middle of working on the load balancer. Otherwise all code paths lead back to ACTIVE or ERROR status once the controller finishes the work or gives up retrying the requested action. Check your controller logs to make sure this load balancer is not still being worked on by one of the controllers. The default retry timeouts are very long (some are up to 25 minutes, during which the controller keeps trying to accomplish the request) to accommodate very slow hosts (VirtualBox) and the test gates. You will want to tune those down for a production deployment.

Michael

On Tue, Oct 23, 2018 at 7:09 AM Gaël THEROND <gael.ther...@gmail.com> wrote:
>
> Hi guys,
>
> I'm finishing work on my POC for Octavia, and after solving a few issues with
> my configuration I'm close to getting a properly working setup.
> However, I'm facing a small but annoying bug: the health-manager receives
> amphora heartbeat UDP packets, considers them incorrect, and drops them.
>
> Here are the messages that can be found in the logs:
>
> 2018-10-23 13:53:21.844 25 WARNING
> octavia.amphorae.backends.health_daemon.status_message [-] calculated hmac:
> faf73e41a0f843b826ee581c3995b7f7e56b5e5a294fca0b84eda426766f8415 not equal to
> msg hmac: 6137613337316432636365393832376431343337306537353066626130653261
> dropping packet
>
> This comes from this part of the HM code:
>
> https://docs.openstack.org/octavia/pike/_modules/octavia/amphorae/backends/health_daemon/status_message.html#get_payload
>
> The annoying thing is that I don't understand why the UDP packet is
> considered stale, or how I can reproduce the payload that is sent to the
> HealthManager.
> I'm willing to write a simple Python program to simulate the heartbeat
> payload, but I don't know exactly what the message contains and I think I'm
> missing some information.
>
> Both the HealthManager and the amphora use the same heartbeat_key, and they
> can reach each other on the network, as the initial health-manager to amphora
> connection on port 9443 is validated.
>
> As a result of this situation, my load balancer is stuck in PENDING_UPDATE
> status.
>
> Do you have any idea how I can handle this, or whether anyone else out there
> has already seen it?
>
> Kind regards,
> G.
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators