False alarm: after more tests the issue persisted, so I switched the other haproxy nodes to BACKUP mode, and now everything works as expected.

Thanks
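[Editor's note: for readers landing on this thread, a minimal keepalived.conf sketch of the setup that ended up working — one node as MASTER, the other two as BACKUP. The interface name, virtual_router_id, and priority values are assumptions; only the VIP address appears in the thread.]

    vrrp_instance VI_1 {
        state MASTER              # BACKUP on the other two nodes
        interface eth0            # assumption: adjust to the real NIC
        virtual_router_id 51      # assumption: any value shared by all three nodes
        priority 150              # e.g. 100 on the BACKUP nodes
        advert_int 1
        virtual_ipaddress {
            172.16.21.20          # the VIP from the thread
        }
    }

[John Dewey's variant later in the thread — state MASTER on every node with distinct priorities — achieves the same result; either way the VRRP election is decided by priority, so exactly one node holds the VIP at a time.]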
On 15/01/2015 12:13, "Pedro Sousa" <pgso...@gmail.com> wrote:

> Hi all,
>
> The culprit was haproxy: I had "option httpchk" enabled, and when I
> disabled it I stopped having timeouts while rebooting the servers
> [see the haproxy sketch after the thread].
>
> Thank you all.
>
> On Wed, Jan 14, 2015 at 5:29 PM, John Dewey <j...@dewey.ws> wrote:
>
>> I would verify that the VIP failover is occurring.
>>
>> Your master should have the IP address. If you shut down keepalived, the
>> VIP should move to one of the others. I generally set the state to MASTER
>> on all systems, and have one with a higher priority than the others
>> (e.g. 150 vs. 100 on the others).
>>
>> On Tuesday, January 13, 2015 at 12:18 PM, Pedro Sousa wrote:
>>
>> As expected, if I reboot the Keepalived MASTER node I get timeouts again,
>> so my understanding is that this happens when the VIP fails over to
>> another node. Does anyone have an explanation for this?
>>
>> Thanks
>>
>> On Tue, Jan 13, 2015 at 8:08 PM, Pedro Sousa <pgso...@gmail.com> wrote:
>>
>> Hi,
>>
>> I think I found the issue: since I had all 3 nodes running Keepalived as
>> MASTER, when I rebooted one of the servers, one of the VIPs failed over
>> to it, causing the timeout issues. So I left only one server as MASTER
>> and the other 2 as BACKUP, and if I reboot the BACKUP servers everything
>> works fine.
>>
>> As an aside, I don't know if this is some ARP issue, because I have a
>> similar problem with Neutron L3 running in HA mode. If I reboot the
>> server that is running as MASTER, I lose connectivity to my floating IPs
>> because the switch doesn't yet know that the MAC address has changed. To
>> get everything working again I have to ping an outside host, like google,
>> from an instance [see the gratuitous-ARP sketch after the thread].
>>
>> Maybe someone could share some experience on this.
>>
>> Thank you for your help.
>>
>> On Tue, Jan 13, 2015 at 7:18 PM, Pedro Sousa <pgso...@gmail.com> wrote:
>>
>> Jesse,
>>
>> I see a lot of these messages in glance-api:
>>
>> 2015-01-13 19:16:29.084 29269 DEBUG
>> glance.api.middleware.version_negotiation
>> [29d94a9a-135b-4bf2-a97b-f23b0704ee15 eb7ff2b5f0f34f51ac9ea0f75b60065d
>> 2524b02b63994749ad1fed6f3a825c15 - - -] Unknown version. Returning version
>> choices. process_request
>> /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:64
>>
>> While running openstack-status (glance image-list):
>>
>> == Glance images ==
>> Error finding address for
>> http://172.16.21.20:9292/v1/images/detail?sort_key=name&sort_dir=asc&limit=20:
>> HTTPConnectionPool(host='172.16.21.20', port=9292): Max retries exceeded
>> with url: /v1/images/detail?sort_key=name&sort_dir=asc&limit=20 (Caused by
>> <class 'httplib.BadStatusLine'>: '')
>>
>> Thanks
>>
>> On Tue, Jan 13, 2015 at 6:52 PM, Jesse Keating <j...@bluebox.net> wrote:
>>
>> On 1/13/15 10:42 AM, Pedro Sousa wrote:
>>
>> Hi,
>>
>> I've changed some haproxy confs; now I'm getting a different error:
>>
>> == Nova networks ==
>> ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
>> port=8774): Max retries exceeded with url:
>> /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by <class
>> 'httplib.BadStatusLine'>: '')
>> == Nova instance flavors ==
>>
>> If I restart my openstack services, everything starts working again.
>>
>> I'm attaching my new haproxy conf.
>>
>> Thanks
>>
>> Sounds like your services are losing access to something, like rabbit or
>> the database. What do your service logs show prior to restart? Are they
>> throwing any errors?
>>
>> --
>> -jlk
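[Editor's note: the "option httpchk" line Pedro removed would have lived in a haproxy section roughly like the following. The section name, server names, and backend addresses are hypothetical; only the VIP and the Glance port 9292 appear in the thread.]

    listen glance_api 172.16.21.20:9292
        balance roundrobin
        # option httpchk GET /    # enabling an HTTP check triggered the reboot timeouts
        server ctrl1 172.16.21.21:9292 check
        server ctrl2 172.16.21.22:9292 check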
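[Editor's note: on the floating-IP/MAC problem mentioned mid-thread, a commonly used nudge — not something the thread itself confirms — is to send gratuitous ARP from the router's network namespace on the node that just became MASTER, so the upstream switch learns the new location without waiting for an instance to ping out. The namespace ID, interface name, and address below are placeholders.]

    # send 3 gratuitous ARP replies for the floating IP (iputils arping)
    ip netns exec qrouter-<router-uuid> \
        arping -U -I qg-<port-id> -c 3 <floating-ip>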
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators