Hi all, the culprit was haproxy: I had "option httpchk" enabled, and once I disabled it I stopped getting timeouts when rebooting the servers.
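For reference, a minimal sketch of the kind of listen section I mean; the backend name, server names and addresses are illustrative (only the VIP and port match the glance error quoted below), not a copy of my real config:

    listen glance_api
        bind 172.16.21.20:9292
        balance roundrobin
        # This is the line I removed. With "option httpchk" enabled, the
        # per-server "check" probes use HTTP requests instead of plain TCP
        # connects, and in my setup that was what caused the timeouts
        # while rebooting nodes.
        #option httpchk GET /
        server controller1 172.16.21.21:9292 check inter 2000 rise 2 fall 5
        server controller2 172.16.21.22:9292 check inter 2000 rise 2 fall 5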
Thank you all.

On Wed, Jan 14, 2015 at 5:29 PM, John Dewey <j...@dewey.ws> wrote:

> I would verify that the VIP failover is occurring.
>
> Your master should have the IP address. If you shut down keepalived, the
> VIP should move to one of the others. I generally set the state to MASTER
> on all systems, and have one with a higher priority than the others (e.g.
> 100 vs 150 on the others).
>
> On Tuesday, January 13, 2015 at 12:18 PM, Pedro Sousa wrote:
>
> As expected, if I reboot the Keepalived MASTER node I get timeouts again,
> so my understanding is that this happens when the VIP fails over to
> another node. Does anyone have an explanation for this?
>
> Thanks
>
> On Tue, Jan 13, 2015 at 8:08 PM, Pedro Sousa <pgso...@gmail.com> wrote:
>
> Hi,
>
> I think I found the issue: since I have all 3 nodes running Keepalived as
> MASTER, when I reboot one of the servers one of the VIPs fails over to it,
> causing the timeout issues. So I left only one server as MASTER and the
> other 2 as BACKUP, and if I reboot the BACKUP servers everything works
> fine.
>
> As an aside, I don't know if this is some ARP issue, because I have a
> similar problem with Neutron L3 running in HA mode. If I reboot the server
> that is running as MASTER, I lose connectivity to my floating IPs because
> the switch doesn't yet know that the MAC address has changed. To get
> everything working again I have to ping an outside host like google from
> an instance.
>
> Maybe someone could share some experience on this.
>
> Thank you for your help.
>
> On Tue, Jan 13, 2015 at 7:18 PM, Pedro Sousa <pgso...@gmail.com> wrote:
>
> Jesse,
>
> I see a lot of these messages in glance-api:
>
> 2015-01-13 19:16:29.084 29269 DEBUG
> glance.api.middleware.version_negotiation
> [29d94a9a-135b-4bf2-a97b-f23b0704ee15 eb7ff2b5f0f34f51ac9ea0f75b60065d
> 2524b02b63994749ad1fed6f3a825c15 - - -] Unknown version. Returning version
> choices. process_request
> /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:64
>
> While running openstack-status (glance image-list):
>
> == Glance images ==
> Error finding address for
> http://172.16.21.20:9292/v1/images/detail?sort_key=name&sort_dir=asc&limit=20:
> HTTPConnectionPool(host='172.16.21.20', port=9292): Max retries exceeded
> with url: /v1/images/detail?sort_key=name&sort_dir=asc&limit=20 (Caused by
> <class 'httplib.BadStatusLine'>: '')
>
> Thanks
>
> On Tue, Jan 13, 2015 at 6:52 PM, Jesse Keating <j...@bluebox.net> wrote:
>
> On 1/13/15 10:42 AM, Pedro Sousa wrote:
>
> Hi,
>
> I've changed some haproxy confs, now I'm getting a different error:
>
> == Nova networks ==
> ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
> port=8774): Max retries exceeded with url:
> /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by <class
> 'httplib.BadStatusLine'>: '')
> == Nova instance flavors ==
>
> If I restart my openstack services everything starts working again.
>
> I'm attaching my new haproxy conf.
>
> Thanks
>
> Sounds like your services are losing access to something, like rabbit or
> the database. What do your service logs show prior to restart? Are they
> throwing any errors?
>
> --
> -jlk
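For anyone who lands here later: a minimal keepalived.conf sketch of the approach John describes above (state MASTER on all nodes, priorities differing). The interface name, virtual_router_id and priority values are illustrative, and the VIP shown is just the one from the thread, not a definitive config:

    vrrp_instance VI_1 {
        state MASTER          # same state on every node
        interface eth0
        virtual_router_id 51
        priority 150          # e.g. 150 on this node, 100 on the others
        advert_int 1
        virtual_ipaddress {
            172.16.21.20
        }
    }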