I would like some help identifying (and correcting) a problem with instance
metadata during boot. My environment is a Mitaka installation on
Ubuntu 16.04 LTS, with 1 controller, 1 network node and 5 compute nodes.
I'm using classic OVS as the network setup.

The problem occurs after some period of time in some projects (not in all
projects at the same time). When booting an Ubuntu Cloud Image with cloud-init,
instances lose their connection to the metadata API and don't get their
information, such as key pairs and cloud-init scripts.

[  118.924311] cloud-init[932]: 2018-02-23 18:27:05,003 -
url_helper.py[WARNING]: Calling '
http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [101/120s]:
request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max
retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by
ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection
object at 0x7faabcd6fa58>, 'Connection to 169.254.169.254 timed out.
(connect timeout=50.0)'))]
[  136.959361] cloud-init[932]: 2018-02-23 18:27:23,038 -
url_helper.py[WARNING]: Calling '
http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [119/120s]:
request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max
retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by
ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection
object at 0x7faabcd7f240>, 'Connection to 169.254.169.254 timed out.
(connect timeout=17.0)'))]
[  137.967469] cloud-init[932]: 2018-02-23 18:27:24,040 -
DataSourceEc2.py[CRITICAL]: Giving up on md from ['
http://169.254.169.254/2009-04-04/meta-data/instance-id'] after 120 seconds
[  137.972226] cloud-init[932]: 2018-02-23 18:27:24,048 -
url_helper.py[WARNING]: Calling '
http://192.168.0.7/latest/meta-data/instance-id' failed [0/120s]: request
error [HTTPConnectionPool(host='192.168.0.7', port=80): Max retries
exceeded with url: /latest/meta-data/instance-id (Caused by
NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection
object at 0x7faabcd7fc18>: Failed to establish a new connection: [Errno
111] Connection refused',))]
[  138.974223] cloud-init[932]: 2018-02-23 18:27:25,053 -
url_helper.py[WARNING]: Calling '
http://192.168.0.7/latest/meta-data/instance-id' failed [1/120s]: request
error [HTTPConnectionPool(host='192.168.0.7', port=80): Max retries
exceeded with url: /latest/meta-data/instance-id (Caused by
NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection
object at 0x7faabcd7fa58>: Failed to establish a new connection: [Errno
111] Connection refused',))]

After giving up on 169.254.169.254, it tries 192.168.0.7, which is the DHCP
address for the project.
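
The two targets fail differently in the log above: 169.254.169.254 times
out, while 192.168.0.7 refuses the connection. A minimal Python sketch to
reproduce that distinction from inside an instance could look like this. The
function name probe and the 5-second default timeout are my own choices for
illustration, not something taken from cloud-init.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and classify the outcome.

    Returns 'open', 'timeout', 'refused', or 'error: <reason>'.
    """
    try:
        # create_connection performs the TCP handshake within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are being dropped or never routed.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. network unreachable.
        return "error: %s" % exc
```

Calling probe("169.254.169.254", 80) and probe("192.168.0.7", 80) from an
affected instance would show which of the two behaviours each address has
at that moment.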

I've checked that neutron-l3-agent is running without errors. On the compute
node where the VM is running, the agents and vswitch are running. I checked the
namespace of a problematic project and saw an iptables rule redirecting
traffic from 169.254.169.254:80 to 0.0.0.0:9697, and there is a
neutron-ns-metadata-proxy process that has that port open. So it looks like
the metadata proxy is running fine. But, as the logs show, the connection
times out.
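
To repeat the redirect-rule check across several namespaces, the lookup can
be scripted. This is only a sketch: it assumes the NAT rules were dumped as
text (for example with iptables -t nat -S run inside the namespace) and that
the rule has the usual "-d ... --dport 80 -j REDIRECT --to-ports N" form. The
function name metadata_redirect_port is hypothetical.

```python
import re

# Assumed rule shape (not copied from my system):
#   ... -d 169.254.169.254/32 ... --dport 80 -j REDIRECT --to-ports 9697
REDIRECT_RE = re.compile(
    r"-d 169\.254\.169\.254(?:/32)?\b.*--dport 80\b.*-j REDIRECT --to-ports (\d+)"
)


def metadata_redirect_port(rules_text):
    """Return the port that 169.254.169.254:80 is redirected to, or None.

    `rules_text` is the text of a NAT rule dump, one rule per line.
    """
    for line in rules_text.splitlines():
        match = REDIRECT_RE.search(line)
        if match:
            return int(match.group(1))
    return None
```

A result of None for a namespace would mean the redirect is missing there; a
port number only confirms the rule exists, not that the proxy behind it is
answering.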

Restarting all services on the network node sometimes solves the problem. In
some cases I have to restart services on the controller node (nova-api). Then
everything works fine for some time before the problems start again.

Where should I look to find the cause of the problem?

I appreciate any help. Thank you!

- JLC
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
