+1 to this. Evan, can you report a bug (if one hasn't been reported yet)
and propose the fix? Or else I can find someone else to propose it.
Vish
On Aug 23, 2012, at 1:38 PM, Evan Callicoat <diop...@gmail.com> wrote:
Hello all!
I'm the original author of the hairpin patch, and things have changed a
little bit in Essex and Folsom from the original Diablo target. I believe
I can shed some light on what should be done here to solve the issue in
either case.
---
For Essex (stable/essex), in nova/virt/libvirt/connection.py:
---
Currently _enable_hairpin() is only being called from spawn(). However,
spawn() is not the only place that vifs (veth#) get added to a bridge
(which is when we need to enable hairpin_mode on them). The more relevant
function is _create_new_domain(), which is called from spawn() and other
places. Without changing the information that gets passed to
_create_new_domain() (which is just 'xml' from to_xml()), we can easily
rewrite the first 2 lines in _enable_hairpin(), as follows:
    def _enable_hairpin(self, xml):
        interfaces = self.get_interfaces(xml['name'])
Then, we can move the self._enable_hairpin(instance) call from spawn() up
into _create_new_domain(), and pass it xml as follows:
    [...]
    self._enable_hairpin(xml)
    return domain
This will run the hairpin code every time a domain gets created, which is
also when the domain's vif(s) get inserted into the bridge with the
default of hairpin_mode=0.
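A minimal sketch of that restructuring, for illustration only. It is a toy
model, not the nova code: FakeLibvirtConnection, the stubbed
get_interfaces(), and the simplified spawn()/_hard_reboot() signatures are
made up here, and it assumes (as this message does, untested) that 'xml' is
a dictionary with a 'name' key.

```python
class FakeLibvirtConnection:
    """Toy stand-in for the Essex connection class; not the real nova code."""

    def __init__(self):
        self.hairpin_enabled = []  # records (instance name, interface) pairs

    def get_interfaces(self, instance_name):
        # Hypothetical stub; the real method looks up the domain's interfaces.
        return ["vnet0"]

    def _enable_hairpin(self, xml):
        # Uses xml['name'] where the old code used instance['name'].
        interfaces = self.get_interfaces(xml['name'])
        for interface in interfaces:
            # The real code would set hairpin_mode=1 on the bridge port here.
            self.hairpin_enabled.append((xml['name'], interface))

    def _create_new_domain(self, xml):
        domain = {"name": xml['name']}  # placeholder for the libvirt domain
        self._enable_hairpin(xml)       # runs on every domain creation
        return domain

    def spawn(self, xml):
        # Simplified signature; spawn() no longer calls _enable_hairpin().
        return self._create_new_domain(xml)

    def _hard_reboot(self, xml):
        # Any other caller gets hairpin re-enabled without its own call.
        return self._create_new_domain(xml)
```

With the call living in _create_new_domain(), both spawn() and any other
path that re-creates the domain end up enabling hairpin on the vifs.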
---
For Folsom (trunk), in nova/virt/libvirt/driver.py:
---
There've been a lot more changes made here, but the same strategy as
above should work. Here, _create_new_domain() has been split into
_create_domain() and _create_domain_and_network(), and _enable_hairpin()
was moved from spawn() to _create_domain_and_network(), which seems like
it'd be the right thing to do, but doesn't quite cover all of the cases of
vif reinsertion, since _create_domain() is the only function which
actually creates the domain (_create_domain_and_network() just calls it
after doing some pre-work). The solution here is likewise fairly simple;
make the same 2 changes to _enable_hairpin():
    def _enable_hairpin(self, xml):
        interfaces = self.get_interfaces(xml['name'])
And move it from _create_domain_and_network() to _create_domain(), like
before:
    [...]
    self._enable_hairpin(xml)
    return domain
I haven't yet tested this on my Essex clusters and I don't have a Folsom
cluster handy at present, but the change is simple and makes sense.
Looking at to_xml() and _prepare_xml_info(), it appears that the 'xml'
variable _create_[new_]domain() gets is just a python dictionary, and
xml['name'] = instance['name'], exactly what _enable_hairpin() was using
the 'instance' variable for previously.
Let me know if this works, or doesn't work, or doesn't make sense, or if
you need an address to send gifts, etc. Hope it's solved!
-Evan
On Thu, Aug 23, 2012 at 11:20 AM, Sam Su <susltd...@gmail.com> wrote:
Hi Oleg,
Thank you for your investigation. Good luck!
Can you let me know if you find out how to fix the bug?
Thanks,
Sam
On Wed, Aug 22, 2012 at 12:50 PM, Oleg Gelbukh <ogelb...@mirantis.com>
wrote:
Hello,
Is it possible that, during snapshotting, libvirt tears down the virtual
interface at some point and then re-creates it with hairpin_mode disabled
again?
The fix for this bug [https://bugs.launchpad.net/nova/+bug/933640] appears
to apply only when an instance is spawned. This means that upon resume
after a snapshot, hairpin is not restored. Maybe inserting the
_enable_hairpin() call in the snapshot procedure would help.
We're currently investigating this issue in one of our environments and
hope to come up with an answer by tomorrow.
--
Best regards,
Oleg
On Wed, Aug 22, 2012 at 11:29 PM, Sam Su <susltd...@gmail.com> wrote:
My friend found a workaround that lets the VM ping itself when this
problem occurs, but we have not found out why it happens:
echo 1 | sudo tee /sys/class/net/br1000/brif/<virtual-interface-name>/hairpin_mode
(With a plain "sudo echo 1 > ..." the redirect is done by the calling
shell, so it fails unless you are already root.)
I filed a ticket to report this problem:
https://bugs.launchpad.net/nova/+bug/1040255
Hopefully someone can find out why this happens and fix it.
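For anyone who wants to script the workaround, here is a minimal Python
sketch of the same sysfs write. The function name set_hairpin() and the
sysfs_root parameter are made up for illustration (sysfs_root only exists
so the function can be tried against a scratch directory); on a real host
it needs root, and it is not the nova implementation.

```python
import os


def set_hairpin(bridge, port, enabled=True, sysfs_root="/sys/class/net"):
    """Write 1 (or 0) to a bridge port's hairpin_mode file and return its path."""
    # Same path layout as the manual workaround:
    # <sysfs_root>/<bridge>/brif/<port>/hairpin_mode
    path = os.path.join(sysfs_root, bridge, "brif", port, "hairpin_mode")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path
```

For example, set_hairpin("br1000", "vnet0") would target
/sys/class/net/br1000/brif/vnet0/hairpin_mode, where "vnet0" is a
placeholder for the instance's virtual interface name.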
Thanks,
Sam
On Fri, Jul 20, 2012 at 3:50 PM, Gabriel Hurley
<gabriel.hur...@nebula.com> wrote:
I ran into some similar issues with the _enable_hairpin() call. The call
is allowed to fail silently and (in my case) was failing. I couldn’t for
the life of me figure out why, though, and since I’m really not a
networking person I didn’t trace it along too far.
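Since the call can fail silently, one way to notice it is to read the
hairpin_mode files back after the fact. A rough sketch, assuming the
standard Linux bridge sysfs layout (<bridge>/brif/<port>/hairpin_mode);
ports_without_hairpin() and its sysfs_root parameter are hypothetical names
used only for this example.

```python
import os


def ports_without_hairpin(bridge, sysfs_root="/sys/class/net"):
    """Return the names of bridge ports whose hairpin_mode is not 1."""
    brif = os.path.join(sysfs_root, bridge, "brif")
    missing = []
    for port in sorted(os.listdir(brif)):
        # Each port directory exposes its current setting as "0" or "1".
        with open(os.path.join(brif, port, "hairpin_mode")) as f:
            if f.read().strip() != "1":
                missing.append(port)
    return missing
```

Running this against the instance bridge after a spawn, reboot, or snapshot
would show which vifs came back with hairpin disabled.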
Just thought I’d share my similar pain.
- Gabriel
From:
openstack-bounces+gabriel.hurley=nebula....@lists.launchpad.net
[mailto:openstack-bounces+gabriel.hurley=nebula....@lists.launchpad.net] On
Behalf Of Sam Su
Sent: Thursday, July 19, 2012 11:50 AM
To: Brian Haley
Cc: openstack
Subject: Re: [Openstack] VM can't ping self floating IP after a
snapshot is taken
Thank you for your support.
I checked the file nova/virt/libvirt/connection.py; the statement
self._enable_hairpin(instance) has already been added to the function
_hard_reboot().
It looks like there are some differences between taking a snapshot and
rebooting an instance. I tried to figure out how to fix this bug but
failed.
It would be much appreciated if anyone could give some hints.
Thanks,
Sam
On Thu, Jul 19, 2012 at 8:37 AM, Brian Haley <brian.ha...@hp.com>
wrote:
On 07/17/2012 05:56 PM, Sam Su wrote:
Hi,
This always happens in the Essex release. After I take a snapshot of my
VM (I tried Ubuntu 12.04 or CentOS 5.8), the VM can't ping its own
floating IP; before I take a snapshot, though, it can.
This looks closely related to
https://bugs.launchpad.net/nova/+bug/933640, but is still a little
different. In 933640, it sounds like the VM can't ping its own floating
IP regardless of whether we take a snapshot or not.
Any suggestions for an easy fix? And what is the root cause of the
problem?
It might be because there's a missing _enable_hairpin() call in the
reboot() function. Try something like this...
nova/virt/libvirt/connection.py, _hard_reboot():
     self._create_new_domain(xml)
+    self._enable_hairpin(instance)
     self.firewall_driver.apply_instance_filter(instance, network_info)
At least that's what I remember doing myself recently when testing after
a reboot; I don't know about snapshot.
Folsom has changed enough that something different would need to be done
there.
-Brian
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp