On Wed, May 17, 2017 at 1:46 PM, George Dunlap <george.dun...@citrix.com> wrote:
> On 17/05/17 13:43, Ian Jackson wrote:
>> George Dunlap writes ("Re: [Xen-devel] [Xen-users] vif-bridge errors when 
>> creating and destroying dozens of VMs simultaneously"):
>>> So we have three options:
>> ...
>>> 3. Try to check to see if the version of iptables we have supports -w,
>>> and use it if available.  This should also work on all systems, but
>>> introduces a bit of complication.  It also doesn't allow us to
>>> reliably use a timeout.
>>
>> I think this is best.  Eventually we can get rid of the check for -w.
>>
>> I think a timeout in this context is not very helpful.
>>
>> Also, a loop, on a busy system, might need to have many attempts,
>> because it will be polling.
>
> FWIW the iptables internal mechanism will try to grab the lock, and if
> it fails (and -w is set), will call sleep(1) before trying again.  My
> bash loop would do exactly the same thing.
>
> But I agree that if timeouts are not important, doing it via iptables is
> probably cleaner.  Let me work up a patch.

Antony,

Attached is a patch to add the -w option if it's available.  I've
smoke-tested that it works under normal conditions; but my simplistic
attempts to get the bug to trigger have failed.  Can you give it a try
and see if it works?

Thanks,
 -George
From 7ab50acde39a05de664646ba58d5892f0b8fe353 Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dun...@citrix.com>
Date: Wed, 17 May 2017 11:36:25 +0100
Subject: [PATCH] vif-common.sh: Have iptables wait for the xtables lock

iptables has a system-wide lock on the xtables.  Strangely though, in
the case of two concurrent invocations, the default is for the
instance not grabbing the lock to exit out rather than waiting for it.
This means that when starting a large number of guests in parallel,
many will fail out with messages like this:

  2017-05-10 11:45:40 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge remove [18767] exited with error status 4
  2017-05-10 11:50:52 UTC libxl: error: libxl_exec.c:118: libxl_report_child_exitstatus: /etc/xen/scripts/vif-bridge offline [1554] exited with error status 4

In order to instruct iptables to wait for the lock, you have to
specify '-w'.  Unfortunately, not all versions of iptables have the
'-w' option, so on first invocation check to see if it accepts the -w
command.

Reported-by: Antony Saba <aws...@gmail.com>
Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Wei Liu <wei.l...@citrix.com>
---
 tools/hotplug/Linux/vif-common.sh | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tools/hotplug/Linux/vif-common.sh b/tools/hotplug/Linux/vif-common.sh
index 6e8d584..60ccce8 100644
--- a/tools/hotplug/Linux/vif-common.sh
+++ b/tools/hotplug/Linux/vif-common.sh
@@ -120,6 +120,35 @@ fi
 ip=${ip:-}
 ip=$(xenstore_read_default "$XENBUS_PATH/ip" "$ip")
 
+IPTABLES_WAIT_RUNE="-w"
+IPTABLES_WAIT_RUNE_CHECKED=false
+
+# If iptables tries to grab the xtable lock and fails, instead if
+# waiting for it by default, it exits with error 4.  They have since
+# added an option, `-w`, to specify the more sensible behavior. But it
+# was only introduced in 2013, so there are probably still systems
+# around which don't support it.  So check to see if it's supported
+# the first time we use it.
+iptables_w()
+{
+    if ! $IPTABLES_WAIT_RUNE_CHECKED ; then
+	iptables $IPTABLES_WAIT_RUNE -L -n >& /dev/null
+	# If it fails with -w and succeeds without, remove the rune
+	if [[ $? == 2 ]] ; then
+	    iptables -L -n >& /dev/null
+	    if [[ $? != 2 ]] ; then
+		# If we fail with PARAMETER_PROBLEM with -w and don't fail
+		# with PARAMETER_PRIBLEM without it, then it's the -w option
+		IPTABLES_WAIT_RUNE_CHECKED=true
+		IPTABLES_WAIT_RUNE=""
+	    fi
+	else
+	    IPTABLES_WAIT_RUNE_CHECKED=true
+	fi
+    fi
+    iptables $IPTABLES_WAIT_RUNE "$@"
+}
+
 frob_iptable()
 {
   if [ "$command" == "online" -o "$command" == "add" ]
@@ -129,9 +158,9 @@ frob_iptable()
     local c="-D"
   fi
 
-  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
+  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-in "$dev" \
     "$@" -j ACCEPT 2>/dev/null &&
-  iptables "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
+  iptables_w "$c" FORWARD -m physdev --physdev-is-bridged --physdev-out "$dev" \
     -j ACCEPT 2>/dev/null
 
   if [ \( "$command" == "online" -o "$command" == "add" \) -a $? -ne 0 ]
@@ -154,7 +183,7 @@ handle_iptable()
   # binary is not sufficient, because the user may not have the appropriate
   # modules installed.  If iptables is not working, then there's no need to do
   # anything with it, so we can just return.
-  if ! iptables -L -n >&/dev/null
+  if ! iptables_w -L -n >&/dev/null
   then
     return
   fi
-- 
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Reply via email to