I reported the issue to the mailing list:
https://lwn.net/ml/linux-kernel/MW2PR2101MB0892FC0F67BD25661CDCE149BF529%40MW2PR2101MB0892.namprd21.prod.outlook.com/

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1928269

Title:
  netfilter: iptables-restore: setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE,
  "security...", ...) return -EAGAIN

Status in linux-azure package in Ubuntu:
  New

Bug description:
  Hi,
  I'm debugging an iptables-restore failure, which happens about 5% of the
  time when I keep stopping and starting the Linux VM. The VM has only 1
  CPU, and kernel version is 4.15.0-1098-azure, but I suspect the issue may
  also exist in the mainline Linux kernel.

  When the failure happens, it's always caused by line 27 of the rule
  file:

    1 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
    2 *raw
    3 :PREROUTING ACCEPT [0:0]
    4 :OUTPUT ACCEPT [0:0]
    5 -A PREROUTING ! -s 168.63.129.16/32 -p tcp -j NOTRACK
    6 -A OUTPUT ! -d 168.63.129.16/32 -p tcp -j NOTRACK
    7 COMMIT
    8 # Completed on Fri Apr 23 09:22:59 2021
    9 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
   10 *filter
   11 :INPUT ACCEPT [2407:79190058]
   12 :FORWARD ACCEPT [0:0]
   13 :OUTPUT ACCEPT [1648:2190051]
   14 -A OUTPUT -d 169.254.169.254/32 -m owner --uid-owner 33 -j DROP
   15 COMMIT
   16 # Completed on Fri Apr 23 09:22:59 2021
   17 # Generated by iptables-save v1.6.0 on Fri Apr 23 09:22:59 2021
   18 *security
   19 :INPUT ACCEPT [2345:79155398]
   20 :FORWARD ACCEPT [0:0]
   21 :OUTPUT ACCEPT [1504:2129015]
   22 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
   23 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
   24 -A OUTPUT -d 168.63.129.16/32 -p tcp -m owner --uid-owner 0 -j ACCEPT
   25 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
   26 -A OUTPUT -d 168.63.129.16/32 -p tcp -m conntrack --ctstate INVALID,NEW -j DROP
   27 COMMIT

  The related part of the strace log is:

    1 socket(PF_INET, SOCK_RAW, IPPROTO_RAW) = 3
    2 getsockopt(3, SOL_IP, IPT_SO_GET_INFO, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., [84]) = 0
    3 getsockopt(3, SOL_IP, IPT_SO_GET_ENTRIES, "security\0\357B\16Z\177\0\0Pg\355\0\0\0\0\0Pg\355\0\0\0\0\0"..., [880]) = 0
    4 setsockopt(3, SOL_IP, IPT_SO_SET_REPLACE, "security\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2200) = -1 EAGAIN (Resource temporarily unavailable)
    5 close(3)                          = 0
    6 write(2, "iptables-restore: line 27 failed"..., 33) = 33

  The -EAGAIN error comes from line 1240 of xt_replace_table():

    do_ipt_set_ctl
      do_replace
        __do_replace
          xt_replace_table

  1216 xt_replace_table(struct xt_table *table,
  1217               unsigned int num_counters,
  1218               struct xt_table_info *newinfo,
  1219               int *error)
  1220 {
  1221         struct xt_table_info *private;
  1222         unsigned int cpu;
  1223         int ret;
  1224
  1225         ret = xt_jumpstack_alloc(newinfo);
  1226         if (ret < 0) {
  1227                 *error = ret;
  1228                 return NULL;
  1229         }
  1230
  1231         /* Do the substitution. */
  1232         local_bh_disable();
  1233         private = table->private;
  1234
  1235         /* Check inside lock: is the old number correct? */
  1236         if (num_counters != private->number) {
  1237                 pr_debug("num_counters != table->private->number (%u/%u)\n",
  1238                          num_counters, private->number);
  1239                 local_bh_enable();
  1240                 *error = -EAGAIN;
  1241                 return NULL;
  1242         }

  When the function returns -EAGAIN, the 'num_counters' is 5 while
  'private->number' is 6.
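
  To make the failing check concrete, here is a minimal userspace sketch
  of the comparison alone. The names model_table and model_replace are
  hypothetical and are not kernel code; the sketch only mirrors the
  "is the old number correct?" test quoted above and says nothing about
  why the two numbers differ in the real kernel.

```c
#include <errno.h>

/* Hypothetical userspace stand-in for the kernel's table state. */
struct model_table {
    unsigned int number;    /* current number of entries in the table */
};

/*
 * Mirrors only the stale-snapshot check from xt_replace_table():
 * the caller passes the entry count it read earlier, and the replace
 * is refused with -EAGAIN if the table no longer has that many entries.
 */
static int model_replace(struct model_table *t,
                         unsigned int num_counters,
                         unsigned int new_number)
{
    if (num_counters != t->number)
        return -EAGAIN;     /* snapshot is stale, table left untouched */

    t->number = new_number;
    return 0;
}
```

  With the values from the failing run (a caller that saw 5 entries while
  the table holds 6), model_replace() returns -EAGAIN and leaves the table
  unchanged; a caller that re-reads the count and passes 6 succeeds.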

  If I re-run the iptables-restore program upon the failure, the program
  will succeed.

  I checked the function xt_replace_table() in the recent mainline kernel and it
  looks like the function is the same.

  It looks like there is a race between the getsockopt() call that
  iptables-restore makes to get the number of table entries and the
  setsockopt() call that replaces the entries? My first guess was that
  some other program is concurrently calling getsockopt()/setsockopt(),
  but that does not seem to be the case according to the messages I print
  via trace_printk() around do_replace() in do_ipt_set_ctl(): when the
  -EAGAIN error happens, there is no other program calling do_replace().
  The table entry number was changed to 5 by another program, 'iptables',
  about 1.3 milliseconds earlier; then 'iptables-restore' calls
  setsockopt() and the kernel sees 'num_counters' being 5 and
  'private->number' being 6 (how can this happen??). The next setsockopt()
  call for the same 'security' table happens about 1 minute later, with
  both numbers being 6.

  Can you please shed some light on the issue? Thanks!

  BTW, iptables does have a retry mechanism for getsockopt():
  2f93205b375e ("Retry ruleset dump when kernel returns EAGAIN.")
  (https://git.netfilter.org/iptables/commit/libiptc?id=2f93205b375e&context=10&ignorews=0&dt=0)

  But it looks like this is not enough? E.g. here getsockopt() returns 0,
  but setsockopt() returns -EAGAIN.
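
  As a workaround sketch only (not how iptables or libiptc is actually
  implemented), a caller could retry the whole commit step when it fails
  with EAGAIN, which matches the observation that re-running
  iptables-restore succeeds. The names commit_fn, commit_with_retry and
  flaky_commit are hypothetical; the callback is assumed to redo the
  full getsockopt()/setsockopt() sequence and to return -1 with errno
  set on failure.

```c
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, or -1 with errno set. */
typedef int (*commit_fn)(void *ctx);

/*
 * Call commit() up to max_attempts times, retrying only when it fails
 * with EAGAIN. Any other error is returned to the caller immediately.
 */
static int commit_with_retry(commit_fn commit, void *ctx, int max_attempts)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (commit(ctx) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;      /* a different failure: do not retry */
    }

    errno = EAGAIN;         /* every attempt hit the stale-count check */
    return -1;
}

/* Hypothetical test double: fails with EAGAIN until *ctx reaches 0. */
static int flaky_commit(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

  A bounded attempt count keeps a persistently failing table from turning
  into an endless loop; whether a retry belongs in userspace at all, or
  the kernel behaviour should change, is the open question here.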

  Thanks,
  Dexuan

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1928269/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
