Public bug reported:

[Impact]

apt-daily.service is launched by a timer that depends on
network-online.target (once the fixes for bug 1686470 have landed everywhere).

At boot that is usually enough to ensure the network is online, but it
does not seem to work every time, and we might disagree with
network-manager and friends about what the "online" state means.

At resume time, network-online.target is still active, so the service is
started as soon as possible when the timer catches up on missed runs.
Depending on the timing, network connectivity might not be there yet, in
which case the service fails and is only retried 12 hours later.

[Proposed solution]
Introduce a new apt-helper subcommand, wait-online, that tries to connect()
to the remote hosts specified in sources.list until one connection succeeds
or a TIMEOUT is reached. The proposed algorithm looks something like this:

while (time elapsed < TIMEOUT):
  for each entry:
    host = gethostbyname()
    if host failed:
      continue
    fd = start a non-blocking connect() to it
    if fd is invalid:
      continue

    all fds += fd

    if poll(all fds, 100 ms timeout) finds a connected one:
      exit(0)

exit(42) # timeout
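The loop iterates over the entries of sources.list. The following is a rough Python sketch, not apt's C++ code, of how the (host, port) pairs to probe might be collected. It makes several assumptions. It handles only one-line-style deb/deb-src entries and probes only http and https URIs with their default ports. It ignores deb822 .sources files and sources.list.d. It also deduplicates hosts, so a mirror that appears in several entries is probed only once per pass. The name hosts_from_sources is hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only http/https mirrors are probed; file:, cdrom: etc. are skipped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def hosts_from_sources(text):
    """Return unique (host, port) pairs from one-line-style sources.list text."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        tokens = line.split()
        if not tokens or tokens[0] not in ("deb", "deb-src"):
            continue
        tokens = tokens[1:]
        # Skip an optional "[option=value ...]" block, which may span tokens.
        if tokens and tokens[0].startswith("["):
            while tokens and not tokens[0].endswith("]"):
                tokens.pop(0)
            tokens = tokens[1:]
        if not tokens:
            continue
        uri = urlsplit(tokens[0])
        if uri.scheme not in DEFAULT_PORTS or not uri.hostname:
            continue
        pair = (uri.hostname, uri.port or DEFAULT_PORTS[uri.scheme])
        if pair not in hosts:
            hosts.append(pair)
    return hosts
```

For example, an archive.ubuntu.com line and its matching deb-src line yield a single ("archive.ubuntu.com", 80) entry.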

There are two things to consider:
* gethostbyname() and connect() may fail if the network is not up yet, so we
  need to retry (we might need to sleep somewhere).
* If poll() times out without finding a connected socket, it has already
  slept for up to 100 ms, so no extra sleep is needed there.

I believe the timeout should be something like 30 seconds.
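The loop and the two considerations above can be sketched in Python as shown below. This is a hypothetical illustration, not apt-helper's implementation (apt is written in C++). It makes the following assumptions:

* The hosts are already available as (host, port) pairs.
* getaddrinfo() stands in for the obsolete gethostbyname().
* The selectors module plays the role of poll().

The names wait_online, EXIT_ONLINE and EXIT_TIMEOUT are made up for this sketch. The 30 s timeout, the 100 ms poll interval and exit status 42 are the values proposed in this report.

```python
import errno
import selectors
import socket
import time

EXIT_ONLINE = 0
EXIT_TIMEOUT = 42  # exit status proposed in this report for "timed out"


def _start_connect(family, type_, proto, addr):
    """Start a non-blocking TCP connect.

    Returns (sock, True) if it connected at once, (sock, False) if the
    connect is still in progress, or None if it failed immediately."""
    sock = socket.socket(family, type_, proto)
    sock.setblocking(False)
    err = sock.connect_ex(addr)
    if err == 0:
        return sock, True
    if err == errno.EINPROGRESS:
        return sock, False
    sock.close()  # e.g. ENETUNREACH or ECONNREFUSED
    return None


def wait_online(hosts, timeout=30.0, poll_interval=0.1):
    """Return EXIT_ONLINE as soon as any (host, port) in `hosts` accepts a
    TCP connection, or EXIT_TIMEOUT once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    with selectors.DefaultSelector() as sel:
        try:
            while time.monotonic() < deadline:
                pass_end = time.monotonic() + poll_interval
                # Hosts that still have a connect in flight are not retried.
                pending = {key.data for key in sel.get_map().values()}
                for host, port in hosts:
                    if (host, port) in pending:
                        continue
                    try:
                        # getaddrinfo() stands in for the obsolete
                        # gethostbyname(); it fails while DNS is unreachable.
                        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
                    except OSError:
                        continue
                    for family, type_, proto, _, addr in infos:
                        started = _start_connect(family, type_, proto, addr)
                        if started is None:
                            continue
                        sock, connected = started
                        if connected:
                            sock.close()
                            return EXIT_ONLINE
                        sel.register(sock, selectors.EVENT_WRITE, data=(host, port))
                if sel.get_map():
                    # The poll() step: a pending connect becomes writable once
                    # it has succeeded or failed; SO_ERROR says which.
                    for key, _ in sel.select(poll_interval):
                        sock = key.fileobj
                        sel.unregister(sock)
                        ok = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0
                        sock.close()
                        if ok:
                            return EXIT_ONLINE
                # Pace the passes so failing lookups/connects do not busy-loop.
                remaining = pass_end - time.monotonic()
                if remaining > 0:
                    time.sleep(remaining)
        finally:
            # Close connects that were still in flight when we stopped.
            for key in list(sel.get_map().values()):
                key.fileobj.close()
    return EXIT_TIMEOUT
```

In a wait-online command, the returned value would become the process exit status (for example via sys.exit()). That exit status is what RestartForceExitStatus=42 below reacts to.

The sketch differs from the pseudocode in a few ways:

* It polls once per pass rather than after each entry.
* It pads each pass to at least 100 ms, which covers the "sleep somewhere" concern.
* It does not open a second connection to a host whose first attempt is still pending, so sockets do not pile up over the timeout window.
* getaddrinfo() itself blocks, so a slow resolver can stretch a pass well beyond 100 ms.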

On the systemd service side, we add:
  ExecStartPre=/usr/lib/apt/apt-helper wait-online
  RestartForceExitStatus=42
  RestartSec=15m

This restarts the service after 15 minutes when wait-online times out (exit status 42).

[Test case]
* Start apt-daily.service after turning off the network -> it should wait
  (in ExecStartPre).
* Turn on the network -> apt-daily.service should start.

[Regression potential]
There might be increased I/O activity after resume on systems where
apt-daily did not manage to run at that point before.

** Affects: apt (Ubuntu)
     Importance: High
     Assignee: Julian Andres Klode (juliank)
         Status: Triaged

** Changed in: apt (Ubuntu)
     Assignee: (unassigned) => Julian Andres Klode (juliank)

** Description changed:

  [Impact]
  
  apt-daily.service is launched by a timer that depends on network-
  online.target (after the fixes for 1686470 are in everywhere)
  
  At boot that is mostly sufficient for it to have network online, but it
  does not seem to work all the time, and we might be disagreeing with
  network-manager and friends what online state means.
  
  At resume time, network-online.target is still active, so the service is
  started as soon as possible when it tries to catch up. Depending on the
  timing, the network connectivity might not be there yet, and it will
  fail and only retry 12 hours later.
  
  [Proposed solution]
  Introduce a new apt-helper wait-online that tries to connect() to remote
  hosts specified in sources.list until one connection works or a TIMEOUT is
  reached. The proposed algorithm looks something like this:
  
  while (time elapsed < TIMEOUT):
-   for each entry:
-     host = gethostbyname()
-     if host failed:
-       continue
-     fd = connect to it
-     if fd is invalid:
-       continue
+   for each entry:
+     host = gethostbyname()
+     if host failed:
+       continue
+     fd = connect to it
+     if fd is invalid:
+       continue
  
-     all fds += fd
-   
-     if poll(all fds, 100 ms timeout) finds a connected one:
-       exit(0)
+     all fds += fd
+ 
+     if poll(all fds, 100 ms timeout) finds a connected one:
+       exit(0)
  
  exit(42) # timeout
  
  There are two things to consider:
  * gethostbyname() and connect() may fail if network is not up yet, so we need
  to retry (we might need to sleep somewhere)
  * If poll() fails, we likely sleep enough, so no extra sleep needed.
  
  I believe the time out should be something like 30s.
  
  On the systemd service side, we add:
-   RestartForceExitStatus=42
-   RestartSec=15m
+   ExecStartPre=/usr/lib/apt/apt-helper wait-online
+   RestartForceExitStatus=42
+   RestartSec=15m
  
  To retry the service after 15 minutes.
+ 
+ [Test case]
+ * Start apt-daily.service after turning off network -> It should wait (in
+ ExecStartPre)
+ * Turn on network -> apt-daily.service should start
+ 
+ [Regression potential]
+ There might be increased I/O activity after resume, if that did not work
+ before.

** Changed in: apt (Ubuntu)
       Status: New => Triaged

** Changed in: apt (Ubuntu)
   Importance: Undecided => High

** Description changed:

  [Impact]
  
  apt-daily.service is launched by a timer that depends on network-
- online.target (after the fixes for 1686470 are in everywhere)
+ online.target (after the fixes for bug 1686470 are in everywhere)
  
  At boot that is mostly sufficient for it to have network online, but it
  does not seem to work all the time, and we might be disagreeing with
  network-manager and friends what online state means.
  
  At resume time, network-online.target is still active, so the service is
  started as soon as possible when it tries to catch up. Depending on the
  timing, the network connectivity might not be there yet, and it will
  fail and only retry 12 hours later.
  
  [Proposed solution]
  Introduce a new apt-helper wait-online that tries to connect() to remote
  hosts specified in sources.list until one connection works or a TIMEOUT is
  reached. The proposed algorithm looks something like this:
  
  while (time elapsed < TIMEOUT):
    for each entry:
      host = gethostbyname()
      if host failed:
        continue
      fd = connect to it
      if fd is invalid:
        continue
  
      all fds += fd
  
      if poll(all fds, 100 ms timeout) finds a connected one:
        exit(0)
  
  exit(42) # timeout
  
  There are two things to consider:
  * gethostbyname() and connect() may fail if network is not up yet, so we need
  to retry (we might need to sleep somewhere)
  * If poll() fails, we likely sleep enough, so no extra sleep needed.
  
  I believe the time out should be something like 30s.
  
  On the systemd service side, we add:
-   ExecStartPre=/usr/lib/apt/apt-helper wait-online
+   ExecStartPre=/usr/lib/apt/apt-helper wait-online
    RestartForceExitStatus=42
    RestartSec=15m
  
  To retry the service after 15 minutes.
  
  [Test case]
  * Start apt-daily.service after turning off network -> It should wait (in
  ExecStartPre)
  * Turn on network -> apt-daily.service should start
  
  [Regression potential]
  There might be increased I/O activity after resume, if that did not work
  before.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1699850

Title:
  Reliable network connectivity for apt-daily

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1699850/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
