On Thu, Aug 1, 2019 at 10:15 AM Andrea Righi <andrea.ri...@canonical.com>
wrote:

> Thanks Ryan, this is very interesting:
>
> [ 259.411486] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.537070] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.797830] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
> [ 259.900392] bcache: register_bcache() error /dev/vdg: device already
> registered (emitting change event)
>
> It looks like we're trying to register /dev/vdg multiple times as a
> backing device (make-bcache -B). I'm not getting this message during my
> tests, so that might be required to reproduce that particular deadlock.
>

We carry a specific sauce patch to ensure that if the cache set is
already online and a backing device shows up later, the kernel emits a
change event so that the udev rules generate the symlink under
/dev/bcache/by-uuid.  I don't think the patch we carry is at issue,
since it only detects the re-register scenario and emits a change
uevent:

https://www.spinics.net/lists/linux-bcache/msg05833.html

We may want to resubmit that now to see if they'll take it, or whether
they would prefer to deal with the scenario in a cleaner way.
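
To check whether a test run actually hits the re-register path, the
kernel log can be scanned for the message quoted above. This is only a
rough sketch: the helper name count_reregister_errors is hypothetical,
and it assumes each message sits on one line of dmesg-style text, as in
the quoted log.

```python
import re
from collections import Counter

# Matches the message quoted earlier in this thread, e.g.
# "bcache: register_bcache() error /dev/vdg: device already registered"
REREGISTER_RE = re.compile(
    r"bcache: register_bcache\(\) error (\S+): device already\s+registered"
)


def count_reregister_errors(log_text):
    """Return a Counter mapping device path to the number of
    'device already registered' messages found in log_text.

    Hypothetical helper for illustration; not part of curtin or the kernel.
    """
    return Counter(m.group(1) for m in REREGISTER_RE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "[ 259.411486] bcache: register_bcache() error /dev/vdg: "
        "device already registered (emitting change event)\n"
        "[ 259.537070] bcache: register_bcache() error /dev/vdg: "
        "device already registered (emitting change event)\n"
    )
    for device, count in count_reregister_errors(sample).items():
        print(device, count)
```

A count above one for the same device within a short window would
indicate the repeated registration seen in the log above.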


>
> I'll modify my test case to trigger these errors and see if I can
> reproduce the hung task timeout issue.
>

I can provide you a setup to reproduce this.  I'll put together a doc.


> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1796292
>
> Title:
>   Tight timeout for bcache removal causes spurious failures
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/curtin/+bug/1796292/+subscriptions
>

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1796292

Title:
  Tight timeout for bcache removal causes spurious failures

Status in curtin:
  Fix Released
Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed

Bug description:
  I've had a number of deployment faults where curtin would report
  Timeout exceeded for removal of /sys/fs/bcache/xxx when doing a mass-
  deployment of 30+ nodes. Upon retrying the node would usually deploy
  fine. Experimentally I've set the timeout ridiculously high, and it
  seems I'm getting no faults with this. I'm wondering if the timeout
  for removal is set too tight, or might need to be made configurable.

  --- curtin/util.py~     2018-05-18 18:40:48.000000000 +0000
  +++ curtin/util.py      2018-10-05 09:40:06.807390367 +0000
  @@ -263,7 +263,7 @@
       return _subp(*args, **kwargs)
   
   
  -def wait_for_removal(path, retries=[1, 3, 5, 7]):
  +def wait_for_removal(path, retries=[1, 3, 5, 7, 1200, 1200]):
       if not path:
           raise ValueError('wait_for_removal: missing path parameter')
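
For illustration, a retry list like the one patched above could drive a
polling loop as follows. This is a minimal sketch and not curtin's
actual implementation: the name wait_for_removal_sketch and the
injectable exists/sleep parameters are assumptions, added so the wait
schedule can be overridden and tested without really sleeping.

```python
import os
import time


def wait_for_removal_sketch(path, retries=(1, 3, 5, 7),
                            exists=os.path.exists, sleep=time.sleep):
    """Poll until path no longer exists.

    retries is a sequence of sleep intervals in seconds; callers can pass
    their own sequence to make the timeout configurable. Raises OSError
    if the path still exists after every interval has been used.
    """
    if not path:
        raise ValueError('wait_for_removal: missing path parameter')

    for wait in retries:
        if not exists(path):
            return
        sleep(wait)

    # One last check after the final sleep before giving up.
    if not exists(path):
        return
    raise OSError('Timeout exceeded for removal of %s' % path)
```

In this sketch the worst-case wait is the sum of the intervals: 16
seconds for [1, 3, 5, 7], and 2416 seconds for the patched list
[1, 3, 5, 7, 1200, 1200].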
