Hi Linus,

Here is a collection of fixes for afs_cell struct refcounting, thereby
fixing a slew of related syzbot bugs:

 (1) Fix the cell tree in the netns to use an rwsem rather than RCU.

     There seem to be some problems deriving from the use of RCU and a
     seqlock to walk the rbtree, but it's not entirely clear what they are
     since there are several different failures being seen.

     Changing things to use an rwsem instead makes it more robust.  The
     extra performance derived from using RCU isn't necessary in this case
     since the only time we're looking up a cell is during mount or when
     cells are being manually added.

 (2) Fix the refcounting by splitting the usage counter into a memory
     refcount and an active users counter.  The usage counter was doing
     double duty, keeping track of whether a cell is still in use and
     keeping track of when it needs to be destroyed - but this makes the
     cleanup tricky.  Separating these out simplifies the logic.

 (3) Fix purging a cell that has an alias.  A cell alias pins the cell it's
     an alias of, but the alias is always later in the list.  Trying to
     purge in a single pass causes rmmod to hang in such a case.

 (4) Fix cell removal.  If a cell's manager is requeued whilst it's
     removing itself, the manager will run again and re-remove itself,
     causing problems in various places.  Follow Hillf Danton's suggestion
     to insert a more terminal state that causes the manager to do nothing
     post-removal.

In addition to the above, I've included two more patches:

 (1) Add a tracepoint for the cell refcount and active users count.  This
     helped with debugging the above and may be useful again in future.

 (2) Downgrade an assertion to a print when a still-active server is seen
     during purging.  This was happening as a consequence of incomplete
     cell removal before the servers were cleaned up.

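For illustration, the counter split described in fix (2) can be sketched in
userspace C roughly as below.  This is not taken from the patches: the
struct and every function name (cell_sketch, cell_use(), cell_unuse() and
so on) are hypothetical, and C11 atomics stand in for the kernel's own
primitives.  The point is only that one counter pins the memory whilst the
other tracks whether anyone is still using the cell.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cell record; not the real struct afs_cell. */
struct cell_sketch {
	atomic_int ref;		/* Memory refcount: freed when this reaches 0 */
	atomic_int active;	/* Active users: teardown may begin at 0 */
};

/* Allocate a cell holding one memory ref and no active users. */
struct cell_sketch *cell_alloc(void)
{
	struct cell_sketch *cell = calloc(1, sizeof(*cell));

	if (!cell)
		return NULL;
	atomic_init(&cell->ref, 1);
	atomic_init(&cell->active, 0);
	return cell;
}

/* Pin the memory only; says nothing about whether the cell is in use. */
void cell_get(struct cell_sketch *cell)
{
	atomic_fetch_add(&cell->ref, 1);
}

/* Drop a memory ref.  Returns true if this call freed the cell. */
bool cell_put(struct cell_sketch *cell)
{
	if (atomic_fetch_sub(&cell->ref, 1) == 1) {
		/* Nobody may still be using a cell that is being freed. */
		assert(atomic_load(&cell->active) == 0);
		free(cell);
		return true;
	}
	return false;
}

/* Begin using the cell: pin the memory and count an active user. */
void cell_use(struct cell_sketch *cell)
{
	cell_get(cell);
	atomic_fetch_add(&cell->active, 1);
}

/*
 * Stop using the cell.  Returns true if this was the last active user, at
 * which point a manager could be scheduled to start removal.  The caller
 * is assumed to hold another memory ref if it inspects the cell afterwards.
 */
bool cell_unuse(struct cell_sketch *cell)
{
	int prev = atomic_fetch_sub(&cell->active, 1);
	bool last = (prev == 1);

	assert(prev > 0);
	cell_put(cell);
	return last;
}
```

With the two counters separate, "is anyone still using this?" and "may the
memory be released?" become independent questions, which is the
simplification that fix (2) is after.
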
David
---
The following changes since commit bbf5c979011a099af5dc76498918ed7df445635b:

  Linux 5.9 (2020-10-11 14:15:50 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-fixes-20201016

for you to fetch changes up to 7530d3eb3dcf1a30750e8e7f1f88b782b96b72b8:

  afs: Don't assert on unpurgeable server records (2020-10-16 14:39:34 +0100)

----------------------------------------------------------------
afs fixes

----------------------------------------------------------------
David Howells (6):
      afs: Fix rapid cell addition/removal by not using RCU on cells tree
      afs: Fix cell refcounting by splitting the usage counter
      afs: Fix cell purging with aliases
      afs: Fix cell removal
      afs: Add tracing for cell refcount and active user count
      afs: Don't assert on unpurgeable server records

 fs/afs/cell.c              | 328 +++++++++++++++++++++++++++++----------------
 fs/afs/dynroot.c           |  23 ++--
 fs/afs/internal.h          |  20 ++-
 fs/afs/main.c              |   2 +-
 fs/afs/mntpt.c             |   4 +-
 fs/afs/proc.c              |  23 ++--
 fs/afs/server.c            |   7 +-
 fs/afs/super.c             |  18 +--
 fs/afs/vl_alias.c          |   8 +-
 fs/afs/vl_rotate.c         |   2 +-
 fs/afs/volume.c            |   6 +-
 include/trace/events/afs.h | 109 +++++++++++++++
 12 files changed, 378 insertions(+), 172 deletions(-)