Bug#770479: [PATCH] Re: nbd: Fix timeout detection

Hermann Lauer Wed, 14 Oct 2015 03:39:54 -0700

Hello Ben,

On Mon, Sep 28, 2015 at 01:29:23AM +0100, Ben Hutchings wrote:
> > There has been no unusual messages in the kernel log during a disconnection 
> > which
> > happened (server process restart during update when Wouter fixed the 
> > nbd-server
> > access code).
> > 
> > If more testing or something else is needed, please tell me.
> 
> After reviewing the patch, I'm afraid I can't accept it.  Hopefully
> upstream will come up with a proper fix in response to my comments.


based on Markus fix for 4.3 appended is a backported fix to be applied on top.
Tested again with the current jessie kernel without problems as written above.

Please review and tell me if I can do more.
Thanks a lot,
  Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: hermann.la...@iwr.uni-heidelberg.de

Backport for jessie kernel, based on:
The timeout handling introduced in
	7e2893a16d3e (nbd: Fix timeout detection)
introduces a race condition which may lead to killing of tasks that are
not in nbd context anymore. This was not observed or reproducable yet.

This patch adds locking to critical use of task_recv and task_send to
avoid killing tasks that already left the NBD thread functions. This
lock is only acquired if a timeout occures or the nbd device
starts/stops.

Reported-by: Ben Hutchings <b...@decadent.org.uk>
Tested-by: Hermann Lauer <hermann.la...@iwr.uni-heidelberg.de>
---
 drivers/block/nbd.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -142,21 +142,23 @@
 static void nbd_xmit_timeout(unsigned long arg)
 {
 	struct nbd_device *nbd = (struct nbd_device *)arg;
-	struct task_struct *task;
+	unsigned long flags;
 
 	if (list_empty(&nbd->queue_head))
 		return;
 
 	nbd->disconnect = 1;
 
-	task = READ_ONCE(nbd->task_recv);
-	if (task)
-		force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 
-	task = READ_ONCE(nbd->task_send);
-	if (task)
+	if (nbd->task_recv)
+		force_sig(SIGKILL, nbd->task_recv);
+	
+	if (nbd->task_send)
 		force_sig(SIGKILL, nbd->task_send);
 
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
+
 	dev_err(disk_to_dev(nbd->disk), "Connection timed out, killed receiver and sender, shutting down connection\n");
 }
 
@@ -408,6 +410,7 @@
 {
 	struct request *req;
 	int ret;
+	unsigned long flags;
 
 	BUG_ON(nbd->magic != NBD_MAGIC);
 
@@ -425,7 +428,9 @@
 	while ((req = nbd_read_stat(nbd)) != NULL)
 		nbd_end_request(req);
 
+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_recv = NULL;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
 
 	if (signal_pending(current)) {
 		siginfo_t info;
@@ -541,8 +546,11 @@
 {
 	struct nbd_device *nbd = data;
 	struct request *req;
+	unsigned long flags;
 
+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_send = current;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
 
 	set_user_nice(current, MIN_NICE);
 	while (!kthread_should_stop() || !list_empty(&nbd->waiting_queue)) {
@@ -577,7 +585,15 @@
 		nbd_handle_req(nbd, req);
 	}
 
+	spin_lock_irqsave(&nbd->tasks_lock, flags);
 	nbd->task_send = NULL;
+	spin_unlock_irqrestore(&nbd->tasks_lock, flags);
+
+	/* Clear maybe pending signals */
+	if (signal_pending(current)) {
+		siginfo_t info;
+		dequeue_signal_lock(current, &current->blocked, &info);
+	}
 
 	return 0;
 }
@@ -902,6 +918,7 @@
 		nbd_dev[i].magic = NBD_MAGIC;
 		INIT_LIST_HEAD(&nbd_dev[i].waiting_queue);
 		spin_lock_init(&nbd_dev[i].queue_lock);
+		spin_lock_init(&nbd_dev[i].tasks_lock);
 		INIT_LIST_HEAD(&nbd_dev[i].queue_head);
 		mutex_init(&nbd_dev[i].tx_lock);
 		init_timer(&nbd_dev[i].timeout_timer);
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -43,6 +43,7 @@
 	int disconnect; /* a disconnect has been requested by user */
 
 	struct timer_list timeout_timer;
+	spinlock_t tasks_lock;
 	struct task_struct *task_recv;
 	struct task_struct *task_send;
 };

Bug#770479: [PATCH] Re: nbd: Fix timeout detection

Reply via email to