IOPS on null_blk improves considerably when one hw queue is
created per NUMA node, so this patch applies the recently
introduced .host_tagset to hpsa to improve performance.

In practice, .can_queue is quite large and the number of NUMA
nodes is usually small, so each hw queue's depth should still be
high enough to saturate the device.
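
The queue-count arithmetic used by this patch can be sketched in
user space as follows. This is only an illustration: hw_queues()
is a hypothetical helper, the DIV_ROUND_UP() macro is a local
stand-in for the kernel one, and the input values in the note
below are made-up examples, not measured hpsa values.

```c
#include <assert.h>

/* Local stand-in for the kernel's DIV_ROUND_UP() (assumption: user space). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the computation in hpsa_scsi_add_host():
 * allow at most one hw queue per 256 tags of can_queue, and never more
 * hw queues than there are NUMA nodes.
 */
static int hw_queues(int can_queue, int nr_nodes)
{
	int max_queues;

	assert(can_queue > 0 && nr_nodes > 0);

	max_queues = DIV_ROUND_UP(can_queue, 256);

	/* Equivalent of min_t(int, nr_node_ids, max_queues). */
	return nr_nodes < max_queues ? nr_nodes : max_queues;
}
```

For example, hw_queues(1024, 2) yields 2 (limited by the node
count), hw_queues(1024, 8) yields 4 (limited by can_queue), and
hw_queues(200, 4) yields 1.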

Cc: Arun Easi <arun.e...@cavium.com>
Cc: Omar Sandoval <osan...@fb.com>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: James Bottomley <james.bottom...@hansenpartnership.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Don Brace <don.br...@microsemi.com>
Cc: Kashyap Desai <kashyap.de...@broadcom.com>
Cc: Peter Rivera <peter.riv...@broadcom.com>
Cc: Laurence Oberman <lober...@redhat.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: Mike Snitzer <snit...@redhat.com>
Signed-off-by: Ming Lei <ming....@redhat.com>
---
 drivers/scsi/hpsa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 3a9eca163db8..0747751b7e1c 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -978,6 +978,7 @@ static struct scsi_host_template hpsa_driver_template = {
        .shost_attrs = hpsa_shost_attrs,
        .max_sectors = 1024,
        .no_write_same = 1,
+       .host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -5761,6 +5762,11 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 static int hpsa_scsi_add_host(struct ctlr_info *h)
 {
        int rv;
+       /* 256 tags should be high enough to saturate device */
+       int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
+
+       /* per NUMA node hw queue */
+       h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
 
        rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
        if (rv) {
-- 
2.9.5
