On Sun, 2019-04-28 at 15:39 +0800, Ming Lei wrote:
> Now scsi_mq_setup_tags() pre-allocates a big buffer for the I/O sg list,
> and the buffer size is scsi_mq_sgl_size(), which depends on the smaller
> of shost->sg_tablesize and SG_CHUNK_SIZE.
>
> The DMA engines of modern HBAs can often handle a very large number of
> segments, so scsi_mq_sgl_size() is often big. If the maximum of
> SG_CHUNK_SIZE entries is used, scsi_mq_sgl_size() will be 4KB.
>
> If an HBA has many hw queues and each hw queue is deep, the sg list
> pre-allocation can consume a huge amount of memory. For example, with
> lpfc, nr_hw_queues can be 70 and each queue's depth can be 3781, so the
> pre-allocation for the data sg list is 70 * 3781 * 2KB = 517MB for a
> single HBA.
>
> There is an internal Red Hat report that scsi_debug-based tests can no
> longer be run since the legacy I/O path was removed, because the
> pre-allocation is too big.
>
> So switch to runtime allocation of the sg list, while pre-allocating 2
> inline sg entries. This approach has been used by NVMe PCI for a while,
> so it should be fine for SCSI too. Runtime sg entry allocation was also
> always used, and therefore well verified, in the original legacy I/O
> path.
>
> I did not see any performance effect in my large block size tests on
> scsi_debug.

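For readers checking the numbers and the approach, here is a rough user-space C sketch, not the kernel code. Part 1 reproduces the pre-allocation arithmetic, assuming a 32-byte struct scatterlist (typical for a 64-bit build) and SG_CHUNK_SIZE of 128, which gives the 4KB per-command figure quoted above. Part 2 illustrates "a couple of inline entries, allocate the rest at runtime" using hypothetical types (struct sg_entry, struct cmd) and helpers (cmd_alloc_sgl(), cmd_free_sgl()); as I understand it, the real SCSI path does this through the chained SG helpers such as sg_alloc_table_chained(), which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Part 1: pre-allocation estimate.
 * Assumptions: 32-byte struct scatterlist (typical 64-bit build) and
 * SG_CHUNK_SIZE == 128; both are illustrative, not read from a kernel. */
#define SG_ENTRY_BYTES 32u
#define SG_CHUNK_SIZE  128u

/* Bytes reserved per command: min(sg_tablesize, SG_CHUNK_SIZE) entries. */
static size_t sgl_bytes(unsigned int sg_tablesize)
{
	unsigned int n = sg_tablesize < SG_CHUNK_SIZE ? sg_tablesize : SG_CHUNK_SIZE;

	return (size_t)n * SG_ENTRY_BYTES;
}

/* Bytes reserved up front when every tag of every hw queue owns a full SGL. */
static uint64_t prealloc_bytes(unsigned int nr_hw_queues,
			       unsigned int queue_depth, size_t per_cmd)
{
	return (uint64_t)nr_hw_queues * queue_depth * per_cmd;
}

/* Part 2: inline entries with runtime fallback (hypothetical types). */
#define NR_INLINE_SG 2u		/* the quoted patch keeps 2 inline entries */

struct sg_entry {		/* hypothetical stand-in for struct scatterlist */
	void *addr;
	unsigned int len;
};

struct cmd {			/* hypothetical per-command private data */
	struct sg_entry inline_sg[NR_INLINE_SG];
	struct sg_entry *sgl;	/* points at inline_sg or a runtime allocation */
	unsigned int nents;
};

/* Returns 0 on success, -1 if the runtime allocation fails. */
static int cmd_alloc_sgl(struct cmd *c, unsigned int nents)
{
	assert(nents > 0);
	if (nents <= NR_INLINE_SG) {
		c->sgl = c->inline_sg;	/* small I/O: no allocation at all */
	} else {
		/* large I/O: allocate only for this command, only while it runs */
		c->sgl = calloc(nents, sizeof(*c->sgl));
		if (!c->sgl)
			return -1;
	}
	c->nents = nents;
	return 0;
}

/* Frees only a runtime allocation; the inline array lives inside struct cmd. */
static void cmd_free_sgl(struct cmd *c)
{
	if (c->sgl != c->inline_sg)
		free(c->sgl);	/* free(NULL) is a no-op */
	c->sgl = NULL;
	c->nents = 0;
}
```

With the lpfc figures from the quoted text, prealloc_bytes(70, 3781, 2048) yields 542,044,160 bytes, about 517 MiB. In the runtime scheme, the per-command fixed cost drops to the two inline entries, and larger lists cost memory only while a command is in flight.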
Reviewed-by: Bart Van Assche <bvanass...@acm.org>