On 2019/06/21 0:25, Bart Van Assche wrote:
> On 6/19/19 8:48 PM, Damien Le Moal wrote:
>> + /*
>> + * Limit the command buffer size to the arbitrary SD_ZBC_REPORT_SIZE
>> + * size (1MB), allowing up to 16383 zone descriptors being reported with
>> + * a single command. And make sure that this size does not exceed the
>> + * hardware capabilities. To avoid disk revalidation failures due to
>> + * memory allocation errors, retry the allocation with a smaller buffer
>> + * size if the allocation fails.
>> + */
>> + bufsize = min_t(size_t, *buflen, SD_ZBC_REPORT_SIZE);
>> + bufsize = min_t(size_t, bufsize,
>> + queue_max_hw_sectors(disk->queue) << 9);
>> + for (order = get_order(bufsize); order >= 0; order--) {
>> + page = alloc_pages(gfp_mask, order);
>> + if (page) {
>> + *buflen = PAGE_SIZE << order;
>> + return page_address(page);
>> + }
>> + }
>
> Hi Damien,
>
> As you know Linux memory fragmentation tends to increase over time. The
> above code has the very unfortunate property that the more memory is
> fragmented the smaller the allocated buffer will become. I don't think
> that's how kernel code should work. Have you considered to use vmalloc()
> + blk_rq_map_sg() instead? See also efa_vmalloc_buf_to_sg() for an
> example of how to build a scatterlist for memory allocated by vmalloc().
Bart,
I have a fix along the lines you suggested, but since it modifies
bio_rq_map_kern(), I would rather not push that as a bug fix this late in RC.
Would you agree to accept the fix patch as is for now and I will send the more
complete fix for 5.3 ? Note that this more complete fix also reworks the similar
memory allocation for the struct blk_zone array used for zone revalidation. Put
all together, the use of report zones uses only vmalloc-ed buffers and data
structures, reduces pressure on the memory system and reducing chances of
failures.
Best regards.
--
Damien Le Moal
Western Digital Research