Cc linux...@kvack.org

W dniu 25.04.2019 o 11:25, Lech Perczak pisze:
> Hello,
>
> Some time ago, after upgrading the Kernel on our i.MX6Q-based boards to 
> mainline 4.18, and now to LTS 4.19 line, during stress tests we started 
> noticing strange warnings coming from 'read' syscall, when page_copy_sane() 
> check failed. Typical reproducibility is up to ~4 events per 24h. Warnings 
> origin from different processes, mostly involved with the stress tests, but 
> not necessarily with block devices we're stressing. If the warning appeared 
> in process relating to block device stress test, it would be accompanied by 
> corrupted data, as the read operation gets aborted. 
> Our reason for dropping caches is to ensure that the actual block device gets 
> accessed during the test without resorting to O_DIRECT.
> When dropping caches was disabled in the tests, the issue would also 
> disappear, at least in a single-week run.
>
> Example processes caught in the last run (grepped from stacktraces):
> [38128.430694] CPU: 2 PID: 32735 Comm: glmark2 Not tainted 
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [44168.081821] CPU: 2 PID: 21555 Comm: cat Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [75879.424076] CPU: 3 PID: 1478 Comm: flashcp Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [117060.951333] CPU: 1 PID: 14150 Comm: cat Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [201668.438219] CPU: 1 PID: 14370 Comm: glmark2 Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [214332.212960] CPU: 0 PID: 25633 Comm: cp Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [283484.619058] CPU: 0 PID: 5091 Comm: glmark2 Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [330256.709124] CPU: 2 PID: 4565 Comm: sensors Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
>
> When I started debugging the issue, I noticed that in all cases we're dealing 
> with highmem zero-order pages. In this case, page_head(page) == page, so 
> page_address(page) should be equal to page_address(head).
> However, it isn't the case, as page_address(head) in each case returns zero, 
> causing the value of "v" to explode, and the check to fail.
>
> At first I thought, that it could be caused by modification of struct page 
> itself, but it turned out to not be the case - sometimes struct page would be 
> unmodified during duration of the check. With lowmem pages, this wouldn't 
> also happen, because page_address is derived directly from struct page 
> pointer itself.
>
> Finally, after gathering a few traces with added debug logs I decided to 
> trigger a panic on the warning and capture a vmcore.
> When analyzing the vmcore, I confirmed that the struct page_address_slot 
> corresponding to the struct page had an empty list of mappings, explaining 
> NULL returned from page_address(head) in page_copy_sane().
> Yeah, I had to manually calculate hash of struct page pointer ;)
>
> In the meantime I noticed a patch by Eric Dumazet ("iov_iter: optimize 
> page_copy_sane()") [1]. Applying this patch silenced the warnings, at least 
> for test run going for over a week. This is quite expected, as with this 
> change 'page' isn't dereferenced at all, nor is page_address() called. Here I 
> suspect that applying this patch may only hide a real issue I'm facing, and 
> might be missing the case of highmem pages as well.
>
> An example stacktrace I captured, with added debug logs and copy of struct 
> page before and after the check in page_copy_sane:
>
> This last one shows no changes to struct page over the call to 
> page_copy_sane(), where first call to page_address(page) returns a valid page 
> address (page_address=2146996224) and second call returns 0 (head_address=0). 
> Subsequent calls to page_address inside WARN() argument list also return 0. 
> The same was true for the occurence I captured vmcore for.
>
> (For your reference, a patch containing my debug prints is at the end of the 
> message)
>
> [337685.344204] ------------[ cut here ]------------
> [337685.356870] WARNING: CPU: 0 PID: 30132 at 
> /mnt/work/git/platforms-manifest/build_devboardimx6q/tmp/work-shared/devboardimx6q/kernel-source/lib/iov_iter.c:834
>  page_copy_sane+0x13c/0x178
> [337685.420706] page_copy_sane: page=7721123e, offset=0, n=4096, 
> v=2147000320, head=7721123e, page_address=2146996224, head_address=0, 
> compound_order(head)=0, page_address(page)=0, page_address(head)=0 
> page_address(page_copy)=0, flags(page)=referenced|uptodate|lru|active|arch_1, 
> flags(page_copy)=referenced|uptodate|lru|active|arch_1
> [337685.494736] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 nf_nat 
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables 
> usb_f_mass_storage usb_f_rndis u_ether ci_hdrc_imx ci_hdrc usbmisc_imx ulpi 
> libcomposite configfs udc_core
> [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G        W         
> 4.19.32-devboardimx6q+g0a64e3717985 #1
> [337685.558314] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [337685.563600] [<8010fe44>] (unwind_backtrace) from [<8010b840>] 
> (show_stack+0x10/0x14)
> [337685.570081] [<8010b840>] (show_stack) from [<807a5958>] 
> (dump_stack+0x88/0x9c)
> [337685.576033] [<807a5958>] (dump_stack) from [<8011f500>] 
> (__warn+0xfc/0x114)
> [337685.581721] [<8011f500>] (__warn) from [<8011f560>] 
> (warn_slowpath_fmt+0x48/0x6c)
> [337685.587936] [<8011f560>] (warn_slowpath_fmt) from [<803a0664>] 
> (page_copy_sane+0x13c/0x178)
> [337685.595004] [<803a0664>] (page_copy_sane) from [<803a2fe4>] 
> (copy_page_to_iter+0x18/0x474)
> [337685.602020] [<803a2fe4>] (copy_page_to_iter) from [<801bd9d4>] 
> (generic_file_read_iter+0x2cc/0x980)
> [337685.609794] [<801bd9d4>] (generic_file_read_iter) from [<8020949c>] 
> (__vfs_read+0xf8/0x158)
> [337685.616866] [<8020949c>] (__vfs_read) from [<80209588>] 
> (vfs_read+0x8c/0x118)
> [337685.622717] [<80209588>] (vfs_read) from [<80209aac>] 
> (ksys_read+0x4c/0xac)
> [337685.628396] [<80209aac>] (ksys_read) from [<80101000>] 
> (ret_fast_syscall+0x0/0x54)
> [337685.634668] Exception stack(0xa4e79fa8 to 0xa4e79ff0)
> [337685.638427] 9fa0:                   00000008 00000000 00000003 7ed03208 
> 00000200 00000000
> [337685.645318] 9fc0: 00000008 00000000 76fe8958 00000003 00000000 7ed03208 
> 76fe5640 7ed031b4
> [337685.652205] 9fe0: 76fe7cf0 7ed0316c 76fbeee0 76fd1b1c
> [337685.717080] ---[ end trace 660d072e57b3d168 ]---
> [337685.720698] page_copy 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 
> 9b ed  l..@........\...
> [337685.720713] page_copy 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 
> 00 00  ................
> [337685.720721] *page     00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 
> 9b ed  l..@........\...
> [337685.720728] *page     00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 
> 00 00  ................
>
> The scenario with dropping caches via /proc/sys/vm/drop_caches seems to be 
> rarely used (and tested), however I suspect that during dropping caches, some 
> pages might be unmapped too early causing other hard-to notice problems.
>
> As my findings are quite worrisome, I'd be glad to hear if my analysis is 
> valid, and if optimization made by Eric can be considered 'the fix', or maybe 
> it is too optimistic.
> Also as I'd like to investigate the issue further, I'd be glad for any hints 
> on where to continue.
>
> [1] https://lkml.org/lkml/2019/2/26/686, merged as 6daef95b8c91 in master
>
> Patch containing debug logs:
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 8be175df3075..ded7e073434c 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -815,14 +815,25 @@ bool _copy_from_iter_full_nocache(void *addr, size_t 
> bytes, struct iov_iter *i)
>  }
>  EXPORT_SYMBOL(_copy_from_iter_full_nocache);
>  
> -static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
> +static noinline bool page_copy_sane(struct page *page, size_t offset, size_t 
> n)
>  {
> +       struct page page_copy = *page;
>         struct page *head = compound_head(page);
> -       size_t v = n + offset + page_address(page) - page_address(head);
> +       size_t page_addr = (size_t) page_address(page);
> +       size_t head_addr = (size_t) page_address(head);
> +       size_t v = n + offset + page_addr - head_addr;
> +       unsigned int order = compound_order(head);
>  
> -       if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
> +       if (likely(n <= v && v <= (PAGE_SIZE << order)))
>                 return true;
> -       WARN_ON(1);
> +       WARN(1, "%s: page=%p, offset=%zu, n=%zu, v=%zu, head=%p, 
> page_address=%zu,"
> +               " head_address=%zu, compound_order(head)=%u, 
> page_address(page)=%zu,"
> +               " page_address(head)=%zu page_address(page_copy)=%zu, 
> flags(page)=%pGp, flags(page_copy)=%pGp\n",
> +            __func__, page, offset, n, v, head, page_addr, head_addr, order,
> +           (size_t) page_address(page), (size_t) page_address(head),
> +           (size_t) page_address(&page_copy), &page->flags, 
> &page_copy.flags);
> +       print_hex_dump_bytes("page_copy ", DUMP_PREFIX_OFFSET, &page_copy, 
> sizeof(page_copy));
> +       print_hex_dump_bytes("*page     ", DUMP_PREFIX_OFFSET, page, 
> sizeof(*page));
>         return false;
>  }
>
> -- Pozdrawiam/With kind regards, Lech Perczak Sr. Software Engineer Camlin 
> Technologies Poland Limited Sp. z o.o.
>
-- 
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.

Reply via email to