Cc linux...@kvack.org W dniu 25.04.2019 o 11:25, Lech Perczak pisze: > Hello, > > Some time ago, after upgrading the Kernel on our i.MX6Q-based boards to > mainline 4.18, and now to LTS 4.19 line, during stress tests we started > noticing strange warnings coming from 'read' syscall, when page_copy_sane() > check failed. Typical reproducibility is up to ~4 events per 24h. Warnings > origin from different processes, mostly involved with the stress tests, but > not necessarily with block devices we're stressing. If the warning appeared > in process relating to block device stress test, it would be accompanied by > corrupted data, as the read operation gets aborted. > Our reason for dropping caches is to ensure that the actual block device gets > accessed during the test without resorting to O_DIRECT. > When dropping caches was disabled in the tests, the issue would also > disappear, at least in a single-week run. > > Example processes caught in the last run (grepped from stacktraces): > [38128.430694] CPU: 2 PID: 32735 Comm: glmark2 Not tainted > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [44168.081821] CPU: 2 PID: 21555 Comm: cat Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [75879.424076] CPU: 3 PID: 1478 Comm: flashcp Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [117060.951333] CPU: 1 PID: 14150 Comm: cat Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [201668.438219] CPU: 1 PID: 14370 Comm: glmark2 Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [214332.212960] CPU: 0 PID: 25633 Comm: cp Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [283484.619058] CPU: 0 PID: 5091 Comm: glmark2 Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [330256.709124] CPU: 2 PID: 4565 Comm: sensors Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > > When I started debugging the issue, I noticed that in all cases we're dealing > with highmem zero-order pages. In this case, page_head(page) == page, so > page_address(page) should be equal to page_address(head). > However, it isn't the case, as page_address(head) in each case returns zero, > causing the value of "v" to explode, and the check to fail. > > At first I thought, that it could be caused by modification of struct page > itself, but it turned out to not be the case - sometimes struct page would be > unmodified during duration of the check. With lowmem pages, this wouldn't > also happen, because page_address is derived directly from struct page > pointer itself. > > Finally, after gathering a few traces with added debug logs I decided to > trigger a panic on the warning and capture a vmcore. > When analyzing the vmcore, I confirmed that the struct page_address_slot > corresponding to the struct page had an empty list of mappings, explaining > NULL returned from page_address(head) in page_copy_sane(). > Yeah, I had to manually calculate hash of struct page pointer ;) > > In the meantime I noticed a patch by Eric Dumazet ("iov_iter: optimize > page_copy_sane()") [1]. Applying this patch silenced the warnings, at least > for test run going for over a week. This is quite expected, as with this > change 'page' isn't dereferenced at all, nor is page_address() called. Here I > suspect that applying this patch may only hide a real issue I'm facing, and > might be missing the case of highmem pages as well. > > An example stacktrace I captured, with added debug logs and copy of struct > page before and after the check in page_copy_sane: > > This last one shows no changes to struct page over the call to > page_copy_sane(), where first call to page_address(page) returns a valid page > address (page_address=2146996224) and second call returns 0 (head_address=0). > Subsequent calls to page_address inside WARN() argument list also return 0. > The same was true for the occurence I captured vmcore for. > > (For your reference, a patch containing my debug prints is at the end of the > message) > > [337685.344204] ------------[ cut here ]------------ > [337685.356870] WARNING: CPU: 0 PID: 30132 at > /mnt/work/git/platforms-manifest/build_devboardimx6q/tmp/work-shared/devboardimx6q/kernel-source/lib/iov_iter.c:834 > page_copy_sane+0x13c/0x178 > [337685.420706] page_copy_sane: page=7721123e, offset=0, n=4096, > v=2147000320, head=7721123e, page_address=2146996224, head_address=0, > compound_order(head)=0, page_address(page)=0, page_address(head)=0 > page_address(page_copy)=0, flags(page)=referenced|uptodate|lru|active|arch_1, > flags(page_copy)=referenced|uptodate|lru|active|arch_1 > [337685.494736] Modules linked in: xt_nat iptable_nat nf_nat_ipv4 nf_nat > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables x_tables > usb_f_mass_storage usb_f_rndis u_ether ci_hdrc_imx ci_hdrc usbmisc_imx ulpi > libcomposite configfs udc_core > [337685.549556] CPU: 2 PID: 30132 Comm: sensors Tainted: G W > 4.19.32-devboardimx6q+g0a64e3717985 #1 > [337685.558314] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [337685.563600] [<8010fe44>] (unwind_backtrace) from [<8010b840>] > (show_stack+0x10/0x14) > [337685.570081] [<8010b840>] (show_stack) from [<807a5958>] > (dump_stack+0x88/0x9c) > [337685.576033] [<807a5958>] (dump_stack) from [<8011f500>] > (__warn+0xfc/0x114) > [337685.581721] [<8011f500>] (__warn) from [<8011f560>] > (warn_slowpath_fmt+0x48/0x6c) > [337685.587936] [<8011f560>] (warn_slowpath_fmt) from [<803a0664>] > (page_copy_sane+0x13c/0x178) > [337685.595004] [<803a0664>] (page_copy_sane) from [<803a2fe4>] > (copy_page_to_iter+0x18/0x474) > [337685.602020] [<803a2fe4>] (copy_page_to_iter) from [<801bd9d4>] > (generic_file_read_iter+0x2cc/0x980) > [337685.609794] [<801bd9d4>] (generic_file_read_iter) from [<8020949c>] > (__vfs_read+0xf8/0x158) > [337685.616866] [<8020949c>] (__vfs_read) from [<80209588>] > (vfs_read+0x8c/0x118) > [337685.622717] [<80209588>] (vfs_read) from [<80209aac>] > (ksys_read+0x4c/0xac) > [337685.628396] [<80209aac>] (ksys_read) from [<80101000>] > (ret_fast_syscall+0x0/0x54) > [337685.634668] Exception stack(0xa4e79fa8 to 0xa4e79ff0) > [337685.638427] 9fa0: 00000008 00000000 00000003 7ed03208 > 00000200 00000000 > [337685.645318] 9fc0: 00000008 00000000 76fe8958 00000003 00000000 7ed03208 > 76fe5640 7ed031b4 > [337685.652205] 9fe0: 76fe7cf0 7ed0316c 76fbeee0 76fd1b1c > [337685.717080] ---[ end trace 660d072e57b3d168 ]--- > [337685.720698] page_copy 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 > 9b ed l..@........\... > [337685.720713] page_copy 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 > 00 00 ................ > [337685.720721] *page 00000000: 6c 04 00 40 84 1f f4 ef dc aa c4 80 5c e6 > 9b ed l..@........\... > [337685.720728] *page 00000010: 00 00 00 00 00 00 00 00 ff ff ff ff 02 00 > 00 00 ................ > > The scenario with dropping caches via /proc/sys/vm/drop_caches seems to be > rarely used (and tested), however I suspect that during dropping caches, some > pages might be unmapped too early causing other hard-to notice problems. > > As my findings are quite worrisome, I'd be glad to hear if my analysis is > valid, and if optimization made by Eric can be considered 'the fix', or maybe > it is too optimistic. > Also as I'd like to investigate the issue further, I'd be glad for any hints > on where to continue. > > [1] https://lkml.org/lkml/2019/2/26/686, merged as 6daef95b8c91 in master > > Patch containing debug logs: > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 8be175df3075..ded7e073434c 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -815,14 +815,25 @@ bool _copy_from_iter_full_nocache(void *addr, size_t > bytes, struct iov_iter *i) > } > EXPORT_SYMBOL(_copy_from_iter_full_nocache); > > -static inline bool page_copy_sane(struct page *page, size_t offset, size_t n) > +static noinline bool page_copy_sane(struct page *page, size_t offset, size_t > n) > { > + struct page page_copy = *page; > struct page *head = compound_head(page); > - size_t v = n + offset + page_address(page) - page_address(head); > + size_t page_addr = (size_t) page_address(page); > + size_t head_addr = (size_t) page_address(head); > + size_t v = n + offset + page_addr - head_addr; > + unsigned int order = compound_order(head); > > - if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head)))) > + if (likely(n <= v && v <= (PAGE_SIZE << order))) > return true; > - WARN_ON(1); > + WARN(1, "%s: page=%p, offset=%zu, n=%zu, v=%zu, head=%p, > page_address=%zu," > + " head_address=%zu, compound_order(head)=%u, > page_address(page)=%zu," > + " page_address(head)=%zu page_address(page_copy)=%zu, > flags(page)=%pGp, flags(page_copy)=%pGp\n", > + __func__, page, offset, n, v, head, page_addr, head_addr, order, > + (size_t) page_address(page), (size_t) page_address(head), > + (size_t) page_address(&page_copy), &page->flags, > &page_copy.flags); > + print_hex_dump_bytes("page_copy ", DUMP_PREFIX_OFFSET, &page_copy, > sizeof(page_copy)); > + print_hex_dump_bytes("*page ", DUMP_PREFIX_OFFSET, page, > sizeof(*page)); > return false; > } > > -- Pozdrawiam/With kind regards, Lech Perczak Sr. Software Engineer Camlin > Technologies Poland Limited Sp. z o.o. > -- Pozdrawiam/With kind regards, Lech Perczak
Sr. Software Engineer Camlin Technologies Poland Limited Sp. z o.o.