*** This bug is a duplicate of bug 1572291 ***
https://bugs.launchpad.net/bugs/1572291
------- Comment From [email protected] 2016-09-02 09:22 EDT-------
>From the dmesg it looks like this time ext4 page allocation stumbles upon the
>doubly freed page first, but it is immediately after the page got corrupted by
>the double free (indicated by the WARNING), so this just means that ext4
>happened to be the first to get its fingers on the corrupted page during a
>page alloc. It could hit anyone, and we also see later another occurrence
>where copy_pte_range() stumbles over another corrupted page (no WARNING before
>that because it is a WARN_ONCE).
We still need to find the root cause of the double free and the
resulting page corruption (count -1), and the WARNING trace is the only
reliable hint we have for a double free. So my analysis from comment #5
is still valid. Even though genwqe itself did not stumble over the
corrupted page this time, it was still involved in the double free.
Anyone can hit the corrupted page afterwards; genwqe was just a more
likely candidate because it was an active consumer at the time.
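To illustrate why the process that reports the bad page need not be the
one that caused it, here is a minimal userspace sketch. It is a toy
model, not kernel code: struct toy_page, toy_free() and toy_alloc() are
hypothetical names, and the check only loosely mirrors the "nonzero
_count" test seen in the dmesg output.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
    int count;          /* 0 = free, 1 = owned by one user */
};

/* Drop one reference. Nothing checks that the caller still owns the page. */
static void toy_free(struct toy_page *p)
{
    p->count--;
}

/*
 * Allocator-side sanity check: a page handed out from the free list must
 * have count == 0. Returns 0 on success, -1 if the page is in a bad state.
 */
static int toy_alloc(struct toy_page *p, const char *who)
{
    if (p->count != 0) {
        printf("BUG: Bad page state in process %s (count:%d)\n",
               who, p->count);
        return -1;
    }
    p->count = 1;
    return 0;
}

/*
 * Demo: one owner frees the page twice, then an unrelated consumer
 * allocates it and is the one that trips over the corruption.
 */
static int toy_demo_double_free(void)
{
    struct toy_page pg = { .count = 1 };

    toy_free(&pg);              /* legitimate free: count 1 -> 0  */
    toy_free(&pg);              /* double free:     count 0 -> -1 */
    return toy_alloc(&pg, "unrelated_consumer");
}
```

In this model the second toy_free() is the actual bug, but the report
only shows up in whichever caller runs toy_alloc() next.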
BTW, calling dma_free() on previously unmapped addresses would of
course cause the same issue as a double free, but a double free is much
more likely, e.g. caused by broken error handling with an "off by one"
or similar bug. Speaking of error handling, the
"genwqe 0001:00:00.0: [genwqe_map_pages] err: no dma addr
daddr=ffffffffffffffff!" messages may be a good starting point for
verifying the genwqe error handling and page freeing strategy. Those
messages are not a problem by themselves and are even expected given
the nature of the test (online/offline and failing rpcit), but the
error handling they trigger may have issues that could lead to a double
free.
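As a sketch of the kind of "off by one" that can hide in such an error
path, the following userspace toy maps a fixed number of pages and
unwinds on failure. It is not taken from the genwqe driver: toy_map(),
toy_unmap(), toy_map_all() and the mapped[] array are all hypothetical
and only illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical bookkeeping: how many times each page is currently mapped. */
static int mapped[NPAGES];

/* Hypothetical mapping step; fails at index fail_at (-1 means never fail). */
static int toy_map(size_t i, int fail_at)
{
    if ((int)i == fail_at)
        return -1;      /* stands in for a "no dma addr" style failure */
    mapped[i]++;
    return 0;
}

/* Undo one mapping; the counter goes negative if the page was not mapped. */
static void toy_unmap(size_t i)
{
    mapped[i]--;
}

/* Map all pages; on failure undo only pages [0, i), which were mapped. */
static int toy_map_all(int fail_at)
{
    size_t i;

    for (i = 0; i < NPAGES; i++) {
        if (toy_map(i, fail_at) != 0)
            goto unwind;
    }
    return 0;

unwind:
    /*
     * Page i itself was never mapped. An unwind loop that also covered
     * index i would release a page this path does not own.
     */
    while (i-- > 0)
        toy_unmap(i);
    return -1;
}
```

Checking that every counter is back at zero after a failed
toy_map_all() is a cheap way to catch an unwind loop that releases one
page too many.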
--
https://bugs.launchpad.net/bugs/1559194
Title:
Bad page state in process genwqe_gunzip pfn:3c275 in the genwqe device
driver
Status in Release Notes for Ubuntu:
Fix Released
Status in Ubuntu on IBM z Systems:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Xenial:
Fix Released
Status in linux source package in Yakkety:
Fix Released
Bug description:
== Comment: #0 - Dmitry Gorbachev <[email protected]> - 2016-03-17
08:52:41 ==
An error occurs when running zEDC compression/decompression while
hotplugging PCI devices.
The setup had 1G of memory, 2 PCI functions, and 50 gunzip threads enabled.
Mar 14 23:59:01 s8330018 kernel: [ 4972.486883] BUG: Bad page state in
process genwqe_gunzip pfn:3c275
Mar 14 23:59:01 s8330018 kernel: [ 4972.486888] page:000003d100f09d40
count:-1 mapcount:0 mapping: (null) index:0x0
Mar 14 23:59:01 s8330018 kernel: [ 4972.486891] flags: 0x0()
Mar 14 23:59:01 s8330018 kernel: [ 4972.486895] page dumped because: nonzero
_count
Mar 14 23:59:01 s8330018 kernel: [ 4972.486897] Modules linked in:
xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E)
iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E)
nf_conntrack(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) iptable_filter(E)
ip_tables(E) x_tables(E) genwqe_card(E) crc_itu_t(E) qeth_l2(E) qeth(E) vmur(E)
ccwgroup(E) dm_multipath(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_sa(E)
ib_mad(E) ib_core(E) ib_addr(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E)
scsi_transport_iscsi(E) btrfs(E) zlib_deflate(E) raid10(E) raid456(E)
async_memcpy(E) async_raid6_recov(E) async_pq(E) async_xor(E) async_tx(E)
xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) linear(E) ghash_s390(E)
prng(E) aes_s390(E) des_s390(E) des_generic(E) sha512_s390(E) sha256_s390(E)
sha1_s390(E) sha_common(E) zfcp(E) qdio(E) scsi_transport_fc(E)
dasd_eckd_mod(E) dasd_mod(E)
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] CPU: 0 PID: 37867 Comm:
genwqe_gunzip Tainted: G W E 4.4.0-8-generic #23-Ubuntu
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 00000000209176f8
0000000020917788 0000000000000002 0000000000000000
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000020917828
00000000209177a0 00000000209177a0 0000000000114182
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000000000011
000000000092345a 000003d10000000a 000000000000000a
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 00000000209177e8
0000000020917788 0000000000000000 0000000020914000
Mar 14 23:59:01 s8330018 kernel: [ 4972.486916] 0000000000000000
0000000000114182 0000000020917788 00000000209177e8
Mar 14 23:59:01 s8330018 kernel: [ 4972.486922] Call Trace:
Mar 14 23:59:01 s8330018 kernel: [ 4972.486927] ([<000000000011406e>]
show_trace+0xf6/0x148)
Mar 14 23:59:01 s8330018 kernel: [ 4972.486929] [<0000000000114136>]
show_stack+0x76/0xe8
Mar 14 23:59:01 s8330018 kernel: [ 4972.486934] [<0000000000518c26>]
dump_stack+0x6e/0x90
Mar 14 23:59:01 s8330018 kernel: [ 4972.486937] [<000000000027c376>]
bad_page+0xe6/0x148
Mar 14 23:59:01 s8330018 kernel: [ 4972.486938] [<0000000000280516>]
get_page_from_freelist+0x49e/0xba8
Mar 14 23:59:01 s8330018 kernel: [ 4972.486940] [<0000000000280ede>]
__alloc_pages_nodemask+0x166/0xb00
Mar 14 23:59:01 s8330018 kernel: [ 4972.486941] [<000000000015635a>]
s390_dma_alloc+0x82/0x1a0
Mar 14 23:59:01 s8330018 kernel: [ 4972.486944] [<000003ff805ea142>]
__genwqe_alloc_consistent+0x7a/0x90 [genwqe_card]
Mar 14 23:59:01 s8330018 kernel: [ 4972.486947] [<000003ff805ea344>]
genwqe_alloc_sync_sgl+0x17c/0x2e0 [genwqe_card]
Mar 14 23:59:01 s8330018 kernel: [ 4972.486950] [<000003ff805e52da>]
do_execute_ddcb+0x1da/0x348 [genwqe_card]
Mar 14 23:59:01 s8330018 kernel: [ 4972.486952] [<000003ff805e5964>]
genwqe_ioctl+0x51c/0xc20 [genwqe_card]
Mar 14 23:59:01 s8330018 kernel: [ 4972.486953] [<00000000003145ee>]
do_vfs_ioctl+0x3b6/0x518
Mar 14 23:59:01 s8330018 kernel: [ 4972.486955] [<00000000003147f4>]
SyS_ioctl+0xa4/0xb8
Mar 14 23:59:01 s8330018 kernel: [ 4972.486956] [<00000000007ad1be>]
system_call+0xd6/0x264
Mar 14 23:59:01 s8330018 kernel: [ 4972.486957] [<000003ffa9df2492>]
0x3ffa9df2492