Hi folks,

After building myself a 2.6.30 kernel with the iop dma patches for my Thecus, I started seeing reproducible kernel oopses on large NFS transfers, such as the one included below. After some prodding at the source for other uses of down_read(), I concluded that we're not supposed to call down_read(&current->mm->mmap_sem) if in_atomic() is true. Updating the dma1 patch from http://people.debian.org/~tbm/dma/dma-patch to the attached appears to have fixed the problem for me, giving me a stable DMA-enabled squeeze kernel.

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org

Sep 16 23:27:57 becquer kernel: [100829.520000] Unable to handle kernel NULL pointer dereference at virtual address 00000034
Sep 16 23:27:57 becquer kernel: [100829.520000] pgd = c0004000
Sep 16 23:27:57 becquer kernel: [100829.520000] [00000034] *pgd=00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] Internal error: Oops: 17 [#1]
Sep 16 23:27:57 becquer kernel: [100829.520000] Modules linked in: des_generic cbc rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss ipv6 sunrpc ext4 jbd2 crc16 ext2 loop evdev ehci_hcd uhci_hcd usbcore r8169 mii ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif sata_sil libata scsi_mod
Sep 16 23:27:57 becquer kernel: [100829.520000] CPU: 0 Not tainted (2.6.30-1-iop32x #1)
Sep 16 23:27:57 becquer kernel: [100829.520000] PC is at __down_read+0x20/0xe0
Sep 16 23:27:57 becquer kernel: [100829.520000] LR is at down_read+0x10/0x14
Sep 16 23:27:57 becquer kernel: [100829.520000] pc : [<c020ba64>] lr : [<c020af4c>] psr: 80000093
Sep 16 23:27:57 becquer kernel: [100829.520000] sp : c4f05bf8 ip : 00000034 fp : c4f05c1c
Sep 16 23:27:57 becquer kernel: [100829.520000] r10: c7115000 r9 : c7b3b064 r8 : 00000001
Sep 16 23:27:57 becquer kernel: [100829.520000] r7 : 00000e3c r6 : 00000001 r5 : 00000000 r4 : c4f04000
Sep 16 23:27:57 becquer kernel: [100829.520000] r3 : 80000093 r2 : 80000013 r1 : c4f05c40 r0 : 00000034
Sep 16 23:27:57 becquer kernel: [100829.520000] Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
Sep 16 23:27:57 becquer kernel: [100829.520000] Control: 0000397f Table: a3660000 DAC: 00000017
Sep 16 23:27:57 becquer kernel: [100829.520000] Process nfsd (pid: 2078, stack limit = 0xc4f04268)
Sep 16 23:27:57 becquer kernel: [100829.520000] Stack: (0xc4f05bf8 to 0xc4f06000)
Sep 16 23:27:57 becquer kernel: [100829.520000] 5be0: c305c320 007d8f02
Sep 16 23:27:57 becquer kernel: [100829.520000] 5c00: c4f05c4c c4f05c10 c4f04000 00000000 c4f05c2c c4f05c20 c020af4c c020ba50
Sep 16 23:27:57 becquer kernel: [100829.520000] 5c20: c4f05c9c c4f05c30 c0182120 c020af48 00000000 00000008 00000001 c02d37f4
Sep 16 23:27:57 becquer kernel: [100829.520000] 5c40: c4f05c64 c4f05c50 c018a4a8 c00964c8 00000000 c4f05c40 c4f05c7c c4f05c68
Sep 16 23:27:57 becquer kernel: [100829.520000] 5c60: c0190ce4 c736fdd8 c736fe50 c4f04000 c4f05cb4 00000e3c 00000000 c730c52c
Sep 16 23:27:57 becquer kernel: [100829.520000] 5c80: 00000e3c c730c52c c4f04000 c7b3b064 c4f05ccc c4f05ca0 c018cb60 c0182004
Sep 16 23:27:57 becquer kernel: [100829.520000] 5ca0: c79ac8b4 00000e3c 00000e3c 00000000 00000000 00000e3c c79ac8b4 c1c09cc0
Sep 16 23:27:57 becquer kernel: [100829.520000] 5cc0: c4f05d04 c4f05cd0 c018d33c c018cafc c79ac8b4 c730c52c c4f05d04 c1c09cc0
Sep 16 23:27:57 becquer kernel: [100829.520000] 5ce0: c79ac620 00000e3c 00000000 00000000 c79ac8b4 0000408c c4f05d74 c4f05d08
Sep 16 23:27:57 becquer kernel: [100829.520000] 5d00: c01c269c c018d2f0 00000000 00000000 c4f05edc 00000000 00000001 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5d20: 00000000 c4f04000 c79ac684 c79ac8d0 c79ac6b8 c79ac990 00004000 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5d40: 00000000 c4f05d8c c743aba0 c02cadec c4f05edc c4f05dfc c7867880 c743aba0
Sep 16 23:27:57 becquer kernel: [100829.520000] 5d60: c4f05edc 00000040 c4f05da4 c4f05d78 c01852a0 c01c21c0 00000040 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5d80: c4f05d8c c4f05eb0 c4f05da4 00000000 00000000 00000000 c4f05eb4 c4f05da8
Sep 16 23:27:57 becquer kernel: [100829.520000] 5da0: c018383c c0185260 00000040 0000008c 00001000 0000408c 0000008c c723042c
Sep 16 23:27:57 becquer kernel: [100829.520000] 5dc0: 00000000 00000001 ffffffff 00000000 00000000 00000000 00000000 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5de0: c7867880 c4f05da4 00000000 00000000 c4f05eb0 c7867880 c004e10c c4f05dfc
Sep 16 23:27:57 becquer kernel: [100829.520000] 5e00: c4f05dfc c74b091c c009a7c4 c730c000 c4f05e48 c723042c 00000018 c730c0f0
Sep 16 23:27:57 becquer kernel: [100829.520000] 5e20: c730c000 00000000 c4f05e4c c4f05e38 c01826b8 00005bb4 00000000 c4f05e48
Sep 16 23:27:57 becquer kernel: [100829.520000] 5e40: c4f05f24 c02b3ba0 c78678b0 c7867880 00000040 0000408c c743aba0 c7867a40
Sep 16 23:27:57 becquer kernel: [100829.520000] 5e60: 00000000 c4f05edc c00345a8 c0032c88 c7849870 c787eb50 00000000 c4f04000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5e80: c4f05ec4 c79ac620 c4f05ecc 00000017 00000000 c4f05edc 00000005 c730c52c
Sep 16 23:27:57 becquer kernel: [100829.520000] 5ea0: c03daf00 c7230400 c4f05ecc c4f05eb8 c018389c c018377c 0000408c c730c000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5ec0: c4f05f1c c4f05ed0 bf1a88f4 c0183868 0000408c 00000040 c4f05f34 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5ee0: 00000000 c730c52c 00000005 00000000 00000000 00000040 00001000 c7230400
Sep 16 23:27:57 becquer kernel: [100829.520000] 5f00: c730c000 c02d96e8 c71b1520 c70f2d20 c4f05f5c c4f05f20 bf1a9aa0 bf1a889c
Sep 16 23:27:57 becquer kernel: [100829.520000] 5f20: c4f05f5c c4f05f30 c020a5d8 c7230564 00000000 000176dc 009ee398 c730cd94
Sep 16 23:27:57 becquer kernel: [100829.520000] 5f40: c7230400 c730c000 00057e40 c723041c c4f05fb4 c4f05f60 bf1b2e30 bf1a9800
Sep 16 23:27:57 becquer kernel: [100829.520000] 5f60: c4f05fb4 c4f05f70 00057e40 c71b1520 0000408c 00000000 c7867880 c00352c0
Sep 16 23:27:57 becquer kernel: [100829.520000] 5f80: 00100100 00200200 c4f05fa4 0000408c c730c000 00000000 00000000 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5fa0: 00000000 00000000 c4f05fd4 c4f05fb8 bf2a49dc bf1b2710 c020a2a0 c4f04000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5fc0: c730c000 bf2a4928 c4f05ff4 c4f05fd8 c004dc74 bf2a4934 00000000 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] 5fe0: 00000000 00000000 00000000 c4f05ff8 c003cb54 c004dc1c 00000000 00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] Backtrace:
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c020ba44>] (__down_read+0x0/0xe0) from [<c020af4c>] (down_read+0x10/0x14)
Sep 16 23:27:57 becquer kernel: [100829.520000] r5:00000000 r4:c4f04000
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c020af3c>] (down_read+0x0/0x14) from [<c0182120>] (__try_polled_dma_copy_to_user+0x128/0x33c)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c0181ff8>] (__try_polled_dma_copy_to_user+0x0/0x33c) from [<c018cb60>] (memcpy_toiovec+0x70/0xb8)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c018caf0>] (memcpy_toiovec+0x0/0xb8) from [<c018d33c>] (skb_copy_datagram_iovec+0x58/0x1bc)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c018d2e4>] (skb_copy_datagram_iovec+0x0/0x1bc) from [<c01c269c>] (tcp_recvmsg+0x4e8/0x7fc)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c01c21b4>] (tcp_recvmsg+0x0/0x7fc) from [<c01852a0>] (sock_common_recvmsg+0x4c/0x60)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c0185254>] (sock_common_recvmsg+0x0/0x60) from [<c018383c>] (sock_recvmsg+0xcc/0xec)
Sep 16 23:27:57 becquer kernel: [100829.520000] r5:00000000 r4:00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c0183770>] (sock_recvmsg+0x0/0xec) from [<c018389c>] (kernel_recvmsg+0x40/0x70)
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c018385c>] (kernel_recvmsg+0x0/0x70) from [<bf1a88f4>] (svc_recvfrom+0x64/0xa0 [sunrpc])
Sep 16 23:27:57 becquer kernel: [100829.520000] r5:c730c000 r4:0000408c
Sep 16 23:27:57 becquer kernel: [100829.520000] [<bf1a8890>] (svc_recvfrom+0x0/0xa0 [sunrpc]) from [<bf1a9aa0>] (svc_tcp_recvfrom+0x2ac/0x3dc [sunrpc])
Sep 16 23:27:57 becquer kernel: [100829.520000] [<bf1a97f4>] (svc_tcp_recvfrom+0x0/0x3dc [sunrpc]) from [<bf1b2e30>] (svc_recv+0x72c/0x83c [sunrpc])
Sep 16 23:27:57 becquer kernel: [100829.520000] r8:c723041c r7:00057e40 r6:c730c000 r5:c7230400 r4:c730cd94
Sep 16 23:27:57 becquer kernel: [100829.520000] [<bf1b2704>] (svc_recv+0x0/0x83c [sunrpc]) from [<bf2a49dc>] (nfsd+0xb4/0x16c [nfsd])
Sep 16 23:27:57 becquer kernel: [100829.520000] [<bf2a4928>] (nfsd+0x0/0x16c [nfsd]) from [<c004dc74>] (kthread+0x64/0x9c)
Sep 16 23:27:57 becquer kernel: [100829.520000] r6:bf2a4928 r5:c730c000 r4:c4f04000
Sep 16 23:27:57 becquer kernel: [100829.520000] [<c004dc10>] (kthread+0x0/0x9c) from [<c003cb54>] (do_exit+0x0/0x610)
Sep 16 23:27:57 becquer kernel: [100829.520000] r6:00000000 r5:00000000 r4:00000000
Sep 16 23:27:57 becquer kernel: [100829.520000] Code: e1a0c000 e10f3000 e3833080 e121f003 (e5901000)
Sep 16 23:27:57 becquer kernel: [100830.100000] ---[ end trace b27d7e5e844bde5a ]---
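
One way to read the faulting address in the trace above, offered as a sketch rather than a confirmed diagnosis: the oops reports a NULL pointer dereference at 0x00000034, with r0 = 0x34, inside __down_read(). That is the address one would get from &current->mm->mmap_sem if current->mm were NULL and mmap_sem sat 0x34 bytes into struct mm_struct; kernel threads such as nfsd normally run without an mm. The userspace C below only illustrates that arithmetic, plus a guard of the same shape as the `if (!atomic)` check in the attached patch. The types fake_rwsem, fake_mm and fake_task, the 0x34 bytes of padding, and the extra `mm != NULL` test are all assumptions made for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; layouts are NOT the real ones. */
struct fake_rwsem {
	int count;
};

struct fake_mm {
	char pad[0x34];              /* assumption: places mmap_sem at offset 0x34 */
	struct fake_rwsem mmap_sem;
};

struct fake_task {
	struct fake_mm *mm;          /* NULL models a kernel thread with no mm */
};

/*
 * Address that &mm->mmap_sem would evaluate to for a NULL mm.
 * Computed with offsetof() so no NULL pointer is actually formed.
 */
static uintptr_t fault_address_for_null_mm(void)
{
	return (uintptr_t)offsetof(struct fake_mm, mmap_sem);
}

/*
 * Guard in the spirit of the patch: skip the lock when atomic.
 * The mm != NULL test is an extra assumption of this sketch; the
 * attached patch only checks in_atomic().
 */
static int may_lock_mmap_sem(const struct fake_task *task, int atomic)
{
	assert(task != NULL);
	return !atomic && task->mm != NULL;
}
```

With these stand-ins, a task whose mm is NULL is never allowed to take the lock, and the computed fault address matches the 0x34 seen in the register dump.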

[ try-dma-copy-to-user.patch and convert-copy-to-user.patch ]

async_tx: divert large copy_to_user calls to a dma engine

This hack globally replaces calls to __copy_to_user with
__try_polled_dma_copy_to_user. If the transfer length is above a
configurable threshold an attempt is made to perform the operation using a
hardware channel obtained from the async_tx subsystem. Memory is pinned via
get_user_pages. If any errors occur when setting up the transfer the code
falls back to the original software __copy_to_user implementation. The
implementation is synchronous in that the copy is guaranteed complete before
the routine returns.

For simplicity and data integrity this feature is disabled for SMP and PREEMPT
configurations. Also, the explicit rescheduling points in get_user_pages are
disabled by a 'DMA' thread-information-flag.

Signed-off-by: Dan Williams <dan.j.willi...@intel.com>

diff -urN a/drivers/dma/Kconfig b/drivers/dma/Kconfig
--- a/drivers/dma/Kconfig 2008-10-28 18:32:23.000000000 +0000
+++ b/drivers/dma/Kconfig 2008-10-28 18:33:41.000000000 +0000
@@ -116,4 +116,23 @@
 	  Simple DMA test client. Say N unless you're debugging a
 	  DMA Device driver.

+config POLLED_DMA_COPY_USER
+	bool "Perform copy_to_user with a DMA engine (polled)"
+	depends on DMA_ENGINE && !SMP && !PREEMPT
+	select ASYNC_MEMCPY
+	---help---
+	  If a memory copy request is larger than the POLLED_DMA_COPY_USER_THRESHOLD
+	  then async_tx will trap it and attempt to use a dma engine for the copy.
+	  This operation is polled so it will not benefit CPU utilization.
+	  say Y here.
+
+config POLLED_DMA_MEMCPY_THRESHOLD
+	int "Polled DMA memcpy threshold (bytes)"
+	depends on POLLED_DMA_COPY_USER
+	default "2048"
+	---help---
+	  Minimum number of bytes that must be requested in a memcpy call before it
+	  is handed to a DMA engine for processing. This does not affect code that
+	  directly calls DMA memcpy routines.
+
 endif
diff -urN a/drivers/dma/Makefile b/drivers/dma/Makefile
--- a/drivers/dma/Makefile 2008-10-28 18:32:23.000000000 +0000
+++ b/drivers/dma/Makefile 2008-10-28 18:33:53.000000000 +0000
@@ -8,3 +8,4 @@
 obj-$(CONFIG_MV_XOR) += mv_xor.o
 obj-$(CONFIG_DW_DMAC) += dw_dmac.o
 obj-$(CONFIG_MX3_IPU) += ipu/
+obj-$(CONFIG_POLLED_DMA_COPY_USER) += copy_to_user.o
diff -urN a/drivers/dma/copy_to_user.c b/drivers/dma/copy_to_user.c
--- a/drivers/dma/copy_to_user.c 1970-01-01 00:00:00.000000000 +0000
+++ b/drivers/dma/copy_to_user.c 2008-10-28 18:33:41.000000000 +0000
@@ -0,0 +1,156 @@
+/*
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include "copy_user.h"
+
+#undef __copy_to_user
+extern unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n);
+
+static unsigned long dma_copy_to_user_threshold =
+	CONFIG_POLLED_DMA_MEMCPY_THRESHOLD;
+
+/*=======================================================================*/
+/* Procedure: dma_copy_to_user() */
+/* */
+/* Description: DMA-based copy_to_user. */
+/* */
+/* Parameters: to: destination address */
+/* from: source address */
+/* n: number of bytes to transfer */
+/* */
+/* Returns: unsigned long: number of bytes NOT copied */
+/* */
+/* Notes/Assumptions: */
+/* Assumes that kernel physical memory is contiguous, i.e., */
+/* the physical addresses of contiguous virtual addresses */
+/* are also contiguous. */
+/* Assumes that kernel memory doesn't get paged. */
+/* Assumes that to/from memory regions cannot overlap */
+/* This code breaks a lot of Linux rules, but it has had */
+/* a long exposure to IOP end users */
+/* */
+/* History: Carl Staelin 1/27/03 Initial Creation */
+/* Dave Jiang 2/22/03 Attemped to DMA chaining */
+/* (data corrupt with full chain) */
+/* back to Carl's way for now */
+/* Dan Williams 3/12/07 Port to use dmaengine and iop-adma */
+/* Dan Williams 2/05/08 Convert to get_user_pages */
+/*=======================================================================*/
+unsigned long
+__try_polled_dma_copy_to_user(void __user *to, const void *from,
+			      unsigned long n)
+{
+	int atomic;
+	unsigned long virt_to = (unsigned long) to;
+	unsigned long virt_from = (unsigned long) from;
+	int nr_pages = 1 + ((n - 1 + offset_into_page(virt_to)) / PAGE_SIZE);
+	struct page *pages[nr_pages];
+	dma_addr_t dma_from;
+	int nr_unmap;
+	unsigned long ret = 0;
+	int byte;
+	unsigned long _n;
+	struct dma_chan *chan = __async_tx_find_channel(NULL, DMA_MEMCPY);
+	struct dma_device *device = chan ? chan->device : NULL;
+	struct dma_async_tx_descriptor *tx = NULL;
+	struct dma_async_tx_descriptor *last_tx;
+
+	atomic = in_atomic();
+
+	if (!access_ok(VERIFY_WRITE, to, n))
+		return n;
+
+	if (n < dma_copy_to_user_threshold || !chan)
+		return __copy_to_user(to, from, n);
+
+	if (!(virt_addr_valid(from)) ||
+	    !(virt_addr_valid(from + n)))
+		return __copy_to_user(to, from, n);
+
+	pr_debug("%s: %p -> %p (len: %lu) nr_pages: %d\n",
+		__FUNCTION__, from, to, n, nr_pages);
+
+	/* let the dma copy proceed without rescheduling */
+	set_thread_flag(TIF_DMA);
+
+	/* pin pages down */
+	if (!atomic)
+		down_read(&current->mm->mmap_sem);
+	nr_unmap = get_user_pages(
+			current,
+			current->mm,
+			virt_to,
+			nr_pages,
+			1, /* write */
+			0, /* force */
+			pages,
+			NULL);
+	if (!atomic)
+		up_read(&current->mm->mmap_sem);
+
+	if (nr_unmap != nr_pages) {
+		ret = __copy_to_user(to, from, n);
+		goto unpin;
+	}
+
+	dma_from = dma_map_single(NULL, (void *) virt_from, n, DMA_TO_DEVICE);
+	for (byte = 0, _n = n, last_tx = NULL; byte < n;) {
+		int len = min(PAGE_SIZE - offset_into_page(virt_to + byte), _n);
+		int idx = (byte + offset_into_page(virt_to)) >> PAGE_SHIFT;
+		dma_addr_t dma_to = dma_map_page(NULL, pages[idx],
+					offset_into_page(virt_to + byte),
+					len, DMA_FROM_DEVICE);
+
+		tx = device->device_prep_dma_memcpy(chan, dma_to,
+					dma_from + byte, len, 0);
+		if (!tx) {
+			pr_debug("%s: no descriptors available\n", __FUNCTION__);
+			if (last_tx)
+				dma_wait_for_async_tx(last_tx);
+			ret = __copy_to_user(to + byte, from + byte, _n);
+			goto unpin;
+		} else
+			last_tx = tx;
+
+		pr_debug("%s: submitting memcpy "
+			"virt (%p -> %p) dma (%#x -> %#x)\n",
+			__FUNCTION__, from + byte, to + byte,
+			dma_from + byte, dma_to);
+
+		async_tx_submit(chan, tx, ASYNC_TX_ACK, NULL, NULL, NULL);
+
+		_n -= len;
+		byte += len;
+	}
+
+	if (tx)
+		dma_wait_for_async_tx(tx);
+
+unpin:
+	while (nr_unmap--) {
+		set_page_dirty_lock(pages[nr_unmap]);
+		page_cache_release(pages[nr_unmap]);
+	}
+
+	clear_thread_flag(TIF_DMA);
+	if (!in_atomic())
+		cond_resched();
+
+	return ret;
+}
+EXPORT_SYMBOL(__try_polled_dma_copy_to_user);
+
diff -urN a/drivers/dma/copy_user.h b/drivers/dma/copy_user.h
--- a/drivers/dma/copy_user.h 1970-01-01 00:00:00.000000000 +0000
+++ b/drivers/dma/copy_user.h 2008-10-28 18:33:41.000000000 +0000
@@ -0,0 +1,26 @@
+/*
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#ifndef _COPY_USER_H_
+#define _COPY_USER_H_
+#include <linux/async_tx.h>
+#include <linux/pagemap.h>
+#include <asm/cacheflush.h>
+#include <asm/uaccess.h>
+
+#define offset_into_page(x) ((x) & (PAGE_SIZE - 1))
+#endif
diff -urN a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
--- a/arch/arm/include/asm/thread_info.h 2008-10-28 18:32:28.000000000 +0000
+++ b/arch/arm/include/asm/thread_info.h 2008-10-28 18:33:41.000000000 +0000
@@ -140,6 +140,7 @@
 #define TIF_USING_IWMMXT 17
 #define TIF_MEMDIE 18
 #define TIF_FREEZE 19
+#define TIF_DMA 20

 #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
diff -urN a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h
--- a/arch/arm/include/asm/uaccess.h 2008-10-28 18:32:28.000000000 +0000
+++ b/arch/arm/include/asm/uaccess.h 2008-10-28 18:33:41.000000000 +0000
@@ -393,6 +393,12 @@
 #define __clear_user(addr,n) (memset((void __force *)addr, 0, n), 0)
 #endif

+#ifdef CONFIG_POLLED_DMA_COPY_USER
+extern unsigned long __must_check __try_polled_dma_copy_to_user(void __user *to, const void *from, unsigned long n);
+
+#define __copy_to_user(to, from, n) __try_polled_dma_copy_to_user(to, from, n)
+#endif
+
 extern unsigned long __must_check __strncpy_from_user(char *to, const char __user *from, unsigned long count);
 extern unsigned long __must_check __strnlen_user(const char __user *s, long n);

diff -urN a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h 2008-10-28 18:32:29.000000000 +0000
+++ b/include/linux/sched.h 2008-10-28 18:33:41.000000000 +0000
@@ -2202,7 +2202,8 @@

 static inline int need_resched(void)
 {
-	return unlikely(test_thread_flag(TIF_NEED_RESCHED));
+	return unlikely(test_thread_flag(TIF_NEED_RESCHED) &&
+			!test_thread_flag(TIF_DMA));
 }

 /*
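
The copy loop in copy_to_user.c above splits the transfer at destination page boundaries. As a rough userspace sketch of that arithmetic (not kernel code): the page count and the per-iteration length follow the expressions used in the patch, while SKETCH_PAGE_SIZE (fixed at 4096 here as an assumption), the sketch_* names and sketch_count_chunks() are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */
#define sketch_offset_into_page(x) ((x) & (SKETCH_PAGE_SIZE - 1))

/* Number of destination pages touched by n bytes starting at virt_to (n > 0). */
static unsigned long sketch_nr_pages(unsigned long virt_to, unsigned long n)
{
	return 1 + ((n - 1 + sketch_offset_into_page(virt_to)) / SKETCH_PAGE_SIZE);
}

/*
 * Length of the chunk that starts `byte` bytes into the copy: either
 * the distance to the next page boundary or the bytes still remaining,
 * whichever is smaller.
 */
static unsigned long sketch_chunk_len(unsigned long virt_to,
				      unsigned long byte,
				      unsigned long remaining)
{
	unsigned long to_boundary =
		SKETCH_PAGE_SIZE - sketch_offset_into_page(virt_to + byte);

	return to_boundary < remaining ? to_boundary : remaining;
}

/* Walk the copy chunk by chunk and count the chunks (one descriptor each). */
static unsigned long sketch_count_chunks(unsigned long virt_to, unsigned long n)
{
	unsigned long byte = 0, remaining = n, chunks = 0;

	while (byte < n) {
		unsigned long len = sketch_chunk_len(virt_to, byte, remaining);

		remaining -= len;
		byte += len;
		chunks++;
	}
	return chunks;
}
```

For example, under these assumptions a 0xe3c-byte copy to a destination at page offset 0xe00 touches two pages and is issued as two chunks, 512 bytes followed by 3132 bytes.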