Re: [PATCH] GBPAGES: Fix global bit for 64bit
Andi Kleen wrote: [Ideally apply before the patch to enable gbpages direct mapping] The gbpages direct patch assumed that __PAGE_KERNEL contains _PAGE_GLOBAL (I think because that was true at some point in git-x86 and i forgot to remove it again when forward porting) This is currently true on 32bit, but not on 64bit which does not make much sense. Add it to 64bit too. Also get rid of the obsolete MAKE_GLOBAL. Last time when my patch to do this was in the tree, it caused random things to fail, even after increasing the strength of various tlb flushes. Did you change something to fix this? J Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> Index: linux/include/asm-x86/pgtable.h === --- linux.orig/include/asm-x86/pgtable.h +++ linux/include/asm-x86/pgtable.h @@ -69,11 +69,13 @@ #define _PAGE_KERNEL (_PAGE_KERNEL_EXEC | _PAGE_NX) #ifndef __ASSEMBLY__ +/* These are set up based on CPUID capabilities */ extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC; #endif /* __ASSEMBLY__ */ #else +/* 64bit can assume more CPUID capabilities */ #define __PAGE_KERNEL_EXEC \ - (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL) #define __PAGE_KERNEL (__PAGE_KERNEL_EXEC | _PAGE_NX) #endif @@ -86,22 +88,16 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KE #define __PAGE_KERNEL_LARGE(__PAGE_KERNEL | _PAGE_PSE) #define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE) -#ifdef CONFIG_X86_32 -# define MAKE_GLOBAL(x)__pgprot((x)) -#else -# define MAKE_GLOBAL(x)__pgprot((x) | _PAGE_GLOBAL) -#endif - -#define PAGE_KERNELMAKE_GLOBAL(__PAGE_KERNEL) -#define PAGE_KERNEL_RO MAKE_GLOBAL(__PAGE_KERNEL_RO) -#define PAGE_KERNEL_EXEC MAKE_GLOBAL(__PAGE_KERNEL_EXEC) -#define PAGE_KERNEL_RX MAKE_GLOBAL(__PAGE_KERNEL_RX) -#define PAGE_KERNEL_NOCACHEMAKE_GLOBAL(__PAGE_KERNEL_NOCACHE) -#define PAGE_KERNEL_EXEC_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE) -#define PAGE_KERNEL_LARGE MAKE_GLOBAL(__PAGE_KERNEL_LARGE) -#define 
PAGE_KERNEL_LARGE_EXEC MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC) -#define PAGE_KERNEL_VSYSCALL MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL) -#define PAGE_KERNEL_VSYSCALL_NOCACHE MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL_NOCACHE) +#define PAGE_KERNEL__pgprot(__PAGE_KERNEL) +#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO) +#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC) +#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX) +#define PAGE_KERNEL_NOCACHE__pgprot(__PAGE_KERNEL_NOCACHE) +#define PAGE_KERNEL_EXEC_NOCACHE __pgprot(__PAGE_KERNEL_EXEC_NOCACHE) +#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE) +#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC) +#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL) +#define PAGE_KERNEL_VSYSCALL_NOCACHE __pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE) /* xwr */ #define __P000 PAGE_NONE -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
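The effect of the patch above on the 64-bit flag values can be sketched in plain C. This is only an illustration, not kernel code: the `OLD`/`NEW` prefixes and the two helper functions are hypothetical names introduced here, and the bit positions are the architectural x86 page-table bits. It shows that the final `PAGE_KERNEL` value is unchanged, while the raw `__PAGE_KERNEL` value (which the gbpages direct-mapping code uses) now carries the global bit itself.

```c
#include <stdint.h>

/* x86 page-table entry flag bits (architectural bit positions). */
#define _PAGE_PRESENT  0x001ULL
#define _PAGE_RW       0x002ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_DIRTY    0x040ULL
#define _PAGE_GLOBAL   0x100ULL
#define _PAGE_NX       (1ULL << 63)

/* Before the patch (64-bit): the raw value has no global bit ... */
#define OLD__PAGE_KERNEL_EXEC \
	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
#define OLD__PAGE_KERNEL (OLD__PAGE_KERNEL_EXEC | _PAGE_NX)
/* ... and MAKE_GLOBAL() added it when building the final protection. */
#define OLD_MAKE_GLOBAL(x) ((x) | _PAGE_GLOBAL)

/* After the patch: the raw value already includes the global bit. */
#define NEW__PAGE_KERNEL_EXEC \
	(_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
#define NEW__PAGE_KERNEL (NEW__PAGE_KERNEL_EXEC | _PAGE_NX)

/* Hypothetical helpers, used only to compare the two schemes. */
static inline uint64_t old_page_kernel(void)
{
	return OLD_MAKE_GLOBAL(OLD__PAGE_KERNEL);
}

static inline uint64_t new_page_kernel(void)
{
	return NEW__PAGE_KERNEL;
}
```

Code that starts from the raw `__PAGE_KERNEL` value and ORs in further bits (for example `_PAGE_PSE`) is the case that differs between the two schemes, since it never went through `MAKE_GLOBAL()`.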
Re: sparc compile error caused by x86 arch updates
Adrian Bunk wrote: On Thu, Jan 31, 2008 at 05:15:23PM +0100, Ingo Molnar wrote: * Adrian Bunk <[EMAIL PROTECTED]> wrote: On Thu, Jan 31, 2008 at 05:00:33PM +0100, Ingo Molnar wrote: * Adrian Bunk <[EMAIL PROTECTED]> wrote: You tested x86 but broke more than half a dozen other archtectures, with at least 3 different commits breaking other architectures. Note that all known breakages are fixed in current -git, except for the s390 problem that Martin/Nick posted the fix. What about the breakages caused by commit a5a19c63f4e55e32dc0bc3d936d7f94793d8b380 (this commit broke the defconfig compilation on at least avr32, blackfin, sh, sparc and uml)? the patch below fixes that. The sparc breakage (might not have been reported until now and I bisected it just a few minutes ago) is caused by the following part of commit a5a19c63f4e55e32dc0bc3d936d7f94793d8b380: --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include Drat. I added that because of /* only sparc can not include linux/pagemap.h in this file * so leave page_cache_release and release_pages undeclared... */ #define free_page_and_swap_cache(page) \ page_cache_release(page) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr), 0); But I guess I overlooked the comment... I guess the fix is to scatter linux/pagemap.h into the appropriate places where these macros are used (asm-generic/tlb.h, for a start). J The compile error with the sparc defconfig is: <-- snip --> ... 
CC init/main.o In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/highmem.h:24, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:10, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swap.h:9, from include2/asm/pgtable.h:15, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/mm.h:39, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17, from include2/asm/dma-mapping.h:6, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/dma-mapping.h:52, from include2/asm/sbus.h:10, from include2/asm/dma.h:13, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/bootmem.h:8, from /home/bunk/linux/kernel-2.6/git/linux-2.6/init/main.c:26: include2/asm/highmem.h: In function 'kmap': include2/asm/highmem.h:60: error: implicit declaration of function 'PageHighMem' include2/asm/highmem.h:61: error: implicit declaration of function 'page_address' include2/asm/highmem.h:61: warning: return makes pointer from integer without a cast In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swap.h:9, from include2/asm/pgtable.h:15, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/mm.h:39, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17, from include2/asm/dma-mapping.h:6, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/dma-mapping.h:52, from include2/asm/sbus.h:10, from include2/asm/dma.h:13, from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/bootmem.h:8, from /home/bunk/linux/kernel-2.6/git/linux-2.6/init/main.c:26: /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'lock_page': /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:169: error: implicit declaration of function 'TestSetPageLocked' /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'wait_on_page_locked': 
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:199: error: implicit declaration of function 'PageLocked' /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: 'PG_locked' undeclared (first use in this function) /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: (Each undeclared identifier is reported only once /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: for each function it appears in.) /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'wait_on_page_writeback': /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:208: error: implicit declaration of function 'PageWriteback' /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:209: error: 'PG_writeback' undeclared (first use in this function) In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17, from include2/asm/dma-mapping.h:6,
Re: [git pull] x86 arch updates for v2.6.25
Ingo Molnar wrote: * Adrian Bunk <[EMAIL PROTECTED]> wrote: What about the breakages caused by commit a5a19c63f4e55e32dc0bc3d936d7f94793d8b380 (this commit broke the defconfig compilation on at least avr32, blackfin, sh, sparc and uml)? the patch below fixes that. Is it safe, or why did Jeremy state in the commit "I removed this include to avoid an include cycle"? that is an x86.git complication alone, and only affects 32-bit PAE: it is solved by the uninlining patch (that i've queued up to before the asm-generic/tlb.h revert/fix). Ingo -> Subject: x86: uninline __pte_free_tlb() and __pmd_free_tlb() From: Ingo Molnar <[EMAIL PROTECTED]> this also removes an include file dependency. Yes, that simplifies things a lot. J Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/x86/mm/pgtable_32.c | 19 +++ include/asm-x86/pgalloc_32.h | 19 ++- 2 files changed, 21 insertions(+), 17 deletions(-) Index: linux/arch/x86/mm/pgtable_32.c === --- linux.orig/arch/x86/mm/pgtable_32.c +++ linux/arch/x86/mm/pgtable_32.c @@ -376,3 +376,22 @@ void check_pgt_cache(void) { quicklist_trim(0, pgd_dtor, 25, 16); } + +void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) +{ + paravirt_release_pt(page_to_pfn(pte)); + tlb_remove_page(tlb, pte); +} + +void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) +{ + /* This is called just after the pmd has been detached from + the pgd, which requires a full tlb flush to be recognized + by the CPU. Rather than incurring multiple tlb flushes + while the address space is being pulled down, make the tlb + gathering machinery do a full flush when we're done. 
*/ + tlb->fullmm = 1; + + paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); + tlb_remove_page(tlb, virt_to_page(pmd)); +} Index: linux/include/asm-x86/pgalloc_32.h === --- linux.orig/include/asm-x86/pgalloc_32.h +++ linux/include/asm-x86/pgalloc_32.h @@ -51,11 +51,7 @@ static inline void pte_free(struct page } -static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) -{ - paravirt_release_pt(page_to_pfn(pte)); - tlb_remove_page(tlb, pte); -} +extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte); #ifdef CONFIG_X86_PAE /* @@ -72,18 +68,7 @@ static inline void pmd_free(pmd_t *pmd) free_page((unsigned long)pmd); } -static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) -{ - /* This is called just after the pmd has been detached from - the pgd, which requires a full tlb flush to be recognized - by the CPU. Rather than incurring multiple tlb flushes - while the address space is being pulled down, make the tlb - gathering machinery do a full flush when we're done. */ - tlb->fullmm = 1; - - paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); - tlb_remove_page(tlb, virt_to_page(pmd)); -} +extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
x86.git: wants to build as ia64
Something in current x86.git is making my kernel build as ia64. I don't see any obvious change which might be the cause. : ezr:pts/6; make oldconfig make -C /home/jeremy/hg/xen/paravirt/linux O=/home/jeremy/hg/xen/paravirt/linux-i386/. oldconfig /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 16: ia64-linux-gcc: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 17: ia64-linux-objdump: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 19: [: !=: unary operator expected /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 30: ia64-linux-gcc: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 31: ia64-linux-readelf: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 7: ia64-linux-gcc: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 8: ia64-linux-objdump: command not found /home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 10: [: !=: unary operator expected /home/jeremy/hg/xen/paravirt/linux/scripts/gcc-version.sh: line 25: ia64-linux-gcc: command not found /home/jeremy/hg/xen/paravirt/linux/scripts/gcc-version.sh: line 26: ia64-linux-gcc: command not found GEN /home/jeremy/hg/xen/paravirt/linux-i386/Makefile scripts/kconfig/conf -o arch/ia64/Kconfig drivers/net/Kconfig:1743:warning: 'select' used by config symbol 'CPMAC' refers to undefined symbol 'FIXED_MII_100_FDX' drivers/spi/Kconfig:156:warning: 'select' used by config symbol 'SPI_PXA2XX' refers to undefined symbol 'PXA_SSP' .config:12:warning: trying to assign nonexistent symbol GENERIC_CMOS_UPDATE .config:13:warning: trying to assign nonexistent symbol CLOCKSOURCE_WATCHDOG .config:14:warning: trying to assign nonexistent symbol GENERIC_CLOCKEVENTS .config:15:warning: trying to assign nonexistent symbol GENERIC_CLOCKEVENTS_BROADCAST 
.config:18:warning: trying to assign nonexistent symbol SEMAPHORE_SLEEPERS .config:22:warning: trying to assign nonexistent symbol GENERIC_ISA_DMA .config:25:warning: trying to assign nonexistent symbol GENERIC_HWEIGHT .config:29:warning: trying to assign nonexistent symbol RWSEM_GENERIC_SPINLOCK .config:37:warning: trying to assign nonexistent symbol ZONE_DMA32 .config:43:warning: trying to assign nonexistent symbol X86_SMP .config:44:warning: trying to assign nonexistent symbol X86_32_SMP ... J -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sparc compile error caused by x86 arch updates
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> But I guess I overlooked the comment... I guess the fix is to scatter linux/pagemap.h into the appropriate places where these macros are used (asm-generic/tlb.h, for a start).
>
> no. The fix is always to undo the damage ASAP, to keep the window of breakage minimized.

Yes, sorry about that. Uninlining the asm-x86/pgalloc.h functions is the right thing to do anyway.

J
Re: [PATCH 04 of 11] x86: fix early_ioremap pagetable ops
Ian Campbell wrote:
> This seems to have ended up in f6df72e71eba621b2f5c49b3a763116fac748f6e as:
>
> +	paravirt_release_pt(__pa(pmd) >> PAGE_SHIFT);
>
> and the pmd_populate_kernel hunk is missing altogether.
>
> ---
> From bfa2a08064a269dd7906ed5f60e436360e1360e7 Mon Sep 17 00:00:00 2001
> From: Ian Campbell <[EMAIL PROTECTED]>
> Date: Thu, 31 Jan 2008 18:56:06 +
> Subject: [PATCH] x86: fix early_ioremap pagetable ops for paravirt.
>
> Some important parts of f6df72e71eba621b2f5c49b3a763116fac748f6e got dropped along the way, reintroduce them.

Yep.

Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

> Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
> ---
>  arch/x86/mm/ioremap.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index ed4208e..93d931e 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -302,7 +302,7 @@ void __init early_ioremap_init(void)
>  	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
>  	memset(bm_pte, 0, sizeof(bm_pte));
> -	set_pmd(pmd, __pmd(__pa(bm_pte) | _PAGE_TABLE));
> +	pmd_populate_kernel(&init_mm, pmd, bm_pte);
>
>  	/*
>  	 * The boot-ioremap range spans multiple pmds, for which
> @@ -332,7 +332,7 @@ void __init early_ioremap_clear(void)
>  	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
>  	pmd_clear(pmd);
> -	paravirt_release_pt(__pa(pmd) >> PAGE_SHIFT);
> +	paravirt_release_pt(__pa(bm_pte) >> PAGE_SHIFT);
>  	__flush_tlb_all();
>  }
Re: [PATCH 04 of 11] x86: fix early_ioremap pagetable ops
Ingo Molnar wrote:
> * Ian Campbell <[EMAIL PROTECTED]> wrote:
>
>> Some important parts of f6df72e71eba621b2f5c49b3a763116fac748f6e got dropped along the way, reintroduce them.
>
> thanks, applied. AFAICS it should only affect paravirt, not the native kernel, right?

Correct.

J
x86: PAE swapper_pg_dir needs to be page-sized
Xen currently needs swapper_pg_dir page aligned and sized. This fixes the second part of that...

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/kernel/head_32.S |    1 +
 1 file changed, 1 insertion(+)

===
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -640,6 +640,7 @@
 # else
 #  error "Kernel PMDs should be 1, 2 or 3"
 # endif
+	.align PAGE_SIZE_asm		/* needs to be page-sized too */
 #endif

 .data
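As a rough illustration of what the trailing `.align` buys, the arithmetic can be written out in C. This is a sketch under stated assumptions: 4 KiB pages, and a PAE top-level table of four 8-byte entries (32 bytes of real data). `align_up` and `padding_after` are hypothetical helper names, not kernel functions.

```c
#include <stddef.h>

#define PAGE_SIZE          4096u  /* assumed: 4 KiB pages on 32-bit x86 */
#define PAE_PGD_ENTRIES    4u     /* assumed: four top-level entries in PAE */
#define PAE_PGD_ENTRY_SIZE 8u     /* assumed: each entry is 64 bits wide */

/* Round `offset` up to the next multiple of the power-of-two `align`. */
static inline size_t align_up(size_t offset, size_t align)
{
	return (offset + align - 1) & ~(align - 1);
}

/* Bytes of padding that an alignment directive emits after `used` bytes. */
static inline size_t padding_after(size_t used, size_t align)
{
	return align_up(used, align) - used;
}
```

With these numbers the table occupies 32 bytes, so aligning only its start leaves the rest of the page free for whatever is linked next; aligning after it as well pads the remaining 4064 bytes so that nothing else shares the page.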
Re: x86: PAE swapper_pg_dir needs to be page-sized
Ingo Molnar wrote:
> thanks, applied. I'm wondering, where did we break that?

In the "PAE from boot" patch, I would guess.

J
Re: [PATCH 0 of 4] x86: cleanups from pmd lifetime series
Ingo Molnar wrote:
> FYI, only this one applied to the latest x86.git tree, could you please resend? I guess the pgalloc.h related revert interfered.

OK. I'll do a quick rebase to this morning's tree and resend.

J
[PATCH 0 of 4] x86: cleanups from pmd lifetime series
Hi Ingo,

Here's a followup set from that last batch of patches:

1. fix up the pgd_ctor merge, so that non-PAE will end up getting kernel mappings
2. revert "optimise-pud_clear-cr3-reload"
3. only do a cr3 reload if pud_clear is being used on the active pagetable
4. update documentation about PAE tlb flushing.

The third of these makes pud_clear more robust, since it doesn't rely on it being followed by the right kind of TLB flush. In practice it shouldn't make any performance difference, since the only performance critical paths pud_clear is used on are exit and execve, and they both operate on some other pagetable at the time the old pagetable is being pulled down.

It will generate TLB flushes in the case of a usermode process munmapping a 1+G chunk of its address space, or something to do with unsharing a hugetlbfs mapping. I don't think either of these are performance critical.

Thanks,
J
[PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary
Rather than unconditionally reloading cr3, only do so if the pud we're updating is within the active pgd. This eliminates TLB flushes most of the time.

The performance-critical uses of pud_clear are during execve and exit, but in those cases cr3 is referring to some other pagetable. The only other use of pud_clear is during a large (1Gbyte+) munmap, and those are sufficiently rare that a couple of cr3 reloads won't hurt.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable-3level.h |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -93,17 +93,20 @@
 static inline void pud_clear(pud_t *pudp)
 {
+	unsigned long pgd;
+
 	set_pud(pudp, __pud(0));

 	/*
 	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
 	 * the TLB via cr3 if the top-level pgd is changed...
 	 *
-	 * XXX I don't think we need to worry about this here, since
-	 * when clearing the pud, the calling code needs to flush the
-	 * tlb anyway.  But do it now for safety's sake. - jsgf
+	 * Make sure the pud entry we're updating is within the
+	 * current pgd to avoid unnecessary TLB flushes.
 	 */
-	write_cr3(read_cr3());
+	pgd = read_cr3();
+	if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
+		write_cr3(pgd);
 }

 #define pud_page(pud) \
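The range check in the pud_clear patch above can be modelled in userspace C to see which addresses trigger a reload. This is a simplified sketch, not the kernel code: `pud_in_active_pgd` is a hypothetical name, `cr3` is treated as the plain physical base address of the top-level table, and the PAE layout of four 8-byte entries is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTRS_PER_PGD   4u  /* assumed: PAE has four top-level entries */
#define PGD_ENTRY_SIZE 8u  /* assumed: each entry is 64 bits wide */

/*
 * Return true when the physical address of the entry being cleared
 * lies inside the top-level table that cr3 currently points to.
 * Only in that case would the cr3 reload be issued.
 */
static inline bool pud_in_active_pgd(uint64_t pud_phys, uint64_t cr3)
{
	return pud_phys >= cr3 &&
	       pud_phys < cr3 + (uint64_t)PGD_ENTRY_SIZE * PTRS_PER_PGD;
}
```

For a table at physical address 0x1000, the four entries at 0x1000, 0x1008, 0x1010 and 0x1018 count as active; any entry belonging to another process's table, such as one being torn down during exit or execve, falls outside the 32-byte window and skips the reload.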
[PATCH 4 of 4] x86: update reference for PAE tlb flushing
Remove bogus reference to "Pentium-II erratum A13" and point to the actual canonical source of information about what requirements x86 processors have for PAE pagetable updates.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgalloc_32.h     |    6 ++++--
 include/asm-x86/pgtable-3level.h |    6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -80,8 +80,10 @@
 	set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));

 	/*
-	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
-	 * the TLB via cr3 if the top-level pgd is changed...
+	 * According to Intel App note "TLBs, Paging-Structure Caches,
+	 * and Their Invalidation", April 2007, document 317080-001,
+	 * section 8.1: in PAE mode we explicitly have to flush the
+	 * TLB via cr3 if the top-level pgd is changed...
 	 */
 	if (mm == current->active_mm)
 		write_cr3(read_cr3());
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -98,8 +98,10 @@
 	set_pud(pudp, __pud(0));

 	/*
-	 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
-	 * the TLB via cr3 if the top-level pgd is changed...
+	 * According to Intel App note "TLBs, Paging-Structure Caches,
+	 * and Their Invalidation", April 2007, document 317080-001,
+	 * section 8.1: in PAE mode we explicitly have to flush the
+	 * TLB via cr3 if the top-level pgd is changed...
 	 *
 	 * Make sure the pud entry we're updating is within the
 	 * current pgd to avoid unnecessary TLB flushes.
[PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor
The constructors for PAE and non-PAE pgd_ctors are more or less identical, and can be made into the same function. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: William Irwin <[EMAIL PROTECTED]> --- arch/x86/mm/pgtable_32.c | 58 +- 1 file changed, 22 insertions(+), 36 deletions(-) diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -219,50 +219,39 @@ list_del(&page->lru); } +#define UNSHARED_PTRS_PER_PGD \ + (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD) - -#if (PTRS_PER_PMD == 1) -/* Non-PAE pgd constructor */ -static void pgd_ctor(void *pgd) +static void pgd_ctor(void *p) { + pgd_t *pgd = p; unsigned long flags; - /* !PAE, no pagetable sharing */ + /* Clear usermode parts of PGD */ memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t)); spin_lock_irqsave(&pgd_lock, flags); - /* must happen under lock */ - clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD, - swapper_pg_dir + USER_PTRS_PER_PGD, - KERNEL_PGD_PTRS); - paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT, - __pa(swapper_pg_dir) >> PAGE_SHIFT, - USER_PTRS_PER_PGD, + /* If the pgd points to a shared pagetable level (either the + ptes in non-PAE, or shared PMD in PAE), then just copy the + references from swapper_pg_dir. 
*/ + if (PAGETABLE_LEVELS == 2 || + (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) { + clone_pgd_range(pgd + USER_PTRS_PER_PGD, + swapper_pg_dir + USER_PTRS_PER_PGD, KERNEL_PGD_PTRS); - pgd_list_add(pgd); + paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT, + __pa(swapper_pg_dir) >> PAGE_SHIFT, + USER_PTRS_PER_PGD, + KERNEL_PGD_PTRS); + } + + /* list required to sync kernel mapping updates */ + if (!SHARED_KERNEL_PMD) + pgd_list_add(pgd); + spin_unlock_irqrestore(&pgd_lock, flags); } -#else /* PTRS_PER_PMD > 1 */ -/* PAE pgd constructor */ -static void pgd_ctor(void *pgd) -{ - /* PAE, kernel PMD may be shared */ - - if (SHARED_KERNEL_PMD) { - clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD, - swapper_pg_dir + USER_PTRS_PER_PGD, - KERNEL_PGD_PTRS); - } else { - unsigned long flags; - - memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t)); - spin_lock_irqsave(&pgd_lock, flags); - pgd_list_add(pgd); - spin_unlock_irqrestore(&pgd_lock, flags); - } -} -#endif /* PTRS_PER_PMD */ static void pgd_dtor(void *pgd) { @@ -275,9 +264,6 @@ pgd_list_del(pgd); spin_unlock_irqrestore(&pgd_lock, flags); } - -#define UNSHARED_PTRS_PER_PGD \ - (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD) #ifdef CONFIG_X86_PAE /* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
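The two conditions in the unified pgd_ctor above can be tabulated with a small userspace sketch. The struct and function names here are hypothetical and exist only for illustration; the conditions themselves are copied from the patch. The sketch assumes that `SHARED_KERNEL_PMD` is 0 on non-PAE builds, which the patch text does not state.

```c
#include <stdbool.h>

/* What the unified constructor does for one configuration. */
struct pgd_ctor_plan {
	bool clone_kernel_entries; /* copy kernel slots from swapper_pg_dir */
	bool add_to_pgd_list;      /* track the pgd so kernel updates can be synced */
};

/*
 * Evaluate the two conditions from the patch for a given number of
 * pagetable levels and a given value of SHARED_KERNEL_PMD.
 */
static inline struct pgd_ctor_plan plan_pgd_ctor(int pagetable_levels,
						 bool shared_kernel_pmd)
{
	struct pgd_ctor_plan plan;

	plan.clone_kernel_entries =
		pagetable_levels == 2 ||
		(pagetable_levels == 3 && shared_kernel_pmd);
	plan.add_to_pgd_list = !shared_kernel_pmd;
	return plan;
}
```

Under that assumption the three cases come out as: non-PAE clones the kernel entries and joins the pgd list; PAE with a shared kernel PMD clones but stays off the list; PAE without a shared kernel PMD (the paravirt case) does not clone here and joins the list.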
[PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()"
Revert "defer cr3 reload when doing pud_clear()" since I'm going to replace it. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/mm/pgtable_32.c |7 --- include/asm-x86/pgtable-3level.h | 21 ++--- 2 files changed, 6 insertions(+), 22 deletions(-) diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -373,13 +373,6 @@ void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) { - /* This is called just after the pmd has been detached from - the pgd, which requires a full tlb flush to be recognized - by the CPU. Rather than incurring multiple tlb flushes - while the address space is being pulled down, make the tlb - gathering machinery do a full flush when we're done. */ - tlb->fullmm = 1; - paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); tlb_remove_page(tlb, virt_to_page(pmd)); } diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h --- a/include/asm-x86/pgtable-3level.h +++ b/include/asm-x86/pgtable-3level.h @@ -96,23 +96,14 @@ set_pud(pudp, __pud(0)); /* -* In principle we need to do a cr3 reload here to make sure -* the processor recognizes the changed pgd. In practice, all -* the places where pud_clear() gets called are followed by -* full tlb flushes anyway, so we can defer the cost here. +* Pentium-II erratum A13: in PAE mode we explicitly have to flush +* the TLB via cr3 if the top-level pgd is changed... * -* Specifically: -* -* mm/memory.c:free_pmd_range() - immediately after the -* pud_clear() it does a pmd_free_tlb(). We change the -* mmu_gather structure to do a full tlb flush (which has the -* effect of reloading cr3) when the pagetable free is -* complete. -* -* arch/x86/mm/hugetlbpage.c:huge_pmd_unshare() - the call to -* this is followed by a flush_tlb_range, which on x86 does a -* full tlb flush. +* XXX I don't think we need to worry about this here, since +* when clearing the pud, the calling code needs to flush the +* tlb anyway. 
But do it now for safety's sake. - jsgf */ + write_cr3(read_cr3()); } #define pud_page(pud) \ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [12/12] GBPAGES: Switch direct mapping setup over to set_pte
Andi Kleen wrote:
> [Actually not needed for gbpages, but an independent, but related cleanup]
>
> Use set_pte() for setting up the 2MB pages in the direct mapping similar to what the earlier GBPAGES patches did for the 1GB PUDs.
>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> ---
>  arch/x86/mm/init_64.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> Index: linux/arch/x86/mm/init_64.c
> ===
> --- linux.orig/arch/x86/mm/init_64.c
> +++ linux/arch/x86/mm/init_64.c
> @@ -289,7 +289,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
>  	int i = pmd_index(address);
>
>  	for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
> -		unsigned long entry;
>  		pmd_t *pmd = pmd_page + pmd_index(address);
>
>  		if (address >= end) {
> @@ -303,9 +302,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
>  		if (pmd_val(*pmd))
>  			continue;
>
> -		entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
> -		entry &= __supported_pte_mask;
> -		set_pmd(pmd, __pmd(entry));
> +		set_pte((pte_t *)pmd,
> +			pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));

Why? 64-bit Xen will need this to be set_pmd if it's an update to L2 of the table.

J
Re: [PATCH] [12/12] GBPAGES: Switch direct mapping setup over to set_pte
Andi Kleen wrote:
>> Why? 64-bit Xen will need this to be set_pmd if its an update to L2 of the table.
>
> Then change_page_attr() and hugepages will already not work because they both do exactly that. And I didn't want to duplicate this manual code for the GBpages case, so i changed it everywhere to the standard way.

It's a bit moot because Xen doesn't support any kind of large page yet, but there has been some work in that area. The main problem with using set_pte is that Xen supports trap'n'emulate for pte-level accesses, but not for upper levels.

Looks like you're right about the rest of cpa; may as well make it all consistent for now, and we can fix it later when the need arises.

J
[PATCH 3 of 5] x86/pgtable.h: demacro ptep_set_access_flags
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -287,6 +287,8 @@
 #define pte_update_defer(mm, addr, ptep)	do { } while (0)
 #endif

+#include
+
 /*
  * We only update the dirty/accessed state if we set
  * the dirty bit by hand in the kernel, since the hardware
@@ -295,16 +297,18 @@
  * bit at the same time.
  */
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
-#define ptep_set_access_flags(vma, address, ptep, entry, dirty)	\
-({									\
-	int __changed = !pte_same(*(ptep), entry);			\
-	if (__changed && dirty) {					\
-		*ptep = entry;						\
-		pte_update_defer((vma)->vm_mm, (address), (ptep));	\
-		flush_tlb_page(vma, address);				\
-	}								\
-	__changed;							\
-})
+static inline int ptep_set_access_flags(struct vm_area_struct *vma,
+					unsigned long address, pte_t *ptep,
+					pte_t entry, int dirty)
+{
+	int changed = !pte_same(*ptep, entry);
+	if (changed && dirty) {
+		*ptep = entry;
+		pte_update_defer(vma->vm_mm, address, ptep);
+		flush_tlb_page(vma, address);
+	}
+	return changed;
+}

 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 #define ptep_test_and_clear_young(vma, addr, ptep) ({			\
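The patch above carries no changelog, but one general benefit of turning a function-like macro into a static inline can be shown with a small userspace sketch: a macro re-evaluates its argument expressions each time they appear in the body, while an inline function evaluates them once and type-checks them. Everything below is hypothetical illustration code (`lookup_slot`, `SET_IF_CHANGED_MACRO`, `set_if_changed`), deliberately simpler than `ptep_set_access_flags`.

```c
#include <stddef.h>

static int slot_lookups;        /* counts evaluations of the pointer expression */
static unsigned long slots[4];

/* A pointer-producing expression with a visible side effect. */
static inline unsigned long *lookup_slot(size_t i)
{
	slot_lookups++;
	return &slots[i];
}

/* Macro form: `ptr` appears twice, so it may be evaluated twice. */
#define SET_IF_CHANGED_MACRO(ptr, val) \
	((*(ptr) != (val)) ? (*(ptr) = (val), 1) : 0)

/* Inline form: each argument is evaluated once, with checked types. */
static inline int set_if_changed(unsigned long *ptr, unsigned long val)
{
	int changed = (*ptr != val);

	if (changed)
		*ptr = val;
	return changed;
}
```

Calling `SET_IF_CHANGED_MACRO(lookup_slot(0), 5UL)` on a zeroed slot runs `lookup_slot` twice, whereas `set_if_changed(lookup_slot(1), 5UL)` runs it once; both store the value and report a change.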
[PATCH 0 of 5] x86: add alloc/release_pud; more demacroing
Hi Ingo,

This series:

1. Renames the alloc/release_{pt,pd} calls to _pte, _pmd so that it's
   clear what they operate on.
2. Adds alloc/release_pud, and puts calls in the appropriate places.
3. Demacros some stuff in pgtable.h.

A bit eclectic, but all fairly straightforward (and no changes to
non-x86 headers ;).

Thanks,
    J
[PATCH 2 of 5] x86: add pud_alloc for 4-level pagetables
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/kernel/paravirt.c |    2 ++
 arch/x86/mm/pgtable.c      |    1 +
 include/asm-x86/paravirt.h |   11 +++++++++++
 include/asm-x86/pgalloc.h  |    3 +++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -385,8 +385,10 @@
 	.alloc_pte = paravirt_nop,
 	.alloc_pmd = paravirt_nop,
 	.alloc_pmd_clone = paravirt_nop,
+	.alloc_pud = paravirt_nop,
 	.release_pte = paravirt_nop,
 	.release_pmd = paravirt_nop,
+	.release_pud = paravirt_nop,
 
 	.set_pte = native_set_pte,
 	.set_pte_at = native_set_pte_at,
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -34,6 +34,7 @@
 #if PAGETABLE_LEVELS > 3
 void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
 	tlb_remove_page(tlb, virt_to_page(pud));
 }
 #endif	/* PAGETABLE_LEVELS > 3 */
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -223,8 +223,10 @@
 	void (*alloc_pte)(struct mm_struct *mm, u32 pfn);
 	void (*alloc_pmd)(struct mm_struct *mm, u32 pfn);
 	void (*alloc_pmd_clone)(u32 pfn, u32 clonepfn, u32 start, u32 count);
+	void (*alloc_pud)(struct mm_struct *mm, u32 pfn);
 	void (*release_pte)(u32 pfn);
 	void (*release_pmd)(u32 pfn);
+	void (*release_pud)(u32 pfn);
 
 	/* Pagetable manipulation functions */
 	void (*set_pte)(pte_t *ptep, pte_t pteval);
@@ -918,6 +920,15 @@
 	PVOP_VCALL1(pv_mmu_ops.release_pmd, pfn);
 }
 
+static inline void paravirt_alloc_pud(struct mm_struct *mm, unsigned pfn)
+{
+	PVOP_VCALL2(pv_mmu_ops.alloc_pud, mm, pfn);
+}
+static inline void paravirt_release_pud(unsigned pfn)
+{
+	PVOP_VCALL1(pv_mmu_ops.release_pud, pfn);
+}
+
 #ifdef CONFIG_HIGHPTE
 static inline void *kmap_atomic_pte(struct page *page, enum km_type type)
 {
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -11,8 +11,10 @@
 #define paravirt_alloc_pte(mm, pfn) do { } while (0)
 #define paravirt_alloc_pmd(mm, pfn) do { } while (0)
 #define paravirt_alloc_pmd_clone(pfn, clonepfn, start, count) do { } while (0)
+#define paravirt_alloc_pud(mm, pfn) do { } while (0)
 #define paravirt_release_pte(pfn) do { } while (0)
 #define paravirt_release_pmd(pfn) do { } while (0)
+#define paravirt_release_pud(pfn) do { } while (0)
 #endif
 
 /*
@@ -93,6 +95,7 @@
 #if PAGETABLE_LEVELS > 3
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
+	paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
 }
 
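The pattern in this patch is an ops table whose alloc/release hooks default to a no-op, with the hook called just before a table page is linked in and again when it is freed. The hook receives a page frame number, that is, the physical address shifted right by PAGE_SHIFT. The following is a rough userspace sketch of that shape, not the kernel's implementation: all sketch_* and hv_* names are hypothetical, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical miniature of an ops table with per-level hooks. */
struct sketch_mmu_ops {
	void (*alloc_pud)(uint32_t pfn);
	void (*release_pud)(uint32_t pfn);
};

/* Native default: nothing needs to be told about new table pages. */
static void sketch_nop(uint32_t pfn)
{
	(void)pfn;
}

static struct sketch_mmu_ops sketch_ops = {
	.alloc_pud = sketch_nop,
	.release_pud = sketch_nop,
};

/* A pretend hypervisor backend that records which frame is in use. */
static uint32_t hv_pinned_pfn;
static int hv_pinned;

static void hv_alloc_pud(uint32_t pfn)
{
	hv_pinned_pfn = pfn;
	hv_pinned = 1;
}

static void hv_release_pud(uint32_t pfn)
{
	if (hv_pinned && hv_pinned_pfn == pfn)
		hv_pinned = 0;
}

/* Physical address to page frame number, as in '__pa(x) >> PAGE_SHIFT'. */
static uint32_t sketch_phys_to_pfn(uint64_t phys)
{
	return (uint32_t)(phys >> SKETCH_PAGE_SHIFT);
}

/* Notify the backend first, then (in real code) install the entry. */
static void sketch_populate(uint64_t table_phys)
{
	sketch_ops.alloc_pud(sketch_phys_to_pfn(table_phys));
}

/* Notify the backend before the page is handed back. */
static void sketch_free(uint64_t table_phys)
{
	sketch_ops.release_pud(sketch_phys_to_pfn(table_phys));
}
```

With the default table, sketch_populate(0x5000) does nothing observable. After assigning hv_alloc_pud and hv_release_pud into sketch_ops, the same call records frame 5, and sketch_free(0x5000) clears it again.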
[PATCH 4 of 5] x86/pgtable.h: demacro ptep_test_and_clear_young
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -311,15 +311,17 @@
 }
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-#define ptep_test_and_clear_young(vma, addr, ptep) ({			\
-	int __ret = 0;							\
-	if (pte_young(*(ptep)))						\
-		__ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,		\
-					   &(ptep)->pte);		\
-	if (__ret)							\
-		pte_update((vma)->vm_mm, addr, ptep);			\
-	__ret;								\
-})
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t *ptep)
+{
+	int ret = 0;
+	if (pte_young(*ptep))
+		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
+					 &ptep->pte);
+	if (ret)
+		pte_update(vma->vm_mm, addr, ptep);
+	return ret;
+}
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
 #define ptep_clear_flush_young(vma, address, ptep)			\
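The function in this patch first does a cheap read to see whether the accessed bit is set, and only then performs the atomic test-and-clear. That two-step shape can be sketched in portable C11 as below. This is an illustration only: it uses <stdatomic.h> instead of the kernel's test_and_clear_bit(), the sketch_* names are hypothetical, and bit position 5 for the accessed flag is stated here as an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_BIT_ACCESSED 5	/* assumption: accessed flag lives in bit 5 */

/*
 * Returns 1 if the accessed bit was set (and is now cleared), else 0.
 * The plain load avoids an atomic read-modify-write when the bit is
 * already clear; the fetch-and reports whether this caller cleared it.
 */
static int sketch_test_and_clear_young(_Atomic unsigned long *pte)
{
	const unsigned long mask = 1UL << SKETCH_BIT_ACCESSED;
	int ret = 0;

	if (atomic_load(pte) & mask)
		ret = (atomic_fetch_and(pte, ~mask) & mask) != 0;
	return ret;
}
```

For an entry value of 0x27 the first call returns 1 and leaves 0x07 behind; a second call returns 0 without modifying the entry.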
[PATCH 5 of 5] x86/pgtable.h: demacro ptep_clear_flush_young
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -324,14 +324,15 @@
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-#define ptep_clear_flush_young(vma, address, ptep)			\
-({									\
-	int __young;							\
-	__young = ptep_test_and_clear_young((vma), (address), (ptep));	\
-	if (__young)							\
-		flush_tlb_page(vma, address);				\
-	__young;							\
-})
+static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+					 unsigned long address, pte_t *ptep)
+{
+	int young;
+	young = ptep_test_and_clear_young(vma, address, ptep);
+	if (young)
+		flush_tlb_page(vma, address);
+	return young;
+}
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
[PATCH 1 of 5] x86: rename paravirt_alloc_pt etc after the pagetable structure
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name of the appropriate pagetable level structure. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/kernel/paravirt.c | 10 +- arch/x86/kernel/vmi_32.c | 20 ++-- arch/x86/mm/init_32.c |6 +++--- arch/x86/mm/ioremap.c |2 +- arch/x86/mm/pageattr.c |2 +- arch/x86/mm/pgtable.c | 16 arch/x86/xen/enlighten.c | 30 +++--- include/asm-x86/paravirt.h | 32 include/asm-x86/pgalloc.h | 16 9 files changed, 67 insertions(+), 67 deletions(-) diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -382,11 +382,11 @@ .flush_tlb_single = native_flush_tlb_single, .flush_tlb_others = native_flush_tlb_others, - .alloc_pt = paravirt_nop, - .alloc_pd = paravirt_nop, - .alloc_pd_clone = paravirt_nop, - .release_pt = paravirt_nop, - .release_pd = paravirt_nop, + .alloc_pte = paravirt_nop, + .alloc_pmd = paravirt_nop, + .alloc_pmd_clone = paravirt_nop, + .release_pte = paravirt_nop, + .release_pmd = paravirt_nop, .set_pte = native_set_pte, .set_pte_at = native_set_pte_at, diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c --- a/arch/x86/kernel/vmi_32.c +++ b/arch/x86/kernel/vmi_32.c @@ -392,13 +392,13 @@ } #endif -static void vmi_allocate_pt(struct mm_struct *mm, u32 pfn) +static void vmi_allocate_pte(struct mm_struct *mm, u32 pfn) { vmi_set_page_type(pfn, VMI_PAGE_L1); vmi_ops.allocate_page(pfn, VMI_PAGE_L1, 0, 0, 0); } -static void vmi_allocate_pd(struct mm_struct *mm, u32 pfn) +static void vmi_allocate_pmd(struct mm_struct *mm, u32 pfn) { /* * This call comes in very early, before mem_map is setup. 
@@ -409,20 +409,20 @@ vmi_ops.allocate_page(pfn, VMI_PAGE_L2, 0, 0, 0); } -static void vmi_allocate_pd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count) +static void vmi_allocate_pmd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count) { vmi_set_page_type(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE); vmi_check_page_type(clonepfn, VMI_PAGE_L2); vmi_ops.allocate_page(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE, clonepfn, start, count); } -static void vmi_release_pt(u32 pfn) +static void vmi_release_pte(u32 pfn) { vmi_ops.release_page(pfn, VMI_PAGE_L1); vmi_set_page_type(pfn, VMI_PAGE_NORMAL); } -static void vmi_release_pd(u32 pfn) +static void vmi_release_pmd(u32 pfn) { vmi_ops.release_page(pfn, VMI_PAGE_L2); vmi_set_page_type(pfn, VMI_PAGE_NORMAL); @@ -871,15 +871,15 @@ vmi_ops.allocate_page = vmi_get_function(VMI_CALL_AllocatePage); if (vmi_ops.allocate_page) { - pv_mmu_ops.alloc_pt = vmi_allocate_pt; - pv_mmu_ops.alloc_pd = vmi_allocate_pd; - pv_mmu_ops.alloc_pd_clone = vmi_allocate_pd_clone; + pv_mmu_ops.alloc_pte = vmi_allocate_pte; + pv_mmu_ops.alloc_pmd = vmi_allocate_pmd; + pv_mmu_ops.alloc_pmd_clone = vmi_allocate_pmd_clone; } vmi_ops.release_page = vmi_get_function(VMI_CALL_ReleasePage); if (vmi_ops.release_page) { - pv_mmu_ops.release_pt = vmi_release_pt; - pv_mmu_ops.release_pd = vmi_release_pd; + pv_mmu_ops.release_pte = vmi_release_pte; + pv_mmu_ops.release_pmd = vmi_release_pmd; } /* Set linear is needed in all cases */ diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -68,7 +68,7 @@ if (!(pgd_val(*pgd) & _PAGE_PRESENT)) { pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE); - paravirt_alloc_pd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT); + paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT); set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT)); pud = pud_offset(pgd, 0); BUG_ON(pmd_table != pmd_offset(pud, 0)); @@ -97,7 +97,7 @@ (pte_t *)alloc_bootmem_low_pages(PAGE_SIZE); } - 
paravirt_alloc_pt(&init_mm, __pa(page_table) >> PAGE_SHIFT); + paravirt_alloc_pte(&init_mm, __pa(page_table) >> PAGE_SHIFT); set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE)); BUG_ON(page_table != pte_offset_kernel(pmd, 0)); } @@ -374,7 +374,7 @@ pte_clear(NULL, va, pte); } - paravirt_alloc_pd(&init_mm, __pa(base) >> PAGE_SHIFT); + paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT); } void __init native_
[PATCH 0 of 7] x86: more pgalloc unification
Hi Ingo,

This series does more unification of pgalloc, and creates a unified
mm/pgtable.c for common pagetable functions. It ends up removing
pgalloc_32/64.h in favour of pgalloc.h.

[ I thought I'd mailed this earlier, but I don't see it on lkml. Maybe
I created the mbox without sending it. Anyway, this should go before
the set I just mailed. ]

Thanks,
    J
[PATCH 1 of 7] x86: convert pgalloc_64.h from macros to inlines
Convert asm-x86/pgalloc_64.h from macros into inline functions. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- include/asm-x86/pgalloc_64.h | 41 ++--- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -842,3 +842,18 @@ return 0; } #endif + +void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) +{ + tlb_remove_page(tlb, pte); +} + +void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) +{ + tlb_remove_page(tlb, virt_to_page(pmd)); +} + +void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud) +{ + tlb_remove_page(tlb, virt_to_page(pud)); +} diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h --- a/include/asm-x86/pgalloc_64.h +++ b/include/asm-x86/pgalloc_64.h @@ -1,16 +1,24 @@ #ifndef _X86_64_PGALLOC_H #define _X86_64_PGALLOC_H -#include #include #include +#include -#define pmd_populate_kernel(mm, pmd, pte) \ - set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte))) -#define pud_populate(mm, pud, pmd) \ - set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd))) -#define pgd_populate(mm, pgd, pud) \ - set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud))) +static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte) +{ + set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte))); +} + +static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) +{ + set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd))); +} + +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud) +{ + set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud))); +} static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte) { @@ -109,11 +117,10 @@ static inline void pte_free(struct page *pte) { __free_page(pte); -} +} -#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte)) - -#define __pmd_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x)) -#define __pud_free_tlb(tlb,x) tlb_remove_page((tlb),virt_to_page(x)) +extern 
void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte); +extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); +extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud); #endif /* _X86_64_PGALLOC_H */
[PATCH 3 of 7] x86: put paravirt stubs into common asm/pgalloc.h
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/mm/pageattr.c |2 -- include/asm-x86/pgalloc.h| 11 +++ include/asm-x86/pgalloc_32.h | 10 -- 3 files changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -249,9 +249,7 @@ address = __pa(address); addr = address & LARGE_PAGE_MASK; pbase = (pte_t *)page_address(base); -#ifdef CONFIG_X86_32 paravirt_alloc_pt(&init_mm, page_to_pfn(base)); -#endif pgprot_val(ref_prot) &= ~_PAGE_NX; for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h --- a/include/asm-x86/pgalloc.h +++ b/include/asm-x86/pgalloc.h @@ -4,6 +4,16 @@ #include #include /* for struct page */ #include + +#ifdef CONFIG_PARAVIRT +#include +#else +#define paravirt_alloc_pt(mm, pfn) do { } while (0) +#define paravirt_alloc_pd(mm, pfn) do { } while (0) +#define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) do { } while (0) +#define paravirt_release_pt(pfn) do { } while (0) +#define paravirt_release_pd(pfn) do { } while (0) +#endif /* * Allocate and free page tables. 
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h --- a/include/asm-x86/pgalloc_32.h +++ b/include/asm-x86/pgalloc_32.h @@ -1,15 +1,5 @@ #ifndef _I386_PGALLOC_H #define _I386_PGALLOC_H - -#ifdef CONFIG_PARAVIRT -#include -#else -#define paravirt_alloc_pt(mm, pfn) do { } while (0) -#define paravirt_alloc_pd(mm, pfn) do { } while (0) -#define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) do { } while (0) -#define paravirt_release_pt(pfn) do { } while (0) -#define paravirt_release_pd(pfn) do { } while (0) -#endif static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
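The stubs moved by this patch use the `do { } while (0)` idiom so that, when paravirt support is compiled out, a hook call still behaves as exactly one statement and can sit in an unbraced if/else. The following is a small sketch of that idea under assumed names: SKETCH_PARAVIRT, sketch_alloc_pt and sketch_map are hypothetical and only mirror the shape of the real macros.

```c
#include <assert.h>

/* Define SKETCH_PARAVIRT to compile the real hook instead of the stub. */
/* #define SKETCH_PARAVIRT 1 */

static int sketch_hook_calls;	/* incremented only by the real hook */

#ifdef SKETCH_PARAVIRT
static inline void sketch_alloc_pt(unsigned long pfn)
{
	(void)pfn;
	sketch_hook_calls++;
}
#else
/* Empty stub: expands to a single statement that does nothing. */
#define sketch_alloc_pt(pfn) do { } while (0)
#endif

static int sketch_map(unsigned long pfn, int notify)
{
	(void)pfn;	/* the stub does not evaluate its argument */

	if (notify)
		sketch_alloc_pt(pfn);	/* still one statement, so 'else' binds correctly */
	else
		return -1;
	return 0;
}
```

Had the stub been defined as an empty macro body or as a bare `{ }` block, the trailing semicolon at the call site would either leave a stray empty statement or break the if/else pairing. One known cost of macro stubs, visible above, is that arguments are not evaluated or type-checked when the hook is compiled out.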
[PATCH 4 of 7] x86: move pte functions into common asm/pgalloc.h
Common definitions for 2-level pagetable functions. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/mm/pgtable.c|6 ++ arch/x86/mm/pgtable_32.c |6 -- include/asm-x86/pgalloc.h| 16 include/asm-x86/pgalloc_32.h | 17 - include/asm-x86/pgalloc_64.h | 19 --- 5 files changed, 22 insertions(+), 42 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -843,11 +843,6 @@ } #endif -void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) -{ - tlb_remove_page(tlb, pte); -} - void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) { tlb_remove_page(tlb, virt_to_page(pmd)); diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -16,6 +16,12 @@ pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0); #endif return pte; +} + +void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) +{ + paravirt_release_pt(page_to_pfn(pte)); + tlb_remove_page(tlb, pte); } #ifdef CONFIG_X86_64 diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -178,12 +178,6 @@ __VMALLOC_RESERVE += reserve; } -void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte) -{ - paravirt_release_pt(page_to_pfn(pte)); - tlb_remove_page(tlb, pte); -} - #ifdef CONFIG_X86_PAE void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h --- a/include/asm-x86/pgalloc.h +++ b/include/asm-x86/pgalloc.h @@ -24,6 +24,22 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address); struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address); +/* Should really implement gc for free page table pages. This could be + done with a reference count in struct page. 
*/ + +static inline void pte_free_kernel(pte_t *pte) +{ + BUG_ON((unsigned long)pte & (PAGE_SIZE-1)); + free_page((unsigned long)pte); +} + +static inline void pte_free(struct page *pte) +{ + __free_page(pte); +} + +extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte); + #ifdef CONFIG_X86_32 # include "pgalloc_32.h" #else diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h --- a/include/asm-x86/pgalloc_32.h +++ b/include/asm-x86/pgalloc_32.h @@ -15,23 +15,6 @@ paravirt_alloc_pt(mm, pfn); set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE)); } - -/* - * Allocate and free page tables. - */ - -static inline void pte_free_kernel(pte_t *pte) -{ - free_page((unsigned long)pte); -} - -static inline void pte_free(struct page *pte) -{ - __free_page(pte); -} - - -extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte); #ifdef CONFIG_X86_PAE /* diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h --- a/include/asm-x86/pgalloc_64.h +++ b/include/asm-x86/pgalloc_64.h @@ -45,21 +45,6 @@ free_page((unsigned long)pud); } -/* Should really implement gc for free page table pages. This could be - done with a reference count in struct page. */ - -static inline void pte_free_kernel(pte_t *pte) -{ - BUG_ON((unsigned long)pte & (PAGE_SIZE-1)); - free_page((unsigned long)pte); -} - -static inline void pte_free(struct page *pte) -{ - __free_page(pte); -} - -extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte); extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
[PATCH 2 of 7] x86: add common mm/pgtable.c
Add a common arch/x86/mm/pgtable.c file for common pagetable functions. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/mm/Makefile_32 |2 arch/x86/mm/Makefile_64 |2 arch/x86/mm/pgtable.c| 234 ++ arch/x86/mm/pgtable_32.c | 185 - include/asm-x86/pgalloc.h| 19 +++ include/asm-x86/pgalloc_32.h | 11 - include/asm-x86/pgalloc_64.h | 61 -- 7 files changed, 255 insertions(+), 259 deletions(-) diff --git a/arch/x86/mm/Makefile_32 b/arch/x86/mm/Makefile_32 --- a/arch/x86/mm/Makefile_32 +++ b/arch/x86/mm/Makefile_32 @@ -2,7 +2,7 @@ # Makefile for the linux i386-specific parts of the memory manager. # -obj-y := init_32.o pgtable_32.o fault.o ioremap.o extable.o pageattr.o mmap.o +obj-y := init_32.o pgtable.o pgtable_32.o fault.o ioremap.o extable.o pageattr.o mmap.o obj-$(CONFIG_NUMA) += discontig_32.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o diff --git a/arch/x86/mm/Makefile_64 b/arch/x86/mm/Makefile_64 --- a/arch/x86/mm/Makefile_64 +++ b/arch/x86/mm/Makefile_64 @@ -2,7 +2,7 @@ # Makefile for the linux x86_64-specific parts of the memory manager. 
# -obj-y := init_64.o fault.o ioremap.o extable.o pageattr.o mmap.o +obj-y := init_64.o fault.o ioremap.o extable.o pageattr.o pgtable.o mmap.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_NUMA) += numa_64.o obj-$(CONFIG_K8_NUMA) += k8topology_64.o diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c new file mode 100644 --- /dev/null +++ b/arch/x86/mm/pgtable.c @@ -0,0 +1,235 @@ +#include +#include + +pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) +{ + return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO); +} + +struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address) +{ + struct page *pte; + +#ifdef CONFIG_HIGHPTE + pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO, 0); +#else + pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0); +#endif + return pte; +} + +#ifdef CONFIG_X86_64 +static inline void pgd_list_add(pgd_t *pgd) +{ + struct page *page = virt_to_page(pgd); + + spin_lock(&pgd_lock); + list_add(&page->lru, &pgd_list); + spin_unlock(&pgd_lock); +} + +static inline void pgd_list_del(pgd_t *pgd) +{ + struct page *page = virt_to_page(pgd); + + spin_lock(&pgd_lock); + list_del(&page->lru); + spin_unlock(&pgd_lock); +} + +pgd_t *pgd_alloc(struct mm_struct *mm) +{ + unsigned boundary; + pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT); + if (!pgd) + return NULL; + pgd_list_add(pgd); + /* +* Copy kernel pointers in from init. +* Could keep a freelist or slab cache of those because the kernel +* part never changes. 
+*/ + boundary = pgd_index(__PAGE_OFFSET); + memset(pgd, 0, boundary * sizeof(pgd_t)); + memcpy(pgd + boundary, + init_level4_pgt + boundary, + (PTRS_PER_PGD - boundary) * sizeof(pgd_t)); + return pgd; +} + +void pgd_free(pgd_t *pgd) +{ + BUG_ON((unsigned long)pgd & (PAGE_SIZE-1)); + pgd_list_del(pgd); + free_page((unsigned long)pgd); +} + +#else +/* + * List of all pgd's needed for non-PAE so it can invalidate entries + * in both cached and uncached pgd's; not needed for PAE since the + * kernel pmd is shared. If PAE were not to share the pmd a similar + * tactic would be needed. This is essentially codepath-based locking + * against pageattr.c; it is the unique case in which a valid change + * of kernel pagetables can't be lazily synchronized by vmalloc faults. + * vmalloc faults work because attached pagetables are never freed. + * -- wli + */ +static inline void pgd_list_add(pgd_t *pgd) +{ + struct page *page = virt_to_page(pgd); + + list_add(&page->lru, &pgd_list); +} + +static inline void pgd_list_del(pgd_t *pgd) +{ + struct page *page = virt_to_page(pgd); + + list_del(&page->lru); +} + +#define UNSHARED_PTRS_PER_PGD \ + (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD) + +static void pgd_ctor(void *p) +{ + pgd_t *pgd = p; + unsigned long flags; + + /* Clear usermode parts of PGD */ + memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t)); + + spin_lock_irqsave(&pgd_lock, flags); + + /* If the pgd points to a shared pagetable level (either the + ptes in non-PAE, or shared PMD in PAE), then just copy the + references from swapper_pg_dir. */ + if (PAGETABLE_LEVELS == 2 || + (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) { + clone_pgd_range(pgd + USER_PTRS_PER_PGD, + swapper_pg_dir + USER_PTRS_PER_PGD, +
[PATCH 7 of 7] x86: move all the pgd_list handling to one place
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c |   24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -39,32 +39,30 @@
 #endif	/* PAGETABLE_LEVELS > 3 */
 #endif	/* PAGETABLE_LEVELS > 2 */
 
-#ifdef CONFIG_X86_64
 static inline void pgd_list_add(pgd_t *pgd)
 {
 	struct page *page = virt_to_page(pgd);
 
-	spin_lock(&pgd_lock);
 	list_add(&page->lru, &pgd_list);
-	spin_unlock(&pgd_lock);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
 	struct page *page = virt_to_page(pgd);
 
-	spin_lock(&pgd_lock);
 	list_del(&page->lru);
-	spin_unlock(&pgd_lock);
 }
 
+#ifdef CONFIG_X86_64
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	unsigned boundary;
 	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
 	if (!pgd)
 		return NULL;
+	spin_lock(&pgd_lock);
 	pgd_list_add(pgd);
+	spin_unlock(&pgd_lock);
 	/*
 	 * Copy kernel pointers in from init.
 	 * Could keep a freelist or slab cache of those because the kernel
@@ -81,7 +79,9 @@
 void pgd_free(pgd_t *pgd)
 {
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+	spin_lock(&pgd_lock);
 	pgd_list_del(pgd);
+	spin_unlock(&pgd_lock);
 	free_page((unsigned long)pgd);
 }
 
@@ -96,20 +96,6 @@
  * vmalloc faults work because attached pagetables are never freed.
  * -- wli
  */
-static inline void pgd_list_add(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-
-	list_add(&page->lru, &pgd_list);
-}
-
-static inline void pgd_list_del(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-
-	list_del(&page->lru);
-}
-
 #define UNSHARED_PTRS_PER_PGD				\
 	(SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
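What this patch does is hoist the locking out of the list helpers and into their callers, so that one lock-free pair of helpers can serve both a caller that only needs the lock for the list operation and a caller that already holds it around a larger section (which appears to be the situation of the 32-bit constructor, judging from the earlier patch in this series). The following is a single-threaded sketch of that refactoring. The "lock" is just a flag with assertions standing in for a spinlock, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct sketch_node {
	struct sketch_node *next, *prev;
};

static struct sketch_node sketch_list = { &sketch_list, &sketch_list };

/* Stand-in for a spinlock: a flag, checked with assertions. */
static int sketch_lock_held;

static void sketch_lock(void)
{
	assert(!sketch_lock_held);	/* would deadlock with a real spinlock */
	sketch_lock_held = 1;
}

static void sketch_unlock(void)
{
	assert(sketch_lock_held);
	sketch_lock_held = 0;
}

/* Helpers only touch the list; the caller must already hold the lock. */
static void sketch_list_add(struct sketch_node *n)
{
	assert(sketch_lock_held);
	n->next = sketch_list.next;
	n->prev = &sketch_list;
	sketch_list.next->prev = n;
	sketch_list.next = n;
}

static void sketch_list_del(struct sketch_node *n)
{
	assert(sketch_lock_held);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Caller that needs the lock only for the list operation. */
static void sketch_register(struct sketch_node *n)
{
	sketch_lock();
	sketch_list_add(n);
	sketch_unlock();
}

/* Caller that does other work under the same lock. */
static void sketch_register_and_init(struct sketch_node *n, int *state)
{
	sketch_lock();
	*state = 1;		/* other state protected by the lock */
	sketch_list_add(n);	/* no nested locking needed */
	sketch_unlock();
}

static int sketch_list_len(void)
{
	int len = 0;
	struct sketch_node *p;

	for (p = sketch_list.next; p != &sketch_list; p = p->next)
		len++;
	return len;
}
```

If the helpers still took the lock themselves, sketch_register_and_init() would trip the assertion in sketch_lock(), which is the single-threaded analogue of self-deadlock on a non-recursive spinlock.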
[PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h
Common definitions for 4-level pagetable functions. Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- arch/x86/mm/pgtable.c|7 ++ include/asm-x86/pgalloc.h| 46 -- include/asm-x86/pgalloc_32.h | 24 - include/asm-x86/pgalloc_64.h | 32 - 4 files changed, 47 insertions(+), 62 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -842,8 +842,3 @@ return 0; } #endif - -void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud) -{ - tlb_remove_page(tlb, virt_to_page(pud)); -} diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -30,6 +30,13 @@ paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT); tlb_remove_page(tlb, virt_to_page(pmd)); } + +#if PAGETABLE_LEVELS > 3 +void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud) +{ + tlb_remove_page(tlb, virt_to_page(pud)); +} +#endif /* PAGETABLE_LEVELS > 3 */ #endif /* PAGETABLE_LEVELS > 2 */ #ifdef CONFIG_X86_64 diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h --- a/include/asm-x86/pgalloc.h +++ b/include/asm-x86/pgalloc.h @@ -69,12 +69,46 @@ } extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd); + +static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd) +{ + paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT); + + /* Note: almost everything apart from _PAGE_PRESENT is + reserved at the pmd (PDPT) level. */ + set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT)); + +#ifdef CONFIG_X86_PAE + /* +* According to Intel App note "TLBs, Paging-Structure Caches, +* and Their Invalidation", April 2007, document 317080-001, +* section 8.1: in PAE mode we explicitly have to flush the +* TLB via cr3 if the top-level pgd is changed... 
+*/ + if (mm == current->active_mm) + write_cr3(read_cr3()); +#endif /* CONFIG_X86_PAE */ +} + +#if PAGETABLE_LEVELS > 3 +static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud) +{ + set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud))); +} + +static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) +{ + return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT); +} + +static inline void pud_free(pud_t *pud) +{ + BUG_ON((unsigned long)pud & (PAGE_SIZE-1)); + free_page((unsigned long)pud); +} + +extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud); +#endif /* PAGETABLE_LEVELS > 3 */ #endif /* PAGETABLE_LEVELS > 2 */ -#ifdef CONFIG_X86_32 -# include "pgalloc_32.h" -#else -# include "pgalloc_64.h" -#endif - #endif /* _ASM_X86_PGALLOC_H */ diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h deleted file mode 100644 --- a/include/asm-x86/pgalloc_32.h +++ /dev/null @@ -1,24 +0,0 @@ -#ifndef _I386_PGALLOC_H -#define _I386_PGALLOC_H - -#ifdef CONFIG_X86_PAE -static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd) -{ - paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT); - - /* Note: almost everything apart from _PAGE_PRESENT is - reserved at the pmd (PDPT) level. */ - set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT)); - - /* -* According to Intel App note "TLBs, Paging-Structure Caches, -* and Their Invalidation", April 2007, document 317080-001, -* section 8.1: in PAE mode we explicitly have to flush the -* TLB via cr3 if the top-level pgd is changed... 
-*/ - if (mm == current->active_mm) - write_cr3(read_cr3()); -} -#endif /* CONFIG_X86_PAE */ - -#endif /* _I386_PGALLOC_H */ diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h deleted file mode 100644 --- a/include/asm-x86/pgalloc_64.h +++ /dev/null @@ -1,29 +0,0 @@ -#ifndef _X86_64_PGALLOC_H -#define _X86_64_PGALLOC_H - -#include - -static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) -{ - set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd))); -} - -static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud) -{ - set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud))); -} - -static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) -{ - return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT); -} - -static inline void pud_free (pud_t *pud) -{ - BUG_ON((unsigned long)pud & (PAGE_SIZE-1)); - free_page((unsigned long)pud); -} - -extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud); - -#endif /* _X86_64_PGALLOC_H */
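The populate functions in this patch all build an upper-level entry the same way: the physical address of the next-level table, ORed with a set of flag bits. The diff shows two flag sets in use, _PAGE_PRESENT alone in the variant that came from the PAE header and _PAGE_TABLE in the removed 64-bit variant. The following is a simplified sketch of that encoding. The SK_* constants are assumptions chosen to resemble x86 low flag bits, the mask ignores high bits such as NX, and none of the names are kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag layout for illustration only. */
#define SK_PRESENT	(1ULL << 0)
#define SK_RW		(1ULL << 1)
#define SK_USER		(1ULL << 2)
#define SK_ACCESSED	(1ULL << 5)
#define SK_DIRTY	(1ULL << 6)
#define SK_TABLE	(SK_PRESENT | SK_RW | SK_USER | SK_ACCESSED | SK_DIRTY)

/* Simplification: everything above the low 12 bits is address. */
#define SK_ADDR_MASK	(~0xfffULL)

/* Build an entry pointing at a page-aligned next-level table. */
static uint64_t sk_make_entry(uint64_t table_phys, uint64_t flags)
{
	return (table_phys & SK_ADDR_MASK) | flags;
}

/* Split an entry back into its two parts. */
static uint64_t sk_entry_phys(uint64_t entry)
{
	return entry & SK_ADDR_MASK;
}

static uint64_t sk_entry_flags(uint64_t entry)
{
	return entry & ~SK_ADDR_MASK;
}
```

Under these assumptions, a table at physical address 0x1234000 encodes as 0x1234067 with the SK_TABLE flags and as 0x1234001 with SK_PRESENT alone, which makes the difference between the two variants in the diff easy to see in a debugger dump.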
[PATCH 5 of 7] x86: move pmd functions into common asm/pgalloc.h
Common definitions for 3-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c        |    8
 arch/x86/mm/pgtable_32.c     |   10 --
 include/asm-x86/pgalloc.h    |   31 +++
 include/asm-x86/pgalloc_32.h |   31 ---
 include/asm-x86/pgalloc_64.h |   26 --
 5 files changed, 39 insertions(+), 67 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -843,11 +843,6 @@
 }
 #endif
 
-void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
-{
-	tlb_remove_page(tlb, virt_to_page(pmd));
-}
-
 void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
 	tlb_remove_page(tlb, virt_to_page(pud));
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -23,6 +23,14 @@
 	paravirt_release_pt(page_to_pfn(pte));
 	tlb_remove_page(tlb, pte);
 }
+
+#if PAGETABLE_LEVELS > 2
+void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
+{
+	paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
+	tlb_remove_page(tlb, virt_to_page(pmd));
+}
+#endif	/* PAGETABLE_LEVELS > 2 */
 
 #ifdef CONFIG_X86_64
 static inline void pgd_list_add(pgd_t *pgd)
diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -177,13 +177,3 @@
 	__FIXADDR_TOP = -reserve - PAGE_SIZE;
 	__VMALLOC_RESERVE += reserve;
 }
-
-#ifdef CONFIG_X86_PAE
-
-void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
-{
-	paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-	tlb_remove_page(tlb, virt_to_page(pmd));
-}
-
-#endif
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -40,6 +40,37 @@
 extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
 
+static inline void pmd_populate_kernel(struct mm_struct *mm,
+				       pmd_t *pmd, pte_t *pte)
+{
+	paravirt_alloc_pt(mm, __pa(pte) >> PAGE_SHIFT);
+	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+				struct page *pte)
+{
+	unsigned long pfn = page_to_pfn(pte);
+
+	paravirt_alloc_pt(mm, pfn);
+	set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+}
+
+#if PAGETABLE_LEVELS > 2
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+}
+
+static inline void pmd_free(pmd_t *pmd)
+{
+	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+	free_page((unsigned long)pmd);
+}
+
+extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
+#endif	/* PAGETABLE_LEVELS > 2 */
+
 #ifdef CONFIG_X86_32
 # include "pgalloc_32.h"
 #else
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -1,38 +1,7 @@
 #ifndef _I386_PGALLOC_H
 #define _I386_PGALLOC_H
 
-static inline void pmd_populate_kernel(struct mm_struct *mm,
-				       pmd_t *pmd, pte_t *pte)
-{
-	paravirt_alloc_pt(mm, __pa(pte) >> PAGE_SHIFT);
-	set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte)
-{
-	unsigned long pfn = page_to_pfn(pte);
-
-	paravirt_alloc_pt(mm, pfn);
-	set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
-}
-
 #ifdef CONFIG_X86_PAE
-/*
- * In the PAE case we free the pmds as part of the pgd.
- */
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-	return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
-}
-
-static inline void pmd_free(pmd_t *pmd)
-{
-	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-	free_page((unsigned long)pmd);
-}
-
-extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
-
 static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
 	paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT);
diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h
--- a/include/asm-x86/pgalloc_64.h
+++ b/include/asm-x86/pgalloc_64.h
@@ -2,11 +2,6 @@
 #define _X86_64_PGALLOC_H
 
 #include
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
-{
-	set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)));
-}
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 {
@@ -16,22 +11,6 @@
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
 	set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
-}
-
-stati
Re: [PATCH 3 of 5] x86/pgtable.h: demacro ptep_set_access_flags
Ingo Molnar wrote:
> another thing: these inlines are a bit fat and they are used in more than one place. Please move them into pgtable.c. The rule of thumb is: if an inline is more than 2 lines big, it is a likely candidate for uninlining. (and even many 2-liners, and even some 1-liners are candidates) Especially under paravirt the MMU inlines grow these update notifiers so they become even fatter.

I agree, but I wanted to keep it semantically equivalent to the original. I'll add a move to out of line patch.

> having functions instead of inlines also simplifies the type dependencies by quite a degree.

Indeed, the floating asm/tlbflush.h is a bit of a wart.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
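As a rough illustration of the point about out-of-line functions and type dependencies, here is a minimal userspace sketch. It is not the kernel code: `vma_sketch`, `pteval_sketch_t` and `set_access_flags_sketch` are hypothetical names, and the TLB flush that the real ptep_set_access_flags would perform is modelled only as a counter.

```c
/*
 * "Header" part: an out-of-line function only needs a forward
 * declaration of the struct, so the header does not have to pull in
 * the full definition (or whatever header provides the TLB flush).
 */
struct vma_sketch;
typedef unsigned long pteval_sketch_t;

int set_access_flags_sketch(struct vma_sketch *vma, pteval_sketch_t *ptep,
			    pteval_sketch_t entry, int dirty);

/* ".c" part: the full type and the function body live in one place. */
struct vma_sketch {
	unsigned long flushes;	/* stands in for a TLB flush */
};

int set_access_flags_sketch(struct vma_sketch *vma, pteval_sketch_t *ptep,
			    pteval_sketch_t entry, int dirty)
{
	int changed = (*ptep != entry);

	/* Only write the entry (and "flush") when something changed. */
	if (changed && dirty) {
		*ptep = entry;
		vma->flushes++;
	}
	return changed;
}
```

Callers that include only the "header" part can pass a `struct vma_sketch *` around without ever seeing its layout, which is the dependency simplification mentioned above.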
Re: [PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>> Common definitions for 4-level pagetable functions.
>>
>> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
>> ---
>>  arch/x86/mm/pgtable.c        |    7 ++
>>  include/asm-x86/pgalloc.h    |   46 --
>>  include/asm-x86/pgalloc_32.h |   24 -
>>  include/asm-x86/pgalloc_64.h |   32 -
>>  4 files changed, 47 insertions(+), 62 deletions(-)
>
> random-qa found an early bootup hang on 32-bit (config attached).

The config you sent was 64-bit.

> i bisected it down to this patch of yours. It's a bit large so it's not obvious what is happening. Could you please keep patches that do functional changes smaller?

Will do, though this one is more or less pure code motion. But I can make it actual pure code motion with a separate merge patch.

    J
Re: [PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h
Ingo Molnar wrote:
> yes but the early hang is very real so either my hardware is stubbornly ignoring that your patch is pure code movement (in which case i'll have to have a word or two with my hardware), or your patch is perhaps wrong somewhere ;-)

I see what it is. set_pud on 32-bit needs only _PAGE_PRESENT, but for 64-bit it needs _PAGE_TABLE.

> generally you can protect yourself against full reverts by separating the NOP changes from the non-NOP changes. If a change is small enough i might spot the bug immediately and fix it - otherwise i have to undo your whole series to keep the x86.git ball rolling. I thought we went through this exercise a few times already :-/

... This one was supposed to be pure motion, but my eyeball diff failed me.

    J
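To make the _PAGE_PRESENT versus _PAGE_TABLE difference concrete, here is a small userspace sketch. The flag values are the standard x86 page-table bits; `make_pud_entry` and the `*_BIT` names are hypothetical and only model which flags get ORed into the entry. My understanding (not stated in the thread) is that the PAE top-level entries treat the RW/USER bits as reserved, which is why the 32-bit side uses the narrower flag set.

```c
#include <stdint.h>

/* Low flag bits of an x86 page-table entry. */
#define PAGE_PRESENT_BIT   0x001u
#define PAGE_RW_BIT        0x002u
#define PAGE_USER_BIT      0x004u
#define PAGE_ACCESSED_BIT  0x020u
#define PAGE_DIRTY_BIT     0x040u

/* Same combination the kernel calls _PAGE_TABLE (0x067). */
#define PAGE_TABLE_BITS \
	(PAGE_PRESENT_BIT | PAGE_RW_BIT | PAGE_USER_BIT | \
	 PAGE_ACCESSED_BIT | PAGE_DIRTY_BIT)

/*
 * Hypothetical helper: build the value a pud_populate()-style function
 * would store for a pmd page at physical address pmd_phys.
 */
static uint64_t make_pud_entry(uint64_t pmd_phys, int is_64bit)
{
	uint64_t flags = is_64bit ? PAGE_TABLE_BITS : PAGE_PRESENT_BIT;

	return pmd_phys | flags;
}
```

A "common" pud_populate that picks one flag set for both cases changes behaviour on the other architecture, which is how a patch intended as pure code motion can still cause a boot hang.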
Re: [PATCH 2 of 7] x86: add common mm/pgtable.c
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>> Add a common arch/x86/mm/pgtable.c file for common pagetable functions.
>
> randconfig testing found a build breakage on 32-bit, and that got bisected down to this patch of yours.

Couldn't reproduce. What was the failure?

    J
Re: [PATCH 2 of 7] x86: add common mm/pgtable.c
Ingo Molnar wrote:
> oops, i thought i pasted that. It was this:
>
> arch/x86/mm/pgtable.c: In function 'pgd_alloc':
> arch/x86/mm/pgtable.c:213: error: implicit declaration of function 'quicklist_alloc'
> arch/x86/mm/pgtable.c:213: warning: initialization makes pointer from integer without a cast
> arch/x86/mm/pgtable.c:218: error: implicit declaration of function 'quicklist_free'
> arch/x86/mm/pgtable.c: In function 'check_pgt_cache':
> arch/x86/mm/pgtable.c:233: error: implicit declaration of function 'quicklist_trim'
>
> also, config re-attached. (maybe i messed up the previous one)

Still can't reproduce, but it's a simple case of missing headers.

    J
Re: [PATCH 2 of 7] x86: add common mm/pgtable.c
Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>> Ingo Molnar wrote:
>>> oops, i thought i pasted that. It was this:
>>>
>>> arch/x86/mm/pgtable.c: In function 'pgd_alloc':
>>> arch/x86/mm/pgtable.c:213: error: implicit declaration of function 'quicklist_alloc'
>>> arch/x86/mm/pgtable.c:213: warning: initialization makes pointer from integer without a cast
>>> arch/x86/mm/pgtable.c:218: error: implicit declaration of function 'quicklist_free'
>>> arch/x86/mm/pgtable.c: In function 'check_pgt_cache':
>>> arch/x86/mm/pgtable.c:233: error: implicit declaration of function 'quicklist_trim'
>>>
>>> also, config re-attached. (maybe i messed up the previous one)
>>
>> Still can't reproduce, but it's a simple case of missing headers.
>
> ok, i'll figure it out if/when it happens with your resent queue.

I added an explicit include to pgtable.c, so there's no excuse to still complain. Will resend the combined series shortly.

    J
Re: 2.6.25-rc1 xen pvops regression
Jody Belka wrote:
> Hi all,
>
> I thought I'd try out 2.6.25-rc1 as a xen 32-bit pae domU the other day. Unfortunately, I didn't get very far very fast, as the domain just crashed immediately upon booting, without any direct feedback (I did have messages on the xen message buffer, which helped). This even with earlyprintk turned on.
>
> After a long, arduous journey, I managed to track this down to the following:
>
> --
> commit 551889a6e2a24a9c06fd453ea03b57b7746ffdc0
>
> x86: construct 32-bit boot time page tables in native format.
>
> Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled kernel are in PAE format. early_ioremap is updated to use the standard page table accessors.
>
> Clear any mappings beyond max_low_pfn from the boot page tables in native_pagetable_setup_start because the initial mappings can extend beyond the range of physical memory and into the vmalloc area.
>
> Derived from patches by Eric Biederman and H. Peter Anvin.
>
> [ [EMAIL PROTECTED]: PAE swapper_pg_dir needs to be page-sized fix ]
> --
>
> However, to make life more interesting, just reverting this isn't quite enough to get us to the promised land. If we try, we find that although we do now start booting, we crash again a short way into the process. In a different manner though. Specifically, in early_ioremap_clear. Reverting the above commit /except/ for the changes to arch/x86/mm/ioremap.c gets everything working again. Well, except that we can't shutdown/reboot properly, but I've sent a patch for that in another email.
>
> I'm afraid i've no idea what needs to be done to get the change to work with xen, but i'm willing to try out any patches people come up with. Please cc me on any replies, as i'm not subscribed, thanks.

Hi,

Although I'm on vacation, I happened to download a recent copy of x86.git and found that it crashes early. Here's a couple of patches to apply; I don't know if they apply to current git, but I hope it helps.

    J

Subject: x86/early_ioremap: don't assume we're using swapper_pg_dir

At the early stages of boot, before the kernel pagetable has been fully initialized, a Xen kernel will still be running off the Xen-provided pagetables rather than swapper_pg_dir[]. Therefore, readback cr3 to determine the base of the pagetable rather than assuming swapper_pg_dir[].

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/ioremap.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

===
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -265,7 +265,9 @@
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 {
-	pgd_t *pgd = &swapper_pg_dir[pgd_index(addr)];
+	/* Don't assume we're using swapper_pg_dir at this point */
+	pgd_t *base = __va(read_cr3());
+	pgd_t *pgd = &base[pgd_index(addr)];
 	pud_t *pud = pud_offset(pgd, addr);
 	pmd_t *pmd = pmd_offset(pud, addr);
 

Subject: xen: unpin initial Xen pagetable once we're finished with it

Unpin the Xen-provided pagetable once we've finished with it, so it doesn't cause stray references which cause later swapper_pg_dir pagetable updates to fail.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/xen/enlighten.c |    4
 1 file changed, 4 insertions(+)

===
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -798,6 +798,10 @@
 	 * added to the table can be prepared properly for Xen.
 	 */
 	xen_write_cr3(__pa(base));
+
+	/* Unpin initial Xen pagetable */
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE,
+			  PFN_DOWN(__pa(xen_start_info->pt_base)));
 }
 
 static __init void xen_pagetable_setup_done(pgd_t *base)
Re: 2.6.25-rc1 xen pvops regression
Joel Becker wrote:
> On Wed, Feb 13, 2008 at 10:59:33PM +1100, Jeremy Fitzhardinge wrote:
>>> I thought I'd try out 2.6.25-rc1 as a xen 32-bit pae domU the other day. Unfortunately, I didn't get very far very fast, as the domain just crashed immediately upon booting, without any direct feedback (I did have messages on the xen message buffer, which helped). This even with earlyprintk turned on.
>>>
>>> After a long, arduous journey, I managed to track this down to the following:
>>>
>>> --
>>> commit 551889a6e2a24a9c06fd453ea03b57b7746ffdc0
>
> I'm seeing the same problem, with no messages at all from xen other than "domain crashed, restart disabled" in xend.log. I got a different commit in my bisect, 0947b2f31ca1ea1211d3cde2dbd8fcec579ef395 (i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance bt_ioremap). I started from yesterday's 96b5a46e2a72dc1829370c87053e0cd558d58bc0 (WMI: initialize wmi_blocks.list even if ACPI is disabled) and a known good 9b73e76f3cf63379dcf45fcd4f112f5812418d0a (Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6).
>
>> Although I'm on vacation, I happened to download a recent copy of x86.git and found that it crashes early. Here's a couple of patches to apply; I don't know if they apply to current git, but I hope it helps.
>>
>> Subject: x86/early_ioremap: don't assume we're using swapper_pg_dir
>> Subject: xen: unpin initial Xen pagetable once we're finished with it
>
> After my bisect was done, I re-pulled from Linus and discovered these patches. Searching for these emails, they certainly sound like my problem. But the kernel does not boot, commit 10270d4838bdc493781f5a1cf2e90e9c34c9142f (acpi: fix acpi_os_read_pci_configuration() misuse of raw_pci_read()). Still no output from Xen - pygrub selects the kernel, and then the domain just dies back to the dom0 shell. Attached are my latest .config and my bisect log.

Is the domain ending up in the crashed state? Do you get a register dump with xm dmesg? That would be very useful in determining what went wrong. You may need to compile Xen with debug=y in Config.mk.

    J
Re: [PATCHv3 1/3] x86: use ELF format in compressed images.
Ian Campbell wrote:
> On Thu, 2008-02-14 at 17:01 +, Ian Campbell wrote:
>> I have a xen domain builder patch as well. I was waiting for the Linux side to gain some traction before putting it forward (I'd attach it now but it's at home on a laptop which is sleeping).
>
> Here it is:
>
> # HG changeset patch
> # User [EMAIL PROTECTED]
> # Date 1203011758 0
> # Node ID 3079b4b3835e3aba52bb6548bbbced70471a9f32
> # Parent 42369d21641d6297dc369441c3bfd355880d28c0
>
> Support loading Linux bzImage v2.08 and up.

Do you also have a patch to update the boot protocol?

    J
Re: [PATCHv3 1/3] x86: use ELF format in compressed images.
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> Do you also have a patch to update the boot protocol?
>
> Looking for anything different than the root of this thread?

Yes, the patch for the Xen domain builder to boot a bzImage using the Linux boot protocol rather than the Xen one. Ian's patch will extract the ELF file from the bzImage, but still boot it by finding the Xen entrypoint in the notes, with %esi pointing to the Xen start_info rather than the boot_params (unless I'm missing something).

    J
Re: 2.6.25-rc1 xen pvops regression
Joel Becker wrote:
> On Thu, Feb 14, 2008 at 06:50:52PM +1100, Jeremy Fitzhardinge wrote:
>>> I'm seeing the same problem, with no messages at all from xen other than "domain crashed, restart disabled" in xend.log. I got a different commit in my bisect, 0947b2f31ca1ea1211d3cde2dbd8fcec579ef395 (i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance bt_ioremap). I started from yesterday's 96b5a46e2a72dc1829370c87053e0cd558d58bc0 (WMI: initialize wmi_blocks.list even if ACPI is disabled) and a known good 9b73e76f3cf63379dcf45fcd4f112f5812418d0a (Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6).
>>
>> Is the domain ending up in the crashed state? Do you get a register dump with xm dmesg? That would be very useful in determining what went wrong. You may need to compile Xen with debug=y in Config.mk.
>
> I didn't know xm dmesg existed :-) Regarding debug=y, I'm using a prepackaged dom0 set. Here's what I find in xm dmesg:
>
> Joel
>
> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
> (XEN) mm.c:3331:d109 ptwr_emulate: could not get_page_from_l1e()

Hm, I have a suspicion about what this might be. I haven't tried reproducing it yet though.

> (XEN) Unhandled page fault in domain 109 on VCPU 0 (ec=0003)
> (XEN) Pagetable walk from c01687f0:
> (XEN)  L4[0x000] = 0003a2933027 06cc
> (XEN)  L3[0x003] = 00039afea027 0005
> (XEN)  L2[0x000] = 00039bfb7067 1048
> (XEN)  L1[0x168] = 0003a2e97061 0168
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 109 (vcpu#0) crashed on cpu#2:
> (XEN) [ Xen-3.1.3-rc3 x86_64 debug=n Not tainted ]
> (XEN) CPU:2
> (XEN) RIP:e019:[<c04040bd>]

What does this EIP correspond to in your kernel? Also: c01687f0 c0417ab6 c040288f c040299a c0403270 (as guesses of potential callers to try and work out a stack trace).

Thanks,
    J
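The indices in the "Pagetable walk from c01687f0" lines can be reproduced by hand. The sketch below assumes the 4-level layout used by an x86_64 hypervisor (9 index bits per level above a 12-bit page offset); `pt_index` and the `*_SKETCH` macros are hypothetical names for illustration, not Xen or Linux functions.

```c
#include <stdint.h>

/* 4 KiB pages, 512 entries (9 index bits) per page-table level. */
#define PAGE_SHIFT_SKETCH 12u
#define PT_INDEX_BITS     9u
#define PT_INDEX_MASK     ((1u << PT_INDEX_BITS) - 1u)

/*
 * Return the table index used at a given level of the walk:
 * level 1 is the L1 page table, level 4 the top-level L4 table.
 */
static unsigned pt_index(uint64_t vaddr, unsigned level)
{
	unsigned shift = PAGE_SHIFT_SKETCH + PT_INDEX_BITS * (level - 1u);

	return (unsigned)((vaddr >> shift) & PT_INDEX_MASK);
}
```

For the faulting address 0xc01687f0 this gives L4 index 0x000, L3 index 0x003, L2 index 0x000 and L1 index 0x168, matching the bracketed values in the log.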
Re: 2.6.25-rc1 xen pvops regression
Joel Becker wrote:
> On Sat, Feb 16, 2008 at 01:44:26PM +1100, Jeremy Fitzhardinge wrote:
>> Joel Becker wrote:
>>> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
>>> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
>>> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
>>> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
>>> (XEN) mm.c:3331:d109 ptwr_emulate: could not get_page_from_l1e()
>>
>> Hm, I have a suspicion about what this might be. I haven't tried reproducing it yet though.
>>
>>> (XEN) Unhandled page fault in domain 109 on VCPU 0 (ec=0003)
>>> (XEN) Pagetable walk from c01687f0:
>>> (XEN)  L4[0x000] = 0003a2933027 06cc
>>> (XEN)  L3[0x003] = 00039afea027 0005
>>> (XEN)  L2[0x000] = 00039bfb7067 1048
>>> (XEN)  L1[0x168] = 0003a2e97061 0168
>>> (XEN) domain_crash_sync called from entry.S
>>> (XEN) Domain 109 (vcpu#0) crashed on cpu#2:
>>> (XEN) [ Xen-3.1.3-rc3 x86_64 debug=n Not tainted ]
>>> (XEN) CPU:2
>>> (XEN) RIP:e019:[<c04040bd>]
>>
>> What does this EIP correspond to in your kernel? Also: c01687f0 c0417ab6 c040288f c040299a c0403270 (as guesses of potential callers to try and work out a stack trace).
>
> ksymoops is no help at all, but I got these from objdump of vmlinux:
>
> c04040bd xen_set_pte
> c0417ab6 set_pte_present
> c040288f set_bit
> c040299a __raw_spin_unlock
> c0403270 __set_64bit

(My usual technique is to use "gdb vmlinux" and "x/i 0x" to do the lookup.)

Unfortunately that doesn't narrow down what the kernel was actually trying to do at the time. Clearly a set_pte; looks like someone is trying to create a writable mapping of an existing pte page.

Does "console=hvc0 earlyprintk=xen" on the kernel command line give any clue about how far it gets before crashing?

    J
Re: [PATCH] x86/mm: stop allocating pmd page if failed
On 07/24/2012 06:15 AM, Yuanhan Liu wrote:
> The old code would call __get_free_page() even though previous
> allocation fail met. This is not needed.

Yeah, I guess, but it's hardly worth changing.

    J

>
> Signed-off-by: Yuanhan Liu
> Cc: Jeremy Fitzhardinge
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: "H. Peter Anvin"
> ---
>  arch/x86/mm/pgtable.c |   18 +-
>  1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 8573b83..6760348 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -181,24 +181,24 @@ static void free_pmds(pmd_t *pmds[])
>  {
>  	int i;
>  
> -	for(i = 0; i < PREALLOCATED_PMDS; i++)
> -		if (pmds[i])
> -			free_page((unsigned long)pmds[i]);
> +	for(i = 0; i < PREALLOCATED_PMDS; i++) {
> +		if (pmds[i] == NULL)
> +			break;
> +		free_page((unsigned long)pmds[i]);
> +	}
>  }
>  
>  static int preallocate_pmds(pmd_t *pmds[])
>  {
>  	int i;
> -	bool failed = false;
>  
>  	for(i = 0; i < PREALLOCATED_PMDS; i++) {
> -		pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
> -		if (pmd == NULL)
> -			failed = true;
> -		pmds[i] = pmd;
> +		pmds[i] = (pmd_t *)__get_free_page(PGALLOC_GFP);
> +		if (pmds[i] == NULL)
> +			break;
>  	}
>  
> -	if (failed) {
> +	if (i < PREALLOCATED_PMDS) {
>  		free_pmds(pmds);
>  		return -ENOMEM;
>  	}
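The control flow of the quoted patch can be tried out in userspace. This is only a sketch of the "stop at the first failed allocation" pattern: `alloc_slot`, `alloc_budget`, `alloc_attempts`, `NSLOTS` and the slot functions are hypothetical stand-ins for __get_free_page(), PREALLOCATED_PMDS and the pmd helpers, with a fault-injection knob added so the failure path can be exercised.

```c
#include <stddef.h>
#include <stdlib.h>

#define NSLOTS 4

/* Fault injection for the sketch: how many allocations may still
 * succeed; a negative value means "never fail". */
static int alloc_budget = -1;
static int alloc_attempts;

static void *alloc_slot(void)
{
	alloc_attempts++;
	if (alloc_budget == 0)
		return NULL;
	if (alloc_budget > 0)
		alloc_budget--;
	return malloc(64);
}

/* Free entries up to the first NULL, as the patched free_pmds() does. */
static void free_slots(void *slots[])
{
	for (int i = 0; i < NSLOTS; i++) {
		if (slots[i] == NULL)
			break;
		free(slots[i]);
		slots[i] = NULL;
	}
}

/* Returns 0 on success, -1 if any allocation failed. */
static int preallocate_slots(void *slots[])
{
	int i;

	for (i = 0; i < NSLOTS; i++) {
		slots[i] = alloc_slot();
		if (slots[i] == NULL)
			break;	/* do not try the remaining slots */
	}

	if (i < NSLOTS) {
		/* Entries past index i were never written, so the
		 * "stop at first NULL" loop frees exactly what exists. */
		free_slots(slots);
		return -1;
	}
	return 0;
}
```

Note the coupling this introduces: freeing up to the first NULL is only correct because allocation stops at the first failure, whereas the old code tolerated holes anywhere in the array.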
Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver
On 10/09/2012 06:14 PM, Andrew Morton wrote:
> On Wed, 10 Oct 2012 00:09:12 + KY Srinivasan wrote:
>
>>>> +		if (!pg) {
>>>> +			*alloc_error = true;
>>>> +			return i * alloc_unit;
>>>> +		}
>>>> +
>>>> +		totalram_pages -= alloc_unit;
>>> Well, I'd consider totalram_pages to be an mm-private thing which drivers
>>> shouldn't muck with. Why is this done?
>> By modifying the totalram_pages, the information presented in /proc/meminfo
>> correctly reflects what is currently assigned to the guest (MemTotal).
> eh? /proc/meminfo:MemTotal tells you the total memory in the machine.
> The only thing which should change it after boot is memory hotplug.
[...]
> Why on earth do balloon drivers do this? If the amount of memory which
> is consumed by balloons is interesting then it should be exported via a
> standalone metric, not by mucking with totalram_pages.

Balloon drivers are trying to fake a form of page-by-page memory hotplug. When they allocate memory from the kernel, they're actually giving the pages back to the hypervisor to redistribute to other guests. They reduce totalram_pages to try and reflect that the memory is no longer the kernel's (in Xen, at least, the pfns will no longer have any physical page underlying them).

I agree this is pretty ugly; it would be nice to have some better interface to indicate what's going on. At one point I tried to use the memory hotplug interfaces for larger-scale dynamic transfers of memory between a domain and the host, but when I last looked at it, it was too coarse grained and heavyweight to replace the balloon mechanism.

    J
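The accounting being argued about can be shown with a toy model. Everything here is hypothetical (`mem_accounting`, `balloon_inflate`, `balloon_deflate`); it only illustrates the two numbers in play: a total that shrinks as pages are handed to the hypervisor, and the separate "ballooned" metric suggested in the quoted text.

```c
/* Toy model of guest memory accounting; not a real driver interface. */
struct mem_accounting {
	unsigned long total_pages;	/* plays the role of totalram_pages */
	unsigned long ballooned_pages;	/* the suggested standalone metric */
};

/* Hand up to nr pages to the "hypervisor"; returns pages moved. */
static unsigned long balloon_inflate(struct mem_accounting *m,
				     unsigned long nr)
{
	if (nr > m->total_pages)
		nr = m->total_pages;
	m->total_pages -= nr;
	m->ballooned_pages += nr;
	return nr;
}

/* Take up to nr pages back from the "hypervisor"; returns pages moved. */
static unsigned long balloon_deflate(struct mem_accounting *m,
				     unsigned long nr)
{
	if (nr > m->ballooned_pages)
		nr = m->ballooned_pages;
	m->ballooned_pages -= nr;
	m->total_pages += nr;
	return nr;
}
```

In this model the sum of the two fields stays constant, so either presentation (a shrinking total, or a fixed total plus a balloon counter) can be derived from the other; the disagreement above is about which of them MemTotal should show.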
Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks
On 06/01/2013 01:14 PM, Andi Kleen wrote:
> FWIW I use the paravirt spinlock ops for adding lock elision
> to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are they different? If they're still basically ticketlocks, then it seems to me that they're complementary - hle handles the fastpath, and pv the slowpath.

> This needs to be done at the top level (so the level you're removing)
>
> However I don't like the pv mechanism very much and would
> be fine with using an static key hook in the main path
> like I do for all the other lock types.

Right.

    J
Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
> x86/spinlock: Replace pv spinlocks with pv ticketlocks
>
> From: Jeremy Fitzhardinge

I'm not sure what the etiquette is here; I did the work while at Citrix, but jer...@goop.org is my canonical email address. The Citrix address is dead and bounces, so is useless for anything. Probably best to change it.

    J

>
> Rather than outright replacing the entire spinlock implementation in
> order to paravirtualize it, keep the ticket lock implementation but add
> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> a contended lock).
>
> Ticket locks have a number of nice properties, but they also have some
> surprising behaviours in virtual environments. They enforce a strict
> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> scheduler does not schedule the cpus in the correct order, the system can
> waste a huge amount of time spinning until the next cpu can take the lock.
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> To address this, we add two hooks:
>  - __ticket_spin_lock which is called after the cpu has been
>    spinning on the lock for a significant number of iterations but has
>    failed to take the lock (presumably because the cpu holding the lock
>    has been descheduled). The lock_spinning pvop is expected to block
>    the cpu until it has been kicked by the current lock holder.
>  - __ticket_spin_unlock, which on releasing a contended lock
>    (there are more cpus with tail tickets), it looks to see if the next
>    cpu is blocked and wakes it if so.
>
> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> functions causes all the extra code to go away.
>
> Signed-off-by: Jeremy Fitzhardinge
> Reviewed-by: Konrad Rzeszutek Wilk
> Tested-by: Attilio Rao
> [ Raghavendra: Changed SPIN_THRESHOLD ]
> Signed-off-by: Raghavendra K T
> ---
>  arch/x86/include/asm/paravirt.h       |   32
>  arch/x86/include/asm/paravirt_types.h |   10 ++
>  arch/x86/include/asm/spinlock.h       |   53 +++--
>  arch/x86/include/asm/spinlock_types.h |    4 --
>  arch/x86/kernel/paravirt-spinlocks.c  |   15 +
>  arch/x86/xen/spinlock.c               |    8 -
>  6 files changed, 61 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index cfdc9ee..040e72d 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
>  
>  #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
>  
> -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
> +static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
> +							__ticket_t ticket)
>  {
> -	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
> +	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
>  }
>  
> -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
> +static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
> +							__ticket_t ticket)
>  {
> -	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
> -}
> -#define arch_spin_is_contended	arch_spin_is_contended
> -
> -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
> -{
> -	PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
> -}
> -
> -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
> -						  unsigned long flags)
> -{
> -	PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
> -}
> -
> -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
> -{
> -	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
> -}
> -
> -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> -{
> -	PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
> +	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
>  }
>  
>  #endif
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 0db1fca..d5deb6d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -327,13 +327,11 @@ struct pv_mmu_ops {
>  };
>  
>  struct arch_spinlock;
> +#include
> +
>  struct pv_lock_ops {
> -	int (*spin_is_locked)(st
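The two-hook scheme in the quoted description can be sketched in portable C11. This is a simplified model and not the kernel implementation: the lock layout, `SPIN_THRESHOLD_SKETCH`, and the stub hooks `lock_spinning_stub` and `unlock_kick_stub` are assumptions for illustration. In a real paravirtualized guest the first hook would block the vcpu and the second would wake the next waiter; here they only count calls, using plain (non-atomic) counters that are fine for a single-threaded demonstration only.

```c
#include <stdatomic.h>

#define SPIN_THRESHOLD_SKETCH 1024u	/* hypothetical spin budget */

struct ticket_lock {
	atomic_uint head;	/* ticket currently being served */
	atomic_uint tail;	/* next ticket to hand out */
};

static unsigned long spinning_calls;	/* slow-path lock hook invocations */
static unsigned long kick_calls;	/* slow-path unlock hook invocations */

/* Stand-in for the lock_spinning pvop: a real one would block the cpu. */
static void lock_spinning_stub(struct ticket_lock *lock, unsigned ticket)
{
	(void)lock;
	(void)ticket;
	spinning_calls++;
}

/* Stand-in for the unlock_kick pvop: a real one would wake the waiter. */
static void unlock_kick_stub(struct ticket_lock *lock, unsigned next)
{
	(void)lock;
	(void)next;
	kick_calls++;
}

static void ticket_lock_init(struct ticket_lock *lock)
{
	atomic_init(&lock->head, 0u);
	atomic_init(&lock->tail, 0u);
}

static void ticket_lock_acquire(struct ticket_lock *lock)
{
	/* Take a ticket; FIFO order follows from the ticket numbers. */
	unsigned me = atomic_fetch_add(&lock->tail, 1u);

	for (;;) {
		/* Fast path: spin for a bounded number of iterations. */
		for (unsigned spins = 0; spins < SPIN_THRESHOLD_SKETCH; spins++) {
			if (atomic_load(&lock->head) == me)
				return;
		}
		/* Slow path: spun too long, let the hook deal with it. */
		lock_spinning_stub(lock, me);
	}
}

static void ticket_lock_release(struct ticket_lock *lock)
{
	/* Serve the next ticket. */
	unsigned next = atomic_fetch_add(&lock->head, 1u) + 1u;

	/* If more tickets were handed out, someone is waiting: kick. */
	if (atomic_load(&lock->tail) != next)
		unlock_kick_stub(lock, next);
}
```

With both stubs compiled down to nothing, what remains is an ordinary ticket lock, which corresponds to the "stub functions cause all the extra code to go away" remark in the description.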
Re: [PATCH] x86/asm: avoid mnemonics without type suffix
(resent without HTML)

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference. As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
>   btrl	$1, 0
>   btr	$1, 0
>   btsl	$1, 0
>   bts	$1, 0

When I originally did those patches, I was careful to make sure that we didn't give implied sizes to operations with only immediate and/or memory operands because - in general - gas can't infer the operation size from such operands. However, in the case of the bit test/set operations, the memory access size is not really derived from the operation size (the SDM is a bit vague), and even if it were it would be an operational rather than semantic difference. So there's no real problem with gas choosing 'l' as a default size in the absence of any explicit override or constraint.

> Check that btr matches btrl, and bts matches btsl in both cases:
>
>   $ as --32 -a in.s
>   $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to
> disambiguate btr/bts automatically.

That sounds reasonable for all other operations because it makes a real semantic difference, but overly strict for bit operations.

    J

> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge
> Cc: Andi Kleen
> Cc: Linus Torvalds
> Cc: Ingo Molnar
> Cc: Thomas Gleixner
> Cc: Eli Friedman
> Cc: Jim Grosbach
> Cc: Stephen Checkoway
> Cc: LLVMdev
> Signed-off-by: Ramkumar Ramachandra
> ---
>  We discussed this pretty extensively on LLVMDev, but I'm still not
>  sure that I haven't missed something.
>
>  arch/x86/include/asm/bitops.h | 16
>  arch/x86/include/asm/percpu.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>  			: "iq" ((u8)CONST_MASK(nr))
>  			: "memory");
>  	} else {
> -		asm volatile(LOCK_PREFIX "bts %1,%0"
> +		asm volatile(LOCK_PREFIX "btsl %1,%0"
>  			: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>  	}
>  }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   */
>  static inline void __set_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
> +	asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory");
>  }
>  
>  /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
>  			: CONST_MASK_ADDR(nr, addr)
>  			: "iq" ((u8)~CONST_MASK(nr)));
>  	} else {
> -		asm volatile(LOCK_PREFIX "btr %1,%0"
> +		asm volatile(LOCK_PREFIX "btrl %1,%0"
>  			: BITOP_ADDR(addr)
>  			: "Ir" (nr));
>  	}
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
>  
>  static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  {
> -	asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> +	asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
>  }
>  
>  /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
>  {
>  	int oldbit;
>  
> -	asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> +	asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
>  		     "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>  
>  	return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>  {
>  	int ol
Re: [PATCH] x86/asm: avoid mnemonics without type suffix
(Resent without HTML)

On 07/14/2013 10:19 AM, Linus Torvalds wrote:
> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.

The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?

> The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.

It looks like if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
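To make the page-boundary point above concrete, here is a minimal sketch
in plain C (not kernel code). It assumes that a bit operation with a
register bit offset reads exactly one operand-sized word at
base + size * floor(bitoff / (8 * size)), and that pages are 4 KiB. Both
are assumptions for illustration; as noted above, the SDM leaves the
actual access size somewhat open. The function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KiB pages */

/*
 * Address of the operand-sized word holding bit `bitoff` relative to
 * `base`, for a register bit offset (which may be negative).
 * Hypothetical helper, modelling the addressing rule described above.
 */
static uint64_t bitop_word_addr(uint64_t base, int64_t bitoff,
                                unsigned size_bytes)
{
    int64_t bits = (int64_t)size_bytes * 8;
    int64_t word = bitoff / bits;

    if (bitoff % bits < 0)
        word--;                 /* round towards minus infinity */

    return base + (uint64_t)(word * (int64_t)size_bytes);
}

/* True if that operand-sized access would straddle a page boundary. */
static bool bitop_crosses_page(uint64_t base, int64_t bitoff,
                               unsigned size_bytes)
{
    uint64_t first = bitop_word_addr(base, bitoff, size_bytes);
    uint64_t last = first + size_bytes - 1;

    return first / PAGE_SIZE_BYTES != last / PAGE_SIZE_BYTES;
}
```

Under these assumptions, a base of 0x1ffc with bit offset 0 stays inside
one page for a 4-byte operand but spills into the next page for an
8-byte operand, which is the kind of semantic difference described
above.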
Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix
On 07/14/2013 12:30 PM, Tim Northover wrote:
>> And that is why I think you should just consider "bt $x,y" to be
>> trivially the same thing and not at all ambiguous. Because there is
>> ABSOLUTELY ZERO ambiguity when people write
>>
>>    bt $63, mem
>>
>> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
>> in this case, so why would you want the user to specify one or the
>> other?
> I don't think you've actually tested that, have you? (x86-64)
>
> int main() {
>     long val = 0x;
>     char res;
>
>     asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>     printf("%d\n", res);
>
>     asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>     printf("%d\n", res);
> }

Blerk. It doesn't undermine the original point - that gas can
unambiguously choose the right operation size for a constant bit offset
- but yes, the operation size is meaningful in the case of an immediate
bit offset. It's pretty nasty of Intel to hide that detail in Table 3-2,
far from the instructions which use it...

    J

>
> Tim.
>
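For readers without an x86-64 box to hand, the btl/btq difference can be
modelled in portable C. This is only a sketch: it assumes the immediate
bit offset is reduced modulo the operand size and that the operand is
the little-endian low part of a 64-bit value. The function name is made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "bt $imm, mem" with an immediate bit offset: the offset is
 * taken modulo the operand size, so a 32-bit operation with $63 looks
 * at bit 31 while a 64-bit operation looks at bit 63.
 * `val` stands for the 64-bit memory operand (little-endian).
 */
static int bt_imm_model(uint64_t val, unsigned imm, unsigned opsize_bits)
{
    assert(opsize_bits == 16 || opsize_bits == 32 || opsize_bits == 64);

    unsigned bit = imm % opsize_bits;   /* high bits of imm are ignored */

    return (int)((val >> bit) & 1u);
}
```

With only bit 63 set, bt_imm_model(val, 63, 64) reports the bit as set
and bt_imm_model(val, 63, 32) does not; with only bit 31 set the two
results swap. So the suffix matters whenever the immediate is 32 or
larger.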
Re: Replace in linux-next the xen, xen-two, xen-arm with xen/tip.git tree instead.
On 07/30/2013 12:53 PM, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> I was wondering if it would be possible to remove from linux-next
> the three xen trees and instead use a combined tree, similar to the
> x86 tip (so the various maintainers share it)?
>
> The ones that would be removed are:
>
> xen git
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#upstream/xen
> xen-two git
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git#linux-next
> xen-arm git
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git#linux-next
>
> And instead it would be pulled from:
>
> xen-tip git
> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git#linux-next
>
> I presume you need Ack's from all of us (so Jeremy and Stefano) so CC-ing
> them here.
>
> And Acked-by: Konrad Rzeszutek Wilk
>

Ack from me.

    J
Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
On 08/13/2013 01:02 PM, Raghavendra K T wrote: > * Ingo Molnar [2013-08-13 18:55:52]: > >> Would be nice to have a delta fix patch against tip:x86/spinlocks, which >> I'll then backmerge into that series via rebasing it. >> > There was a namespace collision of PER_CPU lock_waiting variable when > we have both Xen and KVM enabled. > > Perhaps this week wasn't for me. Had run 100 times randconfig in a loop > for the fix sent earlier :(. > > Ingo, below delta patch should fix it, IIRC, I hope you will be folding this > back to patch 14/14 itself. Else please let me. > I have already run allnoconfig, allyesconfig, randconfig with below patch. > But will > test again. This should apply on top of tip:x86/spinlocks. > > ---8<--- > From: Raghavendra K T > > Fix Namespace collision for lock_waiting > > Signed-off-by: Raghavendra K T > --- > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index d442471..b8ef630 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -673,7 +673,7 @@ struct kvm_lock_waiting { > static cpumask_t waiting_cpus; > > /* Track spinlock on which a cpu is waiting */ > -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting); > +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting); Has static stopped meaning static? 
J > > static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) > { > @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, > __ticket_t want) > if (in_nmi()) > return; > > - w = &__get_cpu_var(lock_waiting); > + w = &__get_cpu_var(klock_waiting); > cpu = smp_processor_id(); > start = spin_time_start(); > > @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, > __ticket_t ticket) > > add_stats(RELEASED_SLOW, 1); > for_each_cpu(cpu, &waiting_cpus) { > - const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu); > + const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu); > if (ACCESS_ONCE(w->lock) == lock && > ACCESS_ONCE(w->want) == ticket) { > add_stats(RELEASED_SLOW_KICKED, 1); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/3] MAINTAINERS: Remove Jeremy from the Xen subsystem.
On 08/05/2013 11:05 AM, Konrad Rzeszutek Wilk wrote:
> Jeremy has been a key person in making Linux work with Xen.
> He has been enjoying the last year working on something
> different so reflect that in the maintainers file.

Ack.

    J

>
> CC: Jeremy Fitzhardinge
> Signed-off-by: Konrad Rzeszutek Wilk
> ---
> CREDITS | 1 +
> MAINTAINERS | 1 -
> 2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/CREDITS b/CREDITS
> index 206d0fc..646a0a9 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -1120,6 +1120,7 @@ D: author of userfs filesystem
> D: Improved mmap and munmap handling
> D: General mm minor tidyups
> D: autofs v4 maintainer
> +D: Xen subsystem
> S: 987 Alabama St
> S: San Francisco
> S: CA, 94110
> diff --git a/MAINTAINERS b/MAINTAINERS
> index defc053..440af74 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9237,7 +9237,6 @@ F: drivers/media/tuners/tuner-xc2028.*
>
> XEN HYPERVISOR INTERFACE
> M: Konrad Rzeszutek Wilk
> -M: Jeremy Fitzhardinge
> L: xen-de...@lists.xensource.com (moderated for non-subscribers)
> L: virtualizat...@lists.linux-foundation.org
> S: Supported
Re: 2.6.25-rc1 xen pvops regression
Joel Becker wrote: Unfortunately that doesn't narrow down what the kernel was actually trying to do at the time. Clearly a set_pte; looks like someone is trying to create a writable mapping of an existing pte page. Does "console=hvc0 earlyprintk=xen" on the kernel command line give any clue about how far it gets before crashing? I built a kernel using your .config here, but I can't reproduce the problem. It makes it all the way to trying to start init (failed at that point because I didn't create an initrd with the xvd module to mount /). Console is already hvc0, but earlyprintk gets us: --8<- Reserving virtual address space above 0xf57fe000 Linux version 2.6.25-rc2-bisectme ([EMAIL PROTECTED]) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #21 SMP Fri Feb 15 16:28:35 PST 2008 ACPI in unprivileged domain disabled BIOS-provided physical RAM map: Xen: - 7800 (usable) console [xenboot0] enabled 1192MB HIGHMEM available. 727MB LOWMEM available. Started domain ca-test58 Scan SMP from c000 for 1024 bytes. Scan SMP from c009fc00 for 1024 bytes. Scan SMP from c00f for 65536 bytes. NX (Execute Disable) protection: active Zone PFN ranges: DMA 0 -> 4096 Normal 4096 -> 186366 HighMem186366 -> 491520 Movable zone start PFN for each node early_node_map[1] active PFN ranges 0:0 -> 491520 -->8- That's it. I get: Entering add_active_range(0, 0, 16384) 0 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 Normal 4096 ->16384 HighMem 16384 ->16384 Movable zone start PFN for each node early_node_map[1] active PFN ranges 0:0 ->16384 On node 0 totalpages: 16384 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 4064 pages, LIFO batch:0 Normal zone: 96 pages used for memmap Normal zone: 12192 pages, LIFO batch:1 HighMem zone: 0 pages used for memmap Movable zone: 0 pages used for memmap ... What happens if you give the domain less memory? 
J
Re: 2.6.25-rc1 xen pvops regression
Ian Campbell wrote: On Tue, 2008-02-19 at 23:43 -0800, H. Peter Anvin wrote: Ian Campbell wrote: On Mon, 2008-02-18 at 02:40 -0800, Joel Becker wrote: On Sun, Feb 17, 2008 at 06:49:21PM +, Ian Campbell wrote: x86/xen: Do not scan for DMI unless the DMI region is reserved by e820. This fixed it. I'm now booting successfully. Thank you! Excellent. Jeremy, are you happy for this to go in? I had no problem with it, but Peter's objection seems substantial enough. As far as the actual change goes I was assuming that any machine that has DMI/SMBIOS would easily be new enough to have an E820 which could be expected to reserve this region. Looks like I was mistaken about how long E820 had been around and/or how reliably it is used to reserve the tables. Anyway, will have to think of another solution. Well, the way we've handled this kind of thing elsewhere is to just reserve that pseudophys address space in earlish Xen init code and fill it with not-DMI things (zero, I guess). It's a bit of a waste of memory, but maybe we can recover it once DMI has given up and gone away. This also makes it easy to insert faked-up DMI info if that turns out to be useful. J
Re: [PATCH 06/11] xen: move arch/x86/xen/events.c under drivers/xen and split out arch specific part.
[EMAIL PROTECTED] wrote:
> diff --git a/arch/x86/xen/events.c b/drivers/xen/events.c
> similarity index 95%
> rename from arch/x86/xen/events.c
> rename to drivers/xen/events.c
> index dcf613e..7474739 100644
> --- a/arch/x86/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -37,7 +37,9 @@
> #include
> #include
> -#include "xen-ops.h"
> +#ifdef CONFIG_X86
> +# include "../arch/x86/xen/xen-ops.h"
> +#endif

Hm. Perhaps it would be better to move whatever definition you need into
a header in a common place (or move xen-ops.h entirely).

    J
Re: [PATCH 03/11] xen: add missing definitions for xen grant table which ia64/xen needs.
[EMAIL PROTECTED] wrote: Yep. We removed the guest handle stuff for the initial upstreaming, since it isn't necessary on x86 and it quietened some of the reviewer noise. But I expected we'd need to reintroduce it at some stage. J
Re: [PATCH 0/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer
Markus Armbruster wrote:
> Forgot to mention: This patch depends on
>
> Subject: [PATCH] xen: Make xen-blkfront write its protocol ABI to xenstore
> From: Markus Armbruster <>
> Date: Thu, 06 Dec 2007 14:45:53 +0100
> http://lkml.org/lkml/2007/12/6/132
>
> Sorry!

Sorry, I haven't pushed this upstream yet, since there didn't seem to be
any particular urgency. What's the dependency?

    J
Re: [PATCH 00/11] Xen arch portability patches
[EMAIL PROTECTED] wrote:
> Hi. Recently the xen-ia64 community started to make efforts to merge
> xen/ia64 Linux to upstream. The first step is to merge up domU portion.
> This patchset is preliminary for xen/ia64 domU linux making the current
> xen/x86 domU code more arch generic and adding missing definitions and
> files.

I haven't looked at the whole series yet, but this seems fine in
principle. One thing: using attachments to post makes it hard to do
inline comments on the patches.

    J
Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
Markus Armbruster wrote: This is a pair of Xen para-virtual frontend device drivers: drivers/video/xen-fbfront.c provides a framebuffer, and drivers/input/xen-kbdfront provides keyboard and mouse. Unless they're actually inter-dependent, could you post this as two separate patches? I don't know anything about these parts of the kernel, so it would be nice to make it very obvious which changes are fb vs mouse/keyboard. (I guess input/* vs video/* should make it obvious, but it looks like input has a config dependency on fb, so I'll avoid making too many presumptions...) (Couple of comments below) J The backends run in dom0 user space. Signed-off-by: Markus Armbruster <[EMAIL PROTECTED]> --- drivers/input/Kconfig|9 drivers/input/Makefile |2 drivers/input/xen-kbdfront.c | 337 +++ drivers/video/Kconfig| 14 drivers/video/Makefile |1 drivers/video/xen-fbfront.c | 550 +++ include/xen/interface/io/fbif.h | 124 include/xen/interface/io/kbdif.h | 114 8 files changed, 1151 insertions(+) diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig index 9dea14d..5f9d860 100644 --- a/drivers/input/Kconfig +++ b/drivers/input/Kconfig @@ -149,6 +149,15 @@ config INPUT_APMPOWER To compile this driver as a module, choose M here: the module will be called apm-power. +config XEN_KBDDEV_FRONTEND + tristate "Xen virtual keyboard and mouse support" + depends on XEN_FBDEV_FRONTEND + default y + help + This driver implements the front-end of the Xen virtual + keyboard and mouse device driver. It communicates with a back-end + in another domain. 
+ comment "Input Device Drivers" source "drivers/input/keyboard/Kconfig" diff --git a/drivers/input/Makefile b/drivers/input/Makefile index 2ae87b1..98c4f9a 100644 --- a/drivers/input/Makefile +++ b/drivers/input/Makefile @@ -23,3 +23,5 @@ obj-$(CONFIG_INPUT_TOUCHSCREEN) += touchscreen/ obj-$(CONFIG_INPUT_MISC) += misc/ obj-$(CONFIG_INPUT_APMPOWER) += apm-power.o + +obj-$(CONFIG_XEN_KBDDEV_FRONTEND) += xen-kbdfront.o diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c new file mode 100644 index 000..84f65cf --- /dev/null +++ b/drivers/input/xen-kbdfront.c @@ -0,0 +1,337 @@ +/* + * Xen para-virtual input device + * + * Copyright (C) 2005 Anthony Liguori <[EMAIL PROTECTED]> + * Copyright (C) 2006-2008 Red Hat, Inc., Markus Armbruster <[EMAIL PROTECTED]> + * + * Based on linux/drivers/input/mouse/sermouse.c + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file COPYING in the main directory of this archive for + * more details. + */ + +/* + * TODO: + * + * Switch to grant tables together with xen-fbfront.c. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct xenkbd_info { + struct input_dev *kbd; + struct input_dev *ptr; + struct xenkbd_page *page; + int evtchn, irq; + struct xenbus_device *xbdev; + char phys[32]; +}; + +static int xenkbd_remove(struct xenbus_device *); +static int xenkbd_connect_backend(struct xenbus_device *, struct xenkbd_info *); +static void xenkbd_disconnect_backend(struct xenkbd_info *); + +/* + * Note: if you need to send out events, see xenfb_do_update() for how + * to do that. 
+ */ + +static irqreturn_t input_handler(int rq, void *dev_id) +{ + struct xenkbd_info *info = dev_id; + struct xenkbd_page *page = info->page; + __u32 cons, prod; + + prod = page->in_prod; + if (prod == page->in_cons) + return IRQ_HANDLED; + rmb(); /* ensure we see ring contents up to prod */ + for (cons = page->in_cons; cons != prod; cons++) { + union xenkbd_in_event *event; + struct input_dev *dev; + event = &XENKBD_IN_RING_REF(page, cons); + + dev = info->ptr; + switch (event->type) { + case XENKBD_TYPE_MOTION: + input_report_rel(dev, REL_X, event->motion.rel_x); + input_report_rel(dev, REL_Y, event->motion.rel_y); + break; + case XENKBD_TYPE_KEY: + dev = NULL; + if (test_bit(event->key.keycode, info->kbd->keybit)) + dev = info->kbd; + if (test_bit(event->key.keycode, info->ptr->keybit)) + dev = info->ptr; + if (dev) + input_report_key(dev, event->key.keycode, +event->key.pressed); + else + printk(KERN_WARNING +
[PATCH] xen: Implement getgeo for Xen virtual block device.
The below implements the getgeo hook for Xen block devices. Extracted from the xen-unstable tree where it has been used for ages. It is useful to have because it allows things like grub2 (used by the Debian installer images) to work in a guest domain without having to sprinkle Xen specific hacks around the place. Signed-off-by: Ian Campbell <[EMAIL PROTECTED]> From: Ian Campbell <[EMAIL PROTECTED]> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> --- drivers/block/xen-blkfront.c | 18 ++ 1 file changed, 18 insertions(+) === --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -37,6 +37,7 @@ #include #include +#include #include #include @@ -134,6 +135,22 @@ static void blkif_restart_queue_callback { struct blkfront_info *info = (struct blkfront_info *)arg; schedule_work(&info->work); +} + +int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg) +{ + /* We don't have real geometry info, but let's at least return + values consistent with the size of the device */ + sector_t nsect = get_capacity(bd->bd_disk); + sector_t cylinders = nsect; + + hg->heads = 0xff; + hg->sectors = 0x3f; + sector_div(cylinders, hg->heads * hg->sectors); + hg->cylinders = cylinders; + if ((sector_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect) + hg->cylinders = 0x; + return 0; } /* @@ -946,6 +963,7 @@ static struct block_device_operations xl .owner = THIS_MODULE, .open = blkif_open, .release = blkif_release, + .getgeo = blkif_getgeo, }; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
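The arithmetic in blkif_getgeo() is easy to try outside the kernel. The
sketch below is a userspace approximation, not the driver code: struct
geometry_sketch is a made-up stand-in for struct hd_geometry (8-bit
heads and sectors, 16-bit cylinders), and the saturation value
UINT16_MAX is an assumption, since the constant in the patch as archived
here is truncated.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct hd_geometry. */
struct geometry_sketch {
    uint8_t heads;
    uint8_t sectors;
    uint16_t cylinders;
};

/*
 * Invent a geometry consistent with a device of `nsect` sectors:
 * fix heads and sectors at their usual maxima and derive cylinders.
 */
static void getgeo_sketch(uint64_t nsect, struct geometry_sketch *hg)
{
    hg->heads = 0xff;
    hg->sectors = 0x3f;

    /* The store truncates to 16 bits, as in the patch. */
    hg->cylinders = (uint16_t)(nsect / ((uint64_t)hg->heads * hg->sectors));

    /* If truncation lost capacity, saturate (assumed value). */
    if ((uint64_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
        hg->cylinders = UINT16_MAX;
}
```

The check after the division is what catches large disks: once the real
cylinder count no longer fits in 16 bits, the truncated value describes
a much smaller device than nsect, so the sketch falls back to the
largest representable count.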
Re: 2.6.25-rc1 xen pvops regression
Ian Campbell wrote: I'll see if I can track down where the page is getting used and have a go at getting in there first. It must be pretty early to be allocated already when dmi_scan_machine gets called. It's possible that the domain builder might have already allocated a PT at this address. I haven't checked but I think currently the domain builder always puts PT pages after the kernel so hopefully it's only a theoretical problem. Yes, it does. And presumably the early pagetable builder is guaranteed to avoid special memory like the DMI space. But the bug definitely seems to be a result of the DMI code trying to make a RW mapping of a pagetable page, so something is amiss there. Ooh, sleazy hack idea: make DMI always map RO, so even if it does get a pagetable it causes no complaint... A bit awkward, since there doesn't seem to be an RO form of early_ioremap. Another option I was thinking of was a command line option to disable DMI, which (maybe) isn't terribly useful in itself but it introduces an associated variable to frob with. That's similar to how the TSC was handled in the past (well, the opposite since TSC was forced on). Yep, that would work too. Still curious about why a pagetable page is ending up in that range though. Seems like it shouldn't be possible, since we shouldn't be allowed to allocate from those pages, at least until the DMI probe has happened... Unless the early allocator is only excluded from e820 reserved pages, which would cause a problem on systems which don't reserve the DMI space... HPA? J
Re: [PATCH] xen: Implement getgeo for Xen virtual block device.
Linus Torvalds wrote:
> On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
>> Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
>> From: Ian Campbell <[EMAIL PROTECTED]>
>> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
>
> This is just wrong. The From: goes at the *top*, and if it's not there, my
> scripts won't pick it up as the author.

OK. Have you fixed it, or shall I resend?

Thanks,
    J
Re: 2.6.25-rc1 xen pvops regression
H. Peter Anvin wrote:
>> Still curious about why a pagetable page is ending up in that range
>> though. Seems like it shouldn't be possible, since we shouldn't be
>> allowed to allocate from those pages, at least until the DMI probe has
>> happened... Unless the early allocator is only excluded from e820
>> reserved pages, which would cause a problem on systems which don't
>> reserve the DMI space... HPA?
>
> I thought the problem was a Xen-provided pagetable from before Linux
> started?

Hm, I don't think so. The domain-builder pagetable is put after the
kernel, so it shouldn't be under 1M.

    J
Re: [PATCH] xen: Implement getgeo for Xen virtual block device.
Linus Torvalds wrote:
> On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
>> OK. Have you fixed it, or shall I resend?
>
> I'll fix it, but I want people to know so that I don't have to fix things
> like this in the future (*).
>
> Linus
>
> (*) I keed, I keed. Of *course* I'll have to fix things like this in the
> future too. But hopefully not quite as often.

Putting the From: in the Signed-off-by block is a result of two thoughts:

1. putting it at the top makes the most sense from an email
   perspective, but it often seems to get lost by various patch-posting
   programs if it gets tangled in the Subject/summary part of the
   patch. The result is that it needs to float in an odd way:

       Subject: wooble the foo

       From: Foo Woobler <[EMAIL PROTECTED]>

       Wooble foos in the appropriate manner.

       Signed-off-by: Foo Woobler <[EMAIL PROTECTED]>
       Cc: Bar Mangler <[EMAIL PROTECTED]>

2. There's already a block of email addresses which describe how people
   relate to this patch, so why not put From: there (since it isn't
   really an email From header, but a patch metadata header).

I'd assumed that tools which pick "Thing: Email" pairs out of a patch
would deal with From in the same place as a Signed-off-by. After all,
tools deal with Cc:s there.

I'll make sure From: is in the right place in future, but I just wanted
to point out it wasn't complete randomness.

    J
Re: 2.6.25-rc1 xen pvops regression
H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> It seems to me that those pages are being handed out as heap pages by
>> the early allocator. In the Xen case this is OK because there's nothing
>> magic about them. But if real hardware doesn't reserve these pages in
>> the E820 map, then they could end up being used as regular memory by
>> mistake, which is an issue.
>
> No, they couldn't. On real hardware they'll be memory types 0 or 2,
> depending on whether or not they're marked reserved. Available RAM is
> type 1.

OK. Well, perhaps Ian's patch could be amended to test to see if the
e820 map marks the ISA ROM region as normal RAM, and skip it if so?

    J
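A rough sketch of the check suggested above, in standalone C rather than
against the real kernel e820 code. Everything named here is
hypothetical: the struct, the helpers, and the 0xF0000-0xFFFFF window
(assumed to be the legacy region the DMI scan looks at). Only the type
numbers (1 for available RAM, 2 for reserved) come from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { E820_TYPE_RAM = 1, E820_TYPE_RESERVED = 2 };

/* Hypothetical, simplified e820 map entry. */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* True if any entry of `type` overlaps the half-open range [start, end). */
static bool range_overlaps_type(const struct e820_entry_sketch *map, size_t n,
                                uint64_t start, uint64_t end, uint32_t type)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t e_start = map[i].addr;
        uint64_t e_end = e_start + map[i].size;

        if (map[i].type == type && e_start < end && start < e_end)
            return true;
    }
    return false;
}

/* Skip the DMI scan if the map claims the scan window is ordinary RAM. */
static bool should_skip_dmi_scan(const struct e820_entry_sketch *map, size_t n)
{
    return range_overlaps_type(map, n, 0xF0000, 0x100000, E820_TYPE_RAM);
}
```

With a Xen-style map that is a single RAM entry starting at 0, the scan
would be skipped; with a typical bare-metal map, where the window is
reserved or simply absent, it would go ahead as before.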
Re: [PATCH] netvm: check for page == NULL when propagating the skb->pfmemalloc flag
On 08/13/2012 03:47 AM, Mel Gorman wrote:
> Resending to correct Jeremy's address.
>
> On Wed, Aug 08, 2012 at 03:50:46PM -0700, David Miller wrote:
>> From: Mel Gorman
>> Date: Tue, 7 Aug 2012 09:55:55 +0100
>>
>>> Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
>>> for the following bug triggered by a xen network driver
>> ...
>>> The problem is that the xenfront driver is passing a NULL page to
>>> __skb_fill_page_desc() which was unexpected. This patch checks that
>>> there is a page before dereferencing.
>>>
>>> Reported-and-Tested-by: Konrad Rzeszutek Wilk
>>> Signed-off-by: Mel Gorman
>> That call to __skb_fill_page_desc() in xen-netfront.c looks completely bogus.
>> It's the only driver passing NULL here.
>>
>> That whole song and dance figuring out what to do with the head
>> fragment page, depending upon whether the length is greater than the
>> RX_COPY_THRESHOLD, is completely unnecessary.
>>
>> Just use something like a call to __pskb_pull_tail(skb, len) and all
>> that other crap around that area can simply be deleted.
> I looked at this for a while but I did not see how __pskb_pull_tail()
> could be used sensibly but I'm simply not familiar with writing network
> device drivers or Xen.
>
> This messing with RX_COPY_THRESHOLD seems to be related to how the frontend
> and backend communicate (maybe some fixed limitation of the xenbus). The
> existing code looks like it is trying to take the fragments received and
> pass them straight to the backend without copying by passing the fragments
> to the backend without copying. I worry that if I try converting this to
> __pskb_pull_tail() that it would either hit the limitation of xenbus or
> introduce copying where it is not wanted.
>
> I'm going to have to punt this to Jeremy and the other Xen folk as I'm not
> sure what the original intention was and I don't have a Xen setup anywhere
> to test any patch. Jeremy, xen folk?

It's been a while since I've looked at that stuff, but as I remember,
the issue is that since the packet ring memory is shared with another
domain which may be untrustworthy, we want to make copies of the headers
before making any decisions based on them so that the other domain can't
change them after header processing but before they're actually sent.
(The packet payload is considered less important, but of course the same
issue applies if you're using some kind of content-aware packet filter.)

So that's the rationale for always copying RX_COPY_THRESHOLD, even if
the packet is larger than that amount. As far as I know, changing this
behaviour wouldn't break the ring protocol, but it does introduce a
potential security issue.

    J
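The copy-before-inspect idea can be shown in a few lines of ordinary C.
This is a sketch only, not netfront code: the function name, the
constant name and its value of 256 bytes are assumptions for
illustration, and the "shared" buffer is just a local array standing in
for memory another domain could rewrite at any time.

```c
#include <stddef.h>

/* Assumed stand-in for RX_COPY_THRESHOLD; the value is a guess. */
#define RX_COPY_THRESHOLD_SKETCH 256

/*
 * Snapshot up to RX_COPY_THRESHOLD_SKETCH bytes of a packet from shared
 * memory into a private buffer. All later header checks should read the
 * private copy, never the shared original, so the peer cannot change
 * the headers between validation and use.
 * Returns the number of bytes copied.
 */
static size_t snapshot_header(const volatile unsigned char *shared,
                              size_t len, unsigned char *priv)
{
    size_t n = len < RX_COPY_THRESHOLD_SKETCH ? len : RX_COPY_THRESHOLD_SKETCH;

    /* Byte loop rather than memcpy, since the source is volatile. */
    for (size_t i = 0; i < n; i++)
        priv[i] = shared[i];

    return n;
}
```

The point is that the copy happens even when the packet is longer than
the threshold: the payload beyond it can stay in the shared pages, but
anything a decision is based on has to come from the private snapshot.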
Re: [PATCH 52/74] x86, lto, paravirt: Don't rely on local assembler labels
On 08/18/2012 07:56 PM, Andi Kleen wrote: > From: Andi Kleen > > The paravirt patching code assumes that it can reference a > local assembler label between two different top level assembler > statements. This does not work with some experimental gcc builds, > where the assembler code may end up in different assembler files. Egad, what are those zany gcc chaps up to now? J > > Replace it with extern / global /asm linkage labels. > > This also removes one redundant copy of the macro. > > Cc: jer...@goop.org > Signed-off-by: Andi Kleen > --- > arch/x86/include/asm/paravirt_types.h |9 + > arch/x86/kernel/paravirt.c|5 - > 2 files changed, 5 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/include/asm/paravirt_types.h > b/arch/x86/include/asm/paravirt_types.h > index 4f262bc..6a464ba 100644 > --- a/arch/x86/include/asm/paravirt_types.h > +++ b/arch/x86/include/asm/paravirt_types.h > @@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops; > _paravirt_alt(insn_string, "%c[paravirt_typenum]", > "%c[paravirt_clobber]") > > /* Simple instruction patching code. */ > -#define DEF_NATIVE(ops, name, code) \ > - extern const char start_##ops##_##name[] __visible, \ > - end_##ops##_##name[] __visible; \ > - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":") > +#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b > ":\n\t" > + > +#define DEF_NATIVE(ops, name, code) \ > + __visible extern const char start_##ops##_##name[], > end_##ops##_##name[]; \ > + asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, > name)) > > unsigned paravirt_patch_nop(void); > unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len); > diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c > index 17fff18..947255e 100644 > --- a/arch/x86/kernel/paravirt.c > +++ b/arch/x86/kernel/paravirt.c > @@ -62,11 +62,6 @@ void __init default_banner(void) > pv_info.name); > } > > -/* Simple instruction patching code. 
*/ > -#define DEF_NATIVE(ops, name, code) \ > - extern const char start_##ops##_##name[], end_##ops##_##name[]; \ > - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":") > - > /* Undefined instruction for dealing with missing ops pointers. */ > static const unsigned char ud2a[] = { 0x0f, 0x0b }; > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global
On 08/18/2012 07:56 PM, Andi Kleen wrote: > From: Andi Kleen > > The paravirt thunks use a hack of using a static reference to a static > function to reference that function from the top level statement. > > This assumes that gcc always generates static function names in a specific > format, which is not necessarily true. > > Simply make these functions global and asmlinkage. This way the > static __used variables are not needed and everything works. I'm not a huge fan of unstaticing all this stuff, but it doesn't surprise me that the current code is brittle in the face of gcc changes. J > > Changed in paravirt and in all users (Xen and vsmp) > > Cc: jer...@goop.org > Signed-off-by: Andi Kleen > --- > arch/x86/include/asm/paravirt.h |2 +- > arch/x86/kernel/vsmp_64.c |8 > arch/x86/xen/irq.c |8 > arch/x86/xen/mmu.c | 16 > 4 files changed, 17 insertions(+), 17 deletions(-) > > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h > index a0facf3..cc733a6 100644 > --- a/arch/x86/include/asm/paravirt.h > +++ b/arch/x86/include/asm/paravirt.h > @@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct > arch_spinlock *lock) > */ > #define PV_CALLEE_SAVE_REGS_THUNK(func) > \ > extern typeof(func) __raw_callee_save_##func; \ > - static void *__##func##__ __used = func;\ > \ > asm(".pushsection .text;" \ > + ".globl __raw_callee_save_" #func " ; " \ > "__raw_callee_save_" #func ": " \ > PV_SAVE_ALL_CALLER_REGS \ > "call " #func ";" \ > diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c > index 992f890..f393d6d 100644 > --- a/arch/x86/kernel/vsmp_64.c > +++ b/arch/x86/kernel/vsmp_64.c > @@ -33,7 +33,7 @@ > * and vice versa. 
> */ > > -static unsigned long vsmp_save_fl(void) > +asmlinkage unsigned long vsmp_save_fl(void) > { > unsigned long flags = native_save_fl(); > > @@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void) > } > PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl); > > -static void vsmp_restore_fl(unsigned long flags) > +asmlinkage void vsmp_restore_fl(unsigned long flags) > { > if (flags & X86_EFLAGS_IF) > flags &= ~X86_EFLAGS_AC; > @@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags) > } > PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl); > > -static void vsmp_irq_disable(void) > +asmlinkage void vsmp_irq_disable(void) > { > unsigned long flags = native_save_fl(); > > @@ -61,7 +61,7 @@ static void vsmp_irq_disable(void) > } > PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable); > > -static void vsmp_irq_enable(void) > +asmlinkage void vsmp_irq_enable(void) > { > unsigned long flags = native_save_fl(); > > diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c > index 1573376..3dd8831 100644 > --- a/arch/x86/xen/irq.c > +++ b/arch/x86/xen/irq.c > @@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void) > (void)HYPERVISOR_xen_version(0, NULL); > } > > -static unsigned long xen_save_fl(void) > +asmlinkage unsigned long xen_save_fl(void) > { > struct vcpu_info *vcpu; > unsigned long flags; > @@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void) > } > PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl); > > -static void xen_restore_fl(unsigned long flags) > +asmlinkage void xen_restore_fl(unsigned long flags) > { > struct vcpu_info *vcpu; > > @@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags) > } > PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl); > > -static void xen_irq_disable(void) > +asmlinkage void xen_irq_disable(void) > { > /* There's a one instruction preempt window here. 
We need to > make sure we're don't switch CPUs between getting the vcpu > @@ -77,7 +77,7 @@ static void xen_irq_disable(void) > } > PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable); > > -static void xen_irq_enable(void) > +asmlinkage void xen_irq_enable(void) > { > struct vcpu_info *vcpu; > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c > index b65a761..9f82443 100644 > --- a/arch/x86/xen/mmu.c > +++ b/arch/x86/xen/mmu.c > @@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val) > return val; > } > > -static pteval_t xen_pte_val(pte_t pte) > +asmlinkage pteval_t xen_pte_val(pte_t pte) > { > pteval_t pteval = pte.pte; > #if 0 > @@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte) > } > PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val); > > -static pgdval_t xen_pgd_val(pgd_t pgd) > +asmlinkage pgdval_t xen_pgd_val(pgd_t pgd) > { > return pte_mfn_to_pfn(pgd.pgd); > } > @@
Re: [Xen-devel] [PATCH] let XEN depend on PAE
Arnd Hannemann wrote:
> As paravirtualized xen guests won't work with !X86_PAE, change the
> Kconfig accordingly.

!PAE is supposed to work, but it is a rarely used configuration. How does it fail?

J
Re: [Xen-devel] [PATCH] let XEN depend on PAE
Arnd Hannemann wrote: This is with 2.6.24.2, but latest-git looks the same: I also tried with 2.6.23 which crashes instantly, without any output of the guest. I'm not too surprised. Non-PAE Xen is a bit of a rarity, and it only gets tested rarely. Chris Wright did spend some time on it a while ago, but I don't know that its had any real attention since. I've been making sure non-PAE compiles, but I've been lax about testing it. This is the first usermode exec, I guess? The backtrace is a bit odd; I've never seen a problem in move_page_tables before. Does "xm dmesg" tell you what Xen is complaining about? You may need to compile with debug=y in Config.mk. [0.599806] 1 multicall(s) failed: cpu 0 [0.599816] call 1/2: op=26 arg=[c1051860] result=0 [0.599825] call 2/2: op=14 arg=[bf9c7000] result=-22 [0.599841] [ cut here ] [0.599851] kernel BUG at arch/x86/xen/multicalls.c:103! [0.599861] invalid opcode: [#1] SMP [0.599871] Modules linked in: [0.599879] [0.599885] Pid: 1, comm: init Not tainted (2.6.24.2 #6) [0.599895] EIP: 0061:[] EFLAGS: 00010202 CPU: 0 [0.599910] EIP is at xen_mc_flush+0x19c/0x1b0 [0.599919] EAX: EBX: c10510a0 ECX: c1051060 EDX: c1051060 [0.599930] ESI: 0002 EDI: 0001 EBP: c2417c10 ESP: c2417be4 [0.599940] DS: 007b ES: 007b FS: 00d8 GS: SS: e021 [0.599951] Process init (pid: 1, ti=c2417000 task=c2416ab0 task.ti=c2417000) [0.599960] Stack: c0443c98 0002 0002 000e bf9c7000 ffea c1051060 0200 [0.599984]0067 c193fffc bf9c7000 c2417c18 c0101112 c2417c5c c0166dfc c193ce40 [0.66]c193e5c0 c000 c193e5c0 1000 c000 c193ce40 c198e71c c10331cc [0.600029] Call Trace: [0.600036] [] show_trace_log_lvl+0x1a/0x30 [0.600050] [] show_stack_log_lvl+0xa9/0xd0 [0.600062] [] show_registers+0xca/0x1e0 [0.600074] [] die+0x11a/0x250 [0.600085] [] do_trap+0x83/0xb0 [0.600096] [] do_invalid_op+0x88/0xa0 [0.600108] [] error_code+0x72/0x80 [0.600121] [] xen_leave_lazy+0x12/0x20 [0.600134] [] move_page_tables+0x27c/0x300 [0.600149] [] setup_arg_pages+0x162/0x2a0 [0.600162] [] 
load_elf_binary+0x3d3/0x1bd0 [0.600175] [] search_binary_handler+0x92/0x200 [0.600190] [] load_script+0x1bf/0x200 [0.600202] [] search_binary_handler+0x92/0x200 [0.600215] [] do_execve+0x15b/0x180 [0.600227] [] sys_execve+0x2e/0x80 [0.600241] [] syscall_call+0x7/0xb [0.600253] === [0.600259] Code: 24 08 89 44 24 0c 89 74 24 04 c7 04 24 98 3c 44 c0 e8 c9 36 02 00 8b 45 ec 83 c3 20 8b 90 00 0b 00 00 39 d6 72 c0 e9 04 ff ff ff <0f> 0b eb fe 0f 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 55 [0.600370] EIP: [] xen_mc_flush+0x19c/0x1b0 SS:ESP e021:c2417be4 [0.600393] ---[ end trace a686db401f06e173 ]--- [0.600403] Kernel panic - not syncing: Attempted to kill init! full dmesg, config here: http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00716.html Best regards, Arnd Hannemann J -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] xen: Implement getgeo for Xen virtual block device.
Linus Torvalds wrote:
> This isn't a problem with things like "Signed-off-by:" etc tags, because
> they have no automated meaning and don't really change the commit itself,
> but the "From:"/"Date:"/"Subject:" markers at the head of the message
> really do have real meaning, and get removed from the commit message and
> instead get put into the SCM headers.

It may be worth having a definitive and unambiguous Author: tag then, which can appear among Signed-off-by:s and is used in preference to anything else. From: is a useful heuristic which seems to work well in general, but as you say, it gets a bit hairy when you have something which means different things to different parts of the software stack at the same time.

J
Re: [Xen-devel] [PATCH] let XEN depend on PAE
Arnd Hannemann wrote:
> Jeremy Fitzhardinge wrote:
>> Arnd Hannemann wrote:
>>> This is with 2.6.24.2, but latest-git looks the same: I also tried
>>> with 2.6.23 which crashes instantly, without any output of the guest.
>>
>> I'm not too surprised. Non-PAE Xen is a bit of a rarity, and it only
>> gets tested rarely. Chris Wright did spend some time on it a while ago,
>> but I don't know that it's had any real attention since. I've been
>> making sure non-PAE compiles, but I've been lax about testing it.
>>
>> This is the first usermode exec, I guess? The backtrace is a bit odd;
>> I've never seen a problem in move_page_tables before.
>
> Yes, it's trying to execute the first script in initramfs. I also tried
> with initramdisk and got a similar error. (move_page_tables also involved)
>
>> Does "xm dmesg" tell you what Xen is complaining about? You may need to
>> compile with debug=y in Config.mk.
>
> (XEN) mm.c:645:d44 Non-privileged (44) attempt to map I/O space
>
> I will recompile with debug=y and post the output. If I reduce the dom0
> memory with dom0_mem=20 I see something like 0080 with dom0_mem=80 I
> always see .

That's helpful. Looks like the mfn is getting mushed to 0.

J
Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver
Markus Armbruster wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:
>> Markus Armbruster wrote:
>>> This is a pair of Xen para-virtual frontend device drivers:
>>> drivers/video/xen-fbfront.c provides a framebuffer, and
>>> drivers/input/xen-kbdfront provides keyboard and mouse.
>>
>> Unless they're actually inter-dependent, could you post this as two
>> separate patches? I don't know anything about these parts of the
>> kernel, so it would be nice to make it very obvious which changes are
>> fb vs mouse/keyboard.
>
> I could do that, but the intermediate step (one driver, not the other)
> is somewhat problematic: the backend in dom0 needs both drivers, and
> will refuse to complete device initialization unless they're both
> present.

That's OK. In that case keep them together.

J
compile problem in current x86.git
  CC      arch/x86/kernel/traps_32.o
/home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/traps_32.c:59:27: error: asm/kmemcheck.h: No such file or directory

asm-x86/kmemcheck.h does seem to be completely missing. Looks like 8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for kmemcheck" should have added the file.

J
Re: compile problem in current x86.git
Ingo Molnar wrote:
> * Vegard Nossum <[EMAIL PROTECTED]> wrote:
>>> asm-x86/kmemcheck.h does seem to be completely missing. Looks like
>>> 8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for
>>> kmemcheck" should have added the file.
>>
>> Hm. This is x86#testing, no? I don't think there's any kmemcheck code
>> whatsoever in other branches.
>>
>> The file should be added with this commit:
>>
>> kmemcheck: add the kmemcheck core
>> http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commit;h=c83d05d69382945c92a2e7a2b168c1cc2aa77c29
>
> yes, x86.git looks fine here too:
>
> ~/linux.trees.git> git-checkout -b tmp x86/testing
> Branch tmp set up to track remote branch refs/remotes/x86/testing.
> Switched to a new branch "tmp"
> ~/linux.trees.git> cd include/asm-x86/
> ~/linux.trees.git/include/asm-x86> ls -l kmemcheck.h
> -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 kmemcheck.h
> ~/linux.trees.git/include/asm-x86> cd ..
> ~/linux.trees.git/include> cd ..
> ~/linux.trees.git> ls -ldt include/asm-x86/kmemcheck.h
> -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 include/asm-x86/kmemcheck.h
> ~/linux.trees.git> git-log | head -1
> commit c9d2f5489cec70f814bf64033290e5f05b4d7f33

I'm using #mm. Should I be using #testing?

J
Re: compile problem in current x86.git
Ingo Molnar wrote:
> Jeremy, you might want to start tracking x86.git#testing:
>
> http://people.redhat.com/mingo/x86.git/README
>
> if you want to follow the latest & greatest x86.git code.

Right, will do.

J
Re: 2.6.25-rc1 xen pvops regression
Mark McLoughlin wrote:
> @@ -371,6 +372,9 @@ void __init dmi_scan_machine(void)
>  		}
>  	}
>  	else {
> +		if (e820_all_mapped(0xF, 0xF+0x1, E820_RAM))
> +			goto out;
>
> One issue with using the e820 map for this is that a Xen Dom0 will also
> have this region marked as RAM in the e820 map, but will set up a fixmap
> for it, allowing dmi_scan_machine() to map the region.

Would it be easier to just fake up a mapping so that window points to the real dmi area, and mark E820 accordingly?

J
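For readers unfamiliar with the kind of check in the quoted hunk, here is a minimal user-space sketch of an "is this whole range covered by entries of one type" test. The `struct region` type, the `range_all_mapped()` name and every address used with it are illustrative assumptions; this is not the kernel's e820 code, and it assumes the map is sorted by start address with no overlapping entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-map entry types, for illustration only. */
enum region_type { REGION_RAM = 1, REGION_RESERVED = 2 };

/* Hypothetical memory-map entry covering [start, end). */
struct region {
    uint64_t start;          /* inclusive */
    uint64_t end;            /* exclusive */
    enum region_type type;
};

/*
 * Return 1 if every byte of [start, end) lies inside entries of the
 * given type, 0 otherwise.  Assumes map[] is sorted by start address
 * and its entries do not overlap.
 */
static int range_all_mapped(const struct region *map, size_t n,
                            uint64_t start, uint64_t end,
                            enum region_type type)
{
    assert(start < end);

    for (size_t i = 0; i < n; i++) {
        const struct region *r = &map[i];

        if (r->type != type)
            continue;
        if (r->end <= start || r->start >= end)
            continue;            /* no overlap with what is left */
        if (r->start <= start)
            start = r->end;      /* covered up to the end of this entry */
        if (start >= end)
            return 1;            /* nothing left uncovered */
    }
    return 0;
}
```

With a map that marks the legacy BIOS window as reserved, a RAM query over that window returns 0. If the same window is marked as RAM, as Mark describes for a Xen Dom0, the query returns 1 and a caller using it as a "skip the scan" test would bail out even though the area is reachable by other means.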
Re: [RFC] FW: proposal for systems that do not require security
On Tue, Apr 10, 2001 at 02:35:52PM +0200, Heusden, Folkert van wrote:
> So, I was wondering: isn't it a nice idea to have a switch in the
> configuration menu to disable entropy-gathering in the interrupt-routines,
> have some simplistic routine (like x'=(x * m + a) % p) which returns a non-
> cryptographic value, and something similar simplistic for the network-
> traffic routines?

No, that's a very bad idea. If you think it's a problem, just remove the random driver altogether. It's much better for something to get ENXIO rather than thinking it's getting real randomness.

You can still get TCP sequence numbers by sampling the cycle counter or something.

J
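To make the objection concrete, here is a minimal sketch of the quoted recurrence x' = (x * m + a) % p. The `struct lcg` type and both function names are hypothetical, and the constants in the usage note (m = 48271, a = 0, p = 2^31 - 1, the well-known MINSTD parameters) are an assumption chosen for illustration, not something proposed in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator state for x' = (x * m + a) % p. */
struct lcg {
    uint64_t x;   /* current state, which is also the last output */
    uint64_t m;   /* multiplier */
    uint64_t a;   /* increment */
    uint64_t p;   /* modulus */
};

/* Advance the generator and return the new value. */
static uint64_t lcg_next(struct lcg *g)
{
    assert(g->p != 0);
    g->x = (g->x * g->m + g->a) % g->p;
    return g->x;
}

/*
 * What an observer can do: given one output and the (public)
 * parameters, compute the next output without knowing the seed.
 */
static uint64_t lcg_predict(uint64_t seen, uint64_t m, uint64_t a, uint64_t p)
{
    assert(p != 0);
    return (seen * m + a) % p;
}
```

With `struct lcg g = { 1, 48271, 0, 2147483647 }`, the first output is 48271, and `lcg_predict()` applied to that output yields exactly what `lcg_next()` returns next. A consumer reading such values from a device it believes to be random would have no way to notice, which is the argument for failing the open instead.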
Fix for SMP deadlock in autofs4
This is a fix for a potential deadlock in autofs4's expire routine. It tries to use dput() while holding the dcache_lock. This isn't a problem in principle since dput() should only try to take the dcache_lock when the counter makes a transition to zero, which can't happen in this case. Unfortunately the generic (and only) implementation of atomic_dec_and_lock always takes the lock, so it deadlocks.

Obviously, this only affects SMP. UP's wise avoidance of spinlocks saves it once again.

The simple solution is to replace dput() with atomic_dec(). The count can't reach zero because we did a dget_locked() and held dcache_lock the whole time, so we never need to worry about the rest of the dput() logic.

--- ../2.4/fs/autofs4/expire.c	Wed Jan 31 00:20:50 2001
+++ fs/autofs4/expire.c	Fri Apr 20 01:29:53 2001
@@ -223,7 +223,8 @@
 			mntput(p);
 			return dentry;
 		}
-		dput(d);
+
+		atomic_dec(&d->d_count); /* dput(), but we'll never hit zero */
 		mntput(p);
 	}
 	spin_unlock(&dcache_lock);

J
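The difference between the two behaviours described above can be sketched in user space. This is a minimal illustration using C11 atomics and a toy spinlock; `spinlock_t`, `dec_and_lock_generic()` and `dec_and_lock_fast()` are names made up for this sketch and are not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-recursive spinlock, standing in for dcache_lock. */
typedef struct { atomic_flag held; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void spin_lock(spinlock_t *l)
{
    while (!spin_trylock(l))
        ;   /* spins forever if the caller already holds the lock */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * "Generic" flavour: always takes the lock before decrementing.
 * Calling this while already holding the lock deadlocks, even when
 * the count cannot reach zero.
 */
static bool dec_and_lock_generic(atomic_int *count, spinlock_t *lock)
{
    spin_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;             /* hit zero: return with lock held */
    spin_unlock(lock);
    return false;
}

/*
 * Flavour that only takes the lock when the count may reach zero.
 * While the count stays above one it never touches the lock.
 */
static bool dec_and_lock_fast(atomic_int *count, spinlock_t *lock)
{
    int old = atomic_load(count);

    assert(old > 0);
    while (old > 1) {
        /* On failure, old is reloaded and the condition re-checked. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;        /* decremented without the lock */
    }
    spin_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;             /* hit zero: return with lock held */
    spin_unlock(lock);
    return false;
}
```

A caller that already holds the lock and owns an extra reference can safely use `dec_and_lock_fast()`, while `dec_and_lock_generic()` would spin forever in the same situation. The plain `atomic_dec()` in the patch sidesteps the question by not going near the lock at all.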
Re: Fix for SMP deadlock in autofs4
On Fri, Apr 20, 2001 at 05:00:04AM -0400, Alexander Viro wrote:
> Frankly, I'd rather add dput_locked() in dcache.c. The bug is real and
> since autofs4 is not the only place like that... I'll look into that
> stuff.

Sounds fine.

J
Re: Fix for SMP deadlock in autofs4
On Fri, Apr 20, 2001 at 10:59:43PM -0700, Linus Torvalds wrote:
> It's untested, but looks fairly obvious. It removes the increment, and
> changes autofs4_expire() to properly bump the count of the returned dentry
> (and callers will dput() it when done). This may be unnecessarily careful,
> but it's the RightThing(tm) to do.

I suppose so. It is pretty paranoid: because of autofs4's extra reference, the count can't (shouldn't) ever drop to zero until the filesystem allows it to. In other words, if it helps, it's hiding another bug. But you're right, if this were a general routine, it should definitely return with an elevated count.

> Jeremy, would you mind verifying that this WorksForYou(tm)?

Looks fine to me. I'll give it a spin.

J
Re: Fix for SMP deadlock in autofs4
On Sat, Apr 21, 2001 at 02:21:38AM -0400, Alexander Viro wrote:
> Looks sane for me. However, I would add check for dentry being hashed and
> would skip the unhashed ones. Otherwise you can get a directory that
> had been removed but is still busy - doesn't look like a right thing to
> do. Jeremy?

It wouldn't hurt. It can't happen in practice since unlink/rmdir happen in very controlled ways (only the automount daemon is allowed to perform those ops, so it will keep them in sync).

J
Re: Fix for SMP deadlock in autofs4
On Fri, Apr 20, 2001 at 03:53:45PM -0400, Alexander Viro wrote:
> > Why are we doing the mntget/dget at all? We hold the spinlock, so we know
> > they are not going away. Not doing the mntget/dget means that we (a) run
> > faster and (b) don't have the bug, because we don't need to put the damn
> > things.
> >
> > Comments?
>
> It looks like you are right, but I wonder how the hell did that code
> happen at all. Looks like somewhere around 2.4.0-test10-pre* dcache_lock
> was moved out of is_tree_busy() and covered dget/dput. Hmm... Might be
> my fault - I don't remember doing that, but...

I did it. I couldn't see a point in continuously taking and releasing the dcache lock, since it just increased complexity and expire is not a performance-critical path (i.e., it happens rarely). I kept the dget/put out of caution and ignorance, but they're clearly problematic. I'm happy to drop them if holding dcache_lock is enough to keep the tree stable while I traverse it.

> Removing that will require an obvious change in is_tree_busy() (shift
> count by 1). However, the real question is WTF are we trying to
> get in autofs4_expire() - it returns dentry without grabbing a
> reference to it. The only thing that saves us is that we have a
> ramfs-style situation (dentries are pinned until we rmdir) and
> everything up to the point where we silently forget about dentry
> is covered by BKL. Since ->rmdir() is under BKL too it's enough,
> but... Eww...

The dentry it returns is always an autofs4 dentry, and autofs4 always keeps a refcount on its dentries like ramfs (because like ramfs, autofs4 exists only in the dcache).

> Jeremy, what are you really trying to do there? is_tree_busy()
> seems to be written in assumption that mnt/dentry is not a
> mountpoint but root of a subtree with something mounted on its
> leaves. And autofs4_expire() traverses the list of root's
> subdirectories, picks one that has nothing busy mounted in
> _its_ subdirectories and essentially pass the name to caller.
> Which sends that name (of first-level subdirectory) to
> userland.

Exactly right.

> Is that what you really want there? It looks very odd - why don't we pass
> the names of actual mountpoints? What's wrong with the case when foo/bar
> is busy, but foo/baz is not?

Say for example you have an autofs4 filesystem mounted on /net. When you do a "cd /net/host", all of host's exported NFS filesystems are mounted on the directory /net/host; obviously the mountpoint /net/host is an autofs4 directory.

autofs4_expire traverses the directories in its root and finds the ones which are currently unused and have been idle for some time. Since all the filesystems mounted under /net/host are part of the same logical tree, it examines them as a single unit so they can be umounted as a single unit.

Note that /net/host may not itself be a mountpoint. If host doesn't export / but only, say, /home and /usr/local, then there'll be a tree of skeleton directories in the autofs4 filesystem to create the paths up to the mountpoints (so there'll be /net/host/{home,usr/local}). But because everything under /net/host is treated as a single unit, it's only correct for autofs4_expire to return /net/host, not /net/host/home or /net/host/usr/local.

The simplifying assumption I make is that there's a single root directory with a number of sub-directories; each subdirectory is treated as a single unit. The general case would be to mark some directories as being the root of an atomic set, and other directories simply being structural, but the need has never come up (and it can be worked around by having nested autofs filesystems).

J
Re: IDE disk slow? There's help...
On Fri, Oct 20, 2000 at 03:16:14PM -0400, safemode wrote:
> That's what i was thinking, but 30MB/s seems to be quite an exaggeration.

I reliably get 30MB/s with my IBM 30G 7200rpm ATA66 drive, using a Via VT82C586 controller. 2.4.0-test9. Modern drives are really fast.

J
Update to autofs4 for new(-ish) VFS stuff
Ever since the addition of struct vfs_mount, autofs4 has got the "is this filesystem busy" test wrong. This patch against 2.4.0-test9 makes it smarter. J --- linux.orig/CREDITS Tue Oct 3 17:30:15 2000 +++ linux/CREDITS Tue Oct 3 17:56:04 2000 @@ -795,13 +795,16 @@ S: Germany N: Jeremy Fitzhardinge -E: [EMAIL PROTECTED] +E: [EMAIL PROTECTED] +W: http://www.goop.org/~jeremy +D: author of userfs filesystem D: Improved mmap and munmap handling D: General mm minor tidyups -S: 67 Surrey St. -S: Darlinghurst, Sydney -S: New South Wales 2010 -S: Australia +D: autofs v4 filesystem rework +S: 987 Alabama St +S: San Francisco +S: SA, 94110 +S: USA N: Ralf Flaxa E: [EMAIL PROTECTED] diff -x *.o -x *~ -x *.flags -x .depend -x .hdepend -u 2.3/fs/autofs4/expire.c local-2.3/fs/autofs4/expire.c --- linux.orig/fs/autofs4/expire.c Wed Sep 6 18:02:29 2000 +++ linux/fs/autofs4/expire.c Sat Oct 21 19:07:24 2000 @@ -3,7 +3,7 @@ * linux/fs/autofs/expire.c * * Copyright 1997-1998 Transmeta Corporation -- All Rights Reserved - * Copyright 1999 Jeremy Fitzhardinge <[EMAIL PROTECTED]> + * Copyright 1999-2000 Jeremy Fitzhardinge <[EMAIL PROTECTED]> * * This file is part of the Linux kernel and is made available under * the terms of the GNU General Public License, version 2, or at your @@ -15,46 +15,139 @@ /* * Determine if a subtree of the namespace is busy. 
+ * + * mnt is the mount tree under the autofs mountpoint */ -static int is_tree_busy(struct vfsmount *mnt) +static inline int is_vfsmnt_tree_busy(struct vfsmount *mnt) { struct vfsmount *this_parent = mnt; struct list_head *next; int count; - spin_lock(&dcache_lock); - count = atomic_read(&mnt->mnt_count) - 2; - if (!is_autofs4_dentry(mnt->mnt_mountpoint)) - count--; + count = atomic_read(&mnt->mnt_count) - 1; + repeat: next = this_parent->mnt_mounts.next; + DPRINTK(("is_vfsmnt_tree_busy: mnt=%p, this_parent=%p, next=%p\n", +mnt, this_parent, next)); resume: - while (next != &this_parent->mnt_mounts) { - struct list_head *tmp = next; - struct vfsmount *p = list_entry(tmp, struct vfsmount, + for( ; next != &this_parent->mnt_mounts; next = next->next) { + struct vfsmount *p = list_entry(next, struct vfsmount, mnt_child); - next = tmp->next; - /* Decrement count for unused children */ - count += atomic_read(&p->mnt_count) - 2; + + /* -1 for struct vfs_mount's normal count, + -1 to compensate for child's reference to parent */ + count += atomic_read(&p->mnt_count) - 1 - 1; + + DPRINTK(("is_vfsmnt_tree_busy: p=%p, count now %d\n", +p, count)); + if (!list_empty(&p->mnt_mounts)) { this_parent = p; goto repeat; } /* root is busy if any leaf is busy */ - if (atomic_read(&p->mnt_count) > 1) { - spin_unlock(&dcache_lock); + if (atomic_read(&p->mnt_count) > 1) return 1; - } } - /* -* All done at this level ... ascend and resume the search. -*/ + + /* All done at this level ... ascend and resume the search. */ if (this_parent != mnt) { next = this_parent->mnt_child.next; this_parent = this_parent->mnt_parent; goto resume; } - spin_unlock(&dcache_lock); + + DPRINTK(("is_vfsmnt_tree_busy: count=%d\n", count)); + return count != 0; /* remaining users? 
*/ +} + +/* Traverse a dentry's list of vfsmounts and return the number of + non-busy mounts */ +static int check_vfsmnt(struct vfsmount *mnt, struct dentry *dentry) +{ + int ret = 0; + struct list_head *tmp; + + list_for_each(tmp, &dentry->d_vfsmnt) { + struct vfsmount *vfs = list_entry(tmp, struct vfsmount, + mnt_clash); + DPRINTK(("check_vfsmnt: mnt=%p, dentry=%p, tmp=%p, vfs=%p\n", +mnt, dentry, tmp, vfs)); + if (vfs->mnt_parent != mnt || /* don't care about busy-ness of other +namespaces */ + !is_vfsmnt_tree_busy(vfs)) + ret++; + } + + DPRINTK(("check_vfsmnt: ret=%d\n", ret)); + return ret; +} + +/* Check dentry tree for busyness. If a dentry appears to be busy + because it is a mountpoint, check to see if the mounted + filesystem is busy. */ +static int is_tree_busy(stru
Re: IDE disk slow? There's help...
On Fri, Oct 20, 2000 at 01:22:59PM -0700, Andre Hedrick wrote:
> On Fri, 20 Oct 2000 [EMAIL PROTECTED] wrote:
>
> > [EMAIL PROTECTED] wrote..
> >
> > > I reliably get 30MB/s with my IBM 30G 7200rpm ATA66 drive, using a
> > > Via VT82C586 controller. 2.4.0-test9. Modern drives are really fast.
> >
> > Hmm, I'm confused here.
> > VIA 586 can only do up to UDMA 2, which should return speeds less than
> > that. My system has an identical configuration, and I get ~12MB/s
>
> No the are the pci device ide but different guts. This is the ugliness
> that most never see.

I think you left some words out. Are you saying that this is one of those chips which change PCI id in order to give the appearance of backwards compatibility? That it's not really a VT82C586?

Thanks,
J
Re: Request for info on proc system update frequency
On Wed, Oct 18, 2000 at 04:48:48PM +0100, Stephen Tweedie wrote:
> On Tue, Oct 17, 2000 at 12:31:24AM -0400, John Kacur wrote:
> > I'm trying to understand how the proc file system works. In particular
> > I'd like to know more about the algorithm by which the information is
> > updated and how frequently.
>
> It is "live": the file contents are generated on demand when you read
> them. A very few proc files include time-averaged data (such as the
> load average); everything else is absolutely uptodate.

...at the instant you read it. It may be out of date a nanosecond later.

[Yes, a nit-pick, but worth making clear to the original poster.]

J
Re: The zen of kernel virtual addresses
On Sat, Oct 21, 2000 at 01:37:26PM -0600, Jonathan Corbet wrote:
> physical address
> An address as known by the low-level hardware. In the modern
> world, these can be 64-bit quantities, even on 32-bit systems.
> These are the addresses used by /dev/mem - which appears to work
> only for low memory.

A physical address is the address the CPU uses to talk to memory. It is not necessarily the same kind of address a device uses to see memory: devices use bus addresses. The simple case is bus address == physical address, but there are many variations. Systems with an IOMMU (or equivalent) present devices with a completely different view of system memory.

J
[PATCH] address-space identification for /proc
Hi,

/proc has no way to indicate whether tasks share an address space. This one-liner patch adds a new ASID: field to /proc//status so there's some way to see address-space sharing between tasks.

While this is hardly a bug-fix, it is a pretty useful thing to know which is otherwise completely absent.

J

--- ../2.3/fs/proc/array.c	Mon Oct 9 17:03:53 2000
+++ linux/fs/proc/array.c	Thu Oct 26 15:20:52 2000
@@ -294,6 +294,7 @@
 	for(line=0;(len=sprintf_regs(line,buffer,task,NULL,NULL))!=0;line++)
 		buffer+=len;
 #endif
+	buffer += sprintf("ASID: %p\n", mm);
 	return buffer - orig;
 }
Re: [PATCH] address-space identification for /proc
On Thu, Oct 26, 2000 at 03:45:27PM -0700, I wrote:
> +	buffer += sprintf("ASID: %p\n", mm);

Obviously, this should be:

+	buffer += sprintf("ASID:\t%p\n", mm);

for consistency.

J
Re: [PATCH] address-space identification for /proc
On Thu, Oct 26, 2000 at 07:01:26PM -0400, Johannes Erdfelt wrote:
> and even more obvious:
>
> +	buffer += sprintf(buffer, "ASID:\t%p\n", mm);
>
> Actually putting it into the buffer would be useful as well :)

That serves me right for hand-editing patches.

J

--
Repeat to self: I am not Linus
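The idiom the three messages above converge on, appending to a buffer by advancing the pointer by sprintf()'s return value, can be shown in a small stand-alone sketch. The `format_status()` function and its "Name:" field are hypothetical stand-ins for the real /proc code, and the caller is assumed to supply a buffer large enough for both lines.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append two "Key:\tvalue" lines to buffer and return the number of
 * bytes written, in the style of the patched function.  The caller
 * must supply a buffer big enough for both lines (an assumption of
 * this sketch; there is no bounds checking here).
 */
static int format_status(char *buffer, const char *name, const void *mm)
{
    char *orig = buffer;

    assert(buffer != NULL);

    /* sprintf() returns the number of characters written, so adding
     * it to the pointer leaves buffer at the terminating NUL, ready
     * for the next field. */
    buffer += sprintf(buffer, "Name:\t%s\n", name);
    buffer += sprintf(buffer, "ASID:\t%p\n", mm);

    return (int)(buffer - orig);
}
```

Passing the format string as the first argument, as in the original one-liner, would treat the format as the destination, which is exactly the slip Johannes points out.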
Re: Linux-2.4.0-test10
On Tue, Oct 31, 2000 at 08:55:13PM +, Alan Cox wrote:
> Does autofs4 work yet

Autofs4 was fixed in 2.4.0-test10-pre6 or so.

Autofs4 for 2.2.x has been working for some time, though I just updated the 2.2 patch so it doesn't stomp on autofs (v3).

J
Re: Status of ReiserFS + Journalling
On Thu, Oct 05, 2000 at 11:33:30AM +0200, Helge Hafting wrote:
> A power failure might leave you with a corrupt disk block. That is
> detectable (read failure) and you may then reconstruct it using the
> rest of the stripe. This will get you data from either before
> or after the update was supposed to happen.

How would you be able to tell which disk contains the bad stripe? RAID reconstruction relies on knowing which disk to reconstruct because it's obviously bad - there's out of band information in the form of I/O errors. If you only have an incompletely updated stripe on a disk, you don't know which data to reconstruct from parity.

I think the only way of doing this properly is either to have a battery-backed cache or to journal at the RAID level.

J
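The point about out-of-band information can be illustrated with a toy XOR-parity stripe. This is a minimal sketch under assumed parameters (three data blocks of four bytes each, single parity); the names `compute_parity()`, `stripe_consistent()` and `reconstruct()` are made up for the example and do not come from any RAID implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDATA 3   /* data blocks per stripe (illustrative) */
#define BLK   4   /* bytes per block (illustrative) */

/* parity[i] = XOR of byte i of every data block. */
static void compute_parity(uint8_t data[NDATA][BLK], uint8_t parity[BLK])
{
    for (size_t i = 0; i < BLK; i++) {
        parity[i] = 0;
        for (size_t d = 0; d < NDATA; d++)
            parity[i] ^= data[d][i];
    }
}

/* Return 1 if the stored parity matches the data, 0 otherwise. */
static int stripe_consistent(uint8_t data[NDATA][BLK], uint8_t parity[BLK])
{
    uint8_t expect[BLK];

    compute_parity(data, expect);
    for (size_t i = 0; i < BLK; i++)
        if (expect[i] != parity[i])
            return 0;
    return 1;
}

/*
 * Rebuild data[bad] from parity and the other data blocks.  The
 * caller has to say which block is bad; parity alone cannot tell.
 */
static void reconstruct(uint8_t data[NDATA][BLK], uint8_t parity[BLK],
                        size_t bad)
{
    assert(bad < NDATA);
    for (size_t i = 0; i < BLK; i++) {
        uint8_t v = parity[i];

        for (size_t d = 0; d < NDATA; d++)
            if (d != bad)
                v ^= data[d][i];
        data[bad][i] = v;
    }
}
```

If one data block is rewritten and the parity update is lost, `stripe_consistent()` reports a mismatch but gives no hint about where. Calling `reconstruct()` on the block that really changed restores the old contents; calling it on any other block also produces a consistent stripe, but silently overwrites good data. Both outcomes pass the parity check, which is why the choice of disk has to come from somewhere else, such as an I/O error or a journal.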
User-mode linux stack overflow: could be generic problem
Hi,

I've been playing with user-mode linux (2.4.0-pre9). It works well on one machine, but on my laptop I'm consistently getting stack overflows just as init is started. The backtrace (from a breakpoint at panic()):

(gdb) bt
#0 panic (fmt=0x10112e00 "Stack overflowed onto current_task page") at panic.c:54
#1 0x100a244d in check_stack_overflow (ptr=0x5015ccc8) at process_kern.c:715
#2 0x1009ddc9 in set_signals (enable=0) at signal_user.c:50
#3 0x100050b0 in __wake_up (q=0x5014afc8, mode=35) at sched.c:714
#4 0x10020ea5 in end_buffer_io_sync (bh=0x5014af80, uptodate=1) at /home/jeremy/uml/2.3/include/linux/locks.h:34
#5 0x100607a4 in end_that_request_first (req=0x500e8f00, uptodate=1, name=0x1011390d "User-mode block device") at ll_rw_blk.c:1000
#6 0x100a3d48 in ubd_finish () at /home/jeremy/uml/2.3/include/linux/blk.h:396
#7 0x100a3dd5 in ubd_handler () at ubd.c:222
#8 0x100a3e00 in ubd_intr (irq=3, dev=0x1012c0a0, unused=0x5015cd88) at ubd.c:229
#9 0x1009c6bf in handle_IRQ_event (irq=3, regs=0x5015cd88, action=0x500573c0) at irq.c:148
#10 0x1009c85f in do_IRQ (irq=3, user_mode=0) at irq.c:313
#11 0x1009cf8d in sigio_handler (sig=29) at irq_user.c:53
#12 0x100a7318 in __restore () at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#13 0x1009de50 in set_signals (enable=3) at signal_user.c:65
#14 0x1005fb46 in generic_unplug_device (data=0x10160650) at ll_rw_blk.c:364
#15 0x100204ab in __wait_on_buffer (bh=0x5014af80) at /home/jeremy/uml/2.3/include/linux/tqueue.h:120
#16 0x100212be in bread (dev=25088, block=92508, size=1024) at /home/jeremy/uml/2.3/include/linux/locks.h:20
#17 0x10041a40 in ext2_get_block (inode=0x5013c0a0, iblock=288, bh_result=0x5014ae00, create=0) at inode.c:250
#18 0x10021edd in block_read_full_page (page=0x50008b74, get_block=0x10041978 ) at buffer.c:1613
#19 0x10042014 in ext2_readpage (file=0x500de1e0, page=0x50008b74) at inode.c:659
#20 0x10013eb1 in read_cluster_nonblocking (file=0x500de1e0, offset=77, filesize=78) at filemap.c:440
#21 0x1001525c in filemap_nopage (area=0x500d2c60, address=134832128, no_share=2) at filemap.c:1391
#22 0x1001209d in do_no_page (mm=0x500d41c0, vma=0x500d2c60, address=134832392, write_access=2, page_table=0x50159258) at memory.c:1150
#23 0x100121d4 in handle_mm_fault (mm=0x500d41c0, vma=0x500d2c60, address=134832392, write_access=2) at memory.c:1207
#24 0x100a01b3 in segv (address=134832392, ip=268665530, is_write=2, is_user=0) at trap_kern.c:89
#25 0x100a0902 in segv_handler (sig=11) at trap_user.c:258
#26 0x100a7318 in __restore () at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#27 0x100397d4 in load_elf_binary (bprm=0x5015db24, regs=0x0) at binfmt_elf.c:714
#28 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0) at exec.c:809
#29 0x10038226 in load_script (bprm=0x5015db24, regs=0x0) at binfmt_script.c:92
#30 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0) at exec.c:809
#31 0x100289cd in do_execve (filename=0x500df000 "/etc/rc.d/rc.sysinit", argv=0xbf7ffb14, envp=0x804f2c0, regs=0x0) at exec.c:902
#32 0x1009c3fc in execve1 (file=0x500df000 "/etc/rc.d/rc.sysinit", argv=0xbf7ffb14, env=0x804f2c0) at exec_kern.c:77
#33 0x1009c474 in sys_execve (file=0xbf7ffa88 "", argv=0xbf7ffb14, env=0x804f2c0) at exec_kern.c:101
#34 0x1009eab7 in execute_syscall (syscall=11, args=0x5015dcf8) at syscall_kern.c:340
#35 0x1009eeb8 in syscall_handler (unused=0) at syscall_user.c:113
#36 0x1009bf03 in fork_handler (sig=10) at process.c:96
#37 0x100a7318 in __restore () at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127

This is a pretty deep stack, but there's nothing unexpected there. I would guess that some kind of very fast disk drive would also cause this kind of deep stack on real hardware, if it can complete the I/O and interrupt before the reschedule.

I tried adding some inlines to make the stack use a little shallower, but it didn't help. Any suggestions on how to get this working?

Thanks,
J
Re: User-mode linux stack overflow: could be generic problem
On Sun, Oct 08, 2000 at 12:35:48AM -0500, Jeff Dike wrote:
> I've been waiting for someone to send me that stack. There aren't any real
> smoking guns there. I'm guessing that the difference between your laptop and
> the machine it works on is that your laptop is running a fairly recent kernel
> (2.4.0-testx) and the other isn't.

Yep, that's right.

> The sigcontext struct greatly increased in
> size (to ~800 bytes IIRC) to accommodate the MMX registers or something. There
> are three signals on your stack, so those frames by themselves are taking up
> half the stack page.
>
> Anyway, the patch below removes 256 bytes from the set_signals frame. It
> ought to alleviate things a bit. I'll be looking for other things I can do,
> as well. Let me know how it works for you.

I'm afraid this doesn't help. The stack still overflows at the same point. It looks like each signal frame is ~760 bytes. Even with this patch, the overflow is 808 bytes (without the patch it's 1232 bytes).

J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
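For anyone following the numbers, the "half the stack page" estimate can be checked with a few lines of C. This is only a sketch: the three signal frames and the ~760 bytes per frame come from the messages above, while the 4096-byte page size is an assumption (the usual i386 value) that the thread does not state, and both function names are made up for illustration.

```c
#include <assert.h>

/* Assumed i386 page size; the thread only says "the stack page". */
#define ASSUMED_PAGE_SIZE 4096u

/* Total bytes taken by `count` signal frames of `frame_bytes` each. */
unsigned signal_frame_bytes(unsigned count, unsigned frame_bytes)
{
    assert(frame_bytes > 0);        /* a zero-sized frame makes no sense */
    return count * frame_bytes;
}

/* Share of the assumed page that `bytes` occupies, in percent, rounded down. */
unsigned percent_of_page(unsigned bytes)
{
    return bytes * 100u / ASSUMED_PAGE_SIZE;
}
```

With three frames of 760 bytes, signal_frame_bytes(3, 760) gives 2280 bytes, a little over half of a 4096-byte page, which is in line with Jeff's estimate.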
Re: User-mode linux stack overflow: could be generic problem
On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > Even with this patch, the overflow is 808 bytes (without the patch
> > it's 1232 bytes).
>
> I was mulling over some other changes that would have saved another 256 bytes,
> but those don't look like they would help. Try the patch below. It
> essentially gives up and lets the stack occupy half of the lower page.

Well, that sweeps the problem under the carpet enough to make progress...

> Also, could you look at the stack pointer at each frame, to see if you are
> encountering any stack hogs in the generic kernel? In a different situation,
> I found devfs putting a 3K structure on the stack.

OK, I'll look into it.

J
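One way to do what Jeff asks, sketched here under assumptions: note the stack pointer in each frame (in gdb, for example, by selecting a frame with `frame N` and printing `$sp`), then take differences between neighbouring frames. The stack is assumed to grow downward, as on i386, and the function name `largest_frame` is hypothetical, not anything from UML or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * sp[0..n-1] holds the stack pointer observed in each frame, innermost
 * frame first. On a downward-growing stack, outer frames sit at higher
 * addresses, so frame i occupies sp[i + 1] - sp[i] bytes.
 * Writes n - 1 sizes to out[] and returns the index of the largest one.
 */
size_t largest_frame(const unsigned long *sp, size_t n, unsigned long *out)
{
    size_t i, worst = 0;

    assert(n >= 2);                 /* need at least two frames to diff */
    for (i = 0; i + 1 < n; i++) {
        out[i] = sp[i + 1] - sp[i];
        if (out[i] > out[worst])
            worst = i;
    }
    return worst;
}
```

Sorting the resulting sizes in descending order gives a list of candidate stack hogs.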
Re: User-mode linux stack overflow: could be generic problem
On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> Also, could you look at the stack pointer at each frame, to see if you are
> encountering any stack hogs in the generic kernel? In a different situation,
> I found devfs putting a 3K structure on the stack.

OK, top candidates on that stack trace are:

    __restore():            764
    do_execve:              340
    load_elf_binary:        324
    segv:                   180
    sigio_handler:          176
    load_script:            172
    ext2_get_block:         160
    set_signals:            156
    block_read_full_page:   124

Looks like do_execve should be pretty easy to shrink: most of the stack is in a local of type struct linux_binprm (308 bytes), which could be kmalloced. I guess this would have some cost in speed, so I don't suppose this could be a generic patch. Anyway, it isn't a solution in itself.

load_elf_binary is harder to deal with, since it just has lots of locals, each relatively small.

segv is mostly a local struct siginfo (128 bytes).

sigio_handler is mostly an fd_set (128 bytes).

load_script has a local buffer for remembering the interpreter (128 bytes).

All up, there's about 660 bytes of stack which can be relatively easily saved by converting locals to kmalloced memory, which still isn't enough to solve the problem.

I haven't looked into UML's interrupt handling, but perhaps another approach is to try and avoid recursive interrupts/exceptions and do some kind of tail-recursion optimisation in the exception/signal handler. I don't know if this would cause problems (deadlocks?).

Alternatively, could you just use a bigger stack?

J
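The "convert locals to kmalloced memory" idea can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct big_params` merely stands in for a large local such as struct linux_binprm (the 308-byte size mirrors the figure quoted above), and malloc()/free() stand in for the kernel's kmalloc()/kfree() so the example can run outside the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large per-call structure. */
struct big_params {
    char buf[308];
};

/* Before: the whole structure lives in this function's stack frame. */
int run_on_stack(const char *name)
{
    struct big_params p;

    assert(name != NULL);
    memset(&p, 0, sizeof(p));
    strncpy(p.buf, name, sizeof(p.buf) - 1);
    return (int)strlen(p.buf);
}

/* After: only a pointer stays on the stack; the storage is on the heap. */
int run_on_heap(const char *name)
{
    struct big_params *p;
    int len;

    assert(name != NULL);
    p = malloc(sizeof(*p));
    if (p == NULL)
        return -1;      /* allocation can fail; the stack version cannot */
    memset(p, 0, sizeof(*p));
    strncpy(p->buf, name, sizeof(p->buf) - 1);
    len = (int)strlen(p->buf);
    free(p);
    return len;
}
```

The trade-off is the one mentioned above: the heap version shrinks the frame by roughly the size of the structure, at the price of an allocation on every call and a new failure path that callers have to handle.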
2.6.12-rc2-mm1: ieee1394 process hang
I'm having problems with 1394 in 2.6.12-rc2-mm1. When I connect my Apple iSight camera, it is not detected; repeated connections/disconnections don't help.

When I tried to rmmod all the appropriate modules (rmmod video1394 raw1394 ohci1394 ieee1394), the rmmod command hung. Alt-Sysreq-t shows this:

rmmod D F75593C0 0 7206 7193 (NOTLB)
e43fbda0 0086 e43fbdd0 f75593c0 f78bbd20 29b325f2 04d2 0848 2ec09330 04d2 e0556560 e0556688 f792f258 e43fbdd4 e43fb000 e43fbdf4 c02ade0e e0556560 c01142b0
Call Trace:
 [] wait_for_completion+0x6e/0xc0
 [] device_del+0x16/0x70
 [] device_unregister+0xb/0x20
 [] nodemgr_remove_ne+0x6d/0x90 [ieee1394]
 [] __nodemgr_remove_host_dev+0xb/0x10 [ieee1394]
 [] device_for_each_child+0x29/0x50
 [] nodemgr_remove_host_dev+0x15/0x40 [ieee1394]
 [] __unregister_host+0x75/0xb0 [ieee1394]
 [] highlevel_remove_host+0x2d/0x60 [ieee1394]
 [] hpsb_remove_host+0x3b/0x60 [ieee1394]
 [] ohci1394_pci_remove+0x8b/0x250 [ohci1394]
 [] pci_device_remove+0x2c/0x40
 [] device_release_driver+0x7c/0x80
 [] __remove_driver+0x8/0x10
 [] driver_for_each_device+0x43/0x70
 [] driver_detach+0x16/0x18
 [] bus_remove_driver+0x26/0x40
 [] driver_unregister+0xe/0x20
 [] pci_unregister_driver+0xe/0x20
 [] sys_delete_module+0x14d/0x160
 [] sysenter_past_esp+0x54/0x75

This last worked for me in 2.6.12-rc1-mm3; I didn't have a chance to test -rc1-mm4.

.config and lspci attached.

J

00:00.0 Host bridge: Intel Corp. 82855PM Processor to I/O Controller (rev 03)
        Subsystem: IBM: Unknown device 0529
        Flags: bus master, fast devsel, latency 0
        Memory at d000 (32-bit, prefetchable) [size=256M]
        Capabilities: [e4] Vendor Specific Information
        Capabilities: [a0] AGP version 2.0

00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP Controller (rev 03) (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, fast devsel, latency 96
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 3000-3fff
        Memory behind bridge: c010-c01f
        Prefetchable memory behind bridge: e000-e7ff

00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 052d
        Flags: bus master, medium devsel, latency 0, IRQ 11
        I/O ports at 1800 [size=32]

00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 052d
        Flags: bus master, medium devsel, latency 0, IRQ 5
        I/O ports at 1820 [size=32]

00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
        Subsystem: IBM: Unknown device 052d
        Flags: bus master, medium devsel, latency 0, IRQ 9
        I/O ports at 1840 [size=32]

00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
        Subsystem: IBM: Unknown device 052e
        Flags: bus master, medium devsel, latency 0, IRQ 5
        Memory at c000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port

00:1e.0 PCI bridge: Intel Corp. 82801 Mobile PCI Bridge (rev 81) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=08, sec-latency=64
        I/O behind bridge: 4000-8fff
        Memory behind bridge: c020-cfff
        Prefetchable memory behind bridge: e800-efff

00:1f.0 ISA bridge: Intel Corp. 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
        Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corp. 82801DBM (ICH4-M) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
        Subsystem: IBM: Unknown device 052d
        Flags: bus master, medium devsel, latency 0, IRQ 9
        I/O ports at
        I/O ports at
        I/O ports at
        I/O ports at
        I/O ports at 1860 [size=16]
        Memory at 4000 (32-bit, non-prefetchable) [size=1K]

00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
        Subsystem: IBM: Unknown device 052d
        Flags: medium devsel, IRQ 10
        I/O ports at 1880 [size=32]

00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
        Subsystem: IBM: Unknown device 0534
        Flags: bus master, medium devsel, latency 0, IRQ 10
        I/O ports at 1c00 [size=256]
        I/O ports at 18c0 [size=64]
        Memory at cc00 (32-bit, non-prefetchable) [size=512]
        Memory at c800 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management
Re: [PATCH] symlink.c
Quoting John Martin <[EMAIL PROTECTED]>:
> this patch adds a check to make sure memory was allocated, returns an
> error code otherwise.

autofs4_dentry_ino doesn't allocate memory; it just extracts the fsdata pointer from the dentry structure. If it's returning NULL, then there's something else wrong and you're papering over the symptoms. Are you seeing this happen?

Linus, please don't apply this.

J