Re: [PATCH] GBPAGES: Fix global bit for 64bit

2008-01-31 Thread Jeremy Fitzhardinge

Andi Kleen wrote:

[Ideally apply before the patch to enable gbpages direct mapping]

The gbpages direct patch assumed that __PAGE_KERNEL contains _PAGE_GLOBAL
(I think because that was true at some point in git-x86 and I forgot
to remove it again when forward porting).

This is currently true on 32bit, but not on 64bit which does not make 
much sense. Add it to 64bit too. Also get rid of the obsolete MAKE_GLOBAL.
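
As a rough illustration of the description above, the following standalone
sketch models the 64-bit protection values before and after the change. The
PTE_* names are stand-ins for the kernel's _PAGE_* bits (the bit positions are
the architectural x86 ones); the helper functions are hypothetical and exist
only for this example, they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Stand-ins for the kernel's _PAGE_* bits (architectural x86 positions). */
#define PTE_PRESENT   ((pteval_t)1 << 0)
#define PTE_RW        ((pteval_t)1 << 1)
#define PTE_ACCESSED  ((pteval_t)1 << 5)
#define PTE_DIRTY     ((pteval_t)1 << 6)
#define PTE_GLOBAL    ((pteval_t)1 << 8)

/* 64-bit __PAGE_KERNEL_EXEC as it was before the patch: no global bit. */
static inline pteval_t kernel_exec_before(void)
{
    return PTE_PRESENT | PTE_RW | PTE_DIRTY | PTE_ACCESSED;
}

/* 64-bit __PAGE_KERNEL_EXEC after the patch: global bit folded in. */
static inline pteval_t kernel_exec_after(void)
{
    return PTE_PRESENT | PTE_RW | PTE_DIRTY | PTE_ACCESSED | PTE_GLOBAL;
}

/* What the removed 64-bit MAKE_GLOBAL() wrapper added to its argument. */
static inline pteval_t make_global_64(pteval_t prot)
{
    return prot | PTE_GLOBAL;
}

/* True if a raw protection value already carries the global bit. */
static inline int has_global(pteval_t prot)
{
    return (prot & PTE_GLOBAL) != 0;
}

/* Old route (wrapper adds the bit) and new route (bit built in) agree. */
static inline void check_patch_equivalence(void)
{
    assert(make_global_64(kernel_exec_before()) == kernel_exec_after());
}
```

The point of the sketch is only that code using the raw __PAGE_KERNEL* values
directly, as the gbpages direct mapping does, gets the global bit after the
patch without going through a wrapper.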
  


Last time when my patch to do this was in the tree, it caused random 
things to fail, even after increasing the strength of various tlb 
flushes.  Did you change something to fix this?


   J


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/include/asm-x86/pgtable.h
===
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -69,11 +69,13 @@
 #define _PAGE_KERNEL (_PAGE_KERNEL_EXEC | _PAGE_NX)
 
 #ifndef __ASSEMBLY__
+/* These are set up based on CPUID capabilities */
 extern pteval_t __PAGE_KERNEL, __PAGE_KERNEL_EXEC;
 #endif /* __ASSEMBLY__ */
 #else
+/* 64bit can assume more CPUID capabilities */
 #define __PAGE_KERNEL_EXEC \
-   (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED)
+   (_PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_GLOBAL)
 #define __PAGE_KERNEL  (__PAGE_KERNEL_EXEC | _PAGE_NX)
 #endif
 
@@ -86,22 +88,16 @@ extern pteval_t __PAGE_KERNEL, __PAGE_KE
 #define __PAGE_KERNEL_LARGE            (__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC       (__PAGE_KERNEL_EXEC | _PAGE_PSE)
 
-#ifdef CONFIG_X86_32
-# define MAKE_GLOBAL(x)                __pgprot((x))
-#else
-# define MAKE_GLOBAL(x)                __pgprot((x) | _PAGE_GLOBAL)
-#endif
-
-#define PAGE_KERNEL                    MAKE_GLOBAL(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO                 MAKE_GLOBAL(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC               MAKE_GLOBAL(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX                 MAKE_GLOBAL(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_NOCACHE            MAKE_GLOBAL(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_EXEC_NOCACHE       MAKE_GLOBAL(__PAGE_KERNEL_EXEC_NOCACHE)
-#define PAGE_KERNEL_LARGE              MAKE_GLOBAL(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_EXEC         MAKE_GLOBAL(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL           MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VSYSCALL_NOCACHE   MAKE_GLOBAL(__PAGE_KERNEL_VSYSCALL_NOCACHE)
+#define PAGE_KERNEL                    __pgprot(__PAGE_KERNEL)
+#define PAGE_KERNEL_RO                 __pgprot(__PAGE_KERNEL_RO)
+#define PAGE_KERNEL_EXEC               __pgprot(__PAGE_KERNEL_EXEC)
+#define PAGE_KERNEL_RX                 __pgprot(__PAGE_KERNEL_RX)
+#define PAGE_KERNEL_NOCACHE            __pgprot(__PAGE_KERNEL_NOCACHE)
+#define PAGE_KERNEL_EXEC_NOCACHE       __pgprot(__PAGE_KERNEL_EXEC_NOCACHE)
+#define PAGE_KERNEL_LARGE              __pgprot(__PAGE_KERNEL_LARGE)
+#define PAGE_KERNEL_LARGE_EXEC         __pgprot(__PAGE_KERNEL_LARGE_EXEC)
+#define PAGE_KERNEL_VSYSCALL           __pgprot(__PAGE_KERNEL_VSYSCALL)
+#define PAGE_KERNEL_VSYSCALL_NOCACHE   __pgprot(__PAGE_KERNEL_VSYSCALL_NOCACHE)
 
 /* xwr */
 #define __P000 PAGE_NONE
  


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sparc compile error caused by x86 arch updates

2008-01-31 Thread Jeremy Fitzhardinge

Adrian Bunk wrote:

On Thu, Jan 31, 2008 at 05:15:23PM +0100, Ingo Molnar wrote:
  

* Adrian Bunk <[EMAIL PROTECTED]> wrote:



On Thu, Jan 31, 2008 at 05:00:33PM +0100, Ingo Molnar wrote:
  

* Adrian Bunk <[EMAIL PROTECTED]> wrote:


You tested x86 but broke more than half a dozen other architectures, 
with at least 3 different commits breaking other architectures.
  
Note that all known breakages are fixed in current -git, except for the 
s390 problem, for which Martin/Nick posted the fix.

What about the breakages caused by commit 
a5a19c63f4e55e32dc0bc3d936d7f94793d8b380 (this commit broke the 
defconfig compilation on at least avr32, blackfin, sh, sparc and uml)?
  

the patch below fixes that.



The sparc breakage (might not have been reported until now and I 
bisected it just a few minutes ago) is caused by the following part of 
commit a5a19c63f4e55e32dc0bc3d936d7f94793d8b380:


--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 


  


Drat.  I added that because of

/* only sparc can not include linux/pagemap.h in this file
* so leave page_cache_release and release_pages undeclared... */
#define free_page_and_swap_cache(page) \
page_cache_release(page)
#define free_pages_and_swap_cache(pages, nr) \
release_pages((pages), (nr), 0);


But I guess I overlooked the comment...

I guess the fix is to scatter linux/pagemap.h into the appropriate 
places where these macros are used (asm-generic/tlb.h, for a start).
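
A minimal sketch of why the macro form lets swap.h avoid the include: a
function-like macro is only substituted at its point of use, so the function
it names needs to be declared where the macro is expanded, not where it is
defined. The names below (release_and_count, drop_reference, demo_release)
are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>

/*
 * The macro names drop_reference() before any declaration of it is
 * visible. That is fine: the preprocessor only substitutes text, and the
 * compiler sees the call at the point of use, not here.
 */
#define release_and_count(refs) drop_reference(refs)

/* Hypothetical stand-in for page_cache_release(); declared after the macro. */
static inline int drop_reference(int *refs)
{
    assert(*refs > 0);   /* releasing an unreferenced object is a bug */
    return --*refs;
}

/* By the time the macro is expanded here, drop_reference() is declared. */
static inline int demo_release(int initial_refs)
{
    int refs = initial_refs;
    return release_and_count(&refs);
}
```

This is also why the proposed fix moves the include to the users of the
macros: each file that expands them must see the real declarations.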


   J

The compile error with the sparc defconfig is:

<--  snip  -->

...
  CC  init/main.o
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/highmem.h:24,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:10,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swap.h:9,
                 from include2/asm/pgtable.h:15,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/mm.h:39,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17,
                 from include2/asm/dma-mapping.h:6,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/dma-mapping.h:52,
                 from include2/asm/sbus.h:10,
                 from include2/asm/dma.h:13,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/bootmem.h:8,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/init/main.c:26:
include2/asm/highmem.h: In function 'kmap':
include2/asm/highmem.h:60: error: implicit declaration of function 'PageHighMem'
include2/asm/highmem.h:61: error: implicit declaration of function 'page_address'
include2/asm/highmem.h:61: warning: return makes pointer from integer without a cast
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/swap.h:9,
                 from include2/asm/pgtable.h:15,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/mm.h:39,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17,
                 from include2/asm/dma-mapping.h:6,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/dma-mapping.h:52,
                 from include2/asm/sbus.h:10,
                 from include2/asm/dma.h:13,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/bootmem.h:8,
                 from /home/bunk/linux/kernel-2.6/git/linux-2.6/init/main.c:26:
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'lock_page':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:169: error: implicit declaration of function 'TestSetPageLocked'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'wait_on_page_locked':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:199: error: implicit declaration of function 'PageLocked'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: 'PG_locked' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:200: error: for each function it appears in.)
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h: In function 'wait_on_page_writeback':
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:208: error: implicit declaration of function 'PageWriteback'
/home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/pagemap.h:209: error: 'PG_writeback' undeclared (first use in this function)
In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-generic/dma-mapping.h:17,
                 from include2/asm/dma-mapping.h:6,
 

Re: [git pull] x86 arch updates for v2.6.25

2008-01-31 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

* Adrian Bunk <[EMAIL PROTECTED]> wrote:

  
What about the breakages caused by commit 
a5a19c63f4e55e32dc0bc3d936d7f94793d8b380 (this commit broke the 
defconfig compilation on at least avr32, blackfin, sh, sparc and uml)?


the patch below fixes that.
  

Is it safe, or why did Jeremy state in the commit
"I removed this include to avoid an include cycle"?



that is an x86.git complication alone, and only affects 32-bit PAE: it 
is solved by the uninlining patch (which I've queued up before the 
asm-generic/tlb.h revert/fix).


Ingo

->
Subject: x86: uninline __pte_free_tlb() and __pmd_free_tlb()
From: Ingo Molnar <[EMAIL PROTECTED]>

this also removes an include file dependency.
  


Yes, that simplifies things a lot.
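
A small sketch of why uninlining can drop an include dependency: once the
function body moves out of the header, the header only needs an incomplete
type and a prototype, and the full structure definition is required only where
the function is implemented. struct gather and the functions below are
hypothetical stand-ins for struct mmu_gather and __pte_free_tlb(); both
"header" and "source" parts are shown in one file so the example compiles on
its own.

```c
#include <assert.h>

/* "Header" part: an incomplete type and a prototype are enough. */
struct gather;                       /* stand-in for struct mmu_gather */
void gather_free_page(struct gather *g, int pfn);

/* "Source file" part: only here is the full definition required. */
struct gather {
    int nr_pages;    /* pages queued for freeing so far */
    int last_pfn;    /* most recently queued page frame */
};

void gather_free_page(struct gather *g, int pfn)
{
    assert(g != 0);
    g->last_pfn = pfn;
    g->nr_pages++;
}

/* Helper so a caller of the sketch can observe the effect. */
int gather_count_after_two_frees(void)
{
    struct gather g = { 0, -1 };

    gather_free_page(&g, 10);
    gather_free_page(&g, 11);
    return g.nr_pages;
}
```

With the inline version, every includer of the header would need the complete
structure (and whatever headers define it), which is where cycles tend to come
from.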

   J

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable_32.c |   19 +++
 include/asm-x86/pgalloc_32.h |   19 ++-
 2 files changed, 21 insertions(+), 17 deletions(-)

Index: linux/arch/x86/mm/pgtable_32.c
===
--- linux.orig/arch/x86/mm/pgtable_32.c
+++ linux/arch/x86/mm/pgtable_32.c
@@ -376,3 +376,22 @@ void check_pgt_cache(void)
 {
quicklist_trim(0, pgd_dtor, 25, 16);
 }
+
+void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
+{
+   paravirt_release_pt(page_to_pfn(pte));
+   tlb_remove_page(tlb, pte);
+}
+
+void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
+{
+   /* This is called just after the pmd has been detached from
+  the pgd, which requires a full tlb flush to be recognized
+  by the CPU.  Rather than incurring multiple tlb flushes
+  while the address space is being pulled down, make the tlb
+  gathering machinery do a full flush when we're done. */
+   tlb->fullmm = 1;
+
+   paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
+   tlb_remove_page(tlb, virt_to_page(pmd));
+}
Index: linux/include/asm-x86/pgalloc_32.h
===
--- linux.orig/include/asm-x86/pgalloc_32.h
+++ linux/include/asm-x86/pgalloc_32.h
@@ -51,11 +51,7 @@ static inline void pte_free(struct page 
 }
 
 
-static inline void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
-{
-   paravirt_release_pt(page_to_pfn(pte));
-   tlb_remove_page(tlb, pte);
-}
+extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
 
 #ifdef CONFIG_X86_PAE
 /*
@@ -72,18 +68,7 @@ static inline void pmd_free(pmd_t *pmd)
free_page((unsigned long)pmd);
 }
 
-static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
-{
-   /* This is called just after the pmd has been detached from
-  the pgd, which requires a full tlb flush to be recognized
-  by the CPU.  Rather than incurring multiple tlb flushes
-  while the address space is being pulled down, make the tlb
-  gathering machinery do a full flush when we're done. */
-   tlb->fullmm = 1;
-
-   paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-   tlb_remove_page(tlb, virt_to_page(pmd));
-}
+extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
  




x86.git: wants to build as ia64

2008-01-31 Thread Jeremy Fitzhardinge
Something in current x86.git is making my kernel build as ia64.  I don't 
see any obvious change which might be the cause.


: ezr:pts/6; make oldconfig
make -C /home/jeremy/hg/xen/paravirt/linux O=/home/jeremy/hg/xen/paravirt/linux-i386/. oldconfig
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 16: ia64-linux-gcc: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 17: ia64-linux-objdump: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 19: [: !=: unary operator expected
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 30: ia64-linux-gcc: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/toolchain-flags: line 31: ia64-linux-readelf: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 7: ia64-linux-gcc: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 8: ia64-linux-objdump: command not found
/home/jeremy/hg/xen/paravirt/linux/arch/ia64/scripts/check-gas: line 10: [: !=: unary operator expected
/home/jeremy/hg/xen/paravirt/linux/scripts/gcc-version.sh: line 25: ia64-linux-gcc: command not found
/home/jeremy/hg/xen/paravirt/linux/scripts/gcc-version.sh: line 26: ia64-linux-gcc: command not found
 GEN /home/jeremy/hg/xen/paravirt/linux-i386/Makefile
scripts/kconfig/conf -o arch/ia64/Kconfig
drivers/net/Kconfig:1743:warning: 'select' used by config symbol 'CPMAC' refers to undefined symbol 'FIXED_MII_100_FDX'
drivers/spi/Kconfig:156:warning: 'select' used by config symbol 'SPI_PXA2XX' refers to undefined symbol 'PXA_SSP'
.config:12:warning: trying to assign nonexistent symbol GENERIC_CMOS_UPDATE
.config:13:warning: trying to assign nonexistent symbol CLOCKSOURCE_WATCHDOG
.config:14:warning: trying to assign nonexistent symbol GENERIC_CLOCKEVENTS
.config:15:warning: trying to assign nonexistent symbol GENERIC_CLOCKEVENTS_BROADCAST
.config:18:warning: trying to assign nonexistent symbol SEMAPHORE_SLEEPERS
.config:22:warning: trying to assign nonexistent symbol GENERIC_ISA_DMA
.config:25:warning: trying to assign nonexistent symbol GENERIC_HWEIGHT
.config:29:warning: trying to assign nonexistent symbol RWSEM_GENERIC_SPINLOCK
.config:37:warning: trying to assign nonexistent symbol ZONE_DMA32
.config:43:warning: trying to assign nonexistent symbol X86_SMP
.config:44:warning: trying to assign nonexistent symbol X86_32_SMP
...

   J


Re: sparc compile error caused by x86 arch updates

2008-01-31 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

  

But I guess I overlooked the comment...

I guess the fix is to scatter linux/pagemap.h into the appropriate 
places where these macros are used (asm-generic/tlb.h, for a start).



no. The fix is always to undo the damage ASAP, to keep the window of 
breakage minimized.
  


Yes, sorry about that.  Uninlining the asm-x86/pgalloc.h functions is 
the right thing to do anyway.


   J


Re: [PATCH 04 of 11] x86: fix early_ioremap pagetable ops

2008-01-31 Thread Jeremy Fitzhardinge

Ian Campbell wrote:


This seems to have ended up in f6df72e71eba621b2f5c49b3a763116fac748f6e
as:
+   paravirt_release_pt(__pa(pmd) >> PAGE_SHIFT);

and the pmd_populate_kernel hunk is missing altogether.

---
From bfa2a08064a269dd7906ed5f60e436360e1360e7 Mon Sep 17 00:00:00 2001
From: Ian Campbell <[EMAIL PROTECTED]>
Date: Thu, 31 Jan 2008 18:56:06 +
Subject: [PATCH] x86: fix early_ioremap pagetable ops for paravirt.

Some important parts of f6df72e71eba621b2f5c49b3a763116fac748f6e got dropped
along the way, reintroduce them.
  


Yep.

Acked-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>


Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
---
 arch/x86/mm/ioremap.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ed4208e..93d931e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -302,7 +302,7 @@ void __init early_ioremap_init(void)
 
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
 	memset(bm_pte, 0, sizeof(bm_pte));
-	set_pmd(pmd, __pmd(__pa(bm_pte) | _PAGE_TABLE));
+	pmd_populate_kernel(&init_mm, pmd, bm_pte);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
@@ -332,7 +332,7 @@ void __init early_ioremap_clear(void)
 
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
 	pmd_clear(pmd);
-	paravirt_release_pt(__pa(pmd) >> PAGE_SHIFT);
+	paravirt_release_pt(__pa(bm_pte) >> PAGE_SHIFT);
 	__flush_tlb_all();
 }
 
  




Re: [PATCH 04 of 11] x86: fix early_ioremap pagetable ops

2008-01-31 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

* Ian Campbell <[EMAIL PROTECTED]> wrote:

  
Some important parts of f6df72e71eba621b2f5c49b3a763116fac748f6e got 
dropped along the way, reintroduce them.



thanks, applied. AFAICS it should only affect paravirt, not the native 
kernel, right?
  


Correct.

   J


x86: PAE swapper_pg_dir needs to be page-sized

2008-01-31 Thread Jeremy Fitzhardinge

Xen currently needs swapper_pg_dir page aligned and sized.  This fixes
the second part of that...
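
To illustrate the difference between "page aligned" and "page sized", here is
a small C sketch (the patch itself does this in assembly with an .align
directive after the table). It assumes a 4096-byte page and the four 8-byte
top-level entries of a PAE pgd; the DEMO_* names, the union and the helper
functions are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE     4096
#define DEMO_PTRS_PER_PGD  4      /* PAE: four 8-byte top-level entries */

/* Padding the table out to a full page makes it page-sized ... */
typedef union {
    uint64_t      entries[DEMO_PTRS_PER_PGD];
    unsigned char pad[DEMO_PAGE_SIZE];
} demo_pgd_page_t;

/* ... and an alignment request makes it page-aligned as well. */
static _Alignas(DEMO_PAGE_SIZE) demo_pgd_page_t demo_pgd;

/* Bytes the entries alone occupy: far less than a page. */
static inline size_t pgd_bytes_used(void)
{
    return sizeof(uint64_t) * DEMO_PTRS_PER_PGD;
}

static inline int is_page_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p % DEMO_PAGE_SIZE) == 0;
}

/* Both properties together: nothing else can share the table's page. */
static inline int demo_pgd_fills_whole_page(void)
{
    return sizeof(demo_pgd) == DEMO_PAGE_SIZE && is_page_aligned(&demo_pgd);
}
```

Without the padding, whatever data follows the 32 bytes of entries would land
in the same page, which is presumably what a hypervisor that manages
pagetables at page granularity cannot tolerate.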

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
arch/x86/kernel/head_32.S |1 +
1 file changed, 1 insertion(+)

===
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -640,6 +640,7 @@
# else
#  error "Kernel PMDs should be 1, 2 or 3"
# endif
+   .align PAGE_SIZE_asm/* needs to be page-sized too */
#endif

.data




Re: x86: PAE swapper_pg_dir needs to be page-sized

2008-02-01 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

thanks, applied. I'm wondering, where did we break that?
  


In the "PAE from boot" patch, I would guess.

   J


Re: [PATCH 0 of 4] x86: cleanups from pmd lifetime series

2008-02-01 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
FYI, only this one applied to the latest x86.git tree, could you please 
resend? I guess the pgalloc.h related revert interfered.
  


OK.  I'll do a quick rebase to this morning's tree and resend.

   J


[PATCH 0 of 4] x86: cleanups from pmd lifetime series

2008-02-01 Thread Jeremy Fitzhardinge
Hi Ingo,

Here's a followup set from that last batch of patches:
 1. fix up the pgd_ctor merge, so that non-PAE will end up getting
kernel mappings
 2. revert "optimise-pud_clear-cr3-reload"
 3. only do a cr3 reload if pud_clear is being used on the active pagetable
 4. update documentation about PAE tlb flushing.

The third of these makes pud_clear more robust, since it doesn't rely
on it being followed by the right kind of TLB flush.  In practice it
shouldn't make any performance difference, since the only performance
critical paths pud_clear is used on are exit and execve, and they both
operate on some other pagetable at the time the old pagetable is being
pulled down.

It will generate TLB flushes in the case of a usermode process
munmapping a 1+G chunk of its address space, or something to do with
unsharing a hugetlbfs mapping.  I don't think either of these is
performance critical.

Thanks,
J




[PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary

2008-02-01 Thread Jeremy Fitzhardinge
Rather than unconditionally reloading cr3, only do so if the pud we're
updating is within the active pgd.

This eliminates TLB flushes most of the time.  The
performance-critical uses of pud_clear are during execve and exit, but
in those cases cr3 is referring to some other pagetable.  The only
other use of pud_clear is during a large (1Gbyte+) munmap, and those
are sufficiently rare that a couple of cr3 reloads won't hurt.
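
The range test described above can be sketched in isolation as follows. This
is a simplified userspace model, not the kernel code: physical addresses are
plain integers, the cr3 reload is replaced by a counter, and the DEMO_*
constants assume the PAE layout of four 8-byte top-level entries. All function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PTRS_PER_PGD    4u   /* PAE top level */
#define DEMO_PGD_ENTRY_SIZE  8u   /* bytes per entry */

/* Does the entry at pud_pa live inside the currently loaded pgd? */
static inline int pud_in_active_pgd(uint64_t pud_pa, uint64_t active_pgd_pa)
{
    uint64_t end = active_pgd_pa + DEMO_PGD_ENTRY_SIZE * DEMO_PTRS_PER_PGD;

    return pud_pa >= active_pgd_pa && pud_pa < end;
}

/* Clear the entry; count a "cr3 reload" only when the pgd is active. */
static inline void pud_clear_sketch(uint64_t *pud, uint64_t pud_pa,
                                    uint64_t active_pgd_pa,
                                    unsigned int *reloads)
{
    assert(pud != 0 && reloads != 0);
    *pud = 0;
    if (pud_in_active_pgd(pud_pa, active_pgd_pa))
        (*reloads)++;
}

/* One clear in the active pgd, one in some other pgd. */
static inline unsigned int demo_reloads(void)
{
    uint64_t entry = 0x1234;
    unsigned int reloads = 0;

    pud_clear_sketch(&entry, 0x1008, 0x1000, &reloads); /* active pgd */
    entry = 0x1234;
    pud_clear_sketch(&entry, 0x5008, 0x1000, &reloads); /* other pgd */
    return reloads;
}
```

In the exit and execve cases the second branch applies, so the clear costs no
flush at all.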

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable-3level.h |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -93,17 +93,20 @@
 
 static inline void pud_clear(pud_t *pudp)
 {
+   unsigned long pgd;
+
set_pud(pudp, __pud(0));
 
/*
 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
 * the TLB via cr3 if the top-level pgd is changed...
 *
-* XXX I don't think we need to worry about this here, since
-* when clearing the pud, the calling code needs to flush the
-* tlb anyway.  But do it now for safety's sake. - jsgf
+* Make sure the pud entry we're updating is within the
+* current pgd to avoid unnecessary TLB flushes.
 */
-   write_cr3(read_cr3());
+   pgd = read_cr3();
+   if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
+   write_cr3(pgd);
 }
 
 #define pud_page(pud) \




[PATCH 4 of 4] x86: update reference for PAE tlb flushing

2008-02-01 Thread Jeremy Fitzhardinge
Remove bogus reference to "Pentium-II erratum A13" and point to the
actual canonical source of information about what requirements x86
processors have for PAE pagetable updates.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgalloc_32.h |6 --
 include/asm-x86/pgtable-3level.h |6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -80,8 +80,10 @@
set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
 
/*
-* Pentium-II erratum A13: in PAE mode we explicitly have to flush
-* the TLB via cr3 if the top-level pgd is changed...
+* According to Intel App note "TLBs, Paging-Structure Caches,
+* and Their Invalidation", April 2007, document 317080-001,
+* section 8.1: in PAE mode we explicitly have to flush the
+* TLB via cr3 if the top-level pgd is changed...
 */
if (mm == current->active_mm)
write_cr3(read_cr3());
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -98,8 +98,10 @@
set_pud(pudp, __pud(0));
 
/*
-* Pentium-II erratum A13: in PAE mode we explicitly have to flush
-* the TLB via cr3 if the top-level pgd is changed...
+* According to Intel App note "TLBs, Paging-Structure Caches,
+* and Their Invalidation", April 2007, document 317080-001,
+* section 8.1: in PAE mode we explicitly have to flush the
+* TLB via cr3 if the top-level pgd is changed...
 *
 * Make sure the pud entry we're updating is within the
 * current pgd to avoid unnecessary TLB flushes.




[PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor

2008-02-01 Thread Jeremy Fitzhardinge
The constructors for PAE and non-PAE pgd_ctors are more or less
identical, and can be made into the same function.
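
As a reading aid for the diff that follows, the decisions the unified
constructor makes can be modelled as a small pure function. This sketch
assumes that SHARED_KERNEL_PMD evaluates to 0 on non-PAE builds (that is an
assumption, not something stated in this message); struct pgd_ctor_plan and
pgd_ctor_plan_for() are hypothetical names used only here.

```c
#include <assert.h>

/* What the constructor would do for a given configuration. */
struct pgd_ctor_plan {
    int clone_kernel_entries;  /* copy kernel pgd entries from swapper_pg_dir */
    int add_to_pgd_list;       /* track the pgd so kernel updates can be synced */
};

static inline struct pgd_ctor_plan
pgd_ctor_plan_for(int pagetable_levels, int shared_kernel_pmd)
{
    struct pgd_ctor_plan plan;

    assert(pagetable_levels == 2 || pagetable_levels == 3);

    /* The pgd points at a shared level: just copy the references. */
    plan.clone_kernel_entries =
        pagetable_levels == 2 ||
        (pagetable_levels == 3 && shared_kernel_pmd);

    /* Unshared kernel mappings must be kept in sync via the pgd list. */
    plan.add_to_pgd_list = !shared_kernel_pmd;

    return plan;
}
```

Under that assumption the three cases come out as: non-PAE clones and is
listed, PAE with a shared kernel PMD clones and is not listed, and PAE with an
unshared kernel PMD is listed without cloning.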

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: William Irwin <[EMAIL PROTECTED]>

---
 arch/x86/mm/pgtable_32.c |   58 +-
 1 file changed, 22 insertions(+), 36 deletions(-)

diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -219,50 +219,39 @@
list_del(&page->lru);
 }
 
+#define UNSHARED_PTRS_PER_PGD  \
+   (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
-
-#if (PTRS_PER_PMD == 1)
-/* Non-PAE pgd constructor */
-static void pgd_ctor(void *pgd)
+static void pgd_ctor(void *p)
 {
+   pgd_t *pgd = p;
unsigned long flags;
 
-   /* !PAE, no pagetable sharing */
+   /* Clear usermode parts of PGD */
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
spin_lock_irqsave(&pgd_lock, flags);
 
-   /* must happen under lock */
-   clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-   swapper_pg_dir + USER_PTRS_PER_PGD,
-   KERNEL_PGD_PTRS);
-   paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
-   __pa(swapper_pg_dir) >> PAGE_SHIFT,
-   USER_PTRS_PER_PGD,
+   /* If the pgd points to a shared pagetable level (either the
+  ptes in non-PAE, or shared PMD in PAE), then just copy the
+  references from swapper_pg_dir. */
+   if (PAGETABLE_LEVELS == 2 ||
+   (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) {
+   clone_pgd_range(pgd + USER_PTRS_PER_PGD,
+   swapper_pg_dir + USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-   pgd_list_add(pgd);
+   paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
+   __pa(swapper_pg_dir) >> PAGE_SHIFT,
+   USER_PTRS_PER_PGD,
+   KERNEL_PGD_PTRS);
+   }
+
+   /* list required to sync kernel mapping updates */
+   if (!SHARED_KERNEL_PMD)
+   pgd_list_add(pgd);
+
spin_unlock_irqrestore(&pgd_lock, flags);
 }
-#else  /* PTRS_PER_PMD > 1 */
-/* PAE pgd constructor */
-static void pgd_ctor(void *pgd)
-{
-   /* PAE, kernel PMD may be shared */
-
-   if (SHARED_KERNEL_PMD) {
-   clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-   swapper_pg_dir + USER_PTRS_PER_PGD,
-   KERNEL_PGD_PTRS);
-   } else {
-   unsigned long flags;
-
-   memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
-   spin_lock_irqsave(&pgd_lock, flags);
-   pgd_list_add(pgd);
-   spin_unlock_irqrestore(&pgd_lock, flags);
-   }
-}
-#endif /* PTRS_PER_PMD */
 
 static void pgd_dtor(void *pgd)
 {
@@ -275,9 +264,6 @@
pgd_list_del(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
 }
-
-#define UNSHARED_PTRS_PER_PGD  \
-   (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
 #ifdef CONFIG_X86_PAE
 /*




[PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()"

2008-02-01 Thread Jeremy Fitzhardinge
Revert "defer cr3 reload when doing pud_clear()" since I'm going to
replace it.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable_32.c |7 ---
 include/asm-x86/pgtable-3level.h |   21 ++---
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -373,13 +373,6 @@
 
 void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   /* This is called just after the pmd has been detached from
-  the pgd, which requires a full tlb flush to be recognized
-  by the CPU.  Rather than incurring multiple tlb flushes
-  while the address space is being pulled down, make the tlb
-  gathering machinery do a full flush when we're done. */
-   tlb->fullmm = 1;
-
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
tlb_remove_page(tlb, virt_to_page(pmd));
 }
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -96,23 +96,14 @@
set_pud(pudp, __pud(0));
 
/*
-* In principle we need to do a cr3 reload here to make sure
-* the processor recognizes the changed pgd.  In practice, all
-* the places where pud_clear() gets called are followed by
-* full tlb flushes anyway, so we can defer the cost here.
+* Pentium-II erratum A13: in PAE mode we explicitly have to flush
+* the TLB via cr3 if the top-level pgd is changed...
 *
-* Specifically:
-*
-* mm/memory.c:free_pmd_range() - immediately after the
-* pud_clear() it does a pmd_free_tlb().  We change the
-* mmu_gather structure to do a full tlb flush (which has the
-* effect of reloading cr3) when the pagetable free is
-* complete.
-*
-* arch/x86/mm/hugetlbpage.c:huge_pmd_unshare() - the call to
-* this is followed by a flush_tlb_range, which on x86 does a
-* full tlb flush.
+* XXX I don't think we need to worry about this here, since
+* when clearing the pud, the calling code needs to flush the
+* tlb anyway.  But do it now for safety's sake. - jsgf
 */
+   write_cr3(read_cr3());
 }
 
 #define pud_page(pud) \




Re: [PATCH] [12/12] GBPAGES: Switch direct mapping setup over to set_pte

2008-02-01 Thread Jeremy Fitzhardinge

Andi Kleen wrote:

[Actually not needed for gbpages, but an independent yet related cleanup]

Use set_pte() for setting up the 2MB pages in the direct mapping similar 
to what the earlier GBPAGES patches did for the 1GB PUDs.
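
A rough model of the two ways of building the 2MB entry, to show that they
are meant to produce the same value. The sketch assumes that pfn_pte() shifts
the frame number back up, ORs in the protection bits and applies
__supported_pte_mask; that behaviour is an assumption of this example, and the
function names and DEMO_* constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define DEMO_PAGE_SHIFT 12

/* Old style: OR the address into the flags by hand, then mask. */
static inline pteval_t entry_open_coded(uint64_t address, pteval_t prot,
                                        pteval_t supported_mask)
{
    pteval_t entry = prot | address;

    entry &= supported_mask;
    return entry;
}

/* New style: go through a frame number, as a pfn_pte()-like helper would. */
static inline pteval_t entry_via_pfn(uint64_t address, pteval_t prot,
                                     pteval_t supported_mask)
{
    uint64_t pfn = address >> DEMO_PAGE_SHIFT;

    /* The round trip is only lossless for page-aligned addresses. */
    assert((address & (((uint64_t)1 << DEMO_PAGE_SHIFT) - 1)) == 0);
    return ((pfn << DEMO_PAGE_SHIFT) | prot) & supported_mask;
}
```

For a 2MB-aligned address both helpers return the same bits, so the choice
between set_pmd() and set_pte() is about which pagetable hook gets called, not
about the value written.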


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/init_64.c |6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -289,7 +289,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
 	int i = pmd_index(address);
 
 	for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
-		unsigned long entry;
 		pmd_t *pmd = pmd_page + pmd_index(address);
 
 		if (address >= end) {
@@ -303,9 +302,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
 		if (pmd_val(*pmd))
 			continue;
 
-		entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
-		entry &= __supported_pte_mask;
-		set_pmd(pmd, __pmd(entry));
+		set_pte((pte_t *)pmd,
+			pfn_pte(address >> PAGE_SHIFT, PAGE_KERNEL_LARGE));
  


Why?  64-bit Xen will need this to be set_pmd if it's an update to L2 of 
the table.


   J


Re: [PATCH] [12/12] GBPAGES: Switch direct mapping setup over to set_pte

2008-02-01 Thread Jeremy Fitzhardinge

Andi Kleen wrote:

Why?  64-bit Xen will need this to be set_pmd if it's an update to L2 of
the table.



Then change_page_attr() and hugepages will already not work because they both 
do exactly that.


And I didn't want to duplicate this manual code for the GBpages case, so I 
changed it everywhere to the standard way.


It's a bit moot because Xen doesn't support any kind of large page yet, 
but there has been some work in that area.  The main problem with using 
set_pte is that Xen supports trap'n'emulate for pte-level accesses, but 
not for upper levels.


Looks like you're right about the rest of cpa; may as well make it all 
consistent for now, and we can fix it later when the need arises.


   J


[PATCH 3 of 5] x86/pgtable.h: demacro ptep_set_access_flags

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -287,6 +287,8 @@
 #define pte_update_defer(mm, addr, ptep)   do { } while (0)
 #endif
 
+#include 
+
 /*
  * We only update the dirty/accessed state if we set
  * the dirty bit by hand in the kernel, since the hardware
@@ -295,16 +297,18 @@
  * bit at the same time.
  */
 #define  __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
-#define ptep_set_access_flags(vma, address, ptep, entry, dirty) \
-({ \
-   int __changed = !pte_same(*(ptep), entry);  \
-   if (__changed && dirty) {   \
-   *ptep = entry;  \
-   pte_update_defer((vma)->vm_mm, (address), (ptep));  \
-   flush_tlb_page(vma, address);   \
-   }   \
-   __changed;  \
-})
+static inline int ptep_set_access_flags(struct vm_area_struct *vma,
+   unsigned long address, pte_t *ptep,
+   pte_t entry, int dirty)
+{
+   int changed = !pte_same(*ptep, entry);
+   if (changed && dirty) {
+   *ptep = entry;
+   pte_update_defer(vma->vm_mm, address, ptep);
+   flush_tlb_page(vma, address);
+   }
+   return changed;
+}
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 #define ptep_test_and_clear_young(vma, addr, ptep) ({  \
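
[ Illustration, not part of the original patch. ] One practical difference between the removed macro and its inline replacement is argument evaluation: the macro expands ptep several times (and once without parentheses, in "*ptep = entry"), while a function evaluates each argument exactly once and checks its type. A minimal userspace sketch of that effect, using hypothetical toy_ names and a deliberately simplified update operation:

```c
#include <assert.h>
#include <stddef.h>

/* Macro version: the ptep argument is expanded, and so evaluated, twice
 * whenever the stored value differs from entry. */
#define TOY_UPDATE_MACRO(ptep, entry) \
	((*(ptep) != (entry)) ? (*(ptep) = (entry), 1) : 0)

/* Inline version: arguments are type-checked and evaluated exactly once. */
static inline int toy_update_inline(unsigned long *ptep, unsigned long entry)
{
	int changed;

	assert(ptep != NULL);
	changed = (*ptep != entry);
	if (changed)
		*ptep = entry;
	return changed;
}

/* A lookup with a visible side effect, used to count evaluations. */
static unsigned long toy_slot;
static int toy_lookups;

static unsigned long *toy_lookup(void)
{
	toy_lookups++;
	return &toy_slot;
}
```

Passing toy_lookup() as the pointer argument runs the lookup twice through the macro when the value changes, but only once through the inline function.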




[PATCH 0 of 5] x86: add alloc/release_pud; more demacroing

2008-02-02 Thread Jeremy Fitzhardinge
Hi Ingo,

This series:
 1. Renames the alloc/release_{pt,pd} calls to _pte, _pmd so that it's
clear what they operate on.
 2. Adds alloc/release_pud, and puts calls in the appropriate places
 3. Demacros some stuff in pgtable.h

A bit eclectic, but all fairly straightforward (and no changes to
non-x86 headers ;).

Thanks,
J




[PATCH 2 of 5] x86: add pud_alloc for 4-level pagetables

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/kernel/paravirt.c |2 ++
 arch/x86/mm/pgtable.c  |1 +
 include/asm-x86/paravirt.h |   11 +++
 include/asm-x86/pgalloc.h  |3 +++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -385,8 +385,10 @@
.alloc_pte = paravirt_nop,
.alloc_pmd = paravirt_nop,
.alloc_pmd_clone = paravirt_nop,
+   .alloc_pud = paravirt_nop,
.release_pte = paravirt_nop,
.release_pmd = paravirt_nop,
+   .release_pud = paravirt_nop,
 
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -34,6 +34,7 @@
 #if PAGETABLE_LEVELS > 3
 void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+   paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
tlb_remove_page(tlb, virt_to_page(pud));
 }
 #endif /* PAGETABLE_LEVELS > 3 */
diff --git a/include/asm-x86/paravirt.h b/include/asm-x86/paravirt.h
--- a/include/asm-x86/paravirt.h
+++ b/include/asm-x86/paravirt.h
@@ -223,8 +223,10 @@
void (*alloc_pte)(struct mm_struct *mm, u32 pfn);
void (*alloc_pmd)(struct mm_struct *mm, u32 pfn);
void (*alloc_pmd_clone)(u32 pfn, u32 clonepfn, u32 start, u32 count);
+   void (*alloc_pud)(struct mm_struct *mm, u32 pfn);
void (*release_pte)(u32 pfn);
void (*release_pmd)(u32 pfn);
+   void (*release_pud)(u32 pfn);
 
/* Pagetable manipulation functions */
void (*set_pte)(pte_t *ptep, pte_t pteval);
@@ -918,6 +920,15 @@
PVOP_VCALL1(pv_mmu_ops.release_pmd, pfn);
 }
 
+static inline void paravirt_alloc_pud(struct mm_struct *mm, unsigned pfn)
+{
+   PVOP_VCALL2(pv_mmu_ops.alloc_pud, mm, pfn);
+}
+static inline void paravirt_release_pud(unsigned pfn)
+{
+   PVOP_VCALL1(pv_mmu_ops.release_pud, pfn);
+}
+
 #ifdef CONFIG_HIGHPTE
 static inline void *kmap_atomic_pte(struct page *page, enum km_type type)
 {
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -11,8 +11,10 @@
 #define paravirt_alloc_pte(mm, pfn) do { } while (0)
 #define paravirt_alloc_pmd(mm, pfn) do { } while (0)
 #define paravirt_alloc_pmd_clone(pfn, clonepfn, start, count) do { } while (0)
+#define paravirt_alloc_pud(mm, pfn) do { } while (0)
 #define paravirt_release_pte(pfn) do { } while (0)
 #define paravirt_release_pmd(pfn) do { } while (0)
+#define paravirt_release_pud(pfn) do { } while (0)
 #endif
 
 /*
@@ -93,6 +95,7 @@
 #if PAGETABLE_LEVELS > 3
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
+   paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
 }
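
[ Illustration, not part of the original patch. ] The patch follows the usual paravirt pattern: an ops table whose hooks default to a no-op, thin inline wrappers that call through the table, and a backend that may overwrite the hooks at init time. A minimal userspace sketch of that pattern, assuming a much simplified hook signature; all toy_ names, including the toy_hv_ backend, are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy ops table: two hooks only, with a simplified signature. */
struct toy_mmu_ops {
	void (*alloc_pud)(uint32_t pfn);
	void (*release_pud)(uint32_t pfn);
};

/* Native default: nothing to tell anyone. */
static void toy_nop(uint32_t pfn) { (void)pfn; }

static struct toy_mmu_ops toy_ops = {
	.alloc_pud   = toy_nop,
	.release_pud = toy_nop,
};

/* Hypothetical backend that wants to track page-table pages. */
static int toy_tracked;

static void toy_hv_alloc_pud(uint32_t pfn)   { (void)pfn; toy_tracked++; }
static void toy_hv_release_pud(uint32_t pfn) { (void)pfn; toy_tracked--; }

static void toy_hv_init(void)
{
	toy_ops.alloc_pud   = toy_hv_alloc_pud;
	toy_ops.release_pud = toy_hv_release_pud;
}

/* Wrappers used by the page-table code; callers never see the table. */
static inline void toy_alloc_pud(uint32_t pfn)
{
	assert(toy_ops.alloc_pud != NULL);
	toy_ops.alloc_pud(pfn);
}

static inline void toy_release_pud(uint32_t pfn)
{
	assert(toy_ops.release_pud != NULL);
	toy_ops.release_pud(pfn);
}
```

Before toy_hv_init() runs, the wrappers do nothing; afterwards each alloc/release pair is seen by the backend, which is the property the new pud hooks add for the top pagetable level.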
 




[PATCH 4 of 5] x86/pgtable.h: demacro ptep_test_and_clear_young

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -311,15 +311,17 @@
 }
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-#define ptep_test_and_clear_young(vma, addr, ptep) ({  \
-   int __ret = 0;  \
-   if (pte_young(*(ptep))) \
-   __ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,  \
-  &(ptep)->pte);   \
-   if (__ret)  \
-   pte_update((vma)->vm_mm, addr, ptep);   \
-   __ret;  \
-})
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+   unsigned long addr, pte_t *ptep)
+{
+   int ret = 0;
+   if (pte_young(*ptep))
+   ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
+&ptep->pte);
+   if (ret)
+   pte_update(vma->vm_mm, addr, ptep);
+   return ret;
+}
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
 #define ptep_clear_flush_young(vma, address, ptep) \




[PATCH 5 of 5] x86/pgtable.h: demacro ptep_clear_flush_young

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable.h |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h
--- a/include/asm-x86/pgtable.h
+++ b/include/asm-x86/pgtable.h
@@ -324,14 +324,15 @@
 }
 
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-#define ptep_clear_flush_young(vma, address, ptep) \
-({ \
-   int __young;\
-   __young = ptep_test_and_clear_young((vma), (address), (ptep));  \
-   if (__young)\
-   flush_tlb_page(vma, address);   \
-   __young;\
-})
+static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+unsigned long address, pte_t *ptep)
+{
+   int young;
+   young = ptep_test_and_clear_young(vma, address, ptep);
+   if (young)
+   flush_tlb_page(vma, address);
+   return young;
+}
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)




[PATCH 1 of 5] x86: rename paravirt_alloc_pt etc after the pagetable structure

2008-02-02 Thread Jeremy Fitzhardinge
Rename (alloc|release)_(pt|pd) to pte/pmd to explicitly match the name
of the appropriate pagetable level structure.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/kernel/paravirt.c |   10 +-
 arch/x86/kernel/vmi_32.c   |   20 ++--
 arch/x86/mm/init_32.c  |6 +++---
 arch/x86/mm/ioremap.c  |2 +-
 arch/x86/mm/pageattr.c |2 +-
 arch/x86/mm/pgtable.c  |   16 
 arch/x86/xen/enlighten.c   |   30 +++---
 include/asm-x86/paravirt.h |   32 
 include/asm-x86/pgalloc.h  |   16 
 9 files changed, 67 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -382,11 +382,11 @@
.flush_tlb_single = native_flush_tlb_single,
.flush_tlb_others = native_flush_tlb_others,
 
-   .alloc_pt = paravirt_nop,
-   .alloc_pd = paravirt_nop,
-   .alloc_pd_clone = paravirt_nop,
-   .release_pt = paravirt_nop,
-   .release_pd = paravirt_nop,
+   .alloc_pte = paravirt_nop,
+   .alloc_pmd = paravirt_nop,
+   .alloc_pmd_clone = paravirt_nop,
+   .release_pte = paravirt_nop,
+   .release_pmd = paravirt_nop,
 
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
diff --git a/arch/x86/kernel/vmi_32.c b/arch/x86/kernel/vmi_32.c
--- a/arch/x86/kernel/vmi_32.c
+++ b/arch/x86/kernel/vmi_32.c
@@ -392,13 +392,13 @@
 }
 #endif
 
-static void vmi_allocate_pt(struct mm_struct *mm, u32 pfn)
+static void vmi_allocate_pte(struct mm_struct *mm, u32 pfn)
 {
vmi_set_page_type(pfn, VMI_PAGE_L1);
vmi_ops.allocate_page(pfn, VMI_PAGE_L1, 0, 0, 0);
 }
 
-static void vmi_allocate_pd(struct mm_struct *mm, u32 pfn)
+static void vmi_allocate_pmd(struct mm_struct *mm, u32 pfn)
 {
/*
 * This call comes in very early, before mem_map is setup.
@@ -409,20 +409,20 @@
vmi_ops.allocate_page(pfn, VMI_PAGE_L2, 0, 0, 0);
 }
 
-static void vmi_allocate_pd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count)
+static void vmi_allocate_pmd_clone(u32 pfn, u32 clonepfn, u32 start, u32 count)
 {
vmi_set_page_type(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE);
vmi_check_page_type(clonepfn, VMI_PAGE_L2);
vmi_ops.allocate_page(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE, clonepfn, start, count);
 }
 
-static void vmi_release_pt(u32 pfn)
+static void vmi_release_pte(u32 pfn)
 {
vmi_ops.release_page(pfn, VMI_PAGE_L1);
vmi_set_page_type(pfn, VMI_PAGE_NORMAL);
 }
 
-static void vmi_release_pd(u32 pfn)
+static void vmi_release_pmd(u32 pfn)
 {
vmi_ops.release_page(pfn, VMI_PAGE_L2);
vmi_set_page_type(pfn, VMI_PAGE_NORMAL);
@@ -871,15 +871,15 @@
 
vmi_ops.allocate_page = vmi_get_function(VMI_CALL_AllocatePage);
if (vmi_ops.allocate_page) {
-   pv_mmu_ops.alloc_pt = vmi_allocate_pt;
-   pv_mmu_ops.alloc_pd = vmi_allocate_pd;
-   pv_mmu_ops.alloc_pd_clone = vmi_allocate_pd_clone;
+   pv_mmu_ops.alloc_pte = vmi_allocate_pte;
+   pv_mmu_ops.alloc_pmd = vmi_allocate_pmd;
+   pv_mmu_ops.alloc_pmd_clone = vmi_allocate_pmd_clone;
}
 
vmi_ops.release_page = vmi_get_function(VMI_CALL_ReleasePage);
if (vmi_ops.release_page) {
-   pv_mmu_ops.release_pt = vmi_release_pt;
-   pv_mmu_ops.release_pd = vmi_release_pd;
+   pv_mmu_ops.release_pte = vmi_release_pte;
+   pv_mmu_ops.release_pmd = vmi_release_pmd;
}
 
/* Set linear is needed in all cases */
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -68,7 +68,7 @@
if (!(pgd_val(*pgd) & _PAGE_PRESENT)) {
pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
 
-   paravirt_alloc_pd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT);
+   paravirt_alloc_pmd(&init_mm, __pa(pmd_table) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
pud = pud_offset(pgd, 0);
BUG_ON(pmd_table != pmd_offset(pud, 0));
@@ -97,7 +97,7 @@
(pte_t *)alloc_bootmem_low_pages(PAGE_SIZE);
}
 
-   paravirt_alloc_pt(&init_mm, __pa(page_table) >> PAGE_SHIFT);
+   paravirt_alloc_pte(&init_mm, __pa(page_table) >> PAGE_SHIFT);
set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
BUG_ON(page_table != pte_offset_kernel(pmd, 0));
}
@@ -374,7 +374,7 @@
 
pte_clear(NULL, va, pte);
}
-   paravirt_alloc_pd(&init_mm, __pa(base) >> PAGE_SHIFT);
+   paravirt_alloc_pmd(&init_mm, __pa(base) >> PAGE_SHIFT);
 }
 
 void __init native_

[PATCH 0 of 7] x86: more pgalloc unification

2008-02-02 Thread Jeremy Fitzhardinge
Hi Ingo,

This series does more unification of pgalloc, and creates a unified
mm/pgtable.c for common pagetable functions.  Ends up removing
pgalloc_32/64.h in favour of pgalloc.h.

[ I thought I'd mailed this earlier, but I don't see it on lkml.
Maybe I created the mbox without sending it.  Anyway, this should go
before the set I just mailed. ]

Thanks,
J




[PATCH 1 of 7] x86: convert pgalloc_64.h from macros to inlines

2008-02-02 Thread Jeremy Fitzhardinge
Convert asm-x86/pgalloc_64.h from macros into inline functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgalloc_64.h |   41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -842,3 +842,18 @@
return 0;
 }
 #endif
+
+void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
+{
+   tlb_remove_page(tlb, pte);
+}
+
+void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
+{
+   tlb_remove_page(tlb, virt_to_page(pmd));
+}
+
+void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
+{
+   tlb_remove_page(tlb, virt_to_page(pud));
+}
diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h
--- a/include/asm-x86/pgalloc_64.h
+++ b/include/asm-x86/pgalloc_64.h
@@ -1,16 +1,24 @@
 #ifndef _X86_64_PGALLOC_H
 #define _X86_64_PGALLOC_H
 
-#include 
 #include 
 #include 
+#include 
 
-#define pmd_populate_kernel(mm, pmd, pte) \
-   set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
-#define pud_populate(mm, pud, pmd) \
-   set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)))
-#define pgd_populate(mm, pgd, pud) \
-   set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)))
+static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
+{
+   set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)));
+}
+
+static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+   set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
+}
+
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
+}
 
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte)
 {
@@ -109,11 +117,10 @@
 static inline void pte_free(struct page *pte)
 {
__free_page(pte);
-} 
+}
 
-#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
-
-#define __pmd_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
-#define __pud_free_tlb(tlb,x)   tlb_remove_page((tlb),virt_to_page(x))
+extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
+extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
+extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
 
 #endif /* _X86_64_PGALLOC_H */




[PATCH 3 of 7] x86: put paravirt stubs into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pageattr.c   |2 --
 include/asm-x86/pgalloc.h|   11 +++
 include/asm-x86/pgalloc_32.h |   10 --
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -249,9 +249,7 @@
address = __pa(address);
addr = address & LARGE_PAGE_MASK;
pbase = (pte_t *)page_address(base);
-#ifdef CONFIG_X86_32
paravirt_alloc_pt(&init_mm, page_to_pfn(base));
-#endif
 
pgprot_val(ref_prot) &= ~_PAGE_NX;
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -4,6 +4,16 @@
 #include 
 #include   /* for struct page */
 #include 
+
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define paravirt_alloc_pt(mm, pfn) do { } while (0)
+#define paravirt_alloc_pd(mm, pfn) do { } while (0)
+#define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) do { } while (0)
+#define paravirt_release_pt(pfn) do { } while (0)
+#define paravirt_release_pd(pfn) do { } while (0)
+#endif
 
 /*
  * Allocate and free page tables.
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -1,15 +1,5 @@
 #ifndef _I386_PGALLOC_H
 #define _I386_PGALLOC_H
-
-#ifdef CONFIG_PARAVIRT
-#include 
-#else
-#define paravirt_alloc_pt(mm, pfn) do { } while (0)
-#define paravirt_alloc_pd(mm, pfn) do { } while (0)
-#define paravirt_alloc_pd_clone(pfn, clonepfn, start, count) do { } while (0)
-#define paravirt_release_pt(pfn) do { } while (0)
-#define paravirt_release_pd(pfn) do { } while (0)
-#endif
 
 static inline void pmd_populate_kernel(struct mm_struct *mm,
   pmd_t *pmd, pte_t *pte)
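
[ Illustration, not part of the original patch. ] The stubs moved by this patch use the "do { } while (0)" idiom so that, with CONFIG_PARAVIRT off, each call still expands to exactly one statement that needs its trailing semicolon. A minimal userspace sketch, assuming a hypothetical TOY_PARAVIRT switch and hypothetical toy_ names:

```c
#include <assert.h>
#include <stddef.h>

#ifdef TOY_PARAVIRT
void toy_notify_alloc_pt(unsigned long pfn);	/* supplied elsewhere */
#define toy_alloc_pt(pfn) toy_notify_alloc_pt(pfn)
#else
/* Stub: compiles to nothing, but still behaves like a single statement. */
#define toy_alloc_pt(pfn) do { } while (0)
#endif

/* The call site is identical whether or not the hook is compiled in. */
static int toy_populate(unsigned long *slot, unsigned long pfn, int ok)
{
	assert(slot != NULL);
	if (ok)
		toy_alloc_pt(pfn);	/* safe in an unbraced if/else */
	else
		return -1;
	*slot = (pfn << 12) | 1UL;	/* frame address plus a "present" bit */
	return 0;
}
```

Because the stub lives in the common header, callers such as the pageattr.c hunk above can drop their own #ifdef around the call.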




[PATCH 4 of 7] x86: move pte functions into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge
Common definitions for 2-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c|6 ++
 arch/x86/mm/pgtable_32.c |6 --
 include/asm-x86/pgalloc.h|   16 
 include/asm-x86/pgalloc_32.h |   17 -
 include/asm-x86/pgalloc_64.h |   19 ---
 5 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -843,11 +843,6 @@
 }
 #endif
 
-void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
-{
-   tlb_remove_page(tlb, pte);
-}
-
 void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
tlb_remove_page(tlb, virt_to_page(pmd));
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -16,6 +16,12 @@
pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
 #endif
return pte;
+}
+
+void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
+{
+   paravirt_release_pt(page_to_pfn(pte));
+   tlb_remove_page(tlb, pte);
 }
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -178,12 +178,6 @@
__VMALLOC_RESERVE += reserve;
 }
 
-void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
-{
-   paravirt_release_pt(page_to_pfn(pte));
-   tlb_remove_page(tlb, pte);
-}
-
 #ifdef CONFIG_X86_PAE
 
 void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -24,6 +24,22 @@
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address);
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address);
 
+/* Should really implement gc for free page table pages. This could be
+   done with a reference count in struct page. */
+
+static inline void pte_free_kernel(pte_t *pte)
+{
+   BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
+   free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct page *pte)
+{
+   __free_page(pte);
+}
+
+extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
+
 #ifdef CONFIG_X86_32
 # include "pgalloc_32.h"
 #else
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -15,23 +15,6 @@
paravirt_alloc_pt(mm, pfn);
set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
 }
-
-/*
- * Allocate and free page tables.
- */
-
-static inline void pte_free_kernel(pte_t *pte)
-{
-   free_page((unsigned long)pte);
-}
-
-static inline void pte_free(struct page *pte)
-{
-   __free_page(pte);
-}
-
-
-extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
 
 #ifdef CONFIG_X86_PAE
 /*
diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h
--- a/include/asm-x86/pgalloc_64.h
+++ b/include/asm-x86/pgalloc_64.h
@@ -45,21 +45,6 @@
free_page((unsigned long)pud);
 }
 
-/* Should really implement gc for free page table pages. This could be
-   done with a reference count in struct page. */
-
-static inline void pte_free_kernel(pte_t *pte)
-{
-   BUG_ON((unsigned long)pte & (PAGE_SIZE-1));
-   free_page((unsigned long)pte); 
-}
-
-static inline void pte_free(struct page *pte)
-{
-   __free_page(pte);
-}
-
-extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
 extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
 extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
 




[PATCH 2 of 7] x86: add common mm/pgtable.c

2008-02-02 Thread Jeremy Fitzhardinge
Add a common arch/x86/mm/pgtable.c file for common pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/Makefile_32  |2
 arch/x86/mm/Makefile_64  |2
 arch/x86/mm/pgtable.c|  234 ++
 arch/x86/mm/pgtable_32.c |  185 -
 include/asm-x86/pgalloc.h|   19 +++
 include/asm-x86/pgalloc_32.h |   11 -
 include/asm-x86/pgalloc_64.h |   61 --
 7 files changed, 255 insertions(+), 259 deletions(-)

diff --git a/arch/x86/mm/Makefile_32 b/arch/x86/mm/Makefile_32
--- a/arch/x86/mm/Makefile_32
+++ b/arch/x86/mm/Makefile_32
@@ -2,7 +2,7 @@
 # Makefile for the linux i386-specific parts of the memory manager.
 #
 
-obj-y  := init_32.o pgtable_32.o fault.o ioremap.o extable.o pageattr.o mmap.o
+obj-y  := init_32.o pgtable.o pgtable_32.o fault.o ioremap.o extable.o pageattr.o mmap.o
 
 obj-$(CONFIG_NUMA) += discontig_32.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
diff --git a/arch/x86/mm/Makefile_64 b/arch/x86/mm/Makefile_64
--- a/arch/x86/mm/Makefile_64
+++ b/arch/x86/mm/Makefile_64
@@ -2,7 +2,7 @@
 # Makefile for the linux x86_64-specific parts of the memory manager.
 #
 
-obj-y   := init_64.o fault.o ioremap.o extable.o pageattr.o mmap.o
+obj-y   := init_64.o fault.o ioremap.o extable.o pageattr.o pgtable.o mmap.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa_64.o
 obj-$(CONFIG_K8_NUMA) += k8topology_64.o
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
new file mode 100644
--- /dev/null
+++ b/arch/x86/mm/pgtable.c
@@ -0,0 +1,235 @@
+#include 
+#include 
+
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+   return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+}
+
+struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+   struct page *pte;
+
+#ifdef CONFIG_HIGHPTE
+   pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO, 0);
+#else
+   pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
+#endif
+   return pte;
+}
+
+#ifdef CONFIG_X86_64
+static inline void pgd_list_add(pgd_t *pgd)
+{
+   struct page *page = virt_to_page(pgd);
+
+   spin_lock(&pgd_lock);
+   list_add(&page->lru, &pgd_list);
+   spin_unlock(&pgd_lock);
+}
+
+static inline void pgd_list_del(pgd_t *pgd)
+{
+   struct page *page = virt_to_page(pgd);
+
+   spin_lock(&pgd_lock);
+   list_del(&page->lru);
+   spin_unlock(&pgd_lock);
+}
+
+pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+   unsigned boundary;
+   pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+   if (!pgd)
+   return NULL;
+   pgd_list_add(pgd);
+   /*
+* Copy kernel pointers in from init.
+* Could keep a freelist or slab cache of those because the kernel
+* part never changes.
+*/
+   boundary = pgd_index(__PAGE_OFFSET);
+   memset(pgd, 0, boundary * sizeof(pgd_t));
+   memcpy(pgd + boundary,
+  init_level4_pgt + boundary,
+  (PTRS_PER_PGD - boundary) * sizeof(pgd_t));
+   return pgd;
+}
+
+void pgd_free(pgd_t *pgd)
+{
+   BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+   pgd_list_del(pgd);
+   free_page((unsigned long)pgd);
+}
+
+#else
+/*
+ * List of all pgd's needed for non-PAE so it can invalidate entries
+ * in both cached and uncached pgd's; not needed for PAE since the
+ * kernel pmd is shared. If PAE were not to share the pmd a similar
+ * tactic would be needed. This is essentially codepath-based locking
+ * against pageattr.c; it is the unique case in which a valid change
+ * of kernel pagetables can't be lazily synchronized by vmalloc faults.
+ * vmalloc faults work because attached pagetables are never freed.
+ * -- wli
+ */
+static inline void pgd_list_add(pgd_t *pgd)
+{
+   struct page *page = virt_to_page(pgd);
+
+   list_add(&page->lru, &pgd_list);
+}
+
+static inline void pgd_list_del(pgd_t *pgd)
+{
+   struct page *page = virt_to_page(pgd);
+
+   list_del(&page->lru);
+}
+
+#define UNSHARED_PTRS_PER_PGD  \
+   (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
+
+static void pgd_ctor(void *p)
+{
+   pgd_t *pgd = p;
+   unsigned long flags;
+
+   /* Clear usermode parts of PGD */
+   memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
+
+   spin_lock_irqsave(&pgd_lock, flags);
+
+   /* If the pgd points to a shared pagetable level (either the
+  ptes in non-PAE, or shared PMD in PAE), then just copy the
+  references from swapper_pg_dir. */
+   if (PAGETABLE_LEVELS == 2 ||
+   (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) {
+   clone_pgd_range(pgd + USER_PTRS_PER_PGD,
+   swapper_pg_dir + USER_PTRS_PER_PGD,
+
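
[ Illustration, not part of the original patch. ] The 64-bit pgd_alloc() above clears the user half of a new pgd and copies the kernel half from the init pagetable, split at pgd_index(__PAGE_OFFSET). A minimal userspace sketch of that split, assuming 512 entries per pgd and a 39-bit pgd shift; the toy_ names are hypothetical and the address used in the usage note is only an example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PTRS_PER_PGD 512
#define TOY_PGDIR_SHIFT  39

typedef uint64_t toy_pgd_t;

/* Stand-in for the init pagetable the kernel half is copied from. */
static toy_pgd_t toy_init_pgt[TOY_PTRS_PER_PGD];

/* Index of the top-level slot that maps addr. */
static inline unsigned toy_pgd_index(uint64_t addr)
{
	return (unsigned)((addr >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_PGD - 1));
}

/* Zero the slots below boundary, copy the rest from the init table. */
static void toy_pgd_fill(toy_pgd_t *pgd, unsigned boundary)
{
	assert(boundary <= TOY_PTRS_PER_PGD);
	memset(pgd, 0, boundary * sizeof(toy_pgd_t));
	memcpy(pgd + boundary, toy_init_pgt + boundary,
	       (TOY_PTRS_PER_PGD - boundary) * sizeof(toy_pgd_t));
}
```

With an example kernel base of 0xffff810000000000, toy_pgd_index() gives slot 258, so slots 0..257 are cleared for user mappings and slots 258..511 are shared with the init table.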

[PATCH 7 of 7] x86: move all the pgd_list handling to one place

2008-02-02 Thread Jeremy Fitzhardinge
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c |   24 +---
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -39,32 +39,30 @@
 #endif /* PAGETABLE_LEVELS > 3 */
 #endif /* PAGETABLE_LEVELS > 2 */
 
-#ifdef CONFIG_X86_64
 static inline void pgd_list_add(pgd_t *pgd)
 {
struct page *page = virt_to_page(pgd);
 
-   spin_lock(&pgd_lock);
list_add(&page->lru, &pgd_list);
-   spin_unlock(&pgd_lock);
 }
 
 static inline void pgd_list_del(pgd_t *pgd)
 {
struct page *page = virt_to_page(pgd);
 
-   spin_lock(&pgd_lock);
list_del(&page->lru);
-   spin_unlock(&pgd_lock);
 }
 
+#ifdef CONFIG_X86_64
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
unsigned boundary;
pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
if (!pgd)
return NULL;
+   spin_lock(&pgd_lock);
pgd_list_add(pgd);
+   spin_unlock(&pgd_lock);
/*
 * Copy kernel pointers in from init.
 * Could keep a freelist or slab cache of those because the kernel
@@ -81,7 +79,9 @@
 void pgd_free(pgd_t *pgd)
 {
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+   spin_lock(&pgd_lock);
pgd_list_del(pgd);
+   spin_unlock(&pgd_lock);
free_page((unsigned long)pgd);
 }
 
@@ -96,20 +96,6 @@
  * vmalloc faults work because attached pagetables are never freed.
  * -- wli
  */
-static inline void pgd_list_add(pgd_t *pgd)
-{
-   struct page *page = virt_to_page(pgd);
-
-   list_add(&page->lru, &pgd_list);
-}
-
-static inline void pgd_list_del(pgd_t *pgd)
-{
-   struct page *page = virt_to_page(pgd);
-
-   list_del(&page->lru);
-}
-
 #define UNSHARED_PTRS_PER_PGD  \
(SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
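
[ Illustration, not part of the original patch. ] The diff makes pgd_list_add()/pgd_list_del() lock-free helpers and moves the pgd_lock calls into the 64-bit callers, so one pair of helpers can serve both the 32-bit and 64-bit paths. The commit message gives no rationale, so that reading comes from the diff alone. A minimal single-threaded sketch of the "caller holds the lock" convention, with a flag standing in for the spinlock; all toy_ names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list, in the style of the kernel's list_head. */
struct toy_list { struct toy_list *prev, *next; };

static struct toy_list toy_pgd_list = { &toy_pgd_list, &toy_pgd_list };
static int toy_lock_held;	/* stand-in for pgd_lock */

static void toy_lock(void)   { assert(!toy_lock_held); toy_lock_held = 1; }
static void toy_unlock(void) { assert(toy_lock_held);  toy_lock_held = 0; }

/* Helpers take no lock themselves; the caller must already hold it. */
static void toy_list_add(struct toy_list *n)
{
	assert(toy_lock_held);
	n->next = toy_pgd_list.next;
	n->prev = &toy_pgd_list;
	toy_pgd_list.next->prev = n;
	toy_pgd_list.next = n;
}

static void toy_list_del(struct toy_list *n)
{
	assert(toy_lock_held);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

static int toy_list_len(void)
{
	const struct toy_list *p;
	int n = 0;

	for (p = toy_pgd_list.next; p != &toy_pgd_list; p = p->next)
		n++;
	return n;
}

/* Callers own the locking, as pgd_alloc()/pgd_free() do after the patch. */
static void toy_pgd_alloc(struct toy_list *n) { toy_lock(); toy_list_add(n); toy_unlock(); }
static void toy_pgd_free(struct toy_list *n)  { toy_lock(); toy_list_del(n); toy_unlock(); }
```

Keeping the lock out of the helpers also lets a caller that already holds it, for a longer critical section, reuse them without taking a non-recursive lock twice.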
 




[PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge
Common definitions for 4-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c|7 ++
 include/asm-x86/pgalloc.h|   46 --
 include/asm-x86/pgalloc_32.h |   24 -
 include/asm-x86/pgalloc_64.h |   32 -
 4 files changed, 47 insertions(+), 62 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -842,8 +842,3 @@
return 0;
 }
 #endif
-
-void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
-{
-   tlb_remove_page(tlb, virt_to_page(pud));
-}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -30,6 +30,13 @@
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
tlb_remove_page(tlb, virt_to_page(pmd));
 }
+
+#if PAGETABLE_LEVELS > 3
+void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
+{
+   tlb_remove_page(tlb, virt_to_page(pud));
+}
+#endif /* PAGETABLE_LEVELS > 3 */
 #endif /* PAGETABLE_LEVELS > 2 */
 
 #ifdef CONFIG_X86_64
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -69,12 +69,46 @@
 }
 
 extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
+
+static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
+{
+   paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT);
+
+   /* Note: almost everything apart from _PAGE_PRESENT is
+  reserved at the pmd (PDPT) level. */
+   set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
+
+#ifdef CONFIG_X86_PAE
+   /*
+* According to Intel App note "TLBs, Paging-Structure Caches,
+* and Their Invalidation", April 2007, document 317080-001,
+* section 8.1: in PAE mode we explicitly have to flush the
+* TLB via cr3 if the top-level pgd is changed...
+*/
+   if (mm == current->active_mm)
+   write_cr3(read_cr3());
+#endif /* CONFIG_X86_PAE */
+}
+
+#if PAGETABLE_LEVELS > 3
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
+}
+
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+}
+
+static inline void pud_free(pud_t *pud)
+{
+   BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
+   free_page((unsigned long)pud);
+}
+
+extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
+#endif /* PAGETABLE_LEVELS > 3 */
 #endif /* PAGETABLE_LEVELS > 2 */
 
-#ifdef CONFIG_X86_32
-# include "pgalloc_32.h"
-#else
-# include "pgalloc_64.h"
-#endif
-
 #endif /* _ASM_X86_PGALLOC_H */
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
deleted file mode 100644
--- a/include/asm-x86/pgalloc_32.h
+++ /dev/null
@@ -1,24 +0,0 @@
-#ifndef _I386_PGALLOC_H
-#define _I386_PGALLOC_H
-
-#ifdef CONFIG_X86_PAE
-static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
-{
-   paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT);
-
-   /* Note: almost everything apart from _PAGE_PRESENT is
-      reserved at the pmd (PDPT) level. */
-   set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
-
-   /*
-    * According to Intel App note "TLBs, Paging-Structure Caches,
-    * and Their Invalidation", April 2007, document 317080-001,
-    * section 8.1: in PAE mode we explicitly have to flush the
-    * TLB via cr3 if the top-level pgd is changed...
-    */
-   if (mm == current->active_mm)
-       write_cr3(read_cr3());
-}
-#endif /* CONFIG_X86_PAE */
-
-#endif /* _I386_PGALLOC_H */
diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h
deleted file mode 100644
--- a/include/asm-x86/pgalloc_64.h
+++ /dev/null
@@ -1,29 +0,0 @@
-#ifndef _X86_64_PGALLOC_H
-#define _X86_64_PGALLOC_H
-
-#include 
-
-static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
-{
-   set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
-}
-
-static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
-{
-   set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
-}
-
-static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
-}
-
-static inline void pud_free (pud_t *pud)
-{
-   BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-   free_page((unsigned long)pud);
-}
-
-extern void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud);
-
-#endif /* _X86_64_PGALLOC_H */


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5 of 7] x86: move pmd functions into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge
Common definitions for 3-level pagetable functions.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/x86/mm/pgtable.c|8 
 arch/x86/mm/pgtable_32.c |   10 --
 include/asm-x86/pgalloc.h|   31 +++
 include/asm-x86/pgalloc_32.h |   31 ---
 include/asm-x86/pgalloc_64.h |   26 --
 5 files changed, 39 insertions(+), 67 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -843,11 +843,6 @@
 }
 #endif
 
-void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
-{
-   tlb_remove_page(tlb, virt_to_page(pmd));
-}
-
 void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
tlb_remove_page(tlb, virt_to_page(pud));
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -23,6 +23,14 @@
paravirt_release_pt(page_to_pfn(pte));
tlb_remove_page(tlb, pte);
 }
+
+#if PAGETABLE_LEVELS > 2
+void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
+{
+   paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
+   tlb_remove_page(tlb, virt_to_page(pmd));
+}
+#endif /* PAGETABLE_LEVELS > 2 */
 
 #ifdef CONFIG_X86_64
 static inline void pgd_list_add(pgd_t *pgd)
diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -177,13 +177,3 @@
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
 }
-
-#ifdef CONFIG_X86_PAE
-
-void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
-{
-   paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
-   tlb_remove_page(tlb, virt_to_page(pmd));
-}
-
-#endif
diff --git a/include/asm-x86/pgalloc.h b/include/asm-x86/pgalloc.h
--- a/include/asm-x86/pgalloc.h
+++ b/include/asm-x86/pgalloc.h
@@ -40,6 +40,37 @@
 
 extern void __pte_free_tlb(struct mmu_gather *tlb, struct page *pte);
 
+static inline void pmd_populate_kernel(struct mm_struct *mm,
+  pmd_t *pmd, pte_t *pte)
+{
+   paravirt_alloc_pt(mm, __pa(pte) >> PAGE_SHIFT);
+   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
+static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
+   struct page *pte)
+{
+   unsigned long pfn = page_to_pfn(pte);
+
+   paravirt_alloc_pt(mm, pfn);
+   set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+}
+
+#if PAGETABLE_LEVELS > 2
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+}
+
+static inline void pmd_free(pmd_t *pmd)
+{
+   BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+   free_page((unsigned long)pmd);
+}
+
+extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
+#endif /* PAGETABLE_LEVELS > 2 */
+
 #ifdef CONFIG_X86_32
 # include "pgalloc_32.h"
 #else
diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -1,38 +1,7 @@
 #ifndef _I386_PGALLOC_H
 #define _I386_PGALLOC_H
 
-static inline void pmd_populate_kernel(struct mm_struct *mm,
-  pmd_t *pmd, pte_t *pte)
-{
-   paravirt_alloc_pt(mm, __pa(pte) >> PAGE_SHIFT);
-   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte)
-{
-   unsigned long pfn = page_to_pfn(pte);
-
-   paravirt_alloc_pt(mm, pfn);
-   set_pmd(pmd, __pmd(((pteval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
-}
-
 #ifdef CONFIG_X86_PAE
-/*
- * In the PAE case we free the pmds as part of the pgd.
- */
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
-}
-
-static inline void pmd_free(pmd_t *pmd)
-{
-   BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   free_page((unsigned long)pmd);
-}
-
-extern void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd);
-
 static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
paravirt_alloc_pd(mm, __pa(pmd) >> PAGE_SHIFT);
diff --git a/include/asm-x86/pgalloc_64.h b/include/asm-x86/pgalloc_64.h
--- a/include/asm-x86/pgalloc_64.h
+++ b/include/asm-x86/pgalloc_64.h
@@ -2,11 +2,6 @@
 #define _X86_64_PGALLOC_H
 
 #include 
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
-{
-   set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)));
-}
 
 static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 {
@@ -16,22 +11,6 @@
 static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
 {
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
-}
-
-stati

Re: [PATCH 3 of 5] x86/pgtable.h: demacro ptep_set_access_flags

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> another thing: these inlines are a bit fat and they are used in more
> than one place. Please move them into pgtable.c. The rule of thumb is:
> if an inline is more than 2 lines big, it is a likely candidate for
> uninlining. (and even many 2-liners, and even some 1-liners are
> candidates) Especially under paravirt the MMU inlines grow these update
> notifiers so they become even fatter.

I agree, but I wanted to keep it semantically equivalent to the
original.  I'll add a move to out of line patch.

> having functions instead of inlines also simplifies the type
> dependencies by quite a degree.

Indeed, the floating asm/tlbflush.h is a bit of a wart.

   J


Re: [PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> Common definitions for 4-level pagetable functions.
>>
>> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
>> ---
>>  arch/x86/mm/pgtable.c|7 ++
>>  include/asm-x86/pgalloc.h|   46 --
>>  include/asm-x86/pgalloc_32.h |   24 -
>>  include/asm-x86/pgalloc_64.h |   32 -
>>  4 files changed, 47 insertions(+), 62 deletions(-)
>
> random-qa found an early bootup hang on 32-bit (config attached).

The config you sent was 64-bit.

> i bisected it down to this patch of yours. It's a bit large so it's not
> obvious what is happening. Could you please keep patches that do
> functional changes smaller?

Will do, though this one is more or less pure code motion.  But I can
make it actual pure code motion with a separate merge patch.


   J


Re: [PATCH 6 of 7] x86: move pud/pgd functions into common asm/pgalloc.h

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> yes but the early hang is very real so either my hardware is stubbornly
> ignoring that your patch is pure code movement (in which case i'll have
> to have a word or two with my hardware), or your patch is perhaps wrong
> somewhere ;-)

I see what it is.  set_pud on 32-bit needs only _PAGE_PRESENT, but for
64-bit it needs _PAGE_TABLE.

> generally you can protect yourself against full reverts by separating
> the NOP changes from the non-NOP changes. If a change is small enough i
> might spot the bug immediately and fix it - otherwise i have to undo
> your whole series to keep the x86.git ball rolling. I thought we went
> through this exercise a few times already :-/ ...

This one was supposed to be pure motion, but my eyeball diff failed me.
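
A minimal standalone sketch of the flag difference mentioned above, for anyone following along. The bit values are the usual x86 entry bits and _PAGE_TABLE is taken to be PRESENT|RW|USER|ACCESSED|DIRTY; the helper names pud_entry_flags() and make_pud() are made up for illustration and are not the kernel's pud_populate().

```c
#include <assert.h>
#include <stdint.h>

/* Low flag bits of an x86 page-table entry. */
#define PAGE_PRESENT  0x001ULL
#define PAGE_RW       0x002ULL
#define PAGE_USER     0x004ULL
#define PAGE_ACCESSED 0x020ULL
#define PAGE_DIRTY    0x040ULL
#define PAGE_TABLE    (PAGE_PRESENT | PAGE_RW | PAGE_USER | \
                       PAGE_ACCESSED | PAGE_DIRTY)

/* Hypothetical helper: flags for a pud entry that points at a pmd page. */
static uint64_t pud_entry_flags(int pagetable_levels)
{
    /* 3 levels (32-bit PAE): almost everything apart from PRESENT is
     * reserved in a PDPT entry. */
    if (pagetable_levels == 3)
        return PAGE_PRESENT;

    /* 4 levels (64-bit): an ordinary table entry. */
    return PAGE_TABLE;
}

/* Hypothetical helper: build the raw entry value from a pmd physical address. */
static uint64_t make_pud(uint64_t pmd_phys, int pagetable_levels)
{
    assert((pmd_phys & 0xfffULL) == 0);   /* pmd pages are page aligned */
    return pmd_phys | pud_entry_flags(pagetable_levels);
}
```

With these definitions make_pud(0x1000, 3) gives 0x1001 and make_pud(0x1000, 4) gives 0x1067, which is the distinction a single shared pud_populate() has to preserve.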

   J


Re: [PATCH 2 of 7] x86: add common mm/pgtable.c

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> Add a common arch/x86/mm/pgtable.c file for common pagetable functions.
>
> randconfig testing found a build breakage on 32-bit, and that got
> bisected down to this patch of yours.

Couldn't reproduce.  What was the failure?

   J


Re: [PATCH 2 of 7] x86: add common mm/pgtable.c

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> oops, i thought i pasted that. It was this:
>
> arch/x86/mm/pgtable.c: In function 'pgd_alloc':
> arch/x86/mm/pgtable.c:213: error: implicit declaration of function 'quicklist_alloc'
> arch/x86/mm/pgtable.c:213: warning: initialization makes pointer from integer without a cast
> arch/x86/mm/pgtable.c:218: error: implicit declaration of function 'quicklist_free'
> arch/x86/mm/pgtable.c: In function 'check_pgt_cache':
> arch/x86/mm/pgtable.c:233: error: implicit declaration of function 'quicklist_trim'
>
> also, config re-attached. (maybe i messed up the previous one)

Still can't reproduce, but it's a simple case of missing headers.

   J


Re: [PATCH 2 of 7] x86: add common mm/pgtable.c

2008-02-02 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
>
>> Ingo Molnar wrote:
>>> oops, i thought i pasted that. It was this:
>>>
>>> arch/x86/mm/pgtable.c: In function 'pgd_alloc':
>>> arch/x86/mm/pgtable.c:213: error: implicit declaration of function 'quicklist_alloc'
>>> arch/x86/mm/pgtable.c:213: warning: initialization makes pointer from integer without a cast
>>> arch/x86/mm/pgtable.c:218: error: implicit declaration of function 'quicklist_free'
>>> arch/x86/mm/pgtable.c: In function 'check_pgt_cache':
>>> arch/x86/mm/pgtable.c:233: error: implicit declaration of function 'quicklist_trim'
>>>
>>> also, config re-attached. (maybe i messed up the previous one)
>>
>> Still can't reproduce, but it's a simple case of missing headers.
>
> ok, i'll figure it out if/when it happens with your resent queue.

I added an explicit  to pgtable.c, so there's no
excuse to still complain.  Will resend the combined series shortly.


   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-13 Thread Jeremy Fitzhardinge

Jody Belka wrote:
> Hi all,
>
> I thought I'd try out 2.6.25-rc1 as a xen 32-bit pae domU the other day.
> Unfortunately, I didn't get very far very fast, as the domain just crashed
> immediately upon booting, without any direct feedback (I did have messages
> on the xen message buffer, which helped). This even with earlyprintk turned on.
>
> After a long, arduous journey, I managed to track this down to the following:
>
> --
> commit  551889a6e2a24a9c06fd453ea03b57b7746ffdc0
>
> x86: construct 32-bit boot time page tables in native format.
>
> Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
> kernel are in PAE format.
>
> early_ioremap is updated to use the standard page table accessors.
>
> Clear any mappings beyond max_low_pfn from the boot page tables in
> native_pagetable_setup_start because the initial mappings can extend
> beyond the range of physical memory and into the vmalloc area.
>
> Derived from patches by Eric Biederman and H. Peter Anvin.
>
> [ [EMAIL PROTECTED]: PAE swapper_pg_dir needs to be page-sized fix ]
> --
>
> However, to make life more interesting, just reverting this isn't quite
> enough to get us to the promised land. If we try, we find that although
> we do now start booting, we crash again a short way into the process.
>
> In a different manner though. Specifically, in early_ioremap_clear.
> Reverting the above commit /except/ for the changes to arch/x86/mm/ioremap.c
> gets everything working again.
>
> Well, except that we can't shutdown/reboot properly, but I've sent a patch
> for that in another email.
>
> I'm afraid i've no idea what needs to be done to get the change to work
> with xen, but i'm willing to try out any patches people come up with.
> Please cc me on any replies, as i'm not subscribed, thanks.

Hi,

Although I'm on vacation, I happened to download a recent copy of
x86.git and found that it crashes early.  Here's a couple of patches to
apply; I don't know if they apply to current git, but I hope it helps.


   J
Subject: x86/early_ioremap: don't assume we're using swapper_pg_dir

At the early stages of boot, before the kernel pagetable has been
fully initialized, a Xen kernel will still be running off the
Xen-provided pagetables rather than swapper_pg_dir[].  Therefore,
readback cr3 to determine the base of the pagetable rather than
assuming swapper_pg_dir[].

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/x86/mm/ioremap.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

===
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -265,7 +265,9 @@
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 {
-	pgd_t *pgd = &swapper_pg_dir[pgd_index(addr)];
+	/* Don't assume we're using swapper_pg_dir at this point */
+	pgd_t *base = __va(read_cr3());
+	pgd_t *pgd = &base[pgd_index(addr)];
 	pud_t *pud = pud_offset(pgd, addr);
 	pmd_t *pmd = pmd_offset(pud, addr);
 
Subject: xen: unpin initial Xen pagetable once we're finished with it

Unpin the Xen-provided pagetable once we've finished with it, so it
doesn't cause stray references which cause later swapper_pg_dir
pagetable updates to fail.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
 arch/x86/xen/enlighten.c |4 
 1 file changed, 4 insertions(+)

===
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -798,6 +798,10 @@
 	 * added to the table can be prepared properly for Xen.
 	 */
 	xen_write_cr3(__pa(base));
+
+	/* Unpin initial Xen pagetable */
+	pin_pagetable_pfn(MMUEXT_UNPIN_TABLE,
+			  PFN_DOWN(__pa(xen_start_info->pt_base)));
 }
 
 static __init void xen_pagetable_setup_done(pgd_t *base)
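
The idea behind the early_ioremap change above can be modelled outside the kernel. This is only a sketch under stated assumptions: active_base stands in for __va(read_cr3()), xen_boot_pgd stands in for the hypervisor-provided table, the PAE geometry (4 top-level entries, 1 GiB each) is hard-coded, and early_pgd() is a made-up name rather than the kernel's early_ioremap_pmd().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT  30   /* PAE: each top-level entry covers 1 GiB */
#define PTRS_PER_PGD 4

typedef struct { uint64_t val; } pgd_t;

static pgd_t swapper_pg_dir[PTRS_PER_PGD];   /* the kernel's own table */
static pgd_t xen_boot_pgd[PTRS_PER_PGD];     /* stand-in for the Xen-provided table */

/* Stand-in for __va(read_cr3()): whichever table the CPU is really using. */
static pgd_t *active_base = swapper_pg_dir;

static size_t pgd_index(uint32_t addr)
{
    return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Look the entry up in the active table instead of assuming swapper_pg_dir. */
static pgd_t *early_pgd(uint32_t addr)
{
    return &active_base[pgd_index(addr)];
}
```

While the guest is still running on the boot table (active_base = xen_boot_pgd), early_pgd(0xc0000000) resolves to &xen_boot_pgd[3]; a lookup hard-wired to swapper_pg_dir would have edited a table the CPU is not walking.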


Re: 2.6.25-rc1 xen pvops regression

2008-02-13 Thread Jeremy Fitzhardinge

Joel Becker wrote:
> On Wed, Feb 13, 2008 at 10:59:33PM +1100, Jeremy Fitzhardinge wrote:
>>> I thought I'd try out 2.6.25-rc1 as a xen 32-bit pae domU the other day.
>>> Unfortunately, I didn't get very far very fast, as the domain just crashed
>>> immediately upon booting, without any direct feedback (I did have messages
>>> on the xen message buffer, which helped). This even with earlyprintk turned on.
>>>
>>> After a long, arduous journey, I managed to track this down to the following:
>>>
>>> --
>>> commit  551889a6e2a24a9c06fd453ea03b57b7746ffdc0
>
> I'm seeing the same problem, with no messages at all from xen
> other than "domain crashed, restart disabled" in xend.log.  I got a
> different commit in my bisect, 0947b2f31ca1ea1211d3cde2dbd8fcec579ef395
> (i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance
> bt_ioremap).  I started from yesterday's
> 96b5a46e2a72dc1829370c87053e0cd558d58bc0 (WMI: initialize
> wmi_blocks.list even if ACPI is disabled) and a known good
> 9b73e76f3cf63379dcf45fcd4f112f5812418d0a (Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6).
>
>> Although I'm on vacation, I happened to download a recent copy of
>> x86.git and found that it crashes early.  Here's a couple of patches to
>> apply; I don't know if they apply to current git, but I hope it helps.
>>
>> Subject: x86/early_ioremap: don't assume we're using swapper_pg_dir
>> Subject: xen: unpin initial Xen pagetable once we're finished with it
>
> After my bisect was done, I re-pulled from Linus and discovered
> these patches.  Searching for these emails, they certainly sound like my
> problem.  But the kernel does not boot, commit
> 10270d4838bdc493781f5a1cf2e90e9c34c9142f (acpi: fix
> acpi_os_read_pci_configuration() misuse of raw_pci_read()).  Still no
> output from Xen - pygrub selects the kernel, and then the domain just
> dies back to the dom0 shell.
> Attached are my latest .config and my bisect log.

Is the domain ending up in the crashed state?  Do you get a register
dump with xm dmesg?  That would be very useful in determining what went
wrong.  You may need to compile Xen with debug=y in Config.mk.


   J


Re: [PATCHv3 1/3] x86: use ELF format in compressed images.

2008-02-14 Thread Jeremy Fitzhardinge

Ian Campbell wrote:
> On Thu, 2008-02-14 at 17:01 +, Ian Campbell wrote:
>> I have a xen domain builder patch as well. I was waiting for the Linux
>> side to gain some traction before putting it forward (I'd attach it
>> now but it's at home on a laptop which is sleeping).
>
> Here it is:
>
> # HG changeset patch
> # User [EMAIL PROTECTED]
> # Date 1203011758 0
> # Node ID 3079b4b3835e3aba52bb6548bbbced70471a9f32
> # Parent  42369d21641d6297dc369441c3bfd355880d28c0
> Support loading Linux bzImage v2.08 and up.

Do you also have a patch to update the boot protocol?

   J


Re: [PATCHv3 1/3] x86: use ELF format in compressed images.

2008-02-14 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
>> Do you also have a patch to update the boot protocol?
>
> Looking for anything different than the root of this thread?

Yes, the patch for the Xen domain builder to boot a bzImage using the
Linux boot protocol rather than the Xen one.  Ian's patch will extract
the ELF file from the bzImage, but still boot it by finding the Xen
entrypoint in the notes, with %esi pointing to the Xen start_info rather
than the boot_params (unless I'm missing something).


   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-15 Thread Jeremy Fitzhardinge

Joel Becker wrote:
> On Thu, Feb 14, 2008 at 06:50:52PM +1100, Jeremy Fitzhardinge wrote:
>>> I'm seeing the same problem, with no messages at all from xen
>>> other than "domain crashed, restart disabled" in xend.log.  I got a
>>> different commit in my bisect, 0947b2f31ca1ea1211d3cde2dbd8fcec579ef395
>>> (i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance
>>> bt_ioremap).  I started from yesterday's
>>> 96b5a46e2a72dc1829370c87053e0cd558d58bc0 (WMI: initialize
>>> wmi_blocks.list even if ACPI is disabled) and a known good
>>> 9b73e76f3cf63379dcf45fcd4f112f5812418d0a (Merge
>>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6).
>>
>> Is the domain ending up in the crashed state?  Do you get a register
>> dump with xm dmesg?  That would be very useful in determining what went
>> wrong.  You may need to compile Xen with debug=y in Config.mk.
>
> I didn't know xm dmesg existed :-)  Regarding debug=y, I'm using
> a prepackaged dom0 set.  Here's what I find in xm dmesg:
>
> Joel
>
> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
> (XEN) mm.c:3331:d109 ptwr_emulate: could not get_page_from_l1e()

Hm, I have a suspicion about what this might be.  I haven't tried
reproducing it yet though.

> (XEN) Unhandled page fault in domain 109 on VCPU 0 (ec=0003)
> (XEN) Pagetable walk from c01687f0:
> (XEN)  L4[0x000] = 0003a2933027 06cc
> (XEN)  L3[0x003] = 00039afea027 0005
> (XEN)  L2[0x000] = 00039bfb7067 1048
> (XEN)  L1[0x168] = 0003a2e97061 0168
>
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 109 (vcpu#0) crashed on cpu#2:
> (XEN) [ Xen-3.1.3-rc3  x86_64  debug=n  Not tainted ]
> (XEN) CPU:2
> (XEN) RIP:e019:[<c04040bd>]

What does this EIP correspond to in your kernel?  Also:

c01687f0 c0417ab6 c040288f c040299a c0403270

(as guesses of potential callers to try and work out a stack trace).

Thanks,
J




Re: 2.6.25-rc1 xen pvops regression

2008-02-16 Thread Jeremy Fitzhardinge

Joel Becker wrote:
> On Sat, Feb 16, 2008 at 01:44:26PM +1100, Jeremy Fitzhardinge wrote:
>> Joel Becker wrote:
>>> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
>>> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
>>> (XEN) mm.c:1825:d109 Bad type (saw 2801 != exp e000) for mfn 3a2f0f (pfn f0)
>>> (XEN) mm.c:649:d109 Error getting mfn 3a2f0f (pfn f0) from L1 entry 0003a2f0f063 for dom109
>>> (XEN) mm.c:3331:d109 ptwr_emulate: could not get_page_from_l1e()
>>
>> Hm, I have a suspicion about what this might be.  I haven't tried
>> reproducing it yet though.
>>
>>> (XEN) Unhandled page fault in domain 109 on VCPU 0 (ec=0003)
>>> (XEN) Pagetable walk from c01687f0:
>>> (XEN)  L4[0x000] = 0003a2933027 06cc
>>> (XEN)  L3[0x003] = 00039afea027 0005
>>> (XEN)  L2[0x000] = 00039bfb7067 1048
>>> (XEN)  L1[0x168] = 0003a2e97061 0168
>>>
>>> (XEN) domain_crash_sync called from entry.S
>>> (XEN) Domain 109 (vcpu#0) crashed on cpu#2:
>>> (XEN) [ Xen-3.1.3-rc3  x86_64  debug=n  Not tainted ]
>>> (XEN) CPU:2
>>> (XEN) RIP:e019:[<c04040bd>]
>>
>> What does this EIP correspond to in your kernel?  Also:
>>
>> c01687f0 c0417ab6 c040288f c040299a c0403270
>>
>> (as guesses of potential callers to try and work out a stack trace).
>
> ksymoops is no help at all, but I got these from objdump of
> vmlinux:
>
> c04040bd xen_set_pte
> c0417ab6 set_pte_present
> c040288f set_bit
> c040299a __raw_spin_unlock
> c0403270 __set_64bit

(My usual technique is to use "gdb vmlinux" and "x/i 0x" to do the
lookup.)

Unfortunately that doesn't narrow down what the kernel was actually
trying to do at the time.  Clearly a set_pte; looks like someone is
trying to create a writable mapping of an existing pte page.

Does "console=hvc0 earlyprintk=xen" on the kernel command line give any
clue about how far it gets before crashing?
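
For reading the low bits in the Xen log, a small decoder is handy. This is a sketch only: the bit meanings are the standard x86 ones (PTE bit 0 present, bit 1 writable, bit 5 accessed, bit 6 dirty; page-fault error code bit 0 "page was present", bit 1 "access was a write"), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Low flag bits of an x86 pte. */
#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_ACCESSED 0x020u
#define PTE_DIRTY    0x040u

/* x86 page-fault error code bits. */
#define PF_PROT  0x1u   /* fault on a present page (protection violation) */
#define PF_WRITE 0x2u   /* faulting access was a write */

/* Extract the low 12 flag bits of an entry. */
static unsigned pte_flags(uint64_t pte)
{
    return (unsigned)(pte & 0xfffu);
}

/* True if the entry is both present and writable. */
static int pte_writable(uint64_t pte)
{
    unsigned want = PTE_PRESENT | PTE_RW;
    return (pte_flags(pte) & want) == want;
}

/* True if the fault was a write to a page that is mapped but not writable. */
static int fault_is_write_to_present(unsigned error_code)
{
    unsigned want = PF_PROT | PF_WRITE;
    return (error_code & want) == want;
}
```

Applied to the log: the rejected L1 entry ends in 063 (present, writable, accessed, dirty), the L1[0x168] entry in the walk ends in 061 (the same without the writable bit), and ec=0003 is a write to a present page. That is at least consistent with a write landing on a read-only pagetable page while trying to install a writable mapping.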


   J


Re: [PATCH] x86/mm: stop allocating pmd page if failed

2012-07-25 Thread Jeremy Fitzhardinge
On 07/24/2012 06:15 AM, Yuanhan Liu wrote:
> The old code would call __get_free_page() even though previous
> allocation fail met. This is not needed.

Yeah, I guess, but it's hardly worth changing.

J


>
> Signed-off-by: Yuanhan Liu 
> Cc: Jeremy Fitzhardinge 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> ---
>  arch/x86/mm/pgtable.c |   18 +-
>  1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 8573b83..6760348 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -181,24 +181,24 @@ static void free_pmds(pmd_t *pmds[])
>  {
>   int i;
>  
> - for(i = 0; i < PREALLOCATED_PMDS; i++)
> - if (pmds[i])
> - free_page((unsigned long)pmds[i]);
> + for(i = 0; i < PREALLOCATED_PMDS; i++) {
> + if (pmds[i] == NULL)
> + break;
> + free_page((unsigned long)pmds[i]);
> + }
>  }
>  
>  static int preallocate_pmds(pmd_t *pmds[])
>  {
>   int i;
> - bool failed = false;
>  
>   for(i = 0; i < PREALLOCATED_PMDS; i++) {
> - pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
> - if (pmd == NULL)
> - failed = true;
> - pmds[i] = pmd;
> + pmds[i] = (pmd_t *)__get_free_page(PGALLOC_GFP);
> + if (pmds[i] == NULL)
> + break;
>   }
>  
> - if (failed) {
> + if (i < PREALLOCATED_PMDS) {
>   free_pmds(pmds);
>   return -ENOMEM;
>   }
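
The pattern in the quoted patch (stop at the first failed allocation, then free only up to the first NULL slot) can be tried in userspace. This is a sketch, not the kernel code: alloc_page() and allocs_left are hypothetical stand-ins that let a failure be injected, NPMDS replaces PREALLOCATED_PMDS, and -1 replaces -ENOMEM.

```c
#include <assert.h>
#include <stdlib.h>

#define NPMDS 4

/* Hypothetical allocator with an injectable failure point. */
static int allocs_left = NPMDS;

static void *alloc_page(void)
{
    if (allocs_left <= 0)
        return NULL;
    allocs_left--;
    return malloc(4096);
}

/* Free entries up to the first NULL; later slots were never filled in. */
static void free_pmds(void *pmds[])
{
    for (int i = 0; i < NPMDS; i++) {
        if (pmds[i] == NULL)
            break;
        free(pmds[i]);
        pmds[i] = NULL;
    }
}

/* Allocate all entries, giving up at the first failure. */
static int preallocate_pmds(void *pmds[])
{
    int i;

    for (i = 0; i < NPMDS; i++) {
        pmds[i] = alloc_page();
        if (pmds[i] == NULL)
            break;
    }

    if (i < NPMDS) {
        free_pmds(pmds);
        return -1;
    }
    return 0;
}
```

One thing the sketch makes visible: free_pmds() may only stop at the first NULL because preallocate_pmds() guarantees the array is filled contiguously from index 0 and the failing slot itself is NULL. A caller that hands free_pmds() a sparsely filled array would leak the later entries.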



Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver

2012-10-10 Thread Jeremy Fitzhardinge
On 10/09/2012 06:14 PM, Andrew Morton wrote:
> On Wed, 10 Oct 2012 00:09:12 + KY Srinivasan  wrote:
>
>>>> +	if (!pg) {
>>>> +		*alloc_error = true;
>>>> +		return i * alloc_unit;
>>>> +	}
>>>> +
>>>> +	totalram_pages -= alloc_unit;
>>> Well, I'd consider totalram_pages to be an mm-private thing which drivers
>>> shouldn't muck with.  Why is this done?
>> By modifying the totalram_pages, the information presented in /proc/meminfo
>> correctly reflects what is currently assigned to the guest (MemTotal).
> eh?  /proc/meminfo:MemTotal tells you the total memory in the machine. 
> The only thing which should change it after boot is memory hotplug. 
[...]
> Why on earth do balloon drivers do this?  If the amount of memory which
> is consumed by balloons is interesting then it should be exported via a
> standalone metric, not by mucking with totalram_pages.

Balloon drivers are trying to fake a form of page-by-page memory
hotplug.  When they allocate memory from the kernel, they're actually
giving the pages back to the hypervisor to redistribute to other
guests.  They reduce totalram_pages to try and reflect that the memory
is no longer the kernel's (in Xen, at least, the pfns will no longer have
any physical page underlying them).

I agree this is pretty ugly; it would be nice to have some better
interface to indicate what's going on.  At one point I tried to use the
memory hotplug interfaces for larger-scale dynamic transfers of memory
between a domain and the host, but when I last looked at it, it was too
coarse grained and heavyweight to replace the balloon mechanism.
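
To make the accounting concrete, here is a toy model of what a balloon driver does to the counters. It is only a sketch under assumptions: struct guest_mem and both helpers are invented for illustration, and the separate balloon_pages field is the kind of standalone metric suggested in the quoted text, not something an existing driver is known to export under that name.

```c
#include <assert.h>

/* Toy model of a guest's page accounting (all names hypothetical). */
struct guest_mem {
    unsigned long totalram_pages;   /* pages the kernel believes it owns */
    unsigned long balloon_pages;    /* pages handed back to the hypervisor */
};

/* Inflate: take pages from the kernel and give them to the hypervisor. */
static unsigned long balloon_inflate(struct guest_mem *g, unsigned long nr)
{
    if (nr > g->totalram_pages)
        nr = g->totalram_pages;
    g->totalram_pages -= nr;
    g->balloon_pages += nr;
    return nr;
}

/* Deflate: get pages back from the hypervisor and return them to the kernel. */
static unsigned long balloon_deflate(struct guest_mem *g, unsigned long nr)
{
    if (nr > g->balloon_pages)
        nr = g->balloon_pages;
    g->balloon_pages -= nr;
    g->totalram_pages += nr;
    return nr;
}
```

The sum totalram_pages + balloon_pages stays constant across inflate and deflate, so with a separate balloon counter the adjustment to totalram_pages carries no extra information; without one, shrinking totalram_pages is the only place the transfer shows up.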

J


Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 01:14 PM, Andi Kleen wrote:
> FWIW I use the paravirt spinlock ops for adding lock elision
> to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are
they different?  If they're still basically ticketlocks, then it seems
to me that they're complementary - hle handles the fastpath, and pv the
slowpath.
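
For reference, the fastpath/slowpath split I mean looks roughly like this in userspace C11. It is a sketch, not the kernel implementation: SPIN_THRESHOLD is an arbitrary value, lock_spinning() is a stub that only counts calls where a pv backend would block the vcpu, and the unlock side omits the kick of a blocked waiter.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024   /* assumed value, tune per platform */

struct ticketlock {
    atomic_uint head;   /* ticket currently being served */
    atomic_uint tail;   /* next ticket to hand out */
};

/* Counts slow-path entries; a real pv hook would block here until kicked. */
static atomic_uint slowpath_calls;

static void lock_spinning(struct ticketlock *lock, unsigned ticket)
{
    (void)lock;
    (void)ticket;
    atomic_fetch_add(&slowpath_calls, 1);
}

static void ticket_init(struct ticketlock *lock)
{
    atomic_init(&lock->head, 0);
    atomic_init(&lock->tail, 0);
}

static void ticket_lock(struct ticketlock *lock)
{
    /* Fast path: take a ticket and spin for a bounded number of iterations. */
    unsigned me = atomic_fetch_add(&lock->tail, 1);

    for (;;) {
        for (unsigned n = 0; n < SPIN_THRESHOLD; n++)
            if (atomic_load(&lock->head) == me)
                return;

        /* Slow path: we have spun too long, hand over to the hook. */
        lock_spinning(lock, me);
    }
}

static void ticket_unlock(struct ticketlock *lock)
{
    /* Serve the next ticket; a pv backend would also kick a blocked waiter. */
    atomic_fetch_add(&lock->head, 1);
}
```

An uncontended lock/unlock pair never reaches lock_spinning(), which is the property that would let an elided or otherwise optimized fastpath sit in front of the pv slowpath.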

> This needs to be done at the top level (so the level you're removing)
>
> However I don't like the pv mechanism very much and would 
> be fine with using an static key hook in the main path
> like I do for all the other lock types.

Right.

J


Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
> x86/spinlock: Replace pv spinlocks with pv ticketlocks
>
> From: Jeremy Fitzhardinge 
I'm not sure what the etiquette is here; I did the work while at Citrix,
but jer...@goop.org is my canonical email address.  The Citrix address
is dead and bounces, so is useless for anything.  Probably best to
change it.

J

>
> Rather than outright replacing the entire spinlock implementation in
> order to paravirtualize it, keep the ticket lock implementation but add
> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> a contended lock).
>
> Ticket locks have a number of nice properties, but they also have some
> surprising behaviours in virtual environments.  They enforce a strict
> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> scheduler does not schedule the cpus in the correct order, the system can
> waste a huge amount of time spinning until the next cpu can take the lock.
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> To address this, we add two hooks:
>  - __ticket_spin_lock which is called after the cpu has been
>spinning on the lock for a significant number of iterations but has
>failed to take the lock (presumably because the cpu holding the lock
>has been descheduled).  The lock_spinning pvop is expected to block
>the cpu until it has been kicked by the current lock holder.
>  - __ticket_spin_unlock, which on releasing a contended lock
>(there are more cpus with tail tickets), it looks to see if the next
>cpu is blocked and wakes it if so.
>
> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> functions causes all the extra code to go away.
>
> Signed-off-by: Jeremy Fitzhardinge 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Tested-by: Attilio Rao 
> [ Raghavendra: Changed SPIN_THRESHOLD ]
> Signed-off-by: Raghavendra K T 
> ---
>  arch/x86/include/asm/paravirt.h   |   32 
>  arch/x86/include/asm/paravirt_types.h |   10 ++
>  arch/x86/include/asm/spinlock.h   |   53 +++--
>  arch/x86/include/asm/spinlock_types.h |4 --
>  arch/x86/kernel/paravirt-spinlocks.c  |   15 +
>  arch/x86/xen/spinlock.c   |8 -
>  6 files changed, 61 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index cfdc9ee..040e72d 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
> fixed_addresses */ idx,
>  
>  #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
>  
> -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
> +static __always_inline void __ticket_lock_spinning(struct arch_spinlock 
> *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
> + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
>  }
>  
> -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
> +static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
> -}
> -#define arch_spin_is_contended   arch_spin_is_contended
> -
> -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
> -}
> -
> -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
> -   unsigned long flags)
> -{
> - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
> -}
> -
> -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
> -{
> - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
> -}
> -
> -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
> + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
>  }
>  
>  #endif
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 0db1fca..d5deb6d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -327,13 +327,11 @@ struct pv_mmu_ops {
>  };
>  
>  struct arch_spinlock;
> +#include 
> +
>  struct pv_lock_ops {
> - int (*spin_is_locked)(st
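
The two hooks described in the commit message above can be illustrated with a small userspace sketch. This is not the kernel code from the patch: the names ticket_lock, ticket_unlock, lock_spinning and unlock_kick, the SPIN_THRESHOLD value, and the use of C11 atomics are all assumptions made for illustration, and the hook stubs only count calls where a real backend would block or wake a virtual cpu.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spin budget before the slow path; the patch's value differs. */
#define SPIN_THRESHOLD 1024u

struct ticket_lock {
	_Atomic uint16_t head;	/* ticket currently being served */
	_Atomic uint16_t tail;	/* next ticket to hand out */
};

/* Counters so the stub hooks are observable in this sketch. */
static unsigned long slowpath_calls;
static unsigned long kick_calls;

/* Stand-in for the lock_spinning pvop: a real backend would block here
 * until the lock holder kicks this waiter. */
static void lock_spinning(struct ticket_lock *lock, uint16_t ticket)
{
	(void)lock;
	(void)ticket;
	slowpath_calls++;
}

/* Stand-in for the unlock_kick pvop: a real backend would wake the
 * waiter holding ticket `next` if it is blocked. */
static void unlock_kick(struct ticket_lock *lock, uint16_t next)
{
	(void)lock;
	(void)next;
	kick_calls++;
}

static void ticket_lock(struct ticket_lock *lock)
{
	/* Take a ticket, then wait for head to reach it. */
	uint16_t me = atomic_fetch_add(&lock->tail, 1);

	for (;;) {
		unsigned int count = SPIN_THRESHOLD;

		do {
			if (atomic_load(&lock->head) == me)
				return;
		} while (--count);

		/* Spun too long: the holder was probably descheduled. */
		lock_spinning(lock, me);
	}
}

static void ticket_unlock(struct ticket_lock *lock)
{
	/* Serve the next ticket. */
	uint16_t next = (uint16_t)(atomic_fetch_add(&lock->head, 1) + 1);

	/* Contended if tickets beyond `next` have been handed out. */
	if (atomic_load(&lock->tail) != next)
		unlock_kick(lock, next);
}
```

In the uncontended case neither hook runs, which matches the commit message's point that the extra work is confined to the slow path.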

Re: [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
(resent without HTML)

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference.  As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
>   btrl$1, 0
>   btr $1, 0
>   btsl$1, 0
>   bts $1, 0

When I originally did those patches, I was careful to make sure that we
didn't give implied sizes to operations with only immediate and/or
memory operands because - in general - gas can't infer the operation
size from such operands. However, in the case of the bit test/set
operations, the memory access size is not really derived from the
operation size (the SDM is a bit vague), and even if it were it would be
an operational rather than semantic difference.  So there's no real
problem with gas choosing 'l' as a default size in the absence of any
explicit override or constraint.

> Check that btr matches btrl, and bts matches btsl in both cases:
>
>   $ as --32 -a in.s
>   $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
> disambiguate btr/bts automatically.

That sounds reasonable for all other operations because it makes a real
semantic difference, but overly strict for bit operations.

J


> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge 
> Cc: Andi Kleen 
> Cc: Linus Torvalds 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: Eli Friedman 
> Cc: Jim Grosbach 
> Cc: Stephen Checkoway 
> Cc: LLVMdev 
> Signed-off-by: Ramkumar Ramachandra 
> ---
>  We discussed this pretty extensively on LLVMDev, but I'm still not
>  sure that I haven't missed something.
>
>  arch/x86/include/asm/bitops.h | 16 
>  arch/x86/include/asm/percpu.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   : "iq" ((u8)CONST_MASK(nr))
>   : "memory");
>   } else {
> - asm volatile(LOCK_PREFIX "bts %1,%0"
> + asm volatile(LOCK_PREFIX "btsl %1,%0"
>   : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>   }
>  }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   */
>  static inline void __set_bit(int nr, volatile unsigned long *addr)
>  {
> - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
> + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory");
>  }
>  
>  /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
>   : CONST_MASK_ADDR(nr, addr)
>   : "iq" ((u8)~CONST_MASK(nr)));
>   } else {
> - asm volatile(LOCK_PREFIX "btr %1,%0"
> + asm volatile(LOCK_PREFIX "btrl %1,%0"
>   : BITOP_ADDR(addr)
>   : "Ir" (nr));
>   }
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
>  
>  static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  {
> - asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
>  }
>  
>  /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
>  {
>   int oldbit;
>  
> - asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
>"sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>  
>   return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
>  {
>   int ol

Re: [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
(Resent without HTML)

On 07/14/2013 10:19 AM, Linus Torvalds wrote:
> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.
The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?

>  The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.

It looks like if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

J

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
On 07/14/2013 12:30 PM, Tim Northover wrote:
>> And that is why I think you should just consider "bt $x,y" to be
>> trivially the same thing and not at all ambiguous. Because there is
>> ABSOLUTELY ZERO ambiguity when people write
>>
>>bt $63, mem
>>
>> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
>> in this case, so why would you want the user to specify one or the
>> other?
> I don't think you've actually tested that, have you? (x86-64)
>
> int main() {
>   long val = 0x;
>   char res;
>
>   asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>   printf("%d\n", res);
>
>   asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>   printf("%d\n", res);
> }

Blerk.  It doesn't undermine the original point - that gas can
unambiguously choose the right operation size for a constant bit offset
- but yes, the operation size is meaningful in the case of an immediate
bit offset. It's pretty nasty of Intel to hide that detail in Table 3-2,
far from the instructions which use it...
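
The effect Tim's test exposes can be modelled in portable C without inline assembly. This is only a sketch of the documented behaviour, in which an immediate bit offset is reduced modulo the operand size; the helper name bt_imm and the sample value 0xffffffff used in the note below are assumptions for illustration, not values taken from the thread.

```c
#include <stdint.h>

/*
 * Model of BT with an immediate bit offset and a memory operand.
 * `val` is the little-endian 64-bit quantity stored at the operand's
 * address; `opsize_bits` is 32 for btl and 64 for btq.  The immediate
 * is reduced modulo the operand size before the bit is selected.
 */
static int bt_imm(uint64_t val, unsigned int imm, unsigned int opsize_bits)
{
	unsigned int bit = imm % opsize_bits;

	return (int)((val >> bit) & 1u);
}
```

With val = 0xffffffff, bt_imm(val, 63, 32) selects bit 31 and returns 1, while bt_imm(val, 63, 64) selects bit 63 and returns 0, so "btl $63" and "btq $63" are not interchangeable.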

J

>
> Tim.
>



Re: Replace in linux-next the xen, xen-two, xen-arm with xen/tip.git tree instead.

2013-07-30 Thread Jeremy Fitzhardinge
On 07/30/2013 12:53 PM, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> I was wondering if it would be possible to remove from linux-next
> the three xen trees and instead use a combined tree, similar to the
> x86 tip (so the various maintainers share it)?
>
> The ones that would be removed are:
>
> xen       git git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#upstream/xen
> xen-two   git git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git#linux-next
> xen-arm   git git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git#linux-next
>
> And instead it would be pulled from:
>
> xen-tip   git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git#linux-next
>
> I presume you need Ack's from all of us (so Jeremy and Stefano) so CC-ing 
> them here.
>
> And Acked-by: Konrad Rzeszutek Wilk 
>

Ack from me.

J


Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread Jeremy Fitzhardinge
On 08/13/2013 01:02 PM, Raghavendra K T wrote:
> * Ingo Molnar  [2013-08-13 18:55:52]:
>
>> Would be nice to have a delta fix patch against tip:x86/spinlocks, which 
>> I'll then backmerge into that series via rebasing it.
>>
> There was a namespace collision of PER_CPU lock_waiting variable when
> we have both Xen and KVM enabled. 
>
> Perhaps this week wasn't for me. Had run 100 times randconfig in a loop
> for the fix sent earlier :(. 
>
> Ingo, below delta patch should fix it, IIRC, I hope you will be folding this
> back to patch 14/14 itself. Else please let me.
> I have already run allnoconfig, allyesconfig, randconfig with below patch.
> But will test again. This should apply on top of tip:x86/spinlocks.
>
> ---8<---
> From: Raghavendra K T 
>
> Fix Namespace collision for lock_waiting
>
> Signed-off-by: Raghavendra K T 
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d442471..b8ef630 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -673,7 +673,7 @@ struct kvm_lock_waiting {
>  static cpumask_t waiting_cpus;
>  
>  /* Track spinlock on which a cpu is waiting */
> -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);

Has static stopped meaning static?

J

>  
>  static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>  {
> @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>   if (in_nmi())
>   return;
>  
> - w = &__get_cpu_var(lock_waiting);
> + w = &__get_cpu_var(klock_waiting);
>   cpu = smp_processor_id();
>   start = spin_time_start();
>  
> @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>  
>   add_stats(RELEASED_SLOW, 1);
>   for_each_cpu(cpu, &waiting_cpus) {
> - const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> + const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
>   if (ACCESS_ONCE(w->lock) == lock &&
>   ACCESS_ONCE(w->want) == ticket) {
>   add_stats(RELEASED_SLOW_KICKED, 1);
>
>



Re: [PATCH 1/3] MAINTAINERS: Remove Jeremy from the Xen subsystem.

2013-08-13 Thread Jeremy Fitzhardinge
On 08/05/2013 11:05 AM, Konrad Rzeszutek Wilk wrote:
> Jeremy has been a key person in making Linux work with Xen.
> He has been enjoying the last year working on something
> different so reflect that in the maintainers file.

Ack.

J
>
> CC: Jeremy Fitzhardinge 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  CREDITS | 1 +
>  MAINTAINERS | 1 -
>  2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/CREDITS b/CREDITS
> index 206d0fc..646a0a9 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -1120,6 +1120,7 @@ D: author of userfs filesystem
>  D: Improved mmap and munmap handling
>  D: General mm minor tidyups
>  D: autofs v4 maintainer
> +D: Xen subsystem
>  S: 987 Alabama St
>  S: San Francisco
>  S: CA, 94110
> diff --git a/MAINTAINERS b/MAINTAINERS
> index defc053..440af74 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9237,7 +9237,6 @@ F:  drivers/media/tuners/tuner-xc2028.*
>  
>  XEN HYPERVISOR INTERFACE
>  M:   Konrad Rzeszutek Wilk 
> -M:   Jeremy Fitzhardinge 
>  L:   xen-de...@lists.xensource.com (moderated for non-subscribers)
>  L:   virtualizat...@lists.linux-foundation.org
>  S:   Supported



Re: 2.6.25-rc1 xen pvops regression

2008-02-17 Thread Jeremy Fitzhardinge

Joel Becker wrote:
  
Unfortunately that doesn't narrow down what the kernel was actually  
trying to do at the time.  Clearly a set_pte; looks like someone is  
trying to create a writable mapping of an existing pte page.


Does "console=hvc0 earlyprintk=xen" on the kernel command line give any  
clue about how far it gets before crashing?



  


I built a kernel using your .config here, but I can't reproduce the 
problem.  It makes it all the way to trying to start init (failed at 
that point because I didn't create an initrd with the xvd module to 
mount /).



Console is already hvc0, but earlyprintk gets us:

--8<-
Reserving virtual address space above 0xf57fe000
Linux version 2.6.25-rc2-bisectme ([EMAIL PROTECTED]) (gcc
version 4.1.2 20070626 (Red Hat 4.1.2-14)) #21 SMP Fri Feb 15 16:28:35
PST 2008
ACPI in unprivileged domain disabled
BIOS-provided physical RAM map:
 Xen:  - 7800 (usable)
console [xenboot0] enabled
1192MB HIGHMEM available.
727MB LOWMEM available.
Started domain ca-test58
Scan SMP from c000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f for 65536 bytes.
NX (Execute Disable) protection: active
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   186366
  HighMem186366 ->   491520
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 ->   491520
-->8-

That's it.
  


I get:

Entering add_active_range(0, 0, 16384) 0 entries of 256 used
Zone PFN ranges:
 DMA 0 -> 4096
 Normal   4096 ->16384
 HighMem 16384 ->16384
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
   0:0 ->16384
On node 0 totalpages: 16384
 DMA zone: 32 pages used for memmap
 DMA zone: 0 pages reserved
 DMA zone: 4064 pages, LIFO batch:0
 Normal zone: 96 pages used for memmap
 Normal zone: 12192 pages, LIFO batch:1
 HighMem zone: 0 pages used for memmap
 Movable zone: 0 pages used for memmap
...


What happens if you give the domain less memory?

   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-20 Thread Jeremy Fitzhardinge

Ian Campbell wrote:

On Tue, 2008-02-19 at 23:43 -0800, H. Peter Anvin wrote:
  

Ian Campbell wrote:


On Mon, 2008-02-18 at 02:40 -0800, Joel Becker wrote:
  

On Sun, Feb 17, 2008 at 06:49:21PM +, Ian Campbell wrote:


x86/xen: Do not scan for DMI unless the DMI region is reserved by e820.
  

This fixed it.  I'm now booting successfully.  Thank you!


Excellent. Jeremy, are you happy for this to go in?
  


I had no problem with it, but Peter's objection seems substantial enough.


As far as the actual change goes I was assuming that any machine that
has DMI/SMBIOS would easily be new enough to have an E820 which could be
expected to reserve this region. Looks like I was mistaken about how
long E820 had been around and/or how reliably it is used to reserve the
tables.

Anyway, will have to think of another solution.
  


Well, the way we've handled this kind of thing elsewhere is to just 
reserve that pseudophys address space in earlish Xen init code and fill 
it with not-DMI things (zero, I guess).  It's a bit of a waste of 
memory, but maybe we can recover it once DMI has given up and gone 
away.  This also makes it easy to insert faked-up DMI info if that turns 
out to be useful.



   J


Re: [PATCH 06/11] xen: move arch/x86/xen/events.c undedr drivers/xen and split out arch specific part.

2008-02-21 Thread Jeremy Fitzhardinge

[EMAIL PROTECTED] wrote:

diff --git a/arch/x86/xen/events.c b/drivers/xen/events.c
similarity index 95%
rename from arch/x86/xen/events.c
rename to drivers/xen/events.c
index dcf613e..7474739 100644
--- a/arch/x86/xen/events.c
+++ b/drivers/xen/events.c
@@ -37,7 +37,9 @@
 #include 
 #include 
 
-#include "xen-ops.h"

+#ifdef CONFIG_X86
+# include "../arch/x86/xen/xen-ops.h"
+#endif


Hm.  Perhaps it would be better to move whatever definition you need 
into a header in a common place (or move xen-ops.h entirely).


   J


Re: [PATCH 03/11] xen: add missing definitions for xen grant table which ia64/xen needs.

2008-02-21 Thread Jeremy Fitzhardinge

[EMAIL PROTECTED] wrote:

Yep.  We removed the guest handle stuff for the initial upstreaming, 
since it isn't necessary on x86 and it quietened some of the reviewer 
noise.  But I expected we'd need to reintroduce it at some stage.


   J


Re: [PATCH 0/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer

2008-02-21 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

Forgot to mention: This patch depends on

Subject: [PATCH] xen: Make xen-blkfront write its protocol ABI to xenstore
From: Markus Armbruster <>
Date: Thu, 06 Dec 2007 14:45:53 +0100

http://lkml.org/lkml/2007/12/6/132

Sorry!


Sorry, I haven't pushed this upstream yet, since there didn't seem to be 
any particular urgency.  What's the dependency?


   J


Re: [PATCH 00/11] Xen arch portability patches

2008-02-21 Thread Jeremy Fitzhardinge

[EMAIL PROTECTED] wrote:
Hi. Recently the xen-ia64 community started to make efforts to merge 
xen/ia64 Linux to upstream. The first step is to merge up domU portion.

This patchset is preliminary for xen/ia64 domU linux making the current
xen/x86 domU code more arch generic and adding missing definitions and
files.
  


I haven't looked at the whole series yet, but this seems fine in 
principle.  One thing: using attachments to post makes it hard to do 
inline comments on the patches.


   J


Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver

2008-02-21 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

This is a pair of Xen para-virtual frontend device drivers:
drivers/video/xen-fbfront.c provides a framebuffer, and
drivers/input/xen-kbdfront provides keyboard and mouse.
  


Unless they're actually inter-dependent, could you post this as two 
separate patches?  I don't know anything about these parts of the 
kernel, so it would be nice to make it very obvious which changes are fb 
vs mouse/keyboard.


(I guess input/* vs video/* should make it obvious, but it looks like 
input has a config dependency on fb, so I'll avoid making too many 
presumptions...)


(Couple of comments below)

   J


The backends run in dom0 user space.

Signed-off-by: Markus Armbruster <[EMAIL PROTECTED]>

---

 drivers/input/Kconfig            |    9 
 drivers/input/Makefile           |    2 
 drivers/input/xen-kbdfront.c     |  337 +++
 drivers/video/Kconfig            |   14 
 drivers/video/Makefile           |    1 
 drivers/video/xen-fbfront.c      |  550 +++
 include/xen/interface/io/fbif.h  |  124 
 include/xen/interface/io/kbdif.h |  114 
 8 files changed, 1151 insertions(+)

diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig
index 9dea14d..5f9d860 100644
--- a/drivers/input/Kconfig
+++ b/drivers/input/Kconfig
@@ -149,6 +149,15 @@ config INPUT_APMPOWER
  To compile this driver as a module, choose M here: the
  module will be called apm-power.
 
+config XEN_KBDDEV_FRONTEND

+   tristate "Xen virtual keyboard and mouse support"
+   depends on XEN_FBDEV_FRONTEND
+   default y
+   help
+ This driver implements the front-end of the Xen virtual
+ keyboard and mouse device driver.  It communicates with a back-end
+ in another domain.
+
 comment "Input Device Drivers"
 
 source "drivers/input/keyboard/Kconfig"

diff --git a/drivers/input/Makefile b/drivers/input/Makefile
index 2ae87b1..98c4f9a 100644
--- a/drivers/input/Makefile
+++ b/drivers/input/Makefile
@@ -23,3 +23,5 @@ obj-$(CONFIG_INPUT_TOUCHSCREEN)   += touchscreen/
 obj-$(CONFIG_INPUT_MISC)   += misc/
 
 obj-$(CONFIG_INPUT_APMPOWER)	+= apm-power.o

+
+obj-$(CONFIG_XEN_KBDDEV_FRONTEND)  += xen-kbdfront.o
diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c
new file mode 100644
index 000..84f65cf
--- /dev/null
+++ b/drivers/input/xen-kbdfront.c
@@ -0,0 +1,337 @@
+/*
+ * Xen para-virtual input device
+ *
+ * Copyright (C) 2005 Anthony Liguori <[EMAIL PROTECTED]>
+ * Copyright (C) 2006-2008 Red Hat, Inc., Markus Armbruster <[EMAIL PROTECTED]>
+ *
+ *  Based on linux/drivers/input/mouse/sermouse.c
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License. See the file COPYING in the main directory of this archive for
+ *  more details.
+ */
+
+/*
+ * TODO:
+ *
+ * Switch to grant tables together with xen-fbfront.c.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct xenkbd_info {
+   struct input_dev *kbd;
+   struct input_dev *ptr;
+   struct xenkbd_page *page;
+   int evtchn, irq;
+   struct xenbus_device *xbdev;
+   char phys[32];
+};
+
+static int xenkbd_remove(struct xenbus_device *);
+static int xenkbd_connect_backend(struct xenbus_device *, struct xenkbd_info *);
+static void xenkbd_disconnect_backend(struct xenkbd_info *);
+
+/*
+ * Note: if you need to send out events, see xenfb_do_update() for how
+ * to do that.
+ */
+
+static irqreturn_t input_handler(int rq, void *dev_id)
+{
+   struct xenkbd_info *info = dev_id;
+   struct xenkbd_page *page = info->page;
+   __u32 cons, prod;
+
+   prod = page->in_prod;
+   if (prod == page->in_cons)
+   return IRQ_HANDLED;
+   rmb();  /* ensure we see ring contents up to prod */
+   for (cons = page->in_cons; cons != prod; cons++) {
+   union xenkbd_in_event *event;
+   struct input_dev *dev;
+   event = &XENKBD_IN_RING_REF(page, cons);
+
+   dev = info->ptr;
+   switch (event->type) {
+   case XENKBD_TYPE_MOTION:
+   input_report_rel(dev, REL_X, event->motion.rel_x);
+   input_report_rel(dev, REL_Y, event->motion.rel_y);
+   break;
+   case XENKBD_TYPE_KEY:
+   dev = NULL;
+   if (test_bit(event->key.keycode, info->kbd->keybit))
+   dev = info->kbd;
+   if (test_bit(event->key.keycode, info->ptr->keybit))
+   dev = info->ptr;
+   if (dev)
+   input_report_key(dev, event->key.keycode,
+event->key.pressed);
+   else
+   printk(KERN_WARNING
+   

[PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.

Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
From: Ian Campbell <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
drivers/block/xen-blkfront.c |   18 ++
1 file changed, 18 insertions(+)

===
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@

#include 
#include 
+#include 
#include 

#include 
@@ -134,6 +135,22 @@ static void blkif_restart_queue_callback
{
struct blkfront_info *info = (struct blkfront_info *)arg;
schedule_work(&info->work);
+}
+
+int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
+{
+   /* We don't have real geometry info, but let's at least return
+  values consistent with the size of the device */
+   sector_t nsect = get_capacity(bd->bd_disk);
+   sector_t cylinders = nsect;
+
+   hg->heads = 0xff;
+   hg->sectors = 0x3f;
+   sector_div(cylinders, hg->heads * hg->sectors);
+   hg->cylinders = cylinders;
+   if ((sector_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
+   hg->cylinders = 0x;
+   return 0;
}

/*
@@ -946,6 +963,7 @@ static struct block_device_operations xl
.owner = THIS_MODULE,
.open = blkif_open,
.release = blkif_release,
+   .getgeo = blkif_getgeo,
};
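
The geometry arithmetic in blkif_getgeo can be tried outside the kernel with a small sketch. The struct and function names here are hypothetical, and the clamp value 0xffff is an assumption (the constant is cut off in the archived patch); it is chosen as the largest value a 16-bit cylinders field can hold.

```c
#include <stdint.h>

#define GEO_HEADS	0xffu	/* 255 heads, as in the patch */
#define GEO_SECTORS	0x3fu	/* 63 sectors per track, as in the patch */
#define GEO_CYL_MAX	0xffffu	/* assumption: largest 16-bit cylinder count */

/* Hypothetical stand-in for the kernel's geometry structure. */
struct fake_geometry {
	uint8_t heads;
	uint8_t sectors;
	uint16_t cylinders;
};

/* Derive a made-up but size-consistent geometry from a sector count. */
static void fake_getgeo(uint64_t nsect, struct fake_geometry *hg)
{
	uint64_t cylinders = nsect / (GEO_HEADS * GEO_SECTORS);

	hg->heads = GEO_HEADS;
	hg->sectors = GEO_SECTORS;
	hg->cylinders = (uint16_t)cylinders;	/* truncates on large disks */

	/* If truncation made the geometry too small, report the maximum. */
	if ((uint64_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
		hg->cylinders = GEO_CYL_MAX;
}
```

For an 8 GiB device (16777216 sectors of 512 bytes) this gives 1044 cylinders; for a 1 TiB device (2147483648 sectors) the 16-bit field would wrap, so the clamp applies.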





Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

Ian Campbell wrote:
> I'll see if I can track down where the page is getting used and have a
> go at getting in there first. It must be pretty early to be allocated
> already when dmi_scan_machine gets called.
>
> It's possible that the domain builder might have already allocated a PT
> at this address. I haven't checked but I think currently the domain
> builder always puts PT pages after the kernel so hopefully it's only a
> theoretical problem.


Yes, it does.  And presumably the early pagetable builder is guaranteed 
to avoid special memory like the DMI space.  But the bug definitely 
seems to be a result of the DMI code trying to make a RW mapping of a 
pagetable page, so something is amiss there.


Ooh, sleazy hack idea: make DMI always map RO, so even if it does get a 
pagetable it causes no complaint...  A bit awkward, since there doesn't 
seem to be an RO form of early_ioremap.



> Another option I was thinking of was a command line option to disable
> DMI, which (maybe) isn't terribly useful in itself but it introduces an
> associated variable to frob with. That's similar to how the TSC was
> handled in the past (well, the opposite since TSC was forced on).


Yep, that would work too.

Still curious about why a pagetable page is ending up in that range 
though.  Seems like it shouldn't be possible, since we shouldn't be 
allowed to allocate from those pages, at least until the DMI probe has 
happened...  Unless the early allocator is only excluded from e820 
reserved pages, which would cause a problem on systems which don't 
reserve the DMI space...  HPA?


   J


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:
> On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
>> Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
>> From: Ian Campbell <[EMAIL PROTECTED]>
>> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
>
> This is just wrong. The From: goes at the *top*, and if it's not there,
> my scripts won't pick it up as the author.

OK.  Have you fixed it, or shall I resend?

Thanks,
   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:
>> Still curious about why a pagetable page is ending up in that range
>> though.  Seems like it shouldn't be possible, since we shouldn't be
>> allowed to allocate from those pages, at least until the DMI probe
>> has happened...  Unless the early allocator is only excluded from
>> e820 reserved pages, which would cause a problem on systems which
>> don't reserve the DMI space...  HPA?
>
> I thought the problem was a Xen-provided pagetable from before Linux
> started?


Hm, I don't think so.  The domain-builder pagetable is put after the 
kernel, so it shouldn't be under 1M.


   J


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:
> On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
>> OK.  Have you fixed it, or shall I resend?
>
> I'll fix it, but I want people to know so that I don't have to fix things
> like this in the future (*).
>
> Linus
>
> (*) I keed, I keed. Of *course* I'll have to fix things like this in the
> future too. But hopefully not quite as often.


Putting the From: in the Signed-off-by block is a result of two thoughts:

  1. putting it at the top makes the most sense from an email
     perspective, but it often seems to get lost by various
     patch-posting programs if it gets tangled in the Subject/summary
     part of the patch.  The result is that it needs to float in an odd
     way:

 Subject: wooble the foo

 From: Foo Woobler <[EMAIL PROTECTED]>

 Wooble foos in the appropriate manner.

 Signed-off-by: Foo Woobler <[EMAIL PROTECTED]>
 Cc: Bar Mangler <[EMAIL PROTECTED]> 
 


  2. There's already a block of email addresses which describe how
 people relate to this patch, so why not put From: there (since it
 isn't really an email From header, but a patch metadata header). 
 I'd assumed that tools which pick "Thing: Email" pairs out of a
 patch would deal with From in the same place as a Signed-off-by. 
 After all, tools deal with Cc:s there.



I'll make sure From: is in the right place in future, but I just wanted 
to point out it wasn't complete randomness.


   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:

Jeremy Fitzhardinge wrote:


It seems to me that those pages are being handed out as heap pages by 
the early allocator.  In the Xen case this is OK because there's 
nothing magic about them.  But if real hardware doesn't reserve these 
pages in the E820 map, then they could end up being used as regular 
memory by mistake, which is an issue.




No, they couldn't.

On real hardware they'll be memory types 0 or 2, depending on whether 
or not they're marked reserved.


Available RAM is type 1. 


OK.  Well, perhaps Ian's patch could be amended to test to see if the 
e820 map marks the ISA ROM region as normal RAM, and skip it if so?
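
A rough sketch of what such a test might look like, written as a standalone model rather than kernel code. The struct layout, the MAP_* names and range_is_all_type() are hypothetical stand-ins for illustration; an actual patch would use the kernel's own e820 helpers. Entries are assumed to be sorted by address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one firmware memory map entry (hypothetical). */
struct map_entry {
	uint64_t addr;	/* start of the region */
	uint64_t size;	/* length in bytes */
	uint32_t type;	/* 1 = usable RAM, 2 = reserved */
};

#define MAP_RAM      1u
#define MAP_RESERVED 2u

/*
 * Return 1 if every byte of [start, end) is covered by entries of the
 * given type, 0 otherwise.  Assumes the map is sorted and has no overlaps.
 */
static int range_is_all_type(const struct map_entry *map, size_t n,
			     uint64_t start, uint64_t end, uint32_t type)
{
	size_t i;

	assert(start < end);
	for (i = 0; i < n; i++) {
		uint64_t e_start = map[i].addr;
		uint64_t e_end = map[i].addr + map[i].size;

		if (map[i].type != type)
			continue;
		if (e_end <= start || e_start >= end)
			continue;	/* does not touch the remaining range */
		if (e_start > start)
			return 0;	/* gap before this entry */
		start = e_end;		/* covered up to here */
		if (start >= end)
			return 1;
	}
	return 0;
}
```

A caller would skip the probe when the window is reported as plain RAM, and go ahead when it is reserved or absent from the map.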


   J


Re: [PATCH] netvm: check for page == NULL when propogating the skb->pfmemalloc flag

2012-08-13 Thread Jeremy Fitzhardinge
On 08/13/2012 03:47 AM, Mel Gorman wrote:
> Resending to correct Jeremy's address.
>
> On Wed, Aug 08, 2012 at 03:50:46PM -0700, David Miller wrote:
>> From: Mel Gorman 
>> Date: Tue, 7 Aug 2012 09:55:55 +0100
>>
>>> Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
>>> for the following bug triggered by a xen network driver
>>  ...
>>> The problem is that the xenfront driver is passing a NULL page to
>>> __skb_fill_page_desc() which was unexpected. This patch checks that
>>> there is a page before dereferencing.
>>>
>>> Reported-and-Tested-by: Konrad Rzeszutek Wilk 
>>> Signed-off-by: Mel Gorman 
>> That call to __skb_fill_page_desc() in xen-netfront.c looks completely bogus.
>> It's the only driver passing NULL here.
>>
>> That whole song and dance figuring out what to do with the head
>> fragment page, depending upon whether the length is greater than the
>> RX_COPY_THRESHOLD, is completely unnecessary.
>>
>> Just use something like a call to __pskb_pull_tail(skb, len) and all
>> that other crap around that area can simply be deleted.
> I looked at this for a while but I did not see how __pskb_pull_tail()
> could be used sensibly but I'm simply not familiar with writing network
> device drivers or Xen.
>
> This messing with RX_COPY_THRESHOLD seems to be related to how the frontend
> and backend communicate (maybe some fixed limitation of the xenbus). The
> existing code looks like it is trying to take the fragments received and
> pass them straight to the backend without copying. I worry that if I try
> converting this to __pskb_pull_tail() that it would either hit the
> limitation of xenbus or
> introduce copying where it is not wanted.
>
> I'm going to have to punt this to Jeremy and the other Xen folk as I'm not
> sure what the original intention was and I don't have a Xen setup anywhere
> to test any patch. Jeremy, xen folk? 

It's been a while since I've looked at that stuff, but as I remember,
the issue is that since the packet ring memory is shared with another
domain which may be untrustworthy, we want to make copies of the headers
before making any decisions based on them so that the other domain can't
change them after header processing but before they're actually sent. 
(The packet payload is considered less important, but of course the same
issue applies if you're using some kind of content-aware packet filter.)

So that's the rationale for always copying RX_COPY_THRESHOLD, even if
the packet is larger than that amount.  As far as I know, changing this
behaviour wouldn't break the ring protocol, but it does introduce a
potential security issue.
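
As a minimal sketch of the copy-then-check idea, assuming a made-up policy check and buffer size (HDR_COPY_LEN, header_allowed() and accept_from_shared() are hypothetical names, not netfront code): the header bytes are fetched once from the shared memory into a private buffer, and every decision is made on the private copy only.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_COPY_LEN 64	/* hypothetical stand-in for a copy threshold */

/* Hypothetical policy: accept only packets whose first byte is 0x45. */
static int header_allowed(const unsigned char *hdr, size_t len)
{
	return len > 0 && hdr[0] == 0x45;
}

/*
 * Copy the leading bytes out of memory the peer can still write to,
 * then decide using only the private copy.  Returns the number of
 * bytes copied if the packet is accepted, 0 if it is rejected.
 */
static size_t accept_from_shared(const volatile unsigned char *shared,
				 size_t pkt_len,
				 unsigned char *priv, size_t priv_len)
{
	size_t n = pkt_len < priv_len ? pkt_len : priv_len;
	size_t i;

	assert(priv != NULL);
	for (i = 0; i < n; i++)
		priv[i] = shared[i];	/* each shared byte is read once */

	return header_allowed(priv, n) ? n : 0;
}
```

If the peer rewrites the shared buffer after the check, the private copy that was validated is unaffected, which is the property the copy is there to provide.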

J



Re: [PATCH 52/74] x86, lto, paravirt: Don't rely on local assembler labels

2012-08-19 Thread Jeremy Fitzhardinge
On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen 
>
> The paravirt patching code assumes that it can reference a
> local assembler label between two different top level assembler
> statements. This does not work with some experimental gcc builds,
> where the assembler code may end up in different assembler files.

Egad, what are those zany gcc chaps up to now?

J

>
> Replace it with extern / global /asm linkage labels.
>
> This also removes one redundant copy of the macro.
>
> Cc: jer...@goop.org
> Signed-off-by: Andi Kleen 
> ---
>  arch/x86/include/asm/paravirt_types.h |9 +
>  arch/x86/kernel/paravirt.c|5 -
>  2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 4f262bc..6a464ba 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
>   _paravirt_alt(insn_string, "%c[paravirt_typenum]", 
> "%c[paravirt_clobber]")
>  
>  /* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code)  \
> - extern const char start_##ops##_##name[] __visible, \
> -   end_##ops##_##name[] __visible;   \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> +#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b 
> ":\n\t"
> +
> +#define DEF_NATIVE(ops, name, code)  \
> + __visible extern const char start_##ops##_##name[], 
> end_##ops##_##name[];   \
> + asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, 
> name))
>  
>  unsigned paravirt_patch_nop(void);
>  unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..947255e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -62,11 +62,6 @@ void __init default_banner(void)
>  pv_info.name);
>  }
>  
> -/* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code)  \
> - extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> -
>  /* Undefined instruction for dealing with missing ops pointers. */
>  static const unsigned char ud2a[] = { 0x0f, 0x0b };
>  



Re: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global

2012-08-19 Thread Jeremy Fitzhardinge
On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen 
>
> The paravirt thunks use a hack of using a static reference to a static
> function to reference that function from the top level statement.
>
> This assumes that gcc always generates static function names in a specific
> format, which is not necessarily true.
>
> Simply make these functions global and asmlinkage. This way the
> static __used variables are not needed and everything works.

I'm not a huge fan of unstaticing all this stuff, but it doesn't
surprise me that the current code is brittle in the face of gcc changes.

J

>
> Changed in paravirt and in all users (Xen and vsmp)
>
> Cc: jer...@goop.org
> Signed-off-by: Andi Kleen 
> ---
>  arch/x86/include/asm/paravirt.h |2 +-
>  arch/x86/kernel/vsmp_64.c   |8 
>  arch/x86/xen/irq.c  |8 
>  arch/x86/xen/mmu.c  |   16 
>  4 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a0facf3..cc733a6 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct 
> arch_spinlock *lock)
>   */
>  #define PV_CALLEE_SAVE_REGS_THUNK(func)  
> \
>   extern typeof(func) __raw_callee_save_##func;   \
> - static void *__##func##__ __used = func;\
>   \
>   asm(".pushsection .text;"   \
> + ".globl __raw_callee_save_" #func " ; " \
>   "__raw_callee_save_" #func ": " \
>   PV_SAVE_ALL_CALLER_REGS \
>   "call " #func ";"   \
> diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
> index 992f890..f393d6d 100644
> --- a/arch/x86/kernel/vsmp_64.c
> +++ b/arch/x86/kernel/vsmp_64.c
> @@ -33,7 +33,7 @@
>   * and vice versa.
>   */
>  
> -static unsigned long vsmp_save_fl(void)
> +asmlinkage unsigned long vsmp_save_fl(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> @@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);
>  
> -static void vsmp_restore_fl(unsigned long flags)
> +asmlinkage void vsmp_restore_fl(unsigned long flags)
>  {
>   if (flags & X86_EFLAGS_IF)
>   flags &= ~X86_EFLAGS_AC;
> @@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);
>  
> -static void vsmp_irq_disable(void)
> +asmlinkage void vsmp_irq_disable(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> @@ -61,7 +61,7 @@ static void vsmp_irq_disable(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);
>  
> -static void vsmp_irq_enable(void)
> +asmlinkage void vsmp_irq_enable(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 1573376..3dd8831 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void)
>   (void)HYPERVISOR_xen_version(0, NULL);
>  }
>  
> -static unsigned long xen_save_fl(void)
> +asmlinkage unsigned long xen_save_fl(void)
>  {
>   struct vcpu_info *vcpu;
>   unsigned long flags;
> @@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
>  
> -static void xen_restore_fl(unsigned long flags)
> +asmlinkage void xen_restore_fl(unsigned long flags)
>  {
>   struct vcpu_info *vcpu;
>  
> @@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
>  
> -static void xen_irq_disable(void)
> +asmlinkage void xen_irq_disable(void)
>  {
>   /* There's a one instruction preempt window here.  We need to
>  make sure we're don't switch CPUs between getting the vcpu
> @@ -77,7 +77,7 @@ static void xen_irq_disable(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
>  
> -static void xen_irq_enable(void)
> +asmlinkage void xen_irq_enable(void)
>  {
>   struct vcpu_info *vcpu;
>  
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..9f82443 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val)
>   return val;
>  }
>  
> -static pteval_t xen_pte_val(pte_t pte)
> +asmlinkage pteval_t xen_pte_val(pte_t pte)
>  {
>   pteval_t pteval = pte.pte;
>  #if 0
> @@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
>  
> -static pgdval_t xen_pgd_val(pgd_t pgd)
> +asmlinkage pgdval_t xen_pgd_val(pgd_t pgd)
>  {
>   return pte_mfn_to_pfn(pgd.pgd);
>  }
> @@ 

Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

As paravirtualized xen guests won't work with !X86_PAE, change the Kconfig
accordingly.
  


!PAE is supposed to work, but it is a rarely used configuration.  How 
does it fail?


   J


Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

This is with 2.6.24.2, but latest-git looks the same:
I also tried with 2.6.23 which crashes instantly, without any output of the 
guest.
  


I'm not too surprised.  Non-PAE Xen is a bit of a rarity, and it only 
gets tested rarely.  Chris Wright did spend some time on it a while ago, 
but I don't know that it's had any real attention since.  I've been 
making sure non-PAE compiles, but I've been lax about testing it.


This is the first usermode exec, I guess?  The backtrace is a bit odd; 
I've never seen a problem in move_page_tables before.


Does "xm dmesg" tell you what Xen is complaining about?  You may need to 
compile with debug=y in Config.mk.



[0.599806] 1 multicall(s) failed: cpu 0
[0.599816]   call  1/2: op=26 arg=[c1051860] result=0
[0.599825]   call  2/2: op=14 arg=[bf9c7000] result=-22
[0.599841] [ cut here ]
[0.599851] kernel BUG at arch/x86/xen/multicalls.c:103!
[0.599861] invalid opcode:  [#1] SMP
[0.599871] Modules linked in:
[0.599879]
[0.599885] Pid: 1, comm: init Not tainted (2.6.24.2 #6)
[0.599895] EIP: 0061:[] EFLAGS: 00010202 CPU: 0
[0.599910] EIP is at xen_mc_flush+0x19c/0x1b0
[0.599919] EAX:  EBX: c10510a0 ECX: c1051060 EDX: c1051060
[0.599930] ESI: 0002 EDI: 0001 EBP: c2417c10 ESP: c2417be4
[0.599940]  DS: 007b ES: 007b FS: 00d8 GS:  SS: e021
[0.599951] Process init (pid: 1, ti=c2417000 task=c2416ab0 task.ti=c2417000)
[0.599960] Stack: c0443c98 0002 0002 000e bf9c7000 ffea 
c1051060 0200
[0.599984]0067 c193fffc bf9c7000 c2417c18 c0101112 c2417c5c 
c0166dfc c193ce40
[0.66]c193e5c0 c000 c193e5c0 1000 c000 c193ce40 
c198e71c c10331cc
[0.600029] Call Trace:
[0.600036]  [] show_trace_log_lvl+0x1a/0x30
[0.600050]  [] show_stack_log_lvl+0xa9/0xd0
[0.600062]  [] show_registers+0xca/0x1e0
[0.600074]  [] die+0x11a/0x250
[0.600085]  [] do_trap+0x83/0xb0
[0.600096]  [] do_invalid_op+0x88/0xa0
[0.600108]  [] error_code+0x72/0x80
[0.600121]  [] xen_leave_lazy+0x12/0x20
[0.600134]  [] move_page_tables+0x27c/0x300
[0.600149]  [] setup_arg_pages+0x162/0x2a0
[0.600162]  [] load_elf_binary+0x3d3/0x1bd0
[0.600175]  [] search_binary_handler+0x92/0x200
[0.600190]  [] load_script+0x1bf/0x200
[0.600202]  [] search_binary_handler+0x92/0x200
[0.600215]  [] do_execve+0x15b/0x180
[0.600227]  [] sys_execve+0x2e/0x80
[0.600241]  [] syscall_call+0x7/0xb
[0.600253]  ===
[0.600259] Code: 24 08 89 44 24 0c 89 74 24 04 c7 04 24 98 3c 44 c0 e8 c9 36 02 
00 8b 45 ec 83 c3 20 8b 90 00 0b 00 00 39 d6 72 c0 e9 04 ff ff ff <0f> 0b eb fe 
0f 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 55
[0.600370] EIP: [] xen_mc_flush+0x19c/0x1b0 SS:ESP e021:c2417be4
[0.600393] ---[ end trace a686db401f06e173 ]---
[0.600403] Kernel panic - not syncing: Attempted to kill init!

full dmesg, config here:
http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00716.html

Best regards,
Arnd Hannemann

  


   J


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-22 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:
This isn't a problem with things like "Signed-off-by:" etc tags, because 
they have no automated meaning and don't really change the commit itself, 
but the "From:"/"Date:"/"Subject:" markers at the head of the message 
really do have real meaning, and get removed from the commit message and 
instead get put into the SCM headers.
  


It may be worth having a definitive and unambiguous Author: tag then, 
which can appear among Signed-off-by:s and is used in preference to 
anything else.  From: is a useful heuristic which seems to work well in 
general, but as you say, it gets a bit hairy when you have something 
which means different things to different parts of the software stack at 
the same time.


   J


Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

Jeremy Fitzhardinge wrote:
  

Arnd Hannemann wrote:


This is with 2.6.24.2, but latest-git looks the same:
I also tried with 2.6.23 which crashes instantly, without any output
of the guest.
  
  

I'm not too surprised.  Non-PAE Xen is a bit of a rarity, and it only
gets tested rarely.  Chris Wright did spend some time on it a while ago,
but I don't know that it's had any real attention since.  I've been
making sure non-PAE compiles, but I've been lax about testing it.
This is the first usermode exec, I guess?  The backtrace is a bit odd;
I've never seen a problem in move_page_tables before.



Yes, it's trying to execute the first script in initramfs, I also tried with 
initramdisk
and got a similar error. (move_page_tables also involved)

  

Does "xm dmesg" tell you what Xen is complaining about?  You may need to
compile with debug=y in Config.mk.



(XEN) mm.c:645:d44 Non-privileged (44) attempt to map I/O space 

I will recompile with debug=y and post the output.
If I reduce the dom0 memory with dom0_mem=20 I see something like
0080 with dom0_mem=80 I always see .
  


That's helpful.  Looks like the mfn is getting mushed to 0.

   J


Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver

2008-02-22 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

  

Markus Armbruster wrote:


This is a pair of Xen para-virtual frontend device drivers:
drivers/video/xen-fbfront.c provides a framebuffer, and
drivers/input/xen-kbdfront provides keyboard and mouse.
  
  

Unless they're actually inter-dependent, could you post this as two
separate patches?  I don't know anything about these parts of the
kernel, so it would be nice to make it very obvious which changes are
fb vs mouse/keyboard.



I could do that, but the intermediate step (one driver, not
the other) is somewhat problematic: the backend in dom0 needs both
drivers, and will refuse to complete device initialization unless
they're both present.
  


That's OK.  In that case keep them together.

   J


compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

 CC  arch/x86/kernel/traps_32.o
/home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/traps_32.c:59:27: error: 
asm/kmemcheck.h: No such file or directory


asm-x86/kmemcheck.h does seem to be completely missing.  Looks like 
8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for kmemcheck" 
should have added the file.


   J


Re: compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

* Vegard Nossum <[EMAIL PROTECTED]> wrote:

  
 asm-x86/kmemcheck.h does seem to be completely missing.  Looks like 
 8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for 
 kmemcheck" should have added the file.
  
Hm. This is x86#testing, no? I don't think there's any kmemcheck code 
whatsoever in other branches.


The file should be added with this commit:

kmemcheck: add the kmemcheck core 
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commit;h=c83d05d69382945c92a2e7a2b168c1cc2aa77c29



yes, x86.git looks fine here too:

 ~/linux.trees.git> git-checkout -b tmp x86/testing
 Branch tmp set up to track remote branch refs/remotes/x86/testing.
 Switched to a new branch "tmp"
 ~/linux.trees.git> cd include/asm-x86/
 ~/linux.trees.git/include/asm-x86> ls -l kmemcheck.h
 -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 kmemcheck.h
 ~/linux.trees.git/include/asm-x86> cd ..
 ~/linux.trees.git/include> cd ..
 ~/linux.trees.git> ls -ldt include/asm-x86/kmemcheck.h
 -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 include/asm-x86/kmemcheck.h
 ~/linux.trees.git> git-log | head -1
 commit c9d2f5489cec70f814bf64033290e5f05b4d7f33


I'm using #mm.  Should I be using #testing?

   J


Re: compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

Jeremy, you might want to start tracking x86.git#testing:

  http://people.redhat.com/mingo/x86.git/README

if you want to follow the latest & greatest x86.git code.
  


Right, will do.

   J


Re: 2.6.25-rc1 xen pvops regression

2008-02-26 Thread Jeremy Fitzhardinge

Mark McLoughlin wrote:

@@ -371,6 +372,9 @@ void __init dmi_scan_machine(void)
}
}
else {
+   if (e820_all_mapped(0xF, 0xF+0x1, E820_RAM))
+   goto out;



One issue with using the e820 map for this is that a Xen Dom0 will also
have this region marked as RAM in the e820 map, but will set up a fixmap
for it, allowing dmi_scan_machine() to map the region.
  


Would it be easier to just fake up a mapping so that window points to 
the real dmi area, and mark E820 accordingly?


   J


Re: [RFC] FW: proposal for systems that do not require security

2001-04-20 Thread Jeremy Fitzhardinge

On Tue, Apr 10, 2001 at 02:35:52PM +0200, Heusden, Folkert van wrote:
> So, I was wondering: isn't it a nice idea to have a switch in the
> configuration menu to disable entropy-gathering in the interrupt-routines,
> have some simplistic routine (like x'=(x * m + a) % p) which returns a
> non-cryptographic value, and something similarly simplistic for the
> network-traffic routines?

No, that's a very bad idea.  If you think it's a problem, just remove
the random driver altogether.  It's much better for something to get
ENXIO rather than thinking it's getting real randomness.

You can still get TCP sequence numbers by sampling the cycle counter or
something.
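
To make the concern concrete, here is a small sketch of the quoted x' = (x * m + a) % p recurrence. The constants are arbitrary illustration values, not anything proposed in the thread, and lcg_next() and predict_from_observed() are hypothetical names. The point is only that the generator is fully deterministic: anyone who sees one output and knows the constants can compute every later one.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary illustration constants (assumptions, not from the thread). */
#define LCG_M 1103515245u
#define LCG_A 12345u
#define LCG_P 2147483648u	/* 2^31 */

/* One step of x' = (x * m + a) % p, done in 64 bits to avoid overflow. */
static uint32_t lcg_next(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * LCG_M + LCG_A) % LCG_P);
}

/*
 * What an observer can do: given a single output, produce the value
 * the generator will hand out 'steps' calls later.
 */
static uint32_t predict_from_observed(uint32_t observed, unsigned int steps)
{
	assert(observed < LCG_P);
	while (steps--)
		observed = lcg_next(observed);
	return observed;
}
```

A consumer that believes it is reading real randomness has no way to notice this, whereas a missing device fails loudly.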

J



Fix for SMP deadlock in autofs4

2001-04-20 Thread Jeremy Fitzhardinge

This is a fix for a potential deadlock in autofs4's expire routine.
It tries to use dput() while holding the dcache_lock.  This isn't a
problem in principle since dput() should only try to take the dcache_lock
when the counter makes a transition to zero, which can't happen in
this case.  Unfortunately the generic (and only) implementation of
atomic_dec_and_lock always takes the lock, so deadlocks.

Obviously, this only affects SMP.  UP's wise avoidance of spinlocks
saves it once again.

The simple solution is simply to replace dput() with atomic_dec().
The count can't reach zero because we did a dget_locked() and held
dcache_lock the whole time, so we never need to worry about the rest of
the dput() logic.
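
A user-space model of the problem, as a sketch only: it uses C11 atomics and a toy lock that reports a would-be self-deadlock instead of spinning, and all of the names (toy_lock, put_always_lock(), put_plain_dec()) are hypothetical rather than kernel interfaces. The first put models a decrement that always takes the lock before touching the count; the second models the plain decrement used in the fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy non-recursive lock that remembers whether it is held. */
struct toy_lock {
	bool held;
};

/* Returns false where a real spinlock would spin forever on itself. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

enum put_result { PUT_OK, PUT_LAST_REF, PUT_WOULD_DEADLOCK };

/* Always takes the lock first, whether or not the count hits zero. */
static enum put_result put_always_lock(atomic_int *count, struct toy_lock *l)
{
	if (!toy_lock_acquire(l))
		return PUT_WOULD_DEADLOCK;
	if (atomic_fetch_sub(count, 1) == 1)
		return PUT_LAST_REF;	/* returns with the lock held */
	toy_lock_release(l);
	return PUT_OK;
}

/* Plain decrement; only valid while the count cannot reach zero. */
static enum put_result put_plain_dec(atomic_int *count)
{
	int old = atomic_fetch_sub(count, 1);

	assert(old > 1);
	(void)old;
	return PUT_OK;
}
```

With the lock already held by the caller, put_always_lock() reports the deadlock even though the count is nowhere near zero, while put_plain_dec() never touches the lock at all.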

--- ../2.4/fs/autofs4/expire.c  Wed Jan 31 00:20:50 2001
+++ fs/autofs4/expire.c Fri Apr 20 01:29:53 2001
@@ -223,7 +223,8 @@
mntput(p);
return dentry;
}
-   dput(d);
+
+   atomic_dec(&d->d_count); /* dput(), but we'll never hit zero */
mntput(p);
}
spin_unlock(&dcache_lock);

J



Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Jeremy Fitzhardinge

On Fri, Apr 20, 2001 at 05:00:04AM -0400, Alexander Viro wrote:
> Frankly, I'd rather add dput_locked() in dcache.c. The bug is real and
> since autofs4 is not the only place like that... I'll look into that
> stuff.

Sounds fine.

J



Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Jeremy Fitzhardinge

On Fri, Apr 20, 2001 at 10:59:43PM -0700, Linus Torvalds wrote:
> It's untested, but looks fairly obvious. It removes the increment, and
> changes autofs4_expire() to properly bump the count of the returned dentry
> (and callers will dput() it when done). This may be unnecessarily careful,
> but it's the RightThing(tm) to do.

I suppose so.  It is pretty paranoid, because of autofs4's extra reference it
can't (shouldn't) ever drop to zero until the filesystem allows it to drop to
zero.  In other words, if it helps, it's hiding another bug.  But you're right,
if this were a general routine, it should definitely return with an elevated
count.

> Jeremy, would you mind verifying that this WorksForYou(tm)?

Looks fine to me.  I'll give it a spin.

J



Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Jeremy Fitzhardinge

On Sat, Apr 21, 2001 at 02:21:38AM -0400, Alexander Viro wrote:
> Looks sane for me. However, I would add check for dentry being hashed and
> would skip the unhashed ones. Otherwise you can get a directory that
> had been removed but is still busy - doesn't look like a right thing to
> do. Jeremy?

It wouldn't hurt.  It can't happen in practice since unlink/rmdir happen
in very controlled ways (only the automount daemon is allowed to perform
those ops, so it will keep them in sync).

J



Re: Fix for SMP deadlock in autofs4

2001-04-20 Thread Jeremy Fitzhardinge

On Fri, Apr 20, 2001 at 03:53:45PM -0400, Alexander Viro wrote:
> > Why are we doing the mntget/dget at all? We hold the spinlock, so we know
> > they are not going away. Not doing the mntget/dget means that we (a) run
> > faster and (b) don't have the bug, because we don't need to put the damn
> > things.
> > 
> > Comments?
> 
> It looks like you are right, but I wonder how the hell did that code
> happen at all. Looks like somewhere around 2.4.0-test10-pre* dcache_lock
> was moved out of is_tree_busy() and covered dget/dput. Hmm... Might be
> my fault - I don't remember doing that, but...

I did it.  I couldn't see a point in continuously taking and releasing
the dcache lock, since it just increased complexity and expire is not
a performance-critical path (ie, it happens rarely).

I kept the dget/put out of caution and ignorance, but they're clearly
problematic.  I'm happy to drop them if holding dcache_lock is enough
to keep the tree stable while I traverse it.

> Removing that will require an obvious change in is_tree_busy() (shift
> count by 1). However, the real question is WTF are we trying to 
> get in autofs4_expire() - it returns dentry without grabbing a
> reference to it. The only thing that saves us is that we have a
> ramfs-style situation (dentries are pinned until we rmdir) and
> everything up to the point where we silently forget about dentry
> is covered by BKL. Since ->rmdir() is under BKL too it's enough,
> but... Eww... 

The dentry it returns is always an autofs4 dentry, and autofs4 always
keeps a refcount on its dentries like ramfs (because like ramfs,
autofs4 exists only in the dcache).

> Jeremy, what are you really trying to do there? is_tree_busy()
> seems to be written in assumption that mnt/dentry is not a
> mountpoint but root of a subtree with something mounted on its
> leaves. And autofs4_expire() traverses the list of root's
> subdirectories, picks one that has nothing busy mounted in
> _its_ subdirectories and essentially pass the name to caller.
> Which sends that name (of first-level subdirectory) to
> userland.

Exactly right.

> Is that what you really want there? It looks very odd - why don't we pass
> the names of actual mountpoints? What's wrong with the case when foo/bar
> is busy, but foo/baz is not?

Say for example you have an autofs4 filesystem mounted on /net.  When you
do a "cd /net/host", all of host's exported NFS filesystems are mounted
on the directory /net/host; obviously the mountpoint /net/host is an
autofs4 directory.

autofs4_expire traverses the directories in its root and finds the ones
which are currently unused and have been idle for some time.  Since all
the filesystems mounted under /net/host are part of the same logical
tree, it examines them as a single unit so they can be umounted as a
single unit.

Note that /net/host may not itself be a mountpoint.  If host doesn't
export / but only, say, /home and /usr/local, then there'll be a tree of
skeleton directories in the autofs4 filesystem to create the paths
up to the mountpoints (so there'll be /net/host/{home,usr/local}).
But because everything under /net/host is treated as a single unit, it's
only correct for autofs4_expire to return /net/host, not /net/host/home
or /net/host/usr/local.

The simplifying assumption I make is that there's a single root directory
with a number of sub-directories; each subdirectory is treated as
a single unit.  The general case would be to mark some directories
as being the root of an atomic set, and other directories simply being
structural, but the need has never come up (and it can be worked around by
having nested autofs filesystems).
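
The single-unit rule can be sketched with a much simplified tree, assuming a hypothetical node type in place of the real dentry and vfsmount structures and ignoring the idle-time check entirely: a first-level directory is a candidate only if nothing anywhere beneath it is in use, and only first-level directories are ever returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory or mount in the tree. */
struct node {
	int users;		/* references beyond the one pinning the node */
	struct node *child;	/* first child */
	struct node *sibling;	/* next sibling */
};

/* A unit is busy if the node itself or anything below it has users. */
static int subtree_busy(const struct node *n)
{
	const struct node *c;

	assert(n != NULL);
	if (n->users > 0)
		return 1;
	for (c = n->child; c != NULL; c = c->sibling)
		if (subtree_busy(c))
			return 1;
	return 0;
}

/* Return the first top-level child whose whole subtree is idle, or NULL. */
static const struct node *find_expirable(const struct node *root)
{
	const struct node *c;

	assert(root != NULL);
	for (c = root->child; c != NULL; c = c->sibling)
		if (!subtree_busy(c))
			return c;
	return NULL;
}
```

So with /net/host1/home in use and /net/host1/usr idle, host1 as a whole is skipped; the answer is never host1/usr on its own.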

J



Re: IDE disk slow? There's help...

2000-10-20 Thread Jeremy Fitzhardinge

On Fri, Oct 20, 2000 at 03:16:14PM -0400, safemode wrote:
> That's what i was thinking, but 30MB/s seems to be quite an exaggeration.

I reliably get 30MB/s with my IBM 30G 7200rpm ATA66 drive, using a
Via VT82C586 controller.  2.4.0-test9.  Modern drives are really fast.

J



Update to autofs4 for new(-ish) VFS stuff

2000-10-22 Thread Jeremy Fitzhardinge

Ever since the addition of struct vfsmount, autofs4 has got the "is this
filesystem busy" test wrong.  This patch against 2.4.0-test9 makes it
smarter.

J


--- linux.orig/CREDITS  Tue Oct  3 17:30:15 2000
+++ linux/CREDITS   Tue Oct  3 17:56:04 2000
@@ -795,13 +795,16 @@
 S: Germany
 
 N: Jeremy Fitzhardinge
-E: [EMAIL PROTECTED]
+E: [EMAIL PROTECTED]
+W: http://www.goop.org/~jeremy
+D: author of userfs filesystem
 D: Improved mmap and munmap handling
 D: General mm minor tidyups
-S: 67 Surrey St.
-S: Darlinghurst, Sydney
-S: New South Wales 2010
-S: Australia
+D: autofs v4 filesystem rework
+S: 987 Alabama St
+S: San Francisco
+S: SA, 94110
+S: USA
 
 N: Ralf Flaxa
 E: [EMAIL PROTECTED]
diff -x *.o -x *~ -x *.flags -x .depend -x .hdepend -u 2.3/fs/autofs4/expire.c local-2.3/fs/autofs4/expire.c
--- linux.orig/fs/autofs4/expire.c  Wed Sep  6 18:02:29 2000
+++ linux/fs/autofs4/expire.c   Sat Oct 21 19:07:24 2000
@@ -3,7 +3,7 @@
  * linux/fs/autofs/expire.c
  *
  *  Copyright 1997-1998 Transmeta Corporation -- All Rights Reserved
- *  Copyright 1999 Jeremy Fitzhardinge <[EMAIL PROTECTED]>
+ *  Copyright 1999-2000 Jeremy Fitzhardinge <[EMAIL PROTECTED]>
  *
  * This file is part of the Linux kernel and is made available under
  * the terms of the GNU General Public License, version 2, or at your
@@ -15,46 +15,139 @@
 
 /*
  * Determine if a subtree of the namespace is busy.
+ *
+ * mnt is the mount tree under the autofs mountpoint
  */
-static int is_tree_busy(struct vfsmount *mnt)
+static inline int is_vfsmnt_tree_busy(struct vfsmount *mnt)
 {
struct vfsmount *this_parent = mnt;
struct list_head *next;
int count;
 
-   spin_lock(&dcache_lock);
-   count = atomic_read(&mnt->mnt_count) - 2;
-   if (!is_autofs4_dentry(mnt->mnt_mountpoint))
-   count--;
+   count = atomic_read(&mnt->mnt_count) - 1;
+
 repeat:
next = this_parent->mnt_mounts.next;
+   DPRINTK(("is_vfsmnt_tree_busy: mnt=%p, this_parent=%p, next=%p\n",
+mnt, this_parent, next));
 resume:
-   while (next != &this_parent->mnt_mounts) {
-   struct list_head *tmp = next;
-   struct vfsmount *p = list_entry(tmp, struct vfsmount,
+   for( ; next != &this_parent->mnt_mounts; next = next->next) {
+   struct vfsmount *p = list_entry(next, struct vfsmount,
mnt_child);
-   next = tmp->next;
-   /* Decrement count for unused children */
-   count += atomic_read(&p->mnt_count) - 2;
+
+   /* -1 for struct vfs_mount's normal count, 
+  -1 to compensate for child's reference to parent */
+   count += atomic_read(&p->mnt_count) - 1 - 1;
+
+   DPRINTK(("is_vfsmnt_tree_busy: p=%p, count now %d\n",
+p, count));
+
if (!list_empty(&p->mnt_mounts)) {
this_parent = p;
goto repeat;
}
/* root is busy if any leaf is busy */
-   if (atomic_read(&p->mnt_count) > 1) {
-   spin_unlock(&dcache_lock);
+   if (atomic_read(&p->mnt_count) > 1)
return 1;
-   }
}
-   /*
-* All done at this level ... ascend and resume the search.
-*/
+
+   /* All done at this level ... ascend and resume the search. */
if (this_parent != mnt) {
next = this_parent->mnt_child.next; 
this_parent = this_parent->mnt_parent;
goto resume;
}
-   spin_unlock(&dcache_lock);
+
+   DPRINTK(("is_vfsmnt_tree_busy: count=%d\n", count));
+   return count != 0; /* remaining users? */
+}
+
+/* Traverse a dentry's list of vfsmounts and return the number of
+   non-busy mounts */
+static int check_vfsmnt(struct vfsmount *mnt, struct dentry *dentry)
+{
+   int ret = 0;
+   struct list_head *tmp;
+
+   list_for_each(tmp, &dentry->d_vfsmnt) {
+   struct vfsmount *vfs = list_entry(tmp, struct vfsmount, 
+ mnt_clash);
+   DPRINTK(("check_vfsmnt: mnt=%p, dentry=%p, tmp=%p, vfs=%p\n",
+mnt, dentry, tmp, vfs));
+   if (vfs->mnt_parent != mnt || /* don't care about busy-ness of other 
+namespaces */
+   !is_vfsmnt_tree_busy(vfs))
+   ret++;
+   }
+
+   DPRINTK(("check_vfsmnt: ret=%d\n", ret));
+   return ret;
+}
+
+/* Check dentry tree for busyness.  If a dentry appears to be busy
+   because it is a mountpoint, check to see if the mounted
+   filesystem is busy. */
+static int is_tree_busy(stru

Re: IDE disk slow? There's help...

2000-10-22 Thread Jeremy Fitzhardinge

On Fri, Oct 20, 2000 at 01:22:59PM -0700, Andre Hedrick wrote:
> On Fri, 20 Oct 2000 [EMAIL PROTECTED] wrote:
> 
> > [EMAIL PROTECTED] wrote..
> > 
> > > I reliably get 30MB/s with my IBM 30G 7200rpm ATA66 drive, using a
> > > Via VT82C586 controller.  2.4.0-test9.  Modern drives are really fast.
> > 
> > Hmm, I'm confused here.
> > VIA 586 can only do up to UDMA 2, which should return speeds less than
> > that. My system has an identical configuration, and I get ~12MB/s
> 
> No the are the pci device ide but different guts.  This is the ugliness
> that most never see.

I think you left some words out.  Are you saying that this is one of those
chips which change PCI id in order to give the appearance of backwards
compatibility?  That it's not really a VT82C586?

Thanks,
J



Re: Request for info on proc system update frequency

2000-10-22 Thread Jeremy Fitzhardinge

On Wed, Oct 18, 2000 at 04:48:48PM +0100, Stephen Tweedie wrote:
> On Tue, Oct 17, 2000 at 12:31:24AM -0400, John Kacur wrote:
> > I'm trying to understand how the proc file system works. In particular
> > I'd like to know more about the algorithm by which the information is
> > updated and how frequently.
> 
> It is "live": the file contents are generated on demand when you read
> them.  A very few proc files include time-averaged data (such as the
> load average); everything else is absolutely uptodate.

...at the instant you read it.  It may be out of date a nanosecond later.
[Yes, a nit-pick, but worth making clear to the original poster.]

J



Re: The zen of kernel virtual addresses

2000-10-23 Thread Jeremy Fitzhardinge

On Sat, Oct 21, 2000 at 01:37:26PM -0600, Jonathan Corbet wrote:
> physical address
>   An address as known by the low-level hardware.  In the modern
>   world, these can be 64-bit quantities, even on 32-bit systems.
>   These are the addresses used by /dev/mem - which appears to work
>   only for low memory.

A physical address is the address the CPU uses to talk to memory.  It is not
necessarily the same kind of address a device uses to see memory: they use
bus addresses.  The simple case is bus address == physical address, but 
there are many variations.  Systems with an IOMMU (or equiv) present devices
with a completely different view of system memory.
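
To make the distinction concrete, here is a minimal user-space sketch of
three ways a bus address can relate to a physical address: identity, a
fixed offset, and a per-page IOMMU-style table.  All names here
(phys_addr_t, bus_addr_t, toy_iommu, the 4K page size) are assumptions
made up for illustration; this is not any kernel interface.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* hypothetical: address as the CPU sees memory */
typedef uint64_t bus_addr_t;    /* hypothetical: address as a device sees memory */

#define TOY_PAGE_SHIFT    12    /* assumed 4K pages */
#define TOY_IOMMU_ENTRIES 4     /* tiny table, for illustration only */

/* One page-aligned physical base per bus page. */
struct toy_iommu {
    phys_addr_t page[TOY_IOMMU_ENTRIES];
};

/* Simple case: bus address == physical address. */
static bus_addr_t bus_identity(phys_addr_t pa)
{
    return pa;
}

/* Variation: the bus sees memory at a fixed offset. */
static bus_addr_t bus_offset(phys_addr_t pa, uint64_t off)
{
    return pa + off;
}

/* IOMMU-like case: the device's view is remapped page by page. */
static phys_addr_t iommu_translate(const struct toy_iommu *mmu, bus_addr_t ba)
{
    uint64_t idx = ba >> TOY_PAGE_SHIFT;
    uint64_t off = ba & ((1u << TOY_PAGE_SHIFT) - 1);

    assert(idx < TOY_IOMMU_ENTRIES);
    return mmu->page[idx] | off;
}
```

With a table like this the device can see a contiguous range of bus
addresses that is scattered across physical memory, which is why a
driver cannot hand a physical address straight to a device on such a
system.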

J



[PATCH] address-space identification for /proc

2000-10-26 Thread Jeremy Fitzhardinge

Hi,

/proc has no way to indicate whether tasks share an address space.
This one-liner patch adds a new ASID: field to /proc/<pid>/status so
there's some way to see address-space sharing between tasks.

While this is hardly a bug-fix, it is a pretty useful thing to know
which is otherwise completely absent.

J


--- ../2.3/fs/proc/array.c  Mon Oct  9 17:03:53 2000
+++ linux/fs/proc/array.c   Thu Oct 26 15:20:52 2000
@@ -294,6 +294,7 @@
for(line=0;(len=sprintf_regs(line,buffer,task,NULL,NULL))!=0;line++)
buffer+=len;
 #endif
+   buffer += sprintf("ASID: %p\n", mm);
return buffer - orig;
 }
 



Re: [PATCH] address-space identification for /proc

2000-10-26 Thread Jeremy Fitzhardinge

On Thu, Oct 26, 2000 at 03:45:27PM -0700, I wrote:
> + buffer += sprintf("ASID: %p\n", mm);

Obviously, this should be:

+   buffer += sprintf("ASID:\t%p\n", mm);

for consistency.

J



Re: [PATCH] address-space identification for /proc

2000-10-26 Thread Jeremy Fitzhardinge

On Thu, Oct 26, 2000 at 07:01:26PM -0400, Johannes Erdfelt wrote:
> and even more obvious:
> 
> + buffer += sprintf(buffer, "ASID:\t%p\n", mm);
> 
> Actually putting it into the buffer would be useful as well :)

That serves me right for hand-editing patches.
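
For reference, a small user-space sketch of the append idiom the
corrected line relies on: sprintf() takes the destination as its first
argument and returns the number of characters written, so the cursor
advances by that amount.  The helper names and the "Name:" line are
assumptions for illustration, not the actual fs/proc/array.c code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append an "ASID:" line at the cursor and return
 * how many bytes were written.  The destination buffer is the first
 * argument to sprintf(); that is what the hand-edited patch dropped. */
static int append_asid(char *buffer, void *mm)
{
    assert(buffer != NULL);
    return sprintf(buffer, "ASID:\t%p\n", mm);
}

/* Build a two-line status text.  Each call advances the cursor by the
 * length it produced; the result is the total length, like the
 * "return buffer - orig" in the patch context. */
static int build_status(char *orig, void *mm)
{
    char *buffer = orig;

    buffer += sprintf(buffer, "Name:\t%s\n", "example");
    buffer += append_asid(buffer, mm);
    return (int)(buffer - orig);
}
```

Two tasks whose status shows the same pointer value would be sharing an
address space; the printed form of %p itself is implementation-defined.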

J
--
Repeat to self: I am not Linus



Re: Linux-2.4.0-test10

2000-10-31 Thread Jeremy Fitzhardinge

On Tue, Oct 31, 2000 at 08:55:13PM +, Alan Cox wrote:
>   Does autofs4 work yet

Autofs4 was fixed in 2.4.0-test10-pre6 or so.  Autofs4 for 2.2.x has
been working for some time, though I just updated the 2.2 patch so it
doesn't stomp on autofs (v3).

J



Re: Status of ReiserFS + Journalling

2000-10-05 Thread Jeremy Fitzhardinge

On Thu, Oct 05, 2000 at 11:33:30AM +0200, Helge Hafting wrote:
> A power failure might leave you with a corrupt disk block.  That is
> detectable (read failure) and you may then reconstruct it using the
> rest of the stripe.  This will get you data from either before 
> or after the update was supposed to happen.

How would you be able to tell which disk contains the bad stripe?
RAID reconstruction relies on knowing which disk to reconstruct because
it's obviously bad - there's out of band information in the form
of I/O errors.  If you only have an incompletely updated stripe on
a disk, you don't know which data to reconstruct from parity.

I think the only way of doing this properly is either to have a
battery-backed cache, or to do journalling at the RAID level.
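
A toy illustration of the point, assuming simple XOR parity over a
4-block stripe (the sizes and function names are made up for the
example): a parity check can say that a stripe is inconsistent, but
rebuilding needs the index of the bad block from somewhere else.

```c
#include <assert.h>
#include <stddef.h>

#define NDISKS 4   /* illustrative: 3 data blocks + 1 parity block */
#define BLKSZ  8   /* illustrative block size in bytes */

/* Parity block (last row) is the XOR of all data blocks. */
static void compute_parity(unsigned char stripe[NDISKS][BLKSZ])
{
    for (size_t i = 0; i < BLKSZ; i++) {
        unsigned char p = 0;
        for (size_t d = 0; d < NDISKS - 1; d++)
            p ^= stripe[d][i];
        stripe[NDISKS - 1][i] = p;
    }
}

/* Returns 1 if data and parity agree.  A 0 result says something is
 * wrong in the stripe, but not which block. */
static int stripe_consistent(unsigned char stripe[NDISKS][BLKSZ])
{
    for (size_t i = 0; i < BLKSZ; i++) {
        unsigned char x = 0;
        for (size_t d = 0; d < NDISKS; d++)
            x ^= stripe[d][i];
        if (x != 0)
            return 0;
    }
    return 1;
}

/* Rebuild block `bad` from the others.  The caller must already know
 * which block is bad (e.g. from an I/O error). */
static void reconstruct(unsigned char stripe[NDISKS][BLKSZ], size_t bad)
{
    assert(bad < NDISKS);
    for (size_t i = 0; i < BLKSZ; i++) {
        unsigned char x = 0;
        for (size_t d = 0; d < NDISKS; d++)
            if (d != bad)
                x ^= stripe[d][i];
        stripe[bad][i] = x;
    }
}
```

If block 1 holds stale data and you "reconstruct" block 0 instead, the
stripe becomes consistent again but block 0 is now wrong too, which is
the failure mode described above.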

J



User-mode linux stack overflow: could be generic problem

2000-10-07 Thread Jeremy Fitzhardinge

Hi,

I've been playing with user-mode linux (2.4.0-pre9).  It works well on
one machine, but on my laptop I'm consistently getting stack overflows
just as init is started.

The backtrace (from a breakpoint at panic()):

(gdb) bt
#0  panic (fmt=0x10112e00 "Stack overflowed onto current_task page")
at panic.c:54
#1  0x100a244d in check_stack_overflow (ptr=0x5015ccc8) at process_kern.c:715
#2  0x1009ddc9 in set_signals (enable=0) at signal_user.c:50
#3  0x100050b0 in __wake_up (q=0x5014afc8, mode=35) at sched.c:714
#4  0x10020ea5 in end_buffer_io_sync (bh=0x5014af80, uptodate=1)
at /home/jeremy/uml/2.3/include/linux/locks.h:34
#5  0x100607a4 in end_that_request_first (req=0x500e8f00, uptodate=1, 
name=0x1011390d "User-mode block device") at ll_rw_blk.c:1000
#6  0x100a3d48 in ubd_finish () at /home/jeremy/uml/2.3/include/linux/blk.h:396
#7  0x100a3dd5 in ubd_handler () at ubd.c:222
#8  0x100a3e00 in ubd_intr (irq=3, dev=0x1012c0a0, unused=0x5015cd88)
at ubd.c:229
#9  0x1009c6bf in handle_IRQ_event (irq=3, regs=0x5015cd88, action=0x500573c0)
at irq.c:148
#10 0x1009c85f in do_IRQ (irq=3, user_mode=0) at irq.c:313
#11 0x1009cf8d in sigio_handler (sig=29) at irq_user.c:53
#12 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#13 0x1009de50 in set_signals (enable=3) at signal_user.c:65
#14 0x1005fb46 in generic_unplug_device (data=0x10160650) at ll_rw_blk.c:364
#15 0x100204ab in __wait_on_buffer (bh=0x5014af80)
at /home/jeremy/uml/2.3/include/linux/tqueue.h:120
#16 0x100212be in bread (dev=25088, block=92508, size=1024)
at /home/jeremy/uml/2.3/include/linux/locks.h:20
#17 0x10041a40 in ext2_get_block (inode=0x5013c0a0, iblock=288, 
bh_result=0x5014ae00, create=0) at inode.c:250
#18 0x10021edd in block_read_full_page (page=0x50008b74, 
get_block=0x10041978 ) at buffer.c:1613
#19 0x10042014 in ext2_readpage (file=0x500de1e0, page=0x50008b74)
at inode.c:659
#20 0x10013eb1 in read_cluster_nonblocking (file=0x500de1e0, offset=77, 
filesize=78) at filemap.c:440
#21 0x1001525c in filemap_nopage (area=0x500d2c60, address=134832128, 
no_share=2) at filemap.c:1391
#22 0x1001209d in do_no_page (mm=0x500d41c0, vma=0x500d2c60, 
address=134832392, write_access=2, page_table=0x50159258) at memory.c:1150
#23 0x100121d4 in handle_mm_fault (mm=0x500d41c0, vma=0x500d2c60, 
address=134832392, write_access=2) at memory.c:1207
#24 0x100a01b3 in segv (address=134832392, ip=268665530, is_write=2, is_user=0)
at trap_kern.c:89
#25 0x100a0902 in segv_handler (sig=11) at trap_user.c:258
#26 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#27 0x100397d4 in load_elf_binary (bprm=0x5015db24, regs=0x0)
at binfmt_elf.c:714
#28 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0)
at exec.c:809
#29 0x10038226 in load_script (bprm=0x5015db24, regs=0x0) at binfmt_script.c:92
#30 0x100287e8 in search_binary_handler (bprm=0x5015db24, regs=0x0)
at exec.c:809
#31 0x100289cd in do_execve (filename=0x500df000 "/etc/rc.d/rc.sysinit", 
argv=0xbf7ffb14, envp=0x804f2c0, regs=0x0) at exec.c:902
#32 0x1009c3fc in execve1 (file=0x500df000 "/etc/rc.d/rc.sysinit", 
argv=0xbf7ffb14, env=0x804f2c0) at exec_kern.c:77
#33 0x1009c474 in sys_execve (file=0xbf7ffa88 "", argv=0xbf7ffb14, 
env=0x804f2c0) at exec_kern.c:101
#34 0x1009eab7 in execute_syscall (syscall=11, args=0x5015dcf8)
at syscall_kern.c:340
#35 0x1009eeb8 in syscall_handler (unused=0) at syscall_user.c:113
#36 0x1009bf03 in fork_handler (sig=10) at process.c:96
#37 0x100a7318 in __restore ()
at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127

This is a pretty deep stack, but there's nothing unexpected there.
I would guess that some kind of very fast disk drive would also cause
this kind of deep stack on real hardware, if it can complete the
I/O and interrupt before the reschedule.

I tried adding some inlines to make the stack use a little shallower, but
it didn't help.

Any suggestions on how to get this working? 

Thanks,
J



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 12:35:48AM -0500, Jeff Dike wrote:
> I've been waiting for someone to send me that stack.  There aren't any real 
> smoking guns there.  I'm guessing that the difference between your laptop and 
> the machine it works on is that your laptop is running a fairly recent kernel 
> (2.4.0-testx) and the other isn't.

Yep, that's right.

> The sigcontext struct greatly increased in 
> size (to ~800 bytes IIRC) to accommodate the MMX registers or something.  There 
> are three signals on your stack, so those frames by themselves are taking up 
> half the stack page.
> 
> Anyway, the patch below removes 256 bytes from the set_signals frame.  It 
> ought to alleviate things a bit.  I'll be looking for other things I can do, 
> as well. Let me know how it works for you.

I'm afraid this doesn't help.  The stack still overflows at the same point.
It looks like each signal frame is ~760 bytes.  Even with this patch, the
overflow is 808 bytes (without the patch it's 1232 bytes).

J



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > Even with this patch, the overflow is 808 bytes (without the patch
> > it's 1232 bytes).
> 
> I was mulling over some other changes that would have saved another 256 bytes, 
> but those don't look like they would help.  Try the patch below.  It 
> essentially gives up and lets the stack occupy half of the lower page.

Well, that sweeps the problem under the carpet enough to make progress...
 
> Also, could you look at the stack pointer at each frame, to see if you are 
> encountering any stack hogs in the generic kernel?  In a different situation, 
> I found devfs putting a 3K structure on the stack.

OK, I'll look into it.

J



Re: User-mode linux stack overflow: could be generic problem

2000-10-08 Thread Jeremy Fitzhardinge

On Sun, Oct 08, 2000 at 11:21:01AM -0500, Jeff Dike wrote:
> Also, could you look at the stack pointer at each frame, to see if you are 
> encountering any stack hogs in the generic kernel?  In a different situation, 
> I found devfs putting a 3K structure on the stack.

OK, top candidates on that stack trace are:

__restore():            764
do_execve:              340
load_elf_binary:        324
segv:                   180
sigio_handler:          176
load_script:            172
ext2_get_block:         160
set_signals:            156
block_read_full_page:   124

Looks like do_execve should be pretty easy to shrink: most of the stack
is in a local of type struct linux_binprm (308 bytes), which could be
kmalloced.  I guess this would have some cost in speed, so I don't suppose
this could be a generic patch.  Anyway, it isn't a solution in itself.

load_elf_binary is harder to deal with, since it just has lots of locals,
each relatively small.

segv is mostly a local struct siginfo (128 bytes).

sigio_handler is mostly an fd_set (128 bytes).

load_script has a local buffer for remembering the interpreter (128 bytes).

All up, there's about 660 bytes of stack which can be relatively easily
saved by converting locals to kmalloced memory, which still isn't enough
to solve the problem.
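
As a rough user-space sketch of that kind of conversion, with malloc()
standing in for kmalloc() and a made-up 308-byte struct standing in for
struct linux_binprm (both are assumptions for illustration): the frame
keeps only a pointer, at the cost of an allocation and a new failure
path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large per-call structure; the 308-byte
 * size echoes the figure quoted for struct linux_binprm. */
struct big_params {
    char buf[308];
};

/* Copies `name` into heap-allocated scratch space and reports its
 * length.  Returns 0 on success, -1 if the allocation fails.  Only a
 * pointer lives on the stack instead of the whole structure. */
static int run_with_heap_params(const char *name, size_t *len_out)
{
    struct big_params *p;

    assert(name != NULL && len_out != NULL);

    p = malloc(sizeof(*p));     /* would be kmalloc() in kernel code */
    if (p == NULL)
        return -1;              /* new error path the stack version lacked */

    strncpy(p->buf, name, sizeof(p->buf) - 1);
    p->buf[sizeof(p->buf) - 1] = '\0';
    *len_out = strlen(p->buf);

    free(p);                    /* would be kfree() in kernel code */
    return 0;
}
```

The extra allocate/free pair on every call is the speed cost mentioned
above.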

I haven't looked into UML's interrupt handling, but perhaps another approach
is to try and avoid recursive interrupts/exceptions and do some kind of
tail-recursion optimisation in the exception/signal handler.  I don't
know if this would cause problems (deadlocks?).

Alternatively, could you just use a bigger stack?

J



2.6.12-rc2-mm1: ieee1394 process hang

2005-04-07 Thread Jeremy Fitzhardinge

I'm having problems with 1394 in 2.6.12-rc2-mm1.  When I connect my
Apple iSight camera, it is not detected; repeated
connections/disconnections don't help.  When I tried to rmmod all the
appropriate modules (rmmod video1394 raw1394 ohci1394 ieee1394), the
rmmod command hung.  Alt-Sysreq-t shows this:

rmmod D F75593C0 0  7206   7193 (NOTLB)
e43fbda0 0086 e43fbdd0 f75593c0   f78bbd20 29b325f2
   04d2 0848 2ec09330 04d2 e0556560 e0556688 f792f258 e43fbdd4
   e43fb000 e43fbdf4 c02ade0e  e0556560 c01142b0  
Call Trace:
 [] wait_for_completion+0x6e/0xc0
 [] device_del+0x16/0x70
 [] device_unregister+0xb/0x20
 [] nodemgr_remove_ne+0x6d/0x90 [ieee1394]
 [] __nodemgr_remove_host_dev+0xb/0x10 [ieee1394]
 [] device_for_each_child+0x29/0x50
 [] nodemgr_remove_host_dev+0x15/0x40 [ieee1394]
 [] __unregister_host+0x75/0xb0 [ieee1394]
 [] highlevel_remove_host+0x2d/0x60 [ieee1394]
 [] hpsb_remove_host+0x3b/0x60 [ieee1394]
 [] ohci1394_pci_remove+0x8b/0x250 [ohci1394]
 [] pci_device_remove+0x2c/0x40
 [] device_release_driver+0x7c/0x80
 [] __remove_driver+0x8/0x10
 [] driver_for_each_device+0x43/0x70
 [] driver_detach+0x16/0x18
 [] bus_remove_driver+0x26/0x40
 [] driver_unregister+0xe/0x20
 [] pci_unregister_driver+0xe/0x20
 [] sys_delete_module+0x14d/0x160
 [] sysenter_past_esp+0x54/0x75

This last worked for me in 2.6.12-rc1-mm3; I didn't have a chance to
test -rc1-mm4.

.config and lspci output attached.

J
00:00.0 Host bridge: Intel Corp. 82855PM Processor to I/O Controller (rev 03)
Subsystem: IBM: Unknown device 0529
Flags: bus master, fast devsel, latency 0
Memory at d000 (32-bit, prefetchable) [size=256M]
Capabilities: [e4] Vendor Specific Information
Capabilities: [a0] AGP version 2.0

00:01.0 PCI bridge: Intel Corp. 82855PM Processor to AGP Controller (rev 03) 
(prog-if 00 [Normal decode])
Flags: bus master, 66Mhz, fast devsel, latency 96
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: 3000-3fff
Memory behind bridge: c010-c01f
Prefetchable memory behind bridge: e000-e7ff

00:1d.0 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB 
UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 052d
Flags: bus master, medium devsel, latency 0, IRQ 11
I/O ports at 1800 [size=32]

00:1d.1 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB 
UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 052d
Flags: bus master, medium devsel, latency 0, IRQ 5
I/O ports at 1820 [size=32]

00:1d.2 USB Controller: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB 
UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
Subsystem: IBM: Unknown device 052d
Flags: bus master, medium devsel, latency 0, IRQ 9
I/O ports at 1840 [size=32]

00:1d.7 USB Controller: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI 
Controller (rev 01) (prog-if 20 [EHCI])
Subsystem: IBM: Unknown device 052e
Flags: bus master, medium devsel, latency 0, IRQ 5
Memory at c000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Debug port

00:1e.0 PCI bridge: Intel Corp. 82801 Mobile PCI Bridge (rev 81) (prog-if 00 
[Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=08, sec-latency=64
I/O behind bridge: 4000-8fff
Memory behind bridge: c020-cfff
Prefetchable memory behind bridge: e800-efff

00:1f.0 ISA bridge: Intel Corp. 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corp. 82801DBM (ICH4-M) IDE Controller (rev 01) 
(prog-if 8a [Master SecP PriP])
Subsystem: IBM: Unknown device 052d
Flags: bus master, medium devsel, latency 0, IRQ 9
I/O ports at 
I/O ports at 
I/O ports at 
I/O ports at 
I/O ports at 1860 [size=16]
Memory at 4000 (32-bit, non-prefetchable) [size=1K]

00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus 
Controller (rev 01)
Subsystem: IBM: Unknown device 052d
Flags: medium devsel, IRQ 10
I/O ports at 1880 [size=32]

00:1f.5 Multimedia audio controller: Intel Corp. 82801DB/DBL/DBM 
(ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
Subsystem: IBM: Unknown device 0534
Flags: bus master, medium devsel, latency 0, IRQ 10
I/O ports at 1c00 [size=256]
I/O ports at 18c0 [size=64]
Memory at cc00 (32-bit, non-prefetchable) [size=512]
Memory at c800 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management 

Re: [PATCH] symlink.c

2001-06-12 Thread Jeremy Fitzhardinge

Quoting John Martin <[EMAIL PROTECTED]>:
> this patch adds a check to make sure memory was allocated, returns an
> error code otherwise.

autofs4_dentry_ino doesn't allocate memory; it just extracts the fsdata pointer
from the dentry structure.  If it's returning NULL, then there's something else
wrong and you're papering over the symptoms.  Are you seeing this happen?

Linus, please don't apply this.

 J


