Re: Status of buffered write path (deadlock fixes)

2006-12-12 Thread Nick Piggin
Mark Fasheh wrote: On Tue, Dec 12, 2006 at 02:52:26AM +1100, Nick Piggin wrote: Nick Piggin wrote: Hmm, doesn't look like we can do this either because at least GFS2 uses BH_New for its own special things. Also, I don't know if the trick of only walking over BH_New buffers will w

Re: Status of buffered write path (deadlock fixes)

2006-12-12 Thread Nick Piggin
Trond Myklebust wrote: On Wed, 2006-12-13 at 11:53 +1100, Nick Piggin wrote: Not silly -- I guess that is the main sticking point. Luckily *most* !uptodate pages will be ones that we have newly allocated so will not be in pagecache yet. If it is in pagecache, we could do one of a number of

Re: [PATCH 1/2] WorkStruct: Add assign_bits() to give an atomic-bitops safe assignment

2006-12-12 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 12 Dec 2006, Russell King wrote: This seems to be a very silly question (and I'm bound to be utterly wrong as proven in my last round) but why are we implementing a new set of atomic primitives which effectively do the same thing as our existing set? Why can't we

Re: Status of buffered write path (deadlock fixes)

2006-12-12 Thread Nick Piggin
Trond Myklebust wrote: On Wed, 2006-12-13 at 12:56 +1100, Nick Piggin wrote: Note that these pages should be *really* rare. Definitely even for normal filesystems I think RMW would use too much bandwidth if it were required for any significant number of writes. If file "foo" exi

[rfc][patch] fix buffered write deadlocks with extra copy (and a way out?)

2006-12-17 Thread Nick Piggin
My attempt to fix this problem by modifying the prepare_write/commit_write API fell on its face because it ended up breaking filesystems and the buffer layer in various interesting ways, but probably more importantly the logic was getting complex and fragile and the fixes would have made that even

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Nick Piggin
Linus Torvalds wrote: [ Replying to myself - a sure sign that I don't get out enough ] On Sun, 17 Dec 2006, Linus Torvalds wrote: So I don't actually see any serialization at all that would keep a random page from being paged back in. We do actually serialize, but we do it _after_ the page

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters is that the page eventually gets marked dirty. But the p

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). By saying that there

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Sun, 17 Dec 2006 21:50:43 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: >>Yes I could believe it the corruption is caused by something else >>completely. > > > Think so. We do have a problem here, but only on threaded apps, I believe. >

Re: Linux disk performance.

2006-12-18 Thread Nick Piggin
Manish Regmi wrote: On 12/18/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote: if you want truely really smooth writes you'll have to work for it, since "bumpy" writes tend to be better for performance so naturally the kernel will favor those. to get smooth writes you'll need to do a threaded se

Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Nick Piggin
Aubrey wrote: Hi all, When I setup two zones (NORMAL and DMA) in my system, I got the following wired result from /proc/buddyinfo. - root:~> cat /proc/buddyinfo Node 0, zone DMA 2 1 2

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 18 Dec 2006, Peter Zijlstra wrote: This should be safe; page_mkclean walks the rmap and flips the pte's under the pte lock and records the dirty state while iterating. Concurrent faults will either do set_page_dirty() before we get around to doing it or vice versa,

Re: -mm merge plans for 2.6.20

2006-12-18 Thread Nick Piggin
Dave Jones wrote: Eeek! page_mapcount(page) went negative! (-2) Hmm, probably happened once before, too. page->flags = 404 What's that? PG_referenced|PG_reserved? So I'd say it is likely that some driver has got its refcounting wrong. Unfortunately, this debugging output is almost usele

Re: [RFC][PATCH] Fix area->nr_free-- went (-1) issue in buddy system

2006-12-18 Thread Nick Piggin
Hi Aubery! Aubrey wrote: Hi Nick, Thanks for your reply again, ;-). On 12/19/06, Nick Piggin <[EMAIL PROTECTED]> wrote: This should not happen because the pages are checked to ensure they are from the same zone before merging. How? page_is_buddy() only check if the buddy has the

Re: Linux disk performance.

2006-12-18 Thread Nick Piggin
Manish Regmi wrote: Nick Piggin: but they look like they might be a (HZ quantised) delay coming from block layer plugging. Sorry i didn´t understand what you mean. When you submit a request to an empty block device queue, it can get "plugged" for a number of timer ticks before

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: We never want to drop dirty data! (ignoring the truncate case, which is handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How do you think dirty data ever

Re: -mm merge plans for 2.6.20

2006-12-18 Thread Nick Piggin
Dave Jones wrote: On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote: > Dave Jones wrote: > > > Eeek! page_mapcount(page) went negative! (-2) > > Hmm, probably happened once before, too. You're right. Going back further in the log, I noticed that it had

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Peter Zijlstra wrote: On Tue, 2006-12-19 at 15:36 +1100, Nick Piggin wrote: plain text document attachment (fs-fix.patch) Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c 2006-12-19 15:15:46.0 +1100

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: Anyway it has the same issues as the others. See what happens when you run two test_clear_page_dirty_sync_ptes() consecutively, you still loose PG_dirty even though the page might actually be dirty. How can this happen? We

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Andrew Morton wrote: On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: Linus Torvalds wrote: NOTICE? First you make a BIG DEAL about how dirty bits should never get lost, but THE VERY SAME FUNCTION actually very much on purpose DOES drop the dirty bit for whe

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Andrew Morton wrote: On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: I think it could be very likely that indeed the bug is a latent one in a clear_page_dirty caller, rather than dirty-tracking itself. The only callers are try_to_free_buffers(), truncate and

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Peter Zijlstra wrote: On Tue, 2006-12-19 at 02:32 -0800, Andrew Morton wrote: Well it used to be. After 2.6.19 it can do the wrong thing for mapped pages. But it turns out that we don't feed it mapped pages, apart from pagevec_strip() and possibly races against pagefaults. So how about th

Re: open(O_DIRECT) on a tmpfs?

2007-01-04 Thread Nick Piggin
Denis Vlasenko wrote: On Thursday 04 January 2007 17:19, Bill Davidsen wrote: Hugh Dickins wrote: In many cases the use of O_DIRECT is purely to avoid impact on cache used by other applications. An application which writes a large quantity of data will have less impact on other applications b

Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

2007-01-04 Thread Nick Piggin
Andrea Gelmini wrote: On Thu, Jan 04, 2007 at 05:03:43PM +1100, Nick Piggin wrote: Anyway that leaves us with the question of why Andrea's database is getting corrupted. Hopefully he can give us a minimal test-case. yep, I can give you a complete image of my machine, or a root a

Re: [PATCHSET 1][PATCH 0/6] Filesystem AIO read/write

2007-01-04 Thread Nick Piggin
Suparna Bhattacharya wrote: On Thu, Jan 04, 2007 at 05:50:11PM +1100, Nick Piggin wrote: OK, but I think that after IO submission, you do not run sync_page to unplug the block device, like the normal IO path would (via lock_page, before the explicit plug patches). In the buffered AIO case

Re: [PATCH 0/4] Improve swap page error handling

2007-01-10 Thread Nick Piggin
Richard Purdie wrote: No, not this way, I'm afraid. Sorry, I don't remember the prior discussion on LKML, must have flooded past when my attention was elsewhere. I think you were cc'd on some of it but you never commented. Anyhow, I've reworked this patch series based on your comments. The h

Re: [REGRESSION] 2.6.19/2.6.20-rc3 buffered write slowdown

2007-01-10 Thread Nick Piggin
David Chinner wrote: On Wed, Jan 10, 2007 at 03:04:15PM -0800, Christoph Lameter wrote: On Thu, 11 Jan 2007, David Chinner wrote: The performance and smoothness is fully restored on 2.6.20-rc3 by setting dirty_ratio down to 10 (from the default 40), so something in the VM is not working as w

Re: page_mapcount(page) went negative

2007-01-10 Thread Nick Piggin
Dave Jones wrote: On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote: > IMO the pattern is much too consistent to be able to attribute > them all to hardware problems. And considering it takes so long > for these things to appear, can we get something like the attached

Re: 2.6.20-rc4: known unfixed regressions (v2)

2007-01-10 Thread Nick Piggin
Vladimir V. Saveliev wrote: Hello On Tuesday 09 January 2007 21:30, Linus Torvalds wrote: On Tue, 9 Jan 2007, Malte Schröder wrote: So something interesting is definitely going on, but I don't know exactly what it is. Why does reiserfs do the truncate as part of a close, if the same inode is

Re: [REGRESSION] 2.6.19/2.6.20-rc3 buffered write slowdown

2007-01-10 Thread Nick Piggin
David Chinner wrote: On Thu, Jan 11, 2007 at 10:13:55AM +1100, Nick Piggin wrote: David Chinner wrote: On Wed, Jan 10, 2007 at 03:04:15PM -0800, Christoph Lameter wrote: On Thu, 11 Jan 2007, David Chinner wrote: The performance and smoothness is fully restored on 2.6.20-rc3 by setting

Re: O_DIRECT question

2007-01-10 Thread Nick Piggin
Linus Torvalds wrote: On Wed, 10 Jan 2007, Linus Torvalds wrote: So don't use O_DIRECT. Use things like madvise() and posix_fadvise() instead. Side note: the only reason O_DIRECT exists is because database people are too used to it, because other OS's haven't had enough taste to tell them
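The posix_fadvise() suggestion quoted above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name write_without_lingering_cache is made up here, and the choice to flush with fdatasync() before issuing POSIX_FADV_DONTNEED is an assumption about how an application might apply the hint (dirty pages cannot simply be dropped).

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise() and fdatasync() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes through the page cache, flush
 * them, then tell the kernel the cached pages are no longer needed.
 * Returns 0 if the data was written and flushed, -1 otherwise.
 */
int write_without_lingering_cache(int fd, const void *buf, size_t len)
{
    assert(fd >= 0 && buf != NULL);

    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            perror("write");
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Dirty pages are not discarded by the hint, so write them back first. */
    if (fdatasync(fd) != 0) {
        perror("fdatasync");
        return -1;
    }

    /*
     * offset 0 with len 0 means "to the end of the file".
     * posix_fadvise() returns an error number directly instead of
     * setting errno. It is only a hint, so a failure is reported but
     * not treated as fatal.
     */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: error %d (hint ignored)\n", err);

    return 0;
}
```

Unlike O_DIRECT, this keeps the ordinary buffered write path and imposes no alignment requirements on the buffer; the kernel remains free to ignore the hint.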

Re: 2.6.20-rc4: known unfixed regressions (v3)

2007-01-10 Thread Nick Piggin
Adrian Bunk wrote: Subject: BUG: at fs/inotify.c:172 set_dentry_child_flags() References : http://bugzilla.kernel.org/show_bug.cgi?id=7785 Submitter : Cijoml Cijomlovic Cijomlov <[EMAIL PROTECTED]> Handled-By : John McCutchan <[EMAIL PROTECTED]> Status : problem is being debugged I'm

Re: O_DIRECT question

2007-01-10 Thread Nick Piggin
Andrew Morton wrote: On Thu, 11 Jan 2007 14:45:12 +0800 Aubrey <[EMAIL PROTECTED]> wrote: In the interim you could do the old "echo 3 > /proc/sys/vm/drop_caches" thing, but that's terribly crude - drop_caches is really only for debugging and benchmarking. Yes. This method can drop caches, b

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Aubrey wrote: On 1/11/07, Nick Piggin <[EMAIL PROTECTED]> wrote: What you _really_ want to do is avoid large mallocs after boot, or use a CPU with an mmu. I don't think nommu linux was ever intended to be a simple drop in replacement for a normal unix kernel. Is there a positio

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Roy Huang wrote: There is already an EMBEDDED option in config, so I think linux is also supporting embedded system. There are many developers working on embedded system runing linux. They also hope to contribute to linux, then other embeded developers can share it. Yes, but we don't like to ap

Re: [REGRESSION] 2.6.19/2.6.20-rc3 buffered write slowdown

2007-01-11 Thread Nick Piggin
Thanks. BTW. You didn't cc this to the list, so I won't either in case you want it kept private. David Chinner wrote: On Thu, Jan 11, 2007 at 12:08:10PM +1100, Nick Piggin wrote: Ahh, sorry to be unclear, I meant: cat /proc/vmstat > pre run_test cat /proc/vmstat > post

Re: [REGRESSION] 2.6.19/2.6.20-rc3 buffered write slowdown

2007-01-11 Thread Nick Piggin
David Chinner wrote: On Thu, Jan 11, 2007 at 12:08:10PM +1100, Nick Piggin wrote: So, what I've attached is three files which have both 'vmstat 5' output and 'iostat 5 |grep dm-' output in them. Ahh, sorry to be unclear, I meant: cat /proc/vmstat > pre run_t
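The "cat /proc/vmstat > pre; run_test; cat /proc/vmstat > post" recipe leaves two snapshots that still need to be compared. A minimal sketch of that comparison, assuming the usual "name value" layout of /proc/vmstat, might look like this; the function names vmstat_lookup and vmstat_delta are hypothetical, not from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find counter `key` in a /proc/vmstat-style snapshot, where every line
 * is "name value". Returns 0 and stores the value in *out on success,
 * -1 if the file cannot be opened or the counter is not present.
 */
int vmstat_lookup(const char *path, const char *key, unsigned long long *out)
{
    assert(path != NULL && key != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char name[128];
    unsigned long long value;
    int found = -1;

    while (fscanf(f, "%127s %llu", name, &value) == 2) {
        if (strcmp(name, key) == 0) {
            *out = value;
            found = 0;
            break;
        }
    }

    fclose(f);
    return found;
}

/*
 * Store (post - pre) for one counter in *delta. The result may be
 * negative for gauges such as nr_dirty. Returns 0 on success, -1 if
 * either snapshot lacks the counter.
 */
int vmstat_delta(const char *pre, const char *post, const char *key,
                 long long *delta)
{
    unsigned long long before, after;

    if (vmstat_lookup(pre, key, &before) != 0 ||
        vmstat_lookup(post, key, &after) != 0)
        return -1;

    *delta = (long long)after - (long long)before;
    return 0;
}
```

For example, vmstat_delta("pre", "post", "pgpgout", &d) would give the change in the pgpgout counter over the test run.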

Re: 2.6.20-rc4: known unfixed regressions (v2)

2007-01-11 Thread Nick Piggin
Vladimir V. Saveliev wrote: Hello On Thursday 11 January 2007 04:00, Nick Piggin wrote: That's racy, unfortunately :P Sorry, please, explain what is racy. reiserfs_truncate and reiserfs_release call that function after they have inode's mutex locked. Calling truncate inside

Re: [REGRESSION] 2.6.19/2.6.20-rc3 buffered write slowdown

2007-01-11 Thread Nick Piggin
Christoph Lameter wrote: On Thu, 11 Jan 2007, Nick Piggin wrote: You're not turning on zone_reclaim, by any chance, are you? It is not a NUMA system so zone reclaim is not available. Ah yes... Can't you force it on if you have a NUMA complied kernel? zone reclaim was already

Re: [PATCH/RFC 2.6.20-rc4 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-01-11 Thread Nick Piggin
Jaya Kumar wrote: On 1/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote: That's all very interesting. Please don't dump a bunch of new implementation concepts like this on us with no description of what it does, why it does it and why it does it in this particular manner. Hi Andrew, Actually

Re: Replace nopage() / nopfn() with fault()

2007-01-11 Thread Nick Piggin
On Tue, Jan 09, 2007 at 04:02:08PM +0100, Thomas Hellström wrote: > Nick, > > We're working to slowly get the new DRM memory manager into the > mainstream kernel. > This means we have a need for the page fault handler patch you wrote > some time ago. > I guess we could take the no_pfn() route, b

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Aubrey wrote: On 1/11/07, Roy Huang <[EMAIL PROTECTED]> wrote: On a embedded systerm, limiting page cache can relieve memory fragmentation. There is a patch against 2.6.19, which limit every opened file page cache and total pagecache. When the limit reach, it will release the page cache overrun

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Bill Davidsen wrote: Nick Piggin wrote: Aubrey wrote: Exactly, and the *real* fix is to modify userspace not to make > PAGE_SIZE mallocs[*] if it is to be nommu friendly. It is the kernel hacks to do things like limit cache size that are the bandaids. Tuning the system to w

Re: [PATCH 05/05] Linux Kernel Markers, non optimised architectures

2007-01-11 Thread Nick Piggin
Mathieu Desnoyers wrote: +#define MARK(name, format, args...) \ + do { \ + static marker_probe_func *__mark_call_##name = \ + __mark_empty_function; \ + volatile static char __marker_enable_##name = 0; \ + stat

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Linus Torvalds wrote: On Fri, 12 Jan 2007, Nick Piggin wrote: We are talking about about fragmentation. And limiting pagecache to try to avoid fragmentation is a bandaid, especially when the problem can be solved (not just papered over, but solved) in userspace. It's not clear tha

Re: O_DIRECT question

2007-01-11 Thread Nick Piggin
Nick Piggin wrote: Linus Torvalds wrote: Very basic issue: the perfect is the enemy of the good. Claiming that there is a "proper solution" is usually a total red herring. Quite often there isn't, and the "paper over" is actually not papering over, it's qu

Re: [PATCH 05/05] Linux Kernel Markers, non optimised architectures

2007-01-11 Thread Nick Piggin
Mathieu Desnoyers wrote: * Nick Piggin ([EMAIL PROTECTED]) wrote: Mathieu Desnoyers wrote: +#define MARK(name, format, args...) \ + do { \ + static marker_probe_func *__mark_call_##name = \ + __mark_empty_function

[patch] sched: avoid div in rebalance_tick

2007-01-11 Thread Nick Piggin
Just noticed this while looking at a bug. -- Avoid an expensive integer divide 3 times per CPU per tick. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/s

Re: [patch] sched: avoid div in rebalance_tick

2007-01-12 Thread Nick Piggin
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote: > On Fri, 12 Jan 2007 07:02:13 +0100 > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > Just noticed this while looking at a bug. > > Avoid an expensive integer divide 3 times per CPU per tick. > > Integ

[patch 0/7] fault vs truncate/invalidate race fix

2007-01-12 Thread Nick Piggin
The following set of patches fix the fault vs invalidate and fault vs truncate_range race for filemap_nopage mappings, plus those and fault vs truncate race for nonlinear mappings. Hasn't changed since I last submitted it, when it was rejected because it made one of the buffered write deadlocks ea

[patch 2/7] mm: simplify filemap_nopage

2007-01-12 Thread Nick Piggin
Identical block is duplicated twice: contrary to the comment, we have been re-reading the page *twice* in filemap_nopage rather than once. If any retry logic or anything is needed, it belongs in lower levels anyway. Only retry once. Linus agrees. Signed-off-by: Nick Piggin <[EMAIL PROTEC

[patch 1/7] mm: debug check for the fault vs invalidate race

2007-01-12 Thread Nick Piggin
Add a bugcheck for Andrea's pagefault vs invalidate race. This is triggerable for both linear and nonlinear pages with a userspace test harness (using direct IO and truncate, respectively). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/m

[patch 3/7] mm: fix fault vs invalidate race for linear mappings

2007-01-12 Thread Nick Piggin
on is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signe

[patch 6/7] mm: merge nopfn into fault

2007-01-12 Thread Nick Piggin
Remove ->nopfn and reimplement the only existing handler using ->fault Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/char/mspec.c === --- linux-2.6.orig/drivers/char/mspec.c +++ linux-2.6/

[patch 5/7] mm: add vm_insert_pfn

2007-01-12 Thread Nick Piggin
Add a vm_insert_pfn helper, so that ->fault handlers can have nopfn functionality by installing their own pte and returning NULL. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/mm.h === ---

[patch 4/7] mm: merge populate and nopage into fault (fixes nonlinear)

2007-01-12 Thread Nick Piggin
d with ->fault, and no users have hit mainline yet. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/mm.h === --- linux-2.6.orig/include/linux/mm.h +++ linux-2.6/include/linux/mm.h @@ -168,11 +1

[patch 7/7] mm: remove legacy cruft

2007-01-12 Thread Nick Piggin
- mm/fremap.c| 71 ++- mm/memory.c| 37 ++ 4 files changed, 21 insertions(+), 291 deletions(-) Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linu

Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Nick Piggin
Ravikiran G Thirumalai wrote: Hi, We noticed high interrupt hold off times while running some memory intensive tests on a Sun x4600 8 socket 16 core x86_64 box. We noticed softlockups, [...] We did not use any lock debugging options and used plain old rdtsc to measure cycles. (We disable cp

Re: O_DIRECT question

2007-01-12 Thread Nick Piggin
Bill Davidsen wrote: The point is that if you want to be able to allocate at all, sometimes you will have to write dirty pages, garbage collect, and move or swap programs. The hardware is just too limited to do something less painful, and the user can't see memory to do things better. Linus is

Re: tuning/tweaking VM settings for low memory (preventing OOM)

2007-01-12 Thread Nick Piggin
Kumar Gala wrote: I'm working on an embedded PPC setup with 64M of memory and no swap. I'm trying to figure out how best to tune the VM for an OOM situation I'm running into. I'm running a 2.6.16.35 kernel and have a bittorrent app that appears to be initializing a large file for it to do

Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Nick Piggin
Ravikiran G Thirumalai wrote: On Sat, Jan 13, 2007 at 03:39:45PM +1100, Nick Piggin wrote: What is the "CS time"? Critical Section :). This is the maximal time interval I measured from t2 above to the time point we release the spin lock. This is the hold time I guess. I

[patch 0/10] buffered write deadlock fix

2007-01-12 Thread Nick Piggin
The following set of patches attempt to fix the buffered write locking problems (and there are a couple of peripheral patches and cleanups there too). This does pass the write deadlock tests that otherwise fail. Has survived a few hours of fsx-linux on ext2 and 3. Patches against 2.6.20-rc4. I d

[patch 1/10] fs: libfs buffered write leak fix

2007-01-12 Thread Nick Piggin
simple_prepare_write and nobh_prepare_write leak uninitialised kernel data. Fix the former, make a note of the latter. Several other filesystems seem to be iffy here, too. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/fs/l

[patch 2/10] mm: revert "generic_file_buffered_write(): handle zero length iovec segments"

2007-01-12 Thread Nick Piggin
From: Andrew Morton <[EMAIL PROTECTED]> Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6. This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we also revert. Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> I

[patch 3/10] mm: revert "generic_file_buffered_write(): deadlock on vectored write"

2007-01-12 Thread Nick Piggin
e fixing the deadlock by other means. Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Nick says: also it only ever actually papered over the bug, because after faulting in the pages, they might be unmapped or reclaimed. Signed-off-by: Nick Piggin <[EMAIL PROTECTED

[patch 7/10] mm: cleanup pagecache insertion operations

2007-01-12 Thread Nick Piggin
very short time, in contrast with the per-CPU pagevecs that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required to add the pages to pagecache for a bulk write (in 4K chunks). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-

[patch 8/10] mm: generic_file_buffered_write cleanup more

2007-01-12 Thread Nick Piggin
No need to do the confusing switch of variables from copied into status. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1

[patch 5/10] mm: debug write deadlocks

2007-01-12 Thread Nick Piggin
: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/mm/filemap.c @@ -1894,6 +1894,7 @@ generic_file_buffered_write(struct kiocb if (maxlen &

[patch 9/10] mm: generic_file_buffered_write iovec cleanup

2007-01-12 Thread Nick Piggin
Hide some of the open-coded nr_segs tests into the iovec helpers. This is all to simplify generic_file_buffered_write, because that gets more complex in the next patch. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/fil

[patch 10/10] mm: fix pagecache write deadlocks

2007-01-12 Thread Nick Piggin
data via the kernel address space. (also, rename maxlen to seglen, because it was confusing) Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/filemap.c === --- linux-2.6.orig/mm/filemap.c +++ linux-2.6/m

[patch 6/10] mm: be sure to trim blocks

2007-01-12 Thread Nick Piggin
If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then we may have failed the write operation despite prepare_write having instantiated blocks past i_size. Fix this, and consolidate the trimming into one place. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index:

[patch 4/10] mm: generic_file_buffered_write cleanup

2007-01-12 Thread Nick Piggin
From: Andrew Morton <[EMAIL PROTECTED]> Clean up buffered write code. Rename some variables and fix some types. Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linu

Re: [patch 10/10] mm: fix pagecache write deadlocks

2007-01-13 Thread Nick Piggin
Nick Piggin wrote: @@ -1878,31 +1889,88 @@ generic_file_buffered_write(struct kiocb break; } + /* +* non-uptodate pages cannot cope with short copies, and we +* cannot take a pagefault with the destination page locked

Re: 2.6.19-rc4-mm1: writev() _functional_ regression

2006-11-26 Thread Nick Piggin
Andrew Morton wrote: On Sun, 12 Nov 2006 17:30:24 -0500 Nick Orlov <[EMAIL PROTECTED]> wrote: Andrew, Somewhere in between 2.6.18-mm3 and 2.6.19-rc4-mm1 writev() got screwed. It does not accept zero-length segments anymore. Bad thing that it is extremely easy to trigger (even w/o explicit wr

Re: The VFS cache is not freed when there is not enough free memory to allocate

2006-11-26 Thread Nick Piggin
Aubrey wrote: On 11/22/06, Peter Zijlstra <[EMAIL PROTECTED]> wrote: The lack of a MMU on your system makes it very hard not to rely on higher order allocations, because even user-space allocs need to be physically contiguous. But please take that into consideration when writing software. W

Re: Slab: Remove kmem_cache_t

2006-11-28 Thread Nick Piggin
Linus Torvalds wrote: So typedefs are good for - "u8"/"u16"/"u32"/"u64" kind of things, where the underlying types really are potentially different on different architectures. - "sector_t"-like things which may be 32-bit or 64-bit depending on some CONFIG_LBD option or other. - a

Re: [patch] Mark rdtsc as sync only for netburst, not for core2

2006-11-29 Thread Nick Piggin
Arjan van de Ven wrote: Zhang, Yanmin wrote: If it's a single processor, the go backwards issue doesn't exist. Below is my patch based on Arjan's. It's against 2.6.19-rc5-mm2. Hi, this patch is incorrect --- linux-2.6.19-rc5-mm2_arjan/arch/x86_64/kernel/setup.c 2006-11-29 10:41:21.

Re: The VFS cache is not freed when there is not enough free memory to allocate

2006-11-29 Thread Nick Piggin
Aubrey wrote: On 11/29/06, Sonic Zhang <[EMAIL PROTECTED]> wrote: Forward to the mailing list. > On 11/27/06, Nick Piggin <[EMAIL PROTECTED]> wrote: >> I haven't actually written any nommu userspace code, but it is obvious >> that you must try to keep malloc

[patch 2/3] mm: pagecache write deadlocks stale holes fix

2006-11-29 Thread Nick Piggin
If the data copy within a prepare_write can potentially allocate blocks to fill holes, so if the page copy fails then new blocks must be zeroed so uninitialised data cannot be exposed with a subsequent read. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/fil

[patch 1/3] mm: pagecache write deadlocks zerolength fix

2006-11-29 Thread Nick Piggin
writev with a zero-length segment is a noop, and we shouldn't return EFAULT. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/pagemap.h === --- linux-2.6.orig/include/linux/pagemap.h ++
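The behaviour this patch restores can be exercised from userspace with a small test along these lines. It is a sketch written for this listing, not the test case from the thread; the helper name write_with_empty_segment is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a single writev() whose middle segment has
 * zero length. On a kernel that treats the empty segment as a no-op,
 * the call reports the sum of the non-empty segments; on failure it
 * returns -1 with errno set (EFAULT in the regression discussed here).
 */
ssize_t write_with_empty_segment(int fd)
{
    assert(fd >= 0);

    char first[] = "abc";
    char second[] = "def";
    struct iovec iov[3];

    iov[0].iov_base = first;
    iov[0].iov_len = strlen(first);
    iov[1].iov_base = first;      /* valid pointer, but nothing to copy */
    iov[1].iov_len = 0;
    iov[2].iov_base = second;
    iov[2].iov_len = strlen(second);

    ssize_t n = writev(fd, iov, 3);
    if (n < 0)
        perror("writev");
    return n;
}
```

Calling it on any writable descriptor and comparing the return value with the total length of the non-empty segments is enough to tell the two behaviours apart.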

[patch 3/3] fs: fix cont vs deadlock patches

2006-11-29 Thread Nick Piggin
-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -2004,19 +2004,20 @@ int block_read_full_page(struct page *pa return 0; } -/* utility fu

[patch 0/3] more buffered write fixes

2006-11-29 Thread Nick Piggin
Sorry, I should give some background. The following patches attempt to fix the problems people have identified with buffered write deadlock patches. Against 2.6.19 + the previous patchset dropped from -mm. Comments?

Re: [patch 1/3] mm: pagecache write deadlocks zerolength fix

2006-11-30 Thread Nick Piggin
On Thu, Nov 30, 2006 at 11:15:39AM +0100, Andreas Schwab wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > > writev with a zero-length segment is a noop, and we shouldn't return EFAULT. > > AFAICS the callers of these functions never pass a zero length. They can

Re: [patch 1/3] mm: pagecache write deadlocks zerolength fix

2006-11-30 Thread Nick Piggin
On Thu, Nov 30, 2006 at 11:30:33AM +0100, Andreas Schwab wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > > On Thu, Nov 30, 2006 at 11:15:39AM +0100, Andreas Schwab wrote: > >> Nick Piggin <[EMAIL PROTECTED]> writes: > >> > >> > writev

Re: [patch 3/3] fs: fix cont vs deadlock patches

2006-11-30 Thread Nick Piggin
code. Converts fat over to the new cont scheme. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c +++ linux-2.6/fs/buffer.c @@ -2004,19 +2004,20

Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

2006-11-30 Thread Nick Piggin
Evgeniy Polyakov wrote: On Thu, Nov 30, 2006 at 08:35:04AM +0100, Ingo Molnar ([EMAIL PROTECTED]) wrote: Doesn't the provided solution is just a in-kernel variant of the SCHED_FIFO set from userspace? Why kernel should be able to mark some users as having higher priority? What if workload of t

Re: The VFS cache is not freed when there is not enough free memory to allocate

2006-11-30 Thread Nick Piggin
Aubrey wrote: On 11/29/06, Nick Piggin <[EMAIL PROTECTED]> wrote: That was the order-9 allocation failure. Which is not going to be solved properly by just dropping caches. But Sonic apparently saw failures with 4K allocations, where the caches weren't getting shrunk properly. Th

Re: [PATCH] deny partial write for loop dev fd

2007-06-17 Thread Nick Piggin
On Sat, Jun 16, 2007 at 07:39:17PM +0400, Dmitriy Monakhov wrote: > Partial write can be easily supported by LO_CRYPT_NONE mode, but > it is not easy in LO_CRYPT_CRYPTOAPI case, because of its block nature. > I don't know who still used cryptoapi, but theoretically it is possible. > So let's leave

[RFC] fsblock

2007-06-23 Thread Nick Piggin
I'm announcing "fsblock" now because it is quite intrusive and so I'd like to get some thoughts about significantly changing this core part of the kernel. fsblock is a rewrite of the "buffer layer" (ding dong the witch is dead), which I have been working on, on and off and is now at the stage whe

[patch 2/3] block_dev: convert to fsblock

2007-06-23 Thread Nick Piggin
Convert block_dev mostly to fsblocks. --- fs/block_dev.c | 204 +++- fs/buffer.c | 113 ++-- fs/super.c |2 include/linux/buffer_head.h |9 - include/linux/fs.h | 29 ++

[patch 3/3] minix: convert to fsblock

2007-06-23 Thread Nick Piggin
Convert minix from buffer head to fsblock. --- fs/minix/bitmap.c | 148 +-- fs/minix/file.c |6 - fs/minix/inode.c| 172 ++-- fs/minix/itree_common.c | 227 f

Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
Just clarify a few things. Don't you hate rereading a long work you wrote? (oh, you're supposed to do that *before* you press send?). On Sun, Jun 24, 2007 at 03:45:28AM +0200, Nick Piggin wrote: > > I'm announcing "fsblock" now because it is quite intrusive and so

Re: [RFC] fsblock

2007-06-23 Thread Nick Piggin
On Sat, Jun 23, 2007 at 11:07:54PM -0400, Jeff Garzik wrote: > Nick Piggin wrote: > >- No deadlocks (hopefully). The buffer layer is technically deadlocky by > > design, because it can require memory allocations at page writeout-time. > > It also has one path that c

vm/fs meetup in september?

2007-06-23 Thread Nick Piggin
I'd just like to take the chance also to ask about a VM/FS meetup some time around kernel summit (maybe take a big of time during UKUUG or so). I was thinking about trying to arrange a proper mini summit thing, but it's a bit difficult and we could talk this year about doing it for subsequent year

Re: [PATCH] slob: poor man's NUMA support.

2007-06-24 Thread Nick Piggin
Paul Mundt wrote: This adds preliminary NUMA support to SLOB, primarily aimed at systems with small nodes (tested all the way down to a 128kB SRAM block), whether asymmetric or otherwise. Fine by me as well, FWIW. My points about per-cpu/node queues were not to say that I'm really opposed to ge

Re: [patch 10/26] SLUB: Faster more efficient slab determination for __kmalloc.

2007-06-24 Thread Nick Piggin
Andrew Morton wrote: On Tue, 19 Jun 2007 15:38:01 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote: Ok and BUILD_BUG_ON really works? Had some bad experiences with it. hm, I don't recall any problems, apart from its very obscure error reporting. But if it breaks, we get an opportuni

Re: [patch 1/3] add the fsblock layer

2007-06-25 Thread Nick Piggin
Andi Kleen wrote: Nick Piggin <[EMAIL PROTECTED]> writes: [haven't read everything, just commenting on something that caught my eye] +struct fsblock { + atomic_tcount; + union { + struct { + unsigned long flags; /* XXX:

Re: [RFC] fsblock

2007-06-24 Thread Nick Piggin
Chris Mason wrote: On Sun, Jun 24, 2007 at 05:47:55AM +0200, Nick Piggin wrote: My gut feeling is that there are several problem areas you haven't hit yet, with the new code. I would agree with your gut :) Without having read the code yet (light reading for monday morning ;), ext
