Hi Paul, Paul Szabo wrote:
> I changed the proposed patch accordingly, scripts/checkpatch.pl produces > just a few warnings. I had my patch in use for a while now, so I believe > it is suitably tested. Thanks much --- this is easier to read. Here's a quick review from a non-expert (i.e., me). When the patch seems ready to you, please send it inline in a message to linux...@kvack.org, cc-ing linux-ker...@vger.kernel.org and Ben and me (or this bug log) so we can track it. The mm folks will probably have more feedback, and we can take it from there. > Avoid OOM when filesystem caches fill lowmem and are not reclaimed, > doing drop_caches at that point. The issue is easily reproducible on > machines with over 32GB RAM. The patch correctly protects against OOM. > The added call to drop_caches has been observed to trigger "needlessly" > but on quite rare occasions only. Sounds like a reasonable strategy. > Also included are several minor fixes: Those should go in separate patches, but if you're just seeking comments then lumping them with the main change in a patch with [RFC/PATCH] in the subject line is fine. [...] > Signed-off-by: Paul Szabo <p...@maths.usyd.edu.au> Thanks. > --- fs/drop_caches.c.old 2012-10-17 13:50:15.000000000 +1100 > +++ fs/drop_caches.c 2013-01-04 21:52:47.000000000 +1100 > @@ -65,3 +65,10 @@ int drop_caches_sysctl_handler(ctl_table > } > return 0; > } > + > +/* Easy call: do "echo 3 > /proc/sys/vm/drop_caches" */ I don't understand this comment. What does "Easy call" mean? > +void easy_drop_caches(void) > +{ > + iterate_supers(drop_pagecache_sb, NULL); > + drop_slab(); > +} This should be declared in some appropriate header (e.g., include/linux/mm.h). > --- mm/page-writeback.c.old 2012-10-17 13:50:15.000000000 +1100 > +++ mm/page-writeback.c 2013-01-06 21:54:59.000000000 +1100 > @@ -39,7 +39,8 @@ > /* > * Sleep at most 200ms at a time in balance_dirty_pages(). > */ > -#define MAX_PAUSE max(HZ/5, 1) > +/* Might as well be max(HZ/5,4) to ensure max_pause/4>0 always */ > +#define MAX_PAUSE max(HZ/5, 4) This is one of the "while at it"s rather than the main point of the patch, right? [...] > @@ -343,11 +344,26 @@ static unsigned long highmem_dirtyable_m > unsigned long determine_dirtyable_memory(void) > { > unsigned long x; > + int y = 0; > + extern int min_free_kbytes; Probably should be declared in a header like mm/internal.h, but there's precedent for the ad-hoc declaration, so meh. [...] > + /* > + * Seems that highmem_is_dirtyable is only used here, in the > + * calculation of limits and threshholds of dirtiness, not in deciding > + * where to put dirty things. Is that so? Is that as should be? > + * What is the recommended setting of highmem_is_dirtyable? > + */ > if (!vm_highmem_is_dirtyable) > x -= highmem_dirtyable_memory(x); I dunno. See 195cf453d2c3 (mm/page-writeback: highmem_is_dirtyable option, 2008-02-04) and the thread surrounding <http://thread.gmane.org/gmane.linux.alsa.devel/49438/focus=603793>. > + /* Subtract min_free_kbytes */ > + if (min_free_kbytes > 0) > + y = min_free_kbytes >> (PAGE_SHIFT - 10); Why the "if"? If I were doing it, I think I'd unconditionally set 'y' here, for clarity and to avoid bugs if some later patch reuses the var for something else. y = min_free_kbytes >> (PAGE_SHIFT - 10); x -= min(x, y); [...] > @@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio( > * => fast response on large errors; small oscillation near setpoint > */ > setpoint = (freerun + limit) / 2; > - x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT, > + x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT, > limit - setpoint + 1); Why are 'setpoint' and 'dirty' of type unsigned long instead of long in the first place? What happens if one of their values overflows a long? > pos_ratio = x; > pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT; > @@ -995,6 +1014,13 @@ static unsigned long bdi_max_pause(struc > * The pause time will be settled within range (max_pause/4, max_pause). > * Apply a minimal value of 4 to get a non-zero max_pause/4. > */ > + /* > + * On large machine it seems we always return 4, > + * on smaller desktop machine mostly return 5 (rarely 9 or 14). > + * Are those too small? Should we return something fixed e.g. > + return (HZ/10); > + * instead of this wasted/useless calculation? > + */ > return clamp_val(t, 4, MAX_PAUSE); Another "while at it", I guess. > } > > @@ -1109,6 +1135,11 @@ static void balance_dirty_pages(struct a > } > pause = HZ * pages_dirtied / task_ratelimit; > if (unlikely(pause <= 0)) { > + /* > + * Not unlikely: often we get zero. > + * Seems we always get 0 on large machine. > + * Should not do a pause of 1 here? > + */ > trace_balance_dirty_pages(bdi, "git log -S'if (unlikely(pause <= 0))' -- mm/page-writeback.c" tells me this is from 57fc978cfb61 (writeback: control dirty pause time, 2011-06-11), in case that helps. [...] > --- mm/vmscan.c.old 2012-10-17 13:50:15.000000000 +1100 > +++ mm/vmscan.c 2013-01-06 09:50:49.000000000 +1100 [...] > @@ -2726,9 +2731,87 @@ loop_again: > nr_slab = shrink_slab(&shrink, sc.nr_scanned, > lru_pages); > sc.nr_reclaimed += > reclaim_state->reclaimed_slab; > total_scanned += sc.nr_scanned; > + if (unlikely( > + i == 1 && > + nr_slab < 10 && > + (reclaim_state->reclaimed_slab) < 10 && > + zone_page_state(zone, NR_SLAB_RECLAIMABLE) > > 10 && > + !zone_watermark_ok_safe(zone, order, > + high_wmark_pages(zone), > end_zone, 0))) { > + /* > + * We are stressed (desperate), better This is getting really deeply nested. Would it be possible to split out a function so this code could be more easily contemplated in isolation? Thanks for taking this on and hope that helps. Regards, Jonathan -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20130107012024.GC3823@elie.Belkin