Dear Jonathan, > Here's a quick review from a non-expert (i.e., me). ...
Thanks! > ... When the patch > seems ready to you, please send it inline in a message to > linux...@kvack.org, cc-ing linux-ker...@vger.kernel.org and Ben and me > (or this bug log) so we can track it. The mm folks will probably have > more feedback, and we can take it from there. I was hoping that Ben would take that on; but OK, will do (first waiting for maybe Ben's comments, and your reply to my questions below). >> Also included are several minor fixes: > Those should go in separate patches, but if you're just seeking > comments then lumping them with the main change in a patch with > [RFC/PATCH] in the subject line is fine. In a sense, I am seeking comments on why lowmem is ever polluted with caches and why slabs are not cleared up without dropping pagecache, as I feel that the "real bug" is lurking there. - I admit that I do not understand MM, it seems a mess to my un-educatedf eyes. - Some of those comments are questions, some others I am sure are bugs. OK, will use [RFC/PATCH] in the subject. >> +/* Easy call: do "echo 3 > /proc/sys/vm/drop_caches" */ > > I don't understand this comment. What does "Easy call" mean? > >> +void easy_drop_caches(void) >> +{ >> + iterate_supers(drop_pagecache_sb, NULL); >> + drop_slab(); >> +} > > This should be declared in some appropriate header (e.g., > include/linux/mm.h). I added one new call, so I could easily call it from elsewhere without having to (cross?)-declare many things; in fact would have preferred to call iterate_supers() and drop_slab() directly. Yes, easy_drop_caches() might be declared in some include file, but should that be <linux/mm.h> or <linux/fs.h>? Would add another file to those changed, seemed more direct this way. Should I change something? >> --- mm/page-writeback.c.old 2012-10-17 13:50:15.000000000 +1100 >> +++ mm/page-writeback.c 2013-01-06 21:54:59.000000000 +1100 >> @@ -39,7 +39,8 @@ >> /* >> * Sleep at most 200ms at a time in balance_dirty_pages(). >> */ >> -#define MAX_PAUSE max(HZ/5, 1) >> +/* Might as well be max(HZ/5,4) to ensure max_pause/4>0 always */ >> +#define MAX_PAUSE max(HZ/5, 4) > > This is one of the "while at it"s rather than the main point of > the patch, right? Yes, it was one of the unimportant oddities I noticed. > [...] >> @@ -343,11 +344,26 @@ static unsigned long highmem_dirtyable_m >> unsigned long determine_dirtyable_memory(void) >> { >> unsigned long x; >> + int y = 0; >> + extern int min_free_kbytes; > > Probably should be declared in a header like mm/internal.h, but > there's precedent for the ad-hoc declaration, so meh. Yes, I noticed some precedents. > [...] >> + /* >> + * Seems that highmem_is_dirtyable is only used here, in the >> + * calculation of limits and threshholds of dirtiness, not in deciding >> + * where to put dirty things. Is that so? Is that as should be? >> + * What is the recommended setting of highmem_is_dirtyable? >> + */ >> if (!vm_highmem_is_dirtyable) >> x -= highmem_dirtyable_memory(x); > > I dunno. See 195cf453d2c3 (mm/page-writeback: highmem_is_dirtyable > option, 2008-02-04) and the thread surrounding > <http://thread.gmane.org/gmane.linux.alsa.devel/49438/focus=603793>. That made my head spin... Thanks for the pointers, will go through them sometime. >> + /* Subtract min_free_kbytes */ >> + if (min_free_kbytes > 0) >> + y = min_free_kbytes >> (PAGE_SHIFT - 10); > > Why the "if"? Sanity-check paranoia: because I do not trust the input values. Maybe should have BUG_ON(min_free_kbytes < 0); but that does not belong there. > If I were doing it, I think I'd unconditionally set 'y' here, for > clarity and to avoid bugs if some later patch reuses the var for > something else. > > y = min_free_kbytes >> (PAGE_SHIFT - 10); > x -= min(x, y); Do you think I should change my patch? To me, it seemed clearer the way I had it. > [...] >> @@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio( >> * => fast response on large errors; small oscillation near setpoint >> */ >> setpoint = (freerun + limit) / 2; >> - x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT, >> + x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT, >> limit - setpoint + 1); > > Why are 'setpoint' and 'dirty' of type unsigned long instead of long > in the first place? What happens if one of their values overflows a > long? I guess because most other such things are also unsigned long; and I guess they were so when they meant memory addresses. These are now page counts, cannot overflow (with current limit of 64GB RAM). >> @@ -995,6 +1014,13 @@ static unsigned long bdi_max_pause(struc >> * The pause time will be settled within range (max_pause/4, max_pause). >> * Apply a minimal value of 4 to get a non-zero max_pause/4. >> */ >> + /* >> + * On large machine it seems we always return 4, >> + * on smaller desktop machine mostly return 5 (rarely 9 or 14). >> + * Are those too small? Should we return something fixed e.g. >> + return (HZ/10); >> + * instead of this wasted/useless calculation? >> + */ >> return clamp_val(t, 4, MAX_PAUSE); > > Another "while at it", I guess. Yes. On one hand the code sometimes pauses for HZ/10 or so, and on the other hand we have this routine working so hard to return the minimum possible value. >> @@ -1109,6 +1135,11 @@ static void balance_dirty_pages(struct a >> } >> pause = HZ * pages_dirtied / task_ratelimit; >> if (unlikely(pause <= 0)) { >> + /* >> + * Not unlikely: often we get zero. >> + * Seems we always get 0 on large machine. >> + * Should not do a pause of 1 here? >> + */ >> trace_balance_dirty_pages(bdi, > > "git log -S'if (unlikely(pause <= 0))' -- mm/page-writeback.c" tells > me this is from 57fc978cfb61 (writeback: control dirty pause time, > 2011-06-11), in case that helps. Will try to look it up sometime. - I had printk() to tell me the value of pause, and mostly I got zero. Wonder how others measured it. > [...] >> --- mm/vmscan.c.old 2012-10-17 13:50:15.000000000 +1100 >> +++ mm/vmscan.c 2013-01-06 09:50:49.000000000 +1100 > [...] >> @@ -2726,9 +2731,87 @@ loop_again: >> nr_slab = shrink_slab(&shrink, sc.nr_scanned, >> lru_pages); >> sc.nr_reclaimed += >> reclaim_state->reclaimed_slab; >> total_scanned += sc.nr_scanned; >> + if (unlikely( >> + i == 1 && >> + nr_slab < 10 && >> + (reclaim_state->reclaimed_slab) < 10 && >> + zone_page_state(zone, NR_SLAB_RECLAIMABLE) >> > 10 && >> + !zone_watermark_ok_safe(zone, order, >> + high_wmark_pages(zone), >> end_zone, 0))) { >> + /* >> + * We are stressed (desperate), better > > This is getting really deeply nested. Would it be possible to split > out a function so this code could be more easily contemplated in > isolation? Hmm... I would much prefer to leave it as is. Thanks, Paul Paul Szabo p...@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/ School of Mathematics and Statistics University of Sydney Australia -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/201301070303.r0733hs0026...@como.maths.usyd.edu.au