On 2017-01-19 Michal Hocko wrote:
> On Thu 19-01-17 03:48:50, Trevor Cordes wrote:
> > On 2017-01-17 Michal Hocko wrote:
> > > On Tue 17-01-17 14:21:14, Mel Gorman wrote:
> > > > On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:
> > > > > On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> > > > > [...]
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 532a2a750952..46aac487b89a 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > > > > >  			continue;
> > > > > >  
> > > > > >  		if (sc->priority != DEF_PRIORITY &&
> > > > > > +		    !buffer_heads_over_limit &&
> > > > > >  		    !pgdat_reclaimable(zone->zone_pgdat))
> > > > > >  			continue;	/* Let kswapd poll it */
> > > > >
> > > > > I think we should rather remove pgdat_reclaimable here. This
> > > > > sounds like the wrong layer to decide whether we want to
> > > > > reclaim and how much.
> > > >
> > > > I had considered that but it'd also be important to add the
> > > > other 32-bit patches you have posted to see the impact. Because
> > > > of the ratio of LRU pages to slab pages, it may not have an
> > > > impact but it'd need to be eliminated.
> > >
> > > OK, Trevor, you can pull from the
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git tree,
> > > fixes/highmem-node-fixes branch. This contains the current mmotm
> > > tree + the latest highmem fixes. I also do not expect this would
> > > help much in your case but, as Mel has said, we should rule that
> > > out at least.
> >
> > Hi! The version from the git tree above oom'd after < 24 hours
> > (3:02am), so it doesn't solve the bug. If you need an oom messages
> > dump let me know.
>
> Yes please.
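(For anyone wanting to reproduce this: pulling Michal's tree above
boils down to roughly the following. This is only a sketch; the config
and build steps are just the usual kernel routine and will vary by
distro.)

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
    cd mm
    git checkout fixes/highmem-node-fixes
    # build against the running kernel's config
    cp /boot/config-$(uname -r) .config
    make olddefconfig
    make -j$(nproc) && sudo make modules_install install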
The first oom from that night is attached. Note: the oom wasn't as dire
with your mhocko/4.9.0+ as it usually is with stock 4.8.x: my oom
detector and reboot script was able to do its thing cleanly before the
system became unusable. I'll await further instructions and test right
away. Maybe I'll try a few tuning ideas until then. Thanks!

> > Let me know what to try next, guys, and I'll test it out.
> >
> > > > Before prototyping such a thing, I'd like to hear the outcome of
> > > > this heavy hack and then add your 32-bit patches onto the list.
> > > > If the problem is still there then I'd next look at taking slab
> > > > pages into account in pgdat_reclaimable() instead of an outright
> > > > removal that has a much wider impact. If that doesn't work then
> > > > I'll prototype a heavy-handed forced slab reclaim when lower
> > > > zones are almost all slab pages.
> >
> > I don't think I've tried the "heavy hack" patch yet? It's not in the
> > mhocko tree I just tried? Should I try the heavy hack on top of the
> > mhocko git tree, or on vanilla, or what?
> >
> > I also want to mention that these PAE boxes suffer from another
> > problem/bug that I've worked around for almost a year now. For some
> > reason it keeps gnawing at me that it might be related. The disk I/O
> > goes to pot on these PAE boxes after a certain amount of disk writes
> > (some unknown number of GB, around 10-ish maybe). Writes go from
> > 500MB/s to 10MB/s!! Reboot and it's magically 500MB/s again. I
> > detail this here:
> > https://muug.ca/pipermail/roundtable/2016-June/004669.html
> > My fix was to pass mem=XG where X is <8 (like 4 or 6) to force the
> > PAE kernel to be more sane about highmem choices. I never filed a
> > bug because I read a ton of stuff saying Linus hates PAE, don't use
> > over 4G, blah blah. But the other fix is to:
> > set /proc/sys/vm/highmem_is_dirtyable to 1
>
> Yes, this sounds like dirty memory throttling and there were some
> changes in that area. I do not remember when exactly.

I think my PAE-slow-IO bug started way back in Fedora 22 (4.0?); it's
hard to know exactly when, since I didn't discover the bug for maybe a
year (I didn't realize I/O was the problem right away). Too late to
bisect that one now. I guess it's not related, so we can ignore my
tangent!

> > I'm not bringing this up to get attention for a new bug; I bring it
> > up because it smells like it might be related. If something slowly
> > eats away at the box's vm to the point that I/O gets horribly slow,
> > perhaps it's related to the slab and high/lowmem issue we have here?
> > And if related, it may help to solve the oom bug. If I'm way off
> > base here, just ignore my tangent!
>
> From your OOM reports so far it doesn't really seem related because
> you never had a large number of pages under writeback when OOM.
>
> The situation with the PAE kernel is unfortunate but it is really hard
> to do anything about that considering that the kernel and most of its
> allocations have to live in a small and scarce lowmem. Moreover, the
> more memory you have, the more you have to allocate from that memory.

You're for sure right that the IO-slow bug was definitely worse the
more RAM was in a system! (The mem=4G really helps alleviate this bug
and is good enough for me; the exact settings are sketched below.)

> This is why not only Linus hates 32-bit systems with large amounts of
> memory.
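The two workarounds mentioned above look roughly like this on my boxes
(the 4G value is simply what happens to work for me, nothing official):

    # cap the PAE kernel's visible RAM from the bootloader, e.g. by
    # appending to the kernel command line in grub:
    mem=4G

    # and/or let highmem count as dirtyable memory at runtime:
    echo 1 > /proc/sys/vm/highmem_is_dirtyable
    # (persistent equivalent: vm.highmem_is_dirtyable = 1 in /etc/sysctl.conf)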
Completely off-topic: rather than pretending PAE should work with large
amounts of RAM (which seems more broken every day), it would be great
if the kernel guys put out an officially stated policy on the maximum
RAM you can use, tried to have the kernel behave for anything <= that
size, and let people use more RAM clearly "at your own risk, don't bug
us about problems!". Other than a few posts about Linus hating it,
there's nothing official I can find in the documentation, etc. That
gives the (mis)impression that it's perfectly fine to run PAE on a
zillion-GB modern system, and then we learn the hard way :-)
[Attachment: oom3 (binary data)]