[PATCH] mmap.2: document new MAP_FIXED_NOREPLACE flag

2018-04-11 Thread mhocko
From: Michal Hocko 4.17+ kernels offer a new MAP_FIXED_NOREPLACE flag which allows the caller to atomically probe for a given address range. [wording heavily updated by John Hubbard] Signed-off-by: Michal Hocko --- Hi, Andrew's sent the MAP_FIXED_NOREPLACE to Linus for the upcoming merge window
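
For readers unfamiliar with the flag, the following is a minimal user-space sketch of the "atomic probe" the patch describes, not code taken from the patch itself. The helper name map_exactly_at is hypothetical, and the fallback value 0x100000 for MAP_FIXED_NOREPLACE is an assumption that holds on most architectures but not all. Kernels older than 4.17 do not recognize the flag and typically treat the address as a hint, so the sketch also compares the returned address against the requested one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: older libc headers may lack the flag. 0x100000 is the value on
 * most architectures, but not all; check the kernel headers for your target. */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: map len bytes of anonymous memory at exactly addr,
 * refusing to replace anything that is already mapped there.
 * Returns the mapping on success, or NULL with errno set.
 */
static void *map_exactly_at(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    if (p == MAP_FAILED)
        return NULL;            /* 4.17+ reports a collision as EEXIST */

    if (p != addr) {
        /* Older kernel: the flag was ignored and addr was only a hint.
         * Undo the stray mapping and report the range as occupied. */
        munmap(p, len);
        errno = EEXIST;
        return NULL;
    }
    return p;
}
```

A caller would pass a page-aligned address. A NULL return with errno set to EEXIST means the range was already occupied, and no existing mapping has been touched.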

Re: [mmots-2016-06-09-16-49] sleeping function called from slab_alloc()

2016-06-10 Thread mhocko
On 2016-06-10 11:50, Sergey Senozhatsky wrote: Hello, forked from http://marc.info/?l=linux-mm&m=146553910928716&w=2 new_slab()->BUG->die()->exit_signals() can be called from atomic context: local IRQs disabled in slab_alloc(). I have sent a patch to drop the BUG() from that path today. It is

[PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim

2015-11-16 Thread mhocko
From: Michal Hocko __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if __GFP_NOFAIL is requested. This is fragile because we are basically relying on somebody else to make the reclaim (be it the direct reclaim or OOM killer) for us. The caller might be holding resources (e.g. l

[PATCH 0/2] get rid of __alloc_pages_high_priority

2015-11-16 Thread mhocko
Hi, this has been posted http://lkml.kernel.org/r/1447343618-19696-1-git-send-email-mhocko%40kernel.org last week. David has requested to split the patch into two parts: one to remove and open-code __alloc_pages_high_priority without any functional changes and the other one which changes the retry

[PATCH 1/2] mm: get rid of __alloc_pages_high_priority

2015-11-16 Thread mhocko
From: Michal Hocko __alloc_pages_high_priority doesn't do anything special other than it calls get_page_from_freelist and loops around GFP_NOFAIL allocation until it succeeds. It would be better if the first part was done in __alloc_pages_slowpath where we modify the zonelist because this would b

[PATCH] mm: get rid of __alloc_pages_high_priority

2015-11-12 Thread mhocko
From: Michal Hocko __alloc_pages_high_priority doesn't do anything special other than it calls get_page_from_freelist and loops around GFP_NOFAIL allocation until it succeeds. It would be better if the first part was done in __alloc_pages_slowpath where we modify the zonelist because this would b

[PATCH] mm: Allow GFP_IOFS for page_cache_read page cache allocation

2015-11-11 Thread mhocko
e. Reported-by: Tetsuo Handa Signed-off-by: Michal Hocko --- Hi, this has been posted previously as a part of larger GFP_NOFS related patch set (http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org) but I think it makes sense to discuss it even out of that scope. I wou

[PATCH] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves

2015-11-11 Thread mhocko
-send-email-mhocko%40kernel.org) but Andrea was asking basically the same thing at LSF early this year (I cannot seem to find it in any public archive though). I think the patch makes some sense on its own. Comments? mm/page_alloc.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion

[PATCH] jbd2: get rid of superfluous __GFP_REPEAT

2015-11-06 Thread mhocko
From: Michal Hocko jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. Requests larger than order-3 are d

[PATCH 0/3] __GFP_REPEAT cleanup

2015-11-05 Thread mhocko
Hi, while working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a good half of it is and always has been bogus because __GFP_REPEAT has always been about high order allocations while we are using it for order-0 or very small orders very often. It s

[PATCH 2/3] tree wide: get rid of __GFP_REPEAT for small order requests

2015-11-05 Thread mhocko
From: Michal Hocko __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have users which require this flag even though they are doing order-0 or small order allocation in the end: * arc: pte_alloc_one_kernel

[PATCH 3/3] jbd2: get rid of superfluous __GFP_REPEAT

2015-11-05 Thread mhocko
From: Michal Hocko jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. Requests larger than order-3 are d

[PATCH 1/3] tree wide: get rid of __GFP_REPEAT for order-0 allocations part I

2015-11-05 Thread mhocko
From: Michal Hocko __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly

RFC: OOM detection rework v1

2015-10-29 Thread mhocko
Hi, as pointed out by Linus [1][2] relying on zone_reclaimable as a way to communicate the reclaim progress is rather dubious. I tend to agree, not only it is really obscure, it is not hard to imagine cases where a single page freed in the loop keeps all the reclaimers looping without getting any progre

[RFC 1/3] mm, oom: refactor oom detection

2015-10-29 Thread mhocko
From: Michal Hocko __alloc_pages_slowpath has traditionally relied on the direct reclaim and did_some_progress as an indicator that it makes sense to retry allocation rather than declaring OOM. shrink_zones had to rely on zone_reclaimable if shrink_zone didn't make any progress to prevent from pr

[RFC 3/3] mm: use watermak checks for __GFP_REPEAT high order allocations

2015-10-29 Thread mhocko
From: Michal Hocko __alloc_pages_slowpath retries costly allocations until at least order worth of pages were reclaimed or the watermark check for at least one zone would succeed after all reclaiming all pages if the reclaim hasn't made any progress. The first condition was added by a41f24ea9fd6

[RFC 2/3] mm: throttle on IO only when there are too many dirty and writeback pages

2015-10-29 Thread mhocko
From: Michal Hocko wait_iff_congested has been used to throttle allocator before it retried another round of direct reclaim to allow the writeback to make some progress and prevent reclaim from looping over dirty/writeback pages without making any progress. We used to do congestion_wait before 0e

[PATCH] memcg: Fix thresholds for 32b architectures.

2015-10-27 Thread mhocko
From: Michal Hocko 424cdc141380 ("memcg: convert threshold to bytes") has fixed a regression introduced by 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") where thresholds were silently converted to use page units rather than bytes when interpreting the user input. The fix is not complet

[PATCH] mm, fs: Obey gfp_mapping for add_to_page_cache

2015-09-25 Thread mhocko
From: Michal Hocko 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache allocation paths") has caught some users of hardcoded GFP_KERNEL used in the page cache allocation paths. This, however, wasn't complete and there were others which went unnoticed. Dave Chinner has reported the fol

[PATCH] uio: fix false positive __might_sleep warning splat

2015-09-07 Thread mhocko
From: Michal Hocko Andy has reported a __might_sleep warning [ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90() [ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio] [ 5174.8851

[PATCH] scsi: fix scsi_error_handler vs. scsi_host_dev_release race

2015-08-27 Thread mhocko
From: Michal Hocko b9d5c6b7ef57 ("[SCSI] cleanup setting task state in scsi_error_handler()") has introduced a race between scsi_error_handler and scsi_host_dev_release resulting in the hang when the device goes away because scsi_error_handler might miss a wake up: CPU0

[PATCH] net, skbuff: Get rid of unused skb_propagate_pfmemalloc

2015-08-27 Thread mhocko
From: Michal Hocko It seems that skb_propagate_pfmemalloc has never had a user since it was introduced by 0614002bb5f7 ("netvm: propagate page->pfmemalloc from skb_alloc_page to skb"). Remove it. Signed-off-by: Michal Hocko --- Hi, this has been noticed while working on 2f064f3485cd ("mm: make

[PATCH 2/2] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio

2015-08-19 Thread mhocko
From: Michal Hocko alloc_btrfs_bio relies on GFP_NOFS allocation when committing the transaction but this allocation context is rather weak wrt. reclaim capabilities. The page allocator currently tries hard to not fail these allocations if they are small (<=PAGE_ALLOC_COSTLY_ORDER) but it can sti

[PATCH 1/2] btrfs: Prevent from early transaction abort

2015-08-19 Thread mhocko
From: Michal Hocko Btrfs relies on GFP_NOFS allocation when committing the transaction but this allocation context is rather weak wrt. reclaim capabilities. The page allocator currently tries hard to not fail these allocations if they are small (<=PAGE_ALLOC_COSTLY_ORDER) so this is not a problem

[PATCH 0/2] btrfs: fortification for GFP_NOFS allocations

2015-08-19 Thread mhocko
Hi, these two patches were sent as a part of a larger RFC which aims at allowing GFP_NOFS allocations to fail to help sort out memory reclaim issues bound to the current behavior (http://marc.info/?l=linux-mm&m=143876830616538&w=2). It is clear that move to the GFP_NOFS behavior change is a long t

[PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread mhocko
From: Michal Hocko The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added the checks for page->pfmemalloc to __skb_fill_page_desc(): if (page->pfmemalloc && !page->mapping) skb->pfmemalloc = true; It assumes page->mapping == NULL implies that page->pfm

[RFC 2/8] mm: Allow GFP_IOFS for page_cache_read page cache allocation

2015-08-05 Thread mhocko
From: Michal Hocko page_cache_read has been historically using page_cache_alloc_cold to allocate a new page. This means that mapping_gfp_mask is used as the base for the gfp_mask. Many filesystems are setting this mask to GFP_NOFS to prevent from fs recursion issues. page_cache_read is, however,

[RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM

2015-08-05 Thread mhocko
From: Johannes Weiner GFP_NOFS allocations are not allowed to invoke the OOM killer since their reclaim abilities are severely diminished. However, without the OOM killer available there is no hope of progress once the reclaimable pages have been exhausted. Don't risk hanging these allocations.

[RFC 5/8] ext4: Do not fail journal due to block allocator

2015-08-05 Thread mhocko
From: Michal Hocko Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" memory allocator doesn't endlessly loop to satisfy low-order allocations and instead fails them to allow callers to handle them gracefully. Some of the callers are not yet prepared for this behavior though. e

[RFC 6/8] ext3: Do not abort journal prematurely

2015-08-05 Thread mhocko
From: Michal Hocko journal_get_undo_access is relying on GFP_NOFS allocation yet it is essential for the journal transaction: [ 83.256914] journal_get_undo_access: No memory for committed data [ 83.258022] EXT3-fs: ext3_free_blocks_sb: aborting transaction: Out of memory in __ext3_journal_ge

[RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio

2015-08-05 Thread mhocko
From: Michal Hocko alloc_btrfs_bio is relying on GFP_NOFS to allocate a bio but since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" this is allowed to fail which can lead to [ 37.928625] kernel BUG at fs/btrfs/extent_io.c:4045 This is clearly undesirable and the nofail behavio

[RFC 7/8] btrfs: Prevent from early transaction abort

2015-08-05 Thread mhocko
From: Michal Hocko Btrfs relies on GFP_NOFS allocation when committing the transaction but since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" those allocations are allowed to fail which can lead to a premature transaction abort: [ 55.328093] Call Trace: [ 55.328890] [] dum

[RFC 0/8] Allow GFP_NOFS allocation to fail

2015-08-05 Thread mhocko
Hi, small GFP_NOFS allocations, like GFP_KERNEL ones, have traditionally not been failing even though their reclaim capabilities are restricted because the VM code cannot recurse into filesystems to clean dirty pages. At the same time these allocation requests do not allow to trigger the OOM killer

[RFC 1/8] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves

2015-08-05 Thread mhocko
From: Michal Hocko __GFP_NOFAIL is a big hammer used to ensure that the allocation request can never fail. This is a strong requirement and as such it also deserves a special treatment when the system is OOM. The primary problem here is that the allocation request might have come with some locks

[RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure

2015-08-05 Thread mhocko
From: Michal Hocko Journal transaction might fail prematurely because the frozen_buffer is allocated by GFP_NOFS request: [ 72.440013] do_get_write_access: OOM for frozen_buffer [ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_wri