From: Michal Hocko
4.17+ kernels offer a new MAP_FIXED_NOREPLACE flag which allows the caller to
atomically probe for a given address range.
[wording heavily updated by John Hubbard]
Signed-off-by: Michal Hocko
---
Hi,
Andrew has sent MAP_FIXED_NOREPLACE to Linus for the upcoming merge
window
On 2016-06-10 11:50, Sergey Senozhatsky wrote:
Hello,
forked from http://marc.info/?l=linux-mm&m=146553910928716&w=2
new_slab()->BUG->die()->exit_signals() can be called from atomic
context: local IRQs disabled in slab_alloc().
I have sent a patch to drop the BUG() from that path today. It is
From: Michal Hocko
__alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
__GFP_NOFAIL is requested. This is fragile because we are basically
relying on somebody else to do the reclaim (be it the direct reclaim
or the OOM killer) for us. The caller might be holding resources (e.g.
l
Hi,
this has been posted
http://lkml.kernel.org/r/1447343618-19696-1-git-send-email-mhocko%40kernel.org
last week. David has asked for the patch to be split into two parts:
one to remove and open-code __alloc_pages_high_priority without
any functional changes and the other one which changes the retry
From: Michal Hocko
__alloc_pages_high_priority doesn't do anything special other than it
calls get_page_from_freelist and loops around GFP_NOFAIL allocation
until it succeeds. It would be better if the first part was done in
__alloc_pages_slowpath where we modify the zonelist because this would
b
Reported-by: Tetsuo Handa
Signed-off-by: Michal Hocko
---
Hi,
this has been posted previously as a part of a larger GFP_NOFS-related
patch set
(http://lkml.kernel.org/r/1438768284-30927-1-git-send-email-mhocko%40kernel.org)
but I think it makes sense to discuss it even out of that scope.
I wou
-send-email-mhocko%40kernel.org)
but Andrea was asking basically the same thing at LSF early this year
(I cannot seem to find it in any public archive though). I think the
patch makes some sense on its own.
Comments?
mm/page_alloc.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
From: Michal Hocko
jbd2_alloc is explicit about its allocation preferences wrt. the
allocation size. Sub-page allocations go to the slab allocator
and larger ones use either the page allocator or vmalloc. This
is all good but the logic is unnecessarily complex. Requests larger
than order-3 are d
Hi,
while working on something unrelated I've checked the current usage
of __GFP_REPEAT in the tree. It seems that a good half of it is
and always has been bogus because __GFP_REPEAT has always been about
high order allocations while we are using it for order-0 or very small
orders very often. It s
From: Michal Hocko
__GFP_REPEAT has rather weak semantics, and since it was introduced
around 2.6.12 it has been ignored for low-order allocations. Yet we have
users which require this flag even though they are doing order-0 or
small order allocation in the end:
* arc: pte_alloc_one_kernel
From: Michal Hocko
__GFP_REPEAT has rather weak semantics, and since it was introduced
around 2.6.12 it has been ignored for low-order allocations. Yet the
kernel tree is full of its uses for apparently order-0 allocations.
This is really confusing because __GFP_REPEAT is explicitly
Hi,
as pointed out by Linus [1][2], relying on zone_reclaimable as a way to
communicate the reclaim progress is rather dubious. I tend to agree:
not only is it really obscure, it is not hard to imagine cases where a
single page freed in the loop keeps all the reclaimers looping without
getting any progre
From: Michal Hocko
__alloc_pages_slowpath has traditionally relied on the direct reclaim
and did_some_progress as an indicator that it makes sense to retry
allocation rather than declaring OOM. shrink_zones had to rely on
zone_reclaimable if shrink_zone didn't make any progress to prevent
from pr
From: Michal Hocko
__alloc_pages_slowpath retries costly allocations until at least
order worth of pages have been reclaimed or, if the reclaim hasn't
made any progress, until the watermark check for at least one zone
would succeed after reclaiming all pages.
The first condition was added by a41f24ea9fd6
From: Michal Hocko
wait_iff_congested has been used to throttle the allocator before it retried
another round of direct reclaim to allow the writeback to make some
progress and prevent reclaim from looping over dirty/writeback pages
without making any progress. We used to do congestion_wait before
0e
From: Michal Hocko
424cdc141380 ("memcg: convert threshold to bytes") has fixed a
regression introduced by 3e32cb2e0a12 ("mm: memcontrol: lockless page
counters") where thresholds were silently converted to use page units
rather than bytes when interpreting the user input.
The fix is not complet
From: Michal Hocko
6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL
used in the page cache allocation paths. This, however, wasn't complete
and there were others which went unnoticed.
Dave Chinner has reported the fol
From: Michal Hocko
Andy has reported a __might_sleep warning
[ 5174.883617] WARNING: CPU: 0 PID: 1532 at /home/agrover/git/kernel/kernel/sched/core.c:7389 __might_sleep+0x7d/0x90()
[ 5174.884407] do not call blocking ops when !TASK_RUNNING; state=1 set at [] uio_read+0x91/0x170 [uio]
[ 5174.8851
From: Michal Hocko
b9d5c6b7ef57 ("[SCSI] cleanup setting task state in
scsi_error_handler()") has introduced a race between scsi_error_handler
and scsi_host_dev_release resulting in a hang when the device goes
away because scsi_error_handler might miss a wake up:
CPU0
From: Michal Hocko
It seems that skb_propagate_pfmemalloc has never had a user since it was
introduced by 0614002bb5f7 ("netvm: propagate page->pfmemalloc from
skb_alloc_page to skb"). Remove it.
Signed-off-by: Michal Hocko
---
Hi,
this has been noticed while working on 2f064f3485cd ("mm: make
From: Michal Hocko
alloc_btrfs_bio relies on GFP_NOFS allocation when committing the
transaction but this allocation context is rather weak wrt. reclaim
capabilities. The page allocator currently tries hard to not fail these
allocations if they are small (<=PAGE_ALLOC_COSTLY_ORDER) but it can
sti
From: Michal Hocko
Btrfs relies on GFP_NOFS allocation when committing the transaction but
this allocation context is rather weak wrt. reclaim capabilities. The
page allocator currently tries hard to not fail these allocations if
they are small (<=PAGE_ALLOC_COSTLY_ORDER) so this is not a problem
Hi,
these two patches were sent as a part of a larger RFC which aims at
allowing GFP_NOFS allocations to fail to help sort out memory reclaim
issues bound to the current behavior
(http://marc.info/?l=linux-mm&m=143876830616538&w=2).
It is clear that the move to the GFP_NOFS behavior change is a long t
From: Michal Hocko
The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
added the checks for page->pfmemalloc to __skb_fill_page_desc():
	if (page->pfmemalloc && !page->mapping)
		skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfm
From: Michal Hocko
page_cache_read has been historically using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems set this mask to
GFP_NOFS to prevent fs recursion issues. page_cache_read is,
however,
From: Johannes Weiner
GFP_NOFS allocations are not allowed to invoke the OOM killer since
their reclaim abilities are severely diminished. However, without the
OOM killer available there is no hope of progress once the reclaimable
pages have been exhausted.
Don't risk hanging these allocations.
From: Michal Hocko
Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
the memory allocator doesn't endlessly loop to satisfy low-order allocations
and instead fails them to allow callers to handle them gracefully.
Some of the callers are not yet prepared for this behavior though. e
From: Michal Hocko
journal_get_undo_access is relying on GFP_NOFS allocation yet it is
essential for the journal transaction:
[ 83.256914] journal_get_undo_access: No memory for committed data
[ 83.258022] EXT3-fs: ext3_free_blocks_sb: aborting transaction: Out of memory in __ext3_journal_ge
From: Michal Hocko
alloc_btrfs_bio is relying on GFP_NOFS to allocate a bio but since "mm:
page_alloc: do not lock up GFP_NOFS allocations upon OOM" this is
allowed to fail which can lead to
[ 37.928625] kernel BUG at fs/btrfs/extent_io.c:4045
This is clearly undesirable and the nofail behavio
From: Michal Hocko
Btrfs relies on GFP_NOFS allocation when committing the transaction but
since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
those allocations are allowed to fail, which can lead to a premature
transaction abort:
[ 55.328093] Call Trace:
[ 55.328890] [] dum
Hi,
small GFP_NOFS allocations, like GFP_KERNEL ones, have traditionally
not been failing even though their reclaim capabilities are restricted
because the VM code cannot recurse into filesystems to clean dirty
pages. At the same time these allocation requests are not allowed to
trigger the OOM killer
From: Michal Hocko
__GFP_NOFAIL is a big hammer used to ensure that the allocation
request can never fail. This is a strong requirement and as such
it also deserves special treatment when the system is OOM. The
primary problem here is that the allocation request might have
come with some locks
From: Michal Hocko
A journal transaction might fail prematurely because the frozen_buffer
is allocated by a GFP_NOFS request:
[ 72.440013] do_get_write_access: OOM for frozen_buffer
[ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_wri