On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
> Calling a kmalloc_node on a possible node which is not yet onlined can
> lead to panic. Currently node_present_pages() doesn't verify the node is
> online before accessing the pgdat for the node. However pgdat struct may
> not be available resulting in a crash.
> 
> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
> Call Trace:
> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
> 
> Fix this by verifying the node is online before accessing the pgdat
> structure. Fix the same for node_spanned_pages() too.
> 
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: Mel Gorman <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Sachin Sant <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: [email protected]
> Cc: Joonsoo Kim <[email protected]>
> Cc: Kirill Tkhai <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Srikar Dronamraju <[email protected]>
> Cc: Bharata B Rao <[email protected]>
> Cc: Nathan Lynch <[email protected]>
> 
> Reported-by: Sachin Sant <[email protected]>
> Tested-by: Sachin Sant <[email protected]>
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
>  include/linux/mmzone.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f3f264826423..88078a3b95e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>       atomic_long_t           vm_stat[NR_VM_NODE_STAT_ITEMS];
>  } pg_data_t;
>  
> -#define node_present_pages(nid)      (NODE_DATA(nid)->node_present_pages)
> -#define node_spanned_pages(nid)      (NODE_DATA(nid)->node_spanned_pages)
> +#define node_present_pages(nid)              \
> +     (node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> +#define node_spanned_pages(nid)              \
> +     (node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)

I believe this is the wrong approach. We really do not want to
special-case every place that requires NODE_DATA. Can we please go and
allocate a pgdat for all possible nodes?

The current state of the memoryless node hacks, with subtle bugs popping
up here and there, just proves that we should have done that from the
very beginning IMHO.
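
To illustrate the idea of allocating a pgdat for every possible node,
here is a minimal userspace sketch, not kernel code. The array-backed
NODE_DATA() and node_possible_map are simplified stand-ins for the
kernel's, and alloc_pgdat_for_possible_nodes() and total_present_pages()
are hypothetical names made up for this example. It only shows that once
every possible node has a zeroed pgdat, the original unguarded accessors
are safe and return 0 for nodes without memory.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NUMNODES 4	/* small fixed value for the model only */

/* Reduced stand-in for the kernel's pglist_data. */
struct pglist_data {
	unsigned long node_present_pages;
	unsigned long node_spanned_pages;
};

/* Model: nodes 0-2 are possible, node 3 is not. */
static struct pglist_data *node_data[MAX_NUMNODES];
static const bool node_possible_map[MAX_NUMNODES] = { true, true, true, false };

/* Same shape as the unpatched accessors: no node_online() check. */
#define NODE_DATA(nid)		(node_data[(nid)])
#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)

/*
 * Hypothetical helper: give every possible node a zeroed pgdat up
 * front, whether or not it has memory or has been onlined yet.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int alloc_pgdat_for_possible_nodes(void)
{
	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (!node_possible_map[nid] || node_data[nid])
			continue;
		node_data[nid] = calloc(1, sizeof(*node_data[nid]));
		if (!node_data[nid])
			return -1;
	}
	return 0;
}

/*
 * Hypothetical caller: walks all possible nodes with the unguarded
 * accessor. Safe only because every possible node has a pgdat.
 */
static unsigned long total_present_pages(void)
{
	unsigned long total = 0;

	for (int nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_possible_map[nid])
			total += node_present_pages(nid);
	return total;
}
```

With this arrangement callers do not need a per-site online check. The
cost is one pgdat allocation for each possible node that may never get
memory.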

>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  #define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr))
>  #else
> -- 
> 2.18.1

-- 
Michal Hocko
SUSE Labs
