test_pages_in_a_zone does not account for the possibility of missing sections
in the given pfn range. Since pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, invalid pfns from missing sections
will pass the test, resulting in a kernel oops. This is remedied by simply
checking for the presence of the pfn's section. We don't have to remove
the pfn_valid_within optimization.
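
For illustration only, here is a userspace toy model of the loop, not
kernel code. Every toy_* name and every TOY_* constant is made up for
this sketch; the real helpers are present_section_nr(),
pfn_to_section_nr() and pfn_valid_within(). It assumes the loop goes on
to look up the struct page of the first pfn that passes the
pfn_valid_within test, and counts how many of those lookups would land
in a missing section, with and without the new check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy geometry; these are NOT the kernel's real values. */
#define TOY_PAGES_PER_SECTION 8UL
#define TOY_NR_SECTIONS       4UL
#define TOY_CHUNK_PAGES       4UL  /* stands in for MAX_ORDER_NR_PAGES */

/* Section 2 (pfns 16..23) is missing in this toy layout. */
static const bool toy_section_present[TOY_NR_SECTIONS] = {
	true, true, false, true
};

/* Stand-in for pfn_to_section_nr(). */
static unsigned long toy_pfn_to_section_nr(unsigned long pfn)
{
	return pfn / TOY_PAGES_PER_SECTION;
}

/* Stand-in for present_section_nr(). */
static bool toy_present_section_nr(unsigned long nr)
{
	return nr < TOY_NR_SECTIONS && toy_section_present[nr];
}

/* Stand-in for pfn_valid_within() with CONFIG_HOLES_IN_ZONE unset. */
static int toy_pfn_valid_within(unsigned long pfn)
{
	(void)pfn;
	return 1;
}

/*
 * Walk [start_pfn, end_pfn) in chunks, as the patched loop does, and
 * count the chunks whose first "valid" pfn belongs to a missing section.
 * With check_present == false this models the loop before the patch.
 */
unsigned long toy_count_missing_lookups(unsigned long start_pfn,
					unsigned long end_pfn,
					bool check_present)
{
	unsigned long pfn, i, bad = 0;

	assert(start_pfn <= end_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn += TOY_CHUNK_PAGES) {
		/* The check added by the patch: skip missing sections. */
		if (check_present &&
		    !toy_present_section_nr(toy_pfn_to_section_nr(pfn)))
			continue;

		i = 0;
		while (i < TOY_CHUNK_PAGES && !toy_pfn_valid_within(pfn + i))
			i++;
		if (i == TOY_CHUNK_PAGES)
			continue;

		/* Assumed: the real loop would look up pfn + i here. */
		if (!toy_present_section_nr(toy_pfn_to_section_nr(pfn + i)))
			bad++;
	}
	return bad;
}
```

In this layout, walking pfns 0..31 without the presence check reaches
the two chunks that start at pfn 16 and pfn 20, both inside the missing
section. With the check, both chunks are skipped.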

The patch also prevents a crash from offlining memory devices with missing
sections. Despite this, it's probably best to keep

[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections

because missing sections may indicate other problems, like overlapping
memory blocks and who knows what else (see the discussion at BZ 107781).

---
 mm/memory_hotplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..74f5bcd 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1383,6 +1383,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
             pfn < end_pfn;
             pfn += MAX_ORDER_NR_PAGES) {
                i = 0;
+               /* Make sure the memory section is present */
+               if (!present_section_nr(pfn_to_section_nr(pfn)))
+                       continue;
                /* This is just a CONFIG_HOLES_IN_ZONE check.*/
                while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i))
                        i++;
-- 
1.7.12.4


On 12/02/2015 04:45 PM, Andrew Morton wrote:
> On Wed,  2 Dec 2015 09:07:01 -0600 Seth Jennings <sjenni...@variantweb.net> wrote:
> 
>> bdee237c and 982792c7 introduced large block sizes for x86.
>> This made it possible to have multiple sections per memory
>> block where previously, there was only ever one section
>> per block.
>>
>> Since blocks consist of contiguous ranges of sections, there
>> can be holes in the blocks where sections are not present.
>> If one attempts to offline such a block, a crash occurs since
>> the code is not designed to deal with this.
>>
>> This patch is a quick fix to guard against the crash by
>> not allowing blocks with non-present sections to be offlined.
>>
>> ...
>>
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -303,6 +303,10 @@ static int memory_subsys_offline(struct device *dev)
>>      if (mem->state == MEM_OFFLINE)
>>              return 0;
>>  
>> +    /* Can't offline block with non-present sections */
>> +    if (mem->section_count != sections_per_block)
>> +            return -EINVAL;
>> +
>>      return memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
>>  }
> 
> [3/3] fixes a kernel crash so I've tagged it for -stable and shall move
> it ahead of [1/3] and [2/3], which are merely cleanups.
> 
> This assumes that [3/3] is independent of the other two patches.  I'll
> eat my hat if it isn't.
> 
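
To make the difference between the two approaches concrete: [3/3]
refuses the whole offline request when a block has a hole, while the
patch above only skips the missing sections during the zone test. A
userspace toy model of the [3/3] guard, for illustration only, might
look like the following. The toy_* names and TOY_SECTIONS_PER_BLOCK are
made up for this sketch, and the state change is a simplified stand-in
for memory_block_change_state().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical value; the real sections_per_block is computed at boot. */
#define TOY_SECTIONS_PER_BLOCK 4

enum toy_mem_state { TOY_MEM_ONLINE, TOY_MEM_OFFLINE };

/* Minimal stand-in for struct memory_block. */
struct toy_memory_block {
	enum toy_mem_state state;
	int section_count;	/* number of present sections in the block */
};

/* Models the guarded offline path from the quoted [3/3] hunk. */
int toy_memory_subsys_offline(struct toy_memory_block *mem)
{
	assert(mem != NULL);

	/* Already offline: nothing to do. */
	if (mem->state == TOY_MEM_OFFLINE)
		return 0;

	/* Can't offline a block with non-present sections. */
	if (mem->section_count != TOY_SECTIONS_PER_BLOCK)
		return -EINVAL;

	/* Simplified stand-in for the real state transition. */
	mem->state = TOY_MEM_OFFLINE;
	return 0;
}
```

A block with all four sections goes offline, and a block with only
three is rejected with -EINVAL and stays online.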