Hi!

> > When the machine does not keep the e820 map persistent across a
> > hibernate resume, a page fault may occur while the image is being
> > written to the snapshot buffer:
> >
> > [ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
> > [ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40
> > [ 17.933469] PGD 2194067 PUD 77ffff067 PMD 2197067 PTE 0
> > [ 17.933469] Oops: 0002 [#1] SMP
> > ...
> >
> > The ffff880069d4f000 page is in an e820 reserved region of the resume
> > boot kernel:
> >
> > [ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
> > ...
> > [ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
> >
> > So snapshot.c marks the pfn in the forbidden pages map. But this
> > page is also in the memory bitmap of the snapshot image, because it is
> > an original page used by the image kernel, so it is also marked as an
> > unsafe (free) page in prepare_image().
> >
> > That means the page is marked both "forbidden" and "free" when
> > resuming, which makes get_buffer() treat it as an allocated unsafe
> > page. snapshot_write_next() then returns this page to load_image,
> > and load_image writes content to this address, but the page was never
> > really allocated. So we get a page fault.
> >
> > Although the root cause is in the BIOS, I think an aggressive check
> > and a clear message in the kernel are better than a page fault for
> > issue tracking, especially when a serial console is unavailable.
> >
> > This patch adds code to mark_unsafe_pages() that checks whether any
> > free page is in a nosave region. If so, it prints a message and
> > returns a fault to stop the whole S4 resume process:
> >
> > [ 8.166004] PM: Image loading progress: 0%
> > [ 8.658717] PM: 0x6796c000 in e820 nosave region: [mem 0x6796c000-0x6796cfff]
> > [ 8.918737] PM: Read 2511940 kbytes in 1.04 seconds (2415.32 MB/s)
> > [ 8.926633] PM: Error -14 resuming
> > [ 8.933534] PM: Failed to load hibernation image, recovering.
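
For readers following along, the check described above can be modelled
roughly like this in user space. This is only a sketch and not the code
from the patch: struct region, pfn_in_regions() and check_free_pfns() are
hypothetical names, and a region is assumed to cover pfns in
[start_pfn, end_pfn). The -EFAULT return matches the "Error -14" in the
log above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a nosave region: pfns in [start_pfn, end_pfn). */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Return 1 if pfn lies inside any of the n regions, 0 otherwise. */
int pfn_in_regions(unsigned long pfn, const struct region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
			return 1;
	return 0;
}

/*
 * Walk the pfns that the image marks as free and fail with -EFAULT
 * as soon as one of them falls inside a nosave region.
 */
int check_free_pfns(const unsigned long *free_pfns, size_t nr_free,
		    const struct region *nosave, size_t nr_nosave)
{
	assert(free_pfns != NULL || nr_free == 0);

	for (size_t i = 0; i < nr_free; i++)
		if (pfn_in_regions(free_pfns[i], nosave, nr_nosave))
			return -EFAULT;
	return 0;
}
```

With a single nosave region covering pfn 0x6796c, as in the log, a free
list that contains 0x6796c makes check_free_pfns() return -EFAULT, while
a list without it returns 0.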
> >
> > v2:
> > + removed empty check of nosave_regions list.
> > + fixed the typo of "region" in code for error message and patch comment.
> >
> > Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
> > Cc: Len Brown <len.br...@intel.com>
> > Cc: Takashi Iwai <ti...@suse.de>
> > Acked-by: Pavel Machek <pa...@ucw.cz>
> > Signed-off-by: Lee, Chun-Yi <j...@suse.com>
>
> I discussed this patch with Vojtech Pavlik, and he raised the following
> situation:
>
> The e820 map may have changed while the image kernel's original pages do
> not fall into the new e820 regions. The hibernate resume will then
> succeed, but kernel drivers may later run into problems when accessing
> memory.
>
> My idea is to hash the start/end pfn of each nosave region sequentially,
> put this nosave region digest into the hibernate header, and then compare
> the e820 digest in check_header() when resuming from hibernation.
>
> I am developing a patch for this; with it, we no longer need to check
> that unsafe pages are not in nosave (e820) regions.
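
To make the digest idea concrete, here is a rough user-space sketch. It
is an assumption-laden illustration, not the patch under development:
the mail does not name a hash, so 64-bit FNV-1a is swapped in here, and
struct region, regions_digest() and digest_matches() are hypothetical
names. The saved value stands for whatever would be stored in the
hibernate header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; only the two pfns are hashed. */
struct region {
	uint64_t start_pfn;
	uint64_t end_pfn;
};

/* Fold the 8 bytes of value into a running 64-bit FNV-1a hash. */
uint64_t fnv1a_u64(uint64_t hash, uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		hash ^= (value >> (8 * i)) & 0xff;
		hash *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return hash;
}

/* Hash start/end pfn of each region sequentially, in list order. */
uint64_t regions_digest(const struct region *r, size_t n)
{
	uint64_t hash = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		hash = fnv1a_u64(hash, r[i].start_pfn);
		hash = fnv1a_u64(hash, r[i].end_pfn);
	}
	return hash;
}

/* Compare the digest saved at hibernate time with the current regions. */
int digest_matches(uint64_t saved, const struct region *r, size_t n)
{
	return regions_digest(r, n) == saved;
}
```

At hibernate time the result of regions_digest() would be saved; at
resume time digest_matches() would be called on the regions seen by the
boot kernel and the resume refused on a mismatch. Because the hash
depends on list order, both kernels would have to walk the regions in
the same order.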

Actually, if you are doing such a check... it makes sense to check for
_all_ the regions, nosave or not. If e820 map changed at all, it is not
safe to resume.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html