On Friday, July 19, 2013 10:16:15 pm Yuri wrote:
> On 07/19/2013 14:04, John Baldwin wrote:
> > Hmm, that definitely looks like garbage.  How are you with gdb
> > scripting?  You could write a script that walks the PQ_ACTIVE queue
> > and see if this pointer ends up in there.  It would then be
> > interesting to see if the previous page's next pointer is corrupted;
> > or, if the pageq.tqe_prev references that page, then it could be
> > that this vm_page structure has been stomped on instead.
> 
> As you suggested, I printed the list of pages.  Actually, the
> iteration in frame 8 goes through the PQ_INACTIVE pages, so I printed
> those.
> <...skipped...>
> ### page#2245 ###
> $4492 = (struct vm_page *) 0xfffffe00b5a27658
> $4493 = {pageq = {tqe_next = 0xfffffe00b5a124d8,
>     tqe_prev = 0xfffffe00b5b79038},
>   listq = {tqe_next = 0x0, tqe_prev = 0xfffffe00b5a276e0},
>   left = 0x0, right = 0x0, object = 0xfffffe005e3f7658, pindex = 5,
>   phys_addr = 1884901376, md = {pv_list = {
>       tqh_first = 0xfffffe005e439ce8, tqh_last = 0xfffffe00795eacc0},
>     pat_mode = 6}, queue = 0 '\0', segind = 2 '\002', hold_count = 0,
>   order = 13 '\r', pool = 0 '\0', cow = 0, wire_count = 0,
>   aflags = 1 '\001', flags = 64 '@', oflags = 0, act_count = 9 '\t',
>   busy = 0 '\0', valid = 255 '\377', dirty = 255 '\377'}
> ### page#2246 ###
> $4494 = (struct vm_page *) 0xfffffe00b5a124d8
> $4495 = {pageq = {tqe_next = 0xfffffe00b460abf8,
>     tqe_prev = 0xfffffe00b5a27658},
>   listq = {tqe_next = 0x0, tqe_prev = 0xfffffe005e3f7cf8},
>   left = 0x0, right = 0x0, object = 0xfffffe005e3f7cb0, pindex = 1,
>   phys_addr = 1881952256, md = {pv_list = {
>       tqh_first = 0xfffffe005e42dd48, tqh_last = 0xfffffe007adb03a8},
>     pat_mode = 6}, queue = 0 '\0', segind = 2 '\002', hold_count = 0,
>   order = 13 '\r', pool = 0 '\0', cow = 0, wire_count = 0,
>   aflags = 1 '\001', flags = 64 '@', oflags = 0, act_count = 9 '\t',
>   busy = 0 '\0', valid = 255 '\377', dirty = 255 '\377'}
> ### page#2247 ###
> $4496 = (struct vm_page *) 0xfffffe00b460abf8
> $4497 = {pageq = {tqe_next = 0xfe26, tqe_prev = 0xfffffe00b5a124d8},
>   listq = {tqe_next = 0xfffffe0081ad8f70, tqe_prev = 0xfffffe0081ad8f78},
>   left = 0x6, right = 0xd00000201, object = 0x100000000,
>   pindex = 4294901765, phys_addr = 18446741877712530608, md = {
>     pv_list = {tqh_first = 0xfffffe00b460abc0,
>       tqh_last = 0xfffffe00b5579020}, pat_mode = -1268733096},
>   queue = 72 'H', segind = -85 '\253', hold_count = -19360,
>   order = 0 '\0', pool = 254 '\376', cow = 65535, wire_count = 0,
>   aflags = 0 '\0', flags = 0 '\0', oflags = 0, act_count = 0 '\0',
>   busy = 176 '\260', valid = 208 '\320', dirty = 126 '~'}
> ### page#2248 ###
> $4498 = (struct vm_page *) 0xfe26
> 
> Page #2247 is the same one that caused the problem in frame 8.  Its
> tqe_next is apparently invalid, so the iteration stopped there.
> It appears that this structure has been stomped on.  This page is
> probably supposed to be a valid inactive page.

Yes, its phys_addr is also way off.  I think you might even be able to
figure out which phys_addr it is supposed to have based on the virtual
address (see PHYS_TO_VM_PAGE() in vm/vm_page.c), by using the vm_page
address and phys_addr of the prior entries to establish the relative
offset.  It is certainly a page "earlier" in the array.
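For illustration only (a sketch of that arithmetic, not code from the
thread): using page #2246 as the known-good reference, and inferring
sizeof(struct vm_page) from the dump above (pages #2245 and #2246 sit
0x15180 bytes apart in the array and 720 physical pages apart, which
suggests 120 bytes per struct), the computation would look roughly
like this.  The struct size is a guess; take the real value from
"print sizeof(struct vm_page)" in gdb.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE		4096	/* amd64 base page size */
#define VM_PAGE_SIZE_GUESS	120	/* 0x15180 / 720; confirm in gdb */

int
main(void)
{
	uintptr_t good_vp = 0xfffffe00b5a124d8;	/* page #2246 */
	uint64_t  good_pa = 1881952256;		/* its phys_addr */
	uintptr_t bad_vp  = 0xfffffe00b460abf8;	/* page #2247 (trashed) */
	int64_t   delta   = (int64_t)(good_vp - bad_vp);

	/*
	 * If the suspect pointer really is a slot of the same dense
	 * vm_page_array (see PHYS_TO_VM_PAGE() in vm/vm_page.c), its
	 * distance from a known-good slot must be a whole number of
	 * structs.
	 */
	if (delta % VM_PAGE_SIZE_GUESS != 0)
		printf("offset %jd not a multiple of %d: bad struct-size "
		    "guess, or not an array slot\n",
		    (intmax_t)delta, VM_PAGE_SIZE_GUESS);
	else
		printf("expected phys_addr: %#jx\n",
		    (uintmax_t)(good_pa -
		    (uint64_t)(delta / VM_PAGE_SIZE_GUESS) * PAGE_SIZE));
	return (0);
}

A positive delta means the suspect slot sits earlier in the array,
which would match the "earlier" guess above.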
> > Ultimately I think you will need to look at any malloc/VM/page
> > operations done in the suspend and resume paths to see where this
> > happens.  It might be slightly easier if the same page gets trashed
> > every time, as you could print out the relevant field periodically
> > during suspend and resume to narrow down where the breakage occurs.
> 
> I am thinking of adding code that walks through all the page queues
> and verifies that they are not damaged in this way, run as each
> device wakes up from sleep.
> dev/acpica/acpi.c has acpi_EnterSleepState(), which, as I understand
> it, contains the top-level code for S3 sleep.  Before sleep it
> invokes the 'power_suspend' event on all devices, and after sleep it
> invokes 'power_resume' on them.  So maybe I will call the page check
> procedure after 'power_suspend' and after 'power_resume'.
> 
> But it is possible that memory gets damaged somewhere else, after
> power_resume happens.
> Do you have any thoughts/suggestions?

Well, I think you should try what you've suggested above first.  If
that doesn't narrow it down, then we can brainstorm some other places
to inspect.

-- 
John Baldwin
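For concreteness, the check proposed above might start out as a
minimal sketch along these lines.  It assumes the 9.x-era names
visible in the dump (vm_page_queues[], the pageq linkage,
vm_page_array and vm_page_array_size); verify_inactive_queue() is a
made-up name, and any page-queue locking the walk may need is omitted
here.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/queue.h>

#include <vm/vm.h>
#include <vm/vm_page.h>

/*
 * Walk PQ_INACTIVE and report the first element whose linkage looks
 * wrong, before the rest of the kernel trips over it.  Intended to be
 * called from acpi_EnterSleepState() right after the power_suspend
 * and power_resume events are delivered.
 */
static void
verify_inactive_queue(const char *where)
{
	struct vm_page *m, *prev;
	long n;

	prev = NULL;
	n = 0;
	TAILQ_FOREACH(m, &vm_page_queues[PQ_INACTIVE].pl, pageq) {
		/* Every queued page must be a slot of vm_page_array. */
		if (m < vm_page_array ||
		    m >= vm_page_array + vm_page_array_size) {
			printf("%s: entry %ld (%p) outside vm_page_array, "
			    "follows %p\n", where, n, (void *)m,
			    (void *)prev);
			return;
		}
		/* tqe_prev must point back into the previous element. */
		if (prev != NULL &&
		    m->pageq.tqe_prev != &prev->pageq.tqe_next) {
			printf("%s: entry %ld (%p) has bad tqe_prev\n",
			    where, n, (void *)m);
			return;
		}
		prev = m;
		n++;
	}
	printf("%s: inactive queue OK, %ld entries\n", where, n);
}

Called as verify_inactive_queue("after power_suspend") and again with
"after power_resume", the first run whose output flips from OK to a
complaint brackets where the stomping happens; the same walk can be
repeated for PQ_ACTIVE.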