Re: bin/176713: [patch] nc(1) closes network socket too soon

2013-07-22 Thread Adrian Chadd
Right. Yes, I had a typo. I meant that it shouldn't die on seeing a
read EOF after closing the write side of the socket.

So, what you're saying is:

* nc sees EOF on stdin
* nc decides to abort before seeing the rest of the data come in from
the remote socket (rather than reading that data, trying to write it to
stdout, and only aborting once it sees EOF on stdin _and_ EOF/error when
writing to stdout)

Right?
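
(For illustration only: a rough poll(2)-based sketch of the behaviour being
described, not the actual nc(1) source or the patch under discussion. On
stdin EOF it shuts down only the write half of the socket and keeps draining
the socket until the remote side reports EOF as well. Partial writes and
most error handling are omitted.)

#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

static void
relay(int netfd)
{
	struct pollfd pfd[2];
	char buf[8192];
	ssize_t n;
	int stdin_open = 1, net_open = 1;

	while (net_open) {
		pfd[0].fd = stdin_open ? STDIN_FILENO : -1;	/* -1 is ignored by poll() */
		pfd[0].events = POLLIN;
		pfd[1].fd = netfd;
		pfd[1].events = POLLIN;

		if (poll(pfd, 2, -1) == -1)
			break;

		if (stdin_open && (pfd[0].revents & (POLLIN | POLLHUP))) {
			n = read(STDIN_FILENO, buf, sizeof(buf));
			if (n > 0) {
				(void)write(netfd, buf, (size_t)n);
			} else {
				/* stdin EOF: stop sending, but keep receiving. */
				stdin_open = 0;
				(void)shutdown(netfd, SHUT_WR);
			}
		}
		if (pfd[1].revents & (POLLIN | POLLHUP)) {
			n = read(netfd, buf, sizeof(buf));
			if (n <= 0 || write(STDOUT_FILENO, buf, (size_t)n) <= 0)
				net_open = 0;	/* remote EOF/error, or stdout gone */
		}
	}
}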


-adrian


Re: Kernel crashes after sleep: how to debug?

2013-07-22 Thread John Baldwin
On Friday, July 19, 2013 10:16:15 pm Yuri wrote:
> On 07/19/2013 14:04, John Baldwin wrote:
> > Hmm, that definitely looks like garbage.  How are you with gdb scripting?
> > You could write a script that walks the PQ_ACTIVE queue and see if this
> > pointer ends up in there.  It would then be interesting to see if the
> > previous page's next pointer is corrupted, or, if the pageq.tqe_prev
> > references that page, it could be that this vm_page structure has been
> > stomped on instead.
> 
> As you suggested, I printed the list of pages. Actually, iteration in 
> frame 8 goes through PQ_INACTIVE pages. So I printed those.
> <...skipped...>
> ### page#2245 ###
> $4492 = (struct vm_page *) 0xfe00b5a27658
> $4493 = {pageq = {tqe_next = 0xfe00b5a124d8, tqe_prev = 0xfe00b5b79038},
>   listq = {tqe_next = 0x0, tqe_prev = 0xfe00b5a276e0},
>   left = 0x0, right = 0x0, object = 0xfe005e3f7658, pindex = 5,
>   phys_addr = 1884901376, md = {pv_list = {tqh_first = 0xfe005e439ce8,
>   tqh_last = 0xfe00795eacc0}, pat_mode = 6}, queue = 0 '\0',
>   segind = 2 '\002', hold_count = 0, order = 13 '\r', pool = 0 '\0',
>   cow = 0, wire_count = 0, aflags = 1 '\001', flags = 64 '@', oflags = 0,
>   act_count = 9 '\t', busy = 0 '\0', valid = 255 '\377', dirty = 255 '\377'}
> ### page#2246 ###
> $4494 = (struct vm_page *) 0xfe00b5a124d8
> $4495 = {pageq = {tqe_next = 0xfe00b460abf8, tqe_prev = 0xfe00b5a27658},
>   listq = {tqe_next = 0x0, tqe_prev = 0xfe005e3f7cf8},
>   left = 0x0, right = 0x0, object = 0xfe005e3f7cb0, pindex = 1,
>   phys_addr = 1881952256, md = {pv_list = {tqh_first = 0xfe005e42dd48,
>   tqh_last = 0xfe007adb03a8}, pat_mode = 6}, queue = 0 '\0',
>   segind = 2 '\002', hold_count = 0, order = 13 '\r', pool = 0 '\0',
>   cow = 0, wire_count = 0, aflags = 1 '\001', flags = 64 '@', oflags = 0,
>   act_count = 9 '\t', busy = 0 '\0', valid = 255 '\377', dirty = 255 '\377'}
> ### page#2247 ###
> $4496 = (struct vm_page *) 0xfe00b460abf8
> $4497 = {pageq = {tqe_next = 0xfe26, tqe_prev = 0xfe00b5a124d8},
>   listq = {tqe_next = 0xfe0081ad8f70, tqe_prev = 0xfe0081ad8f78},
>   left = 0x6, right = 0xd0201, object = 0x1, pindex = 4294901765,
>   phys_addr = 18446741877712530608, md = {pv_list = {tqh_first = 0xfe00b460abc0,
>   tqh_last = 0xfe00b5579020}, pat_mode = -1268733096}, queue = 72 'H',
>   segind = -85 '\253', hold_count = -19360, order = 0 '\0', pool = 254 '\376',
>   cow = 65535, wire_count = 0, aflags = 0 '\0', flags = 0 '\0', oflags = 0,
>   act_count = 0 '\0', busy = 176 '\260', valid = 208 '\320', dirty = 126 '~'}
> ### page#2248 ###
> $4498 = (struct vm_page *) 0xfe26
> 
> Page #2247 is the same one that caused the problem in frame 8.  Its
> tqe_next is apparently invalid, so iteration stopped here.
> It appears that this structure has been stomped on; this page is
> probably supposed to be a valid inactive page.

Yes, its phys_addr is also way off. I think you might even be able to
figure out which phys_addr it is supposed to have based on the virtual
address (see PHYS_TO_VM_PAGE() in vm/vm_page.c) by using the vm_page
address and phys_addr of the prior entries to establish the relative
offset.  It is certainly a page "earlier" in the array.
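
As a back-of-the-envelope illustration of that arithmetic (editorial, not
from the thread): entries #2245 and #2246 above are 86,400 bytes apart in
the vm_page array and 720 physical pages (2,949,120 bytes) apart in
phys_addr, which works out to roughly 120 bytes per struct vm_page on this
kernel. The same relative-offset calculation, written as a small
kernel-context sketch (negative deltas cover entries "earlier" in the
array; unsigned wraparound keeps the result correct):

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/*
 * Estimate the phys_addr a suspect vm_page entry should have, given a
 * nearby entry known to be intact.  Relies on vm_page_array being one
 * contiguous array with one entry per physical page.
 */
static vm_paddr_t
expected_phys_addr(vm_page_t good, vm_page_t suspect)
{
	long delta = suspect - good;	/* entries; negative if earlier */

	return (good->phys_addr + delta * (long)PAGE_SIZE);
}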

> > Ultimately I think you will need to look at any malloc/VM/page operations
> > done in the suspend and resume paths to see where this happens.  It might
> > be slightly easier if the same page gets trashed every time as you could
> > print out the relevant field periodically during suspend and resume to
> > narrow down where the breakage occurs.
> 
> I am thinking of adding code that walks through all the page queues and
> verifies that they are not damaged in this way, and running it when each
> device is waking up from sleep.
> dev/acpica/acpi.c has acpi_EnterSleepState, which, as I understand it, 
> contains the top-level code for S3 sleep. Before sleep it invokes the 
> 'power_suspend' event on all devices, and after sleep it invokes 
> 'power_resume'. So maybe I will call the page check procedure after 
> 'power_suspend' and 'power_resume'.
> 
> But it is possible that memory gets damaged somewhere else after 
> power_resume happens.
> Do you have any thought/suggestions?
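
(To make the proposed check concrete, here is a rough, untested editorial
sketch of such a consistency walk. The names follow the 9.x-era layout
visible in the dump above (vm_page_queues[], the pageq linkage, the queue
field); they differ on newer kernels, and the caller is assumed to hold the
page queue lock. It could be called with a tag such as "power_suspend" or
"power_resume" from the places discussed above.)

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/queue.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

static void
check_inactive_queue(const char *where)
{
	vm_page_t m;
	long i = 0;

	TAILQ_FOREACH(m, &vm_page_queues[PQ_INACTIVE].pl, pageq) {
		/* Every entry must lie inside the vm_page array... */
		if (m < &vm_page_array[0] ||
		    m >= &vm_page_array[vm_page_array_size]) {
			printf("%s: inactive entry %ld (%p) out of bounds\n",
			    where, i, m);
			break;		/* do not dereference it */
		}
		/* ...and must still claim to be on this queue. */
		if (m->queue != PQ_INACTIVE)
			printf("%s: entry %ld (%p) has queue %d\n",
			    where, i, m, m->queue);
		i++;
	}
}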

Well, I think you should try what you've suggested above first.  If that
doesn't narrow it down, then we can brainstorm some other places to inspect.

-- 
John Baldwin

Re: UFS related panic (daily <-> find)

2013-07-22 Thread John Baldwin
On Friday, July 19, 2013 1:45:11 pm rank1see...@gmail.com wrote:
> I had 2 panics. (Both occurred at 3 AM, so it had to be the daily task.)
> 
> First (Jul  2 03:06:50 2013):
> --
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x19
> fault code  = supervisor read, page not present
> instruction pointer = 0x20:0xc06caf34
> stack pointer   = 0x28:0xe76248fc
> frame pointer   = 0x28:0xe7624930
> code segment = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 76562 (find)
> trap number = 12
> panic: page fault
> Uptime: 23h0m41s
> Physical memory: 1014 MB
> Dumping 186 MB: 171 155 139 123 107 91 75 59 43 27 11
> 
> #7  0xc06caf34 in cache_lookup_times (dvp=0xc784a990, vpp=0xe7624ae8,
>     cnp=0xe7624afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:547

Can you go up to this frame and do 'l'?

-- 
John Baldwin


Re: bin/176713: [patch] nc(1) closes network socket too soon

2013-07-22 Thread Ronald F. Guilmette

In message, Adrian Chadd wrote:

>Right. Yes, I had a typo. I meant that it shouldn't die on seeing a
>read EOF after closing the write side of the socket.
>
>So, what you're saying is:
>
>* nc sees EOF on stdin

Yes.

>* nc decides to abort before seeing the rest of the data come in from
>the remote socket

Yes.



Re: bin/176713: [patch] nc(1) closes network socket too soon

2013-07-22 Thread Adrian Chadd
Right, and your patch just stops the shutdown(), right? Rather than
teaching nc to correctly check BOTH socket states before deciding to
close things?

I'd personally rather see nc taught to check whether it can possibly
make ANY more progress before deciding to shut things down.
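
In that spirit, the termination test reduces to something like the
following (an editorial sketch with made-up parameter names, not the
actual patch under review):

/*
 * nc can still make progress while it can either forward stdin to the
 * network, or forward network data to stdout.  Only when neither
 * direction can advance is it safe to tear the connection down.
 */
static int
no_more_progress(int stdin_eof, int net_read_eof, int stdout_failed)
{
	return (stdin_eof && (net_read_eof || stdout_failed));
}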



-adrian