On Sun Nov 17 17:32:22 EST 2013, j...@corpus-callosum.com wrote:
> Has anyone else experienced new builds of the sources arm tree getting hung 
> up with semacquire?
> 
>  99:     httpd pc     cac0 dbgpc     cac0  Semacquire (Wakeme) ut 1 st 2 bss 168000 qpc 608157d8 nl 0 nd 0 lpc 608758c4 pri 10
> 
> 
> 
> acid: lstk()
> semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
> lock(l=0x31208)+0x20 /sys/src/libc/port/lock.c:10
> plock()+0x8 /sys/src/libc/port/malloc.c:80
>       pv=0x31208
> poolalloc(p=0x35a24,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
>       v=0xd970
> mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
>       v=0x5ffffd39
> getnetconninfo(fd=0xffffffff,dir=0x5ffffeec)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
>       path=0x0
>       nci=0xb
>       spec=0x0
>       d=0x0
>       netname=0x28
> dolisten(address=0xd16dc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
>       spotchk=0x1
>       dir=0x74656e2f
>       ctl=0xa
>       ndir=0x74656e2f
>       nctl=0xb
>       swamped=0x0
>       nci=0x161c40
>       data=0x313aa
>       conn=0x73
>       scheme=0xd16e6
>       c=0x38898
>       t=0x5ffffeb4
>       ok=0xa284
> main(argc=0x0,argv=0x5fffff9c)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
>       address=0x38846
>       _argc=0x0
>       _args=0x0
> _main+0x28 /sys/src/libc/arm/main9.s:19
> 
> 
> I see this on the second http request, the first completes successfully, and 
> don’t yet know if it’s a dns configuration error or something else.

this is clearly a case of deadlock.

on each allocation the pool library takes the pool lock, holds it
for the duration, and releases it before returning.  for some reason,
the pool lock already appears locked, so you go to the contended
case, which in the standard distribution calls semacquire, and
wait forever.
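
to make the contended case concrete, here is a rough sketch of the
general shape of a semaphore-based lock.  it is not the code in
/sys/src/libc/port/lock.c; it is portable c11, and the names SLock,
slock, sunlock and fakesemacquire are made up for illustration.  the
kernel semaphore is faked with a counter and a yield loop, where the
real thing sleeps in semacquire.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct SLock SLock;
struct SLock {
	atomic_int key;	/* holder plus waiters */
	atomic_int sem;	/* wakeups owed to waiters */
};

/* hypothetical stand-in for semacquire: wait for a wakeup, then take it */
static void
fakesemacquire(atomic_int *s)
{
	int v;

	for(;;){
		v = atomic_load(s);
		if(v > 0 && atomic_compare_exchange_weak(s, &v, v-1))
			return;
		sched_yield();
	}
}

void
slock(SLock *l)
{
	if(atomic_fetch_add(&l->key, 1) == 0)
		return;	/* 0 -> 1: uncontended, we hold the lock */
	fakesemacquire(&l->sem);	/* contended: wait for an unlock to wake us */
}

void
sunlock(SLock *l)
{
	int old;

	old = atomic_fetch_sub(&l->key, 1);
	assert(old > 0);	/* unlocking a free lock is a bug */
	if(old == 1)
		return;	/* 1 -> 0: nobody waiting */
	atomic_fetch_add(&l->sem, 1);	/* hand the lock to one waiter */
}
```

in a scheme like this, if key is ever nonzero with no holder left to
call the unlock side, every later caller falls into the wait and never
comes back, which is what the stack trace looks like.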

so there are just a few possibilities:
1.  the code was always broken, and the old locking scheme
got lucky every time.  (i don't think this is likely.)
2.  there's a bug in the implementation of lock.
3.  there is a bug in locking that's been introduced that's
architecture-specific.

i haven't been using the semaphore-based locks because they are slow.
this is because wakeup() takes about 100-1000x as long as sleep(0)
which is just sched(), and this is hard to make up without doing some
hard thinking that hasn't been done yet.  even better schedulers don't
fully fix this.
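
for contrast, a minimal sketch of the kind of lock that spins on
test-and-set and yields between tries.  this is an assumption about the
general shape, not the old libc code: TLock, tlock, tcanlock and tunlock
are made-up names, and sched_yield() stands in for sleep(0).

```c
#include <sched.h>
#include <stdatomic.h>

typedef struct TLock TLock;
struct TLock {
	atomic_flag held;	/* set while someone holds the lock */
};

void
tlock(TLock *l)
{
	/* test-and-set returns the old value: set means someone else holds it */
	while(atomic_flag_test_and_set(&l->held))
		sched_yield();	/* give up the processor, then try again */
}

/* try once without waiting; returns 1 if we got the lock */
int
tcanlock(TLock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

void
tunlock(TLock *l)
{
	atomic_flag_clear(&l->held);	/* no wakeup needed: waiters are polling */
}
```

the unlock side never has to wake anyone, so it pays none of the
wakeup() cost; the price is that waiters keep getting scheduled just to
poll.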

but still, were i a betting man, my money would be on door #3.

- erik
