On Mon, Sep 17, 2012 at 5:52 AM, Daniel Gruno <rum...@cord.dk> wrote:
> Hello, happy people,
>
> Lately, I've been wrapping my head around Traffic Server 3.2/3.3 not
> running well on FreeBSD. The exact issue is described in TS-993 as well:
> 1) When starting TS, it runs up a hefty CPU bill (100% cpu used at all
> times), even when idling.
> 2) It crashes and burns when compiled with --enable-debug, complaining:
>
>    FATAL: ../../lib/ts/ink_thread.h:267: failed assert
>    `pthread_cond_wait(cp, mp) == 0`
>
> After giving up on doing a git bisect (my computer is simply too slow
> for all those recompiles), I tried running it through callgrind to
> analyze the function calls being made, and discovered that
> LogObjectManager::flush_buffers() was being called about 11 million
> times during the first few minutes, which is not good. So I opened up
> Log.cc, and discovered, to my surprise, that, apart from flushing
> buffers in a loop there, we are calling ink_cond_wait without any
> apparent locking of the flush_mutex we are supposed to release while
> waiting for the condition. On FreeBSD at least, this results in an EPERM
> error (caller does not own the mutex being released), which in turn
> means that there will be no waiting, it's just one big cpu sink.
>
> The addition of "ink_mutex_try_acquire(&flush_mutex);" before the
> ink_cond_wait, seems to have fixed this problem, and TS starts fine,
> doesn't use 100% while idling, and doesn't complain when running in
> debug mode, i.e. an apparent win-win situation for my FreeBSD machines.
>
> However - and because Igor told me to - since this doesn't seem to be an
> issue on Linux, I was wondering...does the mutex in question lock
> somewhere else that I am unaware of, or did we simply forget to lock it
> and are lucky that Linux somehow takes care of this blunder for us?
>
> In any case, I don't think adding ink_mutex_try_acquire could hurt
> anything, and since it does seem to fix the FreeBSD/OpenBSD issue at
> hand, I am mostly interested in any comments you lot would have about it
> before I go and commit the fix (if it is a fix, that's what I'm asking ;).
>
> With regards,
> Daniel.

From the pthread_cond_wait man page:

"The pthread_cond_timedwait() and pthread_cond_wait() functions shall
block on a condition variable. They shall be called with mutex locked
by the calling thread or undefined behavior results."

So I think you found out what "undefined behavior" means on FreeBSD.
From what I can see, that mutex is used nowhere else, so it seems it's
a happy coincidence that it works on Linux. I say +1 for adding the
fix. Also, good work on tracking that down!
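
For reference, the usual locking pattern around pthread_cond_wait() looks
roughly like the sketch below. It is a standalone illustration, not the
actual Log.cc code: struct flush_state, flush_thread_main() and
run_one_flush() are made-up names, and it uses raw pthread calls rather
than the ink_* wrappers. The waiter holds the mutex before waiting and
re-checks a predicate in a loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flush state; not the real Log.cc layout. */
struct flush_state {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool flush_requested;   /* predicate, only touched with mutex held */
    int  flushes_done;
};

/* Waiter: lock first, then wait in a loop until the predicate is true. */
static void *flush_thread_main(void *arg)
{
    struct flush_state *st = arg;
    int rc = pthread_mutex_lock(&st->mutex);
    assert(rc == 0);

    while (!st->flush_requested) {
        /* Atomically releases the mutex while blocked and re-acquires
         * it before returning, so the caller must own it here. */
        rc = pthread_cond_wait(&st->cond, &st->mutex);
        assert(rc == 0);
    }

    st->flush_requested = false;
    st->flushes_done++;      /* the real flush work would go here */

    rc = pthread_mutex_unlock(&st->mutex);
    assert(rc == 0);
    (void)rc;                /* silence unused warning under NDEBUG */
    return NULL;
}

/* Starts one waiter, requests one flush, and returns the flush count. */
static int run_one_flush(void)
{
    struct flush_state st = { .flush_requested = false, .flushes_done = 0 };
    pthread_t tid;
    int done;

    if (pthread_mutex_init(&st.mutex, NULL) != 0)
        return -1;
    if (pthread_cond_init(&st.cond, NULL) != 0) {
        pthread_mutex_destroy(&st.mutex);
        return -1;
    }
    if (pthread_create(&tid, NULL, flush_thread_main, &st) != 0) {
        pthread_cond_destroy(&st.cond);
        pthread_mutex_destroy(&st.mutex);
        return -1;
    }

    /* Signaller: change the predicate under the same mutex, then signal. */
    pthread_mutex_lock(&st.mutex);
    st.flush_requested = true;
    pthread_cond_signal(&st.cond);
    pthread_mutex_unlock(&st.mutex);

    pthread_join(tid, NULL);
    done = st.flushes_done;

    pthread_cond_destroy(&st.cond);
    pthread_mutex_destroy(&st.mutex);
    return done;
}
```

Calling run_one_flush() should return 1 once the waiter has woken up and
done its single pass. One caveat on the proposed change: a try-lock can
fail, in which case the wait would still run without the mutex held, so
a plain blocking acquire may be the safer choice. I have not checked how
ink_mutex_try_acquire behaves in that case, so treat that as a guess.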
