On Mon, Sep 17, 2012 at 5:52 AM, Daniel Gruno <rum...@cord.dk> wrote:
> Hello, happy people,
>
> Lately, I've been wrapping my head around Traffic Server 3.2/3.3 not
> running well on FreeBSD. The exact issue is described in TS-993 as well:
> 1) When starting TS, it runs up a hefty CPU bill (100% CPU used at all
> times), even when idling.
> 2) It crashes and burns when compiled with --enable-debug, complaining:
>
> FATAL: ../../lib/ts/ink_thread.h:267: failed assert
> `pthread_cond_wait(cp, mp) == 0`
>
> After giving up on doing a git bisect (my computer is simply too slow
> for all those recompiles), I tried running it through callgrind to
> analyze the function calls being made, and discovered that
> LogObjectManager::flush_buffers() was being called about 11 million
> times during the first few minutes, which is not good. So I opened up
> Log.cc, and discovered, to my surprise, that, apart from flushing
> buffers in a loop there, we are calling ink_cond_wait without any
> apparent locking of the flush_mutex we are supposed to release while
> waiting for the condition. On FreeBSD at least, this results in an EPERM
> error (caller does not own the mutex being released), which in turn
> means that there will be no waiting, it's just one big CPU sink.
>
> The addition of "ink_mutex_try_acquire(&flush_mutex);" before the
> ink_cond_wait seems to have fixed this problem, and TS starts fine,
> doesn't use 100% CPU while idling, and doesn't complain when running in
> debug mode, i.e. an apparent win-win situation for my FreeBSD machines.
>
> However - and because Igor told me to - since this doesn't seem to be an
> issue on Linux, I was wondering... does the mutex in question lock
> somewhere else that I am unaware of, or did we simply forget to lock it
> and are lucky that Linux somehow takes care of this blunder for us?
>
> In any case, I don't think adding ink_mutex_try_acquire could hurt
> anything, and since it does seem to fix the FreeBSD/OpenBSD issue at
> hand, I am mostly interested in any comments you lot would have about it
> before I go and commit the fix (if it is a fix, that's what I'm asking ;).
>
> With regards,
> Daniel.

From the pthread_cond_wait man page:

"The pthread_cond_timedwait() and pthread_cond_wait() functions shall
block on a condition variable. They shall be called with mutex locked by
the calling thread or undefined behavior results."

So I think you found out what "undefined behavior" means on FreeBSD.

From what I can see that mutex is used nowhere else, so it seems it's a
happy coincidence that it works on Linux. I say +1 for adding the fix.

Also, good work on tracking that down!
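
For reference, the locking rule quoted from the man page can be sketched in plain pthreads roughly as follows. This is only an illustration, not the Log.cc code: flush_cond, flush_requested, flushes_done, and the helper functions are hypothetical names, and only flush_mutex is borrowed from the thread. It also uses a blocking pthread_mutex_lock rather than a try-acquire, which is an assumption made for the sketch. The waiter checks every return code the way the --enable-debug assert does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the flush state; not the real Log.cc objects. */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flush_cond = PTHREAD_COND_INITIALIZER;
static bool flush_requested = false; /* predicate guarded by flush_mutex */
static int flushes_done = 0;         /* also guarded by flush_mutex */

/* Waiter: the mutex is locked BEFORE pthread_cond_wait is called. */
static void *flush_thread_main(void *arg)
{
    (void)arg;
    int rc = pthread_mutex_lock(&flush_mutex);
    assert(rc == 0);

    /* Loop on the predicate: condition waits may wake spuriously. */
    while (!flush_requested) {
        /* Atomically releases flush_mutex while blocked and
         * re-acquires it before returning. */
        rc = pthread_cond_wait(&flush_cond, &flush_mutex);
        assert(rc == 0);
    }

    flush_requested = false;
    flushes_done++; /* stands in for the actual buffer flush */

    rc = pthread_mutex_unlock(&flush_mutex);
    assert(rc == 0);
    return NULL;
}

/* Signaller: changes the predicate under the same mutex, then signals. */
static void request_flush(void)
{
    pthread_mutex_lock(&flush_mutex);
    flush_requested = true;
    pthread_cond_signal(&flush_cond);
    pthread_mutex_unlock(&flush_mutex);
}

/* Starts one waiter, wakes it once, and returns the flush count
 * (or -1 if thread creation or join fails). */
static int run_flush_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, flush_thread_main, NULL) != 0)
        return -1;
    request_flush();
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return flushes_done;
}
```

Because the predicate is set and tested under the mutex, the waiter finishes exactly one flush whether the signal arrives before or after it starts waiting, and it sleeps instead of spinning in between.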