Hi,

We're hitting an issue in one deployment where all UDP receiver processes end up stuck in FUTEX_WAIT in save() -> lock_udomain(). They seem to deadlock themselves every couple of days.

Looking at the code, the enable_gruu parameter of the registrar module is active by default, and lookup() contains this code path:

        /* temp-gruu lookup */
        res = ul.get_urecord_by_ruid(_d, ahash, &inst, &r, &ptr);

but no udomain lock is taken. However, when execution falls through to the "done:" label, it calls

        ul.unlock_udomain(_d, &aor);

without having called ul.lock_udomain first.

1.) Could someone please review this part? It looks a bit suspicious, although I don't know what happens implicitly in this case. If the lock were a semaphore and you decremented it to -1 without a prior increment, that would essentially cause a deadlock, but the current locking implementation might work completely differently.

2.) Since I have no clue how GRUU is supposed to work in detail, I'm not even sure we ever hit this code path. Our config doesn't explicitly handle GRUU (there is no lookup in loose-route), but enable_gruu is on by default in the registrar and we never turned it off. I only see that the ruid column in the location table is filled. To reach this part, though, the ";gr" parameter has to be present in the R-URI passed to lookup(), and I don't know whether that somehow happens in some call flows (we only log $ru, which I don't think includes these parameters, right?).

Any input is highly appreciated!

Andreas

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
