On 14.11.2012, at 11:50, Markus Gebert <markus.geb...@hostpoint.ch> wrote:

> On 14.11.2012, at 08:21, Konstantin Belousov <kostik...@gmail.com> wrote:
> 
>> On Wed, Nov 14, 2012 at 01:41:04AM +0100, Markus Gebert wrote:
>>> 
>>> On 13.11.2012, at 19:30, Markus Gebert <markus.geb...@hostpoint.ch> wrote:
>>> 
>>>> To me it looks like the unix socket GC is triggered way too often and/or 
>>>> runs for too long, which burns CPU and, worse, causes a lot of contention 
>>>> around unp_list_lock, which in turn delays all processes relying on unix 
>>>> sockets for IPC.
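>>>> 
>>>> (A sketch of the locking pattern, not the exact kernel code: each gc 
>>>> pass walks the global list of all unix sockets while holding that lock,
>>>> 
>>>> 	UNP_LIST_LOCK();
>>>> 	LIST_FOREACH(unp, &unp_shead, unp_link) {
>>>> 		/* mark/scan pcbs that may have fds in flight */
>>>> 	}
>>>> 	UNP_LIST_UNLOCK();
>>>> 
>>>> so any socket create/detach that needs the same lock stalls behind it.)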
>>>> 
>>>> I don't know why unp_gc() is called so often or what's triggering this.
>>> 
>>> I have a guess now. Dovecot and relayd both use unix sockets heavily. 
>>> According to dtrace, uipc_detach() gets called quite often by dovecot 
>>> closing unix sockets. Each time uipc_detach() is called, unp_gc_task is 
>>> taskqueue_enqueue()d if any fds are in flight.
>>> 
>>> in uipc_detach():
>>> 682         if (local_unp_rights)   
>>> 683                 taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
>>> 
>>> We use relayd in a way that keeps the source address of the client when 
>>> connecting to the backend server (transparent load balancing). This 
>>> requires IP_BINDANY on the socket, which cannot be set by unprivileged 
>>> processes, so relayd sends the socket fd to the privileged parent process 
>>> just to set the socket option and send it back. This means an fd gets 
>>> transferred twice for every new backend connection.
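>>> 
>>> (To illustrate the pattern, here's a minimal userland sketch, not 
>>> relayd's actual code, with made-up names: the unprivileged child hands 
>>> the fd to the parent with SCM_RIGHTS, the parent sets IP_BINDANY and 
>>> passes it back the same way.
>>> 
>>> #include <sys/types.h>
>>> #include <sys/socket.h>
>>> #include <sys/uio.h>
>>> #include <netinet/in.h>
>>> #include <string.h>
>>> 
>>> /* Pass fd over the unix-domain socket s. While the message sits in
>>>  * the receiver's buffer, the fd is "in flight" and unp_rights > 0. */
>>> static int
>>> send_fd(int s, int fd)
>>> {
>>> 	union {
>>> 		struct cmsghdr hdr;
>>> 		char buf[CMSG_SPACE(sizeof(int))];
>>> 	} cm;
>>> 	struct msghdr msg;
>>> 	struct cmsghdr *cmsg;
>>> 	struct iovec iov;
>>> 	char c = 0;
>>> 
>>> 	memset(&msg, 0, sizeof(msg));
>>> 	iov.iov_base = &c;
>>> 	iov.iov_len = sizeof(c);
>>> 	msg.msg_iov = &iov;
>>> 	msg.msg_iovlen = 1;
>>> 	msg.msg_control = cm.buf;
>>> 	msg.msg_controllen = sizeof(cm.buf);
>>> 	cmsg = CMSG_FIRSTHDR(&msg);
>>> 	cmsg->cmsg_level = SOL_SOCKET;
>>> 	cmsg->cmsg_type = SCM_RIGHTS;
>>> 	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
>>> 	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
>>> 	return (sendmsg(s, &msg, 0) == -1 ? -1 : 0);
>>> }
>>> 
>>> /* In the privileged parent: the reason for the round trip. */
>>> static int
>>> set_bindany(int fd)
>>> {
>>> 	int one = 1;
>>> 
>>> 	return (setsockopt(fd, IPPROTO_IP, IP_BINDANY, &one, sizeof(one)));
>>> }
>>> 
>>> The receiving sides do the mirror-image recvmsg() with a 
>>> CMSG_SPACE(sizeof(int))-sized control buffer, so each new backend 
>>> connection puts an fd in flight twice.)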
>>> 
>>> So we have dovecot calling uipc_detach() often, and relayd making it 
>>> likely that fds are in flight (unp_rights > 0). With a certain amount of 
>>> load, this could cause unp_gc_task to be enqueued on the thread taskq too 
>>> often, slowing down everything unix-socket-related by holding global 
>>> locks in unp_gc().
>>> 
>>> I don't know if the slowdown can even cause a self-reinforcing feedback 
>>> loop at some point by increasing the chance of fds being in flight. This 
>>> would explain why the condition sometimes goes away by itself and 
>>> sometimes requires intervention (taking load away for a moment).
>>> 
>>> I'll look into a way to (dis)prove all this tomorrow. Ideas still welcome 
>>> :-).
>>> 
>> 
>> If the only issue is indeed too aggressive scheduling of the taskqueue,
>> then postponing it until the next tick could do it. The patch below tries
>> to schedule the gc taskqueue for the next tick if it is not yet scheduled.
>> Could you try it?
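>> 
>> (The diff itself isn't quoted here, so as a rough sketch of the idea only,
>> not the actual patch, with made-up names: collapse all detaches that
>> happen within one tick into a single gc run by going through timeout(9)
>> instead of enqueueing directly.
>> 
>> static int unp_gc_scheduled;	/* real code needs proper locking */
>> 
>> static void
>> unp_gc_kick(void *arg __unused)
>> {
>> 
>> 	unp_gc_scheduled = 0;
>> 	taskqueue_enqueue(taskqueue_thread, &unp_gc_task);
>> }
>> 
>> and in uipc_detach(), instead of the unconditional enqueue:
>> 
>> 	if (local_unp_rights && !unp_gc_scheduled) {
>> 		unp_gc_scheduled = 1;
>> 		timeout(unp_gc_kick, NULL, 1);	/* wait for next tick */
>> 	}
>> )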
> 
> Sounds like a good idea, thanks! I'm testing the patch right now. It could 
> take a few days to know it works for sure. I'll get back to you soon.


We haven't had any problems since I booted the patched kernel. So the 
assumption that the gc gets scheduled too often in that situation seems correct.

I realize we're creating an edge case with relayd passing around so many fds. 
On the other hand, I think the patch makes the unix socket code more robust 
without hurting anyone. So do you see any chance of getting it committed?


Markus

