Guard relay here appears to have come under steadily increasing abuse over the 
last several months.  Believe the two previous threads relate to the same issue:

   Failing because we have 4063 connections already
   // Number of file descriptors

   DoS attacks are real

Several times a day a large burst of circuit extends is attempted, resulting in 
the log flooding with

   [Warning] assign_to_cpuworker failed. Ignoring.

where the above indicates that a circuit launch failed due to a full 
circuit-request queue.  Presently the guard runs on an old system lacking 
AES-NI, so the operation is expensive rather than trivial.  Originally thought 
the events were very brief, but after reducing MaxClientCircuitsPending from a 
larger value back to the default it appears they last between five and ten 
minutes.
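
For reference, the torrc side of that change amounts to roughly the sketch 
below.  The 32 is the default documented in the tor manpage (as I understand 
it), and NumCPUs is shown only as a related, optional knob -- not necessarily 
something changed here.

   ## torrc sketch -- illustrative, not the exact config on this relay
   ## Back to the documented default for pending client circuits
   MaxClientCircuitsPending 32
   ## 0 = let tor autodetect the number of cpuworker threads
   NumCPUs 0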

The abuser also contrives to create huge circuit queues, which resulted in an 
OOM kill of the daemon a couple of days back.  Lowered MaxMemInQueues to 1G and 
set vm.overcommit_memory=2 with vm.overcommit_ratio=X (X chosen so that 
/proc/meminfo:CommitLimit is comfortably less than physical memory; a sketch of 
those settings follows the excerpt below), and now instead of a daemon take-out 
see

   [Warning] We're low on memory.  Killing circuits with over-long
   queues. (This behavior is controlled by MaxMemInQueues.)

   Removed 1060505952 bytes by killing 1 circuits; 19k circuits
   remain alive. Also killed 0 non-linked directory connections.

As you can see, that one circuit alone was consuming essentially all of 
MaxMemInQueues (1060505952 bytes is just shy of 1 GiB).
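
For anyone wanting to replicate the overcommit part, the settings amount to 
roughly the following.  The 80% ratio and the 4 GB / no-swap figures are 
placeholders for illustration only; the actual ratio is the X above and depends 
on the machine.

   ## torrc
   ## cap queued cell memory at 1 GB (the value used above)
   MaxMemInQueues 1 GB

   ## /etc/sysctl.d/99-overcommit.conf (sketch; ratio is machine-specific)
   ## With overcommit_memory=2 the kernel enforces (ignoring hugepages):
   ##   CommitLimit = SwapTotal + RAM * overcommit_ratio / 100
   ## e.g. 4 GB RAM, no swap, ratio 80 -> CommitLimit ~ 3.2 GB < physical RAM
   vm.overcommit_memory = 2
   vm.overcommit_ratio = 80

After reloading with sysctl --system, check CommitLimit in /proc/meminfo to 
confirm it landed where intended.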

And today this showed up in the middle of an "assign_to_cpuworker failed" blast:

   [Warning] Failing because we have Y connections already. . .

Digging into the source, the message indicates that ENOMEM or ENOBUFS was 
returned from an attempt to create a socket.  The socket maximum on the system 
is much higher than Y, so kernel memory exhaustion appears to be the cause.  
The implication is a burst of client connections associated with the events, 
but haven't verified that.
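
One way to check the kernel-memory theory the next time a blast hits (the 
commands below are generic suggestions, not something already run here):

   # file-descriptor ceiling vs. what is actually open
   cat /proc/sys/fs/file-max
   ss -s                        # socket counts by state
   # kernel socket-buffer memory in use ("mem" is in pages)
   cat /proc/net/sockstat
   sysctl net.ipv4.tcp_mem      # low / pressure / high thresholds, in pages
   # the kernel usually logs "TCP: out of memory" when tcp_mem is exhausted
   dmesg | grep -i 'out of memory'

If sockstat's mem climbs toward the third tcp_mem value during an event, that 
would line up with the ENOBUFS reading.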

An old server was dusted off after a hardware failure and the machine is a bit 
underpowered, but certainly up to the load that corresponds to its connection 
speed and assigned consensus weight.  AFAICT normal Tor clients experience 
acceptable performance.  The less-than-blazing hardware makes the abuse/attack 
incidents stand out clearly, which is what prompted this post.
