Okay - up to 2 Tor relays per physical core makes sense, with the goal of not
wasting CPU cycles given Tor's fluctuations and the expensive hardware.
No, I don't have all the network capacity covered yet, and I agree everything
is costly.
For 10 Gbps unmetered, there aren't many options under $600/mo that are
comfortable with Tor relays, so I'm alternating between colocation and
dedicated bare metal servers, depending on location, hardware availability,
price, support for bringing my own IPv4 and announcing my ASN, some semblance
of ASN and geographic diversity for Tor, etc.
For 40 Gbps unmetered, I'm not seeing much under $2k/mo.
Open to suggestions / guidance on network capacity, colocation, and bare metal
servers. Don't know what I don't know so maybe I should be asking other
questions?
When doing colocation, any suggestions on how best to set everything up? A
router, just a layer 3 switch, or the compute node directly on the internet
connection? Is a transparent (bridging) firewall, DMZ, or NAT worth using?
On Friday, February 21st, 2025 at 2:40 AM, mail--- via tor-relays
<tor-relays@lists.torproject.org> wrote:
> Hi,
>
>> Summary from your email - did I miss anything?
>
> Yes, with the general disclaimer (not to sound like a lawyer) that your
> mileage may vary. For example we run everything bare metal on FreeBSD and run
> a mix of guard/middle/exit relays. Running the same workload virtualized or
> on another operating system may impact the performance/overhead (either
> positively or negatively). Also, your RAM budget of 4 GB per relay may be a
> bit on the safe side; I don't think it would hurt to lower it.
>
>> What are the primary factors that justify running up to two Tor relays per
>> physical core (leveraging SMT) versus a one-to-one mapping?
>
> Tor relays sadly don't scale well. They fluctuate on a daily basis (the Tor
> network as a whole does) and even their general utilization is kind of
> unpredictable. So I think there are two approaches to this:
>
> 1) Run 1 relay per physical core, accepting that your CPU will idle a large
> amount of the time (50%+ in our case).
>
> 2) Run multiple Tor relays per physical core until you saturate 90-95% of
> your CPU cycles, accepting additional system overhead/congestion.
>
> There is no right or wrong here. In our case we went with running multiple
> relays per core because we want to utilize the (very expensive) hardware we
> run on as much as possible. Every CPU cycle not spent on privacy-enhancing
> services is a wasted CPU cycle from our point of view ;).
>
>> Is one-to-one mapping of Tor relay to core/thread the most compute- and
>> system-efficient approach?
>
> Yes, this should lower the amount of congestion (interrupts and stuff). In
> this sense it can also be beneficial to lock your NIC/irq threads to specific
> cores.
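
For illustration, a rough sketch of that NIC/IRQ pinning on Linux (the "mlx5"
driver name, Mellanox-style queue naming and the round-robin layout are
assumptions; check /proc/interrupts for your own NIC):

```shell
#!/bin/sh
# Sketch: pin each NIC queue IRQ to its own core, round-robin.
# "mlx5" is an assumed driver name (Mellanox); adjust for your card.
core=0
ncores=$(nproc)
for irq in $(awk -F: '/mlx5/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
    echo "$core" > "/proc/irq/$irq/smp_affinity_list"
    core=$(( (core + 1) % ncores ))
done
```

This needs root, and services like irqbalance should be disabled first or they
will undo the pinning.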
>
>> Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0
>> GHz)?
>
> Base indeed. No CPU can consistently maintain its turbo speed across this
> many cores. When all cores are utilized, the base speed is effectively the
> maximum speed in practice.
>
>> From your real world scenario #2 and advice for the "fastest/best hardware",
>> would this type of server work well for a $20k budget?
>
> Looks like a capable server. That CPU looks powerful enough but keep in mind
> that it has a rather low clockspeed, so you will be running many medium speed
> relays. Nothing wrong with that since CPUs with this many cores simply
> don't/can't have high base clocks. Also I think 512 GB of RAM would be enough
> unless you run a *lot* of relays on it (which may be a viable strategy to
> utilize your CPU fully).
>
> Just a note: in my experience the Epyc platform (especially when self-built)
> provides a bit more bang for your buck. For example, an AMD Epyc 9969 with
> 192 cores/384 threads @ 2.25 GHz base clock will probably outperform the
> Intel 6980P considerably (for Tor workloads at least), while being much
> cheaper (at list price, at least). But of course this greatly depends on
> where you buy the server or parts, so your mileage may vary. Looking around
> here locally, a complete self-built system with the 192-core Epyc, 512 GB
> RAM and a 100 Gb/s NIC would cost ~12k excluding VAT, before any tax
> benefits. But your proposed server will work perfectly fine as well, so if
> you prefer a brand, go for it :).
>
>> Assuming one relay per core/thread, would this setup be capable of
>> saturating 40 Gbps, given that 10 Gbps typically saturated with ~31.5
>> physical cores + ~31.5 SMT threads at 2 Ghz and thus 256 relays vs ~63
>> relays translates roughly to 4×10 Gbps = 40 Gbps?
>
> Assuming you get the Tor relays to saturate their cores, yes. That CPU
> should be able to push 40 Gb/s of Tor traffic. Our ~2019 64 core/128 thread
> Epyc pushes 10 Gb/s of Tor traffic at a bit less than half its capacity. And
> your CPU is newer (with better IPC, hopefully, if Intel has finally stepped
> up their game since 2019), has much more cache and runs on DDR5, while
> having a slightly lower base clock. So it should perform at least similarly
> to ours, and probably better.
>
> Do you have the network capacity covered already? If you plan to do 40 Gb/s,
> then you also need enough peers/upstream capacity. The required networking
> equipment and connections themselves for this can also be costly.
>
> Cheers,
>
> tornth
>
> Feb 20, 2025, 08:42 by t...@1aeo.com:
>
>> Excellent information, especially the real world scenarios! Exactly what I
>> was looking for!
>>
>> Summary from your email - did I miss anything?
>>
>> To saturate 10 Gbps connection:
>> 1) IPv4 Allocation: Use between 5 and 20. Much lower than 256 in a /24!
>> 2) Tor Relay Count: Run roughly ~40 to ~150, depending on CPU clock speed,
>> i.e. faster clock, fewer relays needed.
>> 3) CPU Utilization: 1 Tor relay per physical core is preferred, but it's
>> okay to scale to 1 Tor relay per thread/SMT as well, up to 2 Tor relays per
>> core/thread
>> 4) RAM requirements: Maintain a 4:1 RAM-to-relay ratio (4 GB per relay),
>> plus an extra 32 GB per server to cover DoS, OS, networking, etc. overheads
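
As a worked example of that sizing rule (the 64-relay count is purely
illustrative):

```shell
#!/bin/sh
# 4 GB per relay plus a flat 32 GB for OS/DNS/networking/DoS headroom.
relays=64
echo "$(( relays * 4 + 32 )) GB"   # 256 + 32 = 288 GB for 64 relays
```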
>>
>> In general, some ideals but not required:
>> CPU clock speed: Higher CPU clock speed, better relay performance
>> RAM: Fewer relays, lower RAM requirements
>> RAM: Add ~32GB to overall RAM capacity sizing for OS, DNS, networking, DoS,
>> etc.
>> IPv4: One IPv4 per relay with common traffic ports
>>
>> Scaling: Start with 1 Tor relay per physical core, then add 1 Tor relay
>> per thread/SMT, stopping at 2 Tor relays per core/thread.
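
That scaling plan could be scripted roughly like this on Linux (the instance
count, the /etc/tor/instances layout and the use of taskset are my
assumptions, not a tested setup):

```shell
#!/bin/sh
# Sketch: one tor instance per physical core, each pinned with taskset.
# /etc/tor/instances/relayN/torrc is a hypothetical directory layout.
RELAYS=16   # start with one per physical core; raise once CPU headroom is clear
for i in $(seq 0 $(( RELAYS - 1 ))); do
    taskset -c "$i" tor -f "/etc/tor/instances/relay$i/torrc" &
done
wait
```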
>>
>> What are the primary factors that justify running up to two Tor relays per
>> physical core (leveraging SMT) versus a one-to-one mapping?
>> Ex: 37-74 for ~18.5 physical + SMT and 63-126 for ~31.5 physical + SMT.
>>
>> Is one-to-one mapping of Tor relay to core/thread the most compute- and
>> system-efficient approach?
>>
>> Are the clock speeds you listed base or turbo numbers (3.4, 2.25 and 2.0
>> GHz)? I'm assuming base. Not sure if anybody has data on how these impact
>> Tor relays?
>>
>> From your real world scenario #2 and advice for the "fastest/best hardware",
>> would this type of server work well for a $20k budget?
>> Would a single-socket Xeon 6980P (128 physical cores, 256 threads, base
>> clock 2.0 GHz, turbo up to 3.9 GHz) with 1024 GB DDR5 (maintaining a 4:1
>> ratio) and an AIOM Mellanox NIC be optimal? Assuming one relay per
>> core/thread, would this setup be capable of saturating 40 Gbps, given that
>> 10 Gbps is typically saturated with ~31.5 physical cores + ~31.5 SMT
>> threads at 2 GHz, and thus 256 relays vs ~63 relays translates roughly to
>> 4×10 Gbps = 40 Gbps?
>>
>> For those curious, according to ChatGPT o3-mini-high deep research:
>> 1) 20% cap on bandwidth contributions for exit relays is roughly 50+ Gbit/s
>> for the largest operators.
>> 2) 10% of Tor's consensus weight in terms of bandwidth for 2025 is roughly
>> 90-95 Gbps of sustained bandwidth. In 2022, it would have been ~68 Gbps.
>> I don't plan to have this issue any time soon, but good to be aware!
>>
>> Screenshots of the lengthy responses below and attached.
>>
>>
>> Sent with [Proton Mail](https://proton.me/mail/home) secure email.
>>
>> On Tuesday, February 18th, 2025 at 2:23 PM, m...@nothingtohide.nl
>> <m...@nothingtohide.nl> wrote:
>>
>>> Hi,
>>>
>>> Many people already replied, but here are my (late) two cents.
>>>
>>>> 1) If a full IPv4 /24 Class C was available to host Tor relays, what are
>>>> some optimal ways to allocate bandwidth, CPU cores and RAM to maximize
>>>> utilization of the IPv4 /24 for Tor?
>>>
>>> "Optimal" depends on your preferences and goals. Some examples:
>>>
>>> - IP address efficiency: run 8 relays per IPv4 address.
>>> - Use the best ports: 256 relays (443) or 512 relays (443+80).
>>> - Lowest kernel/system congestion: 1 locked relay per core/SMT thread
>>> combination, ideally on high clocked CPUs.
>>> - Easiest to manage: as few relays as possible.
>>> - Memory efficiency: only run middle relays on very high clocked CPUs (4-5
>>> Ghz).
>>> - Cost efficiency: run many relays on 1-2 generations old Epyc CPUs with a
>>> high core count (64 or more).
>>>
>>> There are always constraints. The hardware/CPU/memory and bandwidth/routing
>>> capability available to you are probably not infinite. Also, the Tor
>>> Project caps bandwidth contributions at 20% of exit capacity and 10% of
>>> overall consensus weight, respectively.
>>>
>>> With 256 IP addresses on modern hardware, it will be very hard not to run
>>> into one of these limitations long before you can make it 'optimal'.
>>> Hardware-wise, a single modern/current-gen high-performance server running
>>> only exit relays can easily push more than half of the total exit
>>> bandwidth of the Tor network.
>>>
>>> My advice would be:
>>> 1) Get the fastest/best hardware with current-ish generation CPU IPC
>>> capabilities that fits your budget. To keep congestion under control with
>>> less complexity, a single-socket system is easier to deal with than a
>>> dual-socket one.
>>>
>>> 2) Tip for the NIC: if your switch/router has 10 Gb/s or 25 Gb/s ports,
>>> get some of the older Mellanox cards. They are very stable (more so than
>>> their Intel counterparts, in my experience) and extremely affordable
>>> nowadays, because of all the organizations that throw away their digital
>>> sovereignty and the privacy of their employees/users to move to the cloud.
>>>
>>> 3) Start with 1 Tor relay per physical core (ignoring SMT). When the Tor
>>> relays have ramped up (this takes 2-3 months for guard relays) and there
>>> is still considerable headroom on the CPU (Tor sadly runs extremely poorly
>>> at scale, so this would be my expectation), move to 1 Tor relay per
>>> thread (SMT included).
>>>
>>> (tip: also run/'train' some extra Tor relays with very limited bandwidth
>>> (2 MB/s or so), pinned together to 1-2 cores, so they ramp up in parallel
>>> with your primary ones. This makes it *much* less cumbersome to scale up
>>> your Tor contribution when you need/want/can do that in the future.)
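
A hypothetical torrc for one such capped "training" relay; RelayBandwidthRate
and RelayBandwidthBurst are the actual cap options, while the nickname, port
and paths are illustrative. The core pinning itself happens outside Tor, e.g.
via taskset or cpuset:

```
# torrc sketch for a bandwidth-capped "training" relay
Nickname            trainingRelay01
ORPort              9001
DataDirectory       /var/lib/tor/training01
RelayBandwidthRate  2 MBytes
RelayBandwidthBurst 4 MBytes
```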
>>>
>>> 4) Assume at least 1 GB of RAM per relay on modern CPUs, plus 32 GB for
>>> the OS, DNS, networking and headroom for DoS attacks. This may sound high,
>>> especially considering the advice in the Tor documentation. But on modern
>>> CPUs (especially with a high clockspeed) guard relays can use a lot more
>>> than 512 MB of RAM, especially when they are being attacked. Middle and
>>> exit relays require less RAM.
>>>
>>> Don't skimp on system memory capacity. DDR4 RDIMMs with decent
>>> clockspeeds are very cheap nowadays. For reference: we ran our smaller Tor
>>> servers (16C @ 3.4 GHz) with 64 GB of RAM and had to upgrade to 128 GB
>>> because during attacks RAM usage exceeded what was available and processes
>>> got killed.
>>>
>>> 5) If you have the IP space available, use one IPv4 address per relay and
>>> use all the good ports such as 443. If IP addresses are scarcer, it's also
>>> fine to run 4 or 8 relays per IP address. For middle and exit relays
>>> especially, the port doesn't matter (much). Guard relays should ideally
>>> always run on a commonly used (and generally unblocked) port.
>>>
>>>> 2) If a full 10 Gbps connection was available for Tor relays, how many CPU
>>>> cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps
>>>> connection?
>>>
>>> That greatly depends on the CPU and your configuration. I can offer 3
>>> references based on real world examples. They all run a mix of
>>> guard/middle/exit relays.
>>>
>>> 1) Typical low core count (16+SMT) with a higher clockspeed (3.4 GHz):
>>> saturates a 10 Gb/s connection with ~18.5 physical cores + SMT.
>>> 2) Typical higher core count (64+SMT) with a lower clockspeed (2.25 GHz):
>>> saturates a 10 Gb/s connection with ~31.5 physical cores + SMT.
>>> 3) Typical energy-efficient/low-performance CPU with a low core count (16)
>>> and very low clockspeed (2.0 GHz), as often used in networking appliances:
>>> saturates a 10 Gb/s connection with ~75 physical cores (note: no SMT).
>>>
>>> The number of IP addresses required also depends on multiple factors. But
>>> I'd say you would need between one and two times the core+SMT counts
>>> mentioned above in relays to saturate 10 Gb/s: 37-74, 63-126 and 75-150
>>> relays respectively. So between 5 and 19 IPv4 addresses would be required
>>> at minimum, depending on CPU performance level.
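
The address arithmetic above, assuming the 8-relays-per-IPv4 maximum, is just
a ceiling division:

```shell
#!/bin/sh
# Minimum IPv4 addresses for N relays at 8 relays per address: ceil(N / 8)
min_ips() { echo $(( ($1 + 7) / 8 )); }
min_ips 37    # -> 5
min_ips 150   # -> 19
```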
>>>
>>> RAM-wise, the more relays you run, the more RAM overhead you will have.
>>> So in general it's better to run fewer relays at a higher speed each than
>>> many at a low clock speed. But since Tor scales so badly you need more
>>> relays anyway, so optimizing this isn't easy in practice.
>>>
>>>> 3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4
>>>> addresses are required to saturate?
>>>
>>> Double the amount compared to 10 Gb/s.
>>>
>>> Good luck with your Tor adventure. And let us know your findings with
>>> achieving 10 Gb/s when you get there :-).
>>>
>>> Cheers,
>>>
>>> tornth
>>>
>>> Feb 3, 2025, 18:14 by tor-relays@lists.torproject.org:
>>>
>>>> Hi All,
>>>>
>>>> Looking for guidance around running high performance Tor relays on Ubuntu.
>>>>
>>>> Few questions:
>>>> 1) If a full IPv4 /24 Class C was available to host Tor relays, what are
>>>> some optimal ways to allocate bandwidth, CPU cores and RAM to maximize
>>>> utilization of the IPv4 /24 for Tor?
>>>>
>>>> 2) If a full 10 Gbps connection was available for Tor relays, how many CPU
>>>> cores, RAM and IPv4 addresses would be required to saturate the 10 Gbps
>>>> connection?
>>>>
>>>> 3) Same for a 20 Gbps connection, how many CPU cores, RAM and IPv4
>>>> addresses are required to saturate?
>>>>
>>>> Thanks!
>>>>
_______________________________________________
tor-relays mailing list -- tor-relays@lists.torproject.org
To unsubscribe send an email to tor-relays-le...@lists.torproject.org