Hi,
On Tue, Jan 20, 2015 at 10:42:01AM -0800, Bryan Talbot wrote:
> The hardware requirements for haproxy itself are very modest and nearly
> anything will work. The requirements really depend on how much and what
> sort of traffic you need to handle. Network card and CPU speed are the most
> important hardware factors for performance though.
That's true. However, I would like to add that over the years, I have
found that everyone has a different appreciation of what performance
means, and very different expectations about bandwidth, connection
rate, and concurrent connection count. That is what makes it hard to
suggest a hardware sizing.

What I've observed:
- people installing their first LB in front of an internal application
  server tend to focus on their application's performance, which
  sometimes suffers from complex business-specific processing, and may
  consider a few tens to hundreds of connections per second to be high
  performance. These people also often expect a lot of hacks such as
  path or header rewrites, because the application was not designed
  with great respect for standards in mind. Any machine that can be
  found today with a CPU faster than 100 MHz will fit, even the
  smallest USB-powered tiny devices.
- people who already run a public web site generally consider a load
  balancer when their Apache-based or Nginx-based application server
  needs a second server. They want a load balancer capable of
  delivering more than a few thousand connections per second, and of
  saturating the uplink (100 Mbps or 1 Gbps) with average-sized
  objects (which depends on the site but often lies around 20-25 kB).
  A regular PC will be perfect. Most commonly, the server replaced
  during the last upgrade is well suited.
- people who run chat servers are not much interested in the
  connection rate (at least they think so) nor the bandwidth, but
  mostly in concurrent connections. Such a machine needs a lot of RAM
  (typically about 33-34 kB per end-to-end connection, including
  socket buffers). BTW, version 1.6 significantly improves this
  situation. These users also need to be aware that on Linux you're
  limited to 1M file descriptors per process, thus 500k end-to-end
  connections per process, since each one uses two fds (one on the
  client side, one on the server side). But that's not all: they need
  to consider how long it takes to re-establish connections (e.g.
  during a VRRP switchover). Taking 1 million connections in one
  minute means about 16k conn/s. That means the CPU should not be
  neglected either, and that some of the dual-socket machines commonly
  found with a lot of RAM will require some tuning to prevent
  inter-socket communication, which hurts performance a lot.
- people who run shops often want a lot of SSL and the ability to
  absorb traffic spikes. They may expect thousands of SSL connections
  per second. This generally means a recent machine (ideally with
  AES-NI extensions).
- people switching away from L4 load balancers tend to focus on their
  existing LB's specs and are often misled by the mapping. L4 LBs
  often count terminated connections (in TIME-WAIT) along with active
  ones, showing a hugely inflated concurrent connection count that
  makes it hard to pick new hardware. Additionally, they report high
  connection-rate capacities which may or may not be matched by an L7
  LB, and which may or may not be needed. A rule of thumb is to look
  at the device's configuration for the TIME-WAIT timeout value, and
  to divide the total number of reported connections by this value:
  that gives the connection rate. Then multiply this connection rate
  by the average server response time (or at least 10 seconds), and
  that gives the concurrent connection count.
- people serving large objects (video, CDN, ...) mostly consider the
  bandwidth. For them, the connection rate is "low" (in the thousands
  of connections per second), and the connection count may be high
  (tens to hundreds of thousands). The bandwidth usage can reach
  multiple 10G links. Such workloads require very specific tuning and
  benchmarks, as the NICs, the PCIe slots, IRQs, process affinity,
  etc. all need to be tuned.
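Two of the sizing calculations above can be written down as a quick
sketch. Only the figures (33-34 kB per connection, the one-minute
failover window, the TIME-WAIT rule of thumb) come from the points
above; the function names and example inputs are mine:

```python
# Back-of-the-envelope sizing helpers for the two calculations above.

def chat_sizing(concurrent_conns, kb_per_conn=34, failover_seconds=60):
    """RAM and reconnect rate for a chat-style workload: ~33-34 kB of
    memory per end-to-end connection, and all connections assumed to
    be re-established within ~1 minute after a VRRP switchover."""
    ram_gb = concurrent_conns * kb_per_conn / (1024 * 1024)
    reconnect_rate = concurrent_conns / failover_seconds
    return ram_gb, reconnect_rate

def l4_to_l7(reported_conns, time_wait_s, avg_response_s):
    """Translate an L4 LB's inflated connection count into L7 sizing:
    rate = reported conns / TIME-WAIT timeout, then concurrent
    conns = rate * max(avg response time, 10 s)."""
    rate = reported_conns / time_wait_s
    concurrent = rate * max(avg_response_s, 10)
    return rate, concurrent

ram, rate = chat_sizing(1_000_000)
print(f"1M chat conns: ~{ram:.0f} GB RAM, ~{rate:,.0f} conn/s on failover")

rate, conns = l4_to_l7(600_000, time_wait_s=60, avg_response_s=2)
print(f"L4 box showing 600k conns: ~{rate:,.0f} conn/s, ~{conns:,.0f} L7 conns")
```

These are order-of-magnitude estimates, of course; real memory usage
depends on the configured buffer sizes.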
Now what to say about numbers (rules of thumb):
- if your machine is running with conntrack enabled, multiply the CPU
  sizing by about 3 and add about 10% to the RAM sizing.
- if you really want to have fun with VMs, always experiment and never
  believe what the vendor or provider promises you, because most of
  the time they don't even know what they're offering (e.g. untuned
  conntrack in the hypervisor blocking traffic, incorrect CPU
  affinity, vCPUs slower than real CPUs causing huge latencies or
  even timeouts, etc).
- anything below 1000 connections per second can be dealt with by any
  machine found on the market today, even the USB dongles running the
  slowest ARM or MIPS chips. To give you an idea, the $25 GL.Inet
  running a MIPS at 360 MHz and powered over USB gives me about 1250
  conn/s and easily saturates 100 Mbps (the next step will probably be
  to run haproxy inside a keyboard or mouse... just kidding). Older
  core2-based servers which are still working can make excellent
  machines for this, given that they were indestructible and not very
  sensitive to heat or mistreatment.
- a regular single-socket x86 PC with an untuned, out-of-the-box OS
  will easily provide up to 10k conn/s and will have no problem at all
  saturating a gig link with object sizes of 10 kB and above.
- a properly tuned, newer single-socket x86 PC with 3-4 memory
  channels, a high frequency and a low core count can reach around
  100k conn/s and 10 Gbps with a good NIC. But that's tough work, so
  don't expect it from the first server you find, and always stay away
  from the TCP offload engines found on some NICs; only accept
  stateless offloading (checksums, LRO, TSO, ...).
- concerning SSL, count about 10k resumed connections/s per CPU core,
  5 Gbps of AES256 per core with AES-NI, and about 2500 RSA1024 or
  500 RSA2048 handshakes per second per CPU core for a 3+ GHz CPU.
  The ratio between new and resumed connections depends solely on the
  application and its audience. For high key rates, you may need to
  consider crypto accelerators (hint: the cheapest ones do not offload
  RSA, so be careful).
- for any serious server, always consider ECC memory to avoid random
  crashes caused by memory corruption. It's common to see LBs reach
  more than one year of uptime, and being forced to reboot at the
  worst moment just because of memory corruption is no fun, really.
  And always run a hardware watchdog to reboot your server if anything
  goes wrong.
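As an illustration, the SSL figures above translate into a rough
per-core capacity estimate. The helper function and the 80% resumption
ratio in the example are my assumptions; only the per-core rates come
from the SSL point above:

```python
# Rough SSL core-count estimate from the per-core figures above:
# ~10k resumed connections/s and ~500 full RSA2048 handshakes/s per
# 3+ GHz core. The resumption ratio must be measured on real traffic;
# 0.8 below is just an illustrative guess.

def ssl_cores_needed(conn_rate, resumed_ratio,
                     resumed_per_core=10_000, full_per_core=500):
    full_rate = conn_rate * (1 - resumed_ratio)      # new handshakes/s
    resumed_rate = conn_rate * resumed_ratio          # resumed conns/s
    return full_rate / full_per_core + resumed_rate / resumed_per_core

# 5000 SSL conn/s with 80% session resumption:
print(round(ssl_cores_needed(5000, 0.8), 2))  # 2.4
```

As the numbers show, the full handshakes dominate the cost: the 1000
new handshakes/s need ~2 cores while the 4000 resumed ones need less
than half a core, which is why the resumption ratio matters so much.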
Hoping this helps,
Willy