Today, my employer is making available a highly efficient reverse HTTP(S) proxy called simply 'RProxy'. This project is being released open-source to encourage the general community to participate in its evolution.
My employer always avoids re-inventing the wheel when it comes to software, so why create another reverse proxy? Many of the wonderful open-source proxies that exist today are tailored to the average GET <-> RESPONSE traffic types. For each request, they may spawn a new thread, create a new connection to the back-end, or both. Many of the projects we analyzed could not handle large streams of data efficiently, since they would block until the full client request had been received (hey, where did my memory go?). Resource exhaustion was a common element under high load: memory, file descriptors, CPU, etc. These existing projects are designed perfectly for common traffic flows, but can quickly capsize under pressure.

My employer had a requirement for a proxy that could scale to thousands of simultaneous SSL connections, with certificate verification and various caching methods, all while maintaining a low system resource footprint. After testing all the popular and well-maintained open-source proxy projects, we could not find one that met our specific needs. It was on this basis that we decided to roll our own.

Architecture
-------------------------------------------------------------------------------

The RProxy architecture uses a mix of threading and event-driven methods of handling requests. At startup, a configured number of threads are spawned, each with its own event loop. Each of these threads makes a configured number of persistent connections to the configured back-end servers. We leverage HTTP 1.1 to keep these connections open so that an incoming request from a client does not force RProxy to establish a new connection to the back-end. As a result, each request is assigned a pre-existing connection to a back-end (even if the client is using HTTP 1.0, or HTTP 1.1 with keep-alive disabled).

We refer to this technique as pipelining, a feature which most proxies avoid due to the complexity of maintaining request states.
We solved this by creating three states a back-end connection can have:

- IDLE: The connection is up and can be used to service a new request.
- ACTIVE: The connection is being used to service another request.
- DOWN: The connection is down, pending a reconnect.

When a new request is made, it is placed into a pending queue. This pending queue is processed whenever a back-end's state transitions to IDLE. The request is then associated with that IDLE connection and the connection's state is changed to ACTIVE. There are many configuration options that affect how requests in a pending state are handled, so that resource consumption does not become an issue under high load.

Features
-------------------------------------------------------------------------------

The RProxy source code has a detailed and up-to-date configuration guide, but some of the main features that stand out are:

- Various methods of load-balancing requests to a back-end.
- Transparent URI rewriting.
- The ability to append X-Header fields to the request being made to the back-end, including dynamic additions of extended TLS fields.
- Configurable thresholding and backlogging for both front-end and back-end IO.
- A flexible logging system.
- Full SSL support (via OpenSSL)
    * TLS False Start
    * x509 verification
    * Certificate caching
    * Session caching
    * All other commonly used SSL options.

As mentioned above, it is best to read the documentation to get a detailed understanding of the many aspects of the system.

Components
-------------------------------------------------------------------------------

RProxy was built on top of several well-maintained open-source libraries such as Libevent, Libconfuse, Libevhtp, and OpenSSL. While we were writing RProxy, many of the above libraries needed fixes and patches.
We would like to thank the maintainers of these projects for their willingness to help and accept our changes (a special thanks to Nick Mathewson, maintainer of Libevent, whom we harassed the most). We suggest using the most recent versions of the above libraries for optimal performance.

Performance
-------------------------------------------------------------------------------

RProxy was tested primarily on various *NIX platforms; however, most of the performance tweaks targeted Linux. For testing, we used an Intel i7 quad-core processor with a generic 1 Gb Ethernet adapter, running the latest version of CentOS. Our SSL keys were 2048 bits, with client certificate validation enabled.

With neither host- nor client-based (RFC 5077) caching enabled, RProxy was able to handle on the order of 2000 full SSL transactions per second. With either of the above cache methods enabled, our testing demonstrated that RProxy was able to handle over 6600 SSL transactions per second.

Large data flow tests showed that RProxy was able to run at 1 gigabit line-rate (or as close as you can expect once the data has reached user-land).

Future
-------------------------------------------------------------------------------

We continue to add functionality to the software; virtual server support is currently in development, as well as support for internal redirection. (See the develop branch to see where we're going.)

I can haz source?
-------------------------------------------------------------------------------

The source can be found on GitHub: https://github.com/mandiant/RProxy

The current stable release is v1.0.25.

We suggest building RProxy with all external dependencies downloaded and installed for you, creating a nice static binary with all of the latest stable releases.
This can be done with an optional cmake flag:

    (cd build; cmake -DRPROXY_BUILD_DEPS:STRING=ON ..; make)

Otherwise, the following dependency versions are recommended for optimal performance:

    libconfuse: http://savannah.nongnu.org/download/confuse/confuse-2.7.tar.gz
    openssl:    http://openssl.org/source/openssl-1.0.0i.tar.gz (we've had issues with newer versions)
    libevent:   https://github.com/downloads/libevent/libevent/libevent-2.0.19-stable.tar.gz
    libevhtp:   https://github.com/ellzey/libevhtp/tree/0.4.14

Libevhtp
-------------------------------------------------------------------------------

I noticed an announcement on this list for a new project which attempts to create a new evhttp-type API. It was then I realized that I had never really announced libevhtp on this mailing list, and we seem to be duplicating efforts.

Libevhtp has been in development for over a year and is being used in many projects. It attempts to create a very flexible API for HTTP processing. Although it lacks *real* documentation (relying on example code to show off all the features), I am working on that side of things.

Some of the primary features I can rattle off the top of my head:

- Per-connection or per-request hooks for all stages of request processing
    * pre_accept         [called before a connection is accepted]
    * post_accept        [called after a connection is accepted]
    * on_path            [called when the request path has been parsed]
    * on_headers_start   [called after request line, prior to header parsing]
    * on_header          [called after one key/val header has been parsed]
    * on_headers         [called after all headers have been parsed]
    * on_new_chunk       [called when a single chunk octet is parsed]
    * on_chunk_complete  [called when a single chunk is finished]
    * on_chunks_complete [called when all chunks have been processed]
    * on_read            [called whenever body data has been read (this includes a body of a chunk)]
    * on_error           [called when an error occurs]
    * on_request_fini    [called when a request has been fully processed]
    * on_connection_fini [called when a connection has been terminated]
- Different methods of setting callbacks
    * evhtp_set_cb()       [set a callback for a specific uri]
    * evhtp_set_regex_cb() [set a callback for a uri with a regex]
    * evhtp_set_glob_cb()  [set a callback for a uri using a simple wildcard]
- Built-in threadpool support (you don't have to enable libevent locking support!)
- Built-in SSL support.

A simple example: https://github.com/ellzey/libevhtp/blob/master/test_basic.c

A more complex example: https://github.com/ellzey/libevhtp/blob/master/test.c

A real-life application: Well... See above (RProxy)

An application design that uses libevhtp's threading feature to do cool things in parallel without locking: https://gist.github.com/2579114 (I figured a Redis example would work out well).

Libevhtp can be found here: https://github.com/ellzey/libevhtp
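
P.S. Coming back to RProxy for a moment: the IDLE / ACTIVE / DOWN handling described under Architecture can be pictured with a rough sketch in plain C. This is not RProxy's actual code. Every name here (conn_state, backend_conn, pending_queue, conn_became_idle, and so on) is hypothetical, and handing the in-flight request back to the caller when a connection goes DOWN is an assumption, since the text above does not say what happens to it.

```c
#include <assert.h>
#include <stddef.h>

/* The three back-end connection states described under Architecture. */
enum conn_state { CONN_IDLE, CONN_ACTIVE, CONN_DOWN };

/* Hypothetical request record; a real proxy would carry far more. */
struct request {
    int             id;
    struct request *next;    /* link in the pending queue */
};

struct backend_conn {
    enum conn_state state;
    struct request *current; /* request being serviced while ACTIVE */
};

/* FIFO of requests waiting for an IDLE back-end connection. */
struct pending_queue {
    struct request *head;
    struct request *tail;
    size_t          len;
};

/* A new request always enters the pending queue first. */
void pending_push(struct pending_queue *q, struct request *r) {
    r->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
    q->len++;
}

struct request *pending_pop(struct pending_queue *q) {
    struct request *r = q->head;
    if (r == NULL) {
        return NULL;
    }
    q->head = r->next;
    if (q->head == NULL) {
        q->tail = NULL;
    }
    r->next = NULL;
    q->len--;
    return r;
}

/* The queue is processed on a transition to IDLE: the oldest pending
 * request, if any, is bound to the connection, which becomes ACTIVE. */
void conn_became_idle(struct backend_conn *c, struct pending_queue *q) {
    c->state   = CONN_IDLE;
    c->current = pending_pop(q);
    if (c->current != NULL) {
        c->state = CONN_ACTIVE;
    }
}

/* The back-end finished responding; the connection is reusable. */
struct request *conn_finish(struct backend_conn *c, struct pending_queue *q) {
    struct request *done;
    assert(c->state == CONN_ACTIVE);
    done = c->current;
    conn_became_idle(c, q);
    return done;
}

/* Assumption: on failure the in-flight request (or NULL) is returned so
 * the caller can decide whether to requeue it or fail it. */
struct request *conn_went_down(struct backend_conn *c) {
    struct request *orphan = c->current;
    c->state   = CONN_DOWN;
    c->current = NULL;
    return orphan;
}
```

In this sketch a reconnect would simply call conn_became_idle() again, so a connection coming back up drains the pending queue the same way a finished request does. The len field is where a backlog threshold of the kind mentioned above could be checked before accepting more work.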