Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension that enables a TCP connection to use different paths.
Multipath TCP has been used for several use cases. On smartphones, MPTCP
enables seamless handovers between cellular and Wi-Fi networks while
preserving established connections. This use case is what led Apple to
use MPTCP in multiple applications since 2013 [2]. On dual-stack hosts,
Multipath TCP enables the TCP connection to automatically use the
best-performing path, either IPv4 or IPv6. If one path fails, MPTCP
automatically uses the other path.

To benefit from MPTCP, both the client and the server have to support it.
Multipath TCP is a backward-compatible TCP extension that is enabled by
default on recent Linux distributions (Debian, Ubuntu, Red Hat, ...).
Multipath TCP has been included in the Linux kernel since version 5.6 [3].
To use it on Linux, an application must explicitly enable it when
creating the socket; nothing else in the application needs to change.

This patch adds per-address MPTCP support, to be used with:

  mptcp{,4,6}@<address>[:port1[-port2]]

MPTCP v4 and v6 protocols have been added. They are mainly a copy of the
TCP ones, with small differences: names, proto, and receivers lists.

These protocols are stored in __protocol_by_family as an alternative to
TCP, similar to what has been done with QUIC. This way, the size of
__protocol_by_family does not increase, and MPTCP otherwise behaves like
TCP.

MPTCP is supported on both the frontend and backend sides.

An example configuration using MPTCP has also been added, along with a
backend server to experiment with it.

Note that this is a re-implementation of Björn's work from 3 years ago
[4], when haproxy's internals were probably less ready to deal with this,
causing his work to be left pending for a while.
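As a rough illustration of what "explicitly enabling MPTCP when creating the socket" means for an application, the C sketch below requests IPPROTO_MPTCP and falls back to plain TCP when the kernel cannot provide it. The helpers open_stream_socket() and mptcp_unavailable() are hypothetical and are not HAProxy functions. The fallback is a design choice of this sketch, not something the patch implements. The list of errno values treated as "MPTCP unavailable" is an assumption based on the guidance published on mptcp.dev [3]: EINVAL on kernels older than 5.6, EPROTONOSUPPORT when MPTCP is not built in, and ENOPROTOOPT when it is disabled through the net.mptcp.enabled sysctl.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Same fallback as the patch's compat.h: only Linux headers define it. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Hypothetical helper: returns 1 when <err> (from socket()) means MPTCP is
 * not usable on this kernel. Assumed list: EINVAL (kernel < 5.6),
 * EPROTONOSUPPORT (not compiled in), ENOPROTOOPT (disabled by sysctl). */
static int mptcp_unavailable(int err)
{
	return err == EINVAL || err == EPROTONOSUPPORT || err == ENOPROTOOPT;
}

/* Hypothetical helper: opens a stream socket for <family>, preferring MPTCP
 * and falling back to plain TCP only when MPTCP is unavailable. Sets
 * <*used_mptcp> to 1 or 0 accordingly. Returns the fd, or -1 with errno set. */
static int open_stream_socket(int family, int *used_mptcp)
{
	int fd;

	assert(used_mptcp != NULL);

	fd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);
	if (fd >= 0) {
		*used_mptcp = 1;
		return fd;
	}

	*used_mptcp = 0;
	/* A real error (EMFILE, EACCES, ...) is reported, not hidden. */
	if (!mptcp_unavailable(errno))
		return -1;

	/* MPTCP missing or disabled: plain TCP works with the same code. */
	return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

The returned descriptor is then bound, connected or accepted on exactly as a TCP one would be. Only the protocol argument of socket() differs.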
Link: https://www.rfc-editor.org/rfc/rfc8684.html [1]
Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2]
Link: https://www.mptcp.dev [3]
Link: https://github.com/haproxy/haproxy/issues/1028 [4]
Co-authored-by: Dorian Craps <dorian.cr...@student.vinci.be>
Co-authored-by: Matthieu Baerts (NGI0) <matt...@kernel.org>
---
 doc/configuration.txt      | 21 +++++++++
 examples/mptcp-backend.py  | 22 +++++++++
 examples/mptcp.cfg         | 23 +++++++++
 include/haproxy/compat.h   |  5 ++
 include/haproxy/protocol.h |  4 +-
 src/backend.c              | 11 ++++-
 src/proto_tcp.c            | 96 ++++++++++++++++++++++++++++++++++++++
 src/protocol.c             |  4 +-
 src/sock.c                 |  4 +-
 src/tools.c                | 24 +++++++++-
 10 files changed, 206 insertions(+), 8 deletions(-)
 create mode 100644 examples/mptcp-backend.py
 create mode 100644 examples/mptcp.cfg

diff --git a/doc/configuration.txt b/doc/configuration.txt
index aece65c81..3767b7ac6 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -28205,6 +28205,27 @@ report this to the maintainers.
                               range can or must be specified.
                               It is considered as an alias of 'stream+ipv4@'.
 
+'mptcp@<address>[:port1[-port2]]' following <address> is considered as an IPv4
+                              or IPv6 address depending on the syntax but
+                              socket type and transport method are forced to
+                              "stream", with the MPTCP protocol. Depending
+                              on the statement using this address, a port or
+                              a port range can or must be specified.
+
+'mptcp4@<address>[:port1[-port2]]' following <address> is always considered as
+                              an IPv4 address but socket type and transport
+                              method are forced to "stream", with the MPTCP
+                              protocol. Depending on the statement using
+                              this address, a port or port range can or
+                              must be specified.
+
+'mptcp6@<address>[:port1[-port2]]' following <address> is always considered as
+                              an IPv6 address but socket type and transport
+                              method are forced to "stream", with the MPTCP
+                              protocol. Depending on the statement using
+                              this address, a port or port range can or
+                              must be specified.
+
 'udp@<address>[:port1[-port2]]' following <address> is considered as an IPv4
                               or IPv6 address depending of the syntax but
                               socket type and transport method is forced to
diff --git a/examples/mptcp-backend.py b/examples/mptcp-backend.py
new file mode 100644
index 000000000..fe14e8bfe
--- /dev/null
+++ b/examples/mptcp-backend.py
@@ -0,0 +1,22 @@
+# =============================================================================
+# Example of a simple backend server using mptcp in python, used with mptcp.cfg
+# =============================================================================
+
+import socket
+
+sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, socket.IPPROTO_MPTCP)
+sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+# dual stack IPv4/IPv6
+sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
+
+sock.bind(("::", 4331))
+sock.listen()
+
+while True:
+    (conn, address) = sock.accept()
+    req = conn.recv(1024)
+    print(F"Received request : {req}")
+    conn.send(b"HTTP/1.0 200 OK\r\n\r\nHello\n")
+    conn.close()
+
+sock.close()
diff --git a/examples/mptcp.cfg b/examples/mptcp.cfg
new file mode 100644
index 000000000..514c36b22
--- /dev/null
+++ b/examples/mptcp.cfg
@@ -0,0 +1,23 @@
+# You can test this configuration by running the command:
+#
+# $ mptcpize run curl localhost:5000
+
+global
+	strict-limits  # refuse to start if insufficient FDs/memory
+	# add some process-wide tuning here if required
+
+defaults
+	mode http
+	balance roundrobin
+	timeout client 60s
+	timeout server 60s
+	timeout connect 1s
+
+frontend main
+	bind mptcp@[::]:5000
+	default_backend mptcp_backend
+
+# MPTCP is usually used on the frontend, but it is also possible
+# to enable it to communicate with the backend
+backend mptcp_backend
+	server mptcp_server mptcp@[::]:4331
diff --git a/include/haproxy/compat.h b/include/haproxy/compat.h
index 3829060b7..2347d62cb 100644
--- a/include/haproxy/compat.h
+++ b/include/haproxy/compat.h
@@ -317,6 +317,11 @@ typedef struct { } empty_t;
 #define queue _queue
 #endif
 
+/* only Linux defines IPPROTO_MPTCP */
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
 #endif /* _HAPROXY_COMPAT_H */
 
 /*
diff --git a/include/haproxy/protocol.h b/include/haproxy/protocol.h
index 828093d98..47cfa5935 100644
--- a/include/haproxy/protocol.h
+++ b/include/haproxy/protocol.h
@@ -94,10 +94,10 @@ int protocol_enable_all(void);
  * supported protocol types, and ctrl_type of either SOCK_STREAM or SOCK_DGRAM
  * depending on the requested values, or NULL if not found.
  */
-static inline struct protocol *protocol_lookup(int family, enum proto_type proto_type, int ctrl_dgram)
+static inline struct protocol *protocol_lookup(int family, enum proto_type proto_type, int alt)
 {
 	if (family >= 0 && family < AF_CUST_MAX)
-		return __protocol_by_family[family][proto_type][!!ctrl_dgram];
+		return __protocol_by_family[family][proto_type][!!alt];
 	return NULL;
 }
 
diff --git a/src/backend.c b/src/backend.c
index 6956d9bfe..6b865768e 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1690,8 +1690,15 @@ int connect_server(struct stream *s)
 
 	if (!srv_conn->xprt) {
 		/* set the correct protocol on the output stream connector */
+		int mptcp = 0;
+
+		/* cli_conn can be NULL when the origin of the stream isn't a
+		 * connection, there's no reason to use MPTCP in this case */
+		if (cli_conn && cli_conn->ctrl)
+			mptcp = cli_conn->ctrl->sock_prot == IPPROTO_MPTCP;
+
 		if (srv) {
-			if (conn_prepare(srv_conn, protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, 0), srv->xprt)) {
+			if (conn_prepare(srv_conn, protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, !!mptcp), srv->xprt)) {
 				conn_free(srv_conn);
 				return SF_ERR_INTERNAL;
 			}
@@ -1699,7 +1706,7 @@ int connect_server(struct stream *s)
 			int ret;
 
 			/* proxies exclusively run on raw_sock right now */
-			ret = conn_prepare(srv_conn, protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, 0), xprt_get(XPRT_RAW));
+			ret = conn_prepare(srv_conn, protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, !!mptcp), xprt_get(XPRT_RAW));
 			if (ret < 0 || !(srv_conn->ctrl)) {
 				conn_free(srv_conn);
 				return SF_ERR_INTERNAL;
diff --git a/src/proto_tcp.c b/src/proto_tcp.c
index d6552b2f1..9cabae11b 100644
--- a/src/proto_tcp.c
+++ b/src/proto_tcp.c
@@ -149,6 +149,102 @@ struct protocol proto_tcpv6 = {
 
 INITCALL1(STG_REGISTER, protocol_register, &proto_tcpv6);
 
+#ifdef __linux__
+/* Most fields are copied from proto_tcpv4 */
+struct protocol proto_mptcpv4 = {
+	.name           = "mptcpv4",
+
+	/* connection layer */
+	.xprt_type      = PROTO_TYPE_STREAM,
+	.listen         = tcp_bind_listener,
+	.enable         = tcp_enable_listener,
+	.disable        = tcp_disable_listener,
+	.add            = default_add_listener,
+	.unbind         = default_unbind_listener,
+	.suspend        = default_suspend_listener,
+	.resume         = default_resume_listener,
+	.accept_conn    = sock_accept_conn,
+	.ctrl_init      = sock_conn_ctrl_init,
+	.ctrl_close     = sock_conn_ctrl_close,
+	.connect        = tcp_connect_server,
+	.drain          = sock_drain,
+	.check_events   = sock_check_events,
+	.ignore_events  = sock_ignore_events,
+	.get_info       = tcp_get_info,
+
+	/* binding layer */
+	.rx_suspend     = tcp_suspend_receiver,
+	.rx_resume      = tcp_resume_receiver,
+
+	/* address family */
+	.fam            = &proto_fam_inet4,
+
+	/* socket layer */
+	.proto_type     = PROTO_TYPE_STREAM,
+	.sock_type      = SOCK_STREAM,
+	.sock_prot      = IPPROTO_MPTCP, /* MPTCP specific */
+	.rx_enable      = sock_enable,
+	.rx_disable     = sock_disable,
+	.rx_unbind      = sock_unbind,
+	.rx_listening   = sock_accepting_conn,
+	.default_iocb   = sock_accept_iocb,
+	.receivers      = LIST_HEAD_INIT(proto_mptcpv4.receivers),
+	.nb_receivers   = 0,
+#ifdef SO_REUSEPORT
+	.flags          = PROTO_F_REUSEPORT_SUPPORTED,
+#endif
+};
+
+INITCALL1(STG_REGISTER, protocol_register, &proto_mptcpv4);
+
+/* Most fields are copied from proto_tcpv6 */
+struct protocol proto_mptcpv6 = {
+	.name           = "mptcpv6",
+
+	/* connection layer */
+	.xprt_type      = PROTO_TYPE_STREAM,
+	.listen         = tcp_bind_listener,
+	.enable         = tcp_enable_listener,
+	.disable        = tcp_disable_listener,
+	.add            = default_add_listener,
+	.unbind         = default_unbind_listener,
+	.suspend        = default_suspend_listener,
+	.resume         = default_resume_listener,
+	.accept_conn    = sock_accept_conn,
+	.ctrl_init      = sock_conn_ctrl_init,
+	.ctrl_close     = sock_conn_ctrl_close,
+	.connect        = tcp_connect_server,
+	.drain          = sock_drain,
+	.check_events   = sock_check_events,
+	.ignore_events  = sock_ignore_events,
+	.get_info       = tcp_get_info,
+
+	/* binding layer */
+	.rx_suspend     = tcp_suspend_receiver,
+	.rx_resume      = tcp_resume_receiver,
+
+	/* address family */
+	.fam            = &proto_fam_inet6,
+
+	/* socket layer */
+	.proto_type     = PROTO_TYPE_STREAM,
+	.sock_type      = SOCK_STREAM,
+	.sock_prot      = IPPROTO_MPTCP, /* MPTCP specific */
+	.rx_enable      = sock_enable,
+	.rx_disable     = sock_disable,
+	.rx_unbind      = sock_unbind,
+	.rx_listening   = sock_accepting_conn,
+	.default_iocb   = sock_accept_iocb,
+	.receivers      = LIST_HEAD_INIT(proto_mptcpv6.receivers),
+	.nb_receivers   = 0,
+#ifdef SO_REUSEPORT
+	.flags          = PROTO_F_REUSEPORT_SUPPORTED,
+#endif
+};
+
+INITCALL1(STG_REGISTER, protocol_register, &proto_mptcpv6);
+#endif
+
 /* Binds ipv4/ipv6 address <local> to socket <fd>, unless <flags> is set, in which
  * case we try to bind <remote>. <flags> is a 2-bit field consisting of :
  *  - 0 : ignore remote address (may even be a NULL pointer)
diff --git a/src/protocol.c b/src/protocol.c
index 399835a88..c27874fd3 100644
--- a/src/protocol.c
+++ b/src/protocol.c
@@ -47,7 +47,9 @@ void protocol_register(struct protocol *proto)
 	LIST_APPEND(&protocols, &proto->list);
 	__protocol_by_family[sock_domain]
 	                    [proto->proto_type]
-	                    [proto->xprt_type == PROTO_TYPE_DGRAM] = proto;
+	                    [proto->xprt_type == PROTO_TYPE_DGRAM ||
+	                     proto->sock_prot == IPPROTO_MPTCP] = proto;
+
 	HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock);
 }
 
diff --git a/src/sock.c b/src/sock.c
index df82c6ea7..e32573b7e 100644
--- a/src/sock.c
+++ b/src/sock.c
@@ -278,7 +278,7 @@ int sock_create_server_socket(struct connection *conn, struct proxy *be, int *st
 		ns = __objt_server(conn->target)->netns;
 	}
 #endif
-	sock_fd = my_socketat(ns, conn->dst->ss_family, SOCK_STREAM, 0);
+	sock_fd = my_socketat(ns, conn->dst->ss_family, SOCK_STREAM, conn->ctrl->sock_prot);
 
 	/* at first, handle common to all proto families system limits and permission related errors */
 	if (sock_fd == -1) {
@@ -303,7 +303,7 @@ int sock_create_server_socket(struct connection *conn, struct proxy *be, int *st
 	}
 
 	if (fd_set_nonblock(sock_fd) == -1 ||
-	    ((conn->ctrl->sock_prot == IPPROTO_TCP) && (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == -1))) {
+	    ((conn->ctrl->sock_prot == IPPROTO_TCP || conn->ctrl->sock_prot == IPPROTO_MPTCP) && (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == -1))) {
 		qfprintf(stderr,"Cannot set client socket to non blocking mode.\n");
 		send_log(be, LOG_EMERG, "Cannot set client socket to non blocking mode.\n");
 		close(sock_fd);
diff --git a/src/tools.c b/src/tools.c
index 15756c880..e0e871745 100644
--- a/src/tools.c
+++ b/src/tools.c
@@ -977,6 +977,7 @@ struct sockaddr_storage *str2sa_range(const char *str, int *port, int *low, int
 	int new_fd = -1;
 	enum proto_type proto_type = 0; // to shut gcc warning
 	int ctrl_type = 0; // to shut gcc warning
+	int mptcp = 0;
 
 	portl = porth = porta = 0;
 	if (fqdn)
@@ -1063,6 +1064,13 @@ struct sockaddr_storage *str2sa_range(const char *str, int *port, int *low, int
 		proto_type = PROTO_TYPE_STREAM;
 		ctrl_type = SOCK_STREAM;
 	}
+	else if (strncmp(str2, "mptcp4@", 7) == 0) {
+		str2 += 7;
+		ss.ss_family = AF_INET;
+		proto_type = PROTO_TYPE_STREAM;
+		ctrl_type = SOCK_STREAM;
+		mptcp = 1;
+	}
 	else if (strncmp(str2, "udp4@", 5) == 0) {
 		str2 += 5;
 		ss.ss_family = AF_INET;
@@ -1075,6 +1083,13 @@ struct sockaddr_storage *str2sa_range(const char *str, int *port, int *low, int
 		proto_type = PROTO_TYPE_STREAM;
 		ctrl_type = SOCK_STREAM;
 	}
+	else if (strncmp(str2, "mptcp6@", 7) == 0) {
+		str2 += 7;
+		ss.ss_family = AF_INET6;
+		proto_type = PROTO_TYPE_STREAM;
+		ctrl_type = SOCK_STREAM;
+		mptcp = 1;
+	}
 	else if (strncmp(str2, "udp6@", 5) == 0) {
 		str2 += 5;
 		ss.ss_family = AF_INET6;
@@ -1087,6 +1102,13 @@ struct sockaddr_storage *str2sa_range(const char *str, int *port, int *low, int
 		proto_type = PROTO_TYPE_STREAM;
 		ctrl_type = SOCK_STREAM;
 	}
+	else if (strncmp(str2, "mptcp@", 6) == 0) {
+		str2 += 6;
+		ss.ss_family = AF_UNSPEC;
+		proto_type = PROTO_TYPE_STREAM;
+		ctrl_type = SOCK_STREAM;
+		mptcp = 1;
+	}
 	else if (strncmp(str2, "udp@", 4) == 0) {
 		str2 += 4;
 		ss.ss_family = AF_UNSPEC;
@@ -1365,7 +1387,7 @@ struct sockaddr_storage *str2sa_range(const char *str, int *port, int *low, int
 	 */
 	new_proto = protocol_lookup(ss.ss_family,
 	                            proto_type,
-	                            ctrl_type == SOCK_DGRAM);
+	                            ctrl_type == SOCK_DGRAM || !!mptcp);
 
 	if (!new_proto && (!fqdn || !*fqdn) && (ss.ss_family != AF_CUST_EXISTING_FD)) {
 		memprintf(err, "unsupported %s protocol for %s family %d address '%s'%s",
-- 
2.46.0
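The address-prefix parsing and the "alternative" slot in __protocol_by_family can be modelled in a few lines of standalone C. This is a simplified sketch under stated assumptions, not HAProxy code:

- The types and functions (struct sketch_proto, sketch_parse_prefix(), sketch_register(), sketch_lookup()) are hypothetical.
- Only the tcp and mptcp stream prefixes are handled.
- The table is indexed by a small family index rather than by the raw AF_* value.

The sketch shows how mptcp4@, mptcp6@ and mptcp@ set an address family (AF_UNSPEC meaning it is derived later from the address syntax) plus an MPTCP flag. That flag then selects the second slot of the stream row, so the table needs no new dimension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET, AF_INET6 */

/* Hypothetical stand-ins for HAProxy's proto_type and struct protocol. */
enum sketch_proto_type { SK_STREAM = 0, SK_DGRAM, SK_PROTO_TYPES };
enum { SK_FAM_INET4 = 0, SK_FAM_INET6, SK_FAMS };

struct sketch_proto {
	const char *name;
	int family;                  /* AF_INET or AF_INET6 */
	enum sketch_proto_type type; /* socket-level protocol type */
	int alt;                     /* 1 selects the "alternative" slot */
};

/* [family][proto type][alt]: TCP sits in alt=0 of the stream row and
 * MPTCP in alt=1, so MPTCP costs no extra table dimension. */
static const struct sketch_proto *by_family[SK_FAMS][SK_PROTO_TYPES][2];

static int fam_index(int family)
{
	if (family == AF_INET)
		return SK_FAM_INET4;
	if (family == AF_INET6)
		return SK_FAM_INET6;
	return -1; /* AF_UNSPEC and others are not stored in this model */
}

static void sketch_register(const struct sketch_proto *p)
{
	int f = fam_index(p->family);

	assert(f >= 0);
	by_family[f][p->type][!!p->alt] = p;
}

static const struct sketch_proto *sketch_lookup(int family, enum sketch_proto_type type, int alt)
{
	int f = fam_index(family);

	if (f < 0 || (unsigned)type >= SK_PROTO_TYPES)
		return NULL;
	return by_family[f][type][!!alt];
}

/* Recognizes the tcp4@/tcp6@/mptcp4@/mptcp6@/mptcp@ prefixes: sets the
 * family and the MPTCP flag, and returns a pointer past the prefix, or
 * NULL when the prefix is not one of these. */
static const char *sketch_parse_prefix(const char *str, int *family, int *mptcp)
{
	static const struct { const char *prefix; int family; int mptcp; } tbl[] = {
		{ "tcp4@",   AF_INET,   0 },
		{ "tcp6@",   AF_INET6,  0 },
		{ "mptcp4@", AF_INET,   1 },
		{ "mptcp6@", AF_INET6,  1 },
		{ "mptcp@",  AF_UNSPEC, 1 },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		size_t len = strlen(tbl[i].prefix);

		if (strncmp(str, tbl[i].prefix, len) == 0) {
			*family = tbl[i].family;
			*mptcp = tbl[i].mptcp;
			return str + len;
		}
	}
	return NULL;
}

static const struct sketch_proto sk_tcpv4   = { "tcpv4",   AF_INET,  SK_STREAM, 0 };
static const struct sketch_proto sk_tcpv6   = { "tcpv6",   AF_INET6, SK_STREAM, 0 };
static const struct sketch_proto sk_mptcpv4 = { "mptcpv4", AF_INET,  SK_STREAM, 1 };
static const struct sketch_proto sk_mptcpv6 = { "mptcpv6", AF_INET6, SK_STREAM, 1 };

static void sketch_register_all(void)
{
	sketch_register(&sk_tcpv4);
	sketch_register(&sk_tcpv6);
	sketch_register(&sk_mptcpv4);
	sketch_register(&sk_mptcpv6);
}
```

For example, "mptcp6@[::1]:80" yields AF_INET6 with the MPTCP flag set, and the lookup then returns the mptcpv6 entry rather than tcpv6.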