sysctl_tcp_congestion_control seems to have a bug that prevents it from
actually calling the tcp_set_default_congestion_control function. This is not
so apparent because it does not return an error and generally the /proc
interface is used to configure the default TCP congestion control algorithm.
This is present in 2.6.18 onwards and probably earlier, though I have not
inspected 2.6.15--2.6.17.

sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.

A sample program that shows this behaviour is attached. It is hardcoded to set
the congestion control algorithm to reno. Running it multiple times shows that
it succeeds, yet does not modify the default TCP congestion control algorithm.

e.g.,

$ make sysctl
$ sudo /tmp/sysctl; sudo /tmp/sysctl
old value='bic', new value should be='reno'
old value='bic', new value should be='reno'

A simple patch is also attached that modifies the if statement in
sysctl_tcp_congestion_control.
#include <linux/unistd.h>
#include <linux/types.h>
#include <linux/sysctl.h>
#include <stdio.h>

_syscall1(int, _sysctl, struct __sysctl_args *, args);
int sysctl(int *name, int nlen, void *oldval, size_t *oldlenp,
    void *newval, size_t newlen)
{
  struct __sysctl_args args={name,nlen,oldval,oldlenp,newval,newlen};
  return _sysctl(&args);
}

#define SIZE(x) sizeof(x)/sizeof(x[0])
#ifndef NET_TCO_CONG_CONTROL
#define NET_TCP_CONG_CONTROL 110
#endif

int main()
{
  char oldname[100];
  size_t oldnamelen = SIZE(oldname);
  char newval[] = "reno";
  size_t newvallen = SIZE(newval);
  int name[] = { CTL_NET, NET_IPV4, NET_TCP_CONG_CONTROL };

  if (sysctl(name, SIZE(name), oldname, &oldnamelen, newval, newvallen))
    perror("sysctl");
  else
    printf("old value='%s', new value should be='%s'\n", oldname, newval);
  return 0;
}

--- /tmp/net/ipv4/sysctl_net_ipv4.c	2007-10-05 11:07:25.000000000 -0700
+++ net/ipv4/sysctl_net_ipv4.c	2007-11-19 16:05:16.000000000 -0800
@@ -129,7 +129,7 @@
 	tcp_get_default_congestion_control(val);
 	ret = sysctl_string(&tbl, name, nlen, oldval, oldlenp, newval, newlen,
 			    context);
-	if (ret == 0 && newval && newlen)
+	if (ret == 1 && newval && newlen)
 		ret = tcp_set_default_congestion_control(val);
 	return ret;
 }

Reply via email to