Hi Martin and Marc,
On 12/14/2016 10:58 AM, Martin Steigerwald wrote:
I didn´t post to LKML about the bug I reported in kernel upstream
bugtracker
yet.
I ponder whether to report the second issue, the one about netlink, that
Gerlof pointed out to the Debian bugtracker, since for Gerlof it does only
happen with the Debian kernel and not with the CentOS kernel he tried and I
also didn´t see this issue with my self-compiled kernel.
Yesterday I did some further investigation about the second issue in
kernel 4.8.4:
NETLINK command TASKSTATS_CMD_GET with nla_type
TASKSTATS_CMD_ATTR_REGISTER_CPUMASK results in error EINVAL (-22).
Conclusion: it is a kernel bug.
Find my analysis below (I hope it is not too detailed :-).
Best regards,
Gerlof
Since there are many reasons for the error EINVAL in this piece kernel
code involved, I added many printk's to find out where the error is
generated exactly.
My analysis:
1) The NETLINK command mentioned above is meant to specify the range of
CPUs for which taskstats has to be activated. In the system call, this
is specified via a string which is "0-3" for all four CPUs in my system.
In the kernel function 'cmd_attr_register_cpumask' (source file
kernel/taskstats.c), this string is converted into a bitmap (via a call
to function 'parse'). In this case, the bitmap contains
0x000000000000000f. This bitmap is one unsigned long because it provides
enough bits for four CPUs. However, if needed a bitmap might consist of
more unsigned longs.
2) After creating the bitmap,the function 'cmd_attr_register_cpumask'
calls function 'add_del_listener' to register the bitmap.
The first call issued by this function is:
if (!cpumask_subset(mask, cpu_possible_mask))
return -EINVAL;
and this is the call that causes error EINVAL.
3) The function 'cpumask_subset' (inline function in source file
include/linux/cpumask.h) contains:
static inline int cpumask_subset(const struct cpumask *src1p,
const struct cpumask *src2p)
{
return bitmap_subset(cpumask_bits(src1p), cpumask_bits(src2p),
nr_cpumask_bits);
}
4) The function 'bitmap_subset' (inline function in source file
include/linux/bitmap.h) verifies if the bits that are set in the first
bitmap are a subset of the bits set in the second bitmap. That second
bitmap is in this case the variable 'cpu_possible_mask'. This is a
bitmap that contains a bit *for every possible CPU* that might be
present. The number of bits in this bitmap is defined by the
configuration parameter CONFIG_NR_CPUS=512 (see .config) which contains
512 on my system (default).
So for this specific call to 'bitmap_subset', the first pointer refers
to the bitmask of one unsigned long describing the four CPUs that I
specified in my NETLINK call while the second pointer refers to
'cpu_possible_mask' consisting of 8 unsigned longs for 512 CPUs. The
third argument specifies the number of bits to be checked which is
'nr_cpumask_bits' that contains 512! IMHO, this is not correct. It means
that the bits in the two bitmaps are compared over a length of 8
unsigned longs, while the first bitmap only contains one unsigned long.
5) I verified the conclusion of point 4) by the printk's in front of the
call mentioned for point 2):
static int add_del_listener(pid_t pid, const struct cpumask *mask, int
isadd)
{
struct listener_list *listeners;
struct listener *s, *tmp, *s2;
unsigned int cpu, i;
int ret = 0;
printk(KERN_INFO "GELA: NR_CPUS=%d\n", NR_CPUS);
printk(KERN_INFO "GELA: nr_cpumask_bits=%d", nr_cpumask_bits);
printk(KERN_INFO "GELA: nr_cpu_ids=%d", nr_cpu_ids);
for (i=0; i < BITS_TO_LONGS(nr_cpumask_bits); i++) {
printk(KERN_INFO "GELA: %016lx %016lx",
*(cpumask_bits(mask)+i),
*(cpumask_bits(cpu_possible_mask)+i) );
}
if (!cpumask_subset(mask, cpu_possible_mask))
return -EINVAL;
When I run my test program, in dmesg I find the following messages:
[ 36.016248] GELA: NR_CPUS=512
[ 36.016249] GELA: nr_cpumask_bits=512
[ 36.016250] GELA: nr_cpu_ids=4
[ 36.016252] GELA: 000000000000000f 000000000000000f
[ 36.016253] GELA: ffff88005a3cae14 0000000000000000 <-- from here
bits in the first bitmap are not part of that bitmap!
[ 36.016254] GELA: ffff88005fdd7d40 0000000000000000
[ 36.016255] GELA: ffffffff81612630 0000000000000000
[ 36.016256] GELA: 000000000000000c 0000000000000000
[ 36.016257] GELA: ffffffff8131c193 0000000000000000
[ 36.016258] GELA: ffffffff816125c0 0000000000000000
[ 36.016259] GELA: ffffffff818e57c0 0000000000000000
This output also shows the variable 'nr_cpu_ids' that defines how many
CPUs are currently present in the system. When I use this variable as
third parameter in the call to 'bitmap_subset' (see code in point 3),
the problem does not occur any more because only 4 bits will be check in
both bitmaps.
6) With the original code, I modified the configuration parameter
CONFIG_NR_CPUS to 8 instead of 512 to see if I can bypass the problem.
This means that the bitmap 'cpu_possible_mask' only consists of one
unsigned long now. No problem occurs for the application now and
taskstats works fine.
7) I do not know exactly why the problem did not occur with older kernel
versions. I suspect configuration parameter CONFIG_CPUMASK_OFFSTACK
which really influences the code related to the cpumask handling. In all
kernels that had NO problems, this parameter was set to 'y'. In the
kernel WITH problems, this configuration parameter is not present at all
(so effectively 'n').