Le 30/04/2020 à 07:39, Christoph Hellwig a écrit :
On Thu, Apr 30, 2020 at 08:39:00AM +0800, Jeremy Kerr wrote:
Hi Christophe,
Just use the proper non __-prefixed get/put_user variants where
that is not done yet.
But it means you are doing the access_ok() check everytime, which is
what is to be avoided by doing the access_ok() once then using the
__-prefixed variant.
5 out of 8 of these are just a access_ok(); simple_read_from_buffer().
For the cases where it's multiple __put/get_user()s, the max will be 5.
(for the mbox access). Is that worth optimising the access_ok() checks?
access_ok is just trivial comparism to the segment limit, I don't
think it has a relavant performance impact.
I think it has an impact. See the difference between the two following
trivial functions:
int test1(unsigned long val, unsigned long *addr)
{
return __put_user(val, addr);
}
int test2(unsigned long val, unsigned long *addr)
{
return put_user(val, addr);
}
00000000 <test1>:
0: 39 20 00 00 li r9,0
4: 90 64 00 00 stw r3,0(r4)
8: 7d 23 4b 78 mr r3,r9
c: 4e 80 00 20 blr
00000000 <test2>:
0: 81 22 04 38 lwz r9,1080(r2)
4: 7c 6a 1b 78 mr r10,r3
8: 7f 89 20 40 cmplw cr7,r9,r4
c: 41 9c 00 24 blt cr7,30 <test2+0x30>
10: 7d 24 48 50 subf r9,r4,r9
14: 38 60 ff f2 li r3,-14
18: 2b 89 00 02 cmplwi cr7,r9,2
1c: 4c 9d 00 20 blelr cr7
20: 39 20 00 00 li r9,0
24: 91 44 00 00 stw r10,0(r4)
28: 7d 23 4b 78 mr r3,r9
2c: 4e 80 00 20 blr
30: 38 60 ff f2 li r3,-14
34: 4e 80 00 20 blr
It looks like GCC is smart enough to read the limit in task struct only
once when we have two consecutive put_user() but there is still some
difference:
int test3(unsigned long val, unsigned long *addr)
{
return put_user(val, addr) ? : put_user(val, addr + 1);
}
int test4(unsigned long val, unsigned long *addr)
{
if (!access_ok(addr, sizeof(*addr)))
return -EFAULT;
return __put_user(val, addr) ? : __put_user(val, addr + 1);
}
00000000 <test3>:
0: 81 42 04 38 lwz r10,1080(r2)
4: 7f 8a 20 40 cmplw cr7,r10,r4
8: 41 9c 00 48 blt cr7,50 <test3+0x50>
c: 7d 04 50 50 subf r8,r4,r10
10: 39 20 ff f2 li r9,-14
14: 2b 88 00 02 cmplwi cr7,r8,2
18: 40 9d 00 30 ble cr7,48 <test3+0x48>
1c: 39 20 00 00 li r9,0
20: 90 64 00 00 stw r3,0(r4)
24: 2f 89 00 00 cmpwi cr7,r9,0
28: 40 9e 00 20 bne cr7,48 <test3+0x48>
2c: 38 84 00 04 addi r4,r4,4
30: 7f 8a 20 40 cmplw cr7,r10,r4
34: 41 9c 00 1c blt cr7,50 <test3+0x50>
38: 7d 44 50 50 subf r10,r4,r10
3c: 2b 8a 00 02 cmplwi cr7,r10,2
40: 40 9d 00 10 ble cr7,50 <test3+0x50>
44: 90 64 00 00 stw r3,0(r4)
48: 7d 23 4b 78 mr r3,r9
4c: 4e 80 00 20 blr
50: 39 20 ff f2 li r9,-14
54: 4b ff ff f4 b 48 <test3+0x48>
Disassembly of section .text.test4:
00000000 <test4>:
0: 81 22 04 38 lwz r9,1080(r2)
4: 7f 89 20 40 cmplw cr7,r9,r4
8: 41 9c 00 34 blt cr7,3c <test4+0x3c>
c: 7d 44 48 50 subf r10,r4,r9
10: 39 20 ff f2 li r9,-14
14: 2b 8a 00 06 cmplwi cr7,r10,6
18: 40 9d 00 1c ble cr7,34 <test4+0x34>
1c: 39 20 00 00 li r9,0
20: 90 64 00 00 stw r3,0(r4)
24: 2f 89 00 00 cmpwi cr7,r9,0
28: 40 9e 00 0c bne cr7,34 <test4+0x34>
2c: 38 84 00 04 addi r4,r4,4
30: 90 64 00 00 stw r3,0(r4)
34: 7d 23 4b 78 mr r3,r9
38: 4e 80 00 20 blr
3c: 39 20 ff f2 li r9,-14
40: 4b ff ff f4 b 34 <test4+0x34>