On 26/03/2021 at 16:13, Dmitry Safonov wrote:
Hello,
On 3/26/21 10:50 AM, Christophe Leroy wrote:
On 26/03/2021 at 11:46, Michael Ellerman wrote:
Laurent Dufour <lduf...@linux.ibm.com> writes:
On 25/03/2021 at 17:56, Laurent Dufour wrote:
On 25/03/2021 at 17:46, Christophe Leroy wrote:
On 25/03/2021 at 17:11, Laurent Dufour wrote:
Since v5.11 and the changes you made to the VDSO code, the ELF header
is no longer exposed at the beginning of the VDSO mapping in user space.
This confuses CRIU, which checks for the ELF header cookie
(https://github.com/checkpoint-restore/criu/issues/1417).
How does it work on other architectures?
Good question, I'll double check the CRIU code.
On x86, there are 2 VDSO entries:
7ffff7fcb000-7ffff7fce000 r--p 00000000 00:00 0 [vvar]
7ffff7fce000-7ffff7fcf000 r-xp 00000000 00:00 0 [vdso]
And the VDSO is starting with the ELF header.
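For illustration, the two entries above can be picked out of
/proc/<pid>/maps with a small parser. This is a minimal sketch, not
CRIU's code: struct map_entry and parse_maps_line() are hypothetical
names, and it assumes mapping names without embedded spaces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one /proc/<pid>/maps entry. */
struct map_entry {
	unsigned long long start;
	unsigned long long end;
	char perms[5];
	char name[64];
};

/*
 * Parse one maps line of the form
 *   start-end perms offset dev inode [name]
 * Returns 0 on success, -1 if the line is malformed.
 * The name is optional and is left empty when absent.
 */
int parse_maps_line(const char *line, struct map_entry *e)
{
	int n;

	e->name[0] = '\0';
	/* Offset, device and inode are skipped with suppressed conversions. */
	n = sscanf(line, "%llx-%llx %4s %*s %*s %*s %63s",
		   &e->start, &e->end, e->perms, e->name);
	return n >= 3 ? 0 : -1;
}
```

A caller would compare e.name against "[vdso]" or "[vvar]" and then
inspect the bytes at e.start.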
I'm not an expert in the loading and ELF parts, and reading the change
you made, I can't identify how this could work now, as I'm expecting
the loader to need that ELF header to do the relocation.
I think the loader is able to find it at the expected place.
Actually, it seems the loader relies on the AUX vector AT_SYSINFO_EHDR.
I guess CRIU should do the same.
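As a rough sketch of that approach, a process can ask the auxiliary
vector where its own VDSO ELF header lives rather than assuming it sits
at the start of the mapping. getauxval() and AT_SYSINFO_EHDR are the
standard glibc/Linux interfaces; the helper names vdso_ehdr_from_auxv()
and has_elf_magic() are made up for this example, and this is not what
CRIU actually does.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Return the address of the VDSO ELF header as advertised by the
 * kernel in the auxiliary vector, or NULL if no VDSO is mapped.
 */
const void *vdso_ehdr_from_auxv(void)
{
	unsigned long addr = getauxval(AT_SYSINFO_EHDR);

	return addr ? (const void *)addr : NULL;
}

/* Return 1 if the bytes at p start with the ELF magic "\177ELF". */
int has_elf_magic(const void *p)
{
	return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```

Note that getauxval() only describes the calling process; for another
task the same entry would have to be read from /proc/<pid>/auxv.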
From my investigation it seems that the first bytes of the VDSO
area are now
the vdso_arch_data.
Is the ELF header put somewhere else?
How could the loader process the VDSO without that ELF header?
Like most other architectures, we now have the data section as the
first page, and the text section follows. So you will likely find the
ELF header on the second page.
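To make the layout concrete, here is a small sketch that walks a buffer
page by page and reports the first page that begins with the ELF magic.
find_elf_page() is a hypothetical helper, and the page size is passed
in by the caller because it differs between configurations (the buffer
would be a copy of the mapping taken by the caller).

```c
#include <stddef.h>
#include <string.h>

/*
 * Scan a mapping copy page by page and return the offset of the first
 * page that starts with the ELF magic, or -1 if none does.
 */
long find_elf_page(const unsigned char *base, size_t len, size_t page_size)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
	size_t off;

	if (page_size == 0)
		return -1;

	/* Only page-aligned offsets are checked. */
	for (off = 0; off + sizeof(magic) <= len; off += page_size) {
		if (memcmp(base + off, magic, sizeof(magic)) == 0)
			return (long)off;
	}
	return -1;
}
```

With data in the first page and text after it, this would return
page_size rather than 0.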
I'm wondering if the data section you're referring to is the vvar
section I can see on x86.
Many of the other architectures have separate vm_special_mapping's for
the data page and the vdso binary, where the former is called "vvar".
eg, s390:
static struct vm_special_mapping vvar_mapping = {
	.name = "[vvar]",
	.fault = vvar_fault,
};

static struct vm_special_mapping vdso_mapping = {
	.name = "[vdso]",
	.mremap = vdso_mremap,
};
I guess we probably should be doing that too.
Dmitry proposed the same, see
https://github.com/0x7f454c46/linux/commit/783c7a2532d2219edbcf555cc540eab05f698d2a
Discussion at https://github.com/checkpoint-restore/criu/issues/1417
Yeah, I didn't submit it officially to lkml because I couldn't test it
yet (and I usually don't send untested patches). The VM I have fails to
kexec and I'm having difficulty getting the serial console working, so
I'd appreciate it if someone could either pick it up or add a Tested-by.
Just to let everyone know, while testing your patch with the selftest I
encountered the following Oops. But I also get it without your patch,
though.
root@vgoip:~# ./sigreturn_vdso
test: sigreturn_vdso
tags: git_version:v5.12-rc4-1553-gc31141d460e6
VDSO is at 0x104000-0x10bfff (32768 bytes)
Signal delivered OK with VDSO mapped
VDSO moved to 0x77bf4000-0x77bfbfff (32768 bytes)
Signal delivered OK with VDSO moved
Unmapped VDSO
[ 1855.444371] Kernel attempted to read user page (7ff9ff30) - exploit attempt?
(uid: 0)
[ 1855.459404] BUG: Unable to handle kernel data access on read at 0x7ff9ff30
[ 1855.466188] Faulting instruction address: 0xc00111d4
[ 1855.471099] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1855.476428] BE PAGE_SIZE=16K PREEMPT CMPC885
[ 1855.480702] SAF3000 DIE NOTIFICATION
[ 1855.484184] CPU: 0 PID: 362 Comm: sigreturn_vdso Not tainted
5.12.0-rc4-s3k-dev-01553-gc31141d460e6 #4811
[ 1855.493644] NIP: c00111d4 LR: c0005a28 CTR: 00000000
[ 1855.498634] REGS: cadb3dd0 TRAP: 0300 Not tainted
(5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
[ 1855.507068] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48000884 XER: 20000000
[ 1855.513866] DAR: 7ff9ff30 DSISR: 88000000
[ 1855.513866] GPR00: c0007788 cadb3e90 c28dc000 7ff9ff30 7ff9ff40 000004e0
7ff9fd50 00000000
[ 1855.513866] GPR08: 00000001 00000001 7ff9ff30 00000000 28000282 1001b7e8
100a0920 00000000
[ 1855.513866] GPR16: 100cac0c 100b0000 102883a4 10289685 100d0000 100d0000
100d0000 100b2e9e
[ 1855.513866] GPR24: ffffffff 102883c8 00000000 7ff9ff38 cadb3f40 cadb3ec8
c28dc000 00000000
[ 1855.552767] NIP [c00111d4] flush_icache_range+0x90/0xb4
[ 1855.557932] LR [c0005a28] handle_signal32+0x1bc/0x1c4
[ 1855.562925] Call Trace:
[ 1855.565332] [cadb3e90] [100d0000] 0x100d0000 (unreliable)
[ 1855.570666] [cadb3ec0] [c0007788] do_notify_resume+0x260/0x314
[ 1855.576432] [cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184
[ 1855.582542] [cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28
[ 1855.588050] --- interrupt: c00 at 0xfe807f8
[ 1855.592183] NIP: 0fe807f8 LR: 10001048 CTR: c0139378
[ 1855.597174] REGS: cadb3f40 TRAP: 0c00 Not tainted
(5.12.0-rc4-s3k-dev-01553-gc31141d460e6)
[ 1855.605607] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 28000282 XER: 20000000
[ 1855.612664]
[ 1855.612664] GPR00: 00000025 7ffa0230 77c09690 00000000 0000000a 28000282
00000001 0ff03a38
[ 1855.612664] GPR08: 0000d032 00000328 c28dc000 00000009 88000282 1001b7e8
100a0920 00000000
[ 1855.612664] GPR16: 100cac0c 100b0000 102883a4 10289685 100d0000 100d0000
100d0000 100b2e9e
[ 1855.612664] GPR24: ffffffff 102883c8 00000000 77bff628 10002358 10010000
1000210c 00008000
[ 1855.648894] NIP [0fe807f8] 0xfe807f8
[ 1855.652426] LR [10001048] 0x10001048
[ 1855.655954] --- interrupt: c00
[ 1855.658969] Instruction dump:
[ 1855.661893] 38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c 4e800020
7c001fac
[ 1855.669811] 2c0a0000 38630010 4082ffcc 4bffffe4 <7c00186c> 2c070000 39430010
4082ff8c
[ 1855.677910] ---[ end trace f071a5587092b3aa ]---
[ 1855.682462]
Remapped the stack executable
!! child died by signal 11
failure: sigreturn_vdso