On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> So I think there's an issue with "deterministically reproducible."
> The syzcaller report has:
>
> > Unfortunately, I don't have any reproducer for this issue yet.

Yeah, Dmitry gave two other links of similar reports, the first one
works for me:

https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c

and that one doesn't have a reproducer either. The bytes look familiar
though:

Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
All code
========
   0:   c1 e8 03                shr    $0x3,%eax
   3:   42 80 3c 20 00          cmpb   $0x0,(%rax,%r12,1)
   8:   74 05                   je     0xf
   a:   e8 79 7a a7 00          callq  0xa77a88
   f:   49 8b 47 10             mov    0x10(%r15),%rax
  13:   48 89 05 f6 d8 ef 09    mov    %rax,0x9efd8f6(%rip)        # 0x9efd910
  1a:   49 8d 7f 08             lea    0x8(%r15),%rdi
  1e:   48 89 f8                mov    %rdi,%rax
  21:   48 c1 e8 03             shr    $0x3,%rax
  25:   42 80 3c 00 00          cmpb   $0x0,(%rax,%r8,1)
  2a:*  00 00                   add    %al,(%rax)          <-- trapping instruction
  2c:   e8 57 7a a7 00          callq  0xa77a88
  31:   49 8b 47 08             mov    0x8(%r15),%rax
  35:   48 89 05 dc d8 ef 09    mov    %rax,0x9efd8dc(%rip)        # 0x9efd918
  3c:   49 8d 7f 18             lea    0x18(%r15),%rdi

4 zero bytes again.

And that .config has kasan stuff enabled too, so could the failure be
related to having kasan stuff enabled and it messing up offsets?

That is, provided this is the mechanism by which it happens. We still
don't know what wrote those zeroes in there, and when.

Not having a reproducer is nasty, but looking at those reports above,
and if I'm reading this correctly, rIP points to

RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]

each time, and the URL says there are 9 crashes in total. And each has
happened at that rIP. So all we'd need is to set a watchpoint on that
address, catch when it is being written, and dump stuff.

Dmitry, can syzkaller do debugging stuff like that?

> Following my hypothesis about having a bad address calculation; the
> tricky part is I'd need to look through the relocations and try to see
> if any could resolve to the address that was accidentally modified. I
> suspect objtool could be leveraged for that;

If you can find this at compile time...

> maybe it could check whether each `struct jump_entry`'s `target`
> member referred to either a NOP or a CMP, and error otherwise? (Do we
> have other non-NOP or CMP targets? IDK)

Follow jump_label_transform() - it does verify what it is going to
patch.

And while I'm looking at this, I realize that the jump labels patch 5
bytes but the above zeroes are 4 bytes. In the other opcode bytes I
decoded it is 4 bytes too. So this might not be caused by the jump
labels patching...

> This hypothesis might also be incorrect, and thus would be chasing a
> red herring...not really sure how else to pursue debugging this.

Yeah, this one is tricky to debug.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
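
P.S. For anyone who wants to repeat the "4 bytes vs. 5 bytes" check on
the other reports, here is a minimal Python sketch. It only parses the
"Code:" line of an oops, finds the run of zero bytes around the byte
marked <xx>, and compares its length with the 5-byte jump label patch
size mentioned above. The helper names (parse_code_line,
zero_run_around) are made up for illustration, and the script says
nothing about who wrote the zeroes.

```python
# Opcode bytes copied from the "Code:" line of the report quoted above.
CODE_LINE = (
    "c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 "
    "f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 "
    "e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18"
)

# Taken from the mail above: jump label patching rewrites 5 bytes.
JUMP_LABEL_PATCH_SIZE = 5


def parse_code_line(line):
    """Return (data, trap_index); the trapping byte is written as <xx>."""
    data = []
    trap_index = None
    for token in line.split():
        if token.startswith("<") and token.endswith(">"):
            trap_index = len(data)
            token = token[1:-1]
        data.append(int(token, 16))
    return bytes(data), trap_index


def zero_run_around(data, index):
    """Return (start, length) of the run of zero bytes containing index."""
    if data[index] != 0:
        return index, 0
    start = index
    while start > 0 and data[start - 1] == 0:
        start -= 1
    end = index
    while end + 1 < len(data) and data[end + 1] == 0:
        end += 1
    return start, end - start + 1


if __name__ == "__main__":
    data, trap = parse_code_line(CODE_LINE)
    start, length = zero_run_around(data, trap)
    print(f"trapping byte at offset {trap:#x}")
    print(f"zero run: offset {start:#x}, {length} bytes")
    if length != JUMP_LABEL_PATCH_SIZE:
        print("zero run does not match the jump label patch size")
```

Pasting the "Code:" line of another report into CODE_LINE should be
enough to redo the comparison for that crash. Note that a zero run can
include bytes that were legitimately zero to begin with, so the length
is only a rough hint.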