On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo <pchilanom...@openjdk.org> wrote:
>>> One way to get rid of this would be to have c2 just set last_Java_pc too >>> along with last_Java_sp, so we don't need to push lr to be able to do >>> last_Java_sp[-1] to make the frame walkable. >> >> If that would solve the problem, then that must mean we save/freeze >> last_Java_pc as part of the virtual thread's state. So why can't we just >> call make_walkable() before we freeze, to fix things up as if C2 had stored >> last_Java_pc to the anchor? Then freeze could assert that the thread is >> already walkable. I'm surprised it doesn't already. > > The issue is not when we make the frame walkable but how. The way it > currently works is by pushing the last_Java_pc to the stack in the runtime > stub before making the call to the VM (plus an alignment word). So to make > the frame walkable we do last_Java_sp[-1] in the VM. But this approach > creates a mismatch between the recorded cb->frame_size() (which starts from > last_Java_sp) vs the physical size of the frame which starts with rsp right > before the call. This is what the c2 runtime stub code for aarch64 looks like: > > > 0xffffdfdba584: sub sp, sp, #0x10 > 0xffffdfdba588: stp x29, x30, [sp] > 0xffffdfdba58c: ldrb w8, [x28, #1192] > 0xffffdfdba590: cbz x8, 0xffffdfdba5a8 > 0xffffdfdba594: mov x8, #0x4ba0 > 0xffffdfdba598: movk x8, #0xf6a8, lsl #16 > 0xffffdfdba59c: movk x8, #0xffff, lsl #32 > 0xffffdfdba5a0: mov x0, x28 > 0xffffdfdba5a4: blr x8 > 0xffffdfdba5a8: mov x9, sp > 0xffffdfdba5ac: str x9, [x28, #1000] <------- store > last_Java_sp > 0xffffdfdba5b0: mov x0, x1 > 0xffffdfdba5b4: mov x1, x2 > 0xffffdfdba5b8: mov x2, x28 > 0xffffdfdba5bc: adr x9, 0xffffdfdba5d4 > 0xffffdfdba5c0: mov x8, #0xe6a4 > 0xffffdfdba5c4: movk x8, #0xf717, lsl #16 > 0xffffdfdba5c8: movk x8, #0xffff, lsl #32 > 0xffffdfdba5cc: stp xzr, x9, [sp, #-16]! <------- Push two extra > words > 0xffffdfdba5d0: blr x8 > 0xffffdfdba5d4: nop > 0xffffdfdba5d8: movk xzr, #0x0 > 0xffffdfdba5dc: movk xzr, #0x0 > 0xffffdfdba5e0: add sp, sp, #0x10 <------- Remove two extra > words > 0xffffdfdba5e4: str xzr, [x28, #1000] > 0xffffdfdba5e8: str xzr, [x28, #1008] > 0xffffdfdba5ec: ldr x10, [x28, #8] > 0xffffdfdba5f0: cbnz x10, 0xffffdfdba600 > 0xffffdfdba5f4: ldp x29, x30, [sp] > 0xffffdfdba5f8: add sp, sp, #0x10 > 0xffffdfdba5fc: ret > 0xffffdfdba600: ldp x29, x30, [sp] > 0xffffdfdba604: add sp, sp, #0x10 > 0xffffdfdba608: adrp x8, 0xffffdfc30000 > 0xffffdfdba60c: add x8, x8, #0x80 > 0xffffdfdba610: br x8 OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling. It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666