On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo 
<pchilanom...@openjdk.org> wrote:

>>> One way to get rid of this would be to have c2 just set last_Java_pc too 
>>> along with last_Java_sp, so we don't need to push lr to be able to do 
>>> last_Java_sp[-1] to make the frame walkable.
>> 
>> If that would solve the problem, then that must mean we save/freeze 
>> last_Java_pc as part of the virtual thread's state.  So why can't we just 
>> call make_walkable() before we freeze, to fix things up as if C2 had stored 
>> last_Java_pc to the anchor?  Then freeze could assert that the thread is 
>> already walkable.  I'm surprised it doesn't already.
>
> The issue is not when we make the frame walkable but how. The way it 
> currently works is by pushing the last_Java_pc to the stack in the runtime 
> stub before making the call to the VM (plus an alignment word). So to make 
> the frame walkable we do last_Java_sp[-1] in the VM. But this approach 
> creates a mismatch between the recorded cb->frame_size() (which starts from 
> last_Java_sp) vs the physical size of the frame which starts with rsp right 
> before the call. This is what the c2 runtime stub code for aarch64 looks like:
> 
> 
>    0xffffdfdba584:    sub     sp, sp, #0x10
>    0xffffdfdba588:    stp     x29, x30, [sp]
>    0xffffdfdba58c:    ldrb    w8, [x28, #1192]
>    0xffffdfdba590:    cbz     x8, 0xffffdfdba5a8
>    0xffffdfdba594:    mov     x8, #0x4ba0                     
>    0xffffdfdba598:    movk    x8, #0xf6a8, lsl #16
>    0xffffdfdba59c:    movk    x8, #0xffff, lsl #32
>    0xffffdfdba5a0:    mov     x0, x28
>    0xffffdfdba5a4:    blr     x8
>    0xffffdfdba5a8:    mov     x9, sp
>    0xffffdfdba5ac:    str     x9, [x28, #1000]       <------- store 
> last_Java_sp
>    0xffffdfdba5b0:    mov     x0, x1
>    0xffffdfdba5b4:    mov     x1, x2
>    0xffffdfdba5b8:    mov     x2, x28
>    0xffffdfdba5bc:    adr     x9, 0xffffdfdba5d4
>    0xffffdfdba5c0:    mov     x8, #0xe6a4                     
>    0xffffdfdba5c4:    movk    x8, #0xf717, lsl #16
>    0xffffdfdba5c8:    movk    x8, #0xffff, lsl #32
>    0xffffdfdba5cc:    stp     xzr, x9, [sp, #-16]!   <------- Push two extra 
> words
>    0xffffdfdba5d0:    blr     x8
>    0xffffdfdba5d4:    nop
>    0xffffdfdba5d8:    movk    xzr, #0x0
>    0xffffdfdba5dc:    movk    xzr, #0x0
>    0xffffdfdba5e0:    add     sp, sp, #0x10      <------- Remove two extra 
> words
>    0xffffdfdba5e4:    str     xzr, [x28, #1000]
>    0xffffdfdba5e8:    str     xzr, [x28, #1008]
>    0xffffdfdba5ec:    ldr     x10, [x28, #8]
>    0xffffdfdba5f0:    cbnz    x10, 0xffffdfdba600
>    0xffffdfdba5f4:    ldp     x29, x30, [sp]
>    0xffffdfdba5f8:    add     sp, sp, #0x10
>    0xffffdfdba5fc:    ret
>    0xffffdfdba600:    ldp     x29, x30, [sp]
>    0xffffdfdba604:    add     sp, sp, #0x10
>    0xffffdfdba608:    adrp    x8, 0xffffdfc30000
>    0xffffdfdba60c:    add     x8, x8, #0x80
>    0xffffdfdba610:    br      x8

OK, so you're saying it's the stack adjustment that's the problem.  It sounds 
like there is code that is using rsp instead of last_Java_sp to compute the 
frame boundary.  Isn't that a bug that should be fixed?  I also think we should 
fix the aarch64 c2 stub to just store last_Java_pc like you suggest.  Adjusting 
the stack like this has in the past caused other problems, in particular making 
it hard to obtain safe stack traces during asynchronous profiling.

It's still unclear to me exactly how we resume after preemption.  It looks like 
we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it 
needs to be adjusted.  If that's the case, an alternative simplification for 
aarch64 is to set a different last_Java_pc that is preemption-friendly that 
skips the stack adjustment.  In your example, last_Java_pc would be set to 
0xffffdfdba5e4.  I think it is a reasonable requirement that preemption can 
return to last_Java_pc/last_Java_sp without adjustments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666

Reply via email to