Re: [Lldb-commits] Instruction emulation of arm64 'stp d8, d9, [sp, #-0x70]!' style instruction

Jason Molenda via lldb-commits Wed, 12 Oct 2016 18:38:22 -0700

Yeah, it's incorrectly grabbing the stack pointer reg number (31) from the Rn 
bits and using that as the register # being saved, instead of the Rt and Rt2 
register numbers, and saying that v31 is being pushed twice.  It's an easy bug 
in EmulateInstructionARM64::EmulateLDPSTP but fixing that just presents us with 
the next problem so I haven't fixed it.


When I connect to debugserver on an arm64 device, it tells lldb about the 
pseudo regs on the device, like

  <reg name="x0" regnum="0" offset="0" bitsize="64" group="general" 
altname="arg1" group_id="1" ehframe_regnum="0" dwarf_regnum="0" generic="arg1" 
invalidate_regnums="0,34"/>

  <reg name="w0" regnum="34" offset="0" bitsize="32" group="general" 
group_id="1" value_regnums="0" invalidate_regnums="0,34"/>

  <reg name="v0" regnum="63" offset="268" bitsize="128" group="vector" 
type="float" altname="q0" encoding="vector" format="vector-uint8" group_id="2" 
dwarf_regnum="64" invalidate_regnums="63,129,97"/>

  <reg name="s0" regnum="97" offset="268" bitsize="32" group="float" 
type="float" encoding="ieee754" format="float" group_id="2" value_regnums="63" 
invalidate_regnums="63,129,97"/>

  <reg name="d0" regnum="129" offset="268" bitsize="64" group="float" 
type="float" encoding="ieee754" format="float" group_id="2" value_regnums="63" 
invalidate_regnums="63,129,97"/>


(this is the reply to the "qXfer:features:read:target.xml" packet that lldb 
sends to debugserver at the startup of the debug session) so the user can refer 
to these by name:

(lldb) reg read -f x s0
      s0 = 0x404e809d
(lldb) reg read -f x d0
      d0 = 0x41bdaf19404e809d
(lldb) reg read -f x v0
      v0 = 0x000000000000000041bdaf19404e809d
(lldb) 


J

> On Oct 12, 2016, at 6:23 PM, Tamas Berghammer <tbergham...@google.com> wrote:
> 
> I am a bit surprised that we don't define the smaller floating point 
> registers on AArch64 the same way we do it on x86/x86_64/Arm (have no idea 
> about MIPS) and I would consider this as a bug what we should fix (will gave 
> it a try when I have a few free cycles) because currently it is very 
> difficult to get a 64bit double value out of the data we are displaying.
> 
> Considering the fact that based on the ABI only the bottom 64bit have to be 
> saved I start to like the idea of teaching our unwinder about the smaller 
> floating point registers because that will mean that we can inspect float and 
> double variables from the parent stack frame while we never show some 
> corrupted data for variables occupying the full 128 bit vector register.
> 
> "...the unwind plan generated currently records this as v31 being saved...": 
> I don't understand this part. The code never references v31 nor d31 so if the 
> unwind plan records it as we saved v31 then something is clearly wrong there. 
> Does it treat all d0-d31 register as v31 because of some instruction decoding 
> reason?
> 
> Tamas
> 
> On Wed, Oct 12, 2016 at 4:30 PM Jason Molenda <jmole...@apple.com> wrote:
> Hi Tamas, sorry for that last email being a mess, I was doing something else 
> while writing it and only saw how unclear it was after I sent it.  You 
> understood everything I was trying to say.
> 
> I looked at the AAPCS64 again.  It says v8-v15 are callee saved, but says, 
> "Additionally, only the bottom 64-bits of each value stored in v8-v15 need to 
> be preserved[1]; it is the responsibility of the caller to preserve larger 
> values." so we're never going to have a full 128-bit value that we can get 
> off the stack (for targets following this ABI).
> 
> DWARF can get away with only defining register numbers for v0-v31 because 
> their use in DWARF always includes a byte size.  Having lldb register numbers 
> and using that numbering scheme instead of DWARF is an interesting one.  It 
> looks like all of the arm64 targets are using the definitions from 
> Plugins/Process/Utility/RegisterInfos_arm64.h.  It also only defines v0-v31 
> but the x86_64 RegisterInfos file defines pseudo registers ("eax") which are 
> a subset of real registers ("rax") - maybe we could do something like that.
> 
> I'm looking at this right now, but fwiw I wrote a small C program that uses 
> enough fp registers that the compiler will spill all of the preserved 
> registers (d8-d15) when I compile it with clang and -Os.  I'll use this for 
> testing
> 
> 
> 
> I get a prologue like
> 
> a.out`foo:
> a.out[0x100007a08] <+0>:   0x6dba3bef   stp    d15, d14, [sp, #-96]!
> a.out[0x100007a0c] <+4>:   0x6d0133ed   stp    d13, d12, [sp, #16]
> a.out[0x100007a10] <+8>:   0x6d022beb   stp    d11, d10, [sp, #32]
> a.out[0x100007a14] <+12>:  0x6d0323e9   stp    d9, d8, [sp, #48]
> a.out[0x100007a18] <+16>:  0xa9046ffc   stp    x28, x27, [sp, #64]
> a.out[0x100007a1c] <+20>:  0xa9057bfd   stp    x29, x30, [sp, #80]
> a.out[0x100007a20] <+24>:  0x910143fd   add    x29, sp, #80              ; =80
> 
> 
> and the unwind plan generated currently records this as v31 being saved in 
> the first instruction and ignores the others (because that reg has already 
> been saved).  That's the easy bug - but it's a good test case that I'll strip 
> down into a unit test once we get this figured out.
> 
> 
> J
> 
> 
> > On Oct 12, 2016, at 2:15 PM, Tamas Berghammer <tbergham...@google.com> 
> > wrote:
> >
> > Hi Jason,
> >
> > Thank you for adding unit test for this code. I think the current 
> > implementation doesn't fail terribly on 16 vs 32 byte stack alignment 
> > because we use the "opc" from the instruction to calculate the write back 
> > address (to adjust the SP) so having the wrong size of the register won't 
> > effect that part.
> >
> > Regarding the issue about s0/d0/v0 I don't have a perfect solution but here 
> > are some ideas:
> > * Use a different register numbering schema then DWARF (e.g. LLDB/GDB/???). 
> > I think it would only require us to change 
> > EmulateInstructionARM64::CreateFunctionEntryUnwind to create an UnwindPlan 
> > with a different register numbering schema and then to lookup the register 
> > based on a different numbering schema using GetRegisterInfo in functions 
> > where we want to reference s0-s31/d0-d31. In theory this should be a simple 
> > change but I won't be surprised if changing the register numbering breaks 
> > something.
> > * Introduce the concept of register pieces. This concept already exists in 
> > the DWARF expressions where you can say that a given part of the register 
> > is saved at a given location. I expect that doing it won't be trivial but 
> > it would solve both this problem and would improve the way we understand 
> > DWARF as well.
> >
> > About your idea about saying that we saved v8&v9 even though we only saved 
> > d8&d9: I think it is a good quick and dirty solution but it has a few 
> > issues what is hard to solve. Most importantly we will lie to the user when 
> > they read out v8&v9 what will contain some garbage data. Regarding 
> > big/little endian we should be able to detect which one the inferior uses 
> > and we can do different thing based on that (decide the location of v8&v9) 
> > but it will make the hack even worth.
> >
> > Tamas
> >
> > On Tue, Oct 11, 2016 at 6:15 PM Jason Molenda <jmole...@apple.com> wrote:
> > Hi Tamas, I'm writing some unit tests for the unwind source generators - 
> > x86 last week, arm64 this week, and I noticed with this prologue:
> >
> >
> >
> > JavaScriptCore`JSC::B3::reduceDoubleToFloat:
> >
> >     0x192b45c0c <+0>:  0x6db923e9   stp    d9, d8, [sp, #-0x70]!
> >
> >     0x192b45c10 <+4>:  0xa9016ffc   stp    x28, x27, [sp, #0x10]
> >
> >     0x192b45c14 <+8>:  0xa90267fa   stp    x26, x25, [sp, #0x20]
> >
> >     0x192b45c18 <+12>: 0xa9035ff8   stp    x24, x23, [sp, #0x30]
> >
> >     0x192b45c1c <+16>: 0xa90457f6   stp    x22, x21, [sp, #0x40]
> >
> >     0x192b45c20 <+20>: 0xa9054ff4   stp    x20, x19, [sp, #0x50]
> >
> >     0x192b45c24 <+24>: 0xa9067bfd   stp    x29, x30, [sp, #0x60]
> >
> >     0x192b45c28 <+28>: 0x910183fd   add    x29, sp, #0x60            ; =0x60
> >
> >     0x192b45c2c <+32>: 0xd10a83ff   sub    sp, sp, #0x2a0            ; 
> > =0x2a0
> >
> >
> >
> >
> >
> >
> >
> > EmulateInstructionARM64::EmulateLDPSTP interprets this as a save of v31.  
> > The use of reg 31 is an easy bug, the arm manual C7.2.284 ("STP (SIMD&FP)") 
> > gives us an "opc" (0b00 == 32-bit registers, 0b01 == 64-bit registers, 0b10 
> > == 128-bit registers), an immediate value, and three registers (Rt2, Rn, 
> > Rt).  In the above example, these work out to Rt2 == 8 (d8), Rn == 31 
> > ("sp"), Rt == 9 (d9).  The unwinder is incorrectly saying v31 right now 
> > because it's using Rn -
> >
> >
> >
> >   if (vector) {
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, 
> > reg_info_Rt))
> >
> >       return false;
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + n, 
> > reg_info_Rt2))
> >
> >       return false;
> >
> >   }
> >
> >
> >
> > This would normally take up 32 bytes of stack space and cause big problems, 
> > but because we're writing the same reg twice, I think we luck out and only 
> > take 16 bytes of the stack.
> >
> >
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so we can't track 
> > this instruction's behavior 100% correctly but maybe if we said that
> >
> >
> >
> > That would be an easy fix, like
> >
> >
> >
> >   if (vector) {
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t, 
> > reg_info_Rt))
> >
> >       return false;
> >
> >     if (!GetRegisterInfo(eRegisterKindDWARF, arm64_dwarf::v0 + t2, 
> > reg_info_Rt2))
> >
> >       return false;
> >
> >   }
> >
> >
> >
> >
> >
> > We don't have dwarf register numbers for s0..31, d0..31, so I don't think 
> > we can correctly track this instruction's actions today.  Maybe we should 
> > put a save of v8 at CFA-112 and a save of v9 at CFA-104.  As long as the 
> > target is operating in little endian mode, when we go to get the contents 
> > of v8/v9 we're only actually USING the lower 64 bits so it'll work out, 
> > right?  I think I have that right.  We'll be reading garbage in the upper 
> > 64 bits - the register reading code won't have any knowledge of the fact 
> > that we only have the lower 32/64 bits available to us.
> >
> >
> >
> >
> >
> > Throwing the problem out there, would like to hear what you think.  I don't 
> > want to encode buggy behavior in a unit test ;) so I'd like it for us to 
> > think about what correct behavior would be, and do that before I write the 
> > test.
> 

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Re: [Lldb-commits] Instruction emulation of arm64 'stp d8, d9, [sp, #-0x70]!' style instruction

Reply via email to