[Lldb-commits] [PATCH] D77047: AArch64 SVE register infos and core file support

Pavel Labath via Phabricator via lldb-commits Wed, 15 Jul 2020 05:39:21 -0700

labath added a comment.

At this point, I think I (finally) have a good understanding of both how this 
patch works and interacts with the rest of the world. I have one more batch of 
comments, but hopefully none are too controversial, and I really do hope this 
is the last iteration.




================
Comment at: lldb/source/Plugins/Process/Utility/RegisterInfoPOSIX_arm64.cpp:289
+
+    lldb_private::RegisterInfo *new_reg_info_p = reg_info_ref.data();
+
----------------
I think all uses of `new_reg_info_p` could just be replaced by `reg_info_ref`


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:63-66
+    m_sve_note_payload.resize(m_sveregset.GetByteSize());
+    ::memcpy(GetSVEBuffer(), m_sveregset.GetDataStart(),
+             m_sveregset.GetByteSize());
+    ::memcpy(&m_sve_header, m_sveregset.GetDataStart(), sizeof(m_sve_header));
----------------
What's up with this copying? We already have the data in the `m_sveregset` 
DataExtractor. What's the reason for copying it into the `m_sve_note_payload` 
vector? Also, `m_sve_header` seems like it could just be a `reinterpret_cast`ed 
pointer into that buffer instead of a copy. (Maybe not even a pointer, just a 
utility function which performs the cast when called).

Actually, when I think about casts and data extractors, I am reminded of 
endianness. This will access those fields using host endianness, which is most 
likely not what we want to do. So, probably the best/simplest solution would be 
to indeed declare a `user_sve_header` struct, but don't populate it via memcpy, 
but rather via the appropriate DataExtractor extraction methods. Since the only 
field of the user_sve_header used outside this function is the `vl` field, 
perhaps the struct could be a local variable and only the vector length would 
be persisted. That would be consistent with how the `flags` field is decoded 
and stored into the `m_sve_state` field.

(If the struct fields are always little-endian (if that's true, I thought arm 
has BE variants) then you could also stick to the `reinterpret_cast` idea, but 
change the types of the struct fields to `llvm::support::ulittleXX_t` to read 
them as LE independently of the host.)


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:84-86
+  const uint32_t reg = reg_info->kinds[lldb::eRegisterKindLLDB];
+  if (reg == LLDB_INVALID_REGNUM)
+    return false;
----------------
This is already checked by ReadRegister and the false -> uint32_t conversion is 
dodgy. An assertion would completely suffice here.


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:140
+      offset -= GetGPRSize();
+      if (IsFPR(reg) && offset < m_fpregset.GetByteSize()) {
+        value.SetFromMemoryData(reg_info, m_fpregset.GetDataStart() + offset,
----------------
This `IsFPR` check is now redundant.


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:152-153
+        sve_reg_num = reg_info->value_regs[0];
+      else if (reg == GetRegNumFPCR() || reg == GetRegNumFPSR())
+        sve_reg_num = reg;
+      if (sve_reg_num != LLDB_INVALID_REGNUM) {
----------------
These two registers are special-cased both here and in the 
`CalculateSVEOffset`. Given that both functions also do a switch over the sve 
"states", it makes following the code quite challenging. What if we moved the 
handling of these registers completely up-front, and removed their handling 
from `CalculateSVEOffset` completely?
I'm thinking of something like:
```
if (reg == GetRegNumFPCR() && m_sve_state != SVEState::Disabled) {
  src = GetSVEBuffer() + GetFPCROffset(); // Or maybe just inline GetFPCROffset 
here
} else if (reg == GetRegNumFPSR() && m_sve_state != SVEState::Disabled) {
  src = ...;
} if (IsFPR(reg)) {
  // as usual, only you can assume that FP[CS]R are already handled if SVE is 
enabled
}
```


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:161-162
+  } else if (IsSVEVG(reg)) {
+    sve_vg = GetSVERegVG();
+    src = (uint8_t *)&sve_vg;
+  } else if (IsSVE(reg)) {
----------------
This also looks endian-incorrect. Maybe `value = GetSVERegVG(); return true;` ?


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:164
+  } else if (IsSVE(reg)) {
+    if (m_sve_state != SVEState::Disabled) {
+      if (m_sve_state == SVEState::FPSIMD) {
----------------
A switch over possible `m_sve_state` values would likely be cleaner.


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:174
+          ::memcpy(sve_reg_non_live.data(),
+                   (const uint8_t *)GetSVEBuffer() + offset, 16);
+        }
----------------
I guess it would be better to do the cast inside GetSVEBuffer. (and please make 
that a c++ reinterpret_cast).


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:180
+        assert(offset < m_sveregset.GetByteSize());
+        src = (uint8_t *)GetSVEBuffer() + offset;
+      }
----------------
it seems that `src` should be a const pointer.


================
Comment at: 
lldb/source/Plugins/Process/elf-core/RegisterContextPOSIXCore_arm64.cpp:82-106
+uint32_t
+RegisterContextCorePOSIX_arm64::CalculateSVEOffset(uint32_t reg_num) const {
+  uint32_t sve_offset = 0;
+  if (m_sve_state == SVE_STATE::SVE_STATE_FPSIMD) {
+    if (IsSVEZ(reg_num))
+      sve_offset = (reg_num - GetRegNumSVEZ0()) * 16;
+    else if (reg_num == GetRegNumFPSR())
----------------
omjavaid wrote:
> labath wrote:
> > omjavaid wrote:
> > > labath wrote:
> > > > omjavaid wrote:
> > > > > labath wrote:
> > > > > > omjavaid wrote:
> > > > > > > labath wrote:
> > > > > > > > I'm confused by this function. If I understant the SVE_PT 
> > > > > > > > macros and the logic in 
> > > > > > > > `RegisterInfoPOSIX_arm64::ConfigureVectorRegisterInfos` 
> > > > > > > > correctly, then they both seem to encode the same information. 
> > > > > > > > And it seems to me that this function should just be 
> > > > > > > > `reg_infos[reg_num].offset - some_constant`, which is the same 
> > > > > > > > thing that we're doing for the arm FP registers when SVE is 
> > > > > > > > disabled, and also for most other architectures too.
> > > > > > > > 
> > > > > > > > Why is that not the case? Am I missing something? If they are 
> > > > > > > > not encoding the same thing, could they be made to encode the 
> > > > > > > > same thing?
> > > > > > > This function calculates offset of a particular register in core 
> > > > > > > note data. SVE data in core dump is similar to what PTrace emits 
> > > > > > > and offsets into this data is not linear. SVE macros are used to 
> > > > > > > access those offsets based on register numbers and currently 
> > > > > > > selected vector length.
> > > > > > > Also for the purpose of ease we have linear offsets in SVE 
> > > > > > > register infos and it helps us simplify register data once it 
> > > > > > > makes way to GDBRemoteRegisterContext on the client side.
> > > > > > Could you give an example of the non-linearity of the core dump 
> > > > > > data? (Some registers, and their respective core file and 
> > > > > > gdb-remote offsets)
> > > > > In case of core file we create a buffer m_sveregset which stores SVE 
> > > > > core note information
> > > > > m_sveregset =
> > > > >       getRegset(notes, 
> > > > > m_register_info_up->GetTargetArchitecture().GetTriple(),
> > > > >                 AARCH64_SVE_Desc);
> > > > > 
> > > > > At this point we do not know what was the vector length and at what 
> > > > > offsets in the data our registers are located. We read top bytes of 
> > > > > size use_sve_header and read vector length. Based on this information 
> > > > > we configure vector length in Register infos. While the SVE payload 
> > > > > starts with user_sve_header then there are some allignment bytes 
> > > > > followed by vector length based Z registers followed by P and FFR, 
> > > > > then there are some more allginment bytes followd by FPCR and FPSR. 
> > > > > Macros provided by Linux help us jump to the desired offset by 
> > > > > providing register number and vq into the core note or Ptrace payload.
> > > > > 
> > > > > In case of client side storage we store GPRs at linear offset 
> > > > > followed by Vector Granule register. Then there are SVE registers Z, 
> > > > > P, FFR, FPSR and FPCR. Offsets of V, D and S registers in FPR regset 
> > > > > overlap with corresponding first bytes of Z registers and will be 
> > > > > same as corresponding Z register. While both FP/SVE FPSR share same 
> > > > > register offset, size and register number.
> > > > > 
> > > > > Here is an excerpt from 
> > > > > https://github.com/torvalds/linux/blob/master/Documentation/arm64/sve.rst
> > > > > 
> > > > > SVE_PT_REGS_FPSIMD
> > > > > SVE registers are not live (GETREGSET) or are to be made non-live 
> > > > > (SETREGSET).
> > > > > The payload is of type struct user_fpsimd_state, with the same 
> > > > > meaning as for NT_PRFPREG, starting at offset SVE_PT_FPSIMD_OFFSET 
> > > > > from the start of user_sve_header.
> > > > > Extra data might be appended in the future: the size of the payload 
> > > > > should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
> > > > > vq should be obtained using sve_vq_from_vl(vl).
> > > > > 
> > > > > or
> > > > > 
> > > > > SVE_PT_REGS_SVE
> > > > > SVE registers are live (GETREGSET) or are to be made live (SETREGSET).
> > > > > The payload contains the SVE register data, starting at offset 
> > > > > SVE_PT_SVE_OFFSET from the start of user_sve_header, and with size 
> > > > > SVE_PT_SVE_SIZE(vq, flags);
> > > > Given this
> > > > > SVE payload starts with ... followed by vector length based Z 
> > > > > registers followed by P and FFR,
> > > > and this
> > > > > In case of client side storage we store GPRs ... Then there are SVE 
> > > > > registers Z, P, FFR, FPSR and FPCR
> > > > I would expect that for each of the Z, P and FFR registers, the 
> > > > expression `offset_in_core(reg) - offset_in_gdb_remote(reg)` is always 
> > > > the same constant (and is basically equal to 
> > > > `SVE_PT_SVE_ZREG_OFFSET(vq, 0) - reg_info[Z0].byte_offset`). So we 
> > > > could just add/subtract that constant to the gdb-remote byte_offset 
> > > > field instead of painstakingly decomposing the register number only for 
> > > > the linux macros to reconstruct it back again. Is that not so?
> > > The standard never talks about Z, P and FFR being contagious that is what 
> > > I learnt by reading macros. There standard states this:
> > > 
> > > If the registers are present, the remainder of the record has a 
> > > vl-dependent size and layout. Macros SVE_SIG_* are defined [1] to 
> > > facilitate access to the members.
> > > Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant 
> > > layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from 
> > > the start of the register's representation in memory.
> > > If the SVE context is too big to fit in sigcontext.__reserved[], then 
> > > extra space is allocated on the stack, an extra_context record is written 
> > > in __reserved[] referencing this space. sve_context is then written in 
> > > the extra space. Refer to [1] for further details about this mechanism.
> > > 
> > > I understand what you are talking about but given the macros were 
> > > specifically provided and above line about register record was vague and 
> > > I thought best is to follow the macros for offset calculation although 
> > > other way around is simpler but may be slightly unreliable.
> > > 
> > > I suggest to keep this as it is unless there is strong reason apart from 
> > > slight performance penalty. This resembles with GDB implementation which 
> > > was done by ARM and I am only following that as reference. May be we can 
> > > revise this in future when the feature becomes more mainstream. 
> > > 
> > I'm not worried about the performance penalty (most of these abstractions 
> > can be optimized away anyway). I'm more worried about the maintainability 
> > penalty incurred by a non-standard and fairly complicated solution.
> > 
> > The way I see it, it doesn't really matter how exactly the specification 
> > describes these things. Having the macros is nice, but it's not their names 
> > that become a part of the ABI -- their implementation does. So they (linux, 
> > arm, whoever) cannot change the layout of the these registers without 
> > breaking all existing applications (and core files) any more than they can 
> > change the layout of the general purpose registers (which we also access by 
> > offset, without any fancy macros). In fact, even if they did change the 
> > layout, given that we have a copy of these headers, we wouldn't 
> > automatically inherit those changes anyway, but would have to take some 
> > manual action. And since we'd probably want to maintain some sort of 
> > backwards compatibility, we couldn't just replace the new macro 
> > definitions, but would have to do some sort of versioning (just today I got 
> > reminded that lldb contains versioned layouts of various internal structs 
> > used by the ObjC runtime).
> > 
> > So, for that reason I am more concerned about consistency with other lldb 
> > register contexts than I am with consistency with gdb. 
> > 
> > Also, I bet gdb doesn't have an equivalent of our RegisterInfo struct. 
> > Without that, using these macros would be obviously better, but given that 
> > we have that, and we already use it to access other registers, ditching 
> > that for macros does not seem like a win to me.
> > 
> > Another way to look at this would be to compare this to e.g. ELF reading 
> > code in llvm. We cannot just include <linux/elf.h> as we want this to be 
> > cross-platform. However, we also don't just copy the header blindly into 
> > our code base. We don't depart from it needlessly, but we do make sure that 
> > it plays well with the rest of llvm -- `#defines` are changed to enums, we 
> > add templates to select between 64/32 bit versions, change some 
> > platform-specific names so that different platforms may co-exist, etc.
> I understand your point here and by referring to Linux, Arm or GDB 
> implementation I only wanted to make sure that we are implementing the 
> architecture and platform specific parts correctly. The choice of using 
> Linux/SVE specific macros in elf-core context was not a blind decision. I am 
> not advocating for following a particular implementations and I would not 
> have pulled in these Linux specific macros into elf-core implementation if 
> the SVE core note layout was independent of Linux platform specifics like 
> sig_context and user_sve_header. Offset to the start of register data in 
> NT_ARM_SVE is also calculated after doing alignment correction based on VQ 
> and size of sig_context. Given that SVE is currently supported by Linux only, 
> NT_ARM_SVE is Linux only core note and core dump generated makes use of these 
> macro definitions I had no choice but to include them for extraction of data 
> from NT_ARM_SVE section. If I had not followed these macros I still would 
> have written something similar my self for register data extraction from 
> NT_ARM_SVE elf core note.
> 
> I am going to post another update where I ll try to minimize use of these 
> macros in offset calculation.
Thanks for your patience. I think this looks better. The more I understand how 
these work, the more I realize that will need fairly special handling..


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77047/new/

https://reviews.llvm.org/D77047



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D77047: AArch64 SVE register infos and core file support

Reply via email to