jasonmolenda created this revision. jasonmolenda added a reviewer: JDevlieghere. jasonmolenda added a project: LLDB. Herald added a subscriber: kristof.beyls. Herald added a project: All. jasonmolenda requested review of this revision. Herald added a subscriber: lldb-commits.
I am adding MASK (power of 2, with that alignment) watchpoint support to debugserver AArch64, where it currently only does Byte Address Select (BAS, 1-8 bytes aligned to doubleword) watchpoints. This patch generalizes debugserver's watchpoint setting code so that it will be simple to drop in MASK watchpoints for larger memory regions in a later patch. debugserver supports an unusual feature where a request to watch an unaligned range of bytes is handled. If you have a 16 byte buffer at 0x1000 and ask to watch 4 bytes at 0x1006, lldb needs to watch 0x1006-0x1009 inclusive. A single BAS watchpoint can only watch either 0x1000-0x1007 or 0x1008-0x100f. Debugserver splits this unaligned watch request into two BAS watchpoints and uses two hardware registers to do it. Most of this patch is generalizing that "align and find the correct place to split the watch request" code to work for any size watch request byte range, not just 8-byte watch requests. I spent a bit of time to check that it was behaving correctly for a lot of BAS+MASK scenarios, e.g. watch 32 bytes at 0x1038 means an 8-byte BAS watchpoint for 0x1038-0x103f and a 32-byte MASK watchpoint from 0x1040-0x105f, ideally ignoring writes to the last 8 bytes which the user didn't ask to watch. This is also part of the processing I'll need to do for the WatchpointLocations I proposed a bit ago ( https://discourse.llvm.org/t/aarch64-watchpoints-reported-address-outside-watched-range-adopting-mask-style-watchpoints/67660/6 ). If a target/stub only supports 8-byte watchpoints, and a user asks to watch a 32-byte object, lldb should use 4 watchpoint registers (if it can) to watch that object. Or a single MASK watchpoint register. Or two hardware registers if we have MASK+BAS to watch an unaligned memory range. This is only doing the simple "find correct alignment, split memory request to aligned ranges", it doesn't try to use n watchpoints to watch a large object, only 2 for unaligned requests. It's a starting point. debugserver internally has a "HiLo" table to track these joined hardware registers corresponding to one user requested watchpoint. There's no feedback given to lldb that this is how the object was watched; the Z packet doesn't have any way of communicating it. I added a test for this specific debugserver feature where I watch 4 bytes unaligned, then detect that the watchpoints are triggered as loops in the binary run. It's not a fancy test, but we didn't have anything for this previously so it's a simple start. I expect this patch to result in no observable change in behavior to debugserver. It is laying the groundwork for adding MASK watchpoints, which will be a little patch on top of it. lldb itself is in no position to deal with larger watchpoints (and the false positives we can get when we watch a region larger than the user's requested one). I also don't try to solve a user requesting multiple watchpoints that all overlap in a single hardware register. e.g. user asks to watch 0x1000-0x1001 and 0x1005-0x1007 inclusive. This can be represented by a single BAS watchpoint in AArch64 watching noncontiguous byte ranges within that doubleword, but debugserver isn't set up to combine a new watchpoint request with existing watchpoints that already cover part of that range. Another thing I might need to think about when we start using power-of-2 MASK watchpoints. Functionally, it's taking EnableHardwareWatchpoint, separating out the "align and split" logic to its own method, then once we have valid watchpoint specification(s), we call out to the new EnableBASWatchpoint to set the correct bits in the control register. The next patch will be to add an EnableMASKWatchpoint for larger requests. Repository: rG LLVM Github Monorepo https://reviews.llvm.org/D149040 Files: lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/Makefile lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/TestUnalignedSpanningDwords.py lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/main.c lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h
Index: lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h =================================================================== --- lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h +++ lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.h @@ -13,6 +13,7 @@ #include <mach/thread_status.h> #include <map> +#include <vector> #if defined(ARM_THREAD_STATE64_COUNT) @@ -35,6 +36,13 @@ memset(&m_dbg_save, 0, sizeof(m_dbg_save)); } + struct watchpoint_spec { + nub_addr_t aligned_start; + nub_addr_t user_specified_start; + nub_size_t aligned_size; + nub_size_t user_specified_size; + }; + virtual ~DNBArchMachARM64() {} static void Initialize(); @@ -71,8 +79,13 @@ bool also_set_on_task) override; bool DisableHardwareBreakpoint(uint32_t hw_break_index, bool also_set_on_task) override; + std::vector<watchpoint_spec> AlignRequestedWatchpoint(nub_addr_t user_addr, + nub_size_t user_size); uint32_t EnableHardwareWatchpoint(nub_addr_t addr, nub_size_t size, bool read, bool write, bool also_set_on_task) override; + uint32_t SetBASWatchpoint(watchpoint_spec wp, bool read, bool write, + bool also_set_on_task); + uint32_t SetMASKWatchpoint(watchpoint_spec wp); bool DisableHardwareWatchpoint(uint32_t hw_break_index, bool also_set_on_task) override; bool DisableHardwareWatchpoint_helper(uint32_t hw_break_index, Index: lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp =================================================================== --- lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp +++ lldb/tools/debugserver/source/MacOSX/arm64/DNBArchImplARM64.cpp @@ -830,6 +830,73 @@ return INVALID_NUB_HW_INDEX; } +std::vector<DNBArchMachARM64::watchpoint_spec> +DNBArchMachARM64::AlignRequestedWatchpoint(nub_addr_t user_addr, + nub_size_t user_size) { + + std::vector<watchpoint_spec> wps; + + // Can't watch zero bytes + if (user_size == 0) + return wps; + + // Smallest size we can watch on AArch64 is 8 bytes + const nub_size_t min_watchpoint_alignment = 8; + nub_size_t aligned_size = std::max(user_size, min_watchpoint_alignment); + + // AArch64 addresses are 8 bytes. + const int addr_byte_size = 8; + const int addr_bit_size = addr_byte_size * 8; + + /// Round up \a user_size to the nearest power-of-2 size, at least 8 bytes + // user_size == 3 -> aligned_size == 8 + // user_size == 8 -> aligned_size == 8 + // user_size == 13 -> aligned_size == 16 + // user_size == 16 -> aligned_size == 16 + aligned_size = 1ULL << (addr_bit_size - __builtin_clzll(aligned_size - 1)); + + nub_addr_t aligned_start = user_addr & ~(aligned_size - 1); + // Does this power-of-2 memory range, aligned to power-of-2, completely + // encompass the requested watch region. + if (aligned_start + aligned_size >= user_addr + user_size) { + watchpoint_spec wp; + wp.aligned_start = aligned_start; + wp.user_specified_start = user_addr; + wp.aligned_size = aligned_size; + wp.user_specified_size = user_size; + wps.push_back(wp); + return wps; + } + + // We need to split this into two watchpoints, split on the aligned_size + // boundary and re-evaluate the alignment of each half. + // + // user_addr 48 user_size 20 -> aligned_size 32 + // aligned_start 32 + // split_addr 64 + // first_user_addr 48 + // first_user_size 16 + // second_user_addr 64 + // second_user_size 4 + nub_addr_t split_addr = aligned_start + aligned_size; + + nub_addr_t first_user_addr = user_addr; + nub_size_t first_user_size = split_addr - user_addr; + nub_addr_t second_user_addr = split_addr; + nub_size_t second_user_size = user_size - first_user_size; + + std::vector<watchpoint_spec> first_wp = + AlignRequestedWatchpoint(first_user_addr, first_user_size); + std::vector<watchpoint_spec> second_wp = + AlignRequestedWatchpoint(second_user_addr, second_user_size); + if (first_wp.size() != 1 || second_wp.size() != 1) + return wps; + + wps.push_back(first_wp[0]); + wps.push_back(second_wp[0]); + return wps; +} + uint32_t DNBArchMachARM64::EnableHardwareWatchpoint(nub_addr_t addr, nub_size_t size, bool read, bool write, @@ -839,91 +906,58 @@ "0x%8.8llx, size = %zu, read = %u, write = %u)", (uint64_t)addr, size, read, write); - const uint32_t num_hw_watchpoints = NumSupportedHardwareWatchpoints(); + std::vector<DNBArchMachARM64::watchpoint_spec> wps = + AlignRequestedWatchpoint(addr, size); + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::EnableHardwareWatchpoint() using %zu " + "hardware watchpoints", + wps.size()); - // Can't watch zero bytes - if (size == 0) + if (wps.size() == 0) return INVALID_NUB_HW_INDEX; // We must watch for either read or write if (read == false && write == false) return INVALID_NUB_HW_INDEX; - // Otherwise, can't watch more than 8 bytes per WVR/WCR pair - if (size > 8) - return INVALID_NUB_HW_INDEX; - - // Aarch64 watchpoints are in one of two forms: (1) 1-8 bytes, aligned to - // an 8 byte address, or (2) a power-of-two size region of memory; minimum - // 8 bytes, maximum 2GB; the starting address must be aligned to that power - // of two. - // - // For (1), 1-8 byte watchpoints, using the Byte Address Selector field in - // DBGWCR<n>.BAS. Any of the bytes may be watched, but if multiple bytes - // are watched, the bytes selected must be contiguous. The start address - // watched must be doubleword (8-byte) aligned; if the start address is - // word (4-byte) aligned, only 4 bytes can be watched. - // - // For (2), the MASK field in DBGWCR<n>.MASK is used. - // - // See the ARM ARM, section "Watchpoint exceptions", and more specifically, - // "Watchpoint data address comparisons". - // - // debugserver today only supports (1) - the Byte Address Selector 1-8 byte - // watchpoints that are 8-byte aligned. To support larger watchpoints, - // debugserver would need to interpret the mach exception when the watched - // region was hit, see if the address accessed lies within the subset - // of the power-of-two region that lldb asked us to watch (v. ARM ARM, - // "Determining the memory location that caused a Watchpoint exception"), - // and silently resume the inferior (disable watchpoint, stepi, re-enable - // watchpoint) if the address lies outside the region that lldb asked us - // to watch. - // - // Alternatively, lldb would need to be prepared for a larger region - // being watched than it requested, and silently resume the inferior if - // the accessed address is outside the region lldb wants to watch. - - nub_addr_t aligned_wp_address = addr & ~0x7; - uint32_t addr_dword_offset = addr & 0x7; - - // Do we need to split up this logical watchpoint into two hardware watchpoint - // registers? - // e.g. a watchpoint of length 4 on address 6. We need do this with - // one watchpoint on address 0 with bytes 6 & 7 being monitored - // one watchpoint on address 8 with bytes 0, 1, 2, 3 being monitored - - if (addr_dword_offset + size > 8) { - DNBLogThreadedIf(LOG_WATCHPOINTS, "DNBArchMachARM64::" - "EnableHardwareWatchpoint(addr = " - "0x%8.8llx, size = %zu) needs two " - "hardware watchpoints slots to monitor", - (uint64_t)addr, size); - int low_watchpoint_size = 8 - addr_dword_offset; - int high_watchpoint_size = addr_dword_offset + size - 8; - - uint32_t lo = EnableHardwareWatchpoint(addr, low_watchpoint_size, read, - write, also_set_on_task); - if (lo == INVALID_NUB_HW_INDEX) - return INVALID_NUB_HW_INDEX; - uint32_t hi = - EnableHardwareWatchpoint(aligned_wp_address + 8, high_watchpoint_size, - read, write, also_set_on_task); - if (hi == INVALID_NUB_HW_INDEX) { - DisableHardwareWatchpoint(lo, also_set_on_task); + if (wps.size() > 1) { + std::vector<uint32_t> wp_slots_used; + for (size_t i = 0; i < wps.size(); i++) { + uint32_t idx = EnableHardwareWatchpoint(wps[i].user_specified_start, + wps[i].user_specified_size, read, + write, also_set_on_task); + if (idx != INVALID_NUB_HW_INDEX) + wp_slots_used.push_back(idx); + } + if (wps.size() != wp_slots_used.size()) { + for (int wp_slot : wp_slots_used) + DisableHardwareWatchpoint(wp_slot, also_set_on_task); return INVALID_NUB_HW_INDEX; } - // Tag this lo->hi mapping in our database. - LoHi[lo] = hi; - return lo; + LoHi[wp_slots_used[0]] = wp_slots_used[1]; + return wp_slots_used[0]; } - // At this point - // 1 aligned_wp_address is the requested address rounded down to 8-byte - // alignment - // 2 addr_dword_offset is the offset into that double word (8-byte) region - // that we are watching - // 3 size is the number of bytes within that 8-byte region that we are - // watching + if (wps[0].aligned_size <= 8) + return SetBASWatchpoint(wps[0], read, write, also_set_on_task); + else + return INVALID_NUB_HW_INDEX; +} + +uint32_t +DNBArchMachARM64::SetBASWatchpoint(DNBArchMachARM64::watchpoint_spec wp, + bool read, bool write, + bool also_set_on_task) { + const uint32_t num_hw_watchpoints = NumSupportedHardwareWatchpoints(); + + nub_addr_t aligned_dword_addr = wp.aligned_start; + nub_addr_t watching_offset = wp.user_specified_start - wp.aligned_start; + nub_size_t watching_size = wp.user_specified_size; + + // If user asks to watch 3 bytes at 0x1005, + // aligned_dword_addr 0x1000 + // watching_offset 5 + // watching_size 3 // Set the Byte Address Selects bits DBGWCRn_EL1 bits [12:5] based on the // above. @@ -933,66 +967,75 @@ // interested in. // e.g. if we are watching bytes 4,5,6,7 in a dword we want a BAS of // 0b11110000. - uint32_t byte_address_select = ((1 << size) - 1) << addr_dword_offset; + uint32_t byte_address_select = ((1 << watching_size) - 1) << watching_offset; // Read the debug state kern_return_t kret = GetDBGState(false); + if (kret != KERN_SUCCESS) + return INVALID_NUB_HW_INDEX; - if (kret == KERN_SUCCESS) { - // Check to make sure we have the needed hardware support - uint32_t i = 0; - - for (i = 0; i < num_hw_watchpoints; ++i) { - if ((m_state.dbg.__wcr[i] & WCR_ENABLE) == 0) - break; // We found an available hw watchpoint slot (in i) - } - - // See if we found an available hw watchpoint slot above - if (i < num_hw_watchpoints) { - // DumpDBGState(m_state.dbg); - - // Clear any previous LoHi joined-watchpoint that may have been in use - LoHi[i] = 0; + // Check to make sure we have the needed hardware support + uint32_t i = 0; - // shift our Byte Address Select bits up to the correct bit range for the - // DBGWCRn_EL1 - byte_address_select = byte_address_select << 5; + for (i = 0; i < num_hw_watchpoints; ++i) { + if ((m_state.dbg.__wcr[i] & WCR_ENABLE) == 0) + break; // We found an available hw watchpoint slot + } + if (i == num_hw_watchpoints) { + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::" + "SetBASWatchpoint(): All " + "hardware resources (%u) are in use.", + num_hw_watchpoints); + return INVALID_NUB_HW_INDEX; + } - // Make sure bits 1:0 are clear in our address - m_state.dbg.__wvr[i] = aligned_wp_address; // DVA (Data Virtual Address) - m_state.dbg.__wcr[i] = byte_address_select | // Which bytes that follow + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::" + "SetBASWatchpoint() " + "set hardware register %d to BAS watchpoint " + "aligned start address 0x%llx, watch region start " + "offset %lld, number of bytes %zu", + i, aligned_dword_addr, watching_offset, watching_size); + + // Clear any previous LoHi joined-watchpoint that may have been in use + LoHi[i] = 0; + + // shift our Byte Address Select bits up to the correct bit range for the + // DBGWCRn_EL1 + byte_address_select = byte_address_select << 5; + + // Make sure bits 1:0 are clear in our address + m_state.dbg.__wvr[i] = aligned_dword_addr; // DVA (Data Virtual Address) + m_state.dbg.__wcr[i] = byte_address_select | // Which bytes that follow // the DVA that we will watch - S_USER | // Stop only in user mode - (read ? WCR_LOAD : 0) | // Stop on read access? - (write ? WCR_STORE : 0) | // Stop on write access? - WCR_ENABLE; // Enable this watchpoint; + S_USER | // Stop only in user mode + (read ? WCR_LOAD : 0) | // Stop on read access? + (write ? WCR_STORE : 0) | // Stop on write access? + WCR_ENABLE; // Enable this watchpoint; - DNBLogThreadedIf( - LOG_WATCHPOINTS, "DNBArchMachARM64::EnableHardwareWatchpoint() " - "adding watchpoint on address 0x%llx with control " - "register value 0x%x", - (uint64_t)m_state.dbg.__wvr[i], (uint32_t)m_state.dbg.__wcr[i]); + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::SetBASWatchpoint() " + "adding watchpoint on address 0x%llx with control " + "register value 0x%x", + (uint64_t)m_state.dbg.__wvr[i], + (uint32_t)m_state.dbg.__wcr[i]); - // The kernel will set the MDE_ENABLE bit in the MDSCR_EL1 for us - // automatically, don't need to do it here. + // The kernel will set the MDE_ENABLE bit in the MDSCR_EL1 for us + // automatically, don't need to do it here. - kret = SetDBGState(also_set_on_task); - // DumpDBGState(m_state.dbg); + kret = SetDBGState(also_set_on_task); + // DumpDBGState(m_state.dbg); - DNBLogThreadedIf(LOG_WATCHPOINTS, "DNBArchMachARM64::" - "EnableHardwareWatchpoint() " - "SetDBGState() => 0x%8.8x.", - kret); + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::" + "SetBASWatchpoint() " + "SetDBGState() => 0x%8.8x.", + kret); + + if (kret == KERN_SUCCESS) + return i; - if (kret == KERN_SUCCESS) - return i; - } else { - DNBLogThreadedIf(LOG_WATCHPOINTS, "DNBArchMachARM64::" - "EnableHardwareWatchpoint(): All " - "hardware resources (%u) are in use.", - num_hw_watchpoints); - } - } return INVALID_NUB_HW_INDEX; } @@ -1020,9 +1063,10 @@ m_state.dbg.__wvr[hw_index] = m_disabled_watchpoints[hw_index].addr; m_state.dbg.__wcr[hw_index] = m_disabled_watchpoints[hw_index].control; - DNBLogThreadedIf(LOG_WATCHPOINTS, "DNBArchMachARM64::" - "EnableHardwareWatchpoint( %u ) - WVR%u = " - "0x%8.8llx WCR%u = 0x%8.8llx", + DNBLogThreadedIf(LOG_WATCHPOINTS, + "DNBArchMachARM64::" + "SetBASWatchpoint( %u ) - WVR%u = " + "0x%8.8llx WCR%u = 0x%8.8llx", hw_index, hw_index, (uint64_t)m_state.dbg.__wvr[hw_index], hw_index, (uint64_t)m_state.dbg.__wcr[hw_index]); Index: lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/main.c =================================================================== --- /dev/null +++ lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/main.c @@ -0,0 +1,17 @@ +#include <stdint.h> +#include <stdio.h> +int main() { + union { + uint8_t bytebuf[16]; + uint16_t shortbuf[8]; + uint64_t dwordbuf[2]; + } a; + a.dwordbuf[0] = a.dwordbuf[1] = 0; + a.bytebuf[0] = 0; // break here + for (int i = 0; i < 8; i++) { + a.shortbuf[i] += i; + } + for (int i = 0; i < 8; i++) { + a.shortbuf[i] += i; + } +} Index: lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/TestUnalignedSpanningDwords.py =================================================================== --- /dev/null +++ lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/TestUnalignedSpanningDwords.py @@ -0,0 +1,61 @@ +""" +Watch 4 bytes which spawn two doubleword aligned regions. +On a target that supports 8 byte watchpoints, this will +need to be implemented with a hardware watchpoint on both +doublewords. +""" + + + +import lldb +from lldbsuite.test.decorators import * +from lldbsuite.test.lldbtest import * +from lldbsuite.test import lldbutil + +class UnalignedWatchpointTestCase(TestBase): + + def hit_watchpoint_and_continue(self, process, iter_str): + process.Continue() + self.assertEqual(process.GetState(), lldb.eStateStopped, + iter_str) + thread = process.GetSelectedThread() + self.assertEqual(thread.GetStopReason(), lldb.eStopReasonWatchpoint, iter_str) + self.assertEqual(thread.GetStopReasonDataCount(), 1, iter_str) + wp_num = thread.GetStopReasonDataAtIndex(0) + self.assertEqual(wp_num, 1, iter_str) + + NO_DEBUG_INFO_TESTCASE = True + # debugserver on AArch64 has this feature. + @skipIf(archs=no_match(['x86_64', 'arm64', 'arm64e', 'aarch64'])) + @skipUnlessDarwin + # debugserver only started returning an exception address within + # a range lldb expects in https://reviews.llvm.org/D147820 2023-04-12. + # older debugservers will return the base address of the doubleword + # which lldb doesn't understand, and will stop executing without a + # proper stop reason. + @skipIfOutOfTreeDebugserver + + def test_unaligned_watchpoint(self): + """Test a watchpoint that is handled by two hardware watchpoint registers.""" + self.build() + self.main_source_file = lldb.SBFileSpec("main.c") + (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(self, + "break here", self.main_source_file) + + frame = thread.GetFrameAtIndex(0) + + a_bytebuf_6 = frame.GetValueForVariablePath("a.bytebuf[6]") + a_bytebuf_6_addr = a_bytebuf_6.GetAddress().GetLoadAddress(target) + err = lldb.SBError() + wp = target.WatchAddress(a_bytebuf_6_addr, 4, False, True, err) + self.assertTrue(err.Success()) + self.assertTrue(wp.IsEnabled()) + self.assertEqual(wp.GetWatchSize(), 4) + self.assertGreater(wp.GetWatchAddress() % 8, 4, "watched region spans two doublewords") + + # We will hit our watchpoint 6 times during the execution + # of the inferior. If the remote stub does not actually split + # the watched region into two doubleword watchpoints, we will + # exit before we get to 6 watchpoint hits. + for i in range(1, 7): + self.hit_watchpoint_and_continue(process, "wp hit number %s" % i) Index: lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/Makefile =================================================================== --- /dev/null +++ lldb/test/API/functionalities/watchpoint/unaligned-spanning-two-dwords/Makefile @@ -0,0 +1,3 @@ +C_SOURCES := main.c + +include Makefile.rules
_______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits