https://github.com/DavidSpickett created https://github.com/llvm/llvm-project/pull/181860
Fixes #177570 This is a port of changes first proposed in #178653. They address the mismatch between the visible position of a character and the position in the text data from which we must print to print that visible character. This fixes a few crashes and truncated text where we had counted the ANSI code as if it were visible characters. The big addition here is VisiblePositionToActualPosition. This converts between the visible position (how we are reasoning about screen width) to the actual position in the data. OutputWordWrappedLines works on a copy of the string stripped of ANSI codes at first. Once we have the visible positions we want to use, those are converted to actual positions for printing. We could a bunch more conversions but for these small strings, a copy is probably fine and it's easier to understand the logic this way. I do feel like I have reinvented the existing TrimAndPad, with the extra condition of trimming only at whitespace. I will attempt to merge the two but would prefer to land this implementation first. OutputWordWrappedLines still has problems with unicode and multi-word spanning ANSI codes, but I will address those in a follow up. What's fixed here is enough to close the linked issue. >From 23937bc0658ec89bb8b00effd196999f03096245 Mon Sep 17 00:00:00 2001 From: David Spickett <[email protected]> Date: Tue, 17 Feb 2026 16:04:45 +0000 Subject: [PATCH] [lldb] Fix handling of ANSI codes when formatting option help Fixes #177570 This is a port of changes first proposed in #178653. They address the mismatch between the visible position of a character and the position in the text data from which we must print to print that visible character. This fixes a few crashes and truncated text where we had counted the ANSI code as if it were visible characters. The big addition here is VisiblePositionToActualPosition. This converts between the visible position (how we are reasoning about screen width) to the actual position in the data. OutputWordWrappedLines works on a copy of the string stripped of ANSI codes at first. Once we have the visible positions we want to use, those are converted to actual positions for printing. We could a bunch more conversions but for these small strings, a copy is probably fine and it's easier to understand the logic this way. I do feel like I have reinvented the existing TrimAndPad, with the extra condition of trimming only at whitespace. I will attempt to merge the two but would prefer to land this implementation first. OutputWordWrappedLines still has problems with unicode and multi-word spanning ANSI codes, but I will address those in a follow up. What's fixed here is enough to close the linked issue. --- lldb/include/lldb/Utility/AnsiTerminal.h | 110 ++++++++++++++++++-- lldb/test/API/commands/help/TestHelp.py | 20 ++-- lldb/unittests/Utility/AnsiTerminalTest.cpp | 65 +++++++++++- 3 files changed, 180 insertions(+), 15 deletions(-) diff --git a/lldb/include/lldb/Utility/AnsiTerminal.h b/lldb/include/lldb/Utility/AnsiTerminal.h index 153602cc08b09..a723cc7bf99a7 100644 --- a/lldb/include/lldb/Utility/AnsiTerminal.h +++ b/lldb/include/lldb/Utility/AnsiTerminal.h @@ -289,10 +289,87 @@ inline size_t ColumnWidth(llvm::StringRef str) { return llvm::sys::locale::columnWidth(stripped); } +struct VisibleActualPositionPair { + size_t visible; + size_t actual; + + bool operator==(const VisibleActualPositionPair &rhs) const { + return visible == rhs.visible && actual == rhs.actual; + } +}; + +/// This function converts a position in "visible" text (text with ANSI codes +/// removed) into a position in the data of the text (which includes ANSI +/// codes). That actual position should be used when printing out the data. +// +/// If a character is preceeded by an ANSI code, the returned position will +/// point to that ANSI code. As formatting that visible character requires the +/// code. +// +/// This logic does not extend to whole words. This function assumes you are not +/// going to split up the words of whatever text you pass in here. Making sure +/// not to split words is the responsibility of the caller of this function. +// +/// Returns a pair containing {visible position, actual position}. This may be +/// passed back to the function later as "hint" to allow it to skip ahead if you +/// are asking for a visible position equal or greater to the one in the hint. +// +/// The function assumes that the hint provided is correct, even if it refers +/// to a position that does not exist. If the hint is for the exact requested +/// visible position, it will return the hint, without checking the string at +/// all. +inline VisibleActualPositionPair +VisiblePositionToActualPosition(llvm::StringRef text, size_t visible_position, + std::optional<VisibleActualPositionPair> hint) { + size_t actual_position = 0; + const size_t wanted_visible_position = visible_position; + visible_position = 0; + llvm::StringRef remaining_text = text; + + if (hint) { + if (hint->visible == wanted_visible_position) + return *hint; + + if (hint->visible < wanted_visible_position) { + // Skip forward using the hint. + visible_position = hint->visible; + actual_position = hint->actual; + + remaining_text = remaining_text.drop_front(actual_position); + } + } + + while (remaining_text.size()) { + auto [left, escape, right] = ansi::FindNextAnsiSequence(remaining_text); + + for (unsigned i = 0; i < left.size(); ++i) { + if (visible_position == wanted_visible_position) + return {wanted_visible_position, actual_position}; + + actual_position++; + visible_position++; + } + + if (visible_position == wanted_visible_position) + return {wanted_visible_position, actual_position}; + + actual_position += escape.size(); + remaining_text = right; + } + + assert(visible_position == wanted_visible_position && + "should have found visible_position by now"); + + return {wanted_visible_position, actual_position}; +} + // Output text that may contain ANSI codes, word wrapped (wrapped at whitespace) // to the given stream. The indent level of the stream is counted towards the // output line length. -// FIXME: This contains several bugs and does not handle unicode. +// FIXME: This does not handle unicode correctly. +// FIXME: If an ANSI code is applied to multiple words and those words are split +// across lines, the code will apply to the indentation as well as the +// text. inline void OutputWordWrappedLines(Stream &strm, llvm::StringRef text, uint32_t output_max_columns) { // We will indent using the stream, so leading whitespace is not significant. @@ -321,14 +398,24 @@ inline void OutputWordWrappedLines(Stream &strm, llvm::StringRef text, // applied to the indent too. const int max_text_width = output_max_columns - strm.GetIndentLevel() - 1; + // start, end and final_end are all positions in the visible text, + // not the data representing that text. int start = 0; int end = start; const int final_end = visible_length; + std::optional<ansi::VisibleActualPositionPair> conversion_hint; + + // FIXME: This removes ANSI but unicode will still take up > 1 byte per + // character. + // We can either constantly convert between visible and actual position, + // or make a copy and only convert at the end. Assume that a copy is cheap to + // do. + const std::string text_without_ansi = ansi::StripAnsiTerminalCodes(text); while (end < final_end) { // Don't start the 'text' on a space, since we're already outputting the // indentation. - while ((start < final_end) && (text[start] == ' ')) + while ((start < final_end) && (text_without_ansi[start] == ' ')) start++; end = start + max_text_width; @@ -338,18 +425,29 @@ inline void OutputWordWrappedLines(Stream &strm, llvm::StringRef text, if (end != final_end) { // If we're not at the end of the text, make sure we break the line on // white space. - while (end > start && text[end] != ' ' && text[end] != '\t' && - text[end] != '\n') + while (end > start && text_without_ansi[end] != ' ' && + text_without_ansi[end] != '\t' && text_without_ansi[end] != '\n') end--; } - const int sub_len = end - start; if (start != 0) strm.EOL(); strm.Indent(); + + const int sub_len = end - start; + UNUSED_IF_ASSERT_DISABLED(sub_len); assert(start < final_end); assert(start + sub_len <= final_end); - strm << text.substr(start, sub_len); + + conversion_hint = + ansi::VisiblePositionToActualPosition(text, start, conversion_hint); + const size_t start_actual = conversion_hint->actual; + conversion_hint = + ansi::VisiblePositionToActualPosition(text, end, conversion_hint); + const size_t end_actual = conversion_hint->actual; + + const int sub_len_actual = end_actual - start_actual; + strm << text.substr(start_actual, sub_len_actual); start = end + 1; } strm.EOL(); diff --git a/lldb/test/API/commands/help/TestHelp.py b/lldb/test/API/commands/help/TestHelp.py index 15cb07b1f32e6..cb6e9473c1047 100644 --- a/lldb/test/API/commands/help/TestHelp.py +++ b/lldb/test/API/commands/help/TestHelp.py @@ -327,8 +327,6 @@ def test_help_option_description_terminal_width_with_ansi(self): ANSI codes acccording to the terminal width.""" self.runCmd("settings set use-color on") - # FIXME: lldb crashes when the width is exactly 135 - https://github.com/llvm/llvm-project/issues/177570 - # Should fit on one line. self.runCmd("settings set term-width 138") self.expect( @@ -340,18 +338,28 @@ def test_help_option_description_terminal_width_with_ansi(self): ], ) - # Must be printed on two lines. - # FIXME: Second line is truncated - https://github.com/llvm/llvm-project/issues/177570 self.runCmd("settings set term-width 100") self.expect( "help breakpoint set", matching=True, patterns=[ - r"\s+\x1b\[4mS\x1b\[0met the breakpoint only in this shared library. Can repeat this option\n" - r"\s+multiple times to specify multiple shared li\n" + r"\s+\x1b\[4mS\x1b\[0met the breakpoint only in this shared library. Can repeat this option multiple times\n" + r"\s+to specify multiple shared libraries.\n" ], ) + # If we do not account for the difference between the visible character's + # position and that character's real position into the string with the invisible + # ANSI codes, we will crash in various ways. Writing tests for each width + # would require duplicating the line splitting algorithm here. So instead, + # we will try to provoke crashes if any exist, checking that the start + # and end of the output is shown. + for width in range(70, 150): + self.runCmd(f"settings set term-width {width}") + self.expect( + "help breakpoint set", substrs=["\x1b[4mS\x1b[0met the", "libraries."] + ) + @no_debug_info_test def test_help_shows_optional_short_options(self): """Test that optional short options are printed and that they are in diff --git a/lldb/unittests/Utility/AnsiTerminalTest.cpp b/lldb/unittests/Utility/AnsiTerminalTest.cpp index 28fa32461ad5f..8410f53578175 100644 --- a/lldb/unittests/Utility/AnsiTerminalTest.cpp +++ b/lldb/unittests/Utility/AnsiTerminalTest.cpp @@ -163,12 +163,11 @@ TEST(AnsiTerminal, OutputWordWrappedLines) { "The quick \nbrown fox.\n"); // As ANSI codes do not add to visible length, the results - // should be the same as the plain text verison. + // should be the same as the plain text version. const char *fox_str_ansi = "\x1B[4mT\x1B[0mhe quick brown fox."; TestLines(fox_str_ansi, 0, 30, "\x1B[4mT\x1B[0mhe quick brown fox.\n"); TestLines(fox_str_ansi, 5, 30, " \x1B[4mT\x1B[0mhe quick brown fox.\n"); - // FIXME: Account for ANSI codes not contributing to visible length. - TestLines(fox_str_ansi, 0, 15, "\x1B[4mT\x1B[0mhe\nquick br\n"); + TestLines(fox_str_ansi, 0, 15, "\x1B[4mT\x1B[0mhe quick\nbrown fox.\n"); const std::string fox_str_emoji = "🦊 The quick brown fox. 🦊"; TestLines(fox_str_emoji, 0, 30, "🦊 The quick brown fox. 🦊\n"); @@ -180,3 +179,63 @@ TEST(AnsiTerminal, OutputWordWrappedLines) { // FIXME: should not split the middle of an emoji. TestLines("🦊🦊🦊 🦊🦊", 0, 5, "\n\n\n\n\n\n\n\x8A\xF0\x9F\xA6\n"); } + +TEST(AnsiTerminal, VisiblePositionToActualPosition) { + using Hint = ansi::VisibleActualPositionPair; + EXPECT_EQ((Hint{0, 0}), + ansi::VisiblePositionToActualPosition("", 0, std::nullopt)); + EXPECT_EQ((Hint{0, 0}), + ansi::VisiblePositionToActualPosition("abc", 0, std::nullopt)); + EXPECT_EQ((Hint{1, 1}), + ansi::VisiblePositionToActualPosition("abc", 1, std::nullopt)); + EXPECT_EQ((Hint{2, 2}), + ansi::VisiblePositionToActualPosition("abc", 2, std::nullopt)); + // We expect callers to limit the visible index to its valid range. + + // When a visible character is preceeded by an ANSI code, we would need to + // print that code when printing the character. Therefore, the actual index + // points to the preceeding ANSI code, not the visible character itself. + EXPECT_EQ((Hint{0, 0}), ansi::VisiblePositionToActualPosition("\x1B[4mabc", 0, + std::nullopt)); + EXPECT_EQ((Hint{1, 1}), ansi::VisiblePositionToActualPosition("a\x1B[4mbc", 1, + std::nullopt)); + EXPECT_EQ((Hint{2, 2}), ansi::VisiblePositionToActualPosition("ab\x1B[4mc", 2, + std::nullopt)); + + // If the visible character is proceeded by an ANSI code, we don't need to + // adjust anything. The actual index is the index of the visible character + // itself. + EXPECT_EQ((Hint{0, 0}), ansi::VisiblePositionToActualPosition("a\x1B[4mbc", 0, + std::nullopt)); + EXPECT_EQ((Hint{1, 1}), ansi::VisiblePositionToActualPosition("ab\x1B[4mc", 1, + std::nullopt)); + EXPECT_EQ((Hint{2, 2}), ansi::VisiblePositionToActualPosition("abc\x1B[4m", 2, + std::nullopt)); + + // If we want a visible character that is after ANSI codes for other + // characters, the actual index must know to skip those previous codes. + EXPECT_EQ((Hint{1, 5}), ansi::VisiblePositionToActualPosition("\x1B[4mabc", 1, + std::nullopt)); + EXPECT_EQ((Hint{2, 6}), ansi::VisiblePositionToActualPosition("a\x1B[4mbc", 2, + std::nullopt)); + + // We can give it a previous result to skip forward. To prove it does not look + // at the early parts of the string, give it hints that actually produce + // incorrect results. + + const char *actual_text = "abcdefghijk"; + // This does nothing because the hint is the answer. + EXPECT_EQ((Hint{1, 5}), + ansi::VisiblePositionToActualPosition(actual_text, 1, Hint{1, 5})); + // The hint can be completely bogus, but we trust it. So if it refers to the + // visible index being asked for, we just return the hint. + EXPECT_EQ((Hint{99, 127}), ansi::VisiblePositionToActualPosition( + actual_text, 99, Hint{99, 127})); + // This should return {2, 1}, but the actual is offset by 5 due to the hint. + EXPECT_EQ((Hint{2, 6}), + ansi::VisiblePositionToActualPosition(actual_text, 2, Hint{1, 5})); + // If the hint is for a visible index > the wanted visible index, we cannot do + // anything with it. The function can only look forward. + EXPECT_EQ((Hint{2, 2}), + ansi::VisiblePositionToActualPosition(actual_text, 2, Hint{3, 6})); +} _______________________________________________ lldb-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
