Re: r332458 - [AST] Added a helper to extract a user-friendly text of a comment.

Clement Courbet via cfe-commits Thu, 17 May 2018 00:04:38 -0700

I should have fixed it in r332576.

On Wed, May 16, 2018 at 11:49 PM, Galina Kistanova via cfe-commits <
cfe-commits@lists.llvm.org> wrote:


> Also few other builders are affected:
>
> http://lab.llvm.org:8011/builders/clang-x86_64-linux-abi-test
> http://lab.llvm.org:8011/builders/clang-lld-x86_64-2stage
> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu
>
>
> Thanks
>
> Galina
>
> On Wed, May 16, 2018 at 12:58 PM, Galina Kistanova <gkistan...@gmail.com>
> wrote:
>
>> Hello Ilya,
>>
>> This commit broke build step for couple of our builders:
>>
>> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/8541
>> http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu
>>
>> . . .
>> FAILED: 
>> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
>>
>> /usr/bin/c++   -DGTEST_HAS_RTTI=0 -DGTEST_HAS_TR1_TUPLE=0
>> -DGTEST_LANG_CXX11=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
>> -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/unittests/AST
>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/unittests/AST
>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src/tools/clang/include
>> -Itools/clang/include -Iinclude -I/home/buildslave/buildslave1
>> a/clang-with-lto-ubuntu/llvm.src/include -I/home/buildslave/buildslave1
>> a/clang-with-lto-ubuntu/llvm.src/utils/unittest/googletest/include
>> -I/home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.
>> src/utils/unittest/googlemock/include -fPIC -fvisibility-inlines-hidden
>> -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual
>> -Wno-missing-field-initializers -pedantic -Wno-long-long
>> -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
>> -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual
>> -fno-strict-aliasing -O3 -DNDEBUG    -Wno-variadic-macros -fno-exceptions
>> -fno-rtti -MD -MT tools/clang/unittests/AST/CMak
>> eFiles/ASTTests.dir/CommentTextTest.cpp.o -MF
>> tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o.d
>> -o tools/clang/unittests/AST/CMakeFiles/ASTTests.dir/CommentTextTest.cpp.o
>> -c /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src
>> /tools/clang/unittests/AST/CommentTextTest.cpp
>> /home/buildslave/buildslave1a/clang-with-lto-ubuntu/llvm.src
>> /tools/clang/unittests/AST/CommentTextTest.cpp:62:1: error: unterminated
>> raw string
>>  R"cpp(
>>  ^
>> . . .
>>
>> Please have a look?
>>
>> The builder was already red and did not send notifications.
>>
>> Thanks
>>
>> Galina
>>
>>
>>
>> On Wed, May 16, 2018 at 5:30 AM, Ilya Biryukov via cfe-commits <
>> cfe-commits@lists.llvm.org> wrote:
>>
>>> Author: ibiryukov
>>> Date: Wed May 16 05:30:09 2018
>>> New Revision: 332458
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=332458&view=rev
>>> Log:
>>> [AST] Added a helper to extract a user-friendly text of a comment.
>>>
>>> Summary:
>>> The helper is used in clangd for documentation shown in code completion
>>> and storing the docs in the symbols. See D45999.
>>>
>>> This patch reuses the code of the Doxygen comment lexer, disabling the
>>> bits that do command and html tag parsing.
>>> The new helper works on all comments, including non-doxygen comments.
>>> However, it does not understand or transform any doxygen directives,
>>> i.e. cannot extract brief text, etc.
>>>
>>> Reviewers: sammccall, hokein, ioeric
>>>
>>> Reviewed By: ioeric
>>>
>>> Subscribers: mgorny, cfe-commits
>>>
>>> Differential Revision: https://reviews.llvm.org/D46000
>>>
>>> Added:
>>>     cfe/trunk/unittests/AST/CommentTextTest.cpp
>>> Modified:
>>>     cfe/trunk/include/clang/AST/CommentLexer.h
>>>     cfe/trunk/include/clang/AST/RawCommentList.h
>>>     cfe/trunk/lib/AST/CommentLexer.cpp
>>>     cfe/trunk/lib/AST/RawCommentList.cpp
>>>     cfe/trunk/unittests/AST/CMakeLists.txt
>>>
>>> Modified: cfe/trunk/include/clang/AST/CommentLexer.h
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/
>>> AST/CommentLexer.h?rev=332458&r1=332457&r2=332458&view=diff
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/include/clang/AST/CommentLexer.h (original)
>>> +++ cfe/trunk/include/clang/AST/CommentLexer.h Wed May 16 05:30:09 2018
>>> @@ -281,6 +281,11 @@ private:
>>>    /// command, including command marker.
>>>    SmallString<16> VerbatimBlockEndCommandName;
>>>
>>> +  /// If true, the commands, html tags, etc will be parsed and reported
>>> as
>>> +  /// separate tokens inside the comment body. If false, the comment
>>> text will
>>> +  /// be parsed into text and newline tokens.
>>> +  bool ParseCommands;
>>> +
>>>    /// Given a character reference name (e.g., "lt"), return the
>>> character that
>>>    /// it stands for (e.g., "<").
>>>    StringRef resolveHTMLNamedCharacterReference(StringRef Name) const;
>>> @@ -315,12 +320,11 @@ private:
>>>    /// Eat string matching regexp \code \s*\* \endcode.
>>>    void skipLineStartingDecorations();
>>>
>>> -  /// Lex stuff inside comments.  CommentEnd should be set correctly.
>>> +  /// Lex comment text, including commands if ParseCommands is set to
>>> true.
>>>    void lexCommentText(Token &T);
>>>
>>> -  void setupAndLexVerbatimBlock(Token &T,
>>> -                                const char *TextBegin,
>>> -                                char Marker, const CommandInfo *Info);
>>> +  void setupAndLexVerbatimBlock(Token &T, const char *TextBegin, char
>>> Marker,
>>> +                                const CommandInfo *Info);
>>>
>>>    void lexVerbatimBlockFirstLine(Token &T);
>>>
>>> @@ -343,14 +347,13 @@ private:
>>>
>>>  public:
>>>    Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine &Diags,
>>> -        const CommandTraits &Traits,
>>> -        SourceLocation FileLoc,
>>> -        const char *BufferStart, const char *BufferEnd);
>>> +        const CommandTraits &Traits, SourceLocation FileLoc,
>>> +        const char *BufferStart, const char *BufferEnd,
>>> +        bool ParseCommands = true);
>>>
>>>    void lex(Token &T);
>>>
>>> -  StringRef getSpelling(const Token &Tok,
>>> -                        const SourceManager &SourceMgr,
>>> +  StringRef getSpelling(const Token &Tok, const SourceManager
>>> &SourceMgr,
>>>                          bool *Invalid = nullptr) const;
>>>  };
>>>
>>>
>>> Modified: cfe/trunk/include/clang/AST/RawCommentList.h
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/
>>> AST/RawCommentList.h?rev=332458&r1=332457&r2=332458&view=diff
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/include/clang/AST/RawCommentList.h (original)
>>> +++ cfe/trunk/include/clang/AST/RawCommentList.h Wed May 16 05:30:09
>>> 2018
>>> @@ -111,6 +111,30 @@ public:
>>>      return extractBriefText(Context);
>>>    }
>>>
>>> +  /// Returns sanitized comment text, suitable for presentation in
>>> editor UIs.
>>> +  /// E.g. will transform:
>>> +  ///     // This is a long multiline comment.
>>> +  ///     //   Parts of it  might be indented.
>>> +  ///     /* The comments styles might be mixed. */
>>> +  ///  into
>>> +  ///     "This is a long multiline comment.\n"
>>> +  ///     "  Parts of it  might be indented.\n"
>>> +  ///     "The comments styles might be mixed."
>>> +  /// Also removes leading indentation and sanitizes some common cases:
>>> +  ///     /* This is a first line.
>>> +  ///      *   This is a second line. It is indented.
>>> +  ///      * This is a third line. */
>>> +  /// and
>>> +  ///     /* This is a first line.
>>> +  ///          This is a second line. It is indented.
>>> +  ///     This is a third line. */
>>> +  /// will both turn into:
>>> +  ///     "This is a first line.\n"
>>> +  ///     "  This is a second line. It is indented.\n"
>>> +  ///     "This is a third line."
>>> +  std::string getFormattedText(const SourceManager &SourceMgr,
>>> +                               DiagnosticsEngine &Diags) const;
>>> +
>>>    /// Parse the comment, assuming it is attached to decl \c D.
>>>    comments::FullComment *parse(const ASTContext &Context,
>>>                                 const Preprocessor *PP, const Decl *D)
>>> const;
>>>
>>> Modified: cfe/trunk/lib/AST/CommentLexer.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/Commen
>>> tLexer.cpp?rev=332458&r1=332457&r2=332458&view=diff
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/lib/AST/CommentLexer.cpp (original)
>>> +++ cfe/trunk/lib/AST/CommentLexer.cpp Wed May 16 05:30:09 2018
>>> @@ -294,6 +294,39 @@ void Lexer::lexCommentText(Token &T) {
>>>    assert(CommentState == LCS_InsideBCPLComment ||
>>>           CommentState == LCS_InsideCComment);
>>>
>>> +  // Handles lexing non-command text, i.e. text and newline.
>>> +  auto HandleNonCommandToken = [&]() -> void {
>>> +    assert(State == LS_Normal);
>>> +
>>> +    const char *TokenPtr = BufferPtr;
>>> +    assert(TokenPtr < CommentEnd);
>>> +    switch (*TokenPtr) {
>>> +      case '\n':
>>> +      case '\r':
>>> +          TokenPtr = skipNewline(TokenPtr, CommentEnd);
>>> +          formTokenWithChars(T, TokenPtr, tok::newline);
>>> +
>>> +          if (CommentState == LCS_InsideCComment)
>>> +            skipLineStartingDecorations();
>>> +          return;
>>> +
>>> +      default: {
>>> +          StringRef TokStartSymbols = ParseCommands ? "\n\r\\@&<" :
>>> "\n\r";
>>> +          size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr)
>>> +                           .find_first_of(TokStartSymbols);
>>> +          if (End != StringRef::npos)
>>> +            TokenPtr += End;
>>> +          else
>>> +            TokenPtr = CommentEnd;
>>> +          formTextToken(T, TokenPtr);
>>> +          return;
>>> +      }
>>> +    }
>>> +  };
>>> +
>>> +  if (!ParseCommands)
>>> +    return HandleNonCommandToken();
>>> +
>>>    switch (State) {
>>>    case LS_Normal:
>>>      break;
>>> @@ -315,136 +348,116 @@ void Lexer::lexCommentText(Token &T) {
>>>    }
>>>
>>>    assert(State == LS_Normal);
>>> -
>>>    const char *TokenPtr = BufferPtr;
>>>    assert(TokenPtr < CommentEnd);
>>> -  while (TokenPtr != CommentEnd) {
>>> -    switch(*TokenPtr) {
>>> -      case '\\':
>>> -      case '@': {
>>> -        // Commands that start with a backslash and commands that start
>>> with
>>> -        // 'at' have equivalent semantics.  But we keep information
>>> about the
>>> -        // exact syntax in AST for comments.
>>> -        tok::TokenKind CommandKind =
>>> -            (*TokenPtr == '@') ? tok::at_command :
>>> tok::backslash_command;
>>> +  switch(*TokenPtr) {
>>> +    case '\\':
>>> +    case '@': {
>>> +      // Commands that start with a backslash and commands that start
>>> with
>>> +      // 'at' have equivalent semantics.  But we keep information about
>>> the
>>> +      // exact syntax in AST for comments.
>>> +      tok::TokenKind CommandKind =
>>> +          (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
>>> +      TokenPtr++;
>>> +      if (TokenPtr == CommentEnd) {
>>> +        formTextToken(T, TokenPtr);
>>> +        return;
>>> +      }
>>> +      char C = *TokenPtr;
>>> +      switch (C) {
>>> +      default:
>>> +        break;
>>> +
>>> +      case '\\': case '@': case '&': case '$':
>>> +      case '#':  case '<': case '>': case '%':
>>> +      case '\"': case '.': case ':':
>>> +        // This is one of \\ \@ \& \$ etc escape sequences.
>>>          TokenPtr++;
>>> -        if (TokenPtr == CommentEnd) {
>>> -          formTextToken(T, TokenPtr);
>>> -          return;
>>> -        }
>>> -        char C = *TokenPtr;
>>> -        switch (C) {
>>> -        default:
>>> -          break;
>>> -
>>> -        case '\\': case '@': case '&': case '$':
>>> -        case '#':  case '<': case '>': case '%':
>>> -        case '\"': case '.': case ':':
>>> -          // This is one of \\ \@ \& \$ etc escape sequences.
>>> +        if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>>> +          // This is the \:: escape sequence.
>>>            TokenPtr++;
>>> -          if (C == ':' && TokenPtr != CommentEnd && *TokenPtr == ':') {
>>> -            // This is the \:: escape sequence.
>>> -            TokenPtr++;
>>> -          }
>>> -          StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr
>>> + 1));
>>> -          formTokenWithChars(T, TokenPtr, tok::text);
>>> -          T.setText(UnescapedText);
>>> -          return;
>>>          }
>>> +        StringRef UnescapedText(BufferPtr + 1, TokenPtr - (BufferPtr +
>>> 1));
>>> +        formTokenWithChars(T, TokenPtr, tok::text);
>>> +        T.setText(UnescapedText);
>>> +        return;
>>> +      }
>>>
>>> -        // Don't make zero-length commands.
>>> -        if (!isCommandNameStartCharacter(*TokenPtr)) {
>>> -          formTextToken(T, TokenPtr);
>>> -          return;
>>> -        }
>>> +      // Don't make zero-length commands.
>>> +      if (!isCommandNameStartCharacter(*TokenPtr)) {
>>> +        formTextToken(T, TokenPtr);
>>> +        return;
>>> +      }
>>>
>>> -        TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>>> -        unsigned Length = TokenPtr - (BufferPtr + 1);
>>> +      TokenPtr = skipCommandName(TokenPtr, CommentEnd);
>>> +      unsigned Length = TokenPtr - (BufferPtr + 1);
>>>
>>> -        // Hardcoded support for lexing LaTeX formula commands
>>> -        // \f$ \f[ \f] \f{ \f} as a single command.
>>> -        if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr !=
>>> CommentEnd) {
>>> -          C = *TokenPtr;
>>> -          if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}')
>>> {
>>> -            TokenPtr++;
>>> -            Length++;
>>> -          }
>>> +      // Hardcoded support for lexing LaTeX formula commands
>>> +      // \f$ \f[ \f] \f{ \f} as a single command.
>>> +      if (Length == 1 && TokenPtr[-1] == 'f' && TokenPtr != CommentEnd)
>>> {
>>> +        C = *TokenPtr;
>>> +        if (C == '$' || C == '[' || C == ']' || C == '{' || C == '}') {
>>> +          TokenPtr++;
>>> +          Length++;
>>>          }
>>> +      }
>>>
>>> -        StringRef CommandName(BufferPtr + 1, Length);
>>> +      StringRef CommandName(BufferPtr + 1, Length);
>>>
>>> -        const CommandInfo *Info = Traits.getCommandInfoOrNULL(Co
>>> mmandName);
>>> -        if (!Info) {
>>> -          if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>>> -            StringRef CorrectedName = Info->Name;
>>> -            SourceLocation Loc = getSourceLocation(BufferPtr);
>>> -            SourceLocation EndLoc = getSourceLocation(TokenPtr);
>>> -            SourceRange FullRange = SourceRange(Loc, EndLoc);
>>> -            SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>>> -            Diag(Loc, diag::warn_correct_comment_command_name)
>>> -              << FullRange << CommandName << CorrectedName
>>> -              << FixItHint::CreateReplacement(CommandRange,
>>> CorrectedName);
>>> -          } else {
>>> -            formTokenWithChars(T, TokenPtr, tok::unknown_command);
>>> -            T.setUnknownCommandName(CommandName);
>>> -            Diag(T.getLocation(), diag::warn_unknown_comment_com
>>> mand_name)
>>> -                << SourceRange(T.getLocation(), T.getEndLocation());
>>> -            return;
>>> -          }
>>> -        }
>>> -        if (Info->IsVerbatimBlockCommand) {
>>> -          setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>>> +      const CommandInfo *Info = Traits.getCommandInfoOrNULL(Co
>>> mmandName);
>>> +      if (!Info) {
>>> +        if ((Info = Traits.getTypoCorrectCommandInfo(CommandName))) {
>>> +          StringRef CorrectedName = Info->Name;
>>> +          SourceLocation Loc = getSourceLocation(BufferPtr);
>>> +          SourceLocation EndLoc = getSourceLocation(TokenPtr);
>>> +          SourceRange FullRange = SourceRange(Loc, EndLoc);
>>> +          SourceRange CommandRange(Loc.getLocWithOffset(1), EndLoc);
>>> +          Diag(Loc, diag::warn_correct_comment_command_name)
>>> +            << FullRange << CommandName << CorrectedName
>>> +            << FixItHint::CreateReplacement(CommandRange,
>>> CorrectedName);
>>> +        } else {
>>> +          formTokenWithChars(T, TokenPtr, tok::unknown_command);
>>> +          T.setUnknownCommandName(CommandName);
>>> +          Diag(T.getLocation(), diag::warn_unknown_comment_com
>>> mand_name)
>>> +              << SourceRange(T.getLocation(), T.getEndLocation());
>>>            return;
>>>          }
>>> -        if (Info->IsVerbatimLineCommand) {
>>> -          setupAndLexVerbatimLine(T, TokenPtr, Info);
>>> -          return;
>>> -        }
>>> -        formTokenWithChars(T, TokenPtr, CommandKind);
>>> -        T.setCommandID(Info->getID());
>>> -        return;
>>>        }
>>> -
>>> -      case '&':
>>> -        lexHTMLCharacterReference(T);
>>> +      if (Info->IsVerbatimBlockCommand) {
>>> +        setupAndLexVerbatimBlock(T, TokenPtr, *BufferPtr, Info);
>>>          return;
>>> -
>>> -      case '<': {
>>> -        TokenPtr++;
>>> -        if (TokenPtr == CommentEnd) {
>>> -          formTextToken(T, TokenPtr);
>>> -          return;
>>> -        }
>>> -        const char C = *TokenPtr;
>>> -        if (isHTMLIdentifierStartingCharacter(C))
>>> -          setupAndLexHTMLStartTag(T);
>>> -        else if (C == '/')
>>> -          setupAndLexHTMLEndTag(T);
>>> -        else
>>> -          formTextToken(T, TokenPtr);
>>> +      }
>>> +      if (Info->IsVerbatimLineCommand) {
>>> +        setupAndLexVerbatimLine(T, TokenPtr, Info);
>>>          return;
>>>        }
>>> +      formTokenWithChars(T, TokenPtr, CommandKind);
>>> +      T.setCommandID(Info->getID());
>>> +      return;
>>> +    }
>>>
>>> -      case '\n':
>>> -      case '\r':
>>> -        TokenPtr = skipNewline(TokenPtr, CommentEnd);
>>> -        formTokenWithChars(T, TokenPtr, tok::newline);
>>> -
>>> -        if (CommentState == LCS_InsideCComment)
>>> -          skipLineStartingDecorations();
>>> -        return;
>>> +    case '&':
>>> +      lexHTMLCharacterReference(T);
>>> +      return;
>>>
>>> -      default: {
>>> -        size_t End = StringRef(TokenPtr, CommentEnd - TokenPtr).
>>> -                         find_first_of("\n\r\\@&<");
>>> -        if (End != StringRef::npos)
>>> -          TokenPtr += End;
>>> -        else
>>> -          TokenPtr = CommentEnd;
>>> +    case '<': {
>>> +      TokenPtr++;
>>> +      if (TokenPtr == CommentEnd) {
>>>          formTextToken(T, TokenPtr);
>>>          return;
>>>        }
>>> +      const char C = *TokenPtr;
>>> +      if (isHTMLIdentifierStartingCharacter(C))
>>> +        setupAndLexHTMLStartTag(T);
>>> +      else if (C == '/')
>>> +        setupAndLexHTMLEndTag(T);
>>> +      else
>>> +        formTextToken(T, TokenPtr);
>>> +      return;
>>>      }
>>> +
>>> +    default:
>>> +      return HandleNonCommandToken();
>>>    }
>>>  }
>>>
>>> @@ -727,14 +740,13 @@ void Lexer::lexHTMLEndTag(Token &T) {
>>>  }
>>>
>>>  Lexer::Lexer(llvm::BumpPtrAllocator &Allocator, DiagnosticsEngine
>>> &Diags,
>>> -             const CommandTraits &Traits,
>>> -             SourceLocation FileLoc,
>>> -             const char *BufferStart, const char *BufferEnd):
>>> -    Allocator(Allocator), Diags(Diags), Traits(Traits),
>>> -    BufferStart(BufferStart), BufferEnd(BufferEnd),
>>> -    FileLoc(FileLoc), BufferPtr(BufferStart),
>>> -    CommentState(LCS_BeforeComment), State(LS_Normal) {
>>> -}
>>> +             const CommandTraits &Traits, SourceLocation FileLoc,
>>> +             const char *BufferStart, const char *BufferEnd,
>>> +             bool ParseCommands)
>>> +    : Allocator(Allocator), Diags(Diags), Traits(Traits),
>>> +      BufferStart(BufferStart), BufferEnd(BufferEnd), FileLoc(FileLoc),
>>> +      BufferPtr(BufferStart), CommentState(LCS_BeforeComment),
>>> State(LS_Normal),
>>> +      ParseCommands(ParseCommands) {}
>>>
>>>  void Lexer::lex(Token &T) {
>>>  again:
>>>
>>> Modified: cfe/trunk/lib/AST/RawCommentList.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/RawCom
>>> mentList.cpp?rev=332458&r1=332457&r2=332458&view=diff
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/lib/AST/RawCommentList.cpp (original)
>>> +++ cfe/trunk/lib/AST/RawCommentList.cpp Wed May 16 05:30:09 2018
>>> @@ -335,3 +335,94 @@ void RawCommentList::addDeserializedComm
>>>               BeforeThanCompare<RawComment>(SourceMgr));
>>>    std::swap(Comments, MergedComments);
>>>  }
>>> +
>>> +std::string RawComment::getFormattedText(const SourceManager
>>> &SourceMgr,
>>> +                                         DiagnosticsEngine &Diags)
>>> const {
>>> +  llvm::StringRef CommentText = getRawText(SourceMgr);
>>> +  if (CommentText.empty())
>>> +    return "";
>>> +
>>> +  llvm::BumpPtrAllocator Allocator;
>>> +  // We do not parse any commands, so CommentOptions are ignored by
>>> +  // comments::Lexer. Therefore, we just use default-constructed
>>> options.
>>> +  CommentOptions DefOpts;
>>> +  comments::CommandTraits EmptyTraits(Allocator, DefOpts);
>>> +  comments::Lexer L(Allocator, Diags, EmptyTraits,
>>> getSourceRange().getBegin(),
>>> +                    CommentText.begin(), CommentText.end(),
>>> +                    /*ParseCommands=*/false);
>>> +
>>> +  std::string Result;
>>> +  // A column number of the first non-whitespace token in the comment
>>> text.
>>> +  // We skip whitespace up to this column, but keep the whitespace
>>> after this
>>> +  // column. IndentColumn is calculated when lexing the first line and
>>> reused
>>> +  // for the rest of lines.
>>> +  unsigned IndentColumn = 0;
>>> +
>>> +  // Processes one line of the comment and adds it to the result.
>>> +  // Handles skipping the indent at the start of the line.
>>> +  // Returns false when eof is reached and true otherwise.
>>> +  auto LexLine = [&](bool IsFirstLine) -> bool {
>>> +    comments::Token Tok;
>>> +    // Lex the first token on the line. We handle it separately,
>>> because we to
>>> +    // fix up its indentation.
>>> +    L.lex(Tok);
>>> +    if (Tok.is(comments::tok::eof))
>>> +      return false;
>>> +    if (Tok.is(comments::tok::newline)) {
>>> +      Result += "\n";
>>> +      return true;
>>> +    }
>>> +    llvm::StringRef TokText = L.getSpelling(Tok, SourceMgr);
>>> +    bool LocInvalid = false;
>>> +    unsigned TokColumn =
>>> +        SourceMgr.getSpellingColumnNumber(Tok.getLocation(),
>>> &LocInvalid);
>>> +    assert(!LocInvalid && "getFormattedText for invalid location");
>>> +
>>> +    // Amount of leading whitespace in TokText.
>>> +    size_t WhitespaceLen = TokText.find_first_not_of(" \t");
>>> +    if (WhitespaceLen == StringRef::npos)
>>> +      WhitespaceLen = TokText.size();
>>> +    // Remember the amount of whitespace we skipped in the first line
>>> to remove
>>> +    // indent up to that column in the following lines.
>>> +    if (IsFirstLine)
>>> +      IndentColumn = TokColumn + WhitespaceLen;
>>> +
>>> +    // Amount of leading whitespace we actually want to skip.
>>> +    // For the first line we skip all the whitespace.
>>> +    // For the rest of the lines, we skip whitespace up to IndentColumn.
>>> +    unsigned SkipLen =
>>> +        IsFirstLine
>>> +            ? WhitespaceLen
>>> +            : std::min<size_t>(
>>> +                  WhitespaceLen,
>>> +                  std::max<int>(static_cast<int>(IndentColumn) -
>>> TokColumn, 0));
>>> +    llvm::StringRef Trimmed = TokText.drop_front(SkipLen);
>>> +    Result += Trimmed;
>>> +    // Lex all tokens in the rest of the line.
>>> +    for (L.lex(Tok); Tok.isNot(comments::tok::eof); L.lex(Tok)) {
>>> +      if (Tok.is(comments::tok::newline)) {
>>> +        Result += "\n";
>>> +        return true;
>>> +      }
>>> +      Result += L.getSpelling(Tok, SourceMgr);
>>> +    }
>>> +    // We've reached the end of file token.
>>> +    return false;
>>> +  };
>>> +
>>> +  auto DropTrailingNewLines = [](std::string &Str) {
>>> +    while (Str.back() == '\n')
>>> +      Str.pop_back();
>>> +  };
>>> +
>>> +  // Proces first line separately to remember indent for the following
>>> lines.
>>> +  if (!LexLine(/*IsFirstLine=*/true)) {
>>> +    DropTrailingNewLines(Result);
>>> +    return Result;
>>> +  }
>>> +  // Process the rest of the lines.
>>> +  while (LexLine(/*IsFirstLine=*/false))
>>> +    ;
>>> +  DropTrailingNewLines(Result);
>>> +  return Result;
>>> +}
>>>
>>> Modified: cfe/trunk/unittests/AST/CMakeLists.txt
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/
>>> CMakeLists.txt?rev=332458&r1=332457&r2=332458&view=diff
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/unittests/AST/CMakeLists.txt (original)
>>> +++ cfe/trunk/unittests/AST/CMakeLists.txt Wed May 16 05:30:09 2018
>>> @@ -9,6 +9,7 @@ add_clang_unittest(ASTTests
>>>    ASTVectorTest.cpp
>>>    CommentLexer.cpp
>>>    CommentParser.cpp
>>> +  CommentTextTest.cpp
>>>    DataCollectionTest.cpp
>>>    DeclPrinterTest.cpp
>>>    DeclTest.cpp
>>>
>>> Added: cfe/trunk/unittests/AST/CommentTextTest.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/unittests/AST/
>>> CommentTextTest.cpp?rev=332458&view=auto
>>> ============================================================
>>> ==================
>>> --- cfe/trunk/unittests/AST/CommentTextTest.cpp (added)
>>> +++ cfe/trunk/unittests/AST/CommentTextTest.cpp Wed May 16 05:30:09 2018
>>> @@ -0,0 +1,122 @@
>>> +//===- unittest/AST/CommentTextTest.cpp - Comment text extraction test
>>> ----===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===------------------------------------------------------
>>> ----------------===//
>>> +//
>>> +// Tests for user-friendly output formatting of comments, i.e.
>>> +// RawComment::getFormattedText().
>>> +//
>>> +//===------------------------------------------------------
>>> ----------------===//
>>> +
>>> +#include "clang/AST/RawCommentList.h"
>>> +#include "clang/Basic/CommentOptions.h"
>>> +#include "clang/Basic/Diagnostic.h"
>>> +#include "clang/Basic/DiagnosticIDs.h"
>>> +#include "clang/Basic/FileManager.h"
>>> +#include "clang/Basic/FileSystemOptions.h"
>>> +#include "clang/Basic/SourceLocation.h"
>>> +#include "clang/Basic/SourceManager.h"
>>> +#include "clang/Basic/VirtualFileSystem.h"
>>> +#include "llvm/Support/MemoryBuffer.h"
>>> +#include <gtest/gtest.h>
>>> +
>>> +namespace clang {
>>> +
>>> +class CommentTextTest : public ::testing::Test {
>>> +protected:
>>> +  std::string formatComment(llvm::StringRef CommentText) {
>>> +    SourceManagerForFile FileSourceMgr("comment-test.cpp",
>>> CommentText);
>>> +    SourceManager& SourceMgr = FileSourceMgr.get();
>>> +
>>> +    auto CommentStartOffset = CommentText.find("/");
>>> +    assert(CommentStartOffset != llvm::StringRef::npos);
>>> +    FileID File = SourceMgr.getMainFileID();
>>> +
>>> +    SourceRange CommentRange(
>>> +        SourceMgr.getLocForStartOfFile(File).getLocWithOffset(
>>> +            CommentStartOffset),
>>> +        SourceMgr.getLocForEndOfFile(File));
>>> +    CommentOptions EmptyOpts;
>>> +    // FIXME: technically, merged that we set here is incorrect, but
>>> that
>>> +    // shouldn't matter.
>>> +    RawComment Comment(SourceMgr, CommentRange, EmptyOpts,
>>> /*Merged=*/true);
>>> +    DiagnosticsEngine Diags(new DiagnosticIDs, new DiagnosticOptions);
>>> +    return Comment.getFormattedText(SourceMgr, Diags);
>>> +  }
>>> +};
>>> +
>>> +TEST_F(CommentTextTest, FormattedText) {
>>> +  // clang-format off
>>> +  auto ExpectedOutput =
>>> +R"(This function does this and that.
>>> +For example,
>>> +   Runnning it in that case will give you
>>> +   this result.
>>> +That's about it.)";
>>> +  // Two-slash comments.
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +// This function does this and that.
>>> +// For example,
>>> +//    Runnning it in that case will give you
>>> +//    this result.
>>> +// That's about it.)cpp"));
>>> +
>>> +  // Three-slash comments.
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +/// This function does this and that.
>>> +/// For example,
>>> +///    Runnning it in that case will give you
>>> +///    this result.
>>> +/// That's about it.)cpp"));
>>> +
>>> +  // Block comments.
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +/* This function does this and that.
>>> + * For example,
>>> + *    Runnning it in that case will give you
>>> + *    this result.
>>> + * That's about it.*/)cpp"));
>>> +
>>> +  // Doxygen-style block comments.
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +/** This function does this and that.
>>> +  * For example,
>>> +  *    Runnning it in that case will give you
>>> +  *    this result.
>>> +  * That's about it.*/)cpp"));
>>> +
>>> +  // Weird indentation.
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +       // This function does this and that.
>>> +  //      For example,
>>> +  //         Runnning it in that case will give you
>>> +        //   this result.
>>> +       // That's about it.)cpp"));
>>> +  // clang-format on
>>> +}
>>> +
>>> +TEST_F(CommentTextTest, KeepsDoxygenControlSeqs) {
>>> +  // clang-format off
>>> +  auto ExpectedOutput =
>>> +R"(\brief This is the brief part of the comment.
>>> +\param a something about a.
>>> +@param b something about b.)";
>>> +
>>> +  EXPECT_EQ(ExpectedOutput, formatComment(
>>> +R"cpp(
>>> +/// \brief This is the brief part of the comment.
>>> +/// \param a something about a.
>>> +/// @param b something about b.)cpp"));
>>> +  // clang-format on
>>> +}
>>> +
>>> +} // namespace clang
>>>
>>>
>>> _______________________________________________
>>> cfe-commits mailing list
>>> cfe-commits@lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>>
>>
>>
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>
>

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Re: r332458 - [AST] Added a helper to extract a user-friendly text of a comment.

Reply via email to