ilya-biryukov added a comment.

This change LG as an extraction of the helper functionality to be reused in 
clang, clang-tidy, etc.
However, I feel there are potential improvements both to the underlying code 
and the new APIs that we could make.

I left some comments, trying to focus on interface of the class and keeping the 
changes that would be required to address them to be purely cosmetic. As 
chatted offline, any non-trivial changes to the underlying implementation are 
out of scope of this patch.

> preserver sorted #includes.

A typo in the change description:
s/preserver/preserve

> This is best effort - only works when the
>  block is already sorted.

Could we describe behavior for the unsorted case in a sensible way? E.g. is it 
added to the end of the include list, added after closely related headers, 
added to a totally random place (but deterministic) place, etc? 
It seems like an important case, and I believe we shouldn't completely ignore 
it and describe what the "best effort" means in that case.

> When inserting multiple #includes to the end of a file which doesn't
>  end with a "\n" character, a "\n" will be prepended to each #include.
>  This is a special and rare case that was previously handled. This is now
>  relaxed to avoid complexity as it's rare in practice.

I don't quite get it, could you please elaborate on what has changed here 
exactly? Maybe give an example?



================
Comment at: lib/Format/Format.cpp:1691
   // 0. Otherwise, returns the priority of the matching category or INT_MAX.
-  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) {
+  int getIncludePriority(StringRef IncludeName, bool CheckMainHeader) const {
     int Ret = INT_MAX;
----------------
I would personally keep the function non-const and not use mutable fields here.
Even though it's logically const, I would strive towards keeping the things 
`const` only if there are actually immutable
One particular problem that could be avoided is accidentally calling the const 
methods concurrently on different threads.

But up to you if you actually want to make this change.


================
Comment at: lib/Format/Format.cpp:1988
 
-bool isDeletedHeader(llvm::StringRef HeaderName,
-                     const std::set<llvm::StringRef> &HeadersToDelete) {
-  return HeadersToDelete.count(HeaderName) ||
-         HeadersToDelete.count(HeaderName.trim("\"<>"));
-}
+/// Generates replacements for inserting or deleting #include headers in a 
file.
+class HeaderIncludes {
----------------
Maybe s/#include headers/#include directives/?
This is how they called in terminology of preprocessing.


================
Comment at: lib/Format/Format.cpp:1995
+  /// Inserts an #include directive of \p IncludeName into the code. If \p
+  /// IncludeName is quoted e.g. <...> or "...", it will be #included as is;
+  /// by default,  it is quoted with "".
----------------
Maybe make an API more explicit about the style of includes that should be used?
Make it literally impossible to misuse the API:
```
enum class IncludeStyle { Quoted, Angle };
/// \p IncludeName is the include path to be inserted, \p Style specifies 
whether the included path should use quotes or angle brackets.
Optional<Replacement> insert(StringRef IncludeName, IncludeStyle Style = 
IncludeStyle::Quoted) const;
```



================
Comment at: lib/Format/Format.cpp:2013
+
+  /// Removes all existing #includes of \p Header. If \p Header is not quoted
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
----------------
Have you considered changing the API slightly to allow iterating over all 
includes in the file and APIs to remove includes pointed by the iterators?
It would give more flexibility, allowing the clients to implement useful things 
like:
- Remove all #includes that start with a prefix, e.g. `clang/Basic/....`.
- Find and remove all duplicate #include directives.
- etc, etc.

The current implementation seems tied to a specific use-case that we have in 
clang-format, i.e. "preprocess once, remove headers in batches". 
It feels wasteful to pay overhead of `StringMap<vector<Include>>` sometimes, 
e.g. if the code wants to insert just a single header or do other things, like 
removing duplicate headers.

I.e. I propose to extract the code that parses the source code and produces a 
list of #include directives with their respective ranges and filenames into a 
separate function/class.
This would. arguably, improve readability of the include insertion code.

That might not be a trivial change, though, since it requires rewriting some of 
the code used here, while the current patch seems to focus on simply extracting 
useful functionality from `clang-format` for reuse in clangd, clang-tidy, etc.
Let me know what you think.


================
Comment at: lib/Format/Format.cpp:2015
+  /// (e.g. without <> or ""), #include of \p Header with both quotations will
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
----------------
We should mention that remove only deletes exact matches and won't try to 
resolve filenames, patch up path separators, etc.


================
Comment at: lib/Format/Format.cpp:2016
+  /// be removed.
+  tooling::Replacements remove(llvm::StringRef Header) const;
 
----------------
Maybe also add a param of type `Optional<IncludeStyle>` and 
  - only remove includes with the specified style, if specified the value is 
specified
  - remove both styles the value is `llvm::None`.


================
Comment at: lib/Format/Format.cpp:2023
+    // An include header quoted with either <> or "".
+    std::string Name;
+    // The range of the whole line of include directive including any eading
----------------
Maybe use `StringRef` here?


================
Comment at: lib/Format/Format.cpp:2031
 
-  llvm::Regex IncludeRegex(IncludeRegexPattern);
-  llvm::Regex DefineRegex(R"(^[\t\ ]*#[\t\ ]*define[\t\ ]*[^\\]*$)");
-  SmallVector<StringRef, 4> Matches;
+  StringRef FileName;
+  StringRef Code;
----------------
What do we use the `FileName` for?
On a higher level, it seems we shouldn't need it to implement what 
`HeaderIncludes` does.


================
Comment at: lib/Format/Format.cpp:2040
+  std::unordered_map<int, llvm::SmallVector<const Include *, 8>>
+      IncludesByPriority;
 
----------------
Could you please add a comment explaining what are those priorities and what 
this map is used for?


================
Comment at: lib/Format/Format.cpp:2046
+  // Code with the first MinInsertOffset characters trimmed.
+  StringRef TrimmedCode;
+  // Max insertion offset in the original code.
----------------
Is this field always equal to `Code.drop_front(MinInsertOffset)`?
If so, maybe compute it when needed and don't store explicitly?


================
Comment at: lib/Format/Format.cpp:2048
+  // Max insertion offset in the original code.
+  unsigned MaxInsertOffset;
+  IncludeCategoryManager Categories;
----------------
Could you add a comment explaining how min and max offsets are calculated? I 
assume minoffset goes after the file comment and maxoffset is where 
non-preprocessor code starts, but it would be nice to have a short comment to 
know for sure.


Repository:
  rC Clang

https://reviews.llvm.org/D46180



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to