[PATCH] D134130: [clangd] Add doxygen parsing for Hover [1/3]

Kadir Cetinkaya via Phabricator via cfe-commits Tue, 13 Dec 2022 08:19:17 -0800

kadircet added a comment.

Hi!


Sorry for letting these series of patches sit around without any comments. We 
were having some discussions internally to both understand the value 
proposition and its implications on the infrastructure.
So it'd help a lot if you can provide some more information for use cases, that 
way we can find a nice scope for this functionality to make sure it provides 
most of the value and doesn't introduce an unnecessary complexity into rest of 
the infrastructure and also we should try not to regress indexing of projects 
that don't have doxygen comments.

So first of all, what are the exact use cases you're planning to 
address/improve with support for doxygen parsing of comments? Couple that comes 
to mind:

- obtaining docs about params & return value
- stripping doxygen commands
- treating brief/detail/warning/note differently
- formatting text within comments (bold etc)
- getting linebreaks/indent right clangd#1040 
<https://github.com/clangd/clangd/issues/1040>

any other use cases that you believe are important?

as you might've noticed, this list already talks about dealing with certain 
doxygen commands (but not all).
that list is gathered by counting occurrences of those commands in a codebase 
with lots of open-source third_party code. findings and some initial ideas look 
like this:

- \brief: ~70k occurrences
  - common but usefulness in practice is unclear
  - can infer for non-doxy too (e.g. first sentence of a regular documentation)
  - maybe just strip (or merge into regular documentation)?
- \return[s]: 30k occurrences
  - unclear if worth separation in hover, because it might be tied to rest of 
the documentation (re-ordering concerns)
  - can infer for non-doxy maybe?
  - probably just strip the command and keep the rest as-is.
- \param: 28k occurrences
  - useful for signature help. maybe hover on func calls
  - probably worth storing in a structured manner.
- \detail[s]: 2k
- \p: 20k
- \code: 1k
- \warning: 2k
- \note: 9k
  - (for all of the above) just introduce as formatted text?

what do you think about those conclusions? any other commands that you seem 
worth giving a thought?
One important concern we've noticed around this part is, re-ordering comment 
components might actually hinder readability. as the comments are usually 
written with the assumption that they'll be read from top to bottom, but if we 
re-order them during presentation (e.g. hover has its own layout) we might 
start referring to concepts/entities in documentation before they're 
introduced. so we believe it's important to avoid any sort of re-ordering. this 
is one of the big down sides for preserving parameter comments in a structured 
way.

another important thing to consider is trying to heuristically infer some of 
these fields for non-doxygen code bases as well. that way we can provide a 
similar experience for both.

some other things to discuss about the design overall:

- How to store the extra information?
  - Proposal from our side would be to introduce structured storage for the 
pieces we want (limited), and keep the rest as part of main documentation text 
while doing stripping/reformatting.
- What to use as a parser?
  - Clang's doxygen parser actually looks like a great piece of code to re-use, 
it's unfortunate that it can issue diagnostics, requires AST etc. It'd be great 
to refactor that into a state where we can use it without any AST or 
diagnostics, and a minimal SourceManager (this seems feasible to achieve at 
first glance, as most of these inputs seem to be optional or used in 1 or 2 
places).
  - we still need to make sure performance and behaviour on non-doxygen is 
reasonable though. do you have any numbers here?
- How to store in the index?
  - If we can strip the parser off the dependencies on an astcontext, 
diagnostics etc. the best option would be to just store as raw text and run the 
whole pipeline on demand (e.g. do the doxygen parsing and markdown-ization 
afterwards). This is the simplest approach as it keeps index interfaces the 
same.

Happy to move the discussion to some other medium as well, if you would like to 
have them in discourse/github etc.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134130/new/

https://reviews.llvm.org/D134130

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D134130: [clangd] Add doxygen parsing for Hover [1/3]

Reply via email to