krasimir added a comment.

Here's some empirical ideas about the approach: In clang-format, braces can be 
handled in two quite distinct ways, controlled by BraceBlockKind 
<https://github.com/llvm/llvm-project/blob/9c4afce7024aa1fe4efe2631240d1d766b4ea257/clang/lib/Format/FormatToken.h#L123>:
 BK_Block and BK_BracedInit.
BK_Block is for braces that open blocks that usually are at one level deeper 
and consist of a sequence of statements and other constructs.
BK_BracedInit is for initializer lists, dictionaries and other similar 
syntactics that are somewhat more appropriate to put together into a line.
The level of granularity of detailed formatting in clang-format is an unwrapped 
line, which is conceptually a sequence of tokens that "make sense together as a 
single line" (without going much into style details and ignoring column 
limits). Separate statements are for example put on separate unwrapped lines. 
The formatting information flowing between unwrapped lines includes just 
higher-level data such as the current nesting depth.
The brace detection is handled primarily in 
UnwrappedLineParser::calculateBraceTypes 
<https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L421>,
 which happens quite early during the parsing of the input sequence.

If an opening brace is marked BK_Block there, later the stuff between it and 
the matching closing brace is parsed as a block: multiple "statements" are put 
on their own unrwapped lines, inside the block.
If the brace is marked BK_BracedInit, the staff following it is parsed more 
like dictionary-struct-array-literal stuff, and importantly is kept on the same 
unwrapped line as the surrounding code (as a braced list may occur as a 
subexpression of a larger expression, and the formatting of the larger 
expression may depend non-trivially by the formatting of the braced list).

For example, consider the following pseudo-C-family fragment:

  % cat test.cc                                          
  
  int f() {
    block_example {
      get;
      set;
    };
  
    int init_list_example({1, 2, {more}}, other);
  }

If we examine the parsed unwrapped lines (`clang-format -debug`), they look 
like:

  Line(0, FSC=0): int[T=81, OC=0] identifier[T=81, OC=4] l_paren[T=81, OC=5] 
r_paren[T=81, OC=6] l_brace[T=21, OC=8] 
  Line(1, FSC=0): identifier[T=81, OC=2] l_brace[T=21, OC=16] 
  Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
  Line(2, FSC=0): identifier[T=81, OC=4] semi[T=81, OC=7] 
  Line(1, FSC=0): r_brace[T=81, OC=2] semi[T=81, OC=3] 
  Line(1, FSC=0): int[T=81, OC=2] identifier[T=81, OC=6] l_paren[T=81, OC=23] 
l_brace[T=81, OC=24] numeric_constant[T=81, OC=25] comma[T=81, OC=26] 
numeric_constant[T=81, OC=28] comma[T=81, OC=29] l_brace[T=81, OC=31] 
identifier[T=81, OC=32] r_brace[T=81, OC=36] r_brace[T=81, OC=37] comma[T=81, 
OC=38] identifier[T=81, OC=40] r_paren[T=81, OC=45] semi[T=81, OC=46] 
  Line(0, FSC=0): r_brace[T=81, OC=0] 
  Line(0, FSC=0): eof[T=81, OC=0] 

The block-like thing is put on separate lines at level 2; the braced-list-like 
thing is kept on the same unwrapped line.

Currently, C# auto property get/set "blocks" are parsed as BK_Block, hence the 
need to later merge lines using a heuristic.
If you could instead mark those as BK_BracedInit in `calculateBraceTypes`, that 
will have the effect of keeping them inline with the surrounding code and might 
produce good formatting of the whole line with a few tweaks.
There's another complication: syntactically in C++, a semicolon is very 
special. For example, C/C++ requires a semicolon after class declarations. 
There's a bunch of places in clang-formatthat use the presence of semicolons to 
determine "end of statement/end of line". So the semicolons in `{ get; set; }` 
might cause a bit of trouble and throw the parser off a bit. Fortunately, 
clang-format already has some code that deals with similar complications for 
javascript, where stuff like `X<{a: string; b: number}>` is correctly handled 
as a braced list, even in the presence of semicolons inside. You can look at 
how this is handled for inspiration (I'm not 100% sure these are the only 
places in code that contribute to the formatting of these constructs):

- in calculateBraceTypes here: 
https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L446
- and later in parseBracedList the semicolons are specially treated here: 
https://github.com/llvm/llvm-project/blob/9f8a7e82b85078b5afbbc44429355f156e044205/clang/lib/Format/UnwrappedLineParser.cpp#L1698

I hope this is helpful, but please take it with a grain of salt as I'm not very 
familiar with those parts of clang-format and am mostly poking around it and 
looking at how similar constructs for other languages are handled.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75194/new/

https://reviews.llvm.org/D75194



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to