hokein added a comment. As discussed offline, we decided to stop spending effort on improving the cppreference_parser.py, instead, we allow human edits in the generated `StdSymbolMap.inc` and `CSymbolMap.inc`, it gives us more flexibility: sort the headers for multi-header symbol in a humn-desired order rather than suboptimal alphabet order; symbols (e.g. `std::experimental::filesystem`) that are not available in cppreference website; symbols (e.g. `PRIX16`) from a complicated cppreference HTML page which is hard to parse in cppreference_parser.py etc.
We have two major things in this patch: 1. extend the Stdlib API to support multiple-header symbols 2. emits multiple-header symbols in the *.inc files For 1), this change is good. We need to make sure the change not break other users of the `.inc` files. clangd is the main user, the behavior will be changed in https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/index/CanonicalIncludes.cpp#L736 as now we have duplicated qualified names (the ideal solution is to ignore multiple-header symbols, I think it is fine now if we just add a single `std::size_t` symbol, see my comment below) For 2), I have some concerns, it looks like some multiple-headers symbols that are handled incorrectly by the `cppreference_parser.py` (see my other comments). I think the best way for us is to list them manually (no action required, I will make a followup) I'd suggest to narrow the scope of the patch: - keep 1) and add a unittest in `llvm-project/clang/unittests/Tooling/StandardLibraryTest.cpp`; - drop all generated multi-header symbols in *.inc files, instead, we manually add a single one `std::size_t` symbol to the `StdSymbolMap.inc` for the StandardLibrary unittest. This means we need to revert the current change of `gen_std.py`; ================ Comment at: clang/include/clang/Tooling/Inclusions/CSymbolMap.inc:80 SYMBOL(FOPEN_MAX, None, <stdio.h>) +SYMBOL(FP_FAST_FMA, None, <math.h>) +SYMBOL(FP_FAST_FMAF, None, <math.h>) ---------------- These symbols seem to only have a single header, I think it is an improvement effect of the new change of parsing logic in cppreference_parser.py. ================ Comment at: clang/include/clang/Tooling/Inclusions/CSymbolMap.inc:96 +SYMBOL(INT16_C, None, <inttypes.h>) +SYMBOL(INT16_C, None, <stdint.h>) SYMBOL(INT16_MAX, None, <stdint.h>) ---------------- This type of symbol (`INTX_C`) seems not correct, they are only from <stdint.h> -- (this is a limitation of our current cppreference parser (it can not handle the complex page: https://en.cppreference.com/w/c/types/integer) ================ Comment at: clang/include/clang/Tooling/Inclusions/CSymbolMap.inc:172 SYMBOL(ONCE_FLAG_INIT, None, <threads.h>) +SYMBOL(PRIX16, None, <inttypes.h>) +SYMBOL(PRIX16, None, <stdint.h>) ---------------- I think this kind of symbol (`PRIdN`, `PRIdLEASTN`, `PRIdFASTN`, `PRIdMAX`, `PRIdPTR` etc) is only from `<inttypes.h>` (another limitation of the cppreference parser). ================ Comment at: clang/lib/Tooling/Inclusions/Stdlib/StandardLibrary.cpp:33 #undef SYMBOL SymbolNames = new std::remove_reference_t<decltype(*SymbolNames)>[SymCount]; SymbolHeaderIDs = ---------------- It is sad that we're allocating more memory for these two arrays now (as we have multiple SYMBOLs for the same qualified name). No action needed, can you add a comment saying that we're allocate more space here? ================ Comment at: clang/tools/include-mapping/test.py:56 """ - self.assertEqual(_ParseSymbolPage(html, 'foo'), set(['<cmath>'])) + self.assertEqual(_ParseSymbolPage(html, 'foo', ""), set(['<cmath>'])) ---------------- The test needs to update as well, as you remove the change of `_ParseSymbolPage`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D142092/new/ https://reviews.llvm.org/D142092 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits