This version fixes propagation of musttail through C++ templates, avoids ICEs when generating -O0 tail calls on ARM (and probably some other targets), and again improves the error messages in tree-musttail as well as the documentation.
I bootstrapped and tested it on x86_64-linux, and checked that the musttail tests work on arm and riscv targets. -O0 support is still not as good as clang's (e.g. it doesn't handle struct returns), but I believe it's good enough for now to be usable.
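
For illustration only, here is a minimal sketch of the kind of code the template fix is about: a musttail call inside a function template, which has to survive instantiation. The MUSTTAIL macro and the sum_to function are hypothetical names made up for this example and are not taken from the testsuite. The macro expands to nothing when the compiler reports no musttail attribute, in which case the call is an ordinary recursive call. Scalar arguments and return values are used on purpose, since struct returns are the case noted above as not yet handled at -O0.

```cpp
#include <cassert>

// Hypothetical helper macro: pick whichever musttail spelling the
// compiler reports, or nothing at all if neither is available.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(gnu::musttail)
#    define MUSTTAIL [[gnu::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* attribute not supported: plain call */
#endif

// Hypothetical example: accumulate n + (n-1) + ... + 1 on top of acc.
// Caller and callee have the same signature, and the recursive call is
// the whole return expression, so it is eligible to be a tail call.
template <typename T>
T sum_to(T n, T acc)
{
  assert(n >= 0);
  if (n == 0)
    return acc;
  // The attribute sits on a return statement inside a template, so it
  // must be carried over to each instantiation.
  MUSTTAIL return sum_to<T>(n - 1, acc + n);
}
```

With the attribute in effect, a call such as sum_to<long>(1000, 0) should run in constant stack space regardless of optimization level, and the compiler is expected to report an error rather than silently fall back to a normal call if it cannot emit the tail call.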