Hello,

For the record, `org-match-substring-regexp' is a variation on:

  "\\(\\S-\\)\\([_^]\\)\\(\\(?:\\*\\|[-+]?[^-+*!@#$%^_ \t\r\n,:\"?<>~;./{}=()]+\\)\\)"

I think it is a bit convoluted and therefore difficult to predict. For example, as a recent bug report showed, you may tend to interpret

  a_b[fn:1]

as

  a_{b}[fn:1]

but, in fact, it is equivalent to

  a_{b[fn}:1]

because "[" is not in the excluded set, whereas ":" is.

Of course, we can prevent this by forbidding "[" and "]" in the last part of the regexp. But I wonder if there's something better to do.

The idea behind this regexp is that we should be able to write simple sub/superscripts, including numbers and entities, without requiring curly braces (see the `org-use-sub-superscripts' docstring for details).

Maybe something like the following could be an interesting alternative:

  "\\(\\S-\\)\\([_^]\\)\\(\\*\\|[+-]?\\(?:\\w\\|[0-9.,\\]\\)*\\(\\w\\|[0-9]\\)\\)"

That is, without braces, either an asterisk or any combination of word, number, dot, comma and backslash characters, which may start with either a plus or a minus sign but cannot end with either a dot or a comma.

I find it arguably more predictable (no inverted class). Also, we "gain" the following:

  a^3.14.  <=>  a^{3.14}.

At the moment,

  a^3.14.  <=>  a^{3}.14.

What do you think?

Regards,

-- 
Nicolas Goaziou