On 13/03/2021 12:24, Ihor Radchenko wrote:
Here is the right one
I think, some unit tests would be great to avoid future regressions. Since it is heuristics, some cases will be always broken, so tests could clarify intentions.
I have not tested the regexp on real cases, so you have full rights to be skeptical concerning my comments. I prefer to have explicit URLs in my files.
+ (let ((non-space-bracket "[^][ \t\n()<>]+"))
I think, "+" is redundant here, it does not allow empty inner parenthesis and leads to nested construction "(x+)+" that could be simplified to "x+"
+ ;; Heiristics for an URL link. Source:
Typo: Heuristics
+ ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls + (rx-to-string + `(seq (regexp "\\<")
Isn't it an equivalent to "word-start"?
+ (regexp ,types-re) + ":" + (1+ (or (regex ,non-space-bracket) + (seq "(" + (* (or (regex ,non-space-bracket) + (seq "(" + (regex ,non-space-bracket)
Maybe "0+" to allow "(())"?
+ ")"))) + ")"))) + (or (seq "(" + (* (or (regex ,non-space-bracket) + (seq "(" + (regex ,non-space-bracket) + ")"))) + ")") + (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))
Is the group useful for any purpose? The construction is already inside an "or".
I have tried to compare current, your, and a little modified your regexps on several synthetic examples:
#+begin_src elisp (let* ((types-re (regexp-opt (org-link-types) t)) (org-link-plain-re-ir (let ((non-space-bracket "[^][ \t\n()<>]+")) ;; Heiristics for an URL link. Source: ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls (rx-to-string `(seq (regexp "\\<") (regexp ,types-re) ":" (1+ (or (regex ,non-space-bracket) (seq "(" (* (or (regex ,non-space-bracket) (seq "(" (regex ,non-space-bracket) ")"))) ")"))) (or (seq "(" (* (or (regex ,non-space-bracket) (seq "(" (regex ,non-space-bracket) ")"))) ")") (regexp "\\([^[:punct:] \t\n]\\|/\\)")))))) (org-link-plain-re-nm (let* ((non-space-bracket "[^][ \t\n()<>]") (parenthesis `(seq "(" (0+ (or (regex ,non-space-bracket) (seq "(" (0+ (regex ,non-space-bracket)) ")"))) ")"))) ;; Heuristics for an URL link inspired by ;; https://daringfireball.net/2010/07/improved_regex_for_matching_urls (rx-to-string `(seq word-start (regexp ,types-re) ":" (1+ (or (regex ,non-space-bracket) ,parenthesis)) (or (regexp "[^[:punct:] \t\n]") ?/ ,parenthesis)))))) (cons (list "src" "orig" "ir" "nm") (mapcar (lambda (s) (cons s (mapcar (lambda (r) (if (string-match r s) (match-string 0 s) "")) (list org-link-plain-re org-link-plain-re-ir org-link-plain-re-nm)))) (list "The file:a link" "The file:aa link" "The file:a(b)c link" "The file:a() link" "The file:aa((a)) link" "The file:aa(()) link" "The file:///a link" "The file:///a/, link" "The http:// link" "The (some file:ab) link" "The file:aa) link" "The file:aa( link" )))) #+end_src #+RESULTS: | src | orig | ir | nm | | The file:a link | | | | | The file:aa link | file:aa | file:aa | file:aa | | The file:a(b)c link | file:a(b) | file:a(b)c | file:a(b)c | | The file:a() link | | file:a() | file:a() | | The file:aa((a)) link | file:aa | file:aa((a)) | file:aa((a)) | | The file:aa(()) link | file:aa | file:aa | file:aa(()) | | The file:/a link | file:/a | file:/a | file:/a | | The file:/a/, link | file:/a/ | file:/a/ | file:/a/ | | The http:// link | http:// | http:// | http:// | | The (some file:ab) link | file:ab | file:ab | file:ab | | The file:aa) link | file:aa | file:aa | file:aa | | The file:aa( link | file:aa | file:aa | file:aa |