On Tue, May 13, 2025 at 10:28 AM Ihor Radchenko <yanta...@posteo.net> wrote:
> > --- a/lisp/ol-bibtex.el > > +++ b/lisp/ol-bibtex.el > > @@ -107,6 +107,7 @@ > > > > ;;; Code: > > > > +(require 'org) > > Please don't. (require 'org) is to be avoided. We have enough cyclic > dependencies to handle already. Just use (defvar ...) declaration. Fixed > > - "\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$" > > + (concat org-tag-group-enclosed-re "[ \t]*$") > > I am not a big fan of regexps that enclose the whole regexp in a group. > Groups are not free performance-wise and should be avoided when > possible. Instead, just use group-less version of the constant and add > groups in place if strictly necessary. Good catch. It was my oversight. I thought they were strictly equivalent. > > -(defconst org-tag-re "[[:alnum:]_@#%]+" > > +(defconst org-tag-valid-char-set "[:alnum:]_@#%" > > + "[Tag] A string representing the set of characters and character > > +classes valid within a tag. This is the base pattern for tag > > +matching regex.") > > Please get rid of [Tag]. What does it even refer to? > Also, the first line of the docstring should concisely describe what the > variable does. See > https://www.gnu.org/software/emacs/manual/html_node/elisp/Documentation-Tips.html Done. > Also, we should probably make this constant internal. I am not sure if > this design will stick, so let's introduce the minimal number of new > public variables. How do I define an internal constant? Like org--tag-valid-char-set? > > +(defconst org-tag-invalid-char-re > > + (format "[^%s]" org-tag-valid-char-set) > > + "[Tag] Regexp matching a single character that's NOT a valid tag char.") > > This constant is only useful in very specific context - when we need to > transform arbitrary strings into something that can be used as tags. > Rather than introducing such constant, better factor out a function > doing the replacement. This is the invalid char set. It's the opposite of what you described. It is used to replace the invalid tag chars with colons, although the semantic is probably not important. Having this as a function instead of a const is OK, but I don't think it's too much of a difference tbh. There's rarely a scenario you would need to call this function at run time with a different input value. I can update if you feel strongly about this. > > +(defconst org-tag-group-enclosed-re > > + (format "\\(:\\([%s:]+\\):\\)" org-tag-valid-char-set) > > + "Regex pattern for a colon-enclosed group of tags without matching > > +the enclosing spaces and tabs, e.g., \":TAG1:TAG2:\". Match group > > +1 stores the tags with the enclosing colons, and match group 2 > > +stores the tags without the enclosing colons. Built using > > +org-tag-valid-char-set with the addition of the colon.") > > I do not think that we really need this one, considering my other comments. The alternative is to copy paste (format "\\(:\\([%s:]+\\):\\)" org-tag-valid-char-set) 3 times if we don't have this as a variable. I understand the need to reduce the number of confusing vars, but I feel this one is well deserved because of how often it's needed in other places. > > +(defconst org-tag-group-optional-re > > + (concat "\\(?:[ \t]+" org-tag-group-enclosed-re "\\)?[ \t]*$") > > + "Regexp matching an optional tag group at the end of a line, > > + with optional leading and trailing spaces. If a tag group is > > +present, group 1 is the full tag group (with colons), group 2 is > > +the tag content (without colons).") > > This is nothing but org-tag-group-re enclosed in shy group + ? > Redundant. > No, not exactly. The org-tag-group-re matches the tags group, the leading and trailing spaces. org-tag-group-optional-re has the tags group optional, but still needs to match the leading and trailing spaces* > >> Since we are rewriting things here, may as well use rx for better > >> readability. > > Ah, this looks like a much better alternative. However, I'm not super > > familiar with rx, and would like to keep this refactoraization patch > > semantically equivalent to the old code. Do you mind if I do the rx > > change in a next path with a proper planning of unit tests whatnot? > > No problem. Thank you. I'll dig into rx after we support emojis or whatever other unicode that are needed as part of tags. > P.S. Please avoid top-posting. See > https://orgmode.org/worg/org-mailing-list.html#orgc4ee9e4 > Sounds good. Please let me know if this works.
0001-lisp-org.el-refactor-the-org-tag-family-regex-to-pre.patch
Description: Binary data