>> I am not sure if we need to be so granular in constants. Maybe we can > instead just have `org-tag-re' and derive other regexps from it? > > I'm following the existing pattern for one const per use case. This > trades code readability for less duplication. But I think maybe it > makes sense to reduce the number of consts. To do that, I would like > to agree on a few semantically meaningful subcomponents such as > org-tag-re, org-tag-group (with : between tags), org-tag-group-line, > and have other regex defined inline? But for now I'll just all the > consts. Please take a look and let me know which ones you think would > not make sense to be consts.
> --- a/lisp/ol-bibtex.el > +++ b/lisp/ol-bibtex.el > @@ -107,6 +107,7 @@ > > ;;; Code: > > +(require 'org) Please don't. (require 'org) is to be avoided. We have enough cyclic dependencies to handle already. Just use (defvar ...) declaration. > - "\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$" > + (concat org-tag-group-enclosed-re "[ \t]*$") I am not a big fan of regexps that enclose the whole regexp in a group. Groups are not free performance-wise and should be avoided when possible. Instead, just use group-less version of the constant and add groups in place if strictly necessary. > -(defconst org-tag-re "[[:alnum:]_@#%]+" > +(defconst org-tag-valid-char-set "[:alnum:]_@#%" > + "[Tag] A string representing the set of characters and character > +classes valid within a tag. This is the base pattern for tag > +matching regex.") Please get rid of [Tag]. What does it even refer to? Also, the first line of the docstring should concisely describe what the variable does. See https://www.gnu.org/software/emacs/manual/html_node/elisp/Documentation-Tips.html Also, we should probably make this constant internal. I am not sure if this design will stick, so let's introduce the minimal number of new public variables. > +(defconst org-tag-invalid-char-re > + (format "[^%s]" org-tag-valid-char-set) > + "[Tag] Regexp matching a single character that's NOT a valid tag char.") This constant is only useful in very specific context - when we need to transform arbitrary strings into something that can be used as tags. Rather than introducing such constant, better factor out a function doing the replacement. > +(defconst org-tag-group-enclosed-re > + (format "\\(:\\([%s:]+\\):\\)" org-tag-valid-char-set) > + "Regex pattern for a colon-enclosed group of tags without matching > +the enclosing spaces and tabs, e.g., \":TAG1:TAG2:\". Match group > +1 stores the tags with the enclosing colons, and match group 2 > +stores the tags without the enclosing colons. Built using > +org-tag-valid-char-set with the addition of the colon.") I do not think that we really need this one, considering my other comments. > +(defconst org-tag-group-optional-re > + (concat "\\(?:[ \t]+" org-tag-group-enclosed-re "\\)?[ \t]*$") > + "Regexp matching an optional tag group at the end of a line, > + with optional leading and trailing spaces. If a tag group is > +present, group 1 is the full tag group (with colons), group 2 is > +the tag content (without colons).") This is nothing but org-tag-group-re enclosed in shy group + ? Redundant. >> Since we are rewriting things here, may as well use rx for better >> readability. > > Ah, this looks like a much better alternative. However, I'm not super > familiar with rx, and would like to keep this refactoraization patch > semantically equivalent to the old code. Do you mind if I do the rx > change in a next path with a proper planning of unit tests whatnot? No problem. P.S. Please avoid top-posting. See https://orgmode.org/worg/org-mailing-list.html#orgc4ee9e4 -- Ihor Radchenko // yantar92, Org mode maintainer, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>