>> I am not sure if we need to be so granular in constants. Maybe we can
> instead just have `org-tag-re' and derive other regexps from it?
>
> I'm following the existing pattern for one const per use case. This
> trades code readability for less duplication. But I think maybe it
> makes sense to reduce the number of consts. To do that, I would like
> to agree on a few semantically meaningful subcomponents such as
> org-tag-re, org-tag-group (with : between tags), org-tag-group-line,
> and have other regex defined inline? But for now I'll just all the
> consts. Please take a look and let me know which ones you think would
> not make sense to be consts.

> --- a/lisp/ol-bibtex.el
> +++ b/lisp/ol-bibtex.el
> @@ -107,6 +107,7 @@
>  
>  ;;; Code:
>  
> +(require 'org)

Please don't. (require 'org) is to be avoided. We have enough cyclic
dependencies to handle already. Just use (defvar ...) declaration.

> -                       "\\(:[[:alnum:]_@#%:]+:\\)[ \t]*$"
> +                          (concat org-tag-group-enclosed-re "[ \t]*$")

I am not a big fan of regexps that enclose the whole regexp in a group.
Groups are not free performance-wise and should be avoided when
possible. Instead, just use group-less version of the constant and add
groups in place if strictly necessary.

> -(defconst org-tag-re "[[:alnum:]_@#%]+"
> +(defconst org-tag-valid-char-set "[:alnum:]_@#%"
> +  "[Tag] A string representing the set of characters and character
> +classes valid within a tag.  This is the base pattern for tag
> +matching regex.")

Please get rid of [Tag]. What does it even refer to?
Also, the first line of the docstring should concisely describe what the
variable does. See 
https://www.gnu.org/software/emacs/manual/html_node/elisp/Documentation-Tips.html

Also, we should probably make this constant internal. I am not sure if
this design will stick, so let's introduce the minimal number of new
public variables.

> +(defconst org-tag-invalid-char-re
> +  (format "[^%s]" org-tag-valid-char-set)
> +  "[Tag] Regexp matching a single character that's NOT a valid tag char.")

This constant is only useful in very specific context - when we need to
transform arbitrary strings into something that can be used as tags.
Rather than introducing such constant, better factor out a function
doing the replacement.

> +(defconst org-tag-group-enclosed-re
> +  (format "\\(:\\([%s:]+\\):\\)" org-tag-valid-char-set)
> +  "Regex pattern for a colon-enclosed group of tags without matching
> +the enclosing spaces and tabs, e.g., \":TAG1:TAG2:\". Match group
> +1 stores the tags with the enclosing colons, and match group 2
> +stores the tags without the enclosing colons. Built using
> +org-tag-valid-char-set with the addition of the colon.")

I do not think that we really need this one, considering my other comments.

> +(defconst org-tag-group-optional-re
> +  (concat "\\(?:[ \t]+" org-tag-group-enclosed-re "\\)?[ \t]*$")
> +  "Regexp matching an optional tag group at the end of a line,
> + with optional leading and trailing spaces.  If a tag group is
> +present, group 1 is the full tag group (with colons), group 2 is
> +the tag content (without colons).")

This is nothing but org-tag-group-re enclosed in shy group + ?
Redundant.

>> Since we are rewriting things here, may as well use rx for better 
>> readability.
>
> Ah, this looks like a much better alternative. However, I'm not super
> familiar with rx, and would like to keep this refactoraization patch
> semantically equivalent to the old code. Do you mind if I do the rx
> change in a next path with a proper planning of unit tests whatnot?

No problem.

P.S. Please avoid top-posting. See 
https://orgmode.org/worg/org-mailing-list.html#orgc4ee9e4

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Reply via email to