> (Note that the exact changes needed to support emoji or other things may
need to be discussed first.)

Very well. Let me start a new thread for emoji. The existing thread
can be repurposed to focus on the refactorization. It's actually
better this way because refactorization is beneficial by itself.

> In the commit messages we quote symbols like `org-tag-valid-char-set'.
> ... and use double space between sentences.

Done

> I am not sure if we need to be so granular in constants. Maybe we can
instead just have `org-tag-re' and derive other regexps from it?

I'm following the existing pattern for one const per use case. This
trades code readability for less duplication. But I think maybe it
makes sense to reduce the number of consts. To do that, I would like
to agree on a few semantically meaningful subcomponents such as
org-tag-re, org-tag-group (with : between tags), org-tag-group-line,
and have other regex defined inline? But for now I'll just all the
consts. Please take a look and let me know which ones you think would
not make sense to be consts.

> We usually do not announce future plans in docstrings. Docstring are for
the users to read, and announcing changes in Org syntax that may or may
not happen is not a good idea.

Yup. Let me edit that .

> Since we are rewriting things here, may as well use rx for better readability.

Ah, this looks like a much better alternative. However, I'm not super
familiar with rx, and would like to keep this refactoraization patch
semantically equivalent to the old code. Do you mind if I do the rx
change in a next path with a proper planning of unit tests whatnot?

> Note that tag regexp is used manually in a number of places in Org
codebase. Search for "_@#%:" across Org codebase and see. All these
instances should be changed and ideally use the regexp constants instead
of hard-coded constants.

Great call out! Done in this new patch.


On Sun, May 11, 2025 at 9:09 AM Ihor Radchenko <yanta...@posteo.net> wrote:
>
> Shuo Shen <shuoshen...@gmail.com> writes:
>
> > This is the first patch for the feature request: [FR] Allow emojis in
> > tags (was: emojis in tags?)
> > (https://list.orgmode.org/orgmode/87wn4pdarh.fsf@localhost/)
>
> (Note that the exact changes needed to support emoji or other things may
> need to be discussed first.)
>
> > It's my first diff. I'm trying to get an understanding of code as well
> > as the process at the same time, and don't want to get things
> > complicated. Therefore, I'm keeping the change minimal.
>
> Thanks! See my comments below.
>
> > Subject: [PATCH] lisp/org.el: Define a const for the tag matching pattern
> >
> > * org.el (org-tag-valid-char-set, org-tag-re, org-tag-group-re)
> > (org-tag-line-re): Defined the const org-tag-valid-char-set.
>
> In the commit messages we quote symbols like `org-tag-valid-char-set'.
>
> > org-tag-valid-char-set will be used as a single source of truth for
> > matching tags. The problem is that the tag matching pattern is defined
>
> ... and use double space between sentences.
> See https://orgmode.org/worg/org-contribute.html#org219e2af
>
> > everywhere (org-tag-re, org-tag-group-re, org-tag-line-re, and a bunch
> > of inline places. This is the first diff to address the problem. When
> > the refactorizations are all done, we will add emoji support for tags
>
> I am not sure if we need to be so granular in constants. Maybe we can
> instead just have `org-tag-re' and derive other regexps from it?
>
> > +(defconst org-tag-valid-char-set "[:alnum:]_@#%"
> > +  "[Tag] A string representing the set of characters and character
> > + classes valid within a tag.  This is the base definition for tag
> > +characters and will be the place to add new patterns.")
>
> We usually do not announce future plans in docstrings. Docstring are for
> the users to read, and announcing changes in Org syntax that may or may
> not happen is not a good idea.
>
> > +(defconst org-tag-re (format "[%s]+" org-tag-valid-char-set)
> >    "Regexp matching a single tag.")
>
> Since we are rewriting things here, may as well use rx for better readability.
>
> > +(defconst org-tag-group-re
> > ...
> >  (defconst org-tag-line-re
> > -  "^\\*+ \\(?:.*[ \t]\\)?\\(:\\([[:alnum:]_@#%:]+\\):\\)[ \t]*$"
> > +  (format "^\\*+ \\(?:.*[ \t]\\)?\\(:\\([%s:]+\\):\\)[ \t]*$" 
> > org-tag-valid-char-set)
>
> Note that tag regexp is used manually in a number of places in Org
> codebase. Search for "_@#%:" across Org codebase and see. All these
> instances should be changed and ideally use the regexp constants instead
> of hard-coded constants.
>
> --
> Ihor Radchenko // yantar92,
> Org mode maintainer,
> Learn more about Org mode at <https://orgmode.org/>.
> Support Org development at <https://liberapay.com/org-mode>,
> or support my work at <https://liberapay.com/yantar92>

Attachment: 0001-lisp-org.el-refactor-the-org-tag-family-regex-to-pre.patch
Description: Binary data

Reply via email to