> (Note that the exact changes needed to support emoji or other things may need to be discussed first.)
Very well. Let me start a new thread for emoji. The existing thread can be repurposed to focus on the refactorization. It's actually better this way because refactorization is beneficial by itself. > In the commit messages we quote symbols like `org-tag-valid-char-set'. > ... and use double space between sentences. Done > I am not sure if we need to be so granular in constants. Maybe we can instead just have `org-tag-re' and derive other regexps from it? I'm following the existing pattern for one const per use case. This trades code readability for less duplication. But I think maybe it makes sense to reduce the number of consts. To do that, I would like to agree on a few semantically meaningful subcomponents such as org-tag-re, org-tag-group (with : between tags), org-tag-group-line, and have other regex defined inline? But for now I'll just all the consts. Please take a look and let me know which ones you think would not make sense to be consts. > We usually do not announce future plans in docstrings. Docstring are for the users to read, and announcing changes in Org syntax that may or may not happen is not a good idea. Yup. Let me edit that . > Since we are rewriting things here, may as well use rx for better readability. Ah, this looks like a much better alternative. However, I'm not super familiar with rx, and would like to keep this refactoraization patch semantically equivalent to the old code. Do you mind if I do the rx change in a next path with a proper planning of unit tests whatnot? > Note that tag regexp is used manually in a number of places in Org codebase. Search for "_@#%:" across Org codebase and see. All these instances should be changed and ideally use the regexp constants instead of hard-coded constants. Great call out! Done in this new patch. On Sun, May 11, 2025 at 9:09 AM Ihor Radchenko <yanta...@posteo.net> wrote: > > Shuo Shen <shuoshen...@gmail.com> writes: > > > This is the first patch for the feature request: [FR] Allow emojis in > > tags (was: emojis in tags?) > > (https://list.orgmode.org/orgmode/87wn4pdarh.fsf@localhost/) > > (Note that the exact changes needed to support emoji or other things may > need to be discussed first.) > > > It's my first diff. I'm trying to get an understanding of code as well > > as the process at the same time, and don't want to get things > > complicated. Therefore, I'm keeping the change minimal. > > Thanks! See my comments below. > > > Subject: [PATCH] lisp/org.el: Define a const for the tag matching pattern > > > > * org.el (org-tag-valid-char-set, org-tag-re, org-tag-group-re) > > (org-tag-line-re): Defined the const org-tag-valid-char-set. > > In the commit messages we quote symbols like `org-tag-valid-char-set'. > > > org-tag-valid-char-set will be used as a single source of truth for > > matching tags. The problem is that the tag matching pattern is defined > > ... and use double space between sentences. > See https://orgmode.org/worg/org-contribute.html#org219e2af > > > everywhere (org-tag-re, org-tag-group-re, org-tag-line-re, and a bunch > > of inline places. This is the first diff to address the problem. When > > the refactorizations are all done, we will add emoji support for tags > > I am not sure if we need to be so granular in constants. Maybe we can > instead just have `org-tag-re' and derive other regexps from it? > > > +(defconst org-tag-valid-char-set "[:alnum:]_@#%" > > + "[Tag] A string representing the set of characters and character > > + classes valid within a tag. This is the base definition for tag > > +characters and will be the place to add new patterns.") > > We usually do not announce future plans in docstrings. Docstring are for > the users to read, and announcing changes in Org syntax that may or may > not happen is not a good idea. > > > +(defconst org-tag-re (format "[%s]+" org-tag-valid-char-set) > > "Regexp matching a single tag.") > > Since we are rewriting things here, may as well use rx for better readability. > > > +(defconst org-tag-group-re > > ... > > (defconst org-tag-line-re > > - "^\\*+ \\(?:.*[ \t]\\)?\\(:\\([[:alnum:]_@#%:]+\\):\\)[ \t]*$" > > + (format "^\\*+ \\(?:.*[ \t]\\)?\\(:\\([%s:]+\\):\\)[ \t]*$" > > org-tag-valid-char-set) > > Note that tag regexp is used manually in a number of places in Org > codebase. Search for "_@#%:" across Org codebase and see. All these > instances should be changed and ideally use the regexp constants instead > of hard-coded constants. > > -- > Ihor Radchenko // yantar92, > Org mode maintainer, > Learn more about Org mode at <https://orgmode.org/>. > Support Org development at <https://liberapay.com/org-mode>, > or support my work at <https://liberapay.com/yantar92>
0001-lisp-org.el-refactor-the-org-tag-family-regex-to-pre.patch
Description: Binary data