On Wed, 11 May 2022 at 09:15, Reuben Thomas <[email protected]> wrote:
>
> I'm happy to prepare a patch in this case. I would simply remove all
> mention of syntax tables, as that functionality is no longer available.
>
Attached. Here's the commit message to explain what I've done:
Remove mention of both Emacs and non-Emacs syntax tables, as these are no
longer supported by the code; instead, fixed character classes are used.
Document the word character class (alnum + _).
Replace mentions of #defining emacs with RE_NO_GNU_OPS (which takes effect
in the opposite sense); merge the node “GNU Emacs Operators” into “GNU
Operators”.
For \` and \', refer to the “whole string” rather than the (Emacs) “buffer”.
Leave a TODO to document the classes that can be used with \s and \S. (This
was not previously documented, and is best left to another commit.)
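
For illustration only (this is not part of the patch), the word
character class described above can be sketched in C roughly as
follows. The name is_word_constituent is hypothetical, and the sketch
assumes single-byte characters; isalnum from <ctype.h> follows the
current locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Regex: returns true if C is
   word-constituent in the sense documented by the patch, i.e. it is
   in the POSIX class alnum for the current locale, or it is an
   underscore.  Assumes single-byte characters.  */
static bool
is_word_constituent (char c)
{
  /* Cast to unsigned char first: passing a negative value other than
     EOF to isalnum is undefined behaviour.  */
  return isalnum ((unsigned char) c) != 0 || c == '_';
}
```

Going by the documentation as patched, \w should match the characters
for which such a predicate is true and \W the others, but I have not
checked that against every code path.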
--
https://rrt.sc3d.org
From 72bdacccbd3e6cc3eb6e16549cf51ea9e7321ae2 Mon Sep 17 00:00:00 2001
From: Reuben Thomas <[email protected]>
Date: Wed, 11 May 2022 11:47:00 +0100
Subject: [PATCH] doc/regex.texi: remove Emacs-specific documentation; match code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Remove mention of both Emacs and non-Emacs syntax tables, as these are no
longer supported by the code; instead, fixed character classes are used.
Document the word character class (alnum + _).
Replace mentions of #defining emacs with RE_NO_GNU_OPS (which takes effect
in the opposite sense); merge the node “GNU Emacs Operators” into “GNU
Operators”.
For \` and \', refer to the “whole string” rather than the (Emacs) “buffer”.
Leave a TODO to document the classes that can be used with \s and \S. (This
was not previously documented, and is best left to another commit.)
---
doc/regex.texi | 113 +++++++++++++------------------------------------
1 file changed, 30 insertions(+), 83 deletions(-)
diff --git a/doc/regex.texi b/doc/regex.texi
index d21052282d..7015c8a651 100644
--- a/doc/regex.texi
+++ b/doc/regex.texi
@@ -108,8 +108,8 @@ Compiling}, for more information on compiling.
Regex considers the current syntax to be a collection of bits; we refer
to these bits as @dfn{syntax bits}. In most cases, they affect what
characters represent what operators. We describe the meanings of the
-operators to which we refer in @ref{Common Operators}, @ref{GNU
-Operators}, and @ref{GNU Emacs Operators}.
+operators to which we refer in @ref{Common Operators}, and @ref{GNU
+Operators}.
For reference, here is the complete list of syntax bits, in alphabetical
order:
@@ -467,15 +467,15 @@ cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VAR},
(@pxref{Match-non-word-constituent Operator}).
@item
-@samp{\`} represents the match-beginning-of-buffer
-operator and @samp{\'} represents the match-end-of-buffer operator
-(@pxref{Buffer Operators}).
+@samp{\`} represents the match-beginning-of-string
+operator and @samp{\'} represents the match-end-of-string operator
+(@pxref{Whole-string Operators}).
@item
-If Regex was compiled with the C preprocessor symbol @code{emacs}
-defined, then @samp{\s@var{class}} represents the match-syntactic-class
-operator and @samp{\S@var{class}} represents the
-match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).
+@samp{\s@var{class}} represents the match-syntactic-class operator and
+@samp{\S@var{class}} represents the match-not-syntactic-class operator
+(@pxref{Syntactic Class Operators}), unless the syntax bit
+@code{RE_NO_GNU_OPS} is set.
@end itemize
@@ -1243,22 +1243,24 @@ exactly the dual of @samp{^}'s; see the previous section. (That is,
@node GNU Operators
@chapter GNU Operators
-Following are operators that GNU defines (and POSIX doesn't).
+Following are operators that GNU defines (and POSIX doesn't) that you
+can use unless the syntax bit @code{RE_NO_GNU_OPS} is set.
@menu
* Word Operators::
-* Buffer Operators::
+* Whole-string Operators::
@end menu
@node Word Operators
@section Word Operators
The operators in this section require Regex to recognize parts of words.
-Regex uses a syntax table to determine whether or not a character is
-part of a word, i.e., whether or not it is @dfn{word-constituent}.
+Characters that are part of words, which are called
+@dfn{word-constituent}, are letters, digits, and the underscore
+(@samp{_}); more precisely, any character in the POSIX class
+@code{alnum} in the current locale, or underscore.
@menu
-* Non-Emacs Syntax Tables::
* Match-word-boundary Operator:: \b
* Match-within-word Operator:: \B
* Match-beginning-of-word Operator:: \<
@@ -1267,34 +1269,6 @@ part of a word, i.e., whether or not it is @dfn{word-constituent}.
* Match-non-word-constituent Operator:: \W
@end menu
-@node Non-Emacs Syntax Tables
-@subsection Non-Emacs Syntax Tables
-
-A @dfn{syntax table} is an array indexed by the characters in your
-character set. In the ASCII encoding, therefore, a syntax table
-has 256 elements. Regex always uses a @code{char *} variable
-@code{re_syntax_table} as its syntax table. In some cases, it
-initializes this variable and in others it expects you to initialize it.
-
-@itemize @bullet
-@item
-If Regex is compiled with the preprocessor symbols @code{emacs} and
-@code{SYNTAX_TABLE} both undefined, then Regex allocates
-@code{re_syntax_table} and initializes an element @var{i} either to
-@code{Sword} (which it defines) if @var{i} is a letter, number, or
-@samp{_}, or to zero if it's not.
-
-@item
-If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
-defined, then Regex expects you to define a @code{char *} variable
-@code{re_syntax_table} to be a valid syntax table.
-
-@item
-@xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
-the preprocessor symbol @code{emacs} defined.
-
-@end itemize
-
@node Match-word-boundary Operator
@subsection The Match-word-boundary Operator (@code{\b})
@@ -1347,74 +1321,47 @@ This operator (represented by @samp{\W}) matches any character that is
not word-constituent.
-@node Buffer Operators
-@section Buffer Operators
+@node Whole-string Operators
+@section Whole-string Operators
-Following are operators which work on buffers. In Emacs, a @dfn{buffer}
-is, naturally, an Emacs buffer. For other programs, Regex considers the
-entire string to be matched as the buffer.
+Following are operators which work on the whole string.
@menu
-* Match-beginning-of-buffer Operator:: \`
-* Match-end-of-buffer Operator:: \'
+* Match-beginning-of-string Operator:: \`
+* Match-end-of-string Operator:: \'
+* Syntactic Class Operators::
@end menu
-@node Match-beginning-of-buffer Operator
-@subsection The Match-beginning-of-buffer Operator (@code{\`})
+@node Match-beginning-of-string Operator
+@subsection The Match-beginning-of-string Operator (@code{\`})
@cindex @samp{\`}
This operator (represented by @samp{\`}) matches the empty string at the
-beginning of the buffer.
+beginning of the string.
-@node Match-end-of-buffer Operator
-@subsection The Match-end-of-buffer Operator (@code{\'})
+@node Match-end-of-string Operator
+@subsection The Match-end-of-string Operator (@code{\'})
@cindex @samp{\'}
This operator (represented by @samp{\'}) matches the empty string at the
-end of the buffer.
-
-
-@node GNU Emacs Operators
-@chapter GNU Emacs Operators
-
-Following are operators that GNU defines (and POSIX doesn't)
-that you can use only when Regex is compiled with the preprocessor
-symbol @code{emacs} defined.
-
-@menu
-* Syntactic Class Operators::
-@end menu
+end of the string.
@node Syntactic Class Operators
@section Syntactic Class Operators
The operators in this section require Regex to recognize the syntactic
-classes of characters. Regex uses a syntax table to determine this.
+classes of characters.
+@c TODO: What are the valid classes?
@menu
-* Emacs Syntax Tables::
* Match-syntactic-class Operator:: \sCLASS
* Match-not-syntactic-class Operator:: \SCLASS
@end menu
-@node Emacs Syntax Tables
-@subsection Emacs Syntax Tables
-
-A @dfn{syntax table} is an array indexed by the characters in your
-character set. In the ASCII encoding, therefore, a syntax table
-has 256 elements.
-
-If Regex is compiled with the preprocessor symbol @code{emacs} defined,
-then Regex expects you to define and initialize the variable
-@code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
-tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
-Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
-for a description of Emacs' syntax tables.
-
@node Match-syntactic-class Operator
@subsection The Match-syntactic-class Operator (@code{\s}@var{class})
--
2.25.1