bug 9 revisited

Angus Leeming Sat, 01 Feb 2003 11:11:17 -0800

I'm not very good at eating humble pie, but I feel I have to here :-(

Lars, the patch that I submitted to TeX.pm this afternoon broke reLyX when 
tested with the other test cases in Bugzilla. It didn't include the 
trailing whitespace as part of the macro, and reLyX therefore generated 
tokens such as


$token='\hline
';

I have therefore reverted my patch.

The fix to this patch is simple: an extra pair of parentheses to bracket the 
appropriate 'or': '((a|b)\s*)'. I have done the same thing for the '\*?', 
giving '(((a|b)\*?)\s*)'.
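To see why the placement of the parentheses matters, here is a minimal Python sketch (not part of the patch) comparing the reverted regex (the `-` line of the first diff below) with the fixed one. It assumes Python's `re` treats these constructs (alternation, character classes, `\s*`, an optional `\*?`) the way Perl does. Without the extra parentheses, `\s*` binds only to the letters branch, so single-character macros such as `\\` and `\(` do not absorb their trailing whitespace.

```python
import re

# Python transliterations: Perl's single-quoted '\\\\' is the regex \\,
# i.e. one literal backslash; the rest of each pattern carries over as-is.

# Reverted patch: '\s*' belongs only to the letters alternative.
BROKEN = re.compile(r'\\(?:\)|([^a-zA-Z)]\*?)|([a-zA-Z]+\*?)\s*)')

# Fixed patch: the alternation is parenthesised, so '\*?' and '\s*'
# apply to both the single-character branch and the letters branch.
FIXED = re.compile(r'\\(?:\)|((([^a-zA-Z)])|([a-zA-Z]+))\*?)\s*)')

for text in [r'\\  x', r'\(  x', r'\hline  x']:
    # Show what each regex takes as the macro token at the start of text.
    print(repr(BROKEN.match(text).group(0)), repr(FIXED.match(text).group(0)))
```

Only the letters-based macro (`\hline`) gets the same token from both regexes.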

I attach a patch against the current TeX.pm (i.e., this morning's), together 
with a test LaTeX file that has examples of all token types:
1 '\\  '
2 '\)  '
2 '\)*  '
3 '\(  '
4 '\hline  '
5 '\section   {Title}'
6 '\section*  {Title}'
Each is followed by whitespace. The patched reLyX converts this file 
correctly, distinguishing between 
        '\( a = b \) *' and '\( c = d \)*'
(xdvi shows a space between b and * in the former and no space between d 
and * in the latter. This is correct.)
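The expected tokenization of these test cases can be checked with a short Python sketch. It assumes the patched regex behaves the same under Python's `re` as under Perl; `leading_macro` and `text_after_math` are hypothetical helpers for illustration, not reLyX functions. Because `\)` is matched on its own, the whitespace after it stays in the text, which is what produces the space between b and *.

```python
import re

# Python transliteration of the patched Perl $macro: Perl's single-quoted
# '\\\\' is the regex \\ (one literal backslash); the rest carries over as-is.
MACRO = re.compile(r'\\(?:\)|((([^a-zA-Z)])|([a-zA-Z]+))\*?)\s*)')

# The token types listed above, each followed by whitespace.
SAMPLES = [
    r'\\  ',
    r'\)  ',
    r'\)*  ',
    r'\(  ',
    r'\hline  ',
    r'\section   {Title}',
    r'\section*  {Title}',
]


def leading_macro(text):
    """Return the macro token at the start of text (None if there is none)."""
    m = MACRO.match(text)
    return m.group(0) if m else None


def text_after_math(text):
    """Return whatever follows the last macro token in text."""
    end = 0
    for m in MACRO.finditer(text):
        end = m.end()
    return text[end:]


for s in SAMPLES:
    # '\)' stops before the whitespace; every other token absorbs it.
    print(repr(s), '->', repr(leading_macro(s)))

print(repr(text_after_math(r'\( a = b \) *')))  # ' *': the space survives
print(repr(text_after_math(r'\( c = d \)*')))   # '*': no space
```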

I've reopened the bug and attached the patch and test case there, but I 
still think that this should be applied to 1.3.

Your call,
Angus


$ cvs -q diff -r 1.2 lib/reLyX/Text/TeX.pm > relyx.diff
$ cat relyx.diff
Index: lib/reLyX/Text/TeX.pm
===================================================================
RCS file: /usr/local/lyx/cvsroot/lyx-devel/lib/reLyX/Text/TeX.pm,v
retrieving revision 1.2
diff -u -p -r1.2 TeX.pm
--- lib/reLyX/Text/TeX.pm       1 Feb 2003 15:09:10 -0000       1.2
+++ lib/reLyX/Text/TeX.pm       1 Feb 2003 18:01:09 -0000
@@ -110,7 +110,7 @@ $usualtokenclass = "[^$notusualtoks]";
 # Ie, one or more alphabetic chars followed by zero or 1 asterisks
 # Eg, \section or \section*
 # Putting all this together:
-$macro = '\\\\(?:\)|([^a-zA-Z)]\*?)|([a-zA-Z]+\*?)\s*)';
+$macro = '\\\\(?:\)|((([^a-zA-Z)])|([a-zA-Z]+))\*?)\s*)';

 # active is a backslashed macro or $$ (same as \[) or ^^ followed by a char
 #    (^^A means ASCII(1), e.g. See the TeXbook) or a special character like ~

? lib/reLyX/Text/tmp
Index: lib/reLyX/BasicLyX.pm
===================================================================
RCS file: /usr/local/lyx/cvsroot/lyx-devel/lib/reLyX/BasicLyX.pm,v
retrieving revision 1.5
diff -u -p -r1.5 BasicLyX.pm
--- lib/reLyX/BasicLyX.pm	7 Jan 2003 14:30:52 -0000	1.5
+++ lib/reLyX/BasicLyX.pm	1 Feb 2003 19:10:30 -0000
@@ -347,12 +347,12 @@ sub basic_lyx {
 	# If, e.g., there's just a comment in this token, don't do anything
 	# This actually shouldn't happen if CleanTeX has already removed them
 	last TYPESW if !defined $eaten->print;
-        
+
         # Handle LaTeX tokens
         if (/^Token$/o) {
 
 	    my $name = $eaten->token_name; # name of the token, e.g., "\large"
-	    print "'$name' " if $debug_on;
+	    print " '$name'" if $debug_on;
 
 	    # Tokens which turn into a bit of LyX text
 	    if (exists $TextTokenTransTable{$name}) {
Index: lib/reLyX/Text/TeX.pm
===================================================================
RCS file: /usr/local/lyx/cvsroot/lyx-devel/lib/reLyX/Text/TeX.pm,v
retrieving revision 1.3
diff -u -p -r1.3 TeX.pm
--- lib/reLyX/Text/TeX.pm	1 Feb 2003 17:46:18 -0000	1.3
+++ lib/reLyX/Text/TeX.pm	1 Feb 2003 19:10:32 -0000
@@ -96,13 +96,22 @@ $notusualtoks = "\\\\" . '\${}^_~&@%'; #
 $notusualtokenclass = "[$notusualtoks]";
 $usualtokenclass = "[^$notusualtoks]";
 
-# Original $macro wouldn't recognize, e.g., '\section*'. Added '\*?' - Ak
-# (Had to add it for \section and \\ separately.)
-#    \" or \frac, e.g. Note that it eats whitespace AFTER the token. This is
-# correct LaTeX behavior, but if text follows such a macro, and you just
-# print out the macro & then the text, they will run together.
-$macro = '\\\\(?:[^a-zA-Z]\*?|([a-zA-Z]+\*?)\s*)'; # Has one level of grouping
-#$macro = '\\\\(?:[^a-zA-Z]|([a-zA-Z]+)\s*)'; # Contains one level of grouping
+# The $macro RE matches LaTeX macros. Here's exactly what it does:
+# $macro = \\\\(?:RE)
+# This matches either '\\' or \RE where RE = RE1 or RE2
+# RE1 = '\)', so $macro will match the end of a math environment, '\)'
+# RE2 = (((RE3 or RE4)\*?)\s*) where
+# RE3 and RE4 can each be followed by zero or one asterisks. Either is still
+# a macro. Ditto, trailing whitespace is included in the token because that's
+# what LaTeX does.
+# RE3 = '([^a-zA-Z)])' matches a single non-alphabetic char. We already
+# test for '\)', so that is explicitly excluded from RE3 because '\)*' is not
+# a macro. Rather it is '\)' followed by an asterisk.
+# RE4 = '([a-zA-Z]+\*?)'
+# Ie, one or more alphabetic chars followed by zero or 1 asterisks
+# Eg, \section or \section*
+# Putting all this together:
+$macro = '\\\\(?:\)|((([^a-zA-Z)])|([a-zA-Z]+))\*?)\s*)';
 
 # active is a backslashed macro or $$ (same as \[) or ^^ followed by a char
 #    (^^A means ASCII(1), e.g. See the TeXbook) or a special character like ~
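The RE1 to RE4 breakdown in the comment above can be checked by assembling the regex from its parts. In this Python sketch the variables `RE1` to `RE4` are hypothetical names mirroring the comment's labels; the assertion confirms that the pieces reproduce the patched `$macro` (once Perl's single-quoted `'\\\\'` is read as the regex `\\`). It also shows which capture group holds the macro name without its trailing whitespace.

```python
import re

# Building blocks named after the RE1..RE4 labels in the patch comment.
RE1 = r'\)'              # '\)' ends inline math; no '*' or whitespace added
RE3 = r'([^a-zA-Z)])'    # a single non-letter, excluding ')'
RE4 = r'([a-zA-Z]+)'     # one or more letters, e.g. 'section'

# RE2: RE3 or RE4, then an optional '*', then any trailing whitespace.
# The extra parentheses make '\*?' and '\s*' apply to BOTH alternatives.
RE2 = r'((' + RE3 + '|' + RE4 + r')\*?)\s*'

# A literal backslash followed by RE1 or RE2.
MACRO_SRC = r'\\(?:' + RE1 + '|' + RE2 + ')'
MACRO = re.compile(MACRO_SRC)

# The patched Perl regex, after Perl's '\\\\' -> '\\'.
PATCHED = r'\\(?:\)|((([^a-zA-Z)])|([a-zA-Z]+))\*?)\s*)'
assert MACRO_SRC == PATCHED

# For the RE2 branch, group 1 holds the name plus any '*', minus whitespace.
m = MACRO.match(r'\section*  {Title}')
print(repr(m.group(0)), repr(m.group(1)))

# '\)*' is not a macro: only '\)' is matched and the '*' is left over.
print(repr(MACRO.match(r'\)*').group(0)))
```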

\documentclass{article}
\begin{document}
\begin{tabular}{|c|c|}
\hline 
\( a = b \) * & \( c = d \)*
\\  \hline \end{tabular}

\section  {Some Title}
blah
\section*{Some Other title}
blah blah blah
\end{document}
