On Sun, 1 Dec 2024 21:10:31 -0500
Maury Markowitz <maury.markow...@gmail.com> wrote:

> > DATA { yy_push_state(data); }
> 
> Is there a difference between yy_push_state(data) and BEGIN(data)?

Yes.  As stated in the manual, BEGIN simply changes the start
condition. yy_push_state and yy_pop_state use a stack, so you can
return to the previous state, no matter how you got there.  

In your case you could use BEGIN, because you have only two start
conditions to deal with.  Not much to get tangled up in there.  
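
To make the difference concrete, here is a rough model in plain C (not
flex itself) of the bookkeeping involved.  The names sc_model, sc_begin,
sc_push and sc_pop, and the third condition SC_REM, are made up for
illustration; they are not flex internals.  In a real scanner you also
need "%option stack" before yy_push_state and yy_pop_state are
available.

```c
/* Hypothetical model of start-condition bookkeeping; not flex's own code. */
enum { SC_INITIAL = 0, SC_DATA = 1, SC_REM = 2 };
#define SC_DEPTH 16

struct sc_model {
    int current;          /* the active start condition */
    int stack[SC_DEPTH];  /* conditions saved by sc_push */
    int top;              /* number of saved entries */
};

/* Like BEGIN(s): overwrite the current condition and remember nothing. */
void sc_begin(struct sc_model *m, int s)
{
    m->current = s;
}

/* Like yy_push_state(s): save the current condition, then switch to s.
   Returns -1 if this model's fixed-size stack is full. */
int sc_push(struct sc_model *m, int s)
{
    if (m->top == SC_DEPTH)
        return -1;
    m->stack[m->top++] = m->current;
    m->current = s;
    return 0;
}

/* Like yy_pop_state(): go back to whatever was active before the push.
   Returns -1 on an empty stack (an assumption of this model). */
int sc_pop(struct sc_model *m)
{
    if (m->top == 0)
        return -1;
    m->current = m->stack[--m->top];
    return 0;
}
```

After sc_push from any condition, sc_pop lands you back where you
started.  After sc_begin there is nothing to return to, so the rule
that leaves the condition has to name its destination explicitly.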

> <data>{
>  [[:alpha:]]/[^,\r\n] { /* ... */ return STRING; }
>  [\r]?[\n] { yy_pop_state(); }
> }

> There's some syntax here that's new to me. I believe [:alpha:] is only
> A-Za-z, yes? If so that is not correct, as 'one', 'one!' and 'one1' are
> all valid.

Yes.  I knew I was asking for trouble by offering syntax without a test
case.  The '/' in the regex introduces trailing context: what's to the
right must match but is excluded from yytext and, as the manual says,
"is then returned to the input".  
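
Here is a hand-written C sketch of what trailing context does, for a
hypothetical rule of the shape [[:alpha:]]+/[,\r\n] (letters, then a
delimiter that is checked but not consumed).  That is a variant I'm
assuming for illustration, not the rule quoted above, and the function
name is invented.

```c
#include <ctype.h>
#include <stddef.h>

/* Models a rule of the form  [[:alpha:]]+/[,\r\n]
   Returns the token length (what yyleng would be) when the input starts
   with one or more letters followed by ',', '\r' or '\n'.
   Returns 0 when the rule does not match.
   The delimiter is examined but never counted: it stays in the input. */
size_t match_alpha_before_delim(const char *input)
{
    size_t n = 0;

    while (isalpha((unsigned char)input[n]))
        n++;
    if (n == 0)
        return 0;               /* no leading letters at all */

    char next = input[n];       /* the trailing context */
    if (next == ',' || next == '\r' || next == '\n')
        return n;               /* token is input[0..n-1] only */
    return 0;                   /* letters ran into something else */
}
```

Given "one,two" this reports a token of length 3 and leaves the comma
for the next rule.  Given "one1," it reports no match, which is the
trouble with [:alpha:] that you pointed out.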

> <DATA>[^0-9][A-Za-z]*[,:\n] {
>             yytext[strlen(yytext) - 1] = '\0';
>             yylval.s = str_new(yytext + 1);
>             return STRING;

(Note that you're not cancelling the start condition on matching that
rule.  I'm guessing you'd want to.)

The problem with that is that it accepts too much.  As you noted, it
accepts the comma-delimiter as part of the string.  

But I'm not super keen on the negative pattern  ([^0-9]).  It's easy to
devise a negative pattern that encompasses as much or more than
the patterns you already have to match quoted strings and numbers.  It
also leaves 117 = 127 - 10 possible characters. Is a space valid? A
tab?  A vertical tab?  A non-ASCII character with the high bit set?  
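
To put a number on that, a few lines of C (function names invented for
illustration) that mimic a class like [^0-9] and count what it lets
through:

```c
/* Nonzero if byte value c would be accepted by a class like [^0-9]. */
int accepts_non_digit(int c)
{
    return c < '0' || c > '9';
}

/* Counts the byte values in [lo, hi] that the class accepts. */
int count_non_digits(int lo, int hi)
{
    int n = 0;

    for (int c = lo; c <= hi; c++)
        if (accepts_non_digit(c))
            n++;
    return n;
}
```

count_non_digits(1, 127) gives the 117 above.  Widen the range to
1..255 and it is 245, and space, tab and vertical tab are all accepted
either way.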

I would try to devise a positive regex of all characters that may
comprise a string in a DATA statement, and no other.  That might
require some real thinking, never easy.  For example, is

        10100 DATA "one",2B,"three"

valid?  If it is, you'll need one pattern for "leading alpha" that might
be just one character, and another for "leading digit" that requires
some nonnumeric data following.  
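
As a way to test your thinking before committing to flex patterns, here
is a small C sketch of that classification.  It ASSUMES an unquoted
string may contain only letters, digits and '!'; that set is a guess
based on your 'one!' and 'one1' examples, and deciding the real set is
the hard part.  All names are invented, and it only handles plain
unsigned integers as numbers.

```c
#include <ctype.h>

enum field_kind { FIELD_INVALID, FIELD_NUMBER, FIELD_STRING };

/* ASSUMPTION: letters, digits and '!' are the only string characters. */
int is_string_char(int c)
{
    return isalnum(c) || c == '!';
}

/* Classifies one unquoted DATA field (text between delimiters).
   - leading letter: a string, even if it is a single character
   - leading digit:  a string only if some non-digit follows,
                     otherwise a number
   - anything else:  invalid under the assumed character set */
enum field_kind classify_field(const char *s)
{
    int saw_nondigit = 0;

    if (!isalnum((unsigned char)s[0]))
        return FIELD_INVALID;           /* also rejects the empty field */

    for (const char *p = s; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!is_string_char(c))
            return FIELD_INVALID;
        if (!isdigit(c))
            saw_nondigit = 1;
    }
    return saw_nondigit ? FIELD_STRING : FIELD_NUMBER;
}
```

Under these assumptions 2B is a string and 2 is a number.  Every
character you add to is_string_char is a decision you have made on
purpose, which is the point of a positive pattern.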

--jkl
