On Sun, 1 Dec 2024 21:10:31 -0500 Maury Markowitz <maury.markow...@gmail.com> wrote:

> > DATA { yy_push_state(data); }
>
> Is there a difference between yy_push_state(data) and BEGIN(data)?

Yes. As stated in the manual, BEGIN simply changes the start
condition. yy_push_state and yy_pop_state use a stack, so you can
return to the previous state, no matter how you got there.

In your case you could use BEGIN, because you have only two start
conditions to deal with. Not much to get tangled up in there.

> <data>{
> [[:alpha:]]/[^,\r\n] { /* ... */ return STRING; }
> [\r]?[\n] { yy_pop_state(); }
> }
>
> There's some syntax here that's new to me. I believe [:alpha:] is
> only A-Za-z, yes? If so that is not correct, as 'one', 'one!' and
> 'one1' are all valid.

Yes. I knew I was asking for trouble by offering syntax without a
test case.

The '/' in the regex means that what's to the right must match but is
excluded from yytext and, as the manual says, "is then returned to the
input".

> <DATA>[^0-9][A-Za-z]*[,:\n] {
>   yytext[strlen(yytext) - 1] = '\0';
>   yylval.s = str_new(yytext + 1);
>   return STRING;

(Note that you're not cancelling the start condition on matching that
rule. I'm guessing you'd want to.)

The problem with that is that it accepts too much. As you noted, it
accepts the comma-delimiter as part of the string. But I'm not super
keen on the negative pattern ([^0-9]). It's easy to devise a negative
pattern that encompasses as much or more than the patterns you already
have to match quoted strings and numbers. It also leaves 117 = 127 - 10
possible characters. Is a space valid? A tab? A vertical tab? A
non-ASCII character with the high bit set?

I would try to devise a positive regex of all characters that may
comprise a string in a DATA statement, and no other. That might
require some real thinking, never easy. For example, is

	10100 DATA "one",2B,"three"

valid? If it is, you'll need one pattern for "leading alpha" that
might be just one character, and another for "leading digit" that
requires some nonnumeric data following.

--jkl
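
To illustrate the BEGIN versus push/pop point above, here is a toy
model in C. It is only a sketch of the semantics, not how flex
implements start conditions: struct scanner, sc_begin, sc_push, sc_pop
and the three condition names are all hypothetical, chosen for this
example.

```c
#include <assert.h>

/* Toy model only: every name here is hypothetical, not a flex internal. */
enum { INITIAL_SC = 0, DATA_SC = 1, QUOTED_SC = 2 };
enum { MAX_DEPTH = 16 };

struct scanner {
    int current;           /* the active start condition */
    int stack[MAX_DEPTH];  /* conditions saved by sc_push */
    int depth;             /* number of saved conditions */
};

/* Models BEGIN: overwrite the current condition; the old one is forgotten. */
void sc_begin(struct scanner *s, int sc)
{
    s->current = sc;
}

/* Models yy_push_state: remember the current condition, then switch. */
void sc_push(struct scanner *s, int sc)
{
    assert(s->depth < MAX_DEPTH);
    s->stack[s->depth++] = s->current;
    s->current = sc;
}

/* Models yy_pop_state: go back to whichever condition was saved last. */
void sc_pop(struct scanner *s)
{
    assert(s->depth > 0);
    s->current = s->stack[--s->depth];
}
```

With only two conditions, sc_begin(&s, DATA_SC) on entry and
sc_begin(&s, INITIAL_SC) at end of line does the job. The stack starts
to pay off once a third condition can be entered from more than one
place, because sc_pop returns to the caller's condition without the
rule having to know which one that was.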