> On Dec 1, 2024, at 2:07 PM, James K. Lowden <jklow...@schemamania.org> wrote:
>
> require some real thinking, never easy.  For example, is
>
>     10100 DATA "one",2B,"three"
>
> valid?  If it is, you'll need one pattern for "leading alpha" that might
> be just one character, and another for "leading digit" that requires

On further reflection, it seems much easier to parse everything in the data section as a string, and then turn it into a string or number at runtime based on the variable that's reading it. That's how INPUT works, so I can re-use that code.

So I made my DATA_STATEMENT state exclusive, and added these rules:

    <DATA_STATEMENT>{
      \"[^"\n]*[\"\n] {
                  yytext[strlen(yytext) - 1] = '\0';
                  yylval.s = str_new(yytext + 1);
                  return STRING;
                }
      [^,:\n]*  {
                  yylval.s = str_new(yytext);
                  return STRING;
                }
      [,]       { return ','; }
      [:]       { BEGIN(INITIAL); return ':'; }
      [\n]      { BEGIN(INITIAL); return '\n'; }
    }

This *almost* works! The problem is leading whitespace. For instance:

    10 DATA "one",two

In this case the second rule matches starting at the leading whitespace, so its match is longer than the quoted-string rule's and it wins.

I played around with a couple of emulators to check the behaviour, and leading and trailing spaces are removed, so:

    10 DATA one , two

produces 'one' and 'two'.

So, what's the easiest way to strip the whitespace on either side of an item, so that it is NOT returned as part of the resulting string? Is this something I should do in my pattern, or am I better off doing it in C code?
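
For the C-code option, here is a minimal sketch of what I have in mind. The helper name trim_in_place is made up for illustration and is not part of flex; it assumes the action is allowed to overwrite yytext in place (the quoted-string rule above already does that) and that str_new copies its argument.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not a flex API): strips whitespace from both ends
 * of s in place. Returns a pointer into s at the first non-blank
 * character; a terminating '\0' is written after the last non-blank one.
 */
static char *trim_in_place(char *s)
{
    char *end;

    assert(s != NULL);

    /* Skip leading whitespace. */
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;

    /* Walk back from the end over trailing whitespace. */
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';

    return s;
}
```

The unquoted rule's action would then become something like `yylval.s = str_new(trim_in_place(yytext));`. That alone does not fix the `DATA "one"` case, because the unquoted rule still swallows the blank before the quote. One way around that, untested, would be a separate `[ \t]+` rule with an empty action plus an unquoted pattern that cannot start with a blank or a quote, e.g. `[^ \t,:\n"][^,:\n]*`; the trailing blanks would still be trimmed in C.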