> On Dec 1, 2024, at 2:07 PM, James K. Lowden <jklow...@schemamania.org> wrote:
> 
> require some real thinking, never easy.  For example, is
> 
>       10100 DATA "one",2B,"three"
> 
> valid?  If it is, you'll need one pattern for "leading alpha" that might
> be just one character, and another for "leading digit" that requires

On further reflection, it seems much easier to lex every item in a DATA 
statement as a string and then convert it to a string or a number at runtime, 
based on the type of the variable that READs it. That's how INPUT works, so I 
can reuse that code.
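A minimal sketch of that runtime conversion step, assuming each DATA item reaches the interpreter as a plain NUL-terminated C string. data_item_to_number is a hypothetical helper, not part of the code in this thread. It accepts the item only if strtod consumes the whole string (trailing blanks allowed), so an item like 2B is rejected when READ into a numeric variable. Note that strtod also accepts forms such as "inf", "nan" and hex floats, which a BASIC interpreter may want to reject separately.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: convert a DATA item (already lexed as a string)
   into a number when a READ targets a numeric variable.
   Returns true on success and stores the value in *out. */
bool data_item_to_number(const char *item, double *out)
{
    char *end;

    errno = 0;
    double value = strtod(item, &end);   /* strtod skips leading whitespace */
    if (end == item || errno == ERANGE)
        return false;                    /* no digits, or out of range */

    while (isspace((unsigned char)*end)) /* allow trailing blanks */
        end++;
    if (*end != '\0')
        return false;                    /* junk such as the "B" in "2B" */

    *out = value;
    return true;
}
```

A string variable would simply take the item as-is; only numeric targets go through a check like this.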

So I made my DATA_STATEMENT state exclusive, and added these rules:

<DATA_STATEMENT>{
\"[^"\n]*[\"\n] {
             /* Quoted item: drop the closing quote (or newline),
                then skip the opening quote. */
             yytext[yyleng - 1] = '\0';
             yylval.s = str_new(yytext + 1);
             return STRING;
           }

[^,:\n]* {
            /* Unquoted item: everything up to the next separator. */
            yylval.s = str_new(yytext);
            return STRING;
          }

[,] { return ','; }
[:] { BEGIN(INITIAL); return ':'; }
[\n] { BEGIN(INITIAL); return '\n'; }
}

This *almost* works!

The problem is leading whitespace. For instance:

10 DATA "one",two

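In this case the input after DATA starts with a space, so the quoted-string 
rule can't match at that point; the second rule matches instead, taking the 
leading space and the quoted string together.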

I played around with a couple of emulators to check the behaviour, and leading 
and trailing spaces are removed, so:

10 DATA one     ,     two

produces 'one' and 'two'.
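If the trimming is done in C rather than in the pattern, a small helper applied to yytext before calling str_new would mimic the emulator behaviour described above. This is only a sketch: trim_blanks is a hypothetical name, it assumes only spaces and tabs need stripping (the rules already stop at the newline), and it assumes str_new copies its argument. Trimming in place is allowed because flex lets actions modify yytext as long as they don't lengthen it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing spaces/tabs from a
   NUL-terminated buffer in place. Returns a pointer to the first
   non-blank character; trailing blanks are cut off with a NUL. */
char *trim_blanks(char *s)
{
    while (*s == ' ' || *s == '\t')   /* skip leading blanks */
        s++;

    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'))
        len--;                        /* back up over trailing blanks */
    s[len] = '\0';

    return s;
}
```

The unquoted-item action would then read yylval.s = str_new(trim_blanks(yytext));. On its own this does not fix 10 DATA "one",two: the unquoted rule would still match the space plus "one", and trimming leaves the quotes in place.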

So, what's the easiest way to strip the whitespace on either side without 
returning it as part of the resulting string? Should I do this in my pattern, 
or am I better off doing it in C code?
