I seem to have programmed myself into a corner, and I'm hoping someone can 
offer some suggestions.

Source code here: https://github.com/maurymarkowitz/RetroBASIC/tree/master/src

The BASIC language has two constant types, numbers and strings. The ~300 line 
scanner is basically a list of the keywords and then:

[0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
            yylval.d = strtod(yytext, NULL);
            return NUMBER;
          }

\"[^"\n]*[\"\n] {
            yytext[strlen(yytext) - 1] = '\0';
            yylval.s = str_new(yytext + 1);
            return STRING;
          }

In my ~1700 line parser, an expression is either of these constants combined 
with functions, operators, etc. The language also has list-like constructs 
such as:

PRINT A,B,C

I implemented this as an exprlist:

exprlist:
  expression
  {
    $$ = lst_prepend(NULL, $1);
  }
  |
  exprlist ',' expression
  {
    $$ = lst_append($1, $3);
  }
  ;

The problem arises in the DATA statement, which is normally something along the 
lines of:

DATA 10,20,"HELLO","WORLD!"

I parse this as the statement token and then an exprlist:

  DATA exprlist
  {
    statement_t *new = make_statement(DATA);
    new->parms.data = $2;
    $$ = new;
  }


I can then read the values at runtime by walking down the list. But many 
dialects allow strings to be unquoted as long as they do not contain a line 
end, colon or comma:

DATA 10,20,HELLO,WORLD!
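
To pin down the rule independently of flex, here is a minimal sketch in plain C 
of reading one DATA item at a time. next_data_item is a hypothetical helper, 
not something in RetroBASIC, and dropping the blanks around an unquoted item 
is an assumption, since dialects differ on that point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An item ends at a comma, a colon, a line end, or the end of the buffer. */
int ends_item(char c)
{
    return c == '\0' || strchr(",:\n", c) != NULL;
}

/* Hypothetical helper, not part of RetroBASIC. Copies the next DATA item
   from src into out (at most outsz - 1 characters) and returns a pointer
   to the delimiter that ended it. The caller steps over a ',' to continue
   and stops on ':' or a line end.
   Assumption: blanks around an unquoted item are not part of the item. */
const char *next_data_item(const char *src, char *out, size_t outsz)
{
    size_t n = 0;

    assert(src != NULL && out != NULL && outsz > 0);

    while (*src == ' ' || *src == '\t')          /* skip leading blanks */
        src++;

    if (*src == '"') {
        /* Quoted item: commas and colons inside the quotes are data. */
        src++;
        while (*src != '\0' && *src != '"' && *src != '\n') {
            if (n + 1 < outsz)
                out[n++] = *src;
            src++;
        }
        if (*src == '"')
            src++;                               /* consume closing quote */
        while (!ends_item(*src))                 /* ignore text after it */
            src++;
    } else {
        /* Unquoted item: everything up to the next delimiter. */
        while (!ends_item(*src)) {
            if (n + 1 < outsz)
                out[n++] = *src;
            src++;
        }
        while (n > 0 && (out[n - 1] == ' ' || out[n - 1] == '\t'))
            n--;                                 /* trim trailing blanks */
    }

    out[n] = '\0';
    return src;
}
```

The same logic could live in a scanner rule that is only active after the DATA 
keyword (flex start conditions, declared with %x and switched with BEGIN, are 
one mechanism for that), or in a hand-written loop over the rest of the line.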

I am looking for ways to attack this. I tried this in my scanner:

[\,\:\n].*[\,\:\n] {
           yytext[strlen(yytext) - 1] = '\0';
           yylval.s = str_new(yytext + 1);
           return STRING;
         }

... but that captures too much and I get errors on every line. Many variations 
on this theme either cause errors everywhere or fail to parse the DATA line.

I am not sure where I should fix this: in the scanner with a pattern for an 
"unquoted string", in the parser as a "datalist", or with a mix of both.
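
Whichever side does the splitting, one option is to keep each unquoted item as 
raw text and decide whether it is a number only when the value is read. A 
minimal sketch, assuming a hypothetical helper data_item_as_number (not in 
RetroBASIC) built on the same strtod call the scanner already uses. Because 
strtod also accepts forms such as INF, NAN and hex, the sketch first restricts 
the item to the characters the NUMBER pattern allows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of RetroBASIC. Returns 1 and stores the
   value in *out when the whole item is a number, 0 when it should be
   treated as a string. Assumes the item has already been trimmed. */
int data_item_as_number(const char *item, double *out)
{
    char *end;
    double d;

    assert(item != NULL && out != NULL);

    /* Only digits, '.', a sign, and an exponent marker may appear;
       this rejects "INF", "NAN" and "0x10", which strtod would accept. */
    if (strspn(item, "0123456789.+-eE") != strlen(item))
        return 0;

    d = strtod(item, &end);
    if (end == item || *end != '\0')   /* nothing parsed, or leftovers */
        return 0;

    *out = d;
    return 1;
}
```

With this, an item such as HELLO or 10A stays a string, while 10 and 1.5E2 
become numbers.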

Can someone offer some ways to attack this?
