I seem to have programmed myself into a corner, and I'm hoping someone can offer some suggestions.
Source code here: https://github.com/maurymarkowitz/RetroBASIC/tree/master/src

The BASIC language has two constant types, numbers and strings. The ~300 line scanner is basically a list of the keywords and then:

    [0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
        yylval.d = strtod(yytext, NULL);
        return NUMBER;
    }

    \"[^"^\n]*[\"\n] {
        yytext[strlen(yytext) - 1] = '\0';
        yylval.s = str_new(yytext + 1);
        return STRING;
    }

Over in my ~1700 line parser, I have the concept of an expression, which is either of these constants along with functions, operators etc. The language also has list-like operators like:

    PRINT A,B,C

Which I implemented as an exprlist:

    exprlist:
        expression { $$ = lst_prepend(NULL, $1); }
        |
        exprlist ',' expression { $$ = lst_append($1, $3); }
        ;

The problem arises in the DATA statement, which is normally something along the lines of:

    DATA 10,20,"HELLO","WORLD!"

I parse this as the statement token and then an exprlist:

    DATA exprlist
    {
        statement_t *new = make_statement(DATA);
        new->parms.data = $2;
        $$ = new;
    }

I can then read the values at runtime by walking down the list. But many dialects allow strings to be unquoted as long as they do not contain a line end, colon or comma:

    DATA 10,20,HELLO,WORLD!

I am looking for ways to attack this. I tried this in my scanner:

    [\,\:\n].*[\,\:\n] {
        yytext[strlen(yytext) - 1] = '\0';
        yylval.s = str_new(yytext + 1);
        return STRING;
    }

... but that captures too much and I get errors on every line. Many variations on this theme either cause errors everywhere or fail to parse the DATA line.

I am not sure where I should attempt to fix this, in the scanner with a pattern for "unquoted string", or the parser as a "datalist", or a mix of both? Can someone offer some ways to attack this?
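To pin down the behaviour I am after, here is a minimal standalone sketch in plain C of how the text after the DATA keyword would be split. This is not code from RetroBASIC: `next_data_item` is a hypothetical helper, and trimming blanks around unquoted items and dropping a trailing empty item are my assumptions, since dialects differ on both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of RetroBASIC.
   Copies the next DATA item starting at *cursor into out (capacity cap,
   silently truncated if too long) and advances *cursor past the item and
   its trailing comma. Returns 1 if an item was read, 0 when the statement
   ends (NUL, newline or colon), leaving *cursor on that terminator. */
int next_data_item(const char **cursor, char *out, size_t cap)
{
    const char *p;
    size_t n = 0;

    assert(cursor != NULL && *cursor != NULL && out != NULL && cap > 0);
    p = *cursor;

    /* Assumption: blanks before an item are not significant. */
    while (*p == ' ' || *p == '\t')
        p++;

    if (*p == '\0' || *p == '\n' || *p == ':') {
        *cursor = p;
        return 0;
    }

    if (*p == '"') {
        /* Quoted item: commas and colons inside the quotes are literal. */
        p++;
        while (*p != '\0' && *p != '\n' && *p != '"') {
            if (n + 1 < cap)
                out[n++] = *p;
            p++;
        }
        if (*p == '"')
            p++;
        /* Ignore anything between the closing quote and the separator. */
        while (*p != '\0' && *p != '\n' && *p != ':' && *p != ',')
            p++;
    } else {
        /* Unquoted item: runs up to the next comma, colon or line end. */
        while (*p != '\0' && *p != '\n' && *p != ':' && *p != ',') {
            if (n + 1 < cap)
                out[n++] = *p;
            p++;
        }
        /* Assumption: trailing blanks are trimmed as well. */
        while (n > 0 && (out[n - 1] == ' ' || out[n - 1] == '\t'))
            n--;
    }

    out[n] = '\0';
    if (*p == ',')
        p++;
    *cursor = p;
    return 1;
}
```

Calling this repeatedly on ` 10,20,HELLO,"A,B",WORLD!:PRINT X` should yield `10`, `20`, `HELLO`, `A,B` and `WORLD!`, and then stop at the colon. The point is that this splitting only makes sense inside a DATA statement, which is why a single global scanner pattern keeps matching too much.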