Re: Unquoted strings in BASIC

EML Sun, 01 Dec 2024 09:31:44 -0800

Basic... wow. Start by fixing your regexes:


[0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
             yylval.d = strtod(yytext, NULL);
             return NUMBER;
           }

This matches a single '.', '.E0', and so on. Presumably you want something which looks more like:


dec_digit [0-9]
suffix    ...whatever
numA      {dec_digit}+\.{dec_digit}+{suffix}?
numB      {dec_digit}+\.{suffix}
numC      \.{dec_digit}+{suffix}?
numD      {dec_digit}+{suffix}?
number    {numA}|{numB}|{numC}|{numD}

\"[^"^\n]*[\"\n] {
             yytext[strlen(yytext) - 1] = '\0';
             yylval.s = str_new(yytext + 1);
             return STRING;
           }

What does a string actually look like? And why have you got 'yytext+1'? If you're trying to get rid of the leading quote, you also need a code block to get rid of the closing quote.
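A sketch of an action helper that removes both quotes in one place, assuming the rule only fires on a complete "..." token. 'unquote' is a hypothetical name, not part of flex. It takes the length from flex's yyleng instead of calling strlen, and it leaves yytext untouched. What str_new does isn't shown, so this simply returns a malloc'd copy that the caller must free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the text of a quoted token such as "HELLO"
   (including its quotes), return a malloc'd copy of what lies between
   the quotes. Returns NULL if the token is shorter than two characters,
   is not delimited by '"' on both ends, or if allocation fails. */
char *unquote(const char *text, size_t len) {
    if (len < 2 || text[0] != '"' || text[len - 1] != '"')
        return NULL;
    size_t body = len - 2;            /* characters between the quotes */
    char *out = malloc(body + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, text + 1, body);      /* skip the opening quote */
    out[body] = '\0';                 /* the closing quote is not copied */
    return out;
}
```

In a rule action this might be used as 'char *body = unquote(yytext, yyleng);'.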

Your pattern matches, among other things, a string which starts with a double quote and is terminated by a newline, with no closing quote. Not even Basic can be that bad. The bit with 'zero or more chars which aren't a newline' is also redundant.

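And note that you only need one caret (^), which must be at the start of the character class ([]). Anywhere else in the class it is an ordinary character, so your second caret is a literal caret, and the class also excludes '^' from your strings.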


I am looking for ways to attack this. I tried this in my scanner:

[\,\:\n].*[\,\:\n] {
            yytext[strlen(yytext) - 1] = '\0';
            yylval.s = str_new(yytext + 1);
            return STRING;
          }

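This matches all sorts of stuff which isn't a string. Since '.*' is greedy and flex prefers the longest match, it will usually swallow everything from the opening comma, colon or newline up to and including the next newline. The basic unquoted string is presumably any alphanumeric sequence, starting with a letter. The comma isn't really relevant, since it's not alphanumeric. Maybe something like: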


quoted_string    \"[^"\n]*\"
unquoted_string  [a-zA-Z][a-zA-Z0-9]*
string {quoted_string}|{unquoted_string}

...but this could interfere with variable names, and so on, which will need more work. This will probably require you to take into account the current context; see Hans's reply.
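A minimal hand-written sketch of the context idea. It assumes, as in many BASICs, that unquoted strings only appear as items of a DATA statement that runs to the next ':' or newline. The in_data flag plays the role that an exclusive start condition (%x plus BEGIN) would play in a flex scanner. The token kinds, struct scanner and next_token are hypothetical names. Numbers are integers only, and DATA is the only keyword recognised.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token kinds for this sketch only. */
enum kind { T_NUMBER, T_STRING, T_IDENT, T_KEYWORD, T_PUNCT, T_ERROR, T_END };

struct token {
    enum kind kind;
    const char *start;   /* points into the source line, not NUL-terminated */
    size_t len;
};

struct scanner {
    const char *p;       /* current position in the source line */
    int in_data;         /* 1 inside a DATA statement (the "context") */
};

static struct token next_token(struct scanner *s) {
    struct token t = { T_END, NULL, 0 };
    while (*s->p == ' ' || *s->p == '\t')
        s->p++;
    t.start = s->p;
    if (*s->p == '\0')
        return t;
    if (*s->p == ':' || *s->p == '\n')
        s->in_data = 0;                  /* end of statement: leave DATA mode */
    if (*s->p == ':' || *s->p == '\n' || *s->p == ',') {
        t.kind = T_PUNCT;
        t.len = 1;
    } else if (*s->p == '"') {
        /* quoted_string: up to the closing quote on this line */
        size_t n = strcspn(s->p + 1, "\"\n");
        if (s->p[1 + n] == '"') {
            t.kind = T_STRING;
            t.len = n + 2;               /* includes both quotes */
        } else {
            t.kind = T_ERROR;            /* unterminated string */
            t.len = n + 1;
        }
    } else if (isdigit((unsigned char)*s->p)) {
        while (isdigit((unsigned char)s->p[t.len]))
            t.len++;
        t.kind = T_NUMBER;
    } else if (isalpha((unsigned char)*s->p)) {
        /* [a-zA-Z][a-zA-Z0-9]*: a string inside DATA, otherwise a name */
        while (isalnum((unsigned char)s->p[t.len]))
            t.len++;
        if (s->in_data) {
            t.kind = T_STRING;           /* unquoted string */
        } else if (t.len == 4 && strncmp(t.start, "DATA", 4) == 0) {
            t.kind = T_KEYWORD;
            s->in_data = 1;              /* enter DATA mode */
        } else {
            t.kind = T_IDENT;
        }
    } else {
        t.kind = T_ERROR;
        t.len = 1;
    }
    s->p += t.len;
    return t;
}
```

On '10 DATA HELLO, "A, B", 42 : PRINT HELLO' the first HELLO comes back as a string and the second as an identifier, because the ':' has ended the DATA context in between.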
