> > Let me jump in for half a second here (no pun intended), but what
> > about the use of back quotes? ` `? Use a very limited escaping
> > policy of \` => ` and \\ => \ .
>
> Actually, having to double backslashes is one of the things I want
> to get rid of. The here-document-based ideas seem to allow that.
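To make the quoted escaping policy concrete, here is a minimal C sketch of a decoder for a backquoted literal in which \` means ` and \\ means \. The function name decode_backquoted is hypothetical, and copying any other backslash through unchanged is an assumption; the thread does not say what happens to other backslash sequences.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical decoder for the quoted proposal: inside `...` the only
 * escapes are \` -> ` and \\ -> \. Assumption (not stated in the
 * thread): any other backslash is copied through unchanged.
 * `in` must start at the opening backquote. Returns the number of
 * input bytes consumed, including both backquotes, or -1 if the
 * literal is malformed, unterminated, or `out` may be too small. */
static long decode_backquoted(const char *in, char *out, size_t out_size)
{
    const char *p = in;

    assert(in != NULL && out != NULL);
    /* The decoded text plus its '\0' never exceeds strlen(in) bytes. */
    if (*p != '`' || out_size < strlen(in))
        return -1;
    p++;
    while (*p != '\0') {
        if (*p == '\\' && (p[1] == '`' || p[1] == '\\')) {
            *out++ = p[1];               /* \` or \\ becomes one char */
            p += 2;
        } else if (*p == '`') {
            *out = '\0';                 /* closing backquote */
            return (long)(p - in) + 1;
        } else {
            *out++ = *p++;               /* ordinary character */
        }
    }
    return -1;                           /* hit end of input first */
}
```

The cost the reply objects to shows up directly: under this policy a path such as C:\temp\new has to be written as `C:\\temp\\new`.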
Hrm, that would be nice to get rid of, as \ is a highly overloaded, overused character. As someone who is presently in the throes of writing a new language, might I suggest using a token that isn't anchored to a newline, rather than a more dynamic one? Using $$[.*]\n as a lexical token is quasi-problematic because the anchor is the newline, something SQL has been free of for as long as I'm aware. With a static lexical token, such as @@, newlines aren't significant, which reduces the number of accidental syntax errors programmers make. While I abhor the "let's put a magic token in this context to handle this quirk" grammar design methodology that Perl has brought, I do think that simply doubling up a nearly unused operator would be sufficient, concise, and easy. For example:

  !!  Invalid: !! is a valid expression, though a NOOP.
  @@  Valid candidate: @@ is an invalid expression.
  ##  Valid candidate, but a common comment syntax; avoid using it.
  $$  Valid candidate, but again a common syntax in shell-like languages.
  %%  Valid candidate: %% is an invalid expression.
  ^^  Invalid candidate: ^^ is a valid expression.
  &&  Invalid: && is a valid token.
  **  Valid candidate, but ** is used as a power operator in Ruby.

Of the above, I'd think @@, %%, or $$ would be the best choices.

If a dynamic token is desired, use a token that is terminally anchored with something other than a newline, so that newlines stay insignificant in PostgreSQL's SQL. If the desire for something here-document-like is strong enough... well, how about the following flex patterns:

  @(@[^\n]+\n|[^@]*@)
  %(%[^\n]+\n|[^%]*%)
  $($[^\n]+\n|[^$]*$)

If the developer knows his/her string and opts to use an empty string to name the token, so be it: @@ would be the beginning and terminating token for a literal string block. If the developer is writing something in pl/autoconf (which doesn't exist!!!
Just an example of where @@ is used), then @autoconf me harder@ could be used as the start and end token, which should provide enough bits to make it unlikely that the delimiter appears in the enclosed data. If a newline is desired, it would be valid in the above:

@
@ Inside the block @
@

that is, @\n@ Inside the block @\n@, and the resulting string would be " Inside the block ".

%{
/* Headers/definitions/prototypes */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static bool initialized = false;
static char *lit_name;    /* delimiter name, without the outer chars */
static char *lit_val;     /* accumulated body of the literal */
%}

lit_quote_pattern  @(@[^\n]+\n|[^@]*@)

%x LIT_QUOTE
%x SQL

%%

%{
/* Init bits */
if (!initialized) {
  BEGIN(SQL);
  initialized = true;
}
%}

<SQL>{lit_quote_pattern} {
  /* Drop the leading/trailing delimiter chars (-2), add '\0' (+1) = -1 */
  lit_name = malloc(yyleng - 1);
  strncpy(lit_name, &yytext[1], yyleng - 2);
  lit_name[yyleng - 2] = '\0';
  lit_val = NULL;
  BEGIN(LIT_QUOTE);
}

<LIT_QUOTE>{lit_quote_pattern} {
  if ((size_t)(yyleng - 2) == strlen(lit_name) &&
      strncmp(lit_name, &yytext[1], yyleng - 2) == 0) {
    /* Found the terminator: hand lit_val to yyparse() as a string.
       The yylval member (???) depends on the grammar's %union. */
    yylval.??? = lit_val;
    free(lit_name);
    lit_name = lit_val = NULL;
    BEGIN(SQL);
    return(tSTRING);
  } else {
    /* Not our terminator, so it is part of the literal's body. */
    pg_append_str_to_buf(lit_val, yytext, yyleng);
  }
}

<LIT_QUOTE>.|\n {
  /* Not sure of these func names off the top of my head: */
  pg_append_str_to_buf(lit_val, yytext, yyleng);
}

%%
/* Or something similarly flexible */

-sc

-- 
Sean Chittenden
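The delimiter-matching rule can also be sketched in plain C outside of flex. This minimal version is a simplification, not the lexer above: it handles only the @name@ form (not the @@name-plus-newline alternative), and it finds the closing delimiter with strstr instead of re-tokenizing the body as the flex rules do. The function name scan_tagged_literal is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: recognise an @name@ ... @name@ literal at the
 * start of `in`, where name is any run of non-'@' characters, possibly
 * empty ("@@") or a newline. On success returns a malloc'd copy of the
 * body and stores the total bytes consumed in *consumed; returns NULL
 * if `in` does not start with a delimiter or no closing one exists. */
static char *scan_tagged_literal(const char *in, size_t *consumed)
{
    const char *name_end, *body, *close;
    size_t delim_len, body_len;
    char *delim, *out;

    assert(in != NULL && consumed != NULL);
    if (in[0] != '@')
        return NULL;
    name_end = strchr(in + 1, '@');      /* end of the opening @name@ */
    if (name_end == NULL)
        return NULL;
    delim_len = (size_t)(name_end - in) + 1;

    /* Copy the opening delimiter so strstr can look for its twin. */
    delim = malloc(delim_len + 1);
    if (delim == NULL)
        return NULL;
    memcpy(delim, in, delim_len);
    delim[delim_len] = '\0';

    body = in + delim_len;
    close = strstr(body, delim);         /* first matching terminator */
    free(delim);
    if (close == NULL)
        return NULL;                     /* unterminated literal */

    body_len = (size_t)(close - body);
    out = malloc(body_len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, body, body_len);
    out[body_len] = '\0';
    *consumed = delim_len + body_len + delim_len;
    return out;
}
```

With this rule, the newline-named block @\n@ Inside the block @\n@ yields " Inside the block ", and a body delimited by @autoconf me harder@ can contain @@ or @x@ freely, since only the exact opening delimiter ends it.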