On Wed, Jun 20, 2012 at 1:26 AM, Gregory Woodhouse <gregwoodho...@me.com> wrote:
> I want to write a rule that will recognize strings in a language (MUMPS) that
> doubles double quotes as a means of escaping them. For example "The double
> quote symbol is \"." would be "The double quote symbol is ""." and "\"" would
> be """". That seems simple enough, except that I need to write a regular
> expression that matches any printing character (including #\space and #\tab)
> except, of course, #\". There is the complement operator, but that gives me
> any character but #\", not quite what I want. With a set difference, I
> suppose I could do something like
>
> DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE
>
> but again, I'm not quite sure how to express this in the lexer.

Perhaps we can use the character set complement operator.  Let's see...

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket
(require parser-tools/lex)

;; A string token: an opening quote, then any number of items that are
;; either a single non-quote character or a doubled quote, then a
;; closing quote.
(define my-lexer
  (lexer
   [(concatenation "\""
                   (repetition 0 +inf.0
                               (union (char-complement #\")
                                      "\"\""))
                   "\"")
    lexeme]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Would this work?  Here's how it behaves on a few examples:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (my-lexer (open-input-string "\"hello world\""))
"\"hello world\""
> (my-lexer (open-input-string "\"hello \"\"world\""))
"\"hello \"\"world\""
> (my-lexer (open-input-string "\"hello \"world\""))
"\"hello \""
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

In the last example the quote before world is not doubled, so the lexer
takes it as the closing quote and returns only "hello " as the lexeme.

____________________
  Racket Users list:
  http://lists.racket-lang.org/users
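A follow-up note on the pattern above: (char-complement #\") accepts every character other than the double quote, including newlines and control characters, whereas the original question asked for printing characters only. One way to narrow it, as a rough sketch, is to intersect the complement with the `graphic` and `blank` character classes that parser-tools/lex predefines. The names `printing-non-quote` and `mumps-string-lexer` are made up for illustration, and treating "printing" as graphic-or-blank is my assumption, not something the original post specifies.

```racket
#lang racket
(require parser-tools/lex)

;; Hypothetical abbreviation (not from the original post): any graphic or
;; blank character (blank covers space and tab) other than the double quote.
;; Assumption: "printing character" means graphic-or-blank.
(define-lex-abbrev printing-non-quote
  (intersection (union graphic blank)
                (char-complement #\")))

;; Same shape as my-lexer, with the narrower character class swapped in.
(define mumps-string-lexer
  (lexer
   [(concatenation "\""
                   (repetition 0 +inf.0
                               (union printing-non-quote "\"\""))
                   "\"")
    lexeme]))

;; Example call on a string containing doubled quotes.
(mumps-string-lexer (open-input-string "\"say \"\"hi\"\" now\""))
```

With this variant a string literal that contains a newline or another control character should no longer be accepted as a single string token, which may or may not be what MUMPS wants; the char-complement version is the simpler choice if that distinction does not matter.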