On Wed, Jun 20, 2012 at 1:26 AM, Gregory Woodhouse <gregwoodho...@me.com> wrote:
> I want to write a rule that will recognize strings in a language (MUMPS) that 
> doubles double quotes as a means of escaping them. For example "The double 
> quote symbol is \"." would be "The double quote symbol is ""." and "\"" would 
> be """". That seems simple enough, except that I need to write a regular
> expression that matches any printing character (including #\space and #\tab)
> except, of course, #\". There is the complement operator, but that gives me
> any character but #\", not quite what I want.  With a set difference, I 
> suppose I could do something like
>
> DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE
>
> but again, I'm not quite sure how to express this in the lexer.


Perhaps we can use the character set complement operator.  Let's see...

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket

(require parser-tools/lex)

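(define my-lexer
  (lexer [(concatenation
           "\""                                     ; opening quote
           (repetition 0 +inf.0                     ; zero or more of:
                       (union (char-complement #\") ;   any char but a double quote,
                              "\"\""))              ;   or a doubled "" escape
           "\"")                                    ; closing quote
          lexeme]))                                 ; return the matched text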
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


Would this work?  Here's how it behaves on a few examples:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> (my-lexer (open-input-string "\"hello world\""))
"\"hello world\""
> (my-lexer (open-input-string "\"hello \"\"world\""))
"\"hello \"\"world\""
> (my-lexer (open-input-string "\"hello \"world\""))
"\"hello \""
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
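One caveat: (char-complement #\") accepts every character except the double quote, including newlines and other control characters, while your grammar asked for printing characters only. Here is a sketch that follows your DQUOTE (DQUOTE DQUOTE | printing - DQUOTE)* DQUOTE more literally, using parser-tools/lex's intersection as the set difference. It assumes "printing" can be approximated by the lexer's graphic abbreviation plus blank (space and tab). The names string-char, mumps-string, printing-lexer and mumps-string-value are made up for this example, and mumps-string-value is an extra helper (not something the lexer library provides) that recovers the string's value by collapsing the doubled quotes.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
```racket
#lang racket
;; Sketch only: approximates "printing" as graphic + blank (an assumption).
;; All names defined here are hypothetical, chosen for this example.
(require parser-tools/lex)

;; printing - DQUOTE: graphic characters plus space/tab, minus #\"
(define-lex-abbrev string-char
  (intersection (union graphic blank)
                (char-complement #\")))

;; DQUOTE (DQUOTE DQUOTE | string-char)* DQUOTE
(define-lex-abbrev mumps-string
  (concatenation #\"
                 (repetition 0 +inf.0 (union string-char "\"\""))
                 #\"))

;; Returns the matched lexeme, outer quotes and "" escapes included.
(define printing-lexer
  (lexer [mumps-string lexeme]))

;; Hypothetical helper: drop the outer quotes, then turn each
;; doubled "" escape back into a single double quote.
(define (mumps-string-value lexeme)
  (string-replace (substring lexeme 1 (sub1 (string-length lexeme)))
                  "\"\"" "\""))
```
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

With this version a string that spans a newline no longer matches mumps-string, and (mumps-string-value "\"\"\"\"") gives back "\"", the one-character string from your """" example.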

____________________
  Racket Users list:
  http://lists.racket-lang.org/users
