I have functions below that pass your tests, but a lot depends on what you count as whitespace (not to mention newlines, since one of your comments suggests that you want to match all Unicode line-terminators). The functions below assume that "newline" means #\u000A and "whitespace" means space, tab, newline, formfeed, return.
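
As a quick check of that whitespace assumption, here is a small sketch showing which single characters a class like [^\S\n] accepts. The helper name non-newline-ws? is made up for illustration and is not part of the functions below; the pattern puts a literal newline inside the class, and the anchors are only there so that exactly one character is tested.

```racket
#lang racket

;; Hypothetical helper (illustration only): does the class [^\S\n]
;; accept this one-character string?  In px mode \S means "not ASCII
;; whitespace", so the negated class is "whitespace other than newline".
(define (non-newline-ws? s)
  (regexp-match? #px"^[^\\S\n]$" s))

(non-newline-ws? " ")   ; #t  space
(non-newline-ws? "\t")  ; #t  tab
(non-newline-ws? "\f")  ; #t  formfeed
(non-newline-ws? "\r")  ; #t  return
(non-newline-ws? "\n")  ; #f  newline is excluded from the class
(non-newline-ws? "A")   ; #f  not whitespace at all
```

If you count other characters as whitespace (or other line terminators as newlines), the class would have to be adjusted accordingly.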
Both regexps use [^\S\n] to match non-newline whitespace.

On Wed, Jun 8, 2011 at 2:13 PM, Richard Lawrence <richard.lawre...@berkeley.edu> wrote:
> Hi everyone,
>
> I'm sure this is a really trivial question, but I've been trying on my
> own for some time now, and I can't quite figure it out. I am trying to
> define a pair of functions, skip-whitespace and skip-blank-line, that do
> the following:
>
> - skip-whitespace should consume any whitespace characters from an input
> port, possibly up to and including a single newline, but it should not
> consume any more whitespace after a newline--i.e., it should not skip a
> blank line in the input
>
> e.g.,
> (define ip (open-input-string " ABC"))
> (define ip2 (open-input-string " \n\t\nABC"))
> (define ip3 (open-input-string "ABC"))
> (skip-whitespace ip) (skip-whitespace ip2) (skip-whitespace ip3)
> (peek-char ip) ; should be #\A
> (peek-char ip2) ; should be #\tab
> (peek-char ip3) ; should be #\A

(: skip-whitespace (Input-Port -> Boolean))
(define (skip-whitespace in)
  (and (regexp-match #px"[^\\S\\\n]*\\\n?" in) #t))

>
> - skip-blank-line should consume whitespace characters from an input
> port just in case that sequence of whitespace characters ends in a
> newline, and not consume any input otherwise
>
> e.g.,
> (define ip (open-input-string " ABC"))
> (define ip2 (open-input-string " \n\t\nABC"))
> (define ip3 (open-input-string "ABC"))
> (skip-blank-line ip) (skip-blank-line ip2) (skip-blank-line ip3)
> (peek-char ip) ; should be #\space
> (peek-char ip2) ; should be #\tab
> (peek-char ip3) ; should be #\A

(: skip-blank-line (Input-Port -> Boolean))
(define (skip-blank-line in)
  (and (regexp-try-match #px"^[^\\S\\\n]*\\\n" in) #t))

> [snip]
> This works fine. But I can't figure out how to write the parallel regexp
> for skip-blank-line. All the regexps I can come up with either read too
> much whitespace or too little.
>
> #lang typed/racket
> (: skip-blank-line (Input-Port -> Boolean))
> (define (skip-blank-line in)
>   (if (try-read #px"^[[:blank:]]*$" in) #t #f))
>
> This consumes too little in the second case: it doesn't consume the
> initial spaces and newline of ip2; the next char is #\space rather than
> #\tab. (The same is true if I change the character class :blank: to
> :space:.)

By default, $ matches only at the end of input. It matches just before a newline only in multi mode
[see http://docs.racket-lang.org/reference/regexp.html?q=regexp#(def._((quote._~23~25kernel)._regexp))]

> ... but what I could
> really use is a character class that just matches line-terminators,
> instead of :space:. That seems to be the job of "\\p{Zl}", but I guess
> there's something I don't understand about that, because (regexp-match
> #px"\\p{Zl}" "\n") doesn't match anything.)

The newline's Unicode general category is Cc (control), not Zl (line separator), which is why \p{Zl} does not match it. But Cc will match far more than what you want.

-Jon