On Mon, Feb 11, 2019 at 2:32 PM David Storrs <david.sto...@gmail.com> wrote:

> It would be great if Perl's / pcre's "x" mode could be pulled into
> Racket, ideally with modifiers on the end of the regexp instead of
> inside it.  This mode specifies that all whitespace in the pattern
> should be ignored, so if you want to actually match whitespace then
> you need to specify it.  Example:
>
> ; tokenizer code running under the at-exp sublanguage: match a word
> boundary, 'var', whitespace, one or more contiguous word characters
> (ASCII alphanumerics or _), and then another word boundary
>
> @pregexp{\b var \s+ \w+ \b}x
>

Would this do what you want?

#lang at-exp racket

(define (pregexp-x . args)
  (pregexp
   (regexp-replace* #px"\\s+" (string-append* args) "")))

@pregexp-x{\b var \s+ \w+ \b} ;; -> #px"\\bvar\\s+\\w+\\b"

(With an easy optimization being to make `pregexp-x` a macro so that it can
do that work at compile-time when all of the args are string literals.)
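
A minimal sketch of that macro variant, assuming `syntax/parse` is available. The helper name `pregexp-x/proc` is made up for illustration; it is the same function as above, kept as a run-time fallback for arguments that are not string literals:

```racket
#lang at-exp racket

(require (for-syntax racket/base syntax/parse))

;; Run-time fallback (hypothetical name): same behavior as the function above.
(define (pregexp-x/proc . args)
  (pregexp (regexp-replace* #px"\\s+" (string-append* args) "")))

(define-syntax (pregexp-x stx)
  (syntax-parse stx
    ;; Every argument is a string literal: strip the whitespace and
    ;; build the regexp once, at expansion time.
    [(_ s:str ...)
     (define src
       (regexp-replace* #px"\\s+"
                        (apply string-append (syntax->datum #'(s ...)))
                        ""))
     #`(quote #,(pregexp src))]
    ;; Some argument is computed at run time: defer to the function.
    [(_ arg ...) #'(pregexp-x/proc arg ...)]
    ;; Used as a plain identifier, e.g. passed to `map`.
    [_:id #'pregexp-x/proc]))

@pregexp-x{\b var \s+ \w+ \b} ;; -> #px"\\bvar\\s+\\w+\\b"
```

Note that both versions strip all whitespace, including whitespace inside a character class like `[ ]`, which Perl's "x" mode would keep.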

> Oh, and if I'm already asking for the moon, does Racket have any
> equivalent to Perl's named captures or composable regexen, or any
> possibility of adding them?
>
> (regexp-match @px{\b(?<greet>hi|hello)} "hi bob")
> (println greet) ; prints "hi"
>
>
I think both of these could be implemented well in "user space."

You can already compose regular expressions in (byte-)string form.
Functions like `regexp-match` implicitly convert string patterns with
`regexp` (IIRC this predates regexp literals in Racket), and the docs
promise that `object-name` on a regexp value (using either syntax) will
return its source in string form. (That part I didn't realize until just
now.) So, your example:

> (define bar #px"(bar)")
> (regexp-match @px{(foo)@bar} "foobar") ; equivalent to
> @px{(foo)(bar)}. returns '("foobar" "foo" "bar")

could be written as:
#lang at-exp racket

(define (px . args)
  (pregexp (string-append*
            (for/list ([x (in-list args)])
              (if (string? x) x (object-name x))))))

(define bar #px"(bar)")

(regexp-match @px{(foo)@bar} "foobar")
;; -> '("foobar" "foo" "bar")
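
A quick check of the `object-name` behavior this relies on. One caveat follows from it: a byte regexp gives back a byte string, so the `px` helper above, which uses `string-append*`, would need an extra conversion to accept byte regexps:

```racket
#lang racket

(object-name #px"(bar)")     ;; -> "(bar)"
(object-name (regexp "a+"))  ;; -> "a+"
(object-name #rx#"(bar)")    ;; -> #"(bar)", a byte string
```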

The named captures are more interesting, because they use the regexp pattern
as a binding form. There are various balances you could strike between
static and dynamic, but what struck me is that it's essentially the same
problem addressed by `web-server/formlets
<https://docs.racket-lang.org/web-server/formlets.html>` (and the original
paper on Links/OCaml, "The Essence of Form Abstraction").

A formlet combines:

   - an HTML fragment that produces N inputs, and
   - a function that consumes N inputs and does some processing

into a composable representation. The library provides some syntactic sugar
to make this nice to write and driver functions like `send/formlet`, which
takes a formlet, sends its HTML to the user, and processes the submitted
response with the formlet's function.

In your case, you want to associate a regexp that produces N captures with
a function that consumes N inputs. The interesting part is implementing the
syntactic sugar, and the tedious part is implementing `cross` (the
operation to combine formlets; formlets are applicative functors/idioms),
but here's a sketch of how "pregexp formlets" might work at a low level:
#lang racket

(struct px-formlet (px process i) #:transparent)

(define greet-formlet
  (px-formlet #px"\\b(hi|hello)"
              (λ (greet)
                (println greet))
              1))

(define (px-formlet-match f input)
  (match-define (px-formlet px process i) f)
  (define rslt (regexp-match px input))
  (and rslt
       (unless (= i (length (cdr rslt)))
         ;; your library should make it difficult
         ;; or impossible to get here
         (error 'px-formlet-match "wrong number of captures"))
       (apply process (cdr rslt))))

(px-formlet-match greet-formlet "hi bob")
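
As a rough illustration of the combining step, here is a sketch of a sequencing combinator. It is simpler than a real `cross`, and the names `px-formlet-seq`, `name-formlet`, and `greeting-formlet` are made up for this example. It assumes neither pattern uses numbered backreferences (they would be renumbered after concatenation) and that every capture group actually matches:

```racket
#lang racket

;; Same representation as in the sketch above.
(struct px-formlet (px process i) #:transparent)

(define (px-formlet-match f input)
  (match-define (px-formlet px process i) f)
  (define rslt (regexp-match px input))
  (and rslt (apply process (cdr rslt))))

;; Hypothetical combinator: match `f` and then `g`, and pass both
;; results to `combine`.
(define (px-formlet-seq f g combine)
  (match-define (px-formlet px1 process1 i1) f)
  (match-define (px-formlet px2 process2 i2) g)
  (px-formlet
   ;; Wrap each source in a non-capturing group so that a top-level
   ;; `|` in one pattern cannot swallow the other pattern.
   (pregexp (string-append "(?:" (object-name px1) ")"
                           "(?:" (object-name px2) ")"))
   ;; The first i1 captures belong to `f`, the rest to `g`.
   (λ captures
     (define-values (caps1 caps2) (split-at captures i1))
     (combine (apply process1 caps1) (apply process2 caps2)))
   (+ i1 i2)))

(define greet-formlet
  (px-formlet #px"\\b(hi|hello)" (λ (greet) greet) 1))

(define name-formlet
  (px-formlet #px"\\s+(\\w+)" (λ (name) (string-titlecase name)) 1))

(define greeting-formlet
  (px-formlet-seq greet-formlet name-formlet
                  (λ (greet name) (list greet name))))

(px-formlet-match greeting-formlet "hi bob") ;; -> '("hi" "Bob")
```

Because the capture count travels with each formlet, the combined `process` function can split the captures without re-parsing either pattern.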

-Philip
