[racket-users] Unexpected pattern matching behavior

Greg Rosenblatt Wed, 25 Apr 2018 13:55:06 -0700

I'm confused by the behavior of pattern matching in some cases.  I'll 
discuss a few sets of examples involving segment patterns, then a mix of 
`and`, `or`, and `not` patterns.



Here is a basic segment pattern matching the entire contents of a list:
> (match '(1 2 3)
     ((list x ...) `(result: ,x)))
'(result: (1 2 3))

Let's add a prefix to the scrutinized list, and match it with a separate 
pattern variable:
> (match '((1 2 3) 1 2 3)
     ((list w x ...) `(result: ,w ,x)))
'(result: (1 2 3) (1 2 3))

Now let's constrain the prefix to be the same as the matched segment:
> (match '((1 2 3) 1 2 3)
     ((list x x ...) `(result: ,x)))
'(result: (1 2 3))

Everything seems fine so far.  Let's make the segment pattern more 
interesting.

This time, we'll have our segment pattern match elements that are lists of 
two elements:
> (match '((a 1) (b 2) (c 3))
     ((list (list k v) ...) `(result: ,k ,v)))
'(result: (a b c) (1 2 3))

We can add a prefix as before, corresponding to the first element of each 
two-element list in the remaining segment, and match it with a separate 
pattern variable:
> (match '((a b c) (a 1) (b 2) (c 3))
     ((list j (list k v) ...) `(result: ,j ,k ,v)))
'(result: (a b c) (a b c) (1 2 3))

So far so good.  Now let's try constraining the prefix to be the same as 
the first element of each two-element list in the segment pattern:
> (match '((a b c) (a 1) (b 2) (c 3))
     ((list k (list k v) ...) `(result: ,k ,v)))
; k113: unbound identifier;
;  also, no #%top syntax transformer is bound
;   in: k113
; [,bt for context]

Something went wrong.  Is this behavior expected?  This error doesn't look 
like the kind you'd expect a user to see.


Let's try another example with segment patterns:
> (match '(1 2 3 : 1 2 3)
     ((list x ... ': y ...) `(result: ,x ,y)))
'(result: (1 2 3) (1 2 3))

We're using : as a divider, allowing both the x and y segments to be 
non-empty.  In this case, they are bound to the same values.

What happens if we try to constrain the two segments to be identical by 
repeating the variable?:
> (match '(1 2 3 : 1 2 3)
     ((list x ... ': x ...) `(result: ,x)))
non-linear pattern used in `match` with ... at #<syntax:4:19 x> and 
#<syntax:4:10 x>
'(result: (1 2 3))

It looks like we got the right result (see the next example), but it looks 
like an error message was printed beforehand.  Is this message intended as 
a warning?  Should it actually be raising an error instead?  It almost 
looks like debug logging.

Why might an error be the expected behavior?  It seems we get the right 
result only coincidentally.  If we remove the divider and use separate 
variables, we can see the greedy behavior of segment patterns:
> (match '(1 2 3 1 2 3)
     ((list x ... y ...) `(result: ,x ,y)))
'(result: (1 2 3 1 2 3) ())

When we try to constrain the segment by repeating the same variable, we get 
the printed warning followed by an unexpected result:
> (match '(1 2 3 1 2 3)
     ((list x ... x ...) `(result: ,x)))
non-linear pattern used in `match` with ... at #<syntax:8:16 x> and 
#<syntax:8:10 x>
'(result: (1 2 3 1 2 3))

Is this greedy behavior expected even when the segment pattern variable is 
constrained in this way?  It seems like we should at least get an error.


Moving on, here's an example using `and` and `not` that behaves reasonably:
> (match (list 1 2 3)
   ((and (list a _ _) (not (list _ a _))) `(result: ,a)))
'(result: 1)

We can see how the scrutinized list is being constrained to not repeat the 
same first two elements:
> (match (list 1 1 3)
     ((and (list a _ _) (not (list _ a _))) `(result: ,a)))
; match: no matching clause for '(1 1 3) [,bt for context]

If we swap the order of the sub-patterns, expecting the same behavior as 
the first example, we get a surprise:
> (match (list 1 2 3)
     ((and (not (list _ a _)) (list a _ _)) `(result: ,a)))
; match: no matching clause for '(1 2 3) [,bt for context]

It seems that the order of sub-patterns in an `and` pattern matters, and 
determines logical scoping of pattern variables.  In the first pattern, we 
are getting the behavior of the logical proposition:
(exists (a)
  (and (== VALUE (list a _ _)) (not (== VALUE (list _ a _)))))

While in the second example, we seem to be getting the behavior of the 
logical proposition:
(exists (a)
  (and (== VALUE (list a _ _)) (not (exists (a) (== VALUE (list _ a _))))))
Or, pushing the `not` inward slightly:
(exists (a)
  (and (== VALUE (list a _ _)) (forall (a) (not (== VALUE (list _ a _))))))

Is this non-commutative behavior of `and` intended?


Here's an example involving `and` and `or` that attempts to ensure that a 
three-element list either has identical first and second elements, or 
identical second and third elements:
> (match (list 1 2 2)
   ((and (list a b c) (or (list _ a _) (list _ _ b))) 'success))
; readline-input:2:47: match: variable not bound in all or patterns
;   at: b
;   in: (or (list _ a _) (list _ _ b))
; [,bt for context]

Given the binding behavior we saw with `and` in the previous example, I 
figured this one would work, since we're just reusing the same pattern 
variables.  But instead we get an error about a variable not being bound in 
all `or` patterns.

The documentation suggests something like this might be the case, 
"instances of an id in different or and not sub-patterns are independent. 
The binding for id is not available in other parts of the same pattern."

But this seems inconsistent with the successful `(and ... (not ...))` 
example.  Why does that one work, but this one not?

What makes this really perplexing is that a slightly different set of 
examples, where we make sure to mention the same pattern variables in each 
disjunct, does work, this time ensuring that the first element is repeated 
in at least one of the other two positions:
> (match (list 1 2 3)
     ((and (list a b c) (or (list _ a _) (list _ _ a))) 'success))
; match: no matching clause for '(1 2 3) [,bt for context]
> (match (list 1 2 1)
    ((and (list a b c) (or (list _ a _) (list _ _ a))) 'success))
'success
> (match (list 1 1 3)
     ((and (list a b c) (or (list _ a _) (list _ _ a))) 'success))
'success


Here's an example involving `not` and `or` that ensures the scrutinized 
three-element list does not repeat consecutive elements:
> (match (list 1 2 3)
     ((not (or (list a a _) (list _ a a))) 'success))
'success

If we change the pattern variable in the second disjunct, the intended 
logical meaning is the same, but the example breaks:
> (match (list 1 2 3)
     ((not (or (list a a _) (list _ b b))) 'success))
; readline-input:13:34: match: variable not bound in all or patterns
;   at: b
;   in: (or (list a a _) (list _ b b))
; [,bt for context]

I can understand this behavior in the case of `or` appearing in a positive 
context (since otherwise we could introduce ambiguous bindings on the 
right-hand-side), but when negated, bindings for these pattern variables 
won't be available on the right-hand-side anyway.  Is this behavior 
intended for simplicity of implementation?

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[racket-users] Unexpected pattern matching behavior

Reply via email to