
I'm not an expert on PetitParser, but I think I understand what is
happening. If I am right, the failing parser should also fail, for the
same reason, when the input is just 'John Smith' without the 'Jr'. If it
does not, you can disregard the rest of this post.

The top-level construct in your parser is a PPSequenceParser, which works
in a simple-minded way: it just checks, in order, whether each of its
component parsers succeeds. If one of them fails, the whole sequence fails;
it does not backtrack into earlier components. (You can see the code in
PPSequenceParser>>#parseOn:.) In your case, the second component parser,
'middleName optional', succeeds by consuming ' Smith', because 'Smith' could
be a middle name; 'optional' always takes the input if it can. The next
component, 'lastName', then fails because 'Jr' is not a valid last name,
and the sequence parser has no way to go back and ask the previous
component to try matching nothing instead. So the whole sequence fails.
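To make this concrete, here is a minimal Playground sketch that reproduces the failure. Your definitions of abbreviatableToken and token are not shown, so `word`, `nameWord` and `firstName` below are hypothetical stand-ins (a run of letters), and `generational` is assumed to be just 'Jr' or 'Sr'. The other definitions follow the ones in your message.

```smalltalk
| word sp generational nameWord firstName middleName lastName generationalPart greedy |
"Hypothetical stand-in for abbreviatableToken/token: one or more letters, as a String"
word := #letter asParser plus flatten.
sp := #space asParser.
"Assumption: only Jr and Sr count as generational suffixes"
generational := 'Jr' asParser / 'Sr' asParser.
"A word that is not a generational suffix (the 'generational not' guard)"
nameWord := (generational not, word) ==> #second.
firstName := word.
middleName := (sp, nameWord) ==> #second.
lastName := (sp, nameWord) ==> #second.
generationalPart := (sp, generational) ==> #second.

greedy := firstName, middleName optional, lastName, generationalPart optional.

"'middleName optional' swallows ' Smith'; lastName then meets ' Jr' (or nothing)"
(greedy parse: 'John Smith Jr') isPetitFailure.    "true"
(greedy parse: 'John Smith') isPetitFailure.       "true"
"With a real middle name there is something left for lastName"
(greedy parse: 'John Q Smith Jr') isPetitFailure.  "false"
```

The second line is the check suggested above: the same parser fails on plain 'John Smith', because the optional middle name has already consumed the only candidate for the last name.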

The only way to cope with this that I can see is to make the options
explicit by using the slash, which does show the parser where to backtrack
to. This is what your second parser does. You could limit the scope of the
backtracking to avoid re-parsing the first name, by writing something like:

abbreviatableToken, ((middleName, lastName) / lastName), generationalPart optional

(I'm not sure whether the innermost parentheses are necessary, but at least
they do no harm.)
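A sketch of that limited-backtracking version follows, again using hypothetical stand-ins for your token parsers and assuming 'Jr'/'Sr' as the only suffixes. On the parentheses: Smalltalk evaluates binary messages such as `,` and `/` strictly left to right, so `middleName, lastName / lastName` would group the same way and the innermost pair is indeed redundant. The outer pair is essential, though; without it the choice would be between `firstName, middleName, lastName` and a bare `lastName`.

```smalltalk
| word sp generational nameWord firstName middleName lastName generationalPart names result |
"Hypothetical stand-ins, as before: a word is one or more letters"
word := #letter asParser plus flatten.
sp := #space asParser.
"Assumption: only Jr and Sr are generational suffixes"
generational := 'Jr' asParser / 'Sr' asParser.
nameWord := (generational not, word) ==> #second.
firstName := word.
middleName := (sp, nameWord) ==> #second.
lastName := (sp, nameWord) ==> #second.
generationalPart := (sp, generational) ==> #second.

"The choice restarts just after the first name, so firstName is parsed only once.
 If 'middleName, lastName' fails, the sequence rewinds and '/' tries lastName alone."
names := firstName, ((middleName, lastName) / lastName), generationalPart optional.

result := names parse: 'John Smith Jr'.
result isPetitFailure.   "false"
result last.             "'Jr'"
```

Note that the two branches of the choice return differently shaped results (a two-element array for middle and last name, or a single string), so in practice you may want an action on each branch to normalize them.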

Thinking about this, I wondered how 'optional' could ever be used except at
the end of a sequence. I think the answer is that it works when the
optional token has a format or structure that identifies it uniquely
whenever it does occur; in that case, 'optional' simply says 'skip it if it
isn't there'. In your case, nothing distinguishes a middle name from a last
name; indeed, I believe that in US usage the same name can serve as either:
if Jane Smith marries John Doe, can she become Jane Smith Doe?
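As an illustration of 'optional' working in mid-sequence, here is a sketch in which the optional part is self-identifying: a hypothetical middle initial written as a single letter followed by a period. That format is purely an assumption for the example, not something from your grammar. When the next word is not of that shape, the optional parser rewinds and answers nil, and the last name is still there to be parsed.

```smalltalk
| word sp generational nameWord firstName middleInitial lastName generationalPart parser |
"Hypothetical stand-ins: a word is one or more letters; suffixes are Jr or Sr"
word := #letter asParser plus flatten.
sp := #space asParser.
generational := 'Jr' asParser / 'Sr' asParser.
nameWord := (generational not, word) ==> #second.
firstName := word.
"Assumed format: a middle initial is exactly one letter followed by '.'"
middleInitial := (sp, #letter asParser, $. asParser) ==> #second.
lastName := (sp, nameWord) ==> #second.
generationalPart := (sp, generational) ==> #second.

parser := firstName, middleInitial optional, lastName, generationalPart optional.

"' Smith' does not look like an initial, so the optional part matches nothing"
(parser parse: 'John Smith Jr') second.      "nil"
(parser parse: 'John Q. Smith Jr') second.   "$Q"
```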

If you are going to produce a parser which copes with all the vagaries of
people's names, especially outside the US, I think you will have some fun.
Many people in France would write a surname like yours with a space after
the 'De', and probably a lower-case 'd' as well. In Scotland, a suffix like
'Jr' could appear as 'the Younger'. Some people have more than two
forenames. Some people have double-barrelled surnames, with or without a
hyphen. Those are just a few of the complications I can think of. So good
luck!

Hope this helps

Peter Kenny

-----Original Message-----
From: Pharo-users On Behalf Of Sean P. DeNigris
Sent: 21 September 2017 03:18
Subject: [Pharo-users] PetitParser Mystery

        generationalPart := (#space asParser, generational) ==> #second.
        middleName := (#space asParser, (generational not, abbreviatableToken) ==> #second) ==> #second.
        lastName := (#space asParser, (generational not, token) ==> #second) ==> #second.
        input := 'John Smith Jr'.

The following parser fails:
        abbreviatableToken, middleName optional, lastName, generationalPart

But this one succeeds:
        (abbreviatableToken, middleName, lastName, generationalPart optional) / (abbreviatableToken, nil asParser, lastName, generationalPart)

They look the same to me. What is the difference?

