pluralization idea that keeps bugging me

Larry Wall Sat, 26 Jan 2008 08:59:20 -0800

Last night I got a message entitled: "yum: 1 Updates Available".
Of course, that's probably just a Python programmer giving up on doing
the right thing, but we see this sort of bletcherousness all the time.


After a recent exchange on PerlMonks about join, I've been thinking
about the problem of pluralization in interpolated strings, where we
get things like:

    say "Received $m message{ 1==$m ?? '' !! 's' }."

My first thought is that this is such a common idiom that we ought
to have some syntactic sugar for it:

    say "Received $m message\s."

which reads nicely enough since the usual case is plural.
Basically, \s would be smart enough to magically know somehow whether
the last interpolation was 1 or not.  It would be particularly nice when
the interpolation is a closure:

    say "Received {calculate_number_of_messages()} message\s."

That would cover most of the cases for English speakers using regular
nouns, but I wonder whether there's some kind of generalization that
would help for cases like:

    say "There was/were $o ox/oxen"

But that doesn't work since / isn't a metacharacter.  Using an adverb
seems like overkill, if we can piggyback on an existing metachar.

Maybe something like

    say "There was\swere $o ox\soxen"

where if anything alphabetic follows the \s it is the alternative
plural.  But note that the first \s there would have to be looking
forward rather than backward to do the verb, which constrains the
possible mechanisms, and makes it problematic to use \s multiple times:

    say "There was\swere $o ox\soxen and $g goat\s."

though that could be made clearer with explicit concatenation:

    say "There was\swere $o ox\soxen " ~ "and $g goat\s."
    say "There was\swere $o ox\soxen ", "and $g goat\s."

Or maybe instead of using \ we should use a sigil:

    say "There $<was|were> $o $<ox|oxen>"

except, of course, that $<> is already taken.  Seems tacky to
use up a real variable name like:

    say "There $X<was|were> $o $X<ox|oxen>"

I suppose one could make a case for Num vars having a .<> method though:

    say "There $o<was|were> $o $o<ox|oxen>"

That nicely resolves the ambiguity of

    say "There $o<was|were> $o $o<ox|oxen> and $g goat$g<s>"
    
but doesn't really help when you really need it, which is when you
interpolate something hairy:

    say "There $j.k.l.m.o<was|were> $j.k.l.m.o $j.k.l.m.o<ox|oxen> and $j.k.l.m.g goat$j.k.l.m.g<s>"

It's even less helpful when you interpolate a closure since there's
no variable name to refer to (unless you assign one, but then we're
losing much of our syntactic sugary wonderfulness).  So maybe we should
just make \s dwim and leave it at that.  Two dwimminesses, really.
The first dwim finds the associated interpolation, either the first
interpolation of a variable or closure before the \s, or if there
is none, the first one after.  Call that interpolated value $X for
the moment.  (It doesn't really have to have a real variable name,
but the important thing is not to evaluate the expression multiple
times since it might have side effects (including the side effect of
being inefficient to compute).)
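
To make the single-evaluation point concrete, here is a minimal sketch in plain code, without any \s sugar.  The body of calculate_number_of_messages and the $calls counter are hypothetical stand-ins, there only to give the closure a visible side effect; $X is an ordinary variable playing the role of the unnamed value.

```raku
# Hypothetical stand-in for an expression with a side effect.
my $calls = 0;
sub calculate_number_of_messages() { $calls++; 1 }

# Evaluate the expression once and reuse the result for both
# the interpolated number and the singular/plural decision.
my $X = calculate_number_of_messages();
say "Received $X message{ $X == 1 ?? '' !! 's' }.";   # Received 1 message.
say "calls: $calls";                                  # calls: 1
```

Writing the call twice in the string would run the side effect twice, which is what the first dwim has to avoid.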

The second dwim looks at the alphabeticality of the next character
(defined Unicodically, of course) to decide if there is one argument or two:

    foo\s       means   $X == 1 ?? 'foo' !! 'foos'
    foo\sbar    means   $X == 1 ?? 'foo' !! 'bar'

Internally, you end up multiply dispatching to something like
pluralize($X,'foo') or pluralize($X,'foo','bar').  (Arguably we
could make pluralize interpolate the $X as well, but that only
works for noun agreement, not verb agreement.)
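
A minimal sketch of that dispatch as ordinary multi subs, assuming nothing beyond the two call shapes given above.  Only the name pluralize comes from the text; the signatures and the sample values are assumptions for illustration.

```raku
# One alternative given: regular noun, so the plural just appends 's'.
multi sub pluralize($n, Str $singular) {
    $n == 1 ?? $singular !! $singular ~ 's'
}

# Two alternatives given: the caller supplies the plural explicitly.
multi sub pluralize($n, Str $singular, Str $plural) {
    $n == 1 ?? $singular !! $plural
}

my $o = 2;
my $g = 1;
say "There {pluralize($o, 'was', 'were')} $o {pluralize($o, 'ox', 'oxen')} and $g {pluralize($g, 'goat')}.";
# There were 2 oxen and 1 goat.
```

Dispatch here is purely on the number of arguments, which matches the one-argument-or-two decision that the second dwim makes from the character following \s.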

I think that probably handles most of the Indo-European cases, and
anything more complicated can revert to explicit code.  (Or go through
a localization dictionary...)

Any other cute ideas?  

Larry
