Portable upper/lower case regexp matches

Jason Elbaum Thu, 10 Aug 2000 07:26:33 -0700
As far as I know, there is a basic piece of regexp functionality that
Perl should support but doesn't.

Perl regexps support the following features, though they're a bit
obscure for my taste...

(from perlre:)
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)

...but Perl doesn't offer a regexp escape that matches any alphabetic
character of a particular case. Something like:

    \x          match lowercase alpha char
    \X          match uppercase alpha char

Thus /\X\x*/ would match all capitalized words, while /\X+/ would match
acronyms, and /(\X\x+)+/ would match Java class names.
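To make the intended semantics concrete, here is a minimal sketch in Python (not Perl, and not a proposed implementation). It assumes that "uppercase/lowercase alphabetic character" can be approximated by Python's str.isupper() and str.islower(), which consult Unicode character properties rather than the POSIX locale that `use locale` relies on. The helper names case_signature and matches, and the choice to anchor each pattern to a whole word, are illustrative assumptions. Each word is reduced to a string of X/x/. markers, and the three patterns above are then run as ordinary regexps over that signature.

```python
import re

def case_signature(word):
    """Map each character to 'X' (uppercase letter), 'x' (lowercase letter)
    or '.' (anything else), using Python's Unicode-aware str methods
    (an assumption standing in for a locale-aware case test)."""
    sig = []
    for ch in word:
        if ch.isalpha() and ch.isupper():
            sig.append("X")
        elif ch.isalpha() and ch.islower():
            sig.append("x")
        else:
            sig.append(".")
    return "".join(sig)

# Stand-ins for the proposed /\X\x*/, /\X+/ and /(\X\x+)+/, run over the
# case signature and anchored to the whole word (an assumption; the
# patterns in the post are unanchored).
CAPITALIZED = re.compile(r"Xx*")
ACRONYM = re.compile(r"X+")
JAVA_CLASS = re.compile(r"(?:Xx+)+")

def matches(pattern, word):
    """True if the whole case signature of `word` matches `pattern`."""
    return pattern.fullmatch(case_signature(word)) is not None
```

With this sketch, matches(CAPITALIZED, "Élan") is true, while the ASCII pattern [A-Z][a-z]* rejects the same word because É falls outside A-Z. "HTMLParser" is not accepted by the Java-class pattern, since (\X\x+)+ requires every capital to be followed by at least one lowercase letter.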


What's the big deal, you ask? Just use [A-Z] and [a-z]!

Well, perlre notes:

"If use locale is in effect, the case map used by \l, \L, \u and \U is
taken from the current locale."

And to quote perllocale:

"Perl supports language-specific notions of data such as `is this a
letter', `what is the uppercase equivalent of this letter', and `which
of these letters comes first'. These are important issues, especially
for languages other than English--but also for English: it would be
naive to imagine that A-Za-z defines all the `letters' needed to write
in English."

So regexp escapes that explicitly match uppercase and lowercase
alphabetic characters are necessary to support locales, not to mention
to be consistent with the Perl docs themselves.

There is no convenient way to imitate this functionality in Perl while
supporting locales. There should be.


Jason Elbaum
[EMAIL PROTECTED]