On 7/31/07, Nevada <[EMAIL PROTECTED]> wrote:
> Hello,
>
> In the Perl Cookbook recipe 1.14 - "Properly Capitalizing a Title or
> Headline" I see this substitution:
>
>    s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;
>
> if a word appears in the nocap hash, it is made lower case. the \pL
> matches a lower case character but what is [\pL']?
>
> Thanks,
>
> NS

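\pL matches any Unicode letter, not only a lower case one. [\pL'] is a
bracketed character class that matches either a letter or an apostrophe,
so \pL[\pL']* matches a letter followed by any run of letters and
apostrophes. That keeps a word like "don't" together as a single match.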
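
Here is a minimal sketch of that substitution in use. The contents of
%nocap and the title_case() wrapper are my own assumptions for
illustration, not the Cookbook's actual list or code:

```perl
use strict;
use warnings;

# Hypothetical set of words to keep lower case (not the Cookbook's list).
my %nocap = map { $_ => 1 } qw(a an the and but or of in on to);

# Hypothetical helper that applies the substitution to a copy of the text.
sub title_case {
    my ($text) = @_;
    # \pL      one Unicode letter starts the word
    # [\pL']*  followed by zero or more letters or apostrophes
    # /e evaluates the replacement as Perl code, /g repeats for every word
    $text =~ s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;
    return $text;
}

print title_case("the man who wasn't there"), "\n";
# prints: the Man Who Wasn't There
```

Note that "wasn't" is treated as one word because the apostrophe is in
the character class, and the leading "the" stays lower case here only
because it is in the hash.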

From perldoc perlre:
           \pP Match P, named property.  Use \p{Prop} for longer names.

       see perlunicode for more details about "\pP", "\PP", and "\X", and
       perluniintro about Unicode in general.  You can define your own "\p" and
       "\P" properties, see perlunicode.

From perldoc perlunicode:
       *   Named Unicode properties, scripts, and block ranges may be used
           like character classes via the "\p{}" "matches property" construct
           and the  "\P{}" negation, "doesn't match property".

           For instance, "\p{Lu}" matches any character with the Unicode "Lu"
           (Letter, uppercase) property, while "\p{M}" matches any character
           with an "M" (mark--accents and such) property.  Brackets are not
           required for single letter properties, so "\p{M}" is equivalent to
           "\pM". Many predefined properties are available, such as
           "\p{Mirrored}" and "\p{Tibetan}".

           Here are the basic Unicode General Category properties, followed by
           their long form.  You can use either; "\p{Lu}" and
           "\p{UppercaseLetter}", for instance, are identical.

               Short       Long

               L           Letter
               LC          CasedLetter
               Lu          UppercaseLetter
               Ll          LowercaseLetter
               Lt          TitlecaseLetter
               Lm          ModifierLetter
               Lo          OtherLetter

               M           Mark
               Mn          NonspacingMark
               Mc          SpacingMark
               Me          EnclosingMark

               N           Number
               Nd          DecimalNumber
               Nl          LetterNumber
               No          OtherNumber

               P           Punctuation
               Pc          ConnectorPunctuation
               Pd          DashPunctuation
               Ps          OpenPunctuation
               Pe          ClosePunctuation
               Pi          InitialPunctuation
                           (may behave like Ps or Pe depending on usage)
               Pf          FinalPunctuation
                           (may behave like Ps or Pe depending on usage)
               Po          OtherPunctuation

               S           Symbol
               Sm          MathSymbol
               Sc          CurrencySymbol
               Sk          ModifierSymbol
               So          OtherSymbol

               Z           Separator
               Zs          SpaceSeparator
               Zl          LineSeparator
               Zp          ParagraphSeparator

               C           Other
               Cc          Control
               Cf          Format
               Cs          Surrogate   (not usable)
               Co          PrivateUse
               Cn          Unassigned

           Single-letter properties match all characters in any of the two-
           letter sub-properties starting with the same letter.  "LC" and "L&"
           are special cases, which are aliases for the set of "Ll", "Lu", and
           "Lt".

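To see the short forms from that table in action, here is a small
sketch. The letter_props() helper is a name I made up for illustration,
and it only checks four of the properties listed above:

```perl
use strict;
use warnings;

# Hypothetical helper: report which of a few General Category
# properties a single character matches.
sub letter_props {
    my ($ch) = @_;
    my @hits;
    push @hits, 'L'  if $ch =~ /\A\pL\z/;      # any letter; braces optional
    push @hits, 'Lu' if $ch =~ /\A\p{Lu}\z/;   # uppercase letter
    push @hits, 'Ll' if $ch =~ /\A\p{Ll}\z/;   # lowercase letter
    push @hits, 'Nd' if $ch =~ /\A\p{Nd}\z/;   # decimal digit
    return join ',', @hits;
}

# Labels are ASCII so nothing wide is printed to STDOUT.
my @samples = (
    [ 'LATIN CAPITAL A'   => 'A' ],
    [ 'GREEK SMALL OMEGA' => "\x{3C9}" ],
    [ 'DIGIT SEVEN'       => '7' ],
    [ 'APOSTROPHE'        => "'" ],
);
for my $s (@samples) {
    my ($label, $ch) = @$s;
    printf "%-18s %s\n", $label, letter_props($ch) || '(none)';
}
```

The apostrophe matches none of the letter properties, which is why the
Cookbook pattern has to list it explicitly inside the brackets.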