On 7/31/07, Nevada <[EMAIL PROTECTED]> wrote:
> Hello,
>
> In the Perl Cookbook recipe 1.14 - "Properly Capitalizing a Title or
> Headline" I see this substitution:
>
> s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;
>
> if a word appears in the nocap hash, it is made lower case. the \pL
> matches a lower case character but what is [\pL']?
>
> Thanks,
>
> NS
\pL matches any Unicode letter (upper case, lower case, or otherwise), not just a lower case one. [\pL'] is a character class that matches either a letter or an apostrophe, so (\pL[\pL']*) captures a letter followed by any run of letters and apostrophes. That keeps a word like "don't" in one piece.

From perldoc perlre:

    \pP  Match P, named property. Use \p{Prop} for longer names.

    see perlunicode for more details about "\pP", "\PP", and "\X", and
    perluniintro about Unicode in general. You can define your own "\p"
    and "\P" properties, see perlunicode.

From perldoc perlunicode:

    *   Named Unicode properties, scripts, and block ranges may be used
        like character classes via the "\p{}" "matches property"
        construct and the "\P{}" negation, "doesn't match property".

        For instance, "\p{Lu}" matches any character with the Unicode
        "Lu" (Letter, uppercase) property, while "\p{M}" matches any
        character with an "M" (mark--accents and such) property.
        Brackets are not required for single letter properties, so
        "\p{M}" is equivalent to "\pM". Many predefined properties are
        available, such as "\p{Mirrored}" and "\p{Tibetan}".

    Here are the basic Unicode General Category properties, followed by
    their long form. You can use either; "\p{Lu}" and
    "\p{UppercaseLetter}", for instance, are identical.

        Short       Long

        L           Letter
        LC          CasedLetter
        Lu          UppercaseLetter
        Ll          LowercaseLetter
        Lt          TitlecaseLetter
        Lm          ModifierLetter
        Lo          OtherLetter

        M           Mark
        Mn          NonspacingMark
        Mc          SpacingMark
        Me          EnclosingMark

        N           Number
        Nd          DecimalNumber
        Nl          LetterNumber
        No          OtherNumber

        P           Punctuation
        Pc          ConnectorPunctuation
        Pd          DashPunctuation
        Ps          OpenPunctuation
        Pe          ClosePunctuation
        Pi          InitialPunctuation
                    (may behave like Ps or Pe depending on usage)
        Pf          FinalPunctuation
                    (may behave like Ps or Pe depending on usage)
        Po          OtherPunctuation

        S           Symbol
        Sm          MathSymbol
        Sc          CurrencySymbol
        Sk          ModifierSymbol
        So          OtherSymbol

        Z           Separator
        Zs          SpaceSeparator
        Zl          LineSeparator
        Zp          ParagraphSeparator

        C           Other
        Cc          Control
        Cf          Format
        Cs          Surrogate   (not usable)
        Co          PrivateUse
        Cn          Unassigned

    Single-letter properties match all characters in any of the
    two-letter sub-properties starting with the same letter. "LC" and
    "L&" are special cases, which are aliases for the set of "Ll",
    "Lu", and "Lt".
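
To see what the apostrophe in the character class buys you, here is a small sketch that runs the substitution from the question next to a variant that uses \pL+ alone. The contents of %nocap and the two helper names (capitalize_title, capitalize_letters_only) are made up for illustration and are not taken from the Cookbook.

```perl
use strict;
use warnings;

# Hypothetical stop-word list for illustration; the Cookbook's own list differs.
my %nocap = map { $_ => 1 } qw(a an the and but or of in on to);

# The substitution from the question, wrapped in a hypothetical helper.
#   \pL       one Unicode letter (any case, any script)
#   [\pL']*   zero or more characters that are a letter or an apostrophe
sub capitalize_title {
    my ($title) = @_;
    $title =~ s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;
    return $title;
}

# Variant without the apostrophe: a "word" is a run of letters only,
# so the match stops at the apostrophe and restarts after it.
sub capitalize_letters_only {
    my ($title) = @_;
    $title =~ s/(\pL+)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;
    return $title;
}

print capitalize_title("war and peace"), "\n";            # War and Peace
print capitalize_title("don't look back"), "\n";          # Don't Look Back
print capitalize_letters_only("don't look back"), "\n";   # Don'T Look Back
```

With \pL+ alone the "t" after the apostrophe is treated as a separate word and gets capitalized, which is why the recipe uses [\pL'] for everything after the first letter. Because \pL is a Unicode property, the same pattern also matches letters outside ASCII.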