On Fri, 30 Jul 2004, Bob Showalter wrote:
> Date: Fri, 30 Jul 2004 13:52:57 -0400
> From: Bob Showalter <[EMAIL PROTECTED]>
> To: 'Charlotte Hee' <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: RE: problem with splitting on "words"
>
> Charlotte Hee wrote:
> > Hi Bob,
> >
> > In one of my tests I added the '>' to the character class [^\w->] but
> > I still didn't get 'B0->'.
>
> I'm guessing it's because that looks like a range. Using [^\w\->] should
> work.
>
> > I've just learned about character classes
> > so I am trying to get a better handle on how they work. A lot of my
> > titles contain physics terms like B0->K- and I would consider 'B0->'
> > a word and 'K-' another word.
>
> OK. Instead of using split, why not capture the tokens you're interested in.
> Something like:
>
>    for my $w ($title =~ /([A-Za-z]+[^A-Za-z\s]*)\s*/g) {
>

That's amazing! Yes, that works. Let me see if I understand this
expression:

/([A-Za-z]+
    This matches any letter, uppercase or lowercase, 1 or more times.

[^A-Za-z\s]*)
    This matches anything that's not a letter, uppercase or lowercase,
    or a space, zero or more times. Here is how I will match my '->'.

\s*/g
    This matches a blank space zero or more times and the 'g' means
    apply the whole thing globally.

But why do I need the character classes in parentheses?

thanks again!
Chee
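One way to see what the parentheses contribute is to run the pattern with and without them. The following is only a minimal sketch: the title string and the variable names @captured and @whole are made up for illustration and do not come from the thread. It relies on the documented behavior of m//g in list context, which returns the captured groups when the pattern has any, and otherwise returns each whole match.

```perl
use strict;
use warnings;

# Hypothetical sample title; not taken from the original thread.
my $title = 'Search for B0->K- pi+ decays';

# With parentheses: in list context, m//g returns only the captured text,
# so the whitespace matched by the trailing \s* is left out of each token.
my @captured = $title =~ /([A-Za-z]+[^A-Za-z\s]*)\s*/g;

# Without parentheses: m//g returns each whole match,
# including any trailing whitespace that \s* consumed.
my @whole = $title =~ /[A-Za-z]+[^A-Za-z\s]*\s*/g;

print join('|', @captured), "\n";   # Search|for|B0->|K-|pi+|decays
print join('|', @whole), "\n";      # Search |for |B0->|K- |pi+ |decays
```

In this sketch the parentheses mark the part of each match to keep as the token, while \s* sits outside them and only skips ahead to the next token.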