Re: Regexes

Curtis Poe Mon, 06 Aug 2001 08:04:38 -0700
--- Me <[EMAIL PROTECTED]> wrote:
> > I have the number '08' and I want to search for the '0' and replace
> > with nothing '' being left with only '8'.  I am an extreme newbie to
> > regular expressions.  Could some one explain to me how I would go about
> > searching for the 0 and replacing with nothing giving me the final
> > string of 8?
> 
> # put string in the default variable
> # search the default variable for 0 and replace with nothing
> # print the default variable
> $_ = '08';
> s/0//;
> print;

Careful about that.  The original question was not very specific.  If the
variable will always be "08", then there's no problem, but then it's not
really a variable and you could set it by hand.  The substitution above has
a number of cases that could be problematic:

NUMBER   RESULT
008      08
10       1
100      10
0        
1.0      1.
0.1      .1

What should be the appropriate result of the substitution for those?  If the
original poster meant "I need to eliminate all leading zeros in numbers",
then a substitution like "s/^0+//" might be more appropriate.

NUMBER   RESULT
008      8
10       10
100      100
0        
1.0      1.0
0.1      .1
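
For anyone who wants to reproduce the two tables so far, here is a minimal
sketch.  The sub names strip_first_zero and strip_leading_zeros are made up
for illustration; only the two substitutions come from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution from the earlier reply.
sub strip_first_zero {
    my ($s) = @_;
    $s =~ s/0//;      # removes the first 0 found anywhere in the string
    return $s;
}

# Hypothetical helper: the anchored substitution.
sub strip_leading_zeros {
    my ($s) = @_;
    $s =~ s/^0+//;    # removes only a run of zeros at the very start
    return $s;
}

# Print each input next to both results, quoted so that an empty
# string stays visible.
for my $n (qw(008 10 100 0 1.0 0.1)) {
    printf "%-8s '%s'   '%s'\n",
        $n, strip_first_zero($n), strip_leading_zeros($n);
}
```

Working on a copy inside each sub leaves the original value alone, which
makes it easy to compare both results side by side.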

However, if we're eliminating leading zeros, this probably means we need to
display the numbers.  If so, we usually want a "stand alone" zero to remain
and at least one zero left before a decimal point.  That makes the regular
expression more complicated.  What we need to do then is specify *exactly*
what our requirements are and create a regular expression to match.

In this case, we need to eliminate all leading zeros, so long as the string
is longer than one character (this protects the stand-alone zero) and so
long as the zero is *not* followed by a decimal point (this prevents "0.1"
from being changed to ".1").

Yeah, you're going to hate this, but this is why crafting a regex is more
difficult than it first seems.  I'm adding the /x modifier to the end of the
regex so that I can embed comments.

    s/^     # The ^ binds the regex to the beginning of the string
       0+   # Find one or more zeros --
       (?!  # -- that are *not* followed by a decimal point (see below)
         \.
       )
       (    # Capture to $1
         .+ #   one or more of anything (this forces a str length > 1)
       )    # end capture
     /$1/x; # substitute above with $1

Or, in one line:

    s/^0+(?!\.)(.+)/$1/;

What it does:

We're looking for one or more zeros bound to the beginning of the string
(that's what the ^ anchor does).  The funny looking "(?! ... )" is called a
zero-width negative lookahead.  What this does is say "make sure that the
regex I specify (replace the ... with the regex) does *not* match
immediately after what I previously matched".  Further, because it's
"zero-width", it consumes no characters: whatever it looks at does not
become part of the match.  What that means is that if all I cared about was
eliminating leading zeros that didn't have a decimal point following, I
could do this:

    s/^0+(?!\.)//;

Because the lookahead is zero-width, the character it inspects is never part
of the match, so it's not substituted away.
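
A small sketch of that lookahead-only substitution on a few inputs.  The sub
name and the extra inputs "00.1" and "0" are illustrative additions, not
part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the lookahead-only substitution.
sub strip_zeros_not_before_point {
    my ($s) = @_;
    $s =~ s/^0+(?!\.)//;
    return $s;
}

# 008   the zeros are followed by 8, so they are removed
# 0.1   the only zero is followed by a point, so nothing matches
# 00.1  0+ backs off to a single zero (followed by 0, not a point)
# 0     the lone zero is removed, leaving an empty string
for my $n (qw(008 0.1 00.1 0)) {
    printf "%-8s '%s'\n", $n, strip_zeros_not_before_point($n);
}
```

Note that a stand-alone "0" still becomes "" here, which is why the full
regex also needs the captured (.+).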

Taking a look at the regex again:

    s/^0+(?!\.)(.+)/$1/;

Notice that we match one or more zeros (0+) and later match one or more of
anything (.+) except newlines (the dot will not match a newline unless the
/s modifier is placed on the regex).  Since we have two different parts of
the regex where we must match one or more characters, we are ensuring that
at least two characters get matched.  This ensures that "0" does not become
"".

Here's what my regex produces:

NUMBER   RESULT
008      8
10       10
100      100
0        0
1.0      1.0
0.1      0.1
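
The table above can be checked with a short script along these lines.  The
sub name trim_leading_zeros is hypothetical, and the last two cases ("000"
and "00.1") are extra inputs added here to exercise backtracking.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-line regex.
sub trim_leading_zeros {
    my ($s) = @_;
    $s =~ s/^0+(?!\.)(.+)/$1/;
    return $s;
}

# Each entry is [ input, expected result ].
my @cases = (
    [ '008',  '8'   ],
    [ '10',   '10'  ],
    [ '100',  '100' ],
    [ '0',    '0'   ],
    [ '1.0',  '1.0' ],
    [ '0.1',  '0.1' ],
    [ '000',  '0'   ],   # extra case: all zeros collapse to one
    [ '00.1', '0.1' ],   # extra case: one zero kept before the point
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = trim_leading_zeros($in);
    printf "%-8s %-8s %s\n", $in, $got,
        $got eq $want ? 'ok' : "NOT ok (wanted $want)";
}
```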

Depending upon the actual needs, my regex might need to change, or sprintf,
printf, or formats may be considered.
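
As a rough sketch of the sprintf route, assuming the input is always a plain
decimal number: the helpers as_integer and as_fixed below are invented for
illustration, and whether this formatting suits the original poster's data
is an open question.

```perl
use strict;
use warnings;

# Hypothetical helper: format as an integer (any fraction is truncated).
sub as_integer {
    my ($s) = @_;
    return sprintf '%d', $s;
}

# Hypothetical helper: format with a fixed number of decimal places.
sub as_fixed {
    my ($s, $places) = @_;
    return sprintf "%.${places}f", $s;
}

print as_integer('008'), "\n";    # 8
print as_integer('0'),   "\n";    # 0
print as_fixed('0.1', 1), "\n";   # 0.1
print as_fixed('1.0', 1), "\n";   # 1.0
```

Unlike the regex, this treats the value as a number, so it decides the
output format itself instead of preserving whatever was typed.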

Cheers,
Curtis Poe



=====
Senior Programmer
Onsite! Technology (http://www.onsitetech.com/)
"Ovid" on http://www.perlmonks.org/
