On Apr 9, Jason Larson said:

>  $extension =~ s/(^.+\.?)([^\.]*)$/$2/;
>
>Your regex will fail for a couple of different reasons:

[snip]

Your regex will fail, too. Here's why. Assuming $extension is "foobar.txt", here is how the regex matches:

  ^        matches the beginning of the string
  .+       matches the entire string
  \.?      matches zero periods (since it can match zero or one)
  [^\.]*   matches zero non-periods
  $        matches the end of the string

So, even though the string DOES have a . in it, the \.? in your regex never has to match it. The .+ has already swallowed the whole string, $2 is empty, and the substitution leaves $extension as "".

The File::Basename module is far more thorough. If, however, you insist on using a regex instead, try something like this:

  ($extension) = $filename =~ /.*\.(.*)/s;

The .* goes through the ENTIRE string, and then the \. forces the regex to back up to the last "." in the string; then the (.*) captures everything after that last ".". I have added the /s modifier in case the filename contains newlines (which is not a crime).

Notice that $extension is in parentheses -- this creates a LIST on the left-hand side of the assignment. That means the regex is evaluated in list context. Thus, should the regex FAIL, it returns an EMPTY list, which means $extension becomes undef. A value of undef indicates there was NO extension at all, which is different from "foo.", whose extension is "".

BUT! What if $filename is "/foo/bar.blat/gunk"? Oops. We get $extension being "blat/gunk". That's probably a mistake. This means we have to restrict our extension-searching regex to the LAST portion of the path -- the filename. But this requires us knowing what the path separator is; Unix uses /, Windows uses \, Mac uses : I think. Assuming you have the character in $SEP, then you can construct a better regex:

  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;

This works for me. The (?>...) part of the regex matches its sub-pattern and then refuses to backtrack into it. (As a simpler example, the regex "aaab" =~ /(?>a+)ab/ will never succeed, because (?>a+) will never allow itself to backtrack and give up one of the "a"s it matches.) The (?>(?:.*\Q$SEP\E)?)
part of the regex matches ALL of the filename up to and including the last occurrence of the path separator. Then the .* matches the rest, the actual name (and extension) of the file. The \. requires .* to back up to the last ".", and then (.*) matches and captures the extension.

There is only ONE more issue to deal with. What do you do with a filename of /foo/bar/blat.txt.bak? Is the extension "txt", "txt.bak", or "bak"? Here are solutions for all three possibilities:

  # foo.txt.bak => txt
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.([^.]*)/;

  # foo.txt.bak => txt.bak
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?)[^.]*\.(.*)/;

  # foo.txt.bak => bak
  ($extension) = $filename =~ /^(?>(?:.*\Q$SEP\E)?).*\.(.*)/;

Of course, you needn't use a single regex like this. You could use the split() function instead. Using the following to isolate the NAME from the path:

  $name = (split /\Q$SEP\E/, $filename)[-1];

we have these three solutions:

  # foo.txt.bak => txt
  $extension = (split /\./, $name)[1];

  # foo.txt.bak => txt.bak
  $extension = (split /\./, $name, 2)[1];

  # foo.txt.bak => bak
  $extension = (split /\./, $name)[-1];

(One caveat with split(): if $name has no "." in it at all, the first two give you undef, but the [-1] version hands back the whole name, so check for a "." first if that matters to you.)

Or you could change this second split() to a regex:

  # foo.txt.bak => txt
  ($extension) = $name =~ /\.([^.]*)/;

  # foo.txt.bak => txt.bak
  ($extension) = $name =~ /\.(.*)/;

  # foo.txt.bak => bak
  ($extension) = $name =~ /.*\.([^.]*)/;
  # or
  # ($extension) = $name =~ /\.([^.]*)$/;

NOW THAT I HAVE SHOWN YOU ALL THE WORK YOU HAVE TO DO TO GET IT RIGHT...

Please use File::Basename. Save us all the headache.

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]
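
P.S. Since the advice above is to use File::Basename, here is a minimal sketch of what that might look like for the "foo.txt.bak => bak" case. The helper name extension_of() is made up for illustration and is not part of the module. It assumes a Unix-like system, because fileparse() splits the path according to the current OS, and it strips the leading "." from the suffix so the result lines up with the regex versions (undef for no extension, "" for a trailing ".").

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not part of File::Basename): returns the text after
# the last "." of the final path component, or undef if there is no ".".
sub extension_of {
    my ($path) = @_;

    # fileparse() returns (name, directory, suffix). The suffix pattern is
    # tried against the END of the name, so qr/\.[^.]*/ picks up the last
    # ".something" and nothing from the directory part.
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.]*/);

    return unless length $suffix;    # no "." in the name: no extension
    return substr($suffix, 1);       # drop the leading "."
}

for my $path ("/foo/bar/blat.txt.bak", "/foo/bar.blat/gunk", "foo.") {
    my $ext = extension_of($path);
    printf "%-24s => %s\n", $path, defined $ext ? qq{"$ext"} : "undef";
}
```

The three sample paths are the troublemakers from the discussion above: the first should come back as "bak", the second as undef (the "." lives in a directory name, not the filename), and the third as the empty string.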