Hi, J77,

Your question is not exactly clear, but I will try to answer what *I* think you are asking.

First, I will assume that you have a list of web sites, some with and some without the "http://" and "www." prefixes, that you want to index like so:

http://www.microsoft.com
http://ibm.com
www.ebay.com
yahoo.com

Second, I will assume that if you have already indexed "http://www.microsoft.com" and you later come across "microsoft.com", you do NOT want to index Microsoft again, because you have already seen one of the six formats you listed below.

The following script does that and will index only 4 sites, not 8:

------BEGIN CODE------
#!/usr/bin/perl
use warnings;
use strict;

my %seen;
while (<DATA>) {
    # Strip an optional "http://" and "www.", then capture the host name
    my ($site) = m!^(?:http://)?(?:www\.)?([^/\s]+)! or next;
    next if exists $seen{$site};    # skip sites we have already indexed
    # code to index $site here
    $seen{$site} = undef;           # only the key matters, not the value
}
__DATA__
http://www.microsoft.com
http://ibm.com/
www.ebay.com
yahoo.com
microsoft.com/
www.ibm.com/
http://www.ebay.com
www.yahoo.com/
-------END CODE-------

If you have another file containing links that have already been indexed, simply populate %seen from that file's entries before the while loop that reads new sites.

I hope this helps,
ZO

<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> > if ($siteurl2 =~ /^(?:www.)?$FORM{'siteurl'}\/?$/) {
> >     print "Matched";
> > }
<snip>
> Ok, maybe I wasn't clear. What I want to do is check a URL against urls in
> a list, so that all six forms of the url will match so duplicates won't be
> indexed.
>
> http:/www.mysite.com/
> http:/www.mysite.com
> http:/mysite.com/
> http:/mysite.com
> www.mysite.com/
> www.mysite.com
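
P.S. Here is a rough sketch of the "populate %seen first" idea from the reply above, in case it helps. It is only one way to do it. The helper names site_key and load_seen, the @incoming list, and the in-memory "file" of already-indexed links are all made up for illustration; in practice you would open your real file of indexed links instead.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: reduce a URL to its bare host name, using the
# same pattern as the script in the reply. Returns undef on no match.
sub site_key {
    my ($url) = @_;
    my ($site) = $url =~ m!^(?:http://)?(?:www\.)?([^/\s]+)! or return;
    return $site;
}

# Hypothetical helper: read already-indexed links from a file handle
# and return them as a hash keyed by bare host name.
sub load_seen {
    my ($fh) = @_;
    my %loaded;
    while (my $line = <$fh>) {
        my $site = site_key($line);
        next unless defined $site;
        $loaded{$site} = undef;
    }
    return %loaded;
}

# Assumption: an in-memory string stands in for the file of links that
# were indexed earlier, so the sketch runs without any external file.
my $already = "http://www.microsoft.com\nibm.com/\n";
open my $old, '<', \$already or die "Cannot open in-memory file: $!";
my %seen = load_seen($old);
close $old;

# Made-up list of newly found links to check against %seen.
my @incoming = qw(www.microsoft.com/ http://ebay.com www.ibm.com yahoo.com/ www.ebay.com);

my @new;
for my $url (@incoming) {
    my $site = site_key($url);
    next unless defined $site;
    next if exists $seen{$site};    # indexed earlier or already in this run
    push @new, $site;               # code to index $site would go here
    $seen{$site} = undef;
}

print "$_\n" for @new;
```

With the sample data above, only ebay.com and yahoo.com are printed, since microsoft.com and ibm.com were loaded into %seen before the loop started.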