Re: Multiple matching question

Mumia W. Thu, 10 Aug 2006 09:44:57 -0700

On 08/10/2006 09:12 AM, Roman Daszczyszak wrote:

Hello all,
I have several text files with a few thousand contacts in each, and I am trying to pull out all the contacts from certain email domains (about 15 of them). I wrote a script that loops through each file, then loops through matching each domain to the line and writes the results to two files, one for matches, one for non-matches.
I am just curious if there is a way to match all the domains in turn, without having a foreach looping through them?
Here's my code:
#!/perl/bin/perl
use strict;
use warnings;

my $program_time = time();
die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n" unless(@ARGV);
my $domain_filename = "intel_addresses.txt";
my @email_domains;
open(DOMAINS, "<$domain_filename") or die "Cannot open $domain_filename:$!\n";
chomp(@email_domains = <DOMAINS>);


You can simplify this by using File::Slurp, e.g.
use File::Slurp;
...
chomp (@email_domains = read_file($domain_filename));
# Email_domains is more useful as a hash:
my %email_domains = map +($_, 1), @email_domains;
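
As a minimal sketch of why the hash helps: once the domains are keys, checking an address is a single lookup instead of a loop over the list. The sample domains and the domain_of and is_wanted helpers are hypothetical, and the lowercasing is an assumption added because the original matched with /i, which a plain hash lookup does not do unless the keys are normalized.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the script these come from intel_addresses.txt.
my @email_domains = ('example.com', 'Example.ORG');

# Lowercase the keys so that lookups are effectively case-insensitive.
my %email_domains = map +( lc($_) => 1 ), @email_domains;

# Hypothetical helper: return the lowercased domain part of an address,
# or undef if the string contains no '@'.
sub domain_of {
    my ($address) = @_;
    return $address =~ /\@([^\@\s]+)\s*$/ ? lc($1) : undef;
}

# Return 1 if the address belongs to one of the wanted domains, else 0.
sub is_wanted {
    my ($address) = @_;
    my $domain = domain_of($address);
    return (defined($domain) && exists $email_domains{$domain}) ? 1 : 0;
}

print is_wanted('alice@EXAMPLE.com'), "\n";
print is_wanted('bob@elsewhere.net'), "\n";
```

Note that this only works when the domain can be isolated as an exact key; it is not a substring match like m/$domain/i.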

LINE: while (<>)
{
   my $filename = $ARGV;
   $filename =~ s/\.csv//gi;
   open(FOUND, ">>${filename}_match.csv") or die "Cannot open ${filename}_match.csv\n";
   open(NOTFOUND, ">>${filename}_nomatch.csv") or die "Cannot open ${filename}_nomatch.csv\n";

This opens the output files each time a line is read from one of the input files; that's inefficient. I would leave out the reading from <> and do it completely differently.

   foreach my $domain (@email_domains)
   {
       if (m/$domain/i)
       {


I would just use the hash created above.

           print(FOUND $_);
           next LINE;
       }
   }
   print(NOTFOUND $_);
}
print("Run time: ",time() - $program_time,"\n");
---------------------------------------------------------------------------
Additionally, does anyone know of a better way to open the results files, keeping the practice of making two files for each original, without having to reopen the file on each iteration of the while loop? Does reopening the file cause a performance hit each open?


Yes, reopening will hurt performance--probably a lot.
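
One way to avoid the repeated opens is to handle one input file at a time and open its two output files once, before the line loop. A minimal sketch, assuming the same _match.csv / _nomatch.csv naming as the original script; the split_file name and the $is_match callback are hypothetical, and the output files are opened for overwrite here, not append.

```perl
use strict;
use warnings;

# Split one input file into <base>_match.csv and <base>_nomatch.csv.
# The output handles are opened once per input file, not once per line.
# $is_match is a code ref that returns true for lines that should be kept.
sub split_file {
    my ($infile, $is_match) = @_;

    # Strip a trailing .csv (if any) to build the output names.
    (my $base = $infile) =~ s/\.csv$//i;

    open(my $in, '<', $infile)
        or die "Cannot open $infile: $!\n";
    open(my $found, '>', "${base}_match.csv")
        or die "Cannot open ${base}_match.csv: $!\n";
    open(my $notfound, '>', "${base}_nomatch.csv")
        or die "Cannot open ${base}_nomatch.csv: $!\n";

    while (my $line = <$in>) {
        # Pick the destination handle for this line and print to it.
        print { $is_match->($line) ? $found : $notfound } $line;
    }

    close($found)    or die "Cannot close ${base}_match.csv: $!\n";
    close($notfound) or die "Cannot close ${base}_nomatch.csv: $!\n";
    close($in);
    return;
}

# Example call (the pattern is only a placeholder):
#   split_file($_, sub { $_[0] =~ /example\.com/i }) for @ARGV;
```

Passing the test as a code ref keeps the file handling separate from whatever matching strategy you settle on.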

You didn't post any data. It's not too easy to test a program without its data, but this is how I'd approach the problem:


#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;
use Text::CSV_XS;

# WARNING: UNTESTED CODE

# Remove exit below to run.
exit;

# DOMAIN: Replace with the index value of your domain column.
my $DOMAIN = 6;
my $domain_filename = "intel_addresses.txt";

my $program_time = time();
die "SYNTAX: strip_email_addresses.pl FILE1 FILE2 .. FILE(N)\n"
    unless (@ARGV);

# Load the domains as hash keys so each record needs only one lookup.
my %email_domains = map { chomp; $_, 1 } read_file($domain_filename);


my $csv = Text::CSV_XS->new();

foreach my $infile (@ARGV) {
    my $filename = $infile;
    # Strip only a trailing .csv extension, not every ".csv" in the name.
    $filename =~ s/\.csv$//i;
    # Each record is [ raw line, parsed fields... ]; index 0 keeps the original text.
    my @data = map { $csv->parse($_); [ $_, $csv->fields ] } read_file($infile);


    my (@found, @notfound);
    foreach my $rec (@data) {
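        # $rec->[0] is the raw line, so CSV field N lives at index N+1.
        my $domain = $rec->[$DOMAIN+1];
        if (defined $domain && $email_domains{$domain}) {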
            push @found, $rec->[0];
        } else {
            push @notfound, $rec->[0];
        }
    }

    write_file("${filename}_found.csv", @found);
    write_file("${filename}_notfound.csv", @notfound);
}


undef $csv;
# WARNING: UNTESTED CODE
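
If the domain is not in a column of its own and you still need to match it anywhere in the line, as the original m/$domain/i did, another common way to drop the inner foreach is to join all the domains into one alternation and compile it once with qr//. A minimal sketch, assuming a short domain list (about 15 entries, as in the question); the build_domain_regex name and the sample domains are hypothetical.

```perl
use strict;
use warnings;

# Build one case-insensitive regex that matches any of the given domains.
# quotemeta makes the dots literal, so "example.com" will not match
# "exampleXcom". Empty entries are skipped because an empty alternative
# would match every line.
sub build_domain_regex {
    my @domains = grep { length($_) } @_;
    die "No domains given\n" unless @domains;
    my $alternation = join '|', map { quotemeta($_) } @domains;
    return qr/(?:$alternation)/i;
}

my $domain_re = build_domain_regex('example.com', 'example.org');

for my $line ("alice\@EXAMPLE.com,Alice\n", "bob\@exampleXcom.net,Bob\n") {
    my $verdict = ($line =~ $domain_re) ? 'match' : 'no match';
    printf "%-9s %s", $verdict, $line;
}
```

With this, the body of the line loop reduces to a single test against $domain_re followed by a print to the matching or non-matching handle.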


