Andrej Kastrin wrote on Saturday, 18 February 2006, 14:08:
> Dear Perl users,
>
> I am trying to parse a file with 20,000,000 records, but I ran into
> trouble. To explain my current Perl problem, I have collected my
> previous posts to this list.
>
> I have a bar-separated file (FILE_A):
> name1|10
> name2|20
> name3|5
> name4|30
> etc.
>
> I processed it with the following code:
>
> my %scores;
> while ( <FILE_A> ) {
>     chomp;
>     my ($name, $score) = split /\|/;
>     $scores{$name} = $score;
> }
>
> Then I have another file (FILE_B) which looks like:
> ____________
> ID - 001
> NA - name1
> NA - name2
>
> ID - 002
> NA - name2
> NA - name4
>
> etc.
> ____________
>
> The code below reads each record from FILE_B (ID and NA fields) and sums
> the corresponding NA values from FILE_A:
>
> my ( $ID, %ids );
> while ( <FILE_B> ) {
>     if ( /^ID\s*-\s*(.+)/ ) {
>         $ID = $1;                        # start of a new record
>     }
>     elsif ( /^NA\s*-\s*(.+)/ ) {
>         $ids{ $ID } += $scores{ $1 };    # add this name's score
>     }
> }
>
> for my $id ( keys %ids ) {
>     print "$id | $ids{$id}\n";
> }
>
> So we obtain:
> 001|30  #ID is 001 and 10+20=30
> 002|50  #ID is 002 and 20+30=50
>
> The script works perfectly, but when I try to process larger files (e.g.
> with 20 million records!), it hangs.

This is probably due to insufficient RAM: there are 2 hashes with 20 million
entries to keep in memory. Is the machine swapping? Did you see a lot of hard
disk activity?

> How should I modify this script so
> that I can process each record from FILE_B separately?

I have not thought thoroughly about your particular example, but I think that
beyond a certain amount of data to be processed, or if it is foreseeable that
the amount of data will grow considerably, a database solution is more
appropriate.

The solution (if you have to stick with text input) could include:
- transform the input files into a format suitable for import into the db
- import
- create result table
- write result table data out into file

You could use one of the modules in the DBI group.
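
The steps above could be sketched with DBI roughly as follows. This is only a
minimal sketch under assumptions, not a tested solution for your data: it
assumes DBD::SQLite is installed (any other DBI driver should work similarly),
the table names "scores" and "members" are made up for illustration, and the
sample lines from your post stand in for reading FILE_A and FILE_B line by line.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; ':memory:' keeps the sketch self-contained.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

# Hypothetical table layout: one table per input file.
$dbh->do('CREATE TABLE scores  (name TEXT PRIMARY KEY, score INTEGER)');
$dbh->do('CREATE TABLE members (id TEXT, name TEXT)');

# Sample data from the post; real code would loop over the file handles instead.
my @file_a = ( 'name1|10', 'name2|20', 'name3|5', 'name4|30' );
my @file_b = ( 'ID - 001', 'NA - name1', 'NA - name2', '',
               'ID - 002', 'NA - name2', 'NA - name4' );

# Import: do all inserts inside one transaction, which is much faster.
$dbh->begin_work;
my $ins_score = $dbh->prepare('INSERT INTO scores (name, score) VALUES (?, ?)');
for (@file_a) {
    my ( $name, $score ) = split /\|/;
    $ins_score->execute( $name, $score );
}
my $ins_member = $dbh->prepare('INSERT INTO members (id, name) VALUES (?, ?)');
my $id;
for (@file_b) {
    if    ( /^ID\s*-\s*(.+)/ ) { $id = $1 }
    elsif ( /^NA\s*-\s*(.+)/ ) { $ins_member->execute( $id, $1 ) }
}
$dbh->commit;

# Result: let the database join the two tables and sum the scores per ID.
my $rows = $dbh->selectall_arrayref(
    'SELECT m.id, SUM(s.score) FROM members m
       JOIN scores s ON s.name = m.name
      GROUP BY m.id ORDER BY m.id'
);
my @result = map { "$_->[0]|$_->[1]" } @$rows;
print "$_\n" for @result;

$dbh->disconnect;
```

With the sample data this should print 001|30 and 002|50, as in your post. For
the real 20 million records you would point dbname at a file on disk instead
of :memory:, so that the data does not have to fit into RAM.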

Or redesign the application that produces the input text files so that it
writes directly to a database.

hth,
joe
