Thanks for the input. The only trouble I would have with that is the file size: my files are HUGE, and I don't think the admins around here would like me reading a whole one into memory. What I was thinking was to generate an array of random line numbers, sorted in numerical order, and then run through the file once, printing only those lines.
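The plan above can be sketched in Perl as two subroutines. This is a minimal sketch, not code from the thread. The names random_line_numbers and print_selected_lines are hypothetical, and the sketch assumes you already know the total line count. Duplicates are avoided by storing each chosen number as a hash key, so a repeated draw adds nothing and the loop simply draws again. It also assumes your perl's rand produces enough distinct values to reach every line. If rand had only 16 bits, int rand $n could return at most 65,536 different numbers and the loop below would never collect 500,000 of them. Running perl -V:randbits shows how many bits your build uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pick $k distinct line numbers from 1 .. $n and return them in ascending order.
# Chosen numbers are kept as hash keys, so a duplicate draw changes nothing
# and the loop just draws again until $k distinct numbers exist.
sub random_line_numbers {
    my ($n, $k) = @_;
    die "sample size $k exceeds line count $n\n" if $k > $n;
    my %seen;
    while (scalar(keys %seen) < $k) {
        my $line = 1 + int rand $n;    # integer in 1 .. $n
        $seen{$line} = 1;
    }
    return sort { $a <=> $b } keys %seen;
}

# Read $in_fh once and copy to $out_fh only the lines whose numbers appear
# in the sorted array referenced by $wanted.
sub print_selected_lines {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $j      = 0;    # index of the next wanted line number
    my $lineno = 0;    # number of the line just read
    while (my $line = <$in_fh>) {
        $lineno++;
        last if $j > $#$wanted;    # every wanted line already printed
        if ($lineno == $wanted->[$j]) {
            print {$out_fh} $line;
            $j++;
        }
    }
}
```

To use it, open the input and output files with open(my $in, '<', $path) or die ..., then call print_selected_lines($in, $out, [ random_line_numbers(3_210_008, 500_000) ]). Only the 500,000 sampled numbers are held in memory, not the file's lines.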
The array would contain as many elements as the number of samples needed, i.e. @array == $sampsize (equivalently $#array == $sampsize - 1, since $#array is the last index, not the element count). Then I was thinking something like this:

my ($i, $j) = (0, 0);
select(OUT);
while (<FILE>) {
    $i++;                        # current line of file
    last if $j > $#array;        # every sampled line already printed
    if ( $i == $array[$j] ) {    # $j is the position in the array
        print;
        $j++;
    }
}

The main trouble I am having is creating the random array. I need to make sure there are no dups.

my @array;
while ( @array < $sampsize ) {
    my $rnd = 1 + int rand $numlines;    # $numlines: total lines in the file (hypothetical name)
    # check for dups
    # store in array
}

I think once I have the array created I should be fine. Any ideas?

Thanks
..
..
-J-e-s-s-

Jess Balint wrote:
>
> Hello all, I have a file of 3,210,008 CSV records. I need to take a random
> sample of this. I tried hacking something together a while ago, but it
> seemed to repeat 65,536 different records. When I need a 5mil sample, this
> creates a problem.
>
> Here is my old code: I know the logic allows dups, but what would incur the
> limit? I think with 500,000 samples there wouldn't be a problem getting more
> than 65536 diff records, but that number is too ironic for me to deal with.
> Thanks.
>
> #!/usr/local/bin/perl -w
>
> open (FILE,"consumer.sample.sasdump.txt");
> open (NEW,">consumer.new");
>
> @data = <FILE>;
>
> for ( $jess == 1; $jess < 500000; $jess++ ) {
>     $index = rand @data;
>     print NEW $data[$index];
> }
>
> close(FILE);
> close(NEW);

This should do what you want:

#!/usr/local/bin/perl -w
use strict;

srand;

open FILE, 'consumer.sample.sasdump.txt'
    or die "Cannot open 'consumer.sample.sasdump.txt': $!";
open NEW, '> consumer.new' or die "Cannot open 'consumer.new': $!";

my @data = <FILE>;

for ( 1 .. 500_000 ) {
    print NEW splice @data, rand @data, 1;
}

close FILE;
close NEW;


John
--
use Perl;
program
fulfillment
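John's version reads the whole file into @data, which is exactly what the file-size concern at the top of this message is about. If the total line count is known in advance (the quoted message gives 3,210,008), one alternative that needs neither the slurped array nor a list of line numbers is selection sampling (Knuth's Algorithm S). It is swapped in here and is not the method either poster proposed. This minimal sketch assumes $n is the exact line count, for example obtained beforehand with wc -l. The subroutine name selection_sample is hypothetical. Each line is kept with probability (lines still needed) / (lines still unread), which yields exactly $k distinct lines in their original file order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Selection sampling (Knuth's Algorithm S): one pass over $in_fh, keeping
# each line with probability $needed / $remaining. When $needed equals
# $remaining, rand($remaining) < $needed is always true, so exactly $k lines
# are printed (assuming $n is the true line count), without duplicates and
# in file order.
sub selection_sample {
    my ($in_fh, $out_fh, $n, $k) = @_;    # $n = total lines, $k = sample size
    my $remaining = $n;    # lines not yet read
    my $needed    = $k;    # lines still to be chosen
    while ($needed > 0 and defined(my $line = <$in_fh>)) {
        if (rand($remaining) < $needed) {
            print {$out_fh} $line;
            $needed--;
        }
        $remaining--;
    }
    return $k - $needed;    # number of lines actually printed
}
```

Memory use stays constant regardless of file size, because only the current line is held. The trade-off is that the line count must be right. If $n is too large, fewer than $k lines come out, which the return value lets the caller detect.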