On 11-10-12 10:01 AM, Nathalie Conte wrote:
Hi all,
I have two sets of files I want to compare, and I don't know where to start
to get what I want :(
I have a reference file (see ref.txt for an example) with a chromosome name, a
start and an end position:
Chr7 115249090 115859515
Chr8 25255496 29565459
Chr13 198276698 298299815
ChrX 109100951 109130998
and I have a file (file_test) that I want to parse against this
reference ref.txt:
Chr1 115249098 Chr1 1362705 Chr8 25255996 Chr8 1362714 Chr1 1362735 ChrX
109100997
So if a position in file_test falls within a range in the reference file, it
is kept in a new file; if not, it is discarded.
I am looking for advice/modules I could use to compare those two files.
Many thanks in advance for any tips
Nat
Try:
#!/usr/bin/env perl
use strict;
use warnings;

# file names; change as needed
my $ref_file  = 'ref.txt';
my $data_file = 'test.txt';

# a hash to hold the start and end positions from the ref file
my %ref = ();

# main
load_ref();
scan();

# load the ref file into %ref
sub load_ref {
    open my $ref_fh, '<', $ref_file or die "could not open $ref_file: $!\n";
    while( my $line = <$ref_fh> ){
        # extract the items from the line
        my ( $id, $start, $end ) = split ' ', $line;
        # store as HoH (hash of hashes), keyed by chromosome name
        $ref{$id} = {
            start => $start,
            end   => $end,
        };
    }
    close $ref_fh;
}

# scan the data file and print the pairs that fall inside a ref range
sub scan {
    open my $data_fh, '<', $data_file or die "could not open $data_file: $!\n";
    while( my $line = <$data_fh> ){
        # extract each pair of IDs and numbers
        while( $line =~ m{ \s* (\S+) \s* (\S+) }gmsx ){
            my $id     = $1;
            my $number = $2;
            # see if the number is between the start and end (inclusive)
            if( exists $ref{$id}
                && $ref{$id}{start} <= $number
                && $number <= $ref{$id}{end}
            ){
                # output goes to STDOUT; redirect it to save the kept pairs
                printf "%-7s % 15s\n", $id, $number;
            }
        }
    }
    close $data_fh;
}
__END__
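
One limitation of the script above: %ref holds a single start/end pair per
chromosome name, so if ref.txt listed the same chromosome on two lines, the
later line would replace the earlier one. It also prints to STDOUT rather
than to a new file. Below is a minimal sketch of a variant that keeps every
range per chromosome and can write the kept pairs to an output file. The sub
names (parse_ref, in_ref, filter_line, filter_file), the argument order, and
the second Chr8 range in the demo data are my own assumptions for
illustration; none of them come from the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build a hash of arrays: chromosome name => list of [start, end] ranges.
sub parse_ref {
    my @lines = @_;
    my %ranges;
    for my $line (@lines) {
        my ( $id, $start, $end ) = split ' ', $line;
        next unless defined $end;    # skip blank or short lines
        push @{ $ranges{$id} }, [ $start, $end ];
    }
    return \%ranges;
}

# True if $pos lies inside any range recorded for $id (ends inclusive).
sub in_ref {
    my ( $ranges, $id, $pos ) = @_;
    for my $range ( @{ $ranges->{$id} || [] } ) {
        return 1 if $range->[0] <= $pos && $pos <= $range->[1];
    }
    return 0;
}

# Return the "id position" pairs on one data line that fall inside a range.
sub filter_line {
    my ( $ranges, $line ) = @_;
    my @kept;
    while ( $line =~ m{ (\S+) \s+ (\d+) }gx ) {
        my ( $id, $pos ) = ( $1, $2 );
        push @kept, "$id $pos" if in_ref( $ranges, $id, $pos );
    }
    return @kept;
}

# Read the ref and data files, write the kept pairs to the output file.
sub filter_file {
    my ( $ref_file, $data_file, $out_file ) = @_;
    open my $ref_fh, '<', $ref_file or die "could not open $ref_file: $!\n";
    my $ranges = parse_ref(<$ref_fh>);
    close $ref_fh;

    open my $in_fh,  '<', $data_file or die "could not open $data_file: $!\n";
    open my $out_fh, '>', $out_file  or die "could not open $out_file: $!\n";
    while ( my $line = <$in_fh> ) {
        print {$out_fh} "$_\n" for filter_line( $ranges, $line );
    }
    close $in_fh;
    close $out_fh or die "could not close $out_file: $!\n";
}

if ( @ARGV == 3 ) {
    # usage: script ref_file data_file out_file
    filter_file(@ARGV);
}
else {
    # demo on the sample data from the question
    my $ranges = parse_ref(
        'Chr7 115249090 115859515',
        'Chr8 25255496 29565459',
        'Chr8 1362700 1362720',    # hypothetical extra range, not in the post
        'Chr13 198276698 298299815',
        'ChrX 109100951 109130998',
    );
    my $line = 'Chr1 115249098 Chr1 1362705 Chr8 25255996 '
             . 'Chr8 1362714 Chr1 1362735 ChrX 109100997';
    print "$_\n" for filter_line( $ranges, $line );
}
```

Run with no arguments, it runs the demo and should print Chr8 25255996,
Chr8 1362714 and ChrX 109100997, one pair per line. Run with three arguments
(say ref.txt test.txt kept.txt, where kept.txt is a made-up name), it writes
the kept pairs to the third file. The ranges are checked one by one, which is
fine for a handful of ranges per chromosome; a very large reference file
might call for sorting the ranges first.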
--
Just my 0.00000002 million dollars worth,
Shawn