Hi,
Thanks a lot for the help. I have one more question: how can I add
different values from multiple lines to the same hash ref? For example,
given the following snippet of data:
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
            ko00350  Tyrosine metabolism
            ko00625  Chloroalkane and chloroalkene degradation
            ko00626  Naphthalene degradation
I want it to be stored in the following manner:
'2' => {
    'PATHWAY' => {
        'ko00010' => 'Glycolysis / Gluconeogenesis',
        'ko00071' => 'Fatty acid metabolism'
    },
};
Thanks,
Aravind
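One possible way to collect the continuation lines is to remember which field is currently being read and keep adding keys to the same inner hash. The following is only a minimal sketch: the sub name parse_lines, the inline sample record, and the assumption that continuation lines start with whitespace (as they do in KEGG flat files) are illustrative and not taken from the thread.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample record; continuation lines are assumed to be indented.
my @lines = (
    'ENTRY       K00002                      KO',
    'PATHWAY     ko00010  Glycolysis / Gluconeogenesis',
    '            ko00071  Fatty acid metabolism',
    '            ko00350  Tyrosine metabolism',
    '///',
);

sub parse_lines {
    my @input = @_;
    my %kegg;
    my $counter = 1;
    my $field   = '';    # name of the field currently being read

    for my $line (@input) {
        if ( $line =~ m{^///} ) {    # end-of-record marker
            $counter++;
            $field = '';
            next;
        }

        # A word in column 1 opens a new field; an indented line
        # continues the field that was opened most recently.
        my $rest;
        if ( $line =~ /^(\S+)\s*(.*)$/ ) {
            ( $field, $rest ) = ( $1, $2 );
        }
        elsif ( $line =~ /^\s+(.*)$/ ) {
            $rest = $1;
        }
        else {
            next;                    # blank line
        }

        # Add one key per line to the same inner hash instead of
        # assigning a fresh hash each time.
        if ( $field eq 'PATHWAY' and $rest =~ /^(ko\d+)\s+(.*?)\s*$/ ) {
            $kegg{$counter}{PATHWAY}{$1} = $2;
        }
    }
    return \%kegg;
}

my $kegg = parse_lines(@lines);
print Dumper($kegg);
```

Because each line only sets $kegg{$counter}{PATHWAY}{$id}, earlier pathway entries and any other keys already stored for the record are left alone.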
On 6/2/2011 5:06 PM, Rob Coops wrote:
On Thu, Jun 2, 2011 at 4:41 PM, venkates<venka...@nt.ntnu.no> wrote:
On 6/2/2011 2:44 PM, Rob Coops wrote:
On Thu, Jun 2, 2011 at 1:28 PM, venkates<venka...@nt.ntnu.no> wrote:
On 6/2/2011 12:46 PM, John SJ Anderson wrote:
On Thu, Jun 2, 2011 at 06:41, venkates<venka...@nt.ntnu.no> wrote:
Hi,
I want to parse a file with contents that looks as follows:
[ snip ]
Have you considered using this module? ->
<http://search.cpan.org/dist/BioPerl/Bio/SeqIO/kegg.pm>
Alternatively, I think somebody on the BioPerl mailing list was
working on another KEGG parser...
chrs,
j.
I am doing this as an exercise to learn parsing techniques, so I would
appreciate some guidance.
Aravind
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
This is a simple and ugly way of parsing your file:
use strict;
use warnings;
use Carp;
use Data::Dumper;
my $set = parse("ko");
sub parse {
    my $keggFile = shift;
    my $keggHash;
    my $counter = 1;
    # 'or' is needed here: '||' binds tighter than the list operator and
    # would test $keggFile instead of the result of open
    open my $fh, '<', $keggFile or croak("Cannot open file '$keggFile': $!");
    while (<$fh>) {
        chomp;
        if ( $_ =~ m!///! ) {
            $counter++;
            next;
        }
        if ( $_ =~ /^ENTRY\s+(.+?)\s/sm ) {
            ${$keggHash}{$counter} = { 'ENTRY' => $1 };
        }
While trying a similar thing for the DEFINITION record, I found that
instead of being added to the current hash alongside ENTRY and NAME, the
DEFINITION record replaces the contents of the hash:
$VAR1 = {
    '4' => {
        'DEFINITION' => 'U18 small nucleolar RNA'
    },
    '1' => {
        'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]'
    },
    '3' => {
        'DEFINITION' => 'U14 small nucleolar RNA'
    },
    '2' => {
        'DEFINITION' => 'alcohol dehydrogenase (NADP+) [EC:1.1.1.2]'
    },
    '5' => {
        'DEFINITION' => 'U24 small nucleolar RNA'
    }
};
The code, in addition to what you had suggested:
if($_ =~ /^DEFINITION\s{2}(.+)?/){
    ${$keggHash}{$counter} = {'DEFINITION' => $1};
}
        if ( $_ =~ /^NAME\s+(.*)$/sm ) {
            my $temp = $1;
            $temp =~ s/,\s/,/g;
            my @names = split /,/, $temp;
            push @{${$keggHash}{$counter}{'NAME'}}, @names;
        }
    }
    close $fh;
    print Dumper $keggHash;
}
The output being:
$VAR1 = {
    '1' => {
        'NAME' => [
            'E1.1.1.1',
            'adh'
        ],
        'ENTRY' => 'K00001'
    },
    '3' => {
        'NAME' => [
            'U18snoRNA',
            'snR18'
        ],
        'ENTRY' => 'K14866'
    },
    '2' => {
        'NAME' => [
            'U14snoRNA',
            'snR128'
        ],
        'ENTRY' => 'K14865'
    }
};
Which, to me, looks like what you are looking for.
The main thing I did was read the file one line at a time, to prevent an
unexpectedly large file from causing memory issues on your machine (in the
end, the structure that you are building will cause enough issues
when handling a large file).
You already dealt with the ENTRY bit, so I'll leave that as it is, though I
slightly changed the regex; nothing spectacular there.
The NAME bit is simple: I pull out all of the names, remove the space after
each comma, split them into an array, and push the array onto the hash.
Then it is time for the next step, which is up to you ;-)
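As a small illustration of the NAME step, the space removal and the split can also be folded into a single split on a comma with optional surrounding whitespace. This is a variation and not the code from the script above; the sample line and the %record hash are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical NAME line, modelled on the names shown in the output above.
my $line = 'NAME        E1.1.1.1, adh';

my %record;
if ( $line =~ /^NAME\s+(.*)$/ ) {
    # Splitting on "comma plus optional whitespace" strips the spaces
    # and separates the names in one step.
    my @names = split /\s*,\s*/, $1;

    # push creates the NAME array on first use and appends afterwards
    push @{ $record{NAME} }, @names;
}

print join( '|', @{ $record{NAME} } ), "\n";    # prints E1.1.1.1|adh
```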
I hope it helps you a bit, regards,
Rob
What you do: ${$keggHash}{$counter} = {'DEFINITION' => $1};
Try the following instead: ${$keggHash}{$counter}{'DEFINITION'} = $1;
To make things a little clearer look at the following example.
my %hash;
$hash{'Key 1'} = { 'Nested Key 1' => 'Value 1' };
What you do is say: $hash{'Key 1'} = { 'Nested Key 2' => 'Value 2' };
What I do is: $hash{'Key 1'}{'Nested Key 2'} = 'Value 2';
In your script you will end up with the following:
$VAR1 = {
'Key 1' => {
'Nested Key 2' => 'Value 2',
},
};
Where mine will result in:
$VAR1 = {
'Key 1' => {
'Nested Key 1' => 'Value 1',
'Nested Key 2' => 'Value 2',
},
};
Not that much different, but you are basically overwriting the value
( {NAME=>[], ENTRY=>''} ) associated with your key ($counter) with
{ 'DEFINITION' => '' }. If you instead add a new key to the hash that is
associated with your main key ($counter), then you will get the result you
are looking for.
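The difference can be checked with a short stand-alone script. This is only a sketch: it uses the arrow form $keggHash->{$counter}, which is equivalent to ${$keggHash}{$counter}, and hard-codes one ENTRY and one DEFINITION value taken from the dumps earlier in the thread.

```perl
use strict;
use warnings;

my $keggHash = {};
my $counter  = 1;

# Variant 1: assigning a new anonymous hash replaces the whole record,
# so the ENTRY key stored first is lost.
$keggHash->{$counter} = { ENTRY => 'K00001' };
$keggHash->{$counter} = { DEFINITION => 'alcohol dehydrogenase [EC:1.1.1.1]' };
my @after_replace = sort keys %{ $keggHash->{$counter} };

# Variant 2: adding a key to the existing inner hash keeps ENTRY.
$keggHash->{$counter} = { ENTRY => 'K00001' };
$keggHash->{$counter}{DEFINITION} = 'alcohol dehydrogenase [EC:1.1.1.1]';
my @after_add = sort keys %{ $keggHash->{$counter} };

print "replace: @after_replace\n";    # replace: DEFINITION
print "add:     @after_add\n";        # add:     DEFINITION ENTRY
```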
Regards,
Rob