Hi,

Here is a snippet of the data:

ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
            ko00350  Tyrosine metabolism
            ko00625  Chloroalkane and chloroalkene degradation
            ko00626  Naphthalene degradation
            ko00830  Retinol metabolism
            ko00980  Metabolism of xenobiotics by cytochrome P450
            ko00982  Drug metabolism - cytochrome P450
///
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis [BR:ko03009]
///
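In the snippet, each line is one of two kinds. A field line starts with an upper-case keyword followed by its value. A continuation line starts with whitespace and belongs to the previous field, like the extra PATHWAY lines. Below is a minimal sketch that groups one record's lines this way. It assumes that layout holds for the whole file, and `record_fields` is a hypothetical helper name, not part of any KEGG library:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text of one KEGG record into a list of
# [FIELD, [values...]] pairs, in the order the fields first appear.
# Assumes a field line starts with an upper-case keyword in column 1 and a
# continuation line starts with whitespace and belongs to the last field.
sub record_fields {
    my ($text) = @_;
    my (@order, %lines, $current);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                     # skip blank lines
        if ($line =~ /^([A-Z_]+)\s+(.*?)\s*$/) {      # new field line
            $current = $1;
            push @order, $current unless exists $lines{$current};
            push @{ $lines{$current} }, $2;
        }
        elsif (defined $current && $line =~ /^\s+(.*?)\s*$/) {
            push @{ $lines{$current} }, $1;           # continuation line
        }
    }
    return map { [ $_, $lines{$_} ] } @order;
}

my $record = <<'END';
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
END

for my $pair (record_fields($record)) {
    my ($field, $values) = @$pair;
    print "$field: ", join(' | ', @$values), "\n";
}
```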

I am trying to split the file on the "///" lines and store each record in a hash, using the ENTRY number as the key and keeping all the other fields under that key, like this:

$VAR1 = {
    'K00001' => {
        'NAME'       => [ 'E1.1.1.1', 'adh' ],
        'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]',
        'PATHWAY'    => {
            'ko00010' => 'Glycolysis / Gluconeogenesis',
            'ko00071' => 'Fatty acid metabolism',
        },
    },
};
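The target is a hash of hashes. The outer key is the ENTRY id, and the type of each inner value depends on the field. The sketch below builds and reads that structure by hand for K00001, using only values from the snippet, to show how nested hash and array references (and Perl's autovivification) fit together:

```perl
use strict;
use warnings;
use Data::Dumper;

# Illustration only: the values below are copied from record K00001.
my %kegg;

# NAME holds several comma-separated names, so store them as an array ref.
$kegg{K00001}{NAME} = [ split /\s*,\s*/, 'E1.1.1.1, adh' ];

# DEFINITION is a single line, so a plain string is enough.
$kegg{K00001}{DEFINITION} = 'alcohol dehydrogenase [EC:1.1.1.1]';

# Each PATHWAY line is "id  description": split on the first run of spaces
# and store the pair in a nested hash. Perl creates $kegg{K00001}{PATHWAY}
# on first use (autovivification), so no explicit initialisation is needed.
for my $line ('ko00010  Glycolysis / Gluconeogenesis',
              'ko00071  Fatty acid metabolism') {
    my ($id, $desc) = split /\s+/, $line, 2;
    $kegg{K00001}{PATHWAY}{$id} = $desc;
}

# Reading values back out uses the same chain of keys.
print $kegg{K00001}{PATHWAY}{ko00071}, "\n";        # Fatty acid metabolism
print join(', ', @{ $kegg{K00001}{NAME} }), "\n";   # E1.1.1.1, adh

$Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper(\%kegg);
```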

I have started off with the following code:

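use strict;
use warnings;
use Carp qw(croak);     # croak() is not built in
use Data::Dumper;       # Dumper() is not built in

sub parse {
    my $kegg_file_path = shift;
    my %keggData;
    # 'or' binds more loosely than '||', so croak runs if open fails
    open my $fh, '<', $kegg_file_path
        or croak("Cannot open file '$kegg_file_path': $!");
    my $contents = do { local $/; <$fh> };   # slurp: $/ is undef only inside the block
    close $fh;
    # one element per record, split on the lines that contain only ///
    my @records = split m{^///\s*$}m, $contents;
    foreach my $record (@records) {
        # /m lets ^ match at the start of any line, not only at the start
        # of the string (every record after the first begins with "\n")
        if ($record =~ /^ENTRY\s+(\S+)/m) {
            $keggData{$1} = {};   # one hash per record, keyed by its ENTRY id
        }
    }
    print Dumper(\%keggData);   # pass a reference so the hash dumps as one structure
    return \%keggData;
}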
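Instead of slurping the whole file and then splitting it, Perl can read one record at a time: set the input record separator `$/` to the record terminator. The sketch below assumes every record ends with a `///` line followed by a newline. `entry_ids` is a hypothetical helper that only collects the ENTRY ids:

```perl
use strict;
use warnings;

# Alternative to slurp + split: with $/ set to the record terminator, each
# <$fh> returns one whole record. Assumes every record ends with "///\n".
sub entry_ids {
    my ($fh) = @_;
    my @ids;
    local $/ = "///\n";                  # restored automatically when the sub returns
    while (my $record = <$fh>) {
        chomp $record;                   # chomp removes the current $/, i.e. "///\n"
        # /m makes ^ match at the start of every line in the record
        push @ids, $1 if $record =~ /^ENTRY\s+(\S+)/m;
    }
    return @ids;
}

# Demo on an in-memory file; a real file would be opened with
#   open my $fh, '<', $path or die "Cannot open '$path': $!";
my $text = "ENTRY       K00001                      KO\n///\n"
         . "ENTRY       K14865                      KO\n///\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print join(', ', entry_ids($fh)), "\n";   # K00001, K14865
close $fh;
```

This keeps memory use proportional to one record rather than the whole file, which can matter for large KEGG dumps.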


But I'm not sure how to proceed from here to build the data structure shown above. I am new to Perl and am trying to learn how to parse files, so any help would be much appreciated.
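Putting the pieces together, here is a minimal sketch of a parser that produces the structure shown above. It assumes the layout seen in the snippet: records end with `///`, field lines start with an upper-case keyword, and continuation lines start with whitespace. It special-cases only NAME (an array of names) and PATHWAY (a hash of id to description). Every other field, such as DEFINITION or CLASS, is stored as a single string. `parse_kegg` is a hypothetical name. Real KEGG files contain more fields, and those may need their own handling:

```perl
use strict;
use warnings;
use Data::Dumper;

# Sketch only: NAME -> array ref, PATHWAY -> hash ref, other fields -> string.
sub parse_kegg {
    my ($fh) = @_;
    my %kegg;
    local $/ = "///\n";                       # read one record per <$fh>
    while (my $record = <$fh>) {
        chomp $record;                        # drop the trailing "///\n"
        my ($entry, $field, %data);           # fresh %data for every record
        for my $line (split /\n/, $record) {
            next if $line =~ /^\s*$/;
            my $value;
            if ($line =~ /^([A-Z_]+)\s+(.*?)\s*$/) {
                ($field, $value) = ($1, $2);  # new field line
            }
            elsif (defined $field && $line =~ /^\s+(.*?)\s*$/) {
                $value = $1;                  # continuation of $field
            }
            else { next }

            if ($field eq 'ENTRY') {
                ($entry) = split /\s+/, $value;   # "K00001   KO" -> "K00001"
            }
            elsif ($field eq 'NAME') {
                push @{ $data{NAME} }, split /\s*,\s*/, $value;
            }
            elsif ($field eq 'PATHWAY') {
                my ($id, $desc) = split /\s+/, $value, 2;
                $data{PATHWAY}{$id} = $desc;
            }
            else {
                # other fields: join continuation lines with a space
                $data{$field} = defined $data{$field}
                              ? "$data{$field} $value" : $value;
            }
        }
        $kegg{$entry} = \%data if defined $entry;
    }
    return \%kegg;
}

# Usage with a real file (the path is a placeholder):
#   open my $fh, '<', $path or die "Cannot open '$path': $!";
#   my $kegg = parse_kegg($fh);
my $sample = <<'END';
ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
///
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $kegg = parse_kegg($fh);
close $fh;
$Data::Dumper::Sortkeys = 1;
print Dumper($kegg);
```

Taking a filehandle rather than a path keeps the sub easy to try on in-memory data. To support another multi-valued field, add an `elsif` branch modelled on the PATHWAY one.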

Thanks,

Aravind


