Hi,

I want to parse a file whose contents look as follows:

ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
                      ko00071  Fatty acid metabolism
///
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis [BR:ko03009]
///
ENTRY       K14866                      KO
NAME        U18snoRNA, snR18
DEFINITION  U18 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis [BR:ko03009]
///

Each record ends with "///". The ultimate aim is to store the information from each record (for instance ENTRY and NAME) in a data structure (a hash) such as (ENTRY => K14865, NAME => [U14snoRNA, snR128], and so on).
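
To make the target concrete, here is a rough sketch of what I have in mind for a single record. The variable name %record is only for illustration, and storing NAME as an array reference is my assumption about how the comma-separated names would end up being split:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical target layout for one record (values taken from the K14865 sample above).
# Keeping NAME as an array reference is an assumption, not something the parser does yet.
my %record = (
    ENTRY => 'K14865',
    NAME  => [ 'U14snoRNA', 'snR128' ],
);

print Dumper(\%record);
```

Other fields such as DEFINITION or CLASS would presumably become further keys in the same hash.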

To start off, I have produced the following snippet:

use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = parse("D:/workspace/KEGG_Parser/data/ko");


sub parse {
    my $keggFile = shift;
    my $keggHash;
    # 'or' binds more loosely than '||', so croak fires when open fails
    open my $fh, '<', $keggFile or croak("Cannot open file '$keggFile': $!");
    # slurp the whole file, then split it into records on the "///" separator
    my $contents = do { local $/; <$fh> };
    close $fh;
    my @rec = split('///', $contents);

    foreach my $line (@rec) {
        next if ($line =~ /^\s*$/);
        if ($line =~ /^ENTRY\s{7}(.+?)\s+/) {
            $keggHash->{'ENTRY'} = $1;
        }
        elsif ($line =~ /^NAME\s{8}(.+?)$/) {
            push @{$keggHash->{'NAME'}}, $1;
        }
    }
    print Dumper($keggHash);
    return $keggHash;
}

The output I get is

$VAR1 = {
          'ENTRY' => 'K00001'
        };

Not all the lines in each element of @rec are getting read. I would appreciate it if somebody could guide me through this.

Thanks to all,

Aravind
