Hi,

Data snippet:

ENTRY       K00002                      KO
NAME        E1.1.1.2, adh
DEFINITION  alcohol dehydrogenase (NADP+) [EC:1.1.1.2]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00561  Glycerolipid metabolism
            ko00930  Caprolactam degradation
CLASS       Metabolism; Carbohydrate Metabolism; Glycolysis / Gluconeogenesis [PATH:ko00010]
            Metabolism; Lipid Metabolism; Glycerolipid metabolism [PATH:ko00561]
            Metabolism; Xenobiotics Biodegradation and Metabolism; Caprolactam degradation [PATH:ko00930]
DBLINKS     RN: R00746 R01041 R05231
            COG: COG0656
            GO: 0008106
GENES       HSA: 10327(AKR1A1)
            PTR: 741418(AKR1A1)
            PON: 100173796(AKR1A1)
            MCC: 693380(AKR1A1)
            MMU: 58810(Akr1a4)
            RNO: 78959(Akr1a1)
            CFA: 610537
///
ENTRY       K00730                      KO
NAME        OST4
DEFINITION  oligosaccharyl transferase complex subunit OST4
PATHWAY     ko00510  N-Glycan biosynthesis
            ko00513  Various types of N-glycan biosynthesis
            ko04141  Protein processing in endoplasmic reticulum
MODULE      M00072  Oligosaccharyltransferase
CLASS       Metabolism; Glycan Biosynthesis and Metabolism; N-Glycan biosynthesis [PATH:ko00510]
            Metabolism; Glycan Biosynthesis and Metabolism; Various types of N-glycan biosynthesis [PATH:ko00513]
            Genetic Information Processing; Folding, Sorting and Degradation; Protein processing in endoplasmic reticulum [PATH:ko04141]
DBLINKS     GO: 0008250
GENES       SCE: YDL232W(OST4)
            AGO: AGOS_ABL170C
            KLA: KLLA0A01287g
            VPO: Kpol_1054p35
            SSL: SS1G_13465
REFERENCE   PMID:15001703
  AUTHORS   Zubkov S, Lennarz WJ, Mohanty S
  TITLE     Structural basis for the function of a minimembrane protein subunit of yeast oligosaccharyltransferase.
  JOURNAL   Proc Natl Acad Sci U S A 101:3821-6 (2004)
///

I need to retrieve all the gene entries from each record and add them to a hash ref. My code does this correctly for the first record, but for the second record it also pulls in the REFERENCE information. I have provided the code below. If someone could tell me exactly where I am going wrong (is it the regex, or something else?), I would be glad!

Code:

use strict;
use warnings;
use Carp;
use Data::Dumper;


my $set = parse("/home/venkates/workspace/KEGG_Parser/data/ko");

sub parse {

    my $kegg_file_path = shift;
    my $keggData; # Hash ref

    open my $fh, '<', $kegg_file_path or croak("Cannot open file '$kegg_file_path': $!");
    local $/ = "\n///\n";
    while (<$fh>){
        chomp;
        my $record = $_;
        $record =~ m/^ENTRY\s{7}(.+?)\s+/xms;
        my $entries = $1;
        if ($record =~ m/^GENES\s{7}(.+)$/xms){
            my $gene = $1;
            ${$keggData}{$entries}{'GENE'} = $gene;
            my @genes = split ('\s{13}', $gene);
            foreach my $gene_element (@genes){
                my $taxon_label = substr($gene_element, 0, 3);
                my $gene_label = substr($gene_element, 5);
                my @gene_label_array = split '\s', $gene_label;
                push @{${$keggData}{$entries}{'GENES'}{$taxon_label}}, @gene_label_array;
            }
        }

    }
    print Dumper($keggData);
    close $fh;
}
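
One possible cause: with the /s modifier, "." also matches newlines, so the greedy (.+) in the GENES match can run to the end of the record and take any REFERENCE lines with it. Below is a minimal sketch of a narrower capture. It assumes every field keyword (GENES, REFERENCE, ...) starts in column 1 and that continuation lines are always indented. The helper name extract_genes is hypothetical and not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the GENES block out of one record and
# return a hash ref of taxon => [gene labels].
sub extract_genes {
    my ($record) = @_;
    my %genes;

    # Lazy capture, stopped by a lookahead at the next line that begins
    # with a non-space character (the next field keyword) or at the end
    # of the record. Assumption: continuation lines are always indented.
    if ( $record =~ m/^GENES\s+(.+?)(?=^\S|\z)/ms ) {
        for my $line ( split /\n/, $1 ) {
            # Each line looks like "HSA: 10327(AKR1A1)"
            my ( $taxon, $labels ) = $line =~ m/^\s*(\w+):\s+(.*\S)/
                or next;
            push @{ $genes{$taxon} }, split ' ', $labels;
        }
    }
    return \%genes;
}

# Cut-down version of the second record from the data snippet.
my $record = <<'END';
ENTRY       K00730                      KO
GENES       SCE: YDL232W(OST4)
            AGO: AGOS_ABL170C
REFERENCE   PMID:15001703
  AUTHORS   Zubkov S, Lennarz WJ, Mohanty S
END

my $genes = extract_genes($record);
print join( ',', sort keys %$genes ), "\n";    # prints "AGO,SCE"
```

Matching the taxon code with (\w+) instead of substr($gene_element, 0, 3) is a design choice of this sketch, so that it does not depend on the code being exactly three characters long.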

Thanks,

Aravind
