Hi,
Data snippet:
ENTRY       K00002 KO
NAME        E1.1.1.2, adh
DEFINITION  alcohol dehydrogenase (NADP+) [EC:1.1.1.2]
PATHWAY     ko00010 Glycolysis / Gluconeogenesis
            ko00561 Glycerolipid metabolism
            ko00930 Caprolactam degradation
CLASS       Metabolism; Carbohydrate Metabolism; Glycolysis /
            Gluconeogenesis [PATH:ko00010]
            Metabolism; Lipid Metabolism; Glycerolipid metabolism
            [PATH:ko00561]
            Metabolism; Xenobiotics Biodegradation and Metabolism;
            Caprolactam degradation [PATH:ko00930]
DBLINKS     RN: R00746 R01041 R05231
            COG: COG0656
            GO: 0008106
GENES       HSA: 10327(AKR1A1)
            PTR: 741418(AKR1A1)
            PON: 100173796(AKR1A1)
            MCC: 693380(AKR1A1)
            MMU: 58810(Akr1a4)
            RNO: 78959(Akr1a1)
            CFA: 610537
///
ENTRY       K00730 KO
NAME        OST4
DEFINITION  oligosaccharyl transferase complex subunit OST4
PATHWAY     ko00510 N-Glycan biosynthesis
            ko00513 Various types of N-glycan biosynthesis
            ko04141 Protein processing in endoplasmic reticulum
MODULE      M00072 Oligosaccharyltransferase
CLASS       Metabolism; Glycan Biosynthesis and Metabolism; N-Glycan
            biosynthesis [PATH:ko00510]
            Metabolism; Glycan Biosynthesis and Metabolism; Various
            types of N-glycan biosynthesis [PATH:ko00513]
            Genetic Information Processing; Folding, Sorting and
            Degradation; Protein processing in endoplasmic reticulum [PATH:ko04141]
DBLINKS     GO: 0008250
GENES       SCE: YDL232W(OST4)
            AGO: AGOS_ABL170C
            KLA: KLLA0A01287g
            VPO: Kpol_1054p35
            SSL: SS1G_13465
REFERENCE   PMID:15001703
  AUTHORS   Zubkov S, Lennarz WJ, Mohanty S
  TITLE     Structural basis for the function of a minimembrane protein
            subunit of yeast oligosaccharyltransferase.
  JOURNAL   Proc Natl Acad Sci U S A 101:3821-6 (2004)
///
I need to retrieve all the gene entries from each record and add them to
a hash ref. My code does that correctly for the first record, but for the
second record it also pulls in the REFERENCE information that follows the
GENES field. I have provided the code below. If someone could tell me
where exactly I am going wrong (is it the regex, or something else?), I
would be glad!
Code:
use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = parse("/home/venkates/workspace/KEGG_Parser/data/ko");

sub parse {
    my $kegg_file_path = shift;
    my $keggData;    # Hash ref
    open my $fh, '<', $kegg_file_path
        or croak("Cannot open file '$kegg_file_path': $!");
    local $/ = "\n///\n";
    while (<$fh>) {
        chomp;
        my $record = $_;
        $record =~ m/^ENTRY\s{7}(.+?)\s+/xms;
        my $entries = $1;
        if ($record =~ m/^GENES\s{7}(.+)$/xms) {
            my $gene = $1;
            ${$keggData}{$entries}{'GENE'} = $gene;
            my @genes = split ('\s{13}', $gene);
            foreach my $gene_element (@genes) {
                my $taxon_label = substr($gene_element, 0, 3);
                my $gene_label = substr($gene_element, 5);
                my @gene_label_array = split '\s', $gene_label;
                push @{${$keggData}{$entries}{'GENES'}{$taxon_label}},
                    @gene_label_array;
            }
        }
    }
    print Dumper($keggData);
    close $fh;
}
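My current guess, offered only as a sketch and not yet checked against the full ko file: with the /s modifier, the greedy (.+) in the GENES regex runs to the end of the record, so anything after GENES (here REFERENCE) gets captured too. Bounding the capture with a lookahead for the next line that starts in column one might avoid that. The helper name extract_genes and the inline sample record are hypothetical, and the sample assumes the 12-column field layout.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { TAXON => [gene labels] } for one KO record.
sub extract_genes {
    my ($record) = @_;
    my %genes;
    # Non-greedy capture that stops at the next line beginning with a
    # non-space character (assumed to be the next field key) or at the
    # end of the record.
    if ( $record =~ m/^GENES\s+(.+?)(?=^\S|\z)/ms ) {
        for my $line ( split /\n\s*/, $1 ) {
            # Each line is expected to look like "SCE: YDL232W(OST4)".
            my ( $taxon, $labels ) = $line =~ m/^(\w+):\s+(.*\S)/ or next;
            push @{ $genes{$taxon} }, split ' ', $labels;
        }
    }
    return \%genes;
}

# Inline sample, assumed to mirror the layout of the real file.
my $record = join "\n",
    'ENTRY       K00730 KO',
    'GENES       SCE: YDL232W(OST4)',
    '            AGO: AGOS_ABL170C',
    'REFERENCE   PMID:15001703',
    '  AUTHORS   Zubkov S, Lennarz WJ, Mohanty S';

my $genes = extract_genes($record);
print join( ',', sort keys %{$genes} ), "\n";
```

In this sketch the lookahead, rather than a fixed count of spaces, decides where the GENES block ends, so it should not matter which field follows GENES in a given record.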
Thanks,
Aravind