web crawler

2012-12-15 Thread venkates
Hi, I am trying to write a minimal web crawler. The aim is to discover new URLs from the seed and crawl these new URLs further. The code is as follows: use strict; use warnings; use Carp; use Data::Dumper; use WWW::Mechanize; my $url = "http://foobar.com"; # example my %links; my $mech = W
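The post's code is cut off above, so the following is only a minimal sketch of the stated goal (discover new URLs from a seed and crawl them further), not the poster's implementation. It uses a breadth-first queue plus a %seen hash so each URL is visited once. The %links_on table and extract_links() are hypothetical stand-ins for the fetch step; in a real crawler that step could call $mech->get($url) and read $mech->links from WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical link graph standing in for real HTTP fetches (assumption:
# none of these pages or links come from the original post).
my %links_on = (
    'http://foobar.com'   => [ 'http://foobar.com/a', 'http://foobar.com/b' ],
    'http://foobar.com/a' => [ 'http://foobar.com/b', 'http://foobar.com' ],
    'http://foobar.com/b' => [ 'http://foobar.com/c' ],
);

# Hypothetical placeholder for "fetch the page and extract its links".
sub extract_links {
    my ($url) = @_;
    return @{ $links_on{$url} || [] };
}

my @queue = ('http://foobar.com');      # seed URL
my %seen  = map { $_ => 1 } @queue;     # URLs already queued or visited
my @order;                              # visit order, for inspection

while ( defined( my $url = shift @queue ) ) {
    push @order, $url;
    for my $link ( extract_links($url) ) {
        next if $seen{$link}++;         # skip anything queued before
        push @queue, $link;             # newly discovered URL
    }
}

print "$_\n" for @order;
```

Marking a URL as seen when it is queued, rather than when it is fetched, keeps the same link from being queued twice when several pages point to it.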

Sluggish code

2012-06-11 Thread venkates
Hi all, I am trying to filter files from a directory (code provided below) by comparing the contents of each file with a hash ref (a parsed id map file provided as an argument). The code works, but it is extremely slow. The .csv files (81 files) that I am reading are not very large (l

Parsing repetitive lines

2012-05-03 Thread venkates
Hi all, I am trying to parse a tab-delimited file which has repeating lines. This is causing problems while parsing it into the data structure (see below). I would appreciate it if you could help me solve this. Thanks, Aravind sub parse { my $pazar_file_path = shift; my $pazar_data; #

Error message: Use of uninitialized value in concatenation (.) or string

2012-02-13 Thread venkates
Hi All, foreach my $ncbi_tax_id ( keys %{$new_proteins}) { my %kegg_map = (); my $up_tax_map = read_map ( "$up_maps_dir/$taxon_labels{$ncbi_tax_id}.map"); foreach my $gene_id ( keys %{$new_proteins->{$ncbi_tax_id}}) { foreach my $up_ac ( keys
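The snippet above is truncated, so the actual source of the warning is not visible. One common way to get "Use of uninitialized value in concatenation (.) or string" is interpolating a hash element whose key is missing, as could happen with $taxon_labels{$ncbi_tax_id} in the path string. The sketch below assumes that cause and guards the lookup before building the path; all values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: the variable names mirror the snippet above, but
# the directory, labels, and IDs are assumptions made for this example.
my $up_maps_dir  = '/tmp/maps';
my %taxon_labels = ( 9606 => 'human' );
my @tax_ids      = ( 9606, 10090 );     # 10090 deliberately has no label

my ( @paths, @skipped );
for my $ncbi_tax_id (@tax_ids) {
    my $label = $taxon_labels{$ncbi_tax_id};

    # Interpolating an undefined $label would trigger the warning, so
    # record the ID and move on instead of building a bogus path.
    unless ( defined $label ) {
        push @skipped, $ncbi_tax_id;
        next;
    }
    push @paths, "$up_maps_dir/$label.map";
}

print "map file: $_\n"       for @paths;
print "no label for: $_\n"   for @skipped;
```

Collecting the skipped IDs makes it easy to see which keys are missing from the label table rather than silently reading a file named ".map".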

solution for Regex

2011-06-09 Thread venkates
second case it also pulls out the REFERENCE information. I have provided the code below. If someone could tell me where exactly I am going wrong (is it in the regex, or elsewhere?) I would be glad! code: use strict; use warnings; use Carp; use Data::Dumper; my $set = parse("/home/ven

Adding file contents into hashes

2011-06-07 Thread venkates
Hi, This is a snippet of the data ENTRY K1 KO NAME E1.1.1.1, adh DEFINITION alcohol dehydrogenase [EC:1.1.1.1] PATHWAY ko00010 Glycolysis / Gluconeogenesis ko00071 Fatty acid metabolism ko00350 Tyrosine metabolism

Re: Parsing file

2011-06-02 Thread venkates
=> 'Glycolysis / Gluconeogenesis' 'ko00071' => ' Fatty acid metabolism' }, }; Thanks, Aravind On 6/2/2011 5:06 PM, Rob Coops wrote: On Thu, Jun 2, 2011 at 4:41 PM, venkates wrote: On 6/2/2011 2:44 PM,

Re: Parsing file

2011-06-02 Thread venkates
On 6/2/2011 2:44 PM, Rob Coops wrote: On Thu, Jun 2, 2011 at 1:28 PM, venkates wrote: On 6/2/2011 12:46 PM, John SJ Anderson wrote: On Thu, Jun 2, 2011 at 06:41, venkates wrote: Hi, I want to parse a file with contents that look as follows: [ snip ] Have you considered using this

Re: Parsing file

2011-06-02 Thread venkates
On 6/2/2011 12:46 PM, John SJ Anderson wrote: On Thu, Jun 2, 2011 at 06:41, venkates wrote: Hi, I want to parse a file with contents that look as follows: [ snip ] Have you considered using this module? -> <http://search.cpan.org/dist/BioPerl/Bio/SeqIO/kegg.pm> Alternatively

Parsing file

2011-06-02 Thread venkates
Hi, I want to parse a file with contents that look as follows: ENTRY K1 KO NAME E1.1.1.1, adh DEFINITION alcohol dehydrogenase [EC:1.1.1.1] PATHWAY ko00010 Glycolysis / Gluconeogenesis ko00071 Fatty acid metabolism /// ENTRY
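The record above is flattened in this archive view, so its line layout is not recoverable with certainty. The sketch below assumes the usual flat-file convention: a keyword at column 0 starts a field, indented lines continue the previous field, and "///" ends a record. It builds a nested hash keyed by entry ID, with PATHWAY stored as an ID-to-name hash, similar to the structure quoted in the replies. The sample text and hash layout are assumptions, not the poster's file or code.

```perl
use strict;
use warnings;

# Assumed layout (the archive shows the record on a single line, so the
# line breaks and indentation here are a guess at the original file).
my $text = <<'END';
ENTRY       K1                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
///
END

my ( %data, $entry, $field );
for my $line ( split /\n/, $text ) {
    if ( $line =~ m{^///} ) {              # record terminator: reset state
        ( $entry, $field ) = ( undef, undef );
        next;
    }

    my $value;
    if ( $line =~ /^([A-Z]+)\s+(.*)$/ ) {  # keyword at column 0: new field
        ( $field, $value ) = ( $1, $2 );
    }
    elsif ( $line =~ /^\s+(\S.*)$/ ) {     # indented: continues last field
        $value = $1;
    }
    else { next; }                         # blank or unrecognised line
    next unless defined $field;

    if ( $field eq 'ENTRY' ) {
        ($entry) = split ' ', $value;      # first token is the entry ID
    }
    elsif ( !defined $entry ) { next; }    # data before any ENTRY line
    elsif ( $field eq 'PATHWAY' ) {
        my ( $id, $name ) = split ' ', $value, 2;
        $data{$entry}{PATHWAY}{$id} = $name;
    }
    else {
        # Simplification: a continuation line would overwrite the value.
        $data{$entry}{$field} = $value;
    }
}

print "$_ => $data{K1}{PATHWAY}{$_}\n" for sort keys %{ $data{K1}{PATHWAY} };
```

Splitting the pathway line with a limit of 2 keeps the full pathway name intact even though it contains spaces and a slash. For production use, the Bio::SeqIO::kegg module suggested in the thread may be a better fit than a hand-written parser.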

How do I remove repeating lines from an array?

2010-09-30 Thread venkates
Hi all, I have written a piece of code which takes in two lists of terms and retrieves the intersection between them. The code works fine, but the intersection list contains redundant terms. How do I fix this? The code is given below. #!/usr/bin/perl -w use Carp; use stri
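Since the posted code is cut off, here is a minimal sketch of one common way to get a duplicate-free intersection: index one list in a hash, then filter the other with grep while a %seen hash drops repeats. The two input lists are invented for illustration and are not the poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input lists; both contain repeated terms on purpose.
my @list_a = qw(kinase binding transport kinase);
my @list_b = qw(binding kinase kinase membrane);

# Index the second list so membership tests are hash lookups.
my %in_b = map { $_ => 1 } @list_b;

# Keep terms of the first list that also occur in the second, and let
# %seen suppress any term that has already been emitted.
my %seen;
my @intersection = grep { $in_b{$_} && !$seen{$_}++ } @list_a;

print join( ",", @intersection ), "\n";    # prints "kinase,binding"
```

The result keeps the order in which terms first appear in the first list; sort it afterwards if a canonical order is needed.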