Extracting Columns from tab delimited files

Tiago Hori Sun, 10 Feb 2013 17:58:29 -0800

Hi All,

I am trying to force myself not to use one of Perl's modules for parsing
tab-delimited files (like Text::CSV), so please be patient and don't tell
me to just go and use them. I am trying to reinvent the wheel, so to speak,
because, as we do in science, we repeat experiments to learn about the
process even though we know the outcome.


So I started by reading in the files one line at a time, splitting each
line into an array and matching the row name against a list of rows of
interest. With join I could then turn the matching array back into a scalar
and print it out. That is almost what I wanted (see code below):

#!/usr/bin/perl
use strict;
use warnings;

my ($filename_data, $filename_target) = @ARGV;
die "Usage: $0 data_file target_file\n" unless defined $filename_target;

# Read the row names of interest, one per line, and chomp them once.
open my $target_fh, '<', $filename_target
    or die "Cannot open $filename_target: $!";
chomp(my @targets = <$target_fh>);
close $target_fh;

open my $data_fh, '<', $filename_data
    or die "Cannot open $filename_data: $!";

my $line_number = 0;
while (my $line = <$data_fh>) {
    chomp $line;
    $line_number++;
    my @elements = split /\t/, $line;

    # The first line is the header, so always print it.
    if ($line_number == 1) {
        print join("\t", @elements), "\n";
        next;
    }

    my $row_name = $elements[0];
    next unless defined $row_name;    # skip empty lines

    # Print the row if its name matches one of the targets.
    foreach my $target (@targets) {
        if ($row_name eq $target) {
            print join("\t", @elements), "\n";
        }
    }
}

close $data_fh;

Realistically, I don't want the whole row, so I started thinking about how
to get specific columns. From what I have read on the internet, the idea
seems to be to place the arrays containing the lines in a hash indexed by
the row names. So I did this:

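#!/usr/bin/perl

use strict;
use warnings;

my %hash;
open my $fh, '<', 'test.txt' or die "Cannot open test.txt: $!";

while (my $line = <$fh>) {
    chomp $line;
    my @array = split /\t/, $line;
    my $key = shift @array;    # the first field is the row name
    print $key, "\n";
    # Store the remaining fields under the row name
    # (they are appended if a row name repeats).
    push @{ $hash{$key} }, @array;
}

close $fh;

# This assumes test.txt has a line whose first field is "Row".
my $out = join("\t", @{ $hash{Row} });
print $out, "\n";
print $hash{Row}[1], "\n";    # second field after the row name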

That works fine: each line is split into its elements, and the resulting
array is stored in the hash under its row name.

So my questions are: is this the best way of accomplishing these two tasks?
I am really looking for suggestions to improve what I am doing. Is using
hashes the best way?

I have also been trying to figure out a way of extracting the columns and
placing them in a hash, a hash of arrays, or an array. I can't figure it
out or find it on the internet. Can someone give me directions? I am not
asking for code, but some ideas or some pseudocode would be great!
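
To make the goal concrete, the closest I can picture is something like the
rough sketch below: a hash of arrays keyed by column name, assuming the
first line of the file holds the column names. The read_columns helper and
the sample data are made up for illustration, and I am not sure this is the
best approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read tab-delimited text from a filehandle and
# return a reference to a hash of arrays keyed by the column names
# found in the header line.
sub read_columns {
    my ($fh) = @_;

    my $header = <$fh>;
    return {} unless defined $header;
    chomp $header;
    my @names = split /\t/, $header, -1;

    # Start every column as an empty array.
    my %columns = map { $_ => [] } @names;

    while (my $line = <$fh>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields, so columns stay aligned.
        my @fields = split /\t/, $line, -1;
        for my $i (0 .. $#names) {
            push @{ $columns{ $names[$i] } }, $fields[$i];
        }
    }
    return \%columns;
}

# Made-up sample data, read through an in-memory filehandle.
my $sample = "Row\tA\tB\ngene1\t1\t2\ngene2\t3\t4\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $columns = read_columns($fh);
close $fh;

# Print every value of column "A", tab-separated.
print join("\t", @{ $columns->{A} }), "\n";
```

With this layout a whole column is a single lookup, for example
@{ $columns->{B} }, whereas the hash keyed by row name above makes whole
rows easy to fetch instead.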

Thanks!

T.


-- 
"Education is not to be used to promote obscurantism." - Theodosius
Dobzhansky.

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori. PhD.
Ocean Science Center-Memorial University of Newfoundland
