Re: combining data from more than one file...

Johan Viklund Wed, 19 May 2004 01:23:38 -0700

Hi,

See my comments inline in the code below.

On Tue, 18 May 2004 13:16:37 -0400, Michael Robeson <[EMAIL PROTECTED]> wrote:

Ok great. Most of what you show does make sense. However, there are some bits of code that I need further clarification on. For some bits I can tell what they are doing, but I do not quite know how or why they work the way they do. I'll point out these areas in the code we've put together so far.

Hopefully, I have copied over the bits you wrote correctly. I find this is like learning Spanish. I can read and (roughly) get the gist of the code. But writing original code on my own is where I have trouble. I am sure this will go away when I practice more. :-)

I didn't finish everything because I just need some code explained / clarified.
 >>>Start PERL code<<<<<
#!/usr/bin/perl -w
use strict;
use FileHandle;
# I am unsure of what this module is. I've tried looking it up
# in the Camel and Llama book to no avail, not enough description.
# I guess I have to figure out the whole object thing?


# Run 'perldoc FileHandle' on the command line to read its documentation
# (you can do this with (hopefully) every new module you come across).


my %organisms;

print "Enter in a list of files to be processed:\n";

# For example:
# Cytb.fasta
# NADH1.fasta
# ...

# chomp (my @infiles = <STDIN>);
# TODO we should make this nicer later
my @infiles = ('genetics.txt');

foreach my $infile(@infiles) {
        my $FASTA = new FileHandle;
        
        # Does the above statement tell PERL to create a new
        # filehandle for each file it finds? I guess I need to understand
        # what "new" and the module "FileHandle" are doing.


Right on.
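
To make that concrete, here is a minimal sketch of what happens on each pass through the loop. The file names and contents are made up for illustration (the script writes two tiny temporary files itself), and FileHandle->new is the same call as "new FileHandle", just written with arrow syntax.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempdir);

# Hypothetical sample files, created here so the sketch runs anywhere.
my $dir = tempdir(CLEANUP => 1);
my @infiles;
foreach my $name ('Cytb.fasta', 'NADH1.fasta') {
    my $path = "$dir/$name";
    open(my $out, '>', $path) or die "Can't write $path: $!";
    print $out ">cat\nactg\n";
    close($out) or die "Can't close $path: $!";
    push @infiles, $path;
}

my @first_lines;
foreach my $infile (@infiles) {
    # A brand new FileHandle object is created on every pass,
    # so each file gets its own handle.
    my $FASTA = FileHandle->new;
    open($FASTA, '<', $infile) or die "Can't open $infile: $!";
    my $line = <$FASTA>;
    chomp $line;
    push @first_lines, $line;
    close($FASTA) or die "Can't close $infile: $!";
}

print "Read ", scalar(@first_lines), " files; first lines: @first_lines\n";
```

Because $FASTA is declared with my inside the loop, the handle from one file can never be mixed up with the handle for the next one.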

        open ($FASTA, $infile)
                or die "Can't open INFILE:$!";
                
#$/='>' #Set input operator

my $orgID;

while (defined($_ = <$FASTA>)) {
        
        # Above I am unsure of why the "defined" function
        # helps us here. I know it has something to do with an
        # expression containing a valid string, but I am unsure
        # of its function here. This is something I would have
        # never thought to do.  :-)


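It's what
while (<$FASTA>)
actually does behind the scenes.

The defined function checks whether $_ got set or not, i.e. whether a line was actually read. Without it, a line that evaluates to false (for example a last line containing just "0" with no newline) would end the loop too early.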
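
A small sketch of the difference, assuming made-up data held in a string and read through an in-memory filehandle (the open_data helper is hypothetical and exists only for this example):

```perl
use strict;
use warnings;

# Made-up input: the last line is a bare "0" with no trailing newline.
my $data = "acgt\n0";

# Helper (hypothetical, for this sketch only): open the string as a file.
sub open_data {
    open(my $fh, '<', \$data) or die "Can't open in-memory file: $!";
    return $fh;
}

my (@shorthand, @explicit, @truthy);

# The shorthand form; Perl adds the defined() test for you.
my $fh = open_data();
while (<$fh>) { chomp; push @shorthand, $_ }

# The explicit form used in the script.
$fh = open_data();
while (defined($_ = <$fh>)) { chomp; push @explicit, $_ }

# Plain truth test, without defined(): the string "0" is false,
# so the loop gives up before storing the last line.
$fh = open_data();
for (;;) {
    my $line = <$fh>;
    last unless $line;
    chomp $line;
    push @truthy, $line;
}

print "shorthand: @shorthand\n";
print "explicit:  @explicit\n";
print "truthy:    @truthy\n";
```

The first two loops collect both lines, while the plain truth test loses the final "0".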

        chomp;
        print "\nworking on >>$_<<\n";
        
        if (/^\s*>(\w+)/) {
                $orgID=$1;
                print "Found a new organism start line ('$orgID')\n";
        
        # The above regex makes complete sense. Actually, I was going to put
        # something similar to that in my original post but wasn't sure
        # if this was appropriate at the time. I guess it was!
                
        } else {
                print "This is just some data: $_\n";
                print "This data needs to be appended to the hash entry for $orgID\n";

                # okay, in the above you are taking the left over
                # sequence ($_) and linking it as a "value" to "$orgID" ?

This if-then-else statement should do what you want. I would do it like this instead:
$organisms{$orgID} .= $_;

No if and no else, just that single line. Perl will make it work the way it's supposed to: if the hash key doesn't exist it gets created and the contents of $_ are stored in it (as a string); if it already exists, $_ is appended to what is there.
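
A minimal sketch of that single-line approach, assuming a made-up list of lines that stands in for two input files sharing the same organisms:

```perl
use strict;
use warnings;

# Made-up lines standing in for two files that share organisms.
my @lines = (
    '>cat', 'actgac',
    '>dog', 'actatc',
    '>cat', '---cgatc',
    '>dog', '---actat',
);

my %organisms;
my $orgID;
foreach (@lines) {
    if (/^\s*>(\w+)/) {
        $orgID = $1;
    } else {
        # No exists() test needed: the key is created on first use,
        # and later lines are appended to what is already stored.
        $organisms{$orgID} .= $_;
    }
}

print ">$_\n$organisms{$_}\n" foreach sort keys %organisms;
```

Appending to a key that does not exist yet does not trigger an "uninitialized value" warning, so this is safe even with warnings turned on.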

                if (exists ($organisms{$orgID})) {
                #TODO append the data to the hash here
                
                # I guess I would put the following to append to
                # the already existing hash:
                # $organisms{$orgID} .= $_;
                
                } else {
                        #create new hash entry for this data
                        $organisms{$orgID} = $_;
                        }
                }       
        }
        
# Do not forget to close the input file
close ($FASTA)
        or die "Could not close INFILE: $!";
} # end of foreach over @infiles

# We've processed all input files... print the resulting hash

print "\n*****************************************************\n";

while (my($orgID, $sequence) = each(%organisms)) {
        # since I want the output as:
        # >cat
        # actgac---cgatc-ag-cttag---acg
        # >dog
        # actatc---actat-at-accta---atc
        # I would change the print statement to:
        print "> . $orgID\n $sequence\n";

Hmm, you're trying to do string concatenation here, but in that case it should be:
print ">" . $orgID . "\n" . $sequence . "\n";
It's much easier to just do it like this:
print ">$orgID\n$sequence\n";
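
A quick sketch comparing the three forms, using made-up values for $orgID and $sequence:

```perl
use strict;
use warnings;

# Made-up values for illustration.
my ($orgID, $sequence) = ('cat', 'actgac---cgatc');

# The dot sits inside the quotes here, so it is printed literally.
my $broken       = "> . $orgID\n $sequence\n";

# Real concatenation: each dot joins two separate strings.
my $concatenated = ">" . $orgID . "\n" . $sequence . "\n";

# Interpolation: variables are expanded inside double quotes.
my $interpolated = ">$orgID\n$sequence\n";

print $broken, $concatenated, $interpolated;
```

The last two strings are identical; the first one contains a stray " . " after the ">" and an extra space before the sequence.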

}
exit;
 >>>end PERL code<<<
Thanks for all your help so far! Most of this is starting to help my thinking. I will be doing a lot more of this multi-file parsing, as most of my work entails manipulating data in several files or folders at once.
-Mike

/Johan

