On Jan 21, 2008 1:27 PM, Kevin Viel <[EMAIL PROTECTED]> wrote:
snip
> > while (<DATA>) {
> >     my ($snp, $genotype) = split;
> >     $data{$snp}{$genotype}++
> > }
snip
> >     $data{$snp}{$genotype}++
>
> Is the semicolon unnecessary for this line?
snip

The semi-colon in Perl is a statement separator, not a statement
terminator (i.e. Pascal-like rather than ANSI C-like), so the last
statement in a block does not require one.  The code is fine as
written, but leaving the semi-colon off that line was an oversight on
my part (I am from an ANSI C background, so I tend to treat it like a
terminator).  The comma is similarly forgiving in Perl: a trailing
comma at the end of a list is allowed and ignored.  Saying

my @a = (1,2,3,4,5,);

is the same as saying

my @a = (1,2,3,4,5);
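
If you want to convince yourself of both points, here is a minimal
sketch (the names %seen, @with_comma and @without_comma are made up
for the example, they are not from your script):

```perl
use strict;
use warnings;

# No semi-colon after the last (only) statement in the block.
my %seen;
for my $word (qw(a b a)) {
    $seen{$word}++
}

# A trailing comma does not add an element to the list.
my @with_comma    = (1, 2, 3, 4, 5,);
my @without_comma = (1, 2, 3, 4, 5);

print "a was seen $seen{a} times\n";
print scalar(@with_comma), " and ", scalar(@without_comma), " elements\n";
```

Both arrays should report the same number of elements.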


snip
> So, if I understand correctly  $data{$snp} is a value in a hash.  That value
> is a scalar that happens, in this case, to be a reference to an anonymous
> hash?  The key of this anonymous hash is $genotype?  It does not seem like
> the simplicity of this lines relates the complexity of the object:
>
> $data{ $snp }
> $data{ $snp }{ $genotype }
snip

I think I understand what you are saying, and it seems correct.  I am
going to restate everything about $data{$snp}{$genotype} just to make
sure we are on the same page.

%data is a hash.
$snp is a scalar
$data{$snp} yields a hash ref
$data{$snp}->{$genotype} (aka $data{$snp}{$genotype}) yields a scalar value
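
A short sketch that checks each of those statements, assuming a
made-up SNP name and genotype (Perl creates the inner hash for you the
first time you use it):

```perl
use strict;
use warnings;

my %data;
my ($snp, $genotype) = ('rs1', 'CT');   # made-up sample values

# The inner hash springs into existence on first use (autovivification).
$data{$snp}{$genotype}++;

print ref($data{$snp}), "\n";           # type of the thing stored in %data
print $data{$snp}->{$genotype}, "\n";   # arrow form
print $data{$snp}{$genotype}, "\n";     # same element, arrow omitted
```

The arrow between two subscripts is optional, which is why the short
form looks simpler than the structure behind it.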

>
>
> ####
>
> Consider the code below:
>
> #! /usr/bin/perl
>
> use strict ;
> use warnings ;
>
> use Data::Dumper ;
>
> my %outer ;
>
> while ( <DATA> ) {
>   my ( $snp , $genotype ) = split , /\s+/ ;
snip

Called with no arguments, split splits $_ on whitespace.  That behaves
like /\s+/, except that whitespace at the start of the line is skipped
instead of producing an empty first field.  It is generally preferable
to say

my ($snp, $genotype) = split;

unless you actually want that empty leading field.
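
Here is a small sketch of how the two forms differ on a line that
starts with whitespace.  The sample line is made up, and I have
written the default out as split ' ', which is the special case a bare
split uses:

```perl
use strict;
use warnings;

my $line = "  1 CT\n";              # made-up input with leading spaces

my @default = split ' ', $line;     # what a bare split does to $_
my @regex   = split /\s+/, $line;   # explicit pattern

print scalar(@default), " fields: @default\n";
print scalar(@regex), " fields, and the first one is empty\n";
```

With /\s+/ the SNP would end up as an empty string and the genotype as
the SNP, which is rarely what you want.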

snip
> My goal is to print all of the SNP (keys of outer) that have more than two
> alleles (keys of the second anonymous hash).  How can I achieve this?
>
> for my $SNP_keys ( sort { $a cmp $b ) keys %outer ){
>
>   my $num_alleles = keys _______ ;
>
> }
>
> The values of outer, for instance $outer{ $num_alleles }, are scalar
> (references to anonymous hashes).  However, I cannot seem to dereference
> them.
snip

I am not sure which count you wanted, so I gave you both.  If you want
to know how many keys are in a given hash you can call keys in a
scalar context:

my $count = keys %hash;

For a hashref, you need to tell Perl to treat it as a hash by
dereferencing it with %{} (the braces are optional when the reference
is a simple scalar variable, as in %$hashref):

my $count = keys %{$hashref};
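
Putting the variations side by side, with made-up data (none of these
names come from your script):

```perl
use strict;
use warnings;

my %hash    = (C => 2, T => 1);   # made-up data
my $hashref = \%hash;

my $count_hash  = keys %hash;          # plain hash
my $count_long  = keys %{$hashref};    # hashref, braces written out
my $count_short = keys %$hashref;      # hashref, braces dropped

# When the reference comes from an expression, the braces are required.
my %outer       = (snp1 => { C => 2, T => 1, N => 1 });
my $count_inner = keys %{ $outer{snp1} };

print "$count_hash $count_long $count_short $count_inner\n";
```

The last form is the one that fills the blank in your loop.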

When you want to iterate over all of the keys in a hash, you can call
keys in list context:

for my $key (keys %hash) {
}

If it is a hash of hashes, you can just keep nesting (remembering to use %{}):

for my $k1 (keys %hash) {
    for my $k2 (keys %{$hash{$k1}}) {
        for my $k3 (keys %{$hash{$k1}{$k2}}) {
            print "the value at $k1, $k2, $k3 is $hash{$k1}{$k2}{$k3}\n";
        }
    }
}

Since hash keys are unordered, it is often useful to sort them before
iterating over them.
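
One thing to watch for, sketched here with made-up keys: a plain sort
compares strings, so if your SNP identifiers are numbers you may want
a numeric comparison instead.

```perl
use strict;
use warnings;

my %count = (10 => 'x', 2 => 'y', 1 => 'z');   # made-up keys

my @as_strings = sort keys %count;                 # string comparison
my @as_numbers = sort { $a <=> $b } keys %count;   # numeric comparison

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```

In string order 10 sorts before 2, which only matters once you have
ten or more SNPs.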


#! /usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

#this needs a better more descriptive name
#maybe %alleles_by_snp, I don't know your
#domain well enough to make a better suggestion
my %outer;
while ( <DATA> ) {
        my ($snp, $genotype) = split;
        $outer{$snp}{$_}++ for $genotype =~ /(.)/g;
}

for my $snp (sort keys %outer) {
        my $count = keys %{$outer{$snp}};
        my $total;
        $total += $outer{$snp}{$_} for keys %{$outer{$snp}};
        print "snp $snp has $count different alleles and a total of $total\n";
}

__DATA__
1 CC
1 CT
1 TT
1 NN
2 CC
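
If all you want is the list of SNPs with more than two alleles, a
sketch along these lines should do it.  It uses the same rows as the
__DATA__ section above, but inline, and it assumes that N counts as an
allele; you know better than I do whether that is right for your data.
The name @multi is mine.

```perl
use strict;
use warnings;

my %outer;
for my $line ("1 CC", "1 CT", "1 TT", "1 NN", "2 CC") {
    my ($snp, $genotype) = split ' ', $line;
    # Count each character of the genotype as one allele.
    $outer{$snp}{$_}++ for split //, $genotype;
}

# Keep only the SNPs whose inner hash has more than two keys.
my @multi = grep { scalar(keys %{ $outer{$_} }) > 2 } sort keys %outer;

print "snp $_ has more than two alleles\n" for @multi;
```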
