Hi Nat,

Here are some comments about your code.

On Wed, 1 Jul 2015 13:00:53 +0100
nco...@ebi.ac.uk wrote:

> Hi,
> I need some help with a hash of array of array.
> this is my input data structure:
> gene a        al1     data1   data2   data9
> gene b        al2     data3   data4   data10
> gene b        al3     data5   data6   data12
> gene b        al4     data7   data8   data12
> 
> I take each data variable, see above, from a sql query and parse the data
> to build a new data structure: a hash of arrays of arrays.
> In the input data presented here, the first column will be the key of the
> hash and the other 4 columns should compose 4 arrays.
> example :Each gene (gene a gene b ..) should be the keys, the al column
> should be the first array, data1,3,5,7 should be in the second array and
> so on for the third and fourth array. further more the data in the arrays
> should be unique. Data should be grouped depending on their keys, I am
> expecting this structure below, furthermore each variable in an array
> should be unique. ex: for gene b key in the 4th array data12 should appear
> once.
> 
> Thanks for any tips.
> Nat
> 
> this is what I would like to achieve (dataDumper format)
> 
> $VAR1 = {
>   gene b => [

Note that you need to quote the hash key if it contains whitespace:

        'gene b' => [
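
As a small illustration (the %genes name is made up for this example, and the
data is the 'gene a' entry from your expected output), a quoted key is stored
and looked up as one string, space included:

```perl
use strict;
use warnings;

# %genes is a hypothetical name used only for this illustration.
my %genes = (
    'gene a' => [ ['al1'], ['data1'], ['data2'], ['data9'] ],
);

# The key is the whole string, including the space.
print "found\n" if exists $genes{'gene a'};
```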

>     [
>       'al4','al2','al3'
>     ],
>     [
>       'data7','data3','data5'
>     ],
>     [
>       'data8','data4','data6'
>     ],
>     [
>       'data12','data10'
>     ]
>   ],
> 
>   gene a => [
>     ['al1'],
>     ['data1'],
>     ['data2'],
>     ['data9']
>   ]
> };
> 
> 
> using the script below and the data dumper, I create a structure which is
> not correct some of the data is erased and other grouped wrongly, 'gene a'
> data goes into  'gene b' . gene a is empty.
> 
> $VAR1 = {
>   gene b => [
>     [
>       'al1',
>       'al2',
>       'al3'
>     ],
>     [
>       'data1',
>       'data3',
>       'data5'
>     ],
>     [
>       'data2',
>       'data4',
>       'data6'
>     ],
>     [
>       'data9',
>       'data10',
>       'data12'
>     ]
>   ],
>   gene a => [
>     [],
>     [],
>     [],
>     []
>   ]
> };
> 
> 
> 
> here is my code
> 
> #!/usr/local/bin/perl
> use strict;
> use warnings;

It's good that you are using strict and warnings.

> use DBI;
> use Data::Dumper;
> use List::MoreUtils qw(uniq);
> 
> 
> 
> my $subrow_hash;
> my $row_hash;
> my %hasharray;


You should not predeclare all your variables at the top of the script;
declare each one in the scope where it is first needed:

http://perl-begin.org/tutorials/bad-elements/#declaring_all_vars_at_top
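
A minimal sketch of what that buys you (the variable names are invented for the
example): a variable declared inside a loop body is created afresh on every
iteration, so nothing leaks from one pass to the next.

```perl
use strict;
use warnings;

my @results;
for my $gene ( 'gene a', 'gene b' ) {
    # Declared inside the loop: a new, empty array on every iteration.
    my @alleles;
    push @alleles, "allele of $gene";
    push @results, scalar @alleles;
}

# Each iteration saw exactly one element in its own @alleles.
print "@results\n";
```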

> 
> 
> my $geno_dbh = DBI->connect( credential...}    ) || die "Database
> connection not made: $DBI::errstr";
> print STDERR "Connection...\n";
> 
> 
> my $subsql = "SELECT * FROM table_2015";
> #this is the table structure
> #gene a       al1     data1   data2   data9
> #gene b       al2     data3   data4   data10
> #gene b       al3     data5   data6   data12
> #gene b       al4     data7   data8   data12
> 

In general, "SELECT *" is not recommended; you should list the fields you
need explicitly.
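
For instance, a sketch that builds the statement from an explicit list. The
column names are the hash keys your script reads from each row; that they are
the actual column names of table_2015 is my assumption.

```perl
use strict;
use warnings;

# Assumed column names: the keys the script reads from each fetched row.
my @columns = qw(xref_acc symbol allele_symbol phenotype_acc name);

my $subsql = 'SELECT ' . join( ', ', @columns ) . ' FROM table_2015';

print "$subsql\n";
```

The resulting string can be passed to prepare() exactly as before.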

> 
> my $subresult = $geno_dbh->prepare($subsql);
> $subresult->execute() or die "SQL Error: $DBI::errstr\n";
> 
>       my @gene_name_list;
>       my @allele_list;
>       my @mp_list;
>       my @mp_list_def;
>       my @unique_mp_def;
>       my $Xref;
>       my @unique_gene_name_list ; # unique gene only
>       my @unique_allele_list;# unique allele only
>       my @unique_mp_list;# unique MP terms only
>       my $list;

More predeclarations.

> while ( $subrow_hash = $subresult->fetchrow_hashref) {
> 

It is better to declare the variable here:
"while ( my $subrow_hash = $subresult->fetchrow_hashref ) {".
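
A runnable sketch of the idiom. Since there is no database here, a small
closure ($fetch, hypothetical) stands in for $subresult->fetchrow_hashref: it
returns one row per call and undef once the rows run out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $subresult->fetchrow_hashref.
my @pending = ( { symbol => 'al1' }, { symbol => 'al2' } );
my $fetch   = sub { return shift @pending };

my @symbols;
while ( my $subrow_hash = $fetch->() ) {
    # $subrow_hash is visible only inside this loop.
    push @symbols, $subrow_hash->{symbol};
}

print "@symbols\n";
```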

>       my $symbol_id=$subrow_hash->{symbol};#this is the first set of data
> for the first array

1. The line is too long; put the comment on its own line above the statement.

2. There are no spaces around the equal sign "=".

3. Perhaps collect these values into an array instead of separate scalars.
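
One way to read point 3, as a sketch only: pull the four values out of the row
with a hash slice. The sample row below is made up from your input data, and
which value belongs to which column is my assumption.

```perl
use strict;
use warnings;

# Sample row, shaped like what fetchrow_hashref returns.
my $subrow_hash = {
    symbol        => 'al1',
    allele_symbol => 'data1',
    phenotype_acc => 'data2',
    name          => 'data9',
};

# One value per output array, in a fixed order.
my @fields = qw(symbol allele_symbol phenotype_acc name);
my @values = @{$subrow_hash}{@fields};

print "@values\n";
```

With the values in an array, the four nearly identical "if ... push" blocks
can become a single loop over the indexes.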

> 
>       my $allele_id=$subrow_hash->{allele_symbol};#this is the 2nd set of
> data for the 2nd array
> 
>       my $mp_id=$subrow_hash->{phenotype_acc};#this is the 3rd set of data
> for the 3rd array
> 
>       my $mp_def=$subrow_hash->{name};#this is the fourth set of data for
> the fourth array
> 
>       $Xref=$subrow_hash->{xref_acc}; #this is the key of the hash (gene a
> and gene b)
> 
> 
> $list=[[@gene_name_list], [@allele_list], [@mp_list], 
> [@mp_list_def]];#create arrays of arrays
> 

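$list here holds copies of the arrays as filled by all the previous rows,
whatever their key, and it is built before the current row is pushed. That is
why 'gene a' ends up empty and its data shows up under 'gene b'.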
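
A stripped-down reproduction with one array and two rows (simplified from your
script, so treat it as a sketch of the effect rather than your exact code):

```perl
use strict;
use warnings;

my @allele_list;    # shared by every key, never reset
my %hasharray;

for my $row ( [ 'gene a', 'al1' ], [ 'gene b', 'al2' ] ) {
    my ( $key, $allele ) = @{$row};

    # [@allele_list] copies the array as it is now, before the push below.
    $hasharray{$key} = [ [@allele_list] ];
    push @allele_list, $allele;
}

# 'gene a' got an empty copy; 'gene b' got the value that belongs to 'gene a'.
printf "%s: %d value(s)\n", $_, scalar @{ $hasharray{$_}[0] }
    for sort keys %hasharray;
```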


> if ($Xref){
> 
> 
>               $hasharray{$Xref} = $list; #create a hash of arrays of arrays
> for a specific key
>               @gene_name_list =@{$list->[0]}; #maybe not necessary to
> declare this. @allele_list =@{$list->[1]};
>               @mp_list =@{$list->[2]};
>               @mp_list_def =@{$list->[3]};
> 

Why is it necessary to assign to the arrays again?

> 
> 
>       if ($symbol_id){

It's better to use defined here, since a plain truth test also skips values
such as "0" and the empty string:

http://perldoc.perl.org/functions/defined.html
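
A short sketch of the difference, using made-up values. Whether you want to
keep empty strings depends on your data, so this only shows what each test
lets through.

```perl
use strict;
use warnings;

my @kept_truthy;
my @kept_defined;

for my $value ( 'al1', '0', '', undef ) {
    push @kept_truthy,  $value if $value;            # drops '0', '' and undef
    push @kept_defined, $value if defined $value;    # drops only undef
}

printf "truthy: %d, defined: %d\n", scalar @kept_truthy, scalar @kept_defined;
```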

>               push (@gene_name_list, $symbol_id); #fill arrays with data
>       }
>       if ($allele_id){
>               push (@allele_list, $allele_id);
>       }
>       if ($mp_id){
>               push (@mp_list, $mp_id);
>       }
>       if ($mp_def){
>               push (@mp_list_def, $mp_def);
>       }
> }
> 
> 

You may also wish to look through the resources at:

http://perl-begin.org/topics/references/
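
To tie the comments above together, here is one possible sketch of building
the hash of arrays of arrays with unique values per key. It is not the only
way to do it. The @rows data stands in for what fetchrow_hashref would return,
the mapping of your sample values to column names is my assumption, and a
%seen hash is used for uniqueness instead of uniq so that only core Perl is
needed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample rows shaped like fetchrow_hashref results. Which sample value
# lands in which column is an assumption made for this example.
my @rows = (
    { xref_acc => 'gene a', symbol => 'al1', allele_symbol => 'data1',
      phenotype_acc => 'data2', name => 'data9' },
    { xref_acc => 'gene b', symbol => 'al2', allele_symbol => 'data3',
      phenotype_acc => 'data4', name => 'data10' },
    { xref_acc => 'gene b', symbol => 'al3', allele_symbol => 'data5',
      phenotype_acc => 'data6', name => 'data12' },
    { xref_acc => 'gene b', symbol => 'al4', allele_symbol => 'data7',
      phenotype_acc => 'data8', name => 'data12' },
);

my @fields = qw(symbol allele_symbol phenotype_acc name);

my %hasharray;    # key => [ [...], [...], [...], [...] ]
my %seen;         # key => [ { value => count }, ... ], used for uniqueness

for my $row (@rows) {
    my $key = $row->{xref_acc};
    next if !defined $key;

    # Every key starts with one empty array per field.
    $hasharray{$key} ||= [ map { [] } @fields ];

    for my $i ( 0 .. $#fields ) {
        my $value = $row->{ $fields[$i] };
        next if !defined $value;
        next if $seen{$key}[$i]{$value}++;    # already stored for this key
        push @{ $hasharray{$key}[$i] }, $value;
    }
}

$Data::Dumper::Sortkeys = 1;
print Dumper( \%hasharray );
```

Values are kept in the order they are first seen, and each key only ever
receives data from its own rows.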

Regards,

        Shlomi Fish


-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
List of Text Editors and IDEs - http://shlom.in/IDEs

Chuck Norris can read Perl code that was RSA encrypted.
    — http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - http://shlom.in/reply .


