Hi Nat, some comments about your code.
On Wed, 1 Jul 2015 13:00:53 +0100 nco...@ebi.ac.uk wrote: > Hi, > I need some help with a hash of array of array. > this is my input data structure: > gene a al1 data1 data2 data9 > gene b al2 data3 data4 data10 > gene b al3 data5 data6 data12 > gene b al4 data7 data8 data12 > > I take each data variable, see above, from a sql query and parse the data > to build a new data structure: a hash of arrays of arrays. > In the input data presented here, the first column will be the key of the > hash and the other 4 columns should compose 4 arrays. > example :Each gene (gene a gene b ..) should be the keys, the al column > should be the first array, data1,3,5,7 should be in the second array and > so on for the third and fourth array. further more the data in the arrays > should be unique. Data should be grouped depending on their keys, I am > expecting this structure below, furthermore each variable in an array > should be unique. ex: for gene b key in the 4th array data12 should appear > once. > > Thanks for any tips. > Nat > > this is what I would like to achieve (dataDumper format) > > $VAR1 = { > gene b => [ Note that you need to quote the hash key if it contains whitespace: 'gene b' => [ > [ > 'al4','al2','al3' > ], > [ > 'data7','data3','data5' > ], > [ > 'data8','data4','data6' > ], > [ > 'data12','data10' > ] > ], > > gene a => [ > ['al1'], > ['data1'], > ['data2'], > ['data9'] > ] > }; > > > using the script below and the data dumper, I create a structure which is > not correct some of the data is erased and other grouped wrongly, 'gene a' > data goes into 'gene b' . gene a is empty. > > $VAR1 = { > gene b => [ > [ > 'al1', > 'al2', > 'al3' > ], > [ > 'data1', > 'data3', > 'data5' > ], > [ > 'data2', > 'data4', > 'data6' > ], > [ > 'data9', > 'data10', > 'data12' > ] > ], > gene a => [ > [], > [], > [], > [] > ] > }; > > > > here is my code > > #!/usr/local/bin/perl > use strict; > use warnings; It's good that you are using strict and warnings. > use DBI; > use Data::Dumper; > use List::MoreUtils qw(uniq); > > > > my $subrow_hash; > my $row_hash; > my %hasharray; You should not predeclare your variables: http://perl-begin.org/tutorials/bad-elements/#declaring_all_vars_at_top > > > my $geno_dbh = DBI->connect( credential...} ) || die "Database > connection not made: $DBI::errstr"; > print STDERR "Connection...\n"; > > > my $subsql = "SELECT * FROM table_2015"; > #this is the table structure > #gene a al1 data1 data2 data9 > #gene b al2 data3 data4 data10 > #gene b al3 data5 data6 data12 > #gene b al4 data7 data8 data12 > In general "SELECT *" is not recommended and you should list your fields explicitly. > > my $subresult = $geno_dbh->prepare($subsql); > $subresult->execute() or die "SQL Error: $DBI::errstr\n"; > > my @gene_name_list; > my @allele_list; > my @mp_list; > my @mp_list_def; > my @unique_mp_def; > my $Xref; > my @unique_gene_name_list ; # unique gene only > my @unique_allele_list;# unique allele only > my @unique_mp_list;# unique MP terms only > my $list; More predeclarations. > while ( $subrow_hash = $subresult->fetchrow_hashref) { > Better have a "my $subrow_hash" here. > my $symbol_id=$subrow_hash->{symbol};#this is the first set of data > for the first array 1. The line is too long. 2. There are no spaces around the equal sign "=". 3. Perhaps make it an array. > > my $allele_id=$subrow_hash->{allele_symbol};#this is the 2nd set of > data for the 2nd array > > my $mp_id=$subrow_hash->{phenotype_acc};#this is the 3rd set of data > for the 3rd array > > my $mp_def=$subrow_hash->{name};#this is the fourth set of data for > the fourth array > > $Xref=$subrow_hash->{xref_acc}; #this is the key of the hash (gene a > and gene b) > > > $list=[[@gene_name_list], [@allele_list], [@mp_list], > [@mp_list_def]];#create arrays of arrays > $list here accumulates several lists. > if ($Xref){ > > > $hasharray{$Xref} = $list; #create a hash of arrays of arrays > for a specific key > @gene_name_list =@{$list->[0]}; #maybe not necessary to > declare this. @allele_list =@{$list->[1]}; > @mp_list =@{$list->[2]}; > @mp_list_def =@{$list->[3]}; > Why is the assignment to the arrays again necessary. > > > if ($symbol_id){ It's better to use defined here: http://perldoc.perl.org/functions/defined.html > push (@gene_name_list, $symbol_id); #fill arrays with data > } > if ($allele_id){ > push (@allele_list, $allele_id); > } > if ($mp_id){ > push (@mp_list, $mp_id); > } > if ($mp_def){ > push (@mp_list_def, $mp_def); > } > } > > Perhaps you wish to peruse the resources in: http://perl-begin.org/topics/references/ Regards, Shlomi Fish -- ----------------------------------------------------------------- Shlomi Fish http://www.shlomifish.org/ List of Text Editors and IDEs - http://shlom.in/IDEs Chuck Norris can read Perl code that was RSA encrypted. — http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/ Please reply to list if it's a mailing list post - http://shlom.in/reply . -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/