Hi Nathalie.

Next time, please create a gist with the updated code and send a link to it.
Right now your code is messy, and it's really hard to tell anything about it.

One more comment: you mix variables that have computer-ish names with
variables named after the subject area. That's bad; you should always name
your variables after concepts from the subject area.
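
To illustrate the naming point, here is a tiny sketch. The replacement names
(%alleles_by_gene, $gene, $allele) are only my guesses at your domain terms,
so treat them as hypothetical and pick whatever your lab actually uses.

```perl
use strict;
use warnings;

# Computer-ish names describe the container, not what it holds:
#   my %hasharray;  my $list;  my $Xref;
# Subject-area names (hypothetical) describe the data itself:
my %alleles_by_gene;
my $gene   = 'gene b';
my $allele = 'al2';

# Autovivification creates the inner array on first push.
push @{ $alleles_by_gene{$gene} }, $allele;

print "$gene: @{ $alleles_by_gene{$gene} }\n";
```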

Finally, this looks like a working version. I can't understand your approach
with the $list variable (and I think it's a bad way to solve your problem), so I
removed it and store the values directly in hasharray (again, rename it to
something understandable and useful from your problem domain).

Final code
https://gist.github.com/elcamlost/54f32cd48ae1106052e0
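
In case the gist link goes stale, here is a minimal sketch of the same idea;
it is not necessarily identical to the gist. It assumes each row arrives as a
hash reference with the column names from your script. The in-memory @rows
stands in for the DBI query, and @rows, @columns, group_by_gene and
%annotations_by_gene are names I made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the rows returned by fetchrow_hashref.
my @rows = (
    { xref_acc => 'gene a', symbol => 'al1', allele_symbol => 'data1',
      phenotype_acc => 'data2', name => 'data9' },
    { xref_acc => 'gene b', symbol => 'al2', allele_symbol => 'data3',
      phenotype_acc => 'data4', name => 'data10' },
    { xref_acc => 'gene b', symbol => 'al3', allele_symbol => 'data5',
      phenotype_acc => 'data6', name => 'data12' },
    { xref_acc => 'gene b', symbol => 'al4', allele_symbol => 'data7',
      phenotype_acc => 'data8', name => 'data12' },
);

# Columns that feed the four inner arrays, in order.
my @columns = qw(symbol allele_symbol phenotype_acc name);

sub group_by_gene {
    my (@rows) = @_;
    my ( %annotations_by_gene, %seen );

    for my $row (@rows) {
        my $gene = $row->{xref_acc};
        next unless defined $gene;

        for my $i ( 0 .. $#columns ) {
            # Make sure every gene has all four arrays, even if empty.
            $annotations_by_gene{$gene}[$i] ||= [];

            my $value = $row->{ $columns[$i] };
            next unless defined $value;

            # Keep only the first occurrence of a value per gene and column.
            next if $seen{$gene}[$i]{$value}++;

            push @{ $annotations_by_gene{$gene}[$i] }, $value;
        }
    }
    return \%annotations_by_gene;
}

print Dumper( group_by_gene(@rows) );
```

Because each value is pushed straight into the array that belongs to its own
gene, nothing is carried over from one key to the next, and the %seen hash
keeps the values unique without a separate uniq pass.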



On Thu, 2 Jul 2015 at 13:03, Nathalie Conte <nco...@ebi.ac.uk> wrote:

> Hi Shlomi,
> Thanks for your comments about best practice, which I have implemented. Any
> ideas on why my hash of arrays of arrays is misbehaving?
> Thanks
> Nat
>
> On 1 Jul 2015, at 15:42, Shlomi Fish <shlo...@shlomifish.org> wrote:
>
> > Hi Nat,
> >
> > some comments about your code.
> >
> > On Wed, 1 Jul 2015 13:00:53 +0100
> > nco...@ebi.ac.uk wrote:
> >
> >> Hi,
> >> I need some help with a hash of array of array.
> >> this is my input data structure:
> >> gene a       al1     data1   data2   data9
> >> gene b       al2     data3   data4   data10
> >> gene b       al3     data5   data6   data12
> >> gene b       al4     data7   data8   data12
> >>
> >> I take each data variable (see above) from an SQL query and parse the
> >> data to build a new data structure: a hash of arrays of arrays.
> >> In the input data presented here, the first column will be the key of
> >> the hash and the other 4 columns should compose 4 arrays.
> >> Example: each gene (gene a, gene b, ...) should be a key, the al column
> >> should be the first array, data1,3,5,7 should be in the second array, and
> >> so on for the third and fourth arrays. Data should be grouped by key; I
> >> am expecting the structure below. Furthermore, each value in an array
> >> should be unique, e.g. for the gene b key, data12 should appear only
> >> once in the 4th array.
> >>
> >> Thanks for any tips.
> >> Nat
> >>
> >> this is what I would like to achieve (dataDumper format)
> >>
> >> $VAR1 = {
> >>  gene b => [
> >
> > Note that you need to quote the hash key if it contains whitespace:
> >
> >       'gene b' => [
> >
> >>    [
> >>      'al4','al2','al3'
> >>    ],
> >>    [
> >>      'data7','data3','data5'
> >>    ],
> >>    [
> >>      'data8','data4','data6'
> >>    ],
> >>    [
> >>      'data12','data10'
> >>    ]
> >>  ],
> >>
> >>  gene a => [
> >>    ['al1'],
> >>    ['data1'],
> >>    ['data2'],
> >>    ['data9']
> >>  ]
> >> };
> >>
> >>
> >> Using the script below and Data::Dumper, I create a structure which is
> >> not correct: some of the data is erased and other data is grouped
> >> wrongly. The 'gene a' data goes into 'gene b', and 'gene a' is empty.
> >>
> >> $VAR1 = {
> >>  gene b => [
> >>    [
> >>      'al1',
> >>      'al2',
> >>      'al3'
> >>    ],
> >>    [
> >>      'data1',
> >>      'data3',
> >>      'data5'
> >>    ],
> >>    [
> >>      'data2',
> >>      'data4',
> >>      'data6'
> >>    ],
> >>    [
> >>      'data9',
> >>      'data10',
> >>      'data12'
> >>    ]
> >>  ],
> >>  gene a => [
> >>    [],
> >>    [],
> >>    [],
> >>    []
> >>  ]
> >> };
> >>
> >>
> >>
> >> here is my code
> >>
> >> #!/usr/local/bin/perl
> >> use strict;
> >> use warnings;
> >
> > It's good that you are using strict and warnings.
> >
> >> use DBI;
> >> use Data::Dumper;
> >> use List::MoreUtils qw(uniq);
> >>
> >>
> >>
> >> my $subrow_hash;
> >> my $row_hash;
> >> my %hasharray;
> >
> >
> > You should not predeclare your variables:
> >
> > http://perl-begin.org/tutorials/bad-elements/#declaring_all_vars_at_top
> >
> >>
> >>
> >> my $geno_dbh = DBI->connect( credential...}    ) || die "Database
> >> connection not made: $DBI::errstr";
> >> print STDERR "Connection...\n";
> >>
> >>
> >> my $subsql = "SELECT * FROM table_2015";
> >> #this is the table structure
> >> #gene a      al1     data1   data2   data9
> >> #gene b      al2     data3   data4   data10
> >> #gene b      al3     data5   data6   data12
> >> #gene b      al4     data7   data8   data12
> >>
> >
> > In general "SELECT *" is not recommended and you should list your fields
> > explicitly.
> >
> >>
> >> my $subresult = $geno_dbh->prepare($subsql);
> >> $subresult->execute() or die "SQL Error: $DBI::errstr\n";
> >>
> >>      my @gene_name_list;
> >>      my @allele_list;
> >>      my @mp_list;
> >>      my @mp_list_def;
> >>      my @unique_mp_def;
> >>      my $Xref;
> >>      my @unique_gene_name_list ; # unique gene only
> >>      my @unique_allele_list;# unique allele only
> >>      my @unique_mp_list;# unique MP terms only
> >>      my $list;
> >
> > More predeclarations.
> >
> >> while ( $subrow_hash = $subresult->fetchrow_hashref) {
> >>
> >
> > Better have a "my $subrow_hash" here.
> >
> >>      my $symbol_id=$subrow_hash->{symbol};#this is the first set of data
> >> for the first array
> >
> > 1. The line is too long.
> >
> > 2. There are no spaces around the equal sign "=".
> >
> > 3. Perhaps make it an array.
> >
> >>
> >>      my $allele_id=$subrow_hash->{allele_symbol};#this is the 2nd set of
> >> data for the 2nd array
> >>
> >>      my $mp_id=$subrow_hash->{phenotype_acc};#this is the 3rd set of data
> >> for the 3rd array
> >>
> >>      my $mp_def=$subrow_hash->{name};#this is the fourth set of data for
> >> the fourth array
> >>
> >>      $Xref=$subrow_hash->{xref_acc}; #this is the key of the hash (gene a
> >> and gene b)
> >>
> >>
> >> $list=[[@gene_name_list], [@allele_list], [@mp_list],
> >> [@mp_list_def]];#create arrays of arrays
> >>
> >
> > $list here accumulates several lists.
> >
> >
> >> if ($Xref){
> >>
> >>
> >>              $hasharray{$Xref} = $list; #create a hash of arrays of arrays for a specific key
> >>              @gene_name_list =@{$list->[0]}; #maybe not necessary to declare this.
> >>              @allele_list =@{$list->[1]};
> >>              @mp_list =@{$list->[2]};
> >>              @mp_list_def =@{$list->[3]};
> >>
> >
> > Why is the assignment to the arrays necessary again?
> >
> >>
> >>
> >>      if ($symbol_id){
> >
> > It's better to use defined here:
> >
> > http://perldoc.perl.org/functions/defined.html
> >
> >>              push (@gene_name_list, $symbol_id); #fill arrays with data
> >>      }
> >>      if ($allele_id){
> >>              push (@allele_list, $allele_id);
> >>      }
> >>      if ($mp_id){
> >>              push (@mp_list, $mp_id);
> >>      }
> >>      if ($mp_def){
> >>              push (@mp_list_def, $mp_def);
> >>      }
> >> }
> >>
> >>
> >
> > Perhaps you wish to peruse the resources in:
> >
> > http://perl-begin.org/topics/references/
> >
> > Regards,
> >
> >       Shlomi Fish
> >
> >
> > --
> > -----------------------------------------------------------------
> > Shlomi Fish       http://www.shlomifish.org/
> > List of Text Editors and IDEs - http://shlom.in/IDEs
> >
> > Chuck Norris can read Perl code that was RSA encrypted.
> >    — http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/
> >
> > Please reply to list if it's a mailing list post - http://shlom.in/reply .
>
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
>
>
>
