On 6/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
hello all--

I have been trying to wrap my head around the right approach to doing what I
would normally do with SAS's "PROC TRANSPOSE" in Perl. The below toy dataset is
summarized from http://www.ats.ucla.edu/stat/sas/modules/wtol_transpose.htm.

I have searched CPAN and google and the list to the best of my ability for any
hints related to transposing a "long" dataset to a "wide" dataset and vice
versa. I assume I am missing an obvious solution to what I presume is a common
dilemma?

LONG:
famid year faminc
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000

desired output,WIDE: .
Obs    famid    _NAME_    faminc96    faminc97    faminc98
 1       1      faminc      40000       40500       41000
 2       2      faminc      45000       45400       45800
 3       3      faminc      75000       76000       77000

Variables I think I would need to make use of would be:
1) BY--uniquely identifies an individual, an observation, a primary key etc
2) ID--the category or grouping
2) var--the variable values to be transposed
3) prefix--(concatenated to the id for naming column output in long to wide)

if anyone has any thoughts on how to approach this problem or pointers to an
already rolled solution I would be most appreciative.

best,
sg


It sounds like you need a hash of HoAs (hash of arrays).  Look at
perldoc perldsc for more detailed info, but here is some ugly code
that seems to do what you want.  It is not very flexible, but probably
could be made so.

#!/usr/bin/perl

use strict;
use warnings;

my %ds;

#the first line is the headers, so read it
#in to use as keys to the hash
#FIXME: remove the hardcodes and make this a function
my $id    = 'famid';
my %joins = (
       _NAME_ => [ 'faminc', 'year' ]
);
my @headers = split ' ', <DATA>;
my $observation = 1; #aka rows if remember SAS correctly
while (<DATA>) {
       my %rec = zip([EMAIL PROTECTED], [split ' ']);
       my $id = $rec{$id}; #this masks the earlier $id
       unless ($ds{$id}) {
               $ds{$id}{obs} = $observation++;
       }
       for my $name (sort keys %joins) {
               my ($col, $var) = @{$joins{$name}};
               my $key = "$col$rec{$var}";
               $ds{$id}{$name} = $col;
               $ds{$id}{$key} = $rec{$col};
       }
}

my @keys = sort(uniq(map { keys %{$ds{$_}} } keys %ds));

print join("\t", $id, @keys), "\n";
for my $id (sort keys %ds) {
       print join("\t", $id, map { $ds{$id}{$_} } @keys), "\n";
}


#in Perl 6 a function like this, but better,
#is part of the core, I can't wait
sub zip {
       my ($a1, $a2) = @_;
       my $min = @$a1 < @$a2 ? $#$a1 : $#$a2;
       my @result;
       for my $i (0 .. $min) {
               push @result, $a1->[$i], $a2->[$i];
       }
       return @result;
}

sub uniq {
       my %hash = map { $_ => 1 } @_;
       return keys %hash;
}

__DATA__
famid year faminc
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/


Reply via email to