Re: Pivoting Data for use with gnuplot (solved)

Chuck Fox Sat, 15 Nov 2003 12:18:02 -0800

Chuck Fox wrote:

Fellow perl coders,

I am having trouble wrapping my head around how to accomplish the following: Given a dbi result set that consists of datetime, servername, databasename, procedurename and average elapsed time. Convert this into an data structure suitable for export to a file for use with gnuplot. The data is hourly and there may be gaps in the hours. The base problem is that I have to pivot the result sets so that for each unique occurrence of server/database/procedure I can associate the elapsed time to a reporting datetime. I think I understand what I have to feed gnuplot, ie. a file with a column for each server/database/procedure of elapsed times for each hour in the period. I believe that I want to make a hash of arrays, but I am confused as to how to pivot the data and accumulate/assign the arrays.

Here is a sample of the test data that I am using, (the full set for one application for a two week period is over 15,000 rows) <sql_result> report_datetime server_name database_name procedure_name avg_elapsed 10/30/2003 1:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 CommentEntry_Del 0.0004 10/30/2003 3:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 CommentEntry_Del 0.0020 10/30/2003 4:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 CommentEntry_Del 0.0004 10/30/2003 9:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 CommentEntry_Del 0.0013 10/30/2003 12:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 Comment_RtrByEntry 0.0007 10/30/2003 1:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 Comment_RtrByEntry 0.0007 10/30/2003 2:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 Comment_RtrByEntry 0.0011 10/30/2003 3:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 Comment_RtrByEntry 0.0012 10/30/2003 4:00:00.000 PM BLOGSDB_D01A_SYB BlogData001 Comment_RtrByEntry 0.0007 10/30/2003 2:00:00.000 PM BLOGSDB_D01A_SYB BlogData002 Comment_Rtr 0.0008 10/30/2003 9:00:00.000 PM BLOGSDB_D01A_SYB BlogData002 Comment_Rtr 0.0007 10/30/2003 10:00:00.000 PM BLOGSDB_D01A_SYB BlogData002 Comment_Rtr 0.0006 10/30/2003 11:00:00.000 PM BLOGSDB_D01A_SYB BlogData002 Comment_Rtr 0.0007 </sql_result>

What I am having difficulty with is how do I end up with :

<some_fancy_data_structure> BLOGSDB_D01A_SYB BLOGSDB_D01A_SYB BLOGSDB_D01A_SYB BlogData001 BlogData001 BlogData002 report_datetime CommentEntry_Del Comment_RtrByEntry Comment_Rtr 10/30/2003 12:00:00.000 PM 0.0013 0.0007 under 10/30/2003 1:00:00.000 PM 0.0004 0.0007 under 10/30/2003 2:00:00.000 PM undef 0.0011 0.0008 10/30/2003 3:00:00.000 PM 0.0020 0.0012 under 10/30/2003 4:00:00.000 PM 0.0004 0.0007 under 10/30/2003 9:00:00.000 PM undef undef 0.0007 10/30/2003 10:00:00.000 PM undef undef 0.0006 10/30/2003 11:00:00.000 PM undef undef 0.0007 </some_fancy_data_structure>

Ultimately, I am writing the above structure to a file so that I can take advantage of gnuplot (wish I had the luxury of GD or Chart, but system calls will have to do). I think I have to parse the data twice, once to determine the names of all the nodes and then again to populate each node. Is there a single pass solution ?

<groveling> Thanks in advance, this is a wonderful list, I have over 500 tips saved for the 3 or 4 months that I have been listening in. </groveling>

Chuck Fox

Got some sleep, some coffee and now that my head is properly wrapped, I am getting the output I want. Now my question is, how to make sure that I am not doing too much extraneous stuff and make it run as fast as possible. I think the only solution involves at least two passes through the data. One to determine the number of rows and columns and one to populate the table. Does anyone see optimizations for a one pass solution ?

Heres the code,
<code>
#!/usr/local/bin/perl

# always, always !
use strict;
use warnings;

# Declare some storage
my @ary = ();
my %times = ();
my %cols = ();
my @colnames = ();
my @row = ();

# Create some data $ary[0] = [( "10/30/03 12:00", "BLOGSDB_M01A_SYB", "BlogData001", "Comment_Ins", 0.0004 )]; $ary[1] = [( "10/30/03 2:00", "BLOGSDB_M01A_SYB", "BlogData001", "Comment_Ins", 0.0007 )]; $ary[2] = [( "10/30/03 3:00", "BLOGSDB_M01A_SYB", "BlogData001", "Comment_Ins", 0.0011 )]; $ary[3] = [( "10/30/03 7:00", "BLOGSDB_M01A_SYB", "BlogData001", "Comment_Ins", 0.0007 )]; $ary[4] = [( "10/30/03 12:00", "BLOGSDB_M01A_SYB", "BlogData002", "Comment_Ins", 0.0004 )]; $ary[5] = [( "10/30/03 1:00", "BLOGSDB_M01A_SYB", "BlogData002", "Comment_Ins", 0.0006 )]; $ary[6] = [( "10/30/03 2:00", "BLOGSDB_M01A_SYB", "BlogData002", "Comment_Ins", 0.0009 )]; $ary[7] = [( "10/30/03 12:00", "BLOGSDB_M01A_SYB", "BlogData003", "Comment_Ins", 0.0005 )]; $ary[8] = [( "10/30/03 2:00", "BLOGSDB_M01A_SYB", "BlogData003", "Comment_Ins", 0.0003 )]; $ary[9] = [( "10/30/03 5:00", "BLOGSDB_M01A_SYB", "BlogData003", "Comment_Ins", 0.0006 )]; $ary[10] = [( "10/30/03 7:00", "BLOGSDB_M01A_SYB", "BlogData003", "Comment_Ins", 0.0005 )]; $ary[11] = [( "10/30/03 9:00", "BLOGSDB_M01A_SYB", "BlogData003", "Comment_Ins", 0.0004 )];

# Load up the map of times we are going to use
foreach my $elem  ( @ary ) {
#    print @{$elem}[2] . "\n";
   $times{ @{ $elem }[0] }++;
   push( @colnames, join( ".", @{ $elem }[1..3] ) )
       unless $cols{ join( ".", @{ $elem }[1..3] ) }++;
}

# Assign the array index number to the hash
for( my $idx = 0; $idx < $#colnames + 1; $idx++ ) {
   $cols{ $colnames[ $idx ] } = $idx;
#    print "creating hash index for $colnames[ $idx ], $idx\n";
}

# Declare some more storage
$#row = $#colnames;
for( my $idx = 0; $idx <= $#colnames; $idx++ ){
   $row[$idx] = "0.0000";
}

# assign for each row of time
foreach my $time ( keys %times ) {
   $times{$time} = [ @row ];
}

# Now load each value into the structure foreach my $elem ( @ary ) { @{ $times{ @{ $elem }[0] } }[ $cols{ join( ".", @{ $elem }[1..3] ) } ] = @{ $elem }[4]; }

# And voila, results pivoted
foreach my $row ( sort keys %times ) {
   print $row . ", " . join( ", ", @{ $times{$row} } ) . "\n" ;
}

</code>

Regards,

Chuck


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Pivoting Data for use with gnuplot (solved)

Reply via email to