On Thu, Sep 15, 2011 at 3:48 PM, Rob <[email protected]> wrote:
> I have a file of test results it is formatted as follows:
>
> School |fname| lname | sub| testnum|score| grade|level
> MLK School | John | Smith | RE | Test 1| 95| A | Prof
> MLK School | John | Smith | RE | Test 2| 97| A | Prof
> MLK School | John | Smith | RE | Test 3| 93| A | Prof
> MLK School | John | Smith | RE | Test 4| 89| B | NP
>
> What I would like to come out with is as follows:
>
> SCHOOL |fname| lname | sub|
> testnum|score| grade|level
> MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof|
> Test 3|93|A|Prof|Test4|89|B|NP

There are a number of CSV modules available on CPAN. I've only
ever used Text::CSV::Slurp, and the file was in the standard (?)
quoted-columns, comma-separated format. Regardless, you should be
able to configure your module of choice to parse whichever
delimiter you need.

To critique your code:

You should begin all programs with:

    use strict;
    use warnings;

> $file_to_read ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_Improved_2011091402.csv";
> $file_to_write ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_One_File_2011091402.csv";

With the above pragmas you would need to declare these variables
with 'my' (which you should be doing anyway).

> open( file1, $file_to_read) || die ("could not open file1");

You should usually use the 3-argument open. In this case
$file_to_read is known to be safe, but the program could be
changed later to let the user supply it. There's no good reason
not to use the 3-argument open here, so you might as well. :)

You should use lexical file handles with open.

It would be useful to include $! in your die message so that the
user gets a hint as to /why/ file1 could not be opened (you might
also choose a more descriptive name than file1).

    open my $in_fh, '<', $file_to_read or
        die "Couldn't open '$file_to_read': $!";

> open( file2, '>>',$file_to_write);

You should test /every/ call of open for success (e.g., with `or
die "open: $!"'). Again, a lexical file handle is preferred.
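
A minimal sketch of both opens done that way. The scratch
directory, the file names inside it, and the one-line sample
input are assumptions made only so that the example runs on its
own; substitute your real paths.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical paths in a scratch directory, standing in for the real files.
my $dir           = tempdir(CLEANUP => 1);
my $file_to_read  = "$dir/results.csv";
my $file_to_write = "$dir/one_file.csv";

# Create a small input file so that the example is self-contained.
open my $seed_fh, '>', $file_to_read
    or die "Couldn't create '$file_to_read': $!";
print {$seed_fh} "MLK School | John | Smith | RE | Test 1| 95| A | Prof\n";
close $seed_fh or die "Couldn't close '$file_to_read': $!";

# Lexical handles, 3-argument open, and every call checked.
open my $in_fh, '<', $file_to_read
    or die "Couldn't open '$file_to_read' for reading: $!";
open my $out_fh, '>>', $file_to_write
    or die "Couldn't open '$file_to_write' for appending: $!";

# Copy the input to the output, counting lines as we go.
my $lines_copied = 0;
while (my $line = <$in_fh>) {
    print {$out_fh} $line;
    $lines_copied++;
}

close $in_fh  or die "Couldn't close '$file_to_read': $!";
close $out_fh or die "Couldn't close '$file_to_write': $!";

print "Copied $lines_copied line(s)\n";
```

Checking close on the output handle as well is optional, but it
is where a failed write (a full disk, for instance) tends to show
up.
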
> while($line= <file1>) {

Again, with 'strict' you would need to declare $line with 'my':

    while(my $line = <$in_fh>) {

> chomp $line;
>
> (
> $schoolname,
> $studentkey,
> $sfirstname,
> $slastname,
> $subject_code,
> $testkey,
> $test_grade,
> $test_score,
> $test_level
> )=split /\|/, $line;

You appear to have lost some indentation here (or just aren't
indenting this code). Indentation is important for the
readability of your code.

> if length($studentkey gt 0) {

I'm not sure exactly what you meant by that. What you're doing,
though, is checking whether $studentkey sorts after the string
'0' in a string comparison, and then passing that boolean result
to length... Assuming you actually wanted to test whether
$studentkey is a non-empty string, you should write
`length($studentkey) > 0'.

I'm not certain about this, but I thought that an if statement
required the condition to be surrounded by parentheses:

    if(length($studentkey) > 0) {

perldoc perlsyn appears to agree:

> if (EXPR) BLOCK

AFAICT, $studentkey should be the first name of the student.

> while ($line2 = <file>) {

Again, $line2 should be declared with 'my'.

> chomp $line2;
> (
> $studentkey_file2,
> $testkey_file2,
> $rest_file2
> ) = split/\|/, $line2;

These names are misleading and confusing. :) You should probably
be storing these in a data structure anyway, perhaps as an array
of hash references.

> if ($studentkey_file2 gt 0 && $studentkey eq
> $studentkey_file2) {

Judging by the name, you're now apparently comparing a file name
with 0 as strings (gt is a string comparison). :-/ That doesn't
really make sense.
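
To make the difference concrete, here is a small sketch
contrasting the string operator gt with the numeric operator >
and with a length test. The helper is_non_empty is a hypothetical
name introduced only for this illustration.

```perl
use strict;
use warnings;

# 'gt' compares as strings, character by character; '>' compares as numbers.
my $string_cmp  = ('10' gt '9') ? 1 : 0;   # the character '1' sorts before '9'
my $numeric_cmp = (10 > 9)      ? 1 : 0;

# Hypothetical helper: true when the value is defined and has at least
# one character. Note that length is called on the value itself, and the
# comparison happens outside the call.
sub is_non_empty {
    my ($value) = @_;
    return (defined $value && length($value) > 0) ? 1 : 0;
}

print "'10' gt '9'          : $string_cmp\n";
print "10 > 9               : $numeric_cmp\n";
print "is_non_empty('')     : ", is_non_empty(''),     "\n";
print "is_non_empty('0')    : ", is_non_empty('0'),    "\n";
print "is_non_empty('John') : ", is_non_empty('John'), "\n";
```

The string comparison reports that '10' is not greater than '9',
which is rarely what anyone wants for a key or a score, while the
length test treats '0' as a perfectly good non-empty value.
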
> -Studentkey is the information I want to match on but can not figure
> out what direction to go.

Since I'm not familiar with any of the CSV modules I won't bother
trying to offer an example that uses them. You should look into
it though, especially if you encounter data of this nature often.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

main() unless caller;

# Copy the named keys from one hash reference into another.
sub copy_hash_elements
{
    my ($src, $dest, @elements) = @_;

    $dest->{$_} = $src->{$_} for @elements;

    return $dest;
}

sub main
{
    # The first line of input names the columns.
    my $header_line = <>;

    my @column_headers = map { trim($_) } split /\|/, $header_line;

    my %data;

    while(my $line = <>)
    {
        chomp $line;

        my @column_values = split /\|/, $line;

        my %record;

        for my $i (0 .. $#column_headers)
        {
            $record{$column_headers[$i]} = trim($column_values[$i]);
        }

        # Group the rows by student name.
        my $name = "$record{fname} $record{lname}";

        unless(defined $data{$name})
        {
            $data{$name} = copy_hash_elements(
                    \%record,
                    {},
                    qw(School fname lname sub));
        }

        $data{$name}->{data} ||= [];

        push @{$data{$name}->{data}}, copy_hash_elements(
                \%record,
                {},
                qw(testnum grade score level));
    }

    print STDERR Data::Dumper->Dump([\%data], ['data']);
}

# Strip leading and trailing whitespace. An undefined value becomes '',
# but a literal '0' (e.g. a score of zero) is preserved.
sub trim
{
    my $string = shift;

    $string = '' unless defined $string;

    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;

    return $string;
}

__DATA__
School |fname| lname | sub| testnum|score| grade|level
MLK School | John | Smith | RE | Test 1| 95| A | Prof
MLK School | John | Smith | RE | Test 2| 97| A | Prof
MLK School | John | Smith | RE | Test 3| 93| A | Prof
MLK School | John | Smith | RE | Test 4| 89| B | NP

Example run:

C:\Users\bamccaig>perl -e "do 'test.pl'; print <DATA>;" | perl test.pl
$data = {
          'John Smith' => {
                            'School' => 'MLK School',
                            'sub' => 'RE',
                            'lname' => 'Smith',
                            'fname' => 'John',
                            'data' => [
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '95',
                                          'testnum' => 'Test 1'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '97',
                                          'testnum' => 'Test 2'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '93',
                                          'testnum' => 'Test 3'
                                        },
                                        {
                                          'level' => 'NP',
                                          'grade' => 'B',
                                          'score' => '89',
                                          'testnum' => 'Test 4'
                                        }
                                      ]
                          }
        };

That should get you on your way. You just need to loop over each
student (with the built-in keys function), print the basic data,
and then loop over the array referenced by the 'data' hash
element to get the extra data to append.
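
A rough sketch of that final loop, under a few assumptions: %data
is hard-coded here from the dump rather than built from a file,
the fields are joined with a bare '|' and no padding, and the
names @base_columns, @test_columns, and @output_lines are made up
for the example.

```perl
use strict;
use warnings;

# Sample structure copied from the Data::Dumper output in this message.
my %data = (
    'John Smith' => {
        'School' => 'MLK School',
        'fname'  => 'John',
        'lname'  => 'Smith',
        'sub'    => 'RE',
        'data'   => [
            { 'testnum' => 'Test 1', 'score' => '95', 'grade' => 'A', 'level' => 'Prof' },
            { 'testnum' => 'Test 2', 'score' => '97', 'grade' => 'A', 'level' => 'Prof' },
            { 'testnum' => 'Test 3', 'score' => '93', 'grade' => 'A', 'level' => 'Prof' },
            { 'testnum' => 'Test 4', 'score' => '89', 'grade' => 'B', 'level' => 'NP'   },
        ],
    },
);

# Column order for the per-student part and for each repeated test group.
my @base_columns = qw(School fname lname sub);
my @test_columns = qw(testnum score grade level);

my @output_lines;
for my $name (sort keys %data) {
    my $student = $data{$name};

    # Start with the basic data (a hash slice keeps the column order fixed).
    my @fields = @{$student}{@base_columns};

    # Append the fields of every test recorded for this student.
    for my $test (@{ $student->{data} }) {
        push @fields, @{$test}{@test_columns};
    }

    push @output_lines, join '|', @fields;
}

print "$_\n" for @output_lines;
```

Sorting the keys is only there to make the output order
repeatable; hash keys otherwise come back in no particular order.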

Disclaimer: In parsing this CSV myself I have made certain
assumptions about the data. I don't work with CSV or know of any
'standards' for it, so I don't know the rules for parsing it
properly without losing information. For example, I'm trimming
excess whitespace because I think the result is prettier, but
perhaps that whitespace is important? Keep that in mind. Using a
CSV module would probably be easier and more correct (once you've
learned how to use it).

--
Brandon McCaig <http://www.bamccaig.com/> <[email protected]>
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software <http://www.castopulence.org/> <[email protected]>