On Thu, Sep 15, 2011 at 3:48 PM, Rob <rdavis7...@gmail.com> wrote:
> I have a file of test results it is formatted as follows:
>
> School |fname| lname | sub| testnum|score| grade|level
> MLK School | John | Smith | RE | Test 1| 95| A | Prof
> MLK School | John | Smith | RE | Test 2| 97| A | Prof
> MLK School | John | Smith | RE | Test 3| 93| A | Prof
> MLK School | John | Smith | RE | Test 4| 89| B | NP
>
> What I would like to come out with is as follows:
>
> SCHOOL |fname| lname | sub|
> testnum|score| grade|level
> MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof|
> Test 3|93|A|Prof|Test4|89|B|NP

There are a number of CSV modules available on CPAN. I've only ever
used Text::CSV::Slurp, and the file was in the standard (?)
quoted-columns, comma-separated format. Regardless, you should be
able to configure your module of choice to parse whichever
delimiters you need.

To critique your code:

You should begin all programs with:

    use strict;
    use warnings;

> $file_to_read ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_Improved_2011091402.csv";
> $file_to_write ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_One_File_2011091402.csv";

With the above pragmas you would need to declare these variables
with 'my' (which you should be doing anyway).

> open( file1, $file_to_read) || die ("could not open file1");

You should usually use the 3-argument open. In this case
$file_to_read is known to be safe, but the program could be changed
later to let the user supply it. There's no good reason not to use
the 3-argument open here, so you might as well. :) You should also
use lexical file handles with open. It would be useful to include $!
in your die message so that the user gets a hint as to /why/ file1
could not be opened (you might choose a more descriptive name than
file1, too).

    open my $in_fh, '<', $file_to_read or
            die "Couldn't open '$file_to_read': $!";

> open( file2, '>>',$file_to_write);

You should test /every/ call of open for success (e.g., with
`or die "open: $!"'). Again, a lexical file handle is preferred.

> while($line= <file1>) {

Again, with 'strict' you would need to declare $line with 'my':

    while(my $line = <$in_fh>) {

> chomp $line;
>
> (
> $schoolname,
> $studentkey,
> $sfirstname,
> $slastname,
> $subject_code,
> $testkey,
> $test_grade,
> $test_score,
> $test_level
> )=split /\|/, $line;

You appear to have lost some indentation here (or just aren't
indenting this code). Indentation is important for the readability
of your code.
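
To pull the points so far together, here is a minimal sketch of what
the open and read steps could look like with strict, warnings, the
3-argument open, lexical file handles and checked calls. The helper
names read_pipe_rows and append_pipe_rows are my own invention for
illustration and are not part of your program; the sketch also
assumes that skipping blank lines is acceptable for your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): read a
# pipe-delimited file and return its rows as an array of array refs.
sub read_pipe_rows {
    my ($path) = @_;

    # 3-argument open, lexical file handle, and $! in the error message.
    open my $in_fh, '<', $path
        or die "Couldn't open '$path' for reading: $!";

    my @rows;
    while (my $line = <$in_fh>) {
        chomp $line;
        next unless length $line;    # assumption: blank lines can be skipped

        # The -1 limit keeps trailing empty fields.
        push @rows, [ split /\|/, $line, -1 ];
    }

    close $in_fh or die "Couldn't close '$path': $!";
    return \@rows;
}

# Hypothetical helper: append rows to a file, checking open and close.
# Returns the number of rows written.
sub append_pipe_rows {
    my ($path, $rows) = @_;

    open my $out_fh, '>>', $path
        or die "Couldn't open '$path' for appending: $!";
    print {$out_fh} join('|', @$_), "\n" for @$rows;
    close $out_fh or die "Couldn't close '$path': $!";

    return scalar @$rows;
}
```

You would call these with your own paths, e.g.
`my $rows = read_pipe_rows($file_to_read);'. Note that the fields are
returned untrimmed, so any whitespace around the pipes is still
there.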

> if length($studentkey gt 0) {

I'm not sure exactly what you meant by that. What you're doing,
though, is checking whether $studentkey sorts after the string '0'
(gt is a string comparison), then passing that boolean result to
length... Assuming you actually wanted to test whether $studentkey
is a non-empty string, you should write `length($studentkey) > 0'.
I also believe that a compound if statement requires the condition
to be surrounded by parentheses:

    if(length($studentkey) > 0) {

perldoc perlsyn appears to agree:

> if (EXPR) BLOCK

AFAICT, given your sample data, $studentkey ends up holding the
first name of the student (the second column).

> while ($line2 = <file>) {

Again, $line2 should be declared with 'my'.

> chomp $line2;
> (
> $studentkey_file2,
> $testkey_file2,
> $rest_file2
> ) = split/\|/, $line2;

These names are misleading and confusing. :) You should probably be
storing these in a data structure anyway, perhaps as an array of
hash references.

> if ($studentkey_file2 gt 0 && $studentkey eq
> $studentkey_file2) {

Now you're apparently comparing a file name with 0 in an
alphanumerical sense. :-/ That doesn't really make sense.

> -Studentkey is the information I want to match on but can not figure
> out what direction to go.

Since I'm not familiar with any of the CSV modules I won't bother
trying to offer an example that uses them. You should look into
them, though, especially if you encounter data of this nature often.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

main() unless caller;

# Copy the named keys from one hash reference into another.
sub copy_hash_elements {
    my ($src, $dest, @elements) = @_;

    $dest->{$_} = $src->{$_} for @elements;

    return $dest;
}

sub main {
    # The first line names the columns.
    my $header_line = <>;

    my @column_headers = map { trim($_) } split /\|/, $header_line;

    my %data;

    while(my $line = <>) {
        chomp $line;

        my @column_values = split /\|/, $line;

        # Map each column header to this line's trimmed value.
        my %record;

        for my $i (0 .. $#column_headers) {
            $record{$column_headers[$i]} = trim($column_values[$i]);
        }

        # Group the records by student name.
        my $name = "$record{fname} $record{lname}";

        unless(defined $data{$name}) {
            $data{$name} = copy_hash_elements(
                    \%record, {}, qw(School fname lname sub));
        }

        $data{$name}->{data} ||= [];

        push @{$data{$name}->{data}}, copy_hash_elements(
                \%record, {}, qw(testnum grade score level));
    }

    print STDERR Data::Dumper->Dump([\%data], ['data']);
}

sub trim {
    my $string = shift;

    # Treat a missing column as empty, but keep a literal '0'.
    $string = '' unless defined $string;

    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;

    return $string;
}

__DATA__
School |fname| lname | sub| testnum|score| grade|level
MLK School | John | Smith | RE | Test 1| 95| A | Prof
MLK School | John | Smith | RE | Test 2| 97| A | Prof
MLK School | John | Smith | RE | Test 3| 93| A | Prof
MLK School | John | Smith | RE | Test 4| 89| B | NP

Example run:

C:\Users\bamccaig>perl -e "do 'test.pl'; print <DATA>;" | perl test.pl
$data = {
          'John Smith' => {
                            'School' => 'MLK School',
                            'sub' => 'RE',
                            'lname' => 'Smith',
                            'fname' => 'John',
                            'data' => [
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '95',
                                          'testnum' => 'Test 1'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '97',
                                          'testnum' => 'Test 2'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '93',
                                          'testnum' => 'Test 3'
                                        },
                                        {
                                          'level' => 'NP',
                                          'grade' => 'B',
                                          'score' => '89',
                                          'testnum' => 'Test 4'
                                        }
                                      ]
                          }
        };

That should get you on your way. You just need to loop over each
student (with the built-in keys function), print the basic data, and
then loop over the array referenced by the 'data' hash element to
get the extra data to append.

Disclaimer: In parsing this CSV myself I have made certain
assumptions about the data. I don't work with CSV or know of any
'standards' for it, so I don't know the rules for parsing it
properly without losing information. For example, I'm trimming
excess whitespace because I think the result is prettier, but
perhaps that whitespace is important? Keep that in mind.
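
For what it's worth, the final loop described above might look
something like the following sketch. It assumes the %data structure
shown in the dump, and it assumes that you want each test's fields in
header order (testnum, score, grade, level). The helper names
flatten_student and flatten_all are hypothetical, and the hard-coded
%data is only a cut-down stand-in for the real parsed data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order taken from the header line in the sample data.
my @student_columns = qw(School fname lname sub);
my @test_columns    = qw(testnum score grade level);

# Hypothetical helper: turn one student's entry into a single
# pipe-delimited line, with every test appended after the basic data.
sub flatten_student {
    my ($student) = @_;

    # Hash slice: the basic columns, in header order.
    my @fields = @{$student}{@student_columns};

    for my $test (@{ $student->{data} }) {
        push @fields, @{$test}{@test_columns};
    }

    return join '|', @fields;
}

# Hypothetical helper: one output line per student. The keys are
# sorted because the order of hash keys is not predictable.
sub flatten_all {
    my ($data) = @_;

    return map { flatten_student($data->{$_}) } sort keys %$data;
}

# A cut-down stand-in for the structure in the dump (two tests only).
my %data = (
    'John Smith' => {
        'School' => 'MLK School',
        'fname'  => 'John',
        'lname'  => 'Smith',
        'sub'    => 'RE',
        'data'   => [
            { testnum => 'Test 1', score => '95', grade => 'A', level => 'Prof' },
            { testnum => 'Test 4', score => '89', grade => 'B', level => 'NP' },
        ],
    },
);

print join('|', @student_columns, @test_columns), "\n";
print "$_\n" for flatten_all(\%data);
```

Keying the hash on the student's name, as I did, will merge two
different students who happen to share a name, so you may want a
more unique key if your real data has one.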

Using a CSV module would probably be easier and more correct (once
you've learned how to use it).

-- 
Brandon McCaig <http://www.bamccaig.com/> <bamcc...@gmail.com>
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software <http://www.castopulence.org/> <bamcc...@castopulence.org>