Re: Files and Arrays - Search for values and write to the right

Brandon McCaig Fri, 16 Sep 2011 08:33:14 -0700

On Thu, Sep 15, 2011 at 3:48 PM, Rob <rdavis7...@gmail.com> wrote:
> I have a file of test results it is formatted as follows:
>
>        School |fname| lname | sub| testnum|score| grade|level
> MLK School | John | Smith | RE | Test 1| 95| A | Prof
> MLK School | John | Smith | RE | Test 2| 97| A | Prof
> MLK School | John | Smith | RE | Test 3| 93| A | Prof
> MLK School | John | Smith | RE | Test 4| 89| B | NP
>
> What I would like to come out with is as follows:
>
>                SCHOOL                          |fname| lname | sub|
> testnum|score| grade|level
> MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof|
> Test 3|93|A|Prof|Test4|89|B|NP


There are a number of CSV modules available on CPAN. I've only
ever used Text::CSV::Slurp, and the file was in the standard (?)
quoted-columns, comma-separated format. Regardless, you should be
able to configure your module of choice to parse whichever
delimiter you need.

To critique your code:

You should begin all programs with:

  use strict;
  use warnings;

> $file_to_read ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_Improved_2011091402.csv";
> $file_to_write ="E:/My Documents/KNOWS/Second Run at Knows/
> KNOWS_All_Student_Benchmark_Results_One_File_2011091402.csv";

With the above pragmas you would need to declare these variables
with 'my' (which you should be doing anyway).

> open( file1, $file_to_read) || die ("could not open file1");

You should usually use the 3-argument open. In this case
$file_to_read is known to be safe, but the program could be
changed later to allow the user to input that. There's no good
reason here to not use the 3-argument open so you might as well.
:)

You should use lexical file handles with open.

It would be useful to output $! in your die so that the user can
get a hint as to /why/ file1 could not be opened (you might
choose a more descriptive name than file1 also).

  open my $in_fh, '<', $file_to_read or
          die "Couldn't open '$file_to_read': $!";

> open( file2, '>>',$file_to_write);

You should test /every/ call of open for success (e.g., with `or
die "open: $!"'). Again, a lexical file handle is preferred.
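
As a minimal sketch of that advice applied to the second open: the
temporary file below is a hypothetical stand-in for the real output
path (so the snippet runs anywhere), and $out_fh is just a name
picked for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real output path.
my (undef, $file_to_write) = tempfile(UNLINK => 1);

# Lexical handle, explicit append mode, and a checked result.
open my $out_fh, '>>', $file_to_write
    or die "Couldn't open '$file_to_write' for appending: $!";

print {$out_fh} "MLK School|John|Smith|RE\n";

# close can fail too (e.g., a full disk), so check it as well.
close $out_fh
    or die "Couldn't close '$file_to_write': $!";
```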

> while($line= <file1>) {

Again, with 'strict' you would need to declare $line with 'my':

  while(my $line = <$in_fh>) {

> chomp $line;
>
> (
> $schoolname,
> $studentkey,
> $sfirstname,
> $slastname,
> $subject_code,
> $testkey,
> $test_grade,
> $test_score,
> $test_level
> )=split /\|/, $line;

You appear to have lost some indentation here (or just aren't
indenting this code). Indentation is important for the
readability of your code.

> if length($studentkey gt 0) {

I'm not sure exactly what you meant by that. What you're doing
though is checking whether or not $studentkey sorts after the
string '0' in a string comparison, then passing that boolean
result into length... Assuming you actually wanted to test
whether or not $studentkey is a non-zero length string, you
should do `length($studentkey) > 0'.

I'm not certain about this, but I thought that a compound if
statement required the expression to be surrounded by
parentheses:

  if(length($studentkey) > 0) {

perldoc perlsyn appears to agree:
> if (EXPR) BLOCK

AFAICT, $studentkey should be the first name of the student.
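
To make the difference between the two conditions concrete, here is
a small sketch. The sub names and the sample keys are made up for
illustration; ' John ' mimics a field as split would return it from
the sample data, with the surrounding spaces intact.

```perl
use strict;
use warnings;

# The condition as originally written: length of a string comparison result.
sub as_written { my ($key) = @_; return length($key gt 0) ? 1 : 0 }

# The condition that was probably intended: is the string non-empty?
sub intended   { my ($key) = @_; return length($key) > 0 ? 1 : 0 }

# Hypothetical keys to compare the two conditions on.
for my $key ('John', ' John ', '0', '')
{
    printf "%-8s as written: %d, intended: %d\n",
            "'$key'", as_written($key), intended($key);
}
```

The two agree for 'John' and for the empty string, but not for
' John ' or '0': a space sorts before '0', so any field with a
leading space fails the original test even though it isn't empty.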

>    while ($line2 = <file>) {

Again, $line2 should be declared with 'my'.

>        chomp $line2;
>        (
>            $studentkey_file2,
>            $testkey_file2,
>            $rest_file2
>        ) = split/\|/, $line2;

These names are misleading and confusing. :) You should probably
be storing these in a data structure anyway, perhaps as an array
of hash references.
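
A rough sketch of what an array of hash references could look like
for this data. The column list is copied from the header line of
the sample; parse_record and @records are names invented for the
example, and trimming each field is an assumption about what you
want.

```perl
use strict;
use warnings;

# Column names taken from the header line of the sample data.
my @columns = qw(School fname lname sub testnum score grade level);

# Turn one pipe-delimited line into a hash reference keyed by column name.
sub parse_record
{
    my ($line) = @_;

    my @values = split /\|/, $line;
    s/\A\s+|\s+\z//g for @values;    # trim each field in place

    my %record;
    @record{@columns} = @values;     # hash slice: pair names with values

    return \%record;
}

my @records = map { parse_record($_) } (
    'MLK School | John | Smith | RE | Test 1| 95| A | Prof',
    'MLK School | John | Smith | RE | Test 2| 97| A | Prof',
);

print "$_->{fname} $_->{lname}: $_->{testnum} = $_->{score}\n" for @records;
```

Each field is then reachable by name (e.g., $records[0]{score})
instead of through a pile of similarly named scalars.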

>        if ($studentkey_file2 gt 0 && $studentkey eq
> $studentkey_file2) {

Now you're apparently comparing a file name with 0 in an
alphanumerical sense. :-/ Doesn't really make sense.

> -Studentkey is the information I want to match on but can not figure
> out what direction to go.

Since I'm not very familiar with the CSV modules I won't bother
trying to offer an example that uses them. You should look into
them though, especially if you encounter data of this nature
often. Here is an example that parses the data by hand instead:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

main() unless caller;

sub copy_hash_elements
{
    my ($src, $dest, @elements) = @_;

    $dest->{$_} = $src->{$_} for @elements;

    return $dest;
}

sub main
{
    my $header_line = <>;

    # trim() returns a cleaned copy, so there is no need to assign to $_.
    my @column_headers = map { trim($_) }
            split /\|/, $header_line;

    my %data;

    while(my $line = <>)
    {
        chomp $line;

        my @column_values = split /\|/, $line;

        my %record;

        for my $i (0 .. $#column_headers)
        {
            $record{$column_headers[$i]} =
                    trim($column_values[$i]);
        }

        my $name = "$record{fname} $record{lname}";


        unless(defined $data{$name})
        {
            $data{$name} = copy_hash_elements(
                    \%record,
                    {},
                    qw(School fname lname sub));
        }

        $data{$name}->{data} ||= [];

        push @{$data{$name}->{data}}, copy_hash_elements(
                \%record,
                {},
                qw(testnum grade score level));
    }

    print STDERR Data::Dumper->Dump([\%data], ['data']);
}

sub trim
{
    # Test definedness rather than truth so a field of '0' survives.
    my $string = shift;
    $string = '' unless defined $string;

    $string =~ s/\A\s+//;
    $string =~ s/\s+\z//;

    return $string;
}

__DATA__
       School |fname| lname | sub| testnum|score| grade|level
MLK School | John | Smith | RE | Test 1| 95| A | Prof
MLK School | John | Smith | RE | Test 2| 97| A | Prof
MLK School | John | Smith | RE | Test 3| 93| A | Prof
MLK School | John | Smith | RE | Test 4| 89| B | NP


Example run:

C:\Users\bamccaig>perl -e "do 'test.pl'; print <DATA>;" | perl test.pl
$data = {
          'John Smith' => {
                            'School' => 'MLK School',
                            'sub' => 'RE',
                            'lname' => 'Smith',
                            'fname' => 'John',
                            'data' => [
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '95',
                                          'testnum' => 'Test 1'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '97',
                                          'testnum' => 'Test 2'
                                        },
                                        {
                                          'level' => 'Prof',
                                          'grade' => 'A',
                                          'score' => '93',
                                          'testnum' => 'Test 3'
                                        },
                                        {
                                          'level' => 'NP',
                                          'grade' => 'B',
                                          'score' => '89',
                                          'testnum' => 'Test 4'
                                        }
                                      ]
                          }
        };

That should get you on your way. You just need to loop over each
student (with the built-in keys function), print the basic data,
and then loop over the array referenced by the 'data' hash
element to get the extra data to append.
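
That last step could be sketched roughly like this. The %data
literal is a shortened, hand-typed copy of the dumped structure,
flatten_student is a name made up for the example, and joining
everything with a bare '|' is an assumption about the output format
you want.

```perl
use strict;
use warnings;

# Build one output line for a student: the basic fields first, then
# testnum/score/grade/level for every test, all joined with '|'.
sub flatten_student
{
    my ($student) = @_;

    my @fields = @{$student}{qw(School fname lname sub)};

    for my $test (@{$student->{data}})
    {
        push @fields, @{$test}{qw(testnum score grade level)};
    }

    return join '|', @fields;
}

# Hypothetical, shortened copy of the structure dumped above.
my %data = (
    'John Smith' => {
        'School' => 'MLK School',
        'fname'  => 'John',
        'lname'  => 'Smith',
        'sub'    => 'RE',
        'data'   => [
            { 'testnum' => 'Test 1', 'score' => '95',
              'grade'   => 'A',      'level' => 'Prof' },
            { 'testnum' => 'Test 4', 'score' => '89',
              'grade'   => 'B',      'level' => 'NP' },
        ],
    },
);

# Sorting the keys keeps the output order stable between runs.
for my $name (sort keys %data)
{
    print flatten_student($data{$name}), "\n";
}
```

You would also want to print a header line first, but how wide it
should be depends on the largest number of tests any student has.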

Disclaimer: In parsing this CSV myself I have made certain
assumptions about the data. I don't work with CSV or know of any
'standards' or whatever for it so I don't know the rules for
properly parsing it without losing information. For example, I'm
trimming excess whitespace off because I think the result is
prettier, but perhaps that whitespace is important? Keep that in
mind. Using a CSV module would probably be easier and more
correct (once you've learned how to use it).


-- 
Brandon McCaig <http://www.bamccaig.com/> <bamcc...@gmail.com>
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software <http://www.castopulence.org/> <bamcc...@castopulence.org>

