Re: adding an array as a hash value

Rob Dixon Tue, 14 Oct 2003 07:58:03 -0700

Hi Dermot.

Dermot Paikkos wrote:
>
> I am trying to create a hash and one element is an array. The hash
> comes from a csv file.
>
> I use $/ = ",\"\"\n" because there is a new line in each record. The
> first 4 elements of each record are always the same and these are
> easily assigned to the  hash. After that there are 20 other fields
> (keywords) for the record, some blank (that I'd rather discard).
>
> I have 2 problems. One is looping through each record and managing to
> loop through $fields[4-20] while assigning them to an array. Two, is
> the reverse, accessing the array elements while printing/looping
> through the hash. I know a reference is needed here but my attempt at
> using the cookbook have failed me.
>
> Below is what I have done so far and some sample data. If anyone has
> the time any advice would be appreciated.
>
>
> #!/bin/perl
>
> use strict;


And

  use warnings;

> # Open a cvs file and put contents into images.
> # File should have fields in this following order:
> # 1, SPLNUM 2, Photographer, 3, Title, 4, Credit and captions,
> # 5, keywords.
>
> my ($l,$i,$k,$v,$w); # Predefined global variables;

These variables need to have better names so that it's more
obvious what they're for.

> my %headers;
> my (@words);
>
> $/ = ",\"\"\n"; # This defines the record separator as

This is a lot clearer written as:

  $/ = qq(,""\n);
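As a rough sketch of how that separator behaves, here is a small test
that reads two records from an in-memory filehandle. The data is
invented for illustration and is not taken from your file; 'local'
keeps the change to $/ confined to one block.

```perl
use strict;
use warnings;

# Two invented records; each ends with ,""\n and contains an embedded newline.
my $data = qq("A100","Smith","Title one","CREDIT: X\nCaption one",""\n)
         . qq("A200","Jones","Title two","CREDIT: Y\nCaption two",""\n);

open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @records;
{
  local $/ = qq(,""\n);          # record separator, limited to this block
  while (defined(my $record = <$fh>)) {
    chomp $record;               # chomp removes the current value of $/
    push @records, $record;
  }
}
close $fh;

print scalar(@records), " records read\n";
```

Note that chomp strips whatever $/ currently holds, so each record
loses the trailing ,"" and newline in one step.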

> # ,""\n. Note: the csv file will have a newline
> # between the credit line and the caption.
>
> my $infile = shift; # Input file
>
> open(FH,"$infile") or die "Can't open $infile: $!\n";
>
> # Read in the file, a record at a time.
>
> while (defined($i = <FH>) ) {
>
> #print "Record $. = $i\n";
>
> # parse_csv (stolen from a book) takes care of any
> # unusual characters that might appear in the keywords
> # such as commas, &, [ ...etc.
>
> my @lines = parse_csv($i);
>
> # print "\$lines[0] = $lines[0]\n";
>
> # Fields 0,1,2,3 are all pre-defined. 0 => number, 1 => photog,
> # 2 => title, 3 => credit and caption, 4 and on => keywords.
>
> $headers{'splnum'} = $lines[0];
>
> $headers{'photog'} = $lines[1];
>
> $headers{'title'} = $lines[2];
>
> ($headers{'credit'}) = ($lines[3] =~ /CREDIT: (.*\/.*RARY)/);
> (my $cap) = ($lines[3] =~ /.RARY.(.*)/s);
> (my $caption = $cap) =~ s/\n//g; # Throw away any extra newlines.
>
> ($headers{'caption'}) = ($caption);

It's simpler to assign slices of the hash. Also, using 'splice'
will remove the used elements from the @lines array.

  @headers{ qw/ splnum photog title /} = splice @lines, 0, 3;
  @headers{ qw/ credit caption / } =
      shift (@lines) =~ /CREDIT:\s+(.*\/.*?LIBRARY)[\s\xA0]+(.*\S)/;

Note the character class [\s\xA0], which is all whitespace characters
plus the non-breaking space \xA0 that appears between the credit and
caption fields of your data.

You have multiple spaces in the 'caption' text. If you need to,
compress them to single spaces like this:

  $headers{caption} =~ s/\s+/ /g;
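To see those three statements working together, here is a minimal
sketch on a made-up record. The field values are hypothetical, and I
have assumed a newline followed by \xA0 between the credit and the
caption, so adjust the test data to match your real file.

```perl
use strict;
use warnings;

# Invented fields, in the order your parse_csv call would return them.
my @lines = (
  'A100',
  'J. Smith',
  'A title',
  "CREDIT: J. SMITH / EXAMPLE PHOTO LIBRARY\n\xA0A caption   with extra   spaces. ",
  'keyword one',
);

my %headers;

# splice removes and returns the first three elements of @lines
@headers{ qw/ splnum photog title / } = splice @lines, 0, 3;

# shift removes the next element; the two captures fill credit and caption
@headers{ qw/ credit caption / } =
    shift(@lines) =~ /CREDIT:\s+(.*\/.*?LIBRARY)[\s\xA0]+(.*\S)/;

$headers{caption} =~ s/\s+/ /g;   # collapse runs of whitespace to one space

print "$_ => $headers{$_}\n" for sort keys %headers;
print "left over: @lines\n";
```

After this runs, only the keyword fields are left in @lines.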

>
> # The rest of the fields in the records will be keywords and these may
> # vary in number but there shouldn't be more than 20.

All of the elements of @lines are now keywords, so there's no need
to start at offset 4.

> my $sizeof = 4;
> my ($d,$keyw_ref);
> my @keywords;
>
> for ($d = 4; $d <= 20; ++$d ) {
> print "\$lines = $lines[$d] \$sizeof = $sizeof\n";
> push(@keywords,$lines[$sizeof]);
>
> }
> $keyw_ref = \@keywords;
>
> $headers{'keywords'} = $keyw_ref;

Strip leading and trailing spaces from each element and push the
result directly into the hash field if the result has a non-zero
length:

  foreach (@lines) {
    s/^\s+//;
    s/\s+$//;
    push @{$headers{keywords}}, $_ if length;
  }
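Here is that loop as a small standalone sketch with invented keyword
fields. One thing to watch, assuming %headers stays declared outside
your while loop as in your script: push adds to whatever is already in
$headers{keywords}, so you would probably want to empty the hash at
the start of each record.

```perl
use strict;
use warnings;

# Invented leftover fields: keywords, some blank or padded with spaces.
my @lines = ('  mountain ', '', 'snow', '   ', 'alpine lake');

my %headers;   # a fresh hash, so keywords from an earlier record are not kept

foreach (@lines) {
  s/^\s+//;
  s/\s+$//;
  # the array behind $headers{keywords} is created by the first push
  push @{ $headers{keywords} }, $_ if length;
}

print scalar(@{ $headers{keywords} }), " keywords: ",
      join(', ', @{ $headers{keywords} }), "\n";
```

The blank and all-space fields drop out because they have zero length
once the leading and trailing spaces are gone.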

> # Print the hash keys and values so I can see if everything has been
> # assigned correctly
>
> while ( ($k,$v) = each %headers ) {
> if ( $k =~ /keywords/ ) {
> my $item;
> foreach $item (@$keyw_ref) {
> print "$item\n";
> } i # End of foeach $item
> } # End of if
>
> print "$k => $v\n";
> } # End of while
>
> } # End of mail while.

There's not much wrong with this except that you're using $keyw_ref,
which is left over from the earlier code, instead of the array
reference stored in the hash. Also, you'd have no need to comment each
block's closing brace if you indented the blocks. Try this:

  while ( my ($k, $v) = each %headers ) {

    print "$k => ";

    if ( $k eq 'keywords' ) {
      print "\n  $_" foreach @$v;
      print "\n";
    }
    else {
      print "$v\n";
    }
  }
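A variation you might find useful, sketched here on an invented
record: test the value with ref() rather than comparing the key name,
so that any hash element holding an array reference is printed as a
list. The sorted keys and the $out buffer are my own additions, there
only to make the output order predictable.

```perl
use strict;
use warnings;

# Invented record in the shape your script builds: scalars plus one array reference.
my %headers = (
  splnum   => 'A100',
  title    => 'A title',
  keywords => [ 'mountain', 'snow' ],
);

my $out = '';
foreach my $k (sort keys %headers) {     # sorted so the order is predictable
  my $v = $headers{$k};
  $out .= "$k => ";
  if ( ref($v) eq 'ARRAY' ) {            # any array reference, not just 'keywords'
    $out .= "\n  $_" foreach @$v;
    $out .= "\n";
  }
  else {
    $out .= "$v\n";
  }
}
print $out;
```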

> # ==== SUBS =====
>
> sub parse_csv {
>
> my $text = shift;
> my @new = ();
> push(@new, $+) while $text =~ m{
> "([^\"\\]*(?:\\.[^\"\\]*)*)",?
> | ([^,]+),?
> | ,
> }gx;
>
> #push(@new, undef) if substr($text, -1,1) eq ',';  # omitted this as I
>    # hoped to discard the empty fields.
>
> return @new;
> }

I think that should do what you need.

I hope it helps,

Rob


