Andrew Gaffney wrote: > I'm writing a web-based client information system for a lawyer.
[OT} Eek!! Be very, very careful about security, please. I trust your employer has taken "duty of care" isuues into account. [/OT] > His current client list is > in a 137 page Word document with an embedded table. I can get it into a somewhat > usable > format by copying the entire table, pasting into Excel, and saving as tab delimeted > text, > but this has its problems. > > Some of the cells in the table have newlines in them. Because of this, when it's > exported > from Excel, the 2nd line will appear in the correct field, but on a line by itself: > > Row 1, Firstname, Lastname, Address, City, State, Zip, Phone, > AnotherPhone Don't export as space,-delimited. Use CSV, which is much more portable. > > Row 2, First middle, Last, addy, City, State, Zip, > 555-5555 > > So it looks like 3 records instead of 2. Does anyone have any ideas on how to pick > apart > the data to get it into the DB? > > -- > Andrew Gaffney Hi Andrew, The problem here is more one of datqa design, specifically data normalization, than formatting. Neither documents nor spreadsheets are really good storage methods for large scale data. For long-term robustness, this data should probably be in a normalized database. Multi-valued fields generally indicate data that should be broken out into a table of its own: Assuming some unique ID for each client [*not* the name; names do not guarantee uniquesness], such a table could be very simple. client_id auto_number unique primary key client_phone character[20] primary_number boolean I'm a little rusty on my SQL today, so the types I list above are logical types. You will have to check your SQL reference to find the appropriate available types. These can easily be linked with simple SQL for presentation purposes. This structure also better supports working with the data of the fields broken out this way. I know this isn't a very Perl-ish response, but if you are going to rake on the task of working with this data anyway, you might as well do it right, and build something that will hold up over time. Then you can use querying tools or speadheets to prepare any needed presentation format. Now as far as the particular mehtod to use for picking the data apart, that depends. Is the phone number the only multivalued field? If so it should be fairly straightforward. It would help if we could actually see some of the data you have to work with. Might be a bit of work, since you will have to sustitute aliases--but realistic please, for any identifying information. Greetings! C:\Documents and Settings\rjnewton>perl -w my @rows; push @rows, $_ while <STDIN>; chomp @rows; my %phones; my $current_row_tag; my %data_rows; foreach $row (@rows) { last unless $row =~ /\S/; if ($row =~ /^\w/) { my ($row_tag, $given_name, $last_name, $streetaddress, $city, $state, $zip, $phone) = split /,\s*/, $row; $phones{$row_tag} = []; push @{$phones{$row_tag}}, $phone; my $customer_data = {}; $customer_data->{'given name'} = $given_name; $customer_data->{'last name'} = $last_name; $customer_data->{'street address'} = $streetaddress; $customer_data->{'City'} = $city; $customer_data->{'State'} = $state; $customer_data->{'Zip'} = $zip; $data_rows{$row_tag} = $customer_data; $current_row_tag = $row_tag; } else { $row =~ s/^\s*//; $row =~ s/\s*$//; push @{$phones{$current_row_tag}}, $row; } } foreach $client_id (sort keys %data_rows) { my $client = $data_rows{$client_id}; print "$client->{'given name'} $client->{'last name'}\n"; print " $_\n" foreach @{$phones{$client_id}}; } ^Z Row 1, Firstname, Lastname, Address, City, State, Zip, Phone, AnotherPhone Row 2, First middle, Last, addy, City, State, Zip, 555-5555 ^Z Firstname Lastname Phone AnotherPhone First middle Last 555-5555 HTH, Joseph -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] <http://learn.perl.org/> <http://learn.perl.org/first-response>