Re: Parsing problem

R. Joseph Newton Thu, 11 Dec 2003 20:56:31 -0800

Larry Sandwick wrote:

>
> I know I can split the file on "|" but because the data is not
> consistent and my skill set is limiting me to re-parse this file into a
> file I can upload into MySql


Just Don't Do It.

This data is not ready for entry into a database.  MySQL is an RDBMS engine,
and an RDBMS is designed for normalized data.

The presence of so many nulls, and your professed need to have columns filled
in with the same data, indicate that the data in these columns is probably
not informative.  Sure, you can use MySQL or any other database temporarily
to store even badly organized material, but it should not remain that way.
This is really outside of Perl, but there are some serious issues of data
structure involved here.

I'm not going to try to plumb the details, but it's pretty clear that this
file has the data rows grouped by some object with two fields.  That object
should have a table of its own, and only the primary key of that object
[preferably *not* a meaningful data field] should appear in the rows of the
data table.  As long as you are not intending to change the structure of
these files, you may as well leave them in a flat file.


> I am asking for help and suggestions.  How do I
> detect that column 1 has changed?  Somehow I need to read in the first
> three lines for every backorder before I can start outputting the data?
>
>
> This is the file before (see below for what it should look like after the
> parser).  (Before)
>
>
>
> 24165| DEF    |       |                          |     |     |
>
>      |O18580  |259    |LEATHER BOOK SIDE TABLE   |    1|    1|   295.00
>
>      |05/30/03|1774   |FUNCTIONAL TABLE LAMP     |    1|    0|    35.00
>
>      |        |1773   |FUNCTIONAL FLOOR LAMP     |    1|    0|    62.50
>
>      |        |1302   |MOROCCAN FLORAL BX,BRASS  |    1|    0|    29.00
>
>      |        |1666   |CUBA COFFEE TABLE         |    1|    1|   290.00
>
>      |        |1666   |CUBA SIDE TABLE           |    1|    1|   147.50
>
> 24310| ABC    |       |                          |     |     |
>
>      |O18813  |1145   |FLEUR-DE-LIS DOCUMENT BOX |    1|    0|    52.50
>
>      |07/29/03|1549   |TAOS CENTERPIECE          |    1|    1|    65.00
>
>      |        |1729L  |FRENCH BOX BOOKEND, LEFT  |    1|    1|    69.00

I'd suggest instead something like

Table account:
ID       letters
24165    DEF
24310    ABC
5522     XYZ

Table invoice:
ID       account_id    date
O18580   24165         05/30/03

Table line_item:
ID      invoice_id     description                 etc1   etc2   etc3
...yada, yada
259     O18580         Leather Book Side Table     1      1      295.00
...


Please design something more stable to receive the data you extract.  You
should only have to deal with a file like this once, then get it into
something more sensible.  Once you have designed a more appropriate data
structure, I'd suggest that you pull this data out by:

Checking column 0 [the first] of each line for an account number.  If one is
found, then you can parse that line for characteristics of the account
object.
Checking column 1 of all other lines to see if there is a new invoice
number. You should start a new invoice object each time one is encountered.
The date attached to this invoice will have to wait till you read the next
line.
Taking column 2 through the end of each of the data rows as data for the line
item itself.
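
Here is a rough Perl sketch of those three checks, just to show the shape of
the loop.  The sub name parse_backorders, the hash keys, and the rule that
anything shaped like MM/DD/YY in column 1 is the invoice date are all my own
assumptions for illustration; adjust them to whatever your real file
guarantees.  It reads the raw file lines, without the "> " quoting.

```perl
use strict;
use warnings;

# Sketch only: parse_backorders() and the hash keys are hypothetical names.
sub parse_backorders {
    my ($text) = @_;
    my (@accounts, @invoices, @line_items);
    my ($account, $invoice);

    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;          # skip blank lines
        # Split on "|" (limit -1 keeps trailing empty fields), then trim.
        my @col = map { s/^\s+|\s+$//g; $_ } split /\|/, $line, -1;
        next if @col < 4;                   # too short to be a data row

        if ($col[0] ne '') {
            # Column 0 is filled in: a new account starts here.
            $account = { id => $col[0], letters => $col[1] };
            push @accounts, $account;
            undef $invoice;
            next;
        }
        next unless $account;               # data before any account line

        if ($col[1] =~ m{^\d\d/\d\d/\d\d$}) {
            # Assumption: a value shaped like MM/DD/YY is the invoice date.
            $invoice->{date} = $col[1] if $invoice;
        }
        elsif ($col[1] ne '') {
            # Any other non-empty column 1 is taken as a new invoice number.
            $invoice = { id => $col[1], account_id => $account->{id},
                         date => undef };
            push @invoices, $invoice;
        }
        next unless $invoice;

        # Column 2 onward describes the line item itself.
        push @line_items, {
            id          => $col[2],
            invoice_id  => $invoice->{id},
            description => $col[3],
            rest        => [ @col[4 .. $#col] ],
        };
    }
    return (\@accounts, \@invoices, \@line_items);
}

my $sample = <<'END';
24165| DEF    |       |                          |     |     |
     |O18580  |259    |LEATHER BOOK SIDE TABLE   |    1|    1|   295.00
     |05/30/03|1774   |FUNCTIONAL TABLE LAMP     |    1|    0|    35.00
     |        |1773   |FUNCTIONAL FLOOR LAMP     |    1|    0|    62.50
24310| ABC    |       |                          |     |     |
     |O18813  |1145   |FLEUR-DE-LIS DOCUMENT BOX |    1|    0|    52.50
     |07/29/03|1549   |TAOS CENTERPIECE          |    1|    1|    65.00
END

my ($accounts, $invoices, $items) = parse_backorders($sample);
printf "invoice %s  account %s  date %s\n",
    $_->{id}, $_->{account_id}, $_->{date} for @$invoices;
printf "item %s  invoice %s  %s\n",
    $_->{id}, $_->{invoice_id}, $_->{description} for @$items;
```

Each of the three lists maps onto one of the tables suggested earlier, so
once the design is settled, loading them is a separate and much simpler step.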

Before you can get any real oomph out of your coding, you will have to have
a design that doesn't waste your efforts.

Joseph

