Jonathan Paton [JP], on Tuesday, November 30, 2004 at 16:04 (+0000)
thinks about:

JP> As always, programmer time is more valuable than computer time.

I have to agree, I know this. My office computer never stops computing
something (when it is idle, it searches for Mersenne primes :)

JP> Correctness is also handy.  Even if it takes you longer to learn a
JP> module than hack away, next time you encounter a similar problem you
JP> should be quicker.

Yes, that's right. I take this idea seriously: I can spend no more
time writing a script for a task than I would doing it manually :)

JP> HTML::Tree is slow, it builds up a collection of objects representing
JP> the document.  However, it does it the "correct" way... I would
JP> sacrifice speed for a better program.  Perl probably is far from the
JP> fastest language for this task, so if you REALLY need speed then look
JP> elsewhere.

We will see whether it is slow once the script is done. Do you have
any benchmarks showing how long it takes to parse one HTML document?
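
One way to get a rough number is to time the parse directly. The sketch
below is only an illustration: average_parse_time and the sample markup
are hypothetical names and data, not anything from JP's code, and it
assumes HTML::TreeBuilder (part of the HTML::Tree distribution) is
installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use HTML::TreeBuilder;

# Hypothetical helper: parse the same HTML string $runs times and
# return the average wall-clock time per parse, in seconds.
sub average_parse_time {
    my ($html, $runs) = @_;
    my $start = [gettimeofday];
    for (1 .. $runs) {
        my $tree = HTML::TreeBuilder->new_from_content($html);
        $tree->delete;    # free the tree before the next run
    }
    return tv_interval($start) / $runs;
}

# Made-up sample document; substitute the real page to get a useful figure.
my $sample = '<html><body><table><tr><td>x</td></tr></table></body></html>';
printf "%.6f s per parse\n", average_parse_time($sample, 100);
```

The result depends heavily on the size of the document and on the
machine, so it is worth measuring with the actual page.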

JP> The HTML you have to parse is poor quality.  The first <tr> isn't
JP> closed properly.  Hopefully there is nothing that will stumble
JP> HTML::Tree.

The first <tr> is closed on the second-to-last line. I ran the W3C
validator on that page; it reports some minor errors, but the opening
and closing tags are OK.

JP> From my experiments with HTML::Tree, you need to really understand the
JP> way the HTML is structured.  Best to work top down, extracting the top
JP> level of tables, test if each table contains the data you want, then
JP> extract all the data in to the format you require.

Yes, I first tried HTML::TokeParser. It works, but I got stuck at one
point.
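
The top-down approach JP describes might look roughly like this. It is a
minimal sketch under assumptions: the sample page, the 'Score' marker and
the helper name rows_from_table are all made up for illustration, and
look_down here also finds nested tables, not only top-level ones.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; the real page from this thread is not shown.
my $html = <<'HTML';
<html><body>
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>alice</td><td>10</td></tr>
  <tr><td>bob</td><td>7</td></tr>
</table>
</body></html>
HTML

# Return the data rows (array refs of cell text) from the first table
# whose text contains $marker; return an empty list if none matches.
sub rows_from_table {
    my ($content, $marker) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my @rows;
    for my $table ($tree->look_down(_tag => 'table')) {
        # Test whether this table contains the data we want.
        next unless index($table->as_text, $marker) >= 0;
        for my $tr ($table->look_down(_tag => 'tr')) {
            my @cells = map { $_->as_text } $tr->look_down(_tag => 'td');
            push @rows, \@cells if @cells;    # skip rows without <td> cells
        }
        last;    # stop after the first matching table
    }
    $tree->delete;    # free the tree explicitly
    return @rows;
}

# Print each extracted row as comma-separated values.
print join(',', @$_), "\n" for rows_from_table($html, 'Score');
```

Matching on a marker string is only one way to pick the right table;
testing an attribute or the header cells would work as well.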

JP> As it would take hours+ to write the code for your problem, I will
JP> help with an example of mine.  It is a work in progress for extracting
JP> data from the Maintain Account page of Americas Army web site.

Nice code, and quite easy to understand; it will definitely help me.
It would also be nice if you could send me (privately) the HTML test
page that it parses, so I can move ahead quickly. So far I have seen
only some really simple scripts, such as how to extract h1, so this
looks nice. Please don't forget to send me that page. Thanks again for
the nice reply, and good night :)

-- 

 ...m8s, cu l8r, Brano.

["I will slash boondoggle projects" - Bill Clinton.]


