Jonathan Paton [JP], on Tuesday, November 30, 2004 at 16:04 (+0000) thinks about:
JP> As always, programmer time is more valuable than computer time.

I have to agree, I know this. My office computer never stops computing
something (in its idle time it searches for Mersenne primes :)

JP> Correctness is also handy. Even if it takes you longer to learn a
JP> module than hack away, next time you encounter a similar problem you
JP> should be quicker.

Yes, that's right. I take this idea seriously: writing a script is worth
it as long as it takes no more time than doing the job manually :)

JP> HTML::Tree is slow, it builds up a collection of objects representing
JP> the document. However, it does it the "correct" way... I would
JP> sacrifice speed for a better program. Perl probably is far from the
JP> fastest language for this task, so if you REALLY need speed then look
JP> elsewhere.

We will see whether it is slow once the script is done. Do you have any
benchmarks showing how long it takes to parse one HTML document?

JP> The HTML you have to parse is poor quality. The first <tr> isn't
JP> closed properly. Hopefully there is nothing that will stumble
JP> HTML::Tree.

The first <tr> is closed on the second-to-last line. I ran the W3C
validator on that page; it reports some minor errors, but the opening
and closing tags are OK.

JP> From my experiments with HTML::Tree, you need to really understand the
JP> way the HTML is structured. Best to work top down, extracting the top
JP> level of tables, test if each table contains the data you want, then
JP> extract all the data in to the format you require.

Yes. I first tried HTML::TokeParser; it works, but I got stuck at one
point.

JP> As it would take hours+ to write the code for your problem, I will
JP> help with an example of mine. It is a work in progress for extracting
JP> data from the Maintain Account page of Americas Army web site.

Nice code, quite easy to understand; it will definitely help me. It
would also be nice if you could send me (privately) the HTML test page
that it parses, so I can move ahead quickly.
So far I have only seen some really simple scripts (how to extract an h1
and so on), so this looks nice. Please don't forget to send me that page.

Thanks again for the nice reply, and good night :)

-- 
...m8s, cu l8r, Brano.

["I will slash boondoggle projects" - Bill Clinton.]