On Mon, Apr 15, 2002 at 10:42:45AM +0200, Martin A. Hansen wrote:
>
> I'm trying to write a script that does a search on a remote website. The
> script needs to fill in a form field with a search word and save all the
> results. There are three different form fields on the webpage, but I'm only
> interested in the first one. The results come only 10 per page, and I would
> like the script to follow the links so I get all results.
>
> Can this be done with perl? (I think so.)

Yep, as it can with just about any other language...

> can anyone help me out here?

Sure, but since this is the beginners list and not the 'free code' list,
I'll only give you some pointers for a start.

You'll need a reasonably up-to-date perl, and at least the two modules

    LWP::UserAgent      # This one downloads the pages...
    HTML::Parser        # ...and this one parses them

and all their dependencies.

Actually, if you know what the form looks like (and you most likely will,
since you have to fill it in), there's no need to read and parse the form
page, unless it contains hidden values that are generated dynamically.

Basically, you create a request object (read 'perldoc lwpcook' to learn
how), stuff all your form values into it, and then let LWP download the
answer page.

Depending on how much detail you need about the structure of the document,
you can then

  - try to match the data you need with regular expressions
    (perldoc perlrequick, perldoc perlretut, perldoc perlre),
  - provide your own parser, subclassed from HTML::Parser
    (perldoc HTML::Parser), or
  - use an existing special-purpose parser that fits your needs, e.g.
    HTML::LinkExtor, which harvests all links from an HTML page.

Once you have your list of links, the remaining part is mostly repetitive
stuff: create a request object, download, optionally parse... and again...
and again...

Ok, if you're still with me this far down the mail, feel free to ask about
details...

-- 
                     If we fail, we will lose the war.

Michael Lamertz          | +49 221 445420 / +49 171 6900 310
Nordstr. 49              |
50733 Cologne            | http://www.lamertz.net
Germany                  | http://www.perl-ronin.de
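
P.S. To make the pointers above a little more concrete, here is a rough
sketch of the "post the form, harvest the links, repeat" loop. It is only
a starting point under stated assumptions, not a finished script: the URL
in $SEARCH_URL, the field name in $FIELD_NAME and the pattern used to
recognise "next page" links are all made up, and you will have to take the
real ones from the page you are searching.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Assumptions (hypothetical): the form's action URL and its first field name.
my $SEARCH_URL = 'http://www.example.com/search';
my $FIELD_NAME = 'query';

# Return the absolute URLs of all <a href="..."> links in an HTML string.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
        },
        $base,    # relative links are made absolute against this URL
    );
    $extor->parse($html);
    $extor->eof;
    return map { "$_" } @links;    # stringify the URI objects
}

# Post the search word, then keep following links that match $next_re
# (a pattern you choose for the "next page" links) until none are left.
# Returns the HTML of every result page that was fetched.
sub fetch_all_results {
    my ($word, $next_re) = @_;
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->post($SEARCH_URL, { $FIELD_NAME => $word });

    my (@pages, @todo, %seen);
    while (1) {
        die 'Request failed: ' . $res->status_line . "\n"
            unless $res->is_success;
        push @pages, $res->decoded_content;

        # Queue every not-yet-seen link that looks like a result page.
        push @todo, grep { /$next_re/ && !$seen{$_}++ }
            extract_links($pages[-1], $res->base);

        my $next = shift @todo or last;
        $res = $ua->get($next);
    }
    return @pages;
}
```

You would call it as something like fetch_all_results('perl', qr/start=\d+/),
where the pattern is again only a guess at what the paging links look like.
If the form has hidden, dynamically generated values, you would have to
fetch and parse the form page first and add those values to the post.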