On Mon, Apr 15, 2002 at 10:42:45AM +0200, Martin A. Hansen wrote:
> 
> I'm trying to write a script that does a search on a remote website. The script needs
> to fill in a form field with a search word and save all the results. There are three
> different form fields on the webpage, but I'm only interested in the first one. The
> results come only 10 per page, and I would like the script to follow the links so
> I get all results.
> 
> Can this be done with Perl? (I think so.)

Yep, as it can with just about any other language...

> Can anyone help me out here?

  Sure, but since this is the beginners list and not the 'free code'
list, I'll only give you some pointers for a start.

You'll need a more or less up-to-date perl.

And at least the two modules
    LWP::UserAgent              # This one downloads the pages...
    HTML::Parser                # ...and this one parses them

and all their dependencies.

  Actually, if you know what the form looks like - and you most likely
will, since you have to fill it in - there's no need to read and
parse the form page, unless it contains hidden values that are
generated dynamically.
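
  If the form does carry hidden values, something along these lines
could collect them. This is only a rough sketch: hidden_fields() is a
name made up for this mail, and the field names in the sample form
will differ on the real page.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return a hash of name => value for every
# <input type="hidden"> found in the given HTML.
sub hidden_fields {
    my ($html) = @_;
    my %hidden;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for every start tag with the tag name and an attribute hashref.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                return unless $tag eq 'input';
                my $type = defined $attr->{type} ? lc $attr->{type} : '';
                return unless $type eq 'hidden' && defined $attr->{name};
                $hidden{ $attr->{name} } =
                    defined $attr->{value} ? $attr->{value} : '';
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;

    return %hidden;
}
```

  The returned pairs can then be sent along with your own search word
when you build the request.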

  Basically, you create a request object (read 'perldoc lwpcook' to
learn how), stuff all your form values into it, and then let LWP
download the answer page.
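
  A minimal sketch of that step, assuming the form is submitted with
POST. The URL and the field name 'query' are placeholders made up
for this mail; take the real ones from the form's HTML source (and
use a GET request instead if that is what the form does).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Assumption: placeholder URL and field name, not the real site's.
my $SEARCH_URL = 'http://www.example.com/search';

# Build a POST request carrying the form values.
sub build_search_request {
    my ($word) = @_;
    return POST($SEARCH_URL, [ query => $word ]);
}

# Send the request and return the page body, or die with the HTTP status line.
sub fetch_results {
    my ($word) = @_;
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->request(build_search_request($word));
    die "Search failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a search word is given on the command line.
print fetch_results($ARGV[0]) if @ARGV;
```

  Printing build_search_request('perl')->as_string shows exactly what
would be sent, which is handy for comparing against what the browser
submits.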

  Depending on how much detail you need to know about the structure of
your document, you could then try to match the data you need with
regular expressions (perldoc perlrequick, perldoc perlretut, perldoc
perlre), write your own parser as a subclass of HTML::Parser
(perldoc HTML::Parser), or use an existing specialized parser that
fits your needs - e.g. HTML::LinkExtor, which harvests all links from
an HTML page.
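
  For the link harvesting, a sketch with HTML::LinkExtor might look
like this. extract_links() is a made-up helper name, and the base
URL in the usage below is only an example; passing a base URL makes
the module turn relative links into absolute ones.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute URLs of all <a href="...">
# links in $html, resolved against $base.
sub extract_links {
    my ($html, $base) = @_;
    my @links;

    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # Skip <img>, <link> and friends; keep only anchors.
            return unless $tag eq 'a' && defined $attr{href};
            push @links, "$attr{href}";    # stringify the URI object
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;

    return @links;
}
```

  Called as extract_links($html, 'http://www.example.com/search'), it
returns plain strings that you can filter with a regular expression
to keep only the "next page" links.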

  Once you have your list of links, the remaining part is mostly
repetitive stuff: create a request object, download, optionally parse,
.... and again... and again...
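
  The repetitive part can be sketched as a small loop over a queue of
URLs. crawl() and its two callbacks are names invented for this
mail: one callback fetches a page, the other picks the links to
follow, so the loop itself does not care how either is done.

```perl
use strict;
use warnings;

# Hypothetical helper: visit pages starting at $start.
#   $fetch->($url)             returns the page's HTML, or undef on failure
#   $next_links->($html, $url) returns the URLs to follow from that page
# Returns the HTML of every page visited, in visiting order.
sub crawl {
    my ($start, $fetch, $next_links) = @_;
    my @queue = ($start);
    my (%seen, @pages);

    while (defined(my $url = shift @queue)) {
        next if $seen{$url}++;          # never fetch the same URL twice
        my $html = $fetch->($url);
        next unless defined $html;      # skip pages that failed to load
        push @pages, $html;
        push @queue, $next_links->($html, $url);
    }
    return @pages;
}
```

  With LWP, the fetch callback could be something like
sub { my $r = $ua->get($_[0]); $r->is_success ? $r->decoded_content : undef },
where $ua is an LWP::UserAgent object. The %seen hash keeps the loop
from running forever when result pages link back to each other.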


  OK, if you're still with me this far down the mail, feel free to ask
about details...

-- 
                       If we fail, we will lose the war.

Michael Lamertz                        |      +49 221 445420 / +49 171 6900 310
Nordstr. 49                            |                       [EMAIL PROTECTED]
50733 Cologne                          |                 http://www.lamertz.net
Germany                                |               http://www.perl-ronin.de 
