Hi Tatiana,
On 2007/1/22, Tatiana Lloret Iglesias <[EMAIL PROTECTED]> wrote:
Regarding the performance problem:
The structure of my application is:
1. I run a Perl script that performs a search in a public database. The
results are split across *several pages*. By pressing the "Next Page" button
(from the Perl script) I get a list of all the links related to my query
(about 70,000), and I write all these links to a single text file.
2. From Java, I read each of the 70,000 links and create a new file
containing the link I'm currently reading. Then I call a Perl script that
uses this link as its input parameter. It browses the link, gets the
website content and saves it in a local HTML file.
I'm having performance problems. I've tried not creating a separate file
with the URL for each of the 70,000 links, and instead passing the URL
directly to the Perl script as an input parameter, but it fails...
I've heard about the LWP module. Do you recommend using it?
Have you ever done something similar? Can you give me some advice?
Thanks
T.
I can't see the point of using Java for this. If your code with
WWW::Mechanize is already working, why not do everything in
Perl?
I would do this:
1. Collect all the links you want to open; IMHO it is fine to store them
in a single file. You can use Tie::File to append lines, which can make
your life easier. With a plain filehandle it looks like this:
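use strict;
use warnings;
my $filename = 'links.txt';    # hypothetical name for the link list
open my $links_file, ">", $filename or die "Can't write $filename: $!";
# my_mechanize_get_link() stands for your WWW::Mechanize code: it should
# return the next link, or undef when there are no more.
while (my $link = my_mechanize_get_link()) {
    print {$links_file} $link, "\n";
}
close $links_file or warn $!;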
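Since Tie::File came up, here is a minimal sketch of the same step with a tied array instead of a filehandle. It assumes the file name links.txt, and my_mechanize_get_link() is a hypothetical stand-in (fed here from a fixed demo list) for your WWW::Mechanize code. Each element of the tied array is one line of the file, so push appends a line.

```perl
use strict;
use warnings;
use Tie::File;

my $filename = 'links.txt';    # hypothetical file name

# Hypothetical stand-in for the WWW::Mechanize code that walks the result
# pages: returns the next link, or undef when there are none left.
my @demo_links = map { "http://example.com/result/$_" } 1 .. 3;
sub my_mechanize_get_link { return shift @demo_links }

# Tie the file to an array: each element is one line of the file,
# and push appends a line (Tie::File adds the "\n" itself).
tie my @links, 'Tie::File', $filename or die "Can't tie $filename: $!";
while (my $link = my_mechanize_get_link()) {
    push @links, $link;
}
untie @links;
```

Tying an existing file appends to what is already there, so delete links.txt first if you want a fresh list on each run.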
2. After that, read the links back from beginning to end (from the file,
or from the tied array if you used Tie::File) and use LWP::UserAgent or
LWP::Simple to retrieve the content you want to store:
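use LWP::Simple;

sub filename_from_url {
    my ($url) = @_;
    # your code here: logic to compose the local file name
    # from the URL.
}

# $filename is the link list written in step 1.
open my $input, "<", $filename or die "Can't read $filename: $!";
while (my $url = <$input>) {
    chomp($url);
    next unless length $url;          # skip blank lines
    my $content = get($url);          # undef if the request failed
    if (defined $content) {
        open my $output, ">", filename_from_url($url) or die $!;
        binmode $output, ':encoding(UTF-8)';  # get() returns decoded text
        print {$output} $content;
        close $output or warn $!;
    }
    else {
        warn "Could not fetch $url\n";
    }
}
close $input;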
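If you prefer LWP::UserAgent, a sketch along these lines reads the links through a read-only tied array and saves each page straight to disk with the ':content_file' option of get(). The file name links.txt, the 30-second timeout and the filename_from_url() naming scheme are assumptions for illustration only. Creating one user agent and reusing it for every request should also help with the performance problem, since no new Perl process is started for each of the 70,000 links.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use LWP::UserAgent;
use Tie::File;

my $filename = 'links.txt';    # hypothetical: the link list from step 1

# Hypothetical naming scheme: strip the scheme, then replace every run of
# characters that are unsafe in file names with an underscore.
sub filename_from_url {
    my ($url) = @_;
    (my $name = $url) =~ s{^\w+://}{};
    $name =~ s{[^\w.-]+}{_}g;
    return "$name.html";
}

sub fetch_all {
    my ($list_file) = @_;

    # One agent for every request; keep_alive lets it reuse connections.
    my $ua = LWP::UserAgent->new(timeout => 30, keep_alive => 1);

    # Read-only tie: each element is one line (one URL) of the file.
    tie my @links, 'Tie::File', $list_file, mode => O_RDONLY
        or die "Can't tie $list_file: $!";

    for my $url (@links) {
        next unless length $url;    # skip blank lines
        # ':content_file' writes the response body directly to that file.
        my $response = $ua->get($url, ':content_file' => filename_from_url($url));
        warn "Failed to fetch $url: ", $response->status_line, "\n"
            unless $response->is_success;
    }
    untie @links;
}

if (-e $filename) {
    fetch_all($filename);
}
else {
    warn "$filename not found; run step 1 first\n";
}
```

With this naming scheme, different URLs can end up with the same file name (for example a?b=1 and a/b/1), so adapt filename_from_url() if that matters for your data.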
HTH!
--
Igor Sutton Lopes <[EMAIL PROTECTED]>
--
http://learn.perl.org/