I do it from Java and not from Perl because I need to perform an insert into the database each time I process a link, and I also have to report the progress of the overall download process via RSS (for example, 23,343 out of 70,000 files have been downloaded).

On 1/22/07, Igor Sutton <[EMAIL PROTECTED]> wrote:
Hi Tatiana,

2007/1/22, Tatiana Lloret Iglesias <[EMAIL PROTECTED]>:
> Regarding the performance problem:
>
> The schema of my application is:
>
> 1. I execute a Perl script which performs a search in a public database.
> It gets the total results in *several pages*. By pressing the "Next Page"
> button (from the Perl script) I get a list of all the links related to my
> query (70,000, more or less). I write all these links to a single text
> file.
>
> 2. From Java I read each of the 70,000 links and create a new file
> containing the link I'm currently reading. Then I call a Perl script
> which uses this link as its input parameter. It browses the link and gets
> the website content, saving it in a local HTML file.
>
> I'm having performance problems. I've tried not creating a separate file
> containing the URL for each of the 70,000 links and passing the URL
> directly to the Perl script as an input parameter instead, but it fails...
>
> I've heard about the LWP module. Do you recommend using it?
> Have you ever done something similar to this? Can you give me some advice?
> Thanks
>
> T.
>

I can't see why you are using Java for that. If your code with WWW::Mechanize is already working, why don't you do everything in Perl? I would do this:

1. Collect all the links you want to open. It is OK to store them in a single file, IMHO. You could also use Tie::File to append lines; it can make your life easier.

    open my $links_file, ">", $filename or die $!;
    while (my $link = my_mechanize_get_link()) {
        print {$links_file} $link, "\n";
    }
    close $links_file or warn $!;

2. After that, you can read the file (or the tied array) from beginning to end and use LWP::UserAgent or LWP::Simple to retrieve the data you want to store:

    use LWP::Simple;

    sub filename_from_url {
        # your code here. logic to compose the filename
        # from url.
    }

    open my $input, "<", $filename or die $!;
    while (my $url = <$input>) {
        chomp($url);
        my $content = get($url);
        # get() returns undef on failure, so test for definedness
        if (defined $content) {
            open my $output, ">", filename_from_url($url) or die $!;
            print {$output} $content;
            close $output or warn $!;
        }
    }

HTH!

-- 
Igor Sutton Lopes <[EMAIL PROTECTED]>
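
The filename_from_url() placeholder above is left to the reader. One possible way to fill it in is sketched below; this is only an illustration, not the code from the original message. The naming scheme (sanitized URL plus a short MD5 suffix, with an .html extension) and the 100-character cap are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical implementation of the filename_from_url() placeholder.
# Assumption: one flat directory of .html files, one file per URL.
sub filename_from_url {
    my ($url) = @_;

    # Drop the scheme (http://, https://, ...).
    (my $name = $url) =~ s{^[a-z][a-z0-9+.-]*://}{}i;

    # Replace every run of characters that is unsafe in a filename.
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;
    $name =~ s{^_+|_+$}{}g;

    # Keep the readable part short (arbitrary cap for this sketch).
    $name = substr($name, 0, 100);

    # A short hash of the full URL keeps names distinct when two
    # different URLs sanitize to the same readable part.
    my $digest = substr(md5_hex($url), 0, 8);

    return "$name-$digest.html";
}

print filename_from_url('http://example.com/search?id=42&page=3'), "\n";
```

The hash suffix matters with around 70,000 links: without it, two URLs that differ only in punctuation would overwrite each other's output file.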