"Nagrale, Ajay" schreef:

> I am working on optimizing one of our file-parsing scripts. The
> input file currently has around 450,000 lines, it is bound to grow
> in the coming days, and new entries are added every 2 minutes.
>
> The current script takes around 60 seconds on average (150 seconds
> at worst) to parse the input file and write the output file. Since
> the script is executed every two minutes (it has to be :( and it is
> a very important script), every second counts.
>
> The flow in the script is something like this:
>
> 1. Open the input file handle with open().
> 2. In a while loop reading the file handle directly, parse each
> entry and run the required sanity check (a combination of a
> specific line format and a few regular-expression checks). If the
> sanity check passes, load the required fields into a hash. Close
> the input file handle.
> 3. Open the output file handle.
> 4. Sort the hash in the required order.
> 5. Print the hash to the output file.
> 6. Close the output file handle.
>
> Any help/suggestion would be appreciated.
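
For reference, here is your flow restated as a minimal Perl sketch.
The line format, the field names, and the sort order are guesses,
since the post does not show them:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical line format "key|value"; the real check is not shown.
    my $line_re = qr/^(\w+)\|(\S+)$/;

    my %entries;
    open my $in, '<', 'input.log' or die "input.log: $!";
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ $line_re;   # sanity check: format + regex
        $entries{$1} = $2;               # keep the required fields
    }
    close $in;

    open my $out, '>', 'output.txt' or die "output.txt: $!";
    for my $key (sort keys %entries) {   # "required order" assumed: by key
        print $out "$key $entries{$key}\n";
    }
    close $out;

One small note on this shape: if your real sanity-check pattern
interpolates variables, precompiling it with qr// outside the loop
(as above) avoids rebuilding it on every line.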


Does the file have (mostly or totally) new data every 2 minutes? If
so, go to (2).

(1) You re-process old lines over and over again. Why? Cache the old
lines in a database, insert (or append) only the new data, then
create the output.
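
If the file only grows (old lines never change), a minimal sketch of
that idea in Perl: remember the byte offset where the previous run
stopped and parse only the tail. Storable stands in for the database
here, and the sanity check is again a guess:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Storable qw(retrieve store);   # toy cache; a real DB works too

    my ($input, $state_file) = ('input.log', 'parse_state.sto');

    # Cached entries plus the byte offset reached by the previous run.
    my $state = -e $state_file ? retrieve($state_file)
                               : { offset => 0, entries => {} };

    # If the file shrank (truncated/rotated), start from scratch.
    $state = { offset => 0, entries => {} }
        if -s $input < $state->{offset};

    open my $in, '<', $input or die "$input: $!";
    seek $in, $state->{offset}, 0;     # skip lines already processed
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /^(\w+)\|(\S+)$/;   # hypothetical check
        $state->{entries}{$1} = $2;
    }
    $state->{offset} = tell $in;       # remember where this run stopped
    close $in;

    store $state, $state_file;         # persist the cache for next run

    open my $out, '>', 'output.txt' or die "output.txt: $!";
    print $out "$_ $state->{entries}{$_}\n"
        for sort keys %{ $state->{entries} };
    close $out;

With a real database the entries hash becomes a table, and the sort
and output step can move into a query.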

(2) I would use a yacc/lex solution, because that normally does the
job in a few seconds.
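
lex and yacc generate C, so there is no drop-in Perl version to show;
the closest Perl idiom to a lex-style scanner is a single pass per
line with \G-anchored matching. A sketch with made-up token patterns:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Lex-style tokenizer: /gc keeps the match position, so each line
    # is scanned left to right exactly once.
    while (my $line = <DATA>) {
        my @tokens;
        while ($line =~ /\G(?:(\d+\.\d+)|(\w+)|[ \t]+)/gc) {
            push @tokens, [ NUM  => $1 ] if defined $1;
            push @tokens, [ WORD => $2 ] if defined $2;
        }
        print join(' ', map { "$_->[0]=$_->[1]" } @tokens), "\n";
    }

    __DATA__
    alpha 3.14 beta

Whether this (or real lex) beats the current loop depends on how many
separate regexes the sanity check runs per line.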

-- 
Affijn, Ruud

"Gewoon is een tijger."


