"Nagrale, Ajay" schreef: > I am working on optimization of one of the file parsing script. There > are around 4,50,000 lines present in the input file. This file size > is bound to increase in coming days. New entries would be added every > 2 minutes. > > Current script is taking around 60 seconds (average) and 150 seconds > (max) time for parsing the input file and writing into the output > file. Since this script is executed every two minutes (have to :( and > very important script), times in seconds itself is costing me. > > The flow in the script is something like this: > > 1. Open the input file handle using the open function > 2. In while loop parse the entries (using the file handle directly in > while), parse the input entries. Do the sanity check required (sanity > check involved is a combination of specific line format and a few > regular expression check). If sanity check is successful, load the > required entries into the hash. Close the input file handle. > 3. Open the output file handle > 4. Sorting of the hash in the required order. > 5. Print hash into the file > 6. Close output file handle. > > Any help/suggestion would be appreciated.
Does the file contain (mostly) completely new data every 2 minutes? If so,
go to (2).

(1) You re-process the old lines over and over again. What for? Cache the
old lines (or their parsed results) in a database, only insert (or append)
the new data, and create the output from that. A minimal sketch of this
idea follows at the end of this message.

(2) I would use a yacc/lex solution, because that normally does the job in
a few seconds.

-- 
Affijn, Ruud

"Gewoon is een tijger." (Dutch; roughly: "Ordinary is a tiger.")
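A minimal sketch of the incremental approach in (1), assuming the input
file is append-only: persist the byte offset you processed last time (here
in a hypothetical state file, .input.offset) and seek past it on the next
run. Only the new lines then need parsing; the parsed results from earlier
runs would still have to be cached somewhere (a DB file, for instance) to
rebuild the full sorted output. All names below are illustrative:

#!/usr/bin/perl
use strict;
use warnings;

my $input = 'input.log';        # assumed input path
my $state = '.input.offset';    # hypothetical file holding the last offset

# Read the offset we stopped at last time (0 on the first run).
my $offset = 0;
if (open my $st, '<', $state) {
    my $saved = <$st>;
    $offset = $saved + 0 if defined $saved;
    close $st;
}

open my $in, '<', $input or die "open $input: $!";

# If the file shrank (e.g. it was rotated), start over from the top.
$offset = 0 if -s $input < $offset;
seek $in, $offset, 0 or die "seek $input: $!";

while (my $line = <$in>) {
    # ... parse and sanity-check only the new lines here ...
}

# Remember how far we got, for the next run.
my $new_offset = tell $in;
close $in;

open my $st, '>', $state or die "open $state: $!";
print {$st} "$new_offset\n";
close $st;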