Asad wrote:
> Hi All ,
>
> I have the following code to search for an error and prin the
> solution .
>
> /A/B/file1.log size may vary from 5MB -5 GB
>
> f4 = open (r" /A/B/file1.log ", 'r' )
> string2=f4.readlines()
Do not read the complete file into memory. With a 5 GB log, readlines()
alone needs at least as much memory as the file is large. Read one line at
a time and keep around only those lines that you may have to look at again.

> for i in range(len(string2)):
>     position=i
>     lastposition =position+1
>     while True:
>         if re.search('Calling rdbms/admin',string2[lastposition]):
>             break
>         elif lastposition==len(string2)-1:
>             break
>         else:
>             lastposition += 1

You are trying to find a group of lines. The way you do it, for a file of
the structure

foo
bar
baz
end-of-group-1
ham
spam
end-of-group-2

you find the groups

foo
bar
baz
end-of-group-1

bar
baz
end-of-group-1

baz
end-of-group-1

ham
spam
end-of-group-2

spam
end-of-group-2

That looks like a lot of redundancy which you can probably avoid: for every
start position the while loop walks forward to the next "Calling
rdbms/admin" line all over again, so a group of n lines costs roughly
n*n/2 line checks instead of n. But wait...

>     errorcheck=string2[position:lastposition]
>     for i in range ( len ( errorcheck ) ):
>         if re.search ( r'"error(.)*13?"', errorcheck[i] ):
>             print "Reason of error \n", errorcheck[i]
>             print "script \n" , string2[position]
>             print "block of code \n"
>             print errorcheck[i-3]
>             print errorcheck[i-2]
>             print errorcheck[i-1]
>             print errorcheck[i]
>             print "Solution :\n"
>             print "Verify the list of objects belonging to Database "
>             break
>     else:
>         continue
>     break

...you throw away almost all of that hard work and look at only four lines?
It looks like you only need the first "error...13" line, the three lines
that precede it, and the last "Calling..." line occurring before the
"error...13".

> The problem I am facing in performance issue it takes some minutes to
> print out the solution . Please advice if there can be performance
> enhancements to this script .

If you want to learn the Python way, you should try hard to write your
scripts without a single "for i in range(...): ..." loop. This style is
usually the last resort; it may work for small datasets, but as soon as you
have to deal with large files, performance dives. Even worse, these loops
tend to make your code hard to debug.
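If you do need complete groups, a generator can hand out each group exactly
once while reading the lines in a single pass. Here is a minimal sketch; the
names groups and ends_group are made up for illustration, and the sample
lines are the foo/bar example from above:

```python
from typing import Callable, Iterable, Iterator, List


def groups(lines: Iterable[str],
           is_end: Callable[[str], bool]) -> Iterator[List[str]]:
    """Yield consecutive groups of lines; a group ends with a line for
    which is_end(line) is true. Each line is looked at exactly once."""
    group: List[str] = []
    for line in lines:
        group.append(line)
        if is_end(line):
            yield group
            group = []
    if group:  # trailing lines after the last end marker
        yield group


def ends_group(line: str) -> bool:
    # hypothetical end marker; in the real log this would be the
    # "Calling rdbms/admin" check
    return line.startswith("end-of-group")


sample = ["foo", "bar", "baz", "end-of-group-1", "ham", "spam", "end-of-group-2"]
for group in groups(sample, ends_group):
    print(group)
```

Because groups() accepts any iterable, you can pass it an open file object
directly and never hold more than one group in memory.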
Below is a suggestion for an implementation of what your code seems to be
doing that only remembers the four most recent lines and works with a single
loop. If that saves you some time, use that time to rid the scripts you have
lying around of occurrences of "for i in range(....): ..." ;)

from __future__ import print_function

import re
import sys
from collections import deque


def show(prompt, *values):
    print(prompt)
    for value in values:
        print("    {}".format(value.rstrip("\n")))


def process(filename):
    tail = deque(maxlen=4)  # the last four lines
    script = None
    with open(filename) as instream:
        for line in instream:
            tail.append(line)
            if "Calling rdbms/admin" in line:
                script = line
            elif re.search('"error(.)*13?"', line) is not None:
                show("Reason of error:", tail[-1])
                # script is still None if no "Calling rdbms/admin" line
                # came before the error
                show("Script:", script or "(not found)")
                show("Block of code:", *tail)
                show(
                    "Solution",
                    "Verify the list of objects belonging to Database"
                )
                break


if __name__ == "__main__":
    filename = sys.argv[1]
    process(filename)
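In case deque(maxlen=...) is new to you: it is what keeps the memory use
constant above. Once the deque is full, every append() silently drops the
oldest item from the left end. A tiny sketch with made-up lines l1 to l6:

```python
from collections import deque

# A deque with maxlen keeps only the most recent items: once it holds
# four lines, each append() discards the oldest one.
tail = deque(maxlen=4)
for line in ["l1", "l2", "l3", "l4", "l5", "l6"]:
    tail.append(line)

print(list(tail))  # ['l3', 'l4', 'l5', 'l6']
print(tail[-1])    # l6, the line read last
```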