On 09.05.2012 10:36, lilin Yi wrote:
> //final_1 is a list of identifiers for which I need to find the
> corresponding records (four lines each) in x (x is the file) and write
> those four lines to a new file.
>
> //because the order of the identifiers is the same, after I find an
> identifier in x, the next time I want to start from the next index
> in x, which will save time. That is to say, when the if condition is
> satisfied, it can automatically jump out of the second while loop and
> move on to the next identifier of final_1; meanwhile j should start
> not from the beginning but from the previous position.
>
> //when I run the code it takes too much time, more than one hour, and
> gives the wrong result... so could you help me make some improvement of
> the code?

If the code takes too much time and gives the wrong results, you must
fix and improve it. In order to do that, the first thing you should do
is get familiar with "test-driven development" and Python's unittest
library. You can start by fixing the code, but chances are that you will
break it again when you then try to make it fast. Having tests that tell
you after each step that the code still works correctly is invaluable.

Some more comments below...

> i=0
>
> offset_1=0
>
>
> while i <len(final_1):
>     j = offset_1
>     while j <len(x1):
>         if final_1[i] == x1[j]:
>             new_file.write(x1[j])
>             new_file.write(x1[j+1])
>             new_file.write(x1[j+2])
>             new_file.write(x1[j+3])
>             offset_1 = j+4
>             quit_loop="True"
>         if quit_loop == "True":break
>         else: j=j +1
>     i=i+1

Just looking at the code, there are a few things to note:

1. You are iterating "i" from zero to len(final_1)-1. The pythonic way
to code this is using "for i in range(len(final_1)):...". However, since
you only use the index "i" to look up an element inside the "final_1"
sequence, the proper way is "for f in final_1:...".

2. Instead of writing four lines separately, you could write them in a
loop: "for line in x1[j:j+4]: new_file.write(line)".

3. "x1" is a list, right? In that case, there is a member function
"index()" that searches for an element and accepts an optional start
position. Note that it raises ValueError if the element is not found.

4. The "quit_loop" is useless, and you are probably getting wrong
results because you never reset this value: once it is "True", every
later pass through the inner loop breaks on its first iteration, whether
or not the identifier matched. If you use "break" at the place where you
assign "True" to it, you will probably get what you want. Also, Python
has real boolean values, "True" and "False"; you don't have to use
strings.
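
To make points 1 to 4 and the testing advice a bit more concrete, here is a minimal sketch, not a drop-in replacement. The function name "copy_records", the test class and the sample data are made up for illustration; I am also assuming that the identifiers in final_1 are complete lines (including the trailing newline) and that silently skipping an identifier that is not found is acceptable.

```python
import io
import unittest


def copy_records(identifiers, lines, out):
    """Write each identifier line plus the three lines after it to `out`.

    Assumes the identifiers occur in `lines` in the same order as in
    `identifiers`, so each search can resume where the last one ended.
    """
    offset = 0
    for ident in identifiers:
        try:
            # list.index() takes an optional start position
            j = lines.index(ident, offset)
        except ValueError:
            # not found at or after `offset`: skip (an assumption, see above)
            continue
        out.writelines(lines[j:j + 4])
        offset = j + 4


class CopyRecordsTest(unittest.TestCase):
    # three made-up four-line records
    LINES = ["id1\n", "a\n", "b\n", "c\n",
             "id2\n", "d\n", "e\n", "f\n",
             "id3\n", "g\n", "h\n", "i\n"]

    def run_copy(self, identifiers):
        out = io.StringIO()
        copy_records(identifiers, self.LINES, out)
        return out.getvalue()

    def test_copies_matching_records_in_order(self):
        self.assertEqual(self.run_copy(["id1\n", "id3\n"]),
                         "id1\na\nb\nc\nid3\ng\nh\ni\n")

    def test_unknown_identifier_is_skipped(self):
        self.assertEqual(self.run_copy(["nope\n", "id2\n"]),
                         "id2\nd\ne\nf\n")
```

If this is saved in a file, the tests can be run with "python -m unittest <module name>". With the real data you would pass x1 as "lines" and new_file as "out".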

Concerning the speed, you can probably improve it by not storing the
lines of the input file in "x1", but rather creating a dictionary that
maps each identifier line to its corresponding four lines:

    content = open(...).readlines()
    d = {}
    for i in range(0, len(content), 4):
        d[content[i]] = tuple(content[i:i+4])

Then, drop the "offset_1" (at least until you have the code working
correctly), as it doesn't work with a dictionary and the dictionary will
probably be faster anyway. The whole loop above then becomes:

    for idf in final_1:
        for l in d.get(idf, ()):
            new_file.write(l)

The empty tuple as default for "get()" makes the loop skip identifiers
that are not in the dictionary instead of failing on None.

;)

I hope I gave you a few ideas, good luck!

Uli
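
PS: Here is the dictionary idea as a self-contained sketch that can be tried without the real files. The names "build_index" and "copy_records_by_dict" and the sample data are hypothetical. It assumes the input consists only of complete four-line records and that each identifier line occurs once; if an identifier is repeated, the later record overwrites the earlier one.

```python
import io


def build_index(lines):
    """Map the first line of every four-line record to the whole record."""
    index = {}
    for i in range(0, len(lines), 4):
        index[lines[i]] = tuple(lines[i:i + 4])
    return index


def copy_records_by_dict(identifiers, index, out):
    """Write the record for each known identifier to `out`."""
    for ident in identifiers:
        # empty tuple as default: unknown identifiers write nothing
        out.writelines(index.get(ident, ()))


# made-up data standing in for the contents of the input file
lines = ["id1\n", "a\n", "b\n", "c\n",
         "id2\n", "d\n", "e\n", "f\n",
         "id3\n", "g\n", "h\n", "i\n"]
index = build_index(lines)

out = io.StringIO()
# the order of the identifiers no longer matters with a dictionary
copy_records_by_dict(["id3\n", "missing\n", "id1\n"], index, out)
result = out.getvalue()
```

Building the dictionary takes one pass over the input, and each lookup afterwards takes roughly constant time on average, which is why the offset bookkeeping is no longer needed.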