On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
>
> I have an HTML file on my PC and I want to read it to extract some text.
> Can you tell me which libraries I should use and how to do it?
>
> Thank you so much.
>
> Michele

Hi Peter, and thanks for your valuable help.

Fortunately, there are no runs of "X" with lengths other than 2 or 4.

Starting from your code, I wrote the following (I'm posting it in case it is helpful for other people):

    # fileorig holds the path to the HTML file
    with open(fileorig, 'r') as f:
        nomefile = f.read()

    start = nomefile.find("XX")
    start2 = nomefile.find("NOT PASSED")
    c0 = 0  # number of "XXXX" found
    c1 = 0  # number of "XX" found
    c2 = 0  # number of "NOT PASSED" found

    # Keep going while either search still has a match
    while start != -1 or start2 != -1:
        if start != -1:
            if nomefile[start:start+4] == "XXXX":
                print("XXXX found at location %d" % start)
                start += 4
                c0 += 1
            else:
                print("XX found at location %d" % start)
                start += 2
                c1 += 1
            start = nomefile.find("XX", start)
        if start2 != -1:
            print("NOT PASSED found at location %d" % start2)
            start2 += 10
            c2 += 1
            start2 = nomefile.find("NOT PASSED", start2)

    print("XXXX %s found" % c0)
    print("XX %s found" % c1)
    print("NOT PASSED %s found" % c2)

Now I'm able to find all occurrences of the strings "XXXX", "XX" and "NOT PASSED".

Thank you so much.

-- 
http://mail.python.org/mailman/listinfo/python-list
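
P.S. For anyone who finds this later: a possible alternative, which is not what I used above, is to do the same search in a single pass with a regular expression. This is only a minimal sketch under the same assumption (runs of "X" are always 2 or 4 long). The helper name count_markers and the sample string are made up for illustration.

```python
import re
from collections import Counter

# Longest alternative first, so "XXXX" is not counted as two "XX".
PATTERN = re.compile(r"XXXX|XX|NOT PASSED")


def count_markers(text):
    """Return (counts, hits) for text.

    counts is a Counter keyed by the matched string; hits is a list of
    (matched string, offset) pairs in order of appearance.
    count_markers is a hypothetical helper, not part of any library.
    """
    hits = [(m.group(0), m.start()) for m in PATTERN.finditer(text)]
    counts = Counter(label for label, _ in hits)
    return counts, hits


# Made-up sample standing in for the contents of the HTML file.
sample = "<td>XXXX</td><td>XX</td><td>NOT PASSED</td><td>XX</td>"
counts, hits = count_markers(sample)

for label, pos in hits:
    print("%s found at location %d" % (label, pos))
for label in ("XXXX", "XX", "NOT PASSED"):
    print("%s %d found" % (label, counts[label]))
```

To run it on a real file, read the file into a string as in the code above and pass that string to count_markers instead of sample.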