On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
>
> I have an HTML file on my PC and I want to read it to extract some text.
> Can you help me with which libraries I have to use and how I can do it?
>
> Thank you so much.
>
> Michele

Thank you all.

Hi Chris, thank you for your hint. I'll try to do as you said. To be clear: I have to work on an HTML file. This file is not a website file, nor does it come from the internet. It is a file created by local software (where "local" means "on my PC").

On this file, I need to do the following:

1) Open the file.
2) Check for occurrences of these strings:
   2a) XXXX, in which case I have this code:
       <tr style="font-size: 10" align="left">
       <th> </th><th> DTC CODE Read: </th>
       <td> <samp> </samp> XXXX </td>
       </tr>
   2b) NOT PASSED, in which case I have this code:
       <tr style="color: red" align="left">
       <th> </th><th> CODE CHECK </th>
       <th> : NOT PASSED </th>
       </tr>
       Note: the color in <tr style="color: red" align="left"> can be "red" or "orange".
   2c) OK or PASSED
3) Then fill in an Excel file following these rules:
   3a) If 2a or 2b occurs in the HTML file, I'll write NOK in the Excel file.
   3b) If 2c occurs in the HTML file, I'll write OK in the Excel file.

Notes:
1) In this example, in case 2b, I have "CODE CHECK" in the code, but I could also have "TEXT CHECK" or "CHAR CHECK".
2) The search for occurrences can be done either by tag (<tr style="color: red" align="left">) or by text (NOT PASSED, PASSED). I would prefer to use the first method.

==================================================

In my script I used the second way of searching, i.e.:

**
fileorig = "C:\Users\Mike\Desktop\\2012_05_16_1___p0201_13.html"
f = open(fileorig, 'r')
nomefile = f.read()

for x in nomefile:
    if 'XXXX' in nomefile:
        print 'NOK'
    else:
        print 'OK'
**

But this works on characters and not on strings (i.e. the loop walks through the file character by character, not string by string, so the same test is repeated once for every character).

===============================================

I hope I was clear.

Thanks for your help,
Michele
--
http://mail.python.org/mailman/listinfo/python-list
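
For what it's worth, the tag-based method described in the steps above could be sketched roughly as follows. This is only a sketch under several assumptions, not a tested solution for the real report files: it is written for Python 3 (the script in the post is Python 2), it uses only the standard library, it treats a file with no failure marker as OK, and it writes a CSV file that Excel can open instead of a native Excel workbook. The names RowCollector, row_colour, classify and write_results are made up for this example.

```python
import csv
import re
from html.parser import HTMLParser

# Row colours that mark a failed check, per the post ("red" or "orange").
FAIL_COLOURS = {"red", "orange"}


class RowCollector(HTMLParser):
    """Collect (style, text) for every <tr> row in the document."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows as (style, text) tuples
        self._style = ""      # style attribute of the row being read
        self._chunks = None   # text pieces of the row being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._style = dict(attrs).get("style") or ""
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "tr" and self._chunks is not None:
            # Collapse all runs of whitespace into single spaces.
            text = " ".join(" ".join(self._chunks).split())
            self.rows.append((self._style, text))
            self._chunks = None


def row_colour(style):
    """Return the value of the 'color' property in a style string, or None."""
    match = re.search(r"(?:^|;)\s*color\s*:\s*([A-Za-z]+)", style)
    return match.group(1).lower() if match else None


def classify(html_text):
    """Return 'NOK' if case 2a or 2b occurs in the HTML, otherwise 'OK'."""
    parser = RowCollector()
    parser.feed(html_text)
    parser.close()
    for style, text in parser.rows:
        if row_colour(style) in FAIL_COLOURS:   # case 2b, found by tag
            return "NOK"
        if "NOT PASSED" in text:                # case 2b, text fallback
            return "NOK"
        if "XXXX" in text:                      # case 2a
            return "NOK"
    # Assumption: no failure marker means case 2c (OK or PASSED).
    return "OK"


def write_results(csv_path, results):
    """Write (file name, verdict) pairs to a CSV file that Excel can open."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "result"])
        writer.writerows(results)
```

To use it, read the whole report into a string with open(path).read(), pass that string to classify(), and hand a list such as [("report.html", "NOK")] to write_results() together with an output name like "results.csv" (both file names are placeholders). Note that the test is done once per table row on whole strings, so there is no loop over single characters.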