On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
> 
> I have an HTML file on my PC and I want to read it to extract some text.
> 
> Can you help me with which libraries I should use and how to do it?
> 
> Thank you so much.
> 
> Michele

Thank you all.

Hi Chris, thank you for your hint. I'll try to do as you said. To be clear:

I have to work on an HTML file. This file is not a website file, nor does it
come from the internet.
It is a file created by local software (where "local" means "on my PC").

On this file, I need to perform these operations:

        1) Open the file
        2) Check for occurrences of the strings:
                2a) XXXX, in this case I have this code:
                                        
                                        <tr style="font-size: 10" align="left">
                                        <th>
                                        </th><th>
                                        DTC CODE Read:
                                        </th>
                                        <td>
                                        <samp>
                                        &nbsp;
                                        &nbsp;
                                        &nbsp;
                                        &nbsp;
                                        &nbsp;
                                        </samp>
                                        XXXX
                                        </td>
                                        </tr>

                2b)     NOT PASSED, in this case I have this code:
                
                                        <tr style="color: red" align="left">
                                        <th>
                                        </th><th>
                                        CODE CHECK
                                        </th>
                                        <th>
                                        : NOT PASSED
                                        </th>
                                        </tr>
                        Note: the color in <tr style="color: red" align="left">
                        can be "red" or "orange".
                        
                2c) OK or PASSED
           
        3) Then, I need to fill in an Excel file following these rules:
                3a) If 2a or 2b occurs in the HTML file, I'll write NOK in the Excel file
                3b) If 2c occurs in the HTML file, I'll write OK in the Excel file
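
For what it's worth, rules 3a and 3b could be sketched roughly like this. It is only a sketch under some assumptions: the names classify, write_results, FAIL_MARKERS and PASS_PATTERN are made up for illustration, the code is written for Python 3, and it writes a CSV file (which Excel can open) instead of a native Excel workbook. What to do when no marker is found is not stated above, so the sketch simply returns None in that case.

```python
import csv
import io
import re

FAIL_MARKERS = ("XXXX", "NOT PASSED")            # cases 2a and 2b
PASS_PATTERN = re.compile(r"\b(?:OK|PASSED)\b")  # case 2c, whole words only


def classify(html_text):
    """Apply rules 3a/3b to the text of one report."""
    # Failure markers are tested first because "NOT PASSED" contains "PASSED".
    if any(marker in html_text for marker in FAIL_MARKERS):
        return "NOK"
    if PASS_PATTERN.search(html_text):
        return "OK"
    return None  # assumption: no marker found, the caller decides what to do


def write_results(rows, stream):
    """Write (file name, verdict) pairs as CSV rows with a header line."""
    writer = csv.writer(stream)
    writer.writerow(["file", "result"])
    writer.writerows(rows)


# Usage with an in-memory buffer; "report.html" is a placeholder name.
buf = io.StringIO()
write_results([("report.html", classify("<th>\n: NOT PASSED\n</th>"))], buf)
print(buf.getvalue())
```

When writing to a real file instead of the buffer, the csv module expects the file to be opened with newline="" so that no extra blank lines appear on Windows.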

Note:
1) In this example, in case 2b, I have "CODE CHECK" in the code, but I could
also have "TEXT CHECK" or "CHAR CHECK".
2) The search for occurrences can be done either by tag (<tr style="color:
red" align="left">) or by text (NOT PASSED, PASSED). I would like to use the
first method.
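
A search by tag might look something like the sketch below. It assumes Python 3 and the standard library's html.parser module (in Python 2 the module is called HTMLParser). The names FailedRowFinder and failed_rows are hypothetical, and the style comparison assumes the attribute contains only the color, as in the 2b example.

```python
from html.parser import HTMLParser


class FailedRowFinder(HTMLParser):
    """Collect the text of every <tr> whose style is color: red or orange."""

    def __init__(self):
        super().__init__()
        self.in_failed_row = False
        self.rows = []  # one text string per flagged row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Normalise 'color: red' / 'COLOR:red;' to 'color:red'.
            style = (dict(attrs).get("style") or "")
            style = style.replace(" ", "").rstrip(";").lower()
            self.in_failed_row = style in ("color:red", "color:orange")
            if self.in_failed_row:
                self.rows.append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_failed_row = False

    def handle_data(self, data):
        if self.in_failed_row:
            self.rows[-1] += data


def failed_rows(html_text):
    """Return the text of each red/orange row, with whitespace collapsed."""
    finder = FailedRowFinder()
    finder.feed(html_text)
    finder.close()
    return [" ".join(row.split()) for row in finder.rows]
```

The report would then count as NOK whenever failed_rows returns a non-empty list. Because the match is on the row style rather than on the words, the same check should cover "CODE CHECK", "TEXT CHECK" and "CHAR CHECK" alike.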
==================================================

In my script I used the second way of searching, i.e.:

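**
# Raw string, so the backslashes in the Windows path are taken literally.
fileorig = r"C:\Users\Mike\Desktop\2012_05_16_1___p0201_13.html"

with open(fileorig, 'r') as f:
    nomefile = f.read()   # the whole file as a single string

# Iterating over a string yields one character at a time, so this loop
# repeats the same whole-file test once per character.
for x in nomefile:
    if 'XXXX' in nomefile:
        print('NOK')
    else:
        print('OK')
**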
But this works on characters and not on strings (i.e. in this way I have
searched not string by string, but character by character).
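
As far as I can tell, the loop is the cause: iterating over a string yields one character at a time, while the test inside the loop already looks at the whole text. A minimal sketch without the loop, where check_report is a hypothetical helper name and the sample string stands in for the file contents:

```python
def check_report(text):
    """Return 'NOK' if the marker 'XXXX' occurs anywhere in text, else 'OK'."""
    # `in` between two strings is a substring search over the whole text,
    # so a single test is enough and no loop is needed.
    return 'NOK' if 'XXXX' in text else 'OK'


sample = "<td>\nXXXX\n</td>"

# The loop version would run once per character of the text:
print(len(sample), "iterations in the loop version")

# One substring test gives one answer:
print(check_report(sample))
```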
                
===============================================

I hope I was clear.

Thanks for your help
Michele