I'm reading a fixed-format text file, line by line. The code is presented below; I have <snipped> out the parts not related to the file reading.
The only relevant detail left out is lstCutters. It looks like this:
[[1, 9], [11, 21], [23, 48], [50, 59], [61, 96], [98, 123], [125, 150]]
It specifies the first and last character position of each token in the fixed-format input line.
All of this works fine; I include it only to explain where I'm going.
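
To make the meaning of these position pairs concrete, here is a minimal sketch of how 1-based, inclusive (first, last) pairs map onto Python slices. The name cut_fields and the sample line are made up for illustration and are not part of the program. Because a Python slice excludes its end index, the inclusive last position is used as the slice end directly.

```python
def cut_fields(line, cutters):
    """Cut `line` into tokens at 1-based, inclusive (first, last) positions."""
    fields = []
    for first, last in cutters:
        # 1-based inclusive [first, last] -> 0-based half-open [first - 1, last)
        fields.append(line[first - 1:last].strip())
    return fields

# Hypothetical sample: two tokens that fill their columns completely,
# separated by one blank at position 10.
sample = u"123456789" + u" " + u"ABCDEFGHIJK"
cutters = [[1, 9], [11, 21]]
fields = cut_fields(sample, cutters)
print(fields)
```

With tokens that fill their full width, this makes it easy to see whether the last character of each column is kept.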

The code in the function definition is broken up into more lines than necessary, so that I can monitor the variables step by step.

--- Code start ------

import codecs

<snip>

def CutLine2List(strIn,lstCut):
    strIn = strIn.strip()
    print '>InNextLine>',strIn
    # skip if line is empty
    if len(strIn)<1:
        return False
    lstIn = list()
    for cc in lstCut:
        strSubline =strIn[cc[0]-1:cc[1]-1].strip()
        lstIn.append(strSubline)
        print '>InSubline2>'+lstIn[len(lstIn)-1]+'<'
    del strIn, lstCut,cc
    print '>InReturLst>',lstIn
    return lstIn

<snip>

filIn = codecs.open(
                    strFileNameIn,
                    mode='r',
                    encoding='utf-8',
                    errors='strict',
                    buffering=1)
for linIn in filIn:
    lstIn = CutLine2List(linIn,lstCutters)

--- Code end ------

A sample output, representing one line from the input file, looks like this:

>InNextLine> I 30 2002-12-11 20:01:19.280 563 FANØ 2001-12-12-15.46.12.734502 2001-12-12-15.46.12.734502
>InSubline2>I<
>InSubline2>30<
>InSubline2>2002-12-11 20:01:19.280<
>InSubline2>563<
>InSubline2>FANØ<
>InSubline2>2001-12-12-15.46.12.73450<
>InSubline2>2001-12-12-15.46.12.73450<
>InReturLst> [u'I', u'30', u'2002-12-11 20:01:19.280', u'563', u'FAN\xd8', u'2001-12-12-15.46.12.73450', u'2001-12-12-15.46.12.73450']


Question:
In the last printout, tagged >InReturLst>, all entries turn into Unicode. What happens here? Look for the word 'FANØ'. This word changes from 'FANØ' to u'FAN\xd8'. That's a problem for me, and I don't want it to change like this.
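
As a side check, the standalone sketch below (not part of the program above) suggests that the characters themselves are intact and only the display differs. It assumes the token is a Unicode string, as codecs.open with encoding='utf-8' returns; the literal stands in for a token read from the file. In Python 2, printing a list shows repr() of each element, and repr() escapes non-ASCII characters.

```python
# Hypothetical stand-in for one token cut from the decoded input line.
word = u'FAN\xd8'

# \xd8 is only the escaped spelling of code point U+00D8, the letter O with stroke.
assert len(word) == 4
assert ord(word[-1]) == 0xD8

# Encoding the token again yields the UTF-8 bytes found in the input file.
encoded = word.encode('utf-8')
print(repr(encoded))
```

If that holds for the real data, printing each element on its own (as the >InSubline2> lines do) shows the readable form, while printing the whole list shows the escaped form.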

What do I do to stop this behavior?

Best Regards
Martin
