hex dump w/ or w/out utf-8 chars

blatt Sun, 07 Jul 2013 17:29:12 -0700

Hi all,
and a particular hello to Chris Angelico, whose criticism and
suggestions pushed me to fully revise my application for
hex-dumping text that contains utf-8 chars.
If you are not using Python 3, the utf-8 codec can add further programming
problems, especially if you are not a guru...
The script looks very long, but that is mostly comments... sorry.
It is quite useful (at least IMHO...).
It works under Linux, but there is still a small problem that I haven't
solved (at least not programmatically...).



# -*- coding: utf-8 -*-
# px.py vers. 11 (pxb.py)   # python 2.6.6
# hex-dump w/ or w/out utf-8 chars
# Since space bytes are printed as blanks in the hex rows, this script
# also shows (better than tabnanny) incorrect indentation.

# to save the output: python pxb.py hex.txt > px9_out_hex.txt

nLenN=3          # number of digits for line numbers

# version almost completely rewritten based on
# the criticism and changes suggested by Chris Angelico

# in the first version the utf-8 conversion to hex was shown horizontally:

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# ... but I had to insert additional chars to keep the
#     synchronization between the literal and the hex part

# 005 # qwerty: non è. unicode bensì. ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# in the second version I followed Chris's suggestion:
# "to show the hex utf-8 vertically"

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 c 7666666 6667c 676660
#     3 175249a efe 3 5e93f45 25e33 13399a
#                   a             a
#                   8             c

# between the two solutions, I chose the first one + synchronization,
#     which seems more compact and easier to program (... I'm lazy...)

# various run options:
# std      :             python px.py file
# bash cat : cat  file | python px.py (alias hex)
# bash echo: echo line | python px.py    "    "

# works on any n. of bytes for utf-8

# Tip for the user: it helps to keep, in a separate file,
# all the special characters of interest together with their names.

# known bug:

# echo '345"789"'|hex    > 345"789"              345"789"
#                          33323332  instead of  333233320
#                          3452789 a    "    "   34527892a

# ... workaround: avoid a " just before the \n at the end of the test line
# echo "345'789'"|hex   >  345'789'
#                          333233320
#                          34577897a

# the same error occurs with every run option

# If someone can solve this bug...

###################


import fileinput

lF = list(fileinput.input())    # input file as list; fileinput handles args-or-stdin
sSpacesXLN = ' ' * (nLenN+1)    # indentation of the two hex rows


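for n, sLine in enumerate(lF):
    sLineHexND = sLine.encode('hex')            # ND = no delimiter (space)
    sLineHex   = sLineHexND.replace('20','  ')
    sLineHexH  = sLineHex[::2]                  # high nibbles
    sLineHexL  = sLineHex[1::2]                 # low nibbles

    # rebuild the literal line: after each multi-byte utf-8 char append
    # one '.' (hex 2e) per continuation byte, so that every byte gets its
    # own column and the literal stays in sync with the hex rows
    sSynchro=''
    for k in xrange(0, len(sLineHexND), 2):
        cLead = sLineHexND[k]                   # high nibble of this byte
        if cLead < '8':                         # ascii byte
            sSynchro += sLineHexND[k:k+2]
        elif cLead in 'cd':                     # lead byte of a 2-byte char
            sSynchro += sLineHexND[k:k+4] + '2e'
        elif cLead == 'e':                      # lead byte of a 3-byte char
            sSynchro += sLineHexND[k:k+6] + '2e2e'
        elif cLead == 'f':                      # lead byte of a 4-byte char
            sSynchro += sLineHexND[k:k+8] + '2e2e2e'
        # continuation bytes (8..b) were already copied with their lead byte

    # text output (synchronized)
    print str(n+1).zfill(nLenN)+' '+sSynchro.decode('hex'),
    print sSpacesXLN + sLineHexH
    print sSpacesXLN + sLineHexL+ '\n'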

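A possible explanation of the bug listed in the comments: replace('20','  ') runs on the flat hex string, so it can also match a '2' that is the low nibble of one byte followed by a '0' that is the high nibble of the next one. For '345"789"\n' the tail is 22 0a, i.e. "220a". The replace swallows the '2' of 0x22 and the '0' of 0x0a, and this shifts the two nibble rows exactly as in the reported output. The same should happen on any line whose last char before the newline has hex x2 (", 2, B, R, b, r). A minimal sketch of a fix, under some assumptions: it blanks spaces byte by byte instead of on the flat string, it is written for Python 3 rather than the script's Python 2.6, and the names hex_rows, synchro_line and dump are hypothetical. synchro_line adds the same '.' padding as the script, but computes it from each char's encoded length, so 2-, 3- and 4-byte chars are all handled the same way.

```python
N_DIGITS = 3  # digits of the line number (nLenN in the script)


def synchro_line(line):
    # after each char, one '.' per extra utf-8 byte, so the text row
    # is as wide as the hex rows (one column per byte)
    return ''.join(c + '.' * (len(c.encode('utf-8')) - 1) for c in line)


def hex_rows(line):
    # high/low nibble rows, one column per utf-8 byte; only a whole 0x20
    # byte becomes blank, so nibbles of neighbouring bytes never merge
    high, low = [], []
    for b in line.encode('utf-8'):  # iterating bytes yields ints in Python 3
        if b == 0x20:
            high.append(' ')
            low.append(' ')
        else:
            digits = '%02x' % b
            high.append(digits[0])
            low.append(digits[1])
    return ''.join(high), ''.join(low)


def dump(lines):
    pad = ' ' * (N_DIGITS + 1)
    out = []
    for n, line in enumerate(lines, 1):
        high, low = hex_rows(line)  # a trailing '\n' shows up as 0a
        out.append(str(n).zfill(N_DIGITS) + ' ' + synchro_line(line.rstrip('\n')))
        out.append(pad + high)
        out.append(pad + low)
        out.append('')  # blank line between dumped lines
    return '\n'.join(out)
```

On the test line '345"789"\n' this gives the rows 333233320 and 34527892a, the output the script was expected to print. To use it like the script, assuming the terminal locale is utf-8, run: import fileinput; print(dump(fileinput.input())).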

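Chris's vertical layout (the second version in the comments) is also easy to sketch. This version is again in Python 3, and vertical_rows is a hypothetical name. Every character gets exactly one column, so the literal line needs no '.' padding. The hex digits of a multi-byte char simply continue downward into extra rows.

```python
def vertical_rows(line):
    # one column per character: the hex digits of its utf-8 bytes are
    # written top to bottom, and a space stays an empty column
    columns = ['' if c == ' ' else c.encode('utf-8').hex() for c in line]
    depth = max((len(col) for col in columns), default=0)
    rows = []
    for i in range(depth):
        row = ''.join(col[i] if i < len(col) else ' ' for col in columns)
        rows.append(row.rstrip())  # drop trailing blanks only
    return rows
```

For 'non è' it returns the rows '666 c', 'efe 3', '    a' and '    8'. The a and 8 fall under the è, as in the example above. Print the line itself first, then the rows. The price of this layout is a variable number of rows per line, which is probably why the padded horizontal version is simpler to program.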
If something is hard to follow, probably because of the font, the best
thing is to open it in an editor with a monospaced ("mono") font...
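The script's tip about keeping the special characters of interest, with their names, in a separate file can be partly automated with Python's standard unicodedata module. The sketch below is a suggestion in Python 3, and char_table is a hypothetical name. It lists each distinct non-ASCII char of a text together with its utf-8 bytes in hex and its Unicode name.

```python
import unicodedata


def char_table(text):
    # distinct non-ascii chars, in order of first appearance
    seen = []
    for c in text:
        if ord(c) > 127 and c not in seen:
            seen.append(c)
    # char, its utf-8 bytes in hex, its Unicode name ('?' if it has none)
    return ['%s  %-8s  %s' % (c, c.encode('utf-8').hex(), unicodedata.name(c, '?'))
            for c in seen]
```

For the sample line 'non è unicode bensì ascii' it lists è (c3a8, LATIN SMALL LETTER E WITH GRAVE) and ì (c3ac, LATIN SMALL LETTER I WITH GRAVE).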

As I already told Chris... criticism is welcome!

Bye, Blatt.

-- 
http://mail.python.org/mailman/listinfo/python-list