Hi all, and a particular hello to Chris Angelico, whose criticism and suggestions pushed me to make a full revision of my hex-dump script for text containing utf-8 chars. If you are not using python 3, the utf-8 codec can add further programming problems, especially if you are not a guru... The script looks very long, but only because I commented too much... sorry. It is very useful (at least IMHO...). It works under Linux, but there is still a little problem which I didn't solve (at least programmatically...).
# -*- coding: utf-8 -*-
# px.py vers. 11 (pxb.py)
# python 2.6.6
# hex-dump w/ or w/out utf-8 chars
# Using spaces as separators, this script shows
# (better than tabnanny) incorrect indentations.
# to save output > python pxb.py hex.txt > px9_out_hex.txt

nLenN=3 # n. of digits for lines

# version almost thoroughly rewritten on the basis of
# the criticism and modifications suggested by Chris Angelico

# in the first version the utf-8 conversion to hex was shown horizontally:

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# ... but I had to insert additional chars to keep the
# synchronization between the literal and the hex part

# 005 # qwerty: non è. unicode bensì. ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# in the second version I followed Chris's suggestion:
# "to show the hex utf-8 vertically"

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 c 7666666 6667c 676660
#     3 175249a efe 3 5e93f45 25e33 13399a
#                   a             a
#                   8             c

# between the two solutions, I selected the first one + synchronization,
# which seems more compact and easier to program (... I'm lazy...)

# various run options:
# std      : python px.py file
# bash cat : cat file | python px.py (alias hex)
# bash echo: echo line | python px.py " "

# the two hex rows work on any n. of bytes for utf-8;
# the synchronized text row handles chars whose first byte
# starts with hex c (2 bytes) or hex e (3 bytes)

# For the user: it is helpful to have in a separate file
# all special characters of interest, together with their names.

# error:
# echo '345"789"'|hex >  345"789"               345"789"
#                        33323332    instead of 333233320
#                        3452789 a      " "     34527892a
# ... workaround: avoid "\n at the end of the test line
# echo "345'789'"|hex >  345'789'
#                        333233320
#                        34577897a
# same error in every run option
# If someone can solve this bug...
###################
import fileinput

lF=[] # input file as list
for line in fileinput.input(): # handles all the details of args-or-stdin
    lF.append(line)

sSpacesXLN = ' ' * (nLenN+1)

for n in xrange(len(lF)):
    sLineHexND=lF[n].encode('hex') # ND = no delimiter (space)
    # each space (hex 20) becomes two blanks, one per nibble row
    sLineHex =lF[n].encode('hex').replace('20','  ')
    sLineHexH =sLineHex[::2]  # high nibbles
    sLineHexL =sLineHex[1::2] # low nibbles
    sSynchro=''
    # walk the hex string one byte (two hex digits) at a time;
    # continuation bytes (first digit 8..b) match no branch and are skipped
    for k in xrange(0,len(sLineHexND),2):
        if sLineHexND[k]<'8':      # ascii: copy the byte
            sSynchro+=sLineHexND[k:k+2]
        elif sLineHexND[k]=='c':   # 2-byte char: copy it, pad with one '.'
            sSynchro+=sLineHexND[k:k+4]+'2e'
        elif sLineHexND[k]=='e':   # 3-byte char: copy it, pad with two '.'
            sSynchro+=sLineHexND[k:k+6]+'2e2e'
    # text output (synchronized)
    print str(n+1).zfill(nLenN)+' '+sSynchro.decode('hex'),
    print sSpacesXLN + sLineHexH
    print sSpacesXLN + sLineHexL+ '\n'

If the output is hard to read, probably because of the fonts, the best thing is to open it in an editor with "mono" fonts...
As I already told Chris... criticism is welcome!
Bye, Blatt.
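About the open bug with echo '345"789"': as far as I can tell, a likely cause is that replace() runs over the whole hex string, so it also matches a '2' that ends one byte followed by a '0' that starts the next one. In the test line that happens with 22 0a (a double quote followed by the newline). Below is a minimal sketch of one way around it, which blanks only whole bytes equal to hex 20. The name nibble_rows is hypothetical and not part of px.py, and the sketch is written for python 3 and takes bytes, while the script itself targets python 2.6.6.

```python
import binascii


def nibble_rows(raw):
    """Return the (high, low) nibble rows for a bytes object.

    Hypothetical helper, not part of px.py: bytes equal to 0x20 are
    shown as blanks, all other bytes as their two hex digits.
    """
    hex_text = binascii.hexlify(raw).decode('ascii')
    # Split into whole bytes first, so that a '2' ending one byte and a
    # '0' starting the next one are never taken for a space.
    pairs = [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]
    pairs = ['  ' if pair == '20' else pair for pair in pairs]
    high = ''.join(pair[0] for pair in pairs)
    low = ''.join(pair[1] for pair in pairs)
    return high, low


high, low = nibble_rows(b'345"789"\n')
print(high)
print(low)
```

For the test line this prints 333233320 and 34527892a, the two rows that the header comments give as the expected output.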