how to transfer my utf8 code saved in a file to gbk code
My file contains strings like this: \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a

I want to read the content of this file and convert it to the corresponding GBK encoding, a Chinese character encoding. Every time I try to convert it, the output is the same as the input, no matter which method I use. It seems that when Python reads the file it treats '\' as an ordinary character, so the string ends up represented as "\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a". The "\" is then output 'correctly', but that is not what I want.

Can anyone help me? Thanks in advance.

-- http://mail.python.org/mailman/listinfo/python-list
Re: how to transfer my utf8 code saved in a file to gbk code
On Jun 7, 11:25 pm, John Machin wrote:
> On Jun 7, 10:55 pm, higer wrote:
>
> > My file contains such strings :
> > \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a
>
> Are you sure? Does that occupy 9 bytes in your file or 36 bytes?

It was saved in a file, so it occupies 36 bytes. If I just put this string in a variable, I certainly get the correct result, but how do I get the right answer when reading it from a file?

> > I want to read the content of this file and transfer it to the
> > corresponding gbk code,a kind of Chinese character encode style.
> > Everytime I was trying to transfer, it will output the same thing no
> > matter which method was used.
> > It seems like that when Python reads it, Python will take '\' as a
> > common char and this string at last will be represented as
> > "\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a" , then the "\" can be
> > 'correctly' output,but that's not what I want to get.
>
> > Anyone can help me?
>
> try this:
>
> utf8_data = your_data.decode('string-escape')
> unicode_data = utf8_data.decode('utf8')
> # unicode derived from your sample looks like this 日期: is that what
> you expected?

You are right, the result is 日期, which is just what I expect. If you save the string in a variable, you can surely get the correct result. But it is just a sample, so I gave a short string. What if there are many characters in a file?

> gbk_data = unicode_data.encode('gbk')

I have tried the method you just told me, but unfortunately it does not work (the output is garbled).

> If that "doesn't work", do three things:
> (1) give us some unambiguous hard evidence about the contents of your
> data:
> e.g.
> # assuming Python 2.x

My Python version is 2.5.2.

> your_data = open('your_file.txt', 'rb').read(36)
> print repr(your_data)
> print len(your_data)
> print your_data.count('\\')
> print your_data.count('x')

The result is:

    '\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a'
    36
    9
    9

> (2) show us the source of the script that you used

    # Python 2 script: dump the raw file content and count its characters
    def UTF8ToChnWords():
        f = open("123.txt", "rb")
        content = f.read()
        f.close()
        print repr(content)
        print len(content)
        print content.count("\\")
        print content.count("x")

    if __name__ == '__main__':
        UTF8ToChnWords()

> (3) Tell us what "doesn't work" means in this case

It doesn't work because no matter how we process it, we always get a 36-byte string rather than a 9-byte one. Thus, we cannot get the correct answer.

> Cheers,
> John

Thank you very much,
higer
Re: how to transfer my utf8 code saved in a file to gbk code
On Jun 8, 8:20 am, MRAB wrote:
> John Machin wrote:
> > On Jun 8, 12:13 am, "R. David Murray" wrote:
> >> higer wrote:
> >>> My file contains such strings :
> >>> \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a
> >> If those bytes are what is in the file (and it sounds like they are),
> >> then the data in your file is not in UTF8 encoding, it is in ASCII
> >> encoded as hexadecimal escape codes.
> >
> > OK, I'll bite: what *ASCII* character is encoded as either "\xe6" or
> > r"\xe6" by what mechanism in which parallel universe?
>
> Maybe he means that the file itself is in ASCII.

Yes, my file itself is in ASCII.
Re: how to transfer my utf8 code saved in a file to gbk code
Thank you Mark, that works. Decoding the content with 'string-escape' first is the key point, so I can get the Chinese characters now.

Regards,
-higer
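For readers on Python 3, where the 'string-escape' codec used in this thread no longer exists, the same three steps (unescape, decode as UTF-8, encode as GBK) might be sketched as below. This is only a sketch under assumptions, not the code posted in the thread: the helper name `escaped_utf8_to_gbk` is made up for illustration, and it assumes the file holds nothing but ASCII text with \xNN escapes.

```python
def escaped_utf8_to_gbk(raw):
    """Turn ASCII bytes holding literal hex escapes into GBK bytes.

    Hypothetical helper for illustration; Python 3 only.
    """
    # Step 1: interpret the \xNN escapes. 'unicode_escape' produces one
    # character per escaped byte (U+0000..U+00FF), and latin-1 maps
    # those characters straight back to the byte values they stand for.
    utf8_data = raw.decode("unicode_escape").encode("latin-1")
    # Step 2: those bytes are real UTF-8, so decode them to text.
    unicode_data = utf8_data.decode("utf-8")
    # Step 3: re-encode the text as GBK.
    return unicode_data.encode("gbk")


# 36 ASCII bytes with literal backslashes, as stored in the file.
sample = rb"\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a"
gbk_data = escaped_utf8_to_gbk(sample)
```

To process a whole file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper; the length of the input does not matter.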
How should I compare two txt files separately coming from windows/dos and linux/unix
I just want to compare two files, one from Windows and the other from Unix, but I do not want to compare them by reading them line by line. I found the filecmp module, which is used for file and directory comparisons. However, when I tested its cmp function with two identical files (one from Unix, one from Windows, with the same content), filecmp.cmp returned False.

Later I found that Windows uses '\r\n' as the newline marker while Unix uses '\n', so filecmp.cmp considers the files different and returns False. So, can anyone tell me whether there is an option like IgnoreNewline that ignores the newline difference between platforms? If not, I think filecmp may not be a good file comparison module.

Thanks,
higer
Re: How should I compare two txt files separately coming from windows/dos and linux/unix
On Jun 11, 1:08 pm, John Machin wrote:
> Chris Rebert writes:
>
> > On Wed, Jun 10, 2009 at 8:11 PM, higer wrote:
> > > I just want to compare two files,one from windows and the other from
> > > unix. But I do not want to compare them through reading them line by
> > > line. Then I found there is a filecmp module which is used as file and
> > > directory comparisons. However,when I use two same files (one from
> > > unix,one from windows,the content of them is the same) to test its cmp
> > > function, filecmp.cmp told me false.
> > >
> > > Later, I found that windows use '\r\n' as new line flag but unix use
> > > '\n', so filecmp.cmp think that they are different,then return false.
> > > So, can anyone tell me that is there any method like IgnoreNewline
> > > which can ignore the difference of new line flag in different
> > > platforms? If not,I think filecmp may be not a good file comparison
> >
> > Nope, there's no such flag. You could run the files through either
> > `dos2unix` or `unix2dos` beforehand though, which would solve the
> > problem.
> > Or you could write the trivial line comparison code yourself and just
> > make sure to open the files in Universal Newline mode (add 'U' to the
> > `mode` argument to `open()`).
> > You could also file a bug (a patch to add newline insensitivity would
> > probably be welcome).
>
> Or popen diff ...
>
> A /very/ /small/ part of the diff --help output:
>
>   -E  --ignore-tab-expansion      Ignore changes due to tab expansion.
>   -b  --ignore-space-change       Ignore changes in the amount of white space.
>   -w  --ignore-all-space          Ignore all white space.
>   -B  --ignore-blank-lines        Ignore changes whose lines are all blank.
>   -I RE  --ignore-matching-lines=RE  Ignore changes whose lines all match RE.
>   --strip-trailing-cr             Strip trailing carriage return on input.
>
> Cheers,
> John

A tool can certainly be used to compare two files, but I just want to compare them using Python code.
Re: How should I compare two txt files separately coming from windows/dos and linux/unix
On Jun 11, 11:44 am, Chris Rebert wrote:
> On Wed, Jun 10, 2009 at 8:11 PM, higer wrote:
> > I just want to compare two files,one from windows and the other from
> > unix. But I do not want to compare them through reading them line by
> > line. Then I found there is a filecmp module which is used as file and
> > directory comparisons. However,when I use two same files (one from
> > unix,one from windows,the content of them is the same) to test its cmp
> > function, filecmp.cmp told me false.
> >
> > Later, I found that windows use '\r\n' as new line flag but unix use
> > '\n', so filecmp.cmp think that they are different,then return false.
> > So, can anyone tell me that is there any method like IgnoreNewline
> > which can ignore the difference of new line flag in different
> > platforms? If not,I think filecmp may be not a good file comparison
>
> Nope, there's no such flag. You could run the files through either
> `dos2unix` or `unix2dos` beforehand though, which would solve the
> problem.
> Or you could write the trivial line comparison code yourself and just
> make sure to open the files in Universal Newline mode (add 'U' to the
> `mode` argument to `open()`).
> You could also file a bug (a patch to add newline insensitivity would
> probably be welcome).
>
> Cheers,
> Chris
> -- http://blog.rebertia.com

Thank you very much. Adding the 'U' argument works perfectly, and I think it is definitely worth reporting this to Python.org as a bug, as you say.

Cheers,
higer
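The "trivial line comparison" with universal newlines suggested above could look roughly like this on Python 3. It is a sketch rather than anything posted in the thread: the function name `same_text` and the default UTF-8 encoding are assumptions. On Python 3 no 'U' flag is needed (it was deprecated and later removed), because text mode already translates '\r\n' and '\r' to '\n' when `newline=None`, which is the default.

```python
from itertools import zip_longest


def same_text(path_a, path_b, encoding="utf-8"):
    """Return True if both text files contain the same lines,
    regardless of which newline convention each file uses.

    Hypothetical helper for illustration.
    """
    # newline=None turns on universal newlines: CRLF, CR and LF
    # line endings are all read back as a plain line feed.
    with open(path_a, encoding=encoding, newline=None) as file_a, \
            open(path_b, encoding=encoding, newline=None) as file_b:
        # zip_longest pads the shorter file with None, so files of
        # different length compare as unequal instead of being truncated.
        return all(a == b for a, b in zip_longest(file_a, file_b))
```

Unlike filecmp.cmp, which compares the raw bytes, this compares decoded lines, so it only makes sense for text files whose encoding is known.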
failed to build decompyle/unpyc project on WindowsXP
Maybe everyone knows that decompyle (hosted on SourceForge.net) is a tool that converts a .pyc file to a .py file, and that it currently supports only Python 2.3 and below. I have found a project named unpyc that supports Python 2.5. unpyc is built on decompyle and is hosted on Google Code, where you can download it if you want.

I built unpyc on Ubuntu successfully and it runs fine. But for certain reasons I want to use this tool on Windows XP, so I tried to build it there. I have tried many times and several methods, with .NET 2003 or MinGW, but I failed. So I have come here looking for somebody who can help me. The error messages from each method are given below:

1 Using command : python setup.py install

F:\unpyc>python setup.py install
running install
running build
running build_py
creating build\lib.win32-2.5
creating build\lib.win32-2.5\unpyc
copying unpyc\dis_15.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_16.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_20.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_21.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_22.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_23.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_24.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_25.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_26.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_files.py -> build\lib.win32-2.5\unpyc
copying unpyc\magics.py -> build\lib.win32-2.5\unpyc
copying unpyc\marshal_files.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_23.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_24.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_25.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_26.py -> build\lib.win32-2.5\unpyc
copying unpyc\Parser.py -> build\lib.win32-2.5\unpyc
copying unpyc\Scanner.py -> build\lib.win32-2.5\unpyc
copying unpyc\spark.py -> build\lib.win32-2.5\unpyc
copying unpyc\verify.py -> build\lib.win32-2.5\unpyc
copying unpyc\Walker.py -> build\lib.win32-2.5\unpyc
copying unpyc\__init__.py -> build\lib.win32-2.5\unpyc
running build_ext
building 'unpyc/marshal_25' extension
creating build\temp.win32-2.5
creating build\temp.win32-2.5\Release
creating build\temp.win32-2.5\Release\unpyc
f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IF:\Python25\include -IF:\Python25\PC /Tcunpyc/marshal_25.c /Fobuild\temp.win32-2.5\Release\unpyc/marshal_25.obj
marshal_25.c
unpyc\marshal_25.c(401) : warning C4273: 'PyMarshal_WriteLongToFile' : inconsistent dll linkage
unpyc\marshal_25.c(413) : warning C4273: 'PyMarshal_WriteObjectToFile' : inconsistent dll linkage
unpyc\marshal_25.c(1004) : warning C4273: 'PyMarshal_ReadShortFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1015) : warning C4273: 'PyMarshal_ReadLongFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1044) : warning C4273: 'PyMarshal_ReadLastObjectFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1087) : warning C4273: 'PyMarshal_ReadObjectFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1101) : warning C4273: 'PyMarshal_ReadObjectFromString' : inconsistent dll linkage
unpyc\marshal_25.c(1116) : warning C4273: 'PyMarshal_WriteObjectToString' : inconsistent dll linkage
f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:F:\Python25\libs /LIBPATH:F:\Python25\PCBuild /EXPORT:initunpyc/marshal_25 build\temp.win32-2.5\Release\unpyc/marshal_25.obj /OUT:build\lib.win32-2.5\unpyc/marshal_25.pyd /IMPLIB:build\temp.win32-2.5\Release\unpyc\marshal_25.lib
marshal_25.obj : error LNK2001: unresolved external symbol initunpyc/marshal_25
build\temp.win32-2.5\Release\unpyc\marshal_25.lib : fatal error LNK1120: 1 unresolved externals
LINK : fatal error LNK1141: failure during build of exports file
error: command '"f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe"' failed with exit status 1141

2 Using command: python setup.py build -c mingw32

F:\unpyc>python setup.py build -c mingw32
running build
running build_py
running build_ext
building 'unpyc/marshal_25' extension
F:\mingw\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IF:\Python25\include -IF:\Python25\PC -c unpyc/marshal_25.c -o build\temp.win32-2.5\Release\unpyc\marshal_25.o
unpyc/marshal_25.c:1087: warning: 'PyMarshal_ReadObjectFromFile' defined locally after being referenced with dllimport linkage
unpyc/marshal_25.c:1101: warning: 'PyMarshal_ReadObjectFromString' defined locally after being referenced with dllimport linkage
writing build\temp.win32-2.5\Release\unpyc\marshal_25.def
F:\mingw\bin\gcc.exe -mno-cygwin -shared -s build\temp.win32-2.5\Release\unpyc\marshal_25.o build\temp.win32-2.5\Release\unpyc\marshal_25.def -LF:\Python25\libs -LF:\Python25\PCBuild -lpython25 -lmsvcr71 -o build\lib.win32-2.5\unpyc/marshal_25.pyd
F:\Python25\libs/libpython25.a(dcbbs00336.o):(.text+0x0): multiple definition of `PyMarshal_ReadObjectFromStr
question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?
My Python version is 2.5.2. When reading the bytecode of some .pyc files, I often find many jump instructions at different positions that all jump to the same position. You can see this in the following code (this bytecode is from a .pyc file whose source .py file I don't have):

...
526     POP_TOP           ''
527     LOAD_FAST         'imeHandle'
530     LOAD_ATTR         'isCnInput'
533     CALL_FUNCTION_0   ''
536     JUMP_IF_FALSE     '574'
539     POP_TOP           ''
540     LOAD_FAST         'GUIDefine'
543     LOAD_ATTR         'CandidateIsOpen'
546     JUMP_IF_TRUE      '574'
549     POP_TOP           ''
550     LOAD_FAST         'GUIDefine'
553     LOAD_ATTR         'CompositionWndIsOpen'
556     JUMP_IF_TRUE      '574'
559     POP_TOP           ''
560     LOAD_FAST         'isWanNengWB'
563     JUMP_IF_FALSE     '574'
566     POP_TOP           ''
567     LOAD_FAST         'state'
570     LOAD_CONST        1
573     BINARY_AND        ''
574_0   COME_FROM         ''
574_1   COME_FROM         ''
574_2   COME_FROM         ''
574_3   COME_FROM         ''
...

From the above bytecode we can see that position 574 is the point that many other positions jump to. So it looks just like 'goto' in C, but we know that Python has no such statement. Each 'JUMP_*' instruction is accompanied by a 'COME_FROM' marker, so more than one 'COME_FROM' is listed at position 574.

The question is this: I have tried many ways (e.g. for loops, while loops, and mixtures of both) to reproduce 'goto'-style bytecode like this, but the results were disappointing. So I think maybe it is just a compiler optimization in Python 2.5? I'm not sure, so I would appreciate it if anyone could help me.
Re: question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?
Hi all,

I'm sorry that I did not make my question clear. What I mean is: what would the source code look like that compiles to such bytecode?

Regards,
higer
Re: question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?
On Jun 17, 8:29 pm, John Machin wrote:
> On Jun 17, 1:40 pm, higer wrote:
>
> > My Python version is 2.5.2; When I reading the bytecode of some pyc
> > file, I always found that there are many jump command from different
> > position,but to the same position. You can see this situation in
> > following code(this bytecode is just from one .pyc file and I don't
> > have its source .py file):
>
> Why don't you (a) read the answers you got on stackoverflow to the
> identical question (b) WRITE some code instead of inspecting the
> entrails of the code of others?

Thanks, I read the answer just now. And thank you, everybody, for your suggestions!
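One kind of source that produces several jumps to a single position is a nested boolean expression: every short-circuiting `and` / `or` jumps to the end of its own sub-expression, and when the sub-expressions all end at the same place the targets coincide. The sketch below is only a guess at the shape of the original source, not a recovered version of it. The function names `guess` and `jump_target_counts` are made up, and it runs on Python 3, whose opcode names differ from the Python 2.5 listing in the question.

```python
import dis
from collections import Counter


def guess(ime_handle, gui, is_wan_neng_wb, state):
    # Hypothetical reconstruction: one `and` whose right operand is an
    # `or` chain that itself ends in an `and`. All three expressions
    # finish at the same point, so all four short-circuit jumps can
    # share one target.
    return ime_handle.isCnInput() and (
        gui.CandidateIsOpen
        or gui.CompositionWndIsOpen
        or (is_wan_neng_wb and state & 1)
    )


def jump_target_counts(func):
    """Count how many jump instructions point at each bytecode offset."""
    jump_opcodes = set(dis.hasjabs) | set(dis.hasjrel)
    return Counter(
        ins.argval  # for a jump instruction, argval is the target offset
        for ins in dis.get_instructions(func)
        if ins.opcode in jump_opcodes
    )
```

Calling `dis.dis(guess)` prints the full listing, and `jump_target_counts(guess)` shows which offsets are shared jump targets. No 'goto' and no special optimization is involved; it is the ordinary way short-circuit evaluation is compiled.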