how to transfer my utf8 code saved in a file to gbk code

2009-06-07 Thread higer
My file contains strings like this:
\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a

I want to read the content of this file and convert it to the
corresponding GBK encoding (a Chinese character encoding).
Every time I try to convert it, the output is the same no
matter which method I use.
It seems that when Python reads it, Python treats '\' as an
ordinary character, so the string ends up represented as
"\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a"; the "\" is then 'correctly'
output, but that's not what I want to get.

Can anyone help me?


Thanks in advance.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to transfer my utf8 code saved in a file to gbk code

2009-06-07 Thread higer
On Jun 7, 11:25 pm, John Machin  wrote:
> On Jun 7, 10:55 pm, higer  wrote:
>
> > My file contains strings like this:
> > \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a
>


> Are you sure? Does that occupy 9 bytes in your file or 36 bytes?
>

It was saved in a file, so it occupies 36 bytes. If I just put this
string in a variable, it certainly gives the correct result, but how
do I get the right answer when reading it from the file?
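The 9-byte versus 36-byte difference can be reproduced without any file. A minimal Python 3 sketch (the variable names are illustrative): an ordinary bytes literal has its `\xNN` escapes interpreted when the source is compiled, while a raw literal keeps every backslash, which is exactly what reading the file gives you.

```python
# Python 3 sketch: escapes in a source literal vs. the same characters read from a file.
as_literal = b"\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a"     # compiler turns each \xNN into one byte
as_file_text = rb"\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a"  # raw literal: backslash, 'x', two hex digits

print(len(as_literal))            # 9 real UTF-8 bytes
print(len(as_file_text))          # 36 ASCII characters
print(as_file_text.count(b"\\"))  # 9 backslashes, matching the repr() output in this thread
# The 9 bytes are UTF-8 for U+65E5, U+671F and U+FF1A (full-width colon)
print(as_literal.decode("utf-8") == "\u65e5\u671f\uff1a")  # True
```

This is why assigning the string in source code "works": the escapes are resolved before the program ever runs, whereas file contents are never interpreted that way.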

>
>
> > I want to read the content of this file and convert it to the
> > corresponding GBK encoding (a Chinese character encoding).
> > Every time I try to convert it, the output is the same no
> > matter which method I use.
> > It seems that when Python reads it, Python treats '\' as an
> > ordinary character, so the string ends up represented as
> > "\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a"; the "\" is then 'correctly'
> > output, but that's not what I want to get.
>
> > Can anyone help me?
>
> try this:
>
> utf8_data = your_data.decode('string-escape')
> unicode_data = utf8_data.decode('utf8')
> # unicode derived from your sample looks like this 日期: is that what
> # you expected?

You are right, the result is 日期, which is just what I expected. If you
put the string in a variable, you can surely get the correct result. But
it is just a sample, so I gave a short string; what if there are many
characters in a file?

> gbk_data = unicode_data.encode('gbk')
>

I have tried the method you just described, but unfortunately it
does not work (I get garbled characters).


> If that "doesn't work", do three things:
> (1) give us some unambiguous hard evidence about the contents of your
> data:
> e.g. # assuming Python 2.x

My Python version is 2.5.2.

> your_data = open('your_file.txt', 'rb').read(36)
> print repr(your_data)
> print len(your_data)
> print your_data.count('\\')
> print your_data.count('x')
>

The result is:

'\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a'
36
9
9

> (2) show us the source of the script that you used

def UTF8ToChnWords():
    # Print the diagnostics John asked for (Python 2 syntax)
    f = open("123.txt", "rb")
    content = f.read()
    f.close()
    print repr(content)
    print len(content)
    print content.count("\\")
    print content.count("x")

if __name__ == '__main__':
    UTF8ToChnWords()

> (3) Tell us what "doesn't work" means in this case

It doesn't work because no matter how we process it, we get a
36-byte string rather than a 9-byte one, so we cannot get the correct
answer.

>
> Cheers,
> John

Thank you very much,
higer


Re: how to transfer my utf8 code saved in a file to gbk code

2009-06-07 Thread higer
On Jun 8, 8:20 am, MRAB  wrote:
> John Machin wrote:
> > On Jun 8, 12:13 am, "R. David Murray"  wrote:
> >> higer  wrote:
> >>> My file contains such strings :
> >>> \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a
> >> If those bytes are what is in the file (and it sounds like they are),
> >> then the data in your file is not in UTF8 encoding, it is in ASCII
> >> encoded as hexadecimal escape codes.
>
> > OK, I'll bite: what *ASCII* character is encoded as either "\xe6" or
> > r"\xe6" by what mechanism in which parallel universe?
>
> Maybe he means that the file itself is in ASCII.

Yes, my file itself is in ASCII.


Re: how to transfer my utf8 code saved in a file to gbk code

2009-06-08 Thread higer
Thank you, Mark,
that works.

Decoding the content with 'string-escape' first is the key
point; now I can get the Chinese characters.
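For readers on Python 3, where the 'string-escape' codec no longer exists, here is a minimal sketch of the same pipeline. It assumes the file is pure ASCII, that the `\xNN` escapes spell out UTF-8 bytes, and that any other backslash sequences (`\n`, `\t`, ...) should also be interpreted, as 'string-escape' did. The 'unicode_escape' codec turns each `\xNN` into the code point U+00NN, and encoding with 'latin-1' maps those code points back to the original bytes one to one. The function name `escaped_utf8_file_to_gbk` is hypothetical.

```python
import pathlib


def escaped_utf8_file_to_gbk(src, dst):
    """Read a file of literal \\xNN escapes (ASCII) and write the text as GBK."""
    escaped = pathlib.Path(src).read_bytes()  # e.g. 36 bytes: b'\\xe6\\x97...'
    # Step 1 (like Python 2's 'string-escape'): \xNN -> U+00NN, then latin-1 -> raw bytes
    utf8_bytes = escaped.decode("unicode_escape").encode("latin-1")
    # Step 2: interpret the raw bytes as UTF-8 text
    text = utf8_bytes.decode("utf-8")
    # Step 3: re-encode the text as GBK and save it
    pathlib.Path(dst).write_bytes(text.encode("gbk"))
    return text
```

Because the whole file is read and converted at once, the length of the content does not matter; for the 36-byte sample above, the output file would hold the 6 GBK bytes for 日期：.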




Regards,
-higer


How should I compare two txt files separately coming from windows/dos and linux/unix

2009-06-10 Thread higer
I just want to compare two files, one from Windows and the other from
Unix, but I do not want to compare them by reading them line by
line. Then I found the filecmp module, which provides file and
directory comparisons. However, when I tested its cmp function with
two identical files (one from Unix, one from Windows, with the same
content), filecmp.cmp returned False.

Later, I found that Windows uses '\r\n' as the newline marker but Unix
uses '\n', so filecmp.cmp thinks they are different and returns False.
So, can anyone tell me whether there is an option like IgnoreNewline
that ignores the difference in newline markers between
platforms? If not, I think filecmp may not be a good file comparison
module.
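A small Python 3 sketch that reproduces the situation (the file names are hypothetical temporary files). Windows writes `\r\n` (CR followed by LF) and Unix writes `\n`; `filecmp.cmp` compares raw bytes, so the extra CR bytes alone make the files differ.

```python
import filecmp
import os
import tempfile

tmp = tempfile.mkdtemp()
win_path = os.path.join(tmp, "from_windows.txt")
unix_path = os.path.join(tmp, "from_unix.txt")

# Same two lines of text, different line endings
with open(win_path, "wb") as f:
    f.write(b"line one\r\nline two\r\n")
with open(unix_path, "wb") as f:
    f.write(b"line one\nline two\n")

# shallow=False forces a byte-by-byte comparison instead of an os.stat() signature check
print(filecmp.cmp(win_path, unix_path, shallow=False))  # False
```

Here the sizes already differ by two bytes (one CR per line), so `filecmp` reports a mismatch before even reading the contents.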


Thanks,
higer




Re: How should I compare two txt files separately coming from windows/dos and linux/unix

2009-06-11 Thread higer
On Jun 11, 1:08 pm, John Machin  wrote:
> Chris Rebert  rebertia.com> writes:
>
> > On Wed, Jun 10, 2009 at 8:11 PM, higer gmail.com> wrote:
> > > I just want to compare two files, one from Windows and the other from
> > > Unix, but I do not want to compare them by reading them line by
> > > line. Then I found the filecmp module, which provides file and
> > > directory comparisons. However, when I tested its cmp function with
> > > two identical files (one from Unix, one from Windows, with the same
> > > content), filecmp.cmp returned False.
>
> > > Later, I found that Windows uses '\r\n' as the newline marker but Unix
> > > uses '\n', so filecmp.cmp thinks they are different and returns False.
> > > So, can anyone tell me whether there is an option like IgnoreNewline
> > > that ignores the difference in newline markers between
> > > platforms? If not, I think filecmp may not be a good file comparison
>
> > Nope, there's no such flag. You could run the files through either
> > `dos2unix` or `unix2dos` beforehand though, which would solve the
> > problem.
> > Or you could write the trivial line comparison code yourself and just
> > make sure to open the files in Universal Newline mode (add 'U' to the
> > `mode` argument to `open()`).
> > You could also file a bug (a patch to add newline insensitivity would
> > probably be welcome).
>
> Or popen diff ...
>
> A /very/ /small/ part of the diff --help output:
>
>   -E  --ignore-tab-expansion  Ignore changes due to tab expansion.
>   -b  --ignore-space-change  Ignore changes in the amount of white space.
>   -w  --ignore-all-space  Ignore all white space.
>   -B  --ignore-blank-lines  Ignore changes whose lines are all blank.
>   -I RE  --ignore-matching-lines=RE  Ignore changes whose lines all match RE.
>   --strip-trailing-cr  Strip trailing carriage return on input.
>
> Cheers,
> John

A tool can certainly be used to compare two files, but I want to
compare them using Python code.
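If a pure-Python stand-in for `diff --strip-trailing-cr` is wanted, the standard library's `difflib` can produce a unified diff once line terminators are stripped. This is a minimal sketch; the function name is hypothetical, and decoding each line as latin-1 is an assumption chosen only so that any byte sequence decodes.

```python
import difflib


def diff_ignoring_line_endings(path_a, path_b):
    """Return unified-diff lines for two files, ignoring CRLF vs LF differences."""
    # Binary mode + bytes.splitlines() splits on \r\n, \n and \r and drops them
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a = fa.read().splitlines()
        b = fb.read().splitlines()
    # difflib.unified_diff works on str lines, so decode each line leniently
    a = [line.decode("latin-1") for line in a]
    b = [line.decode("latin-1") for line in b]
    return list(difflib.unified_diff(a, b, fromfile=path_a, tofile=path_b, lineterm=""))
```

An empty list means the files differ only in line endings (or a missing final newline); otherwise the returned lines look like ordinary `diff -u` output.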



Re: How should I compare two txt files separately coming from windows/dos and linux/unix

2009-06-11 Thread higer
On Jun 11, 11:44 am, Chris Rebert  wrote:
> On Wed, Jun 10, 2009 at 8:11 PM, higer wrote:
> > I just want to compare two files, one from Windows and the other from
> > Unix, but I do not want to compare them by reading them line by
> > line. Then I found the filecmp module, which provides file and
> > directory comparisons. However, when I tested its cmp function with
> > two identical files (one from Unix, one from Windows, with the same
> > content), filecmp.cmp returned False.
>
> > Later, I found that Windows uses '\r\n' as the newline marker but Unix
> > uses '\n', so filecmp.cmp thinks they are different and returns False.
> > So, can anyone tell me whether there is an option like IgnoreNewline
> > that ignores the difference in newline markers between
> > platforms? If not, I think filecmp may not be a good file comparison
>
> Nope, there's no such flag. You could run the files through either
> `dos2unix` or `unix2dos` beforehand though, which would solve the
> problem.
> Or you could write the trivial line comparison code yourself and just
> make sure to open the files in Universal Newline mode (add 'U' to the
> `mode` argument to `open()`).
> You could also file a bug (a patch to add newline insensitivity would
> probably be welcome).
>
> Cheers,
> Chris
> --http://blog.rebertia.com

Thank you very much. Adding the 'U' flag works perfectly, and I
think this should definitely be reported to Python.org as a bug, as you
say.
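In Python 3 the 'U' mode flag is gone (it was removed in Python 3.11); text mode already uses universal newlines by default (`newline=None`), translating `\r\n` and `\r` to `\n` on read. A minimal sketch of Chris's suggestion, assuming an ASCII-compatible encoding (the 'latin-1' default is an assumption that simply lets any byte decode); the function name is hypothetical.

```python
from itertools import zip_longest


def same_text_ignoring_newlines(path_a, path_b, encoding="latin-1"):
    """Compare two text files, treating \\r\\n, \\r and \\n as the same line ending."""
    # newline=None (the default) enables universal-newline translation on read
    with open(path_a, encoding=encoding, newline=None) as fa, \
         open(path_b, encoding=encoding, newline=None) as fb:
        # zip_longest pads the shorter file with None, so extra lines count as a difference
        for line_a, line_b in zip_longest(fa, fb):
            if line_a != line_b:
                return False
    return True
```

Iterating line by line keeps memory use flat for large files; `zip_longest` is used instead of `zip` so that a file with extra trailing lines is not silently reported as equal.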

Cheers,
higer




failed to build decompyle/unpyc project on WindowsXP

2009-06-12 Thread higer
As many of you may know, decompyle (hosted on SourceForge.net) is a
tool that converts a .pyc file back into a .py file, but it only supports
Python 2.3 and earlier. I have found a project named unpyc that
supports Python 2.5. The unpyc project is built on decompyle and is
hosted on Google Code, so you can download it if you want.

I built unpyc on Ubuntu successfully and it runs fine. But for some
purposes I want to use this tool on my Windows XP machine, so I tried to
build it there. I have tried many times and methods, with .NET 2003 or
MinGW, but I failed. So I have come here looking for somebody who can help me.
Here are the error messages for each method:

1. Using command: python setup.py install
F:\unpyc>python setup.py install
running install
running build
running build_py
creating build\lib.win32-2.5
creating build\lib.win32-2.5\unpyc
copying unpyc\dis_15.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_16.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_20.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_21.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_22.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_23.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_24.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_25.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_26.py -> build\lib.win32-2.5\unpyc
copying unpyc\dis_files.py -> build\lib.win32-2.5\unpyc
copying unpyc\magics.py -> build\lib.win32-2.5\unpyc
copying unpyc\marshal_files.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_23.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_24.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_25.py -> build\lib.win32-2.5\unpyc
copying unpyc\opcode_26.py -> build\lib.win32-2.5\unpyc
copying unpyc\Parser.py -> build\lib.win32-2.5\unpyc
copying unpyc\Scanner.py -> build\lib.win32-2.5\unpyc
copying unpyc\spark.py -> build\lib.win32-2.5\unpyc
copying unpyc\verify.py -> build\lib.win32-2.5\unpyc
copying unpyc\Walker.py -> build\lib.win32-2.5\unpyc
copying unpyc\__init__.py -> build\lib.win32-2.5\unpyc
running build_ext
building 'unpyc/marshal_25' extension
creating build\temp.win32-2.5
creating build\temp.win32-2.5\Release
creating build\temp.win32-2.5\Release\unpyc
f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IF:\Python25\include -IF:\Python25\PC /Tcunpyc/marshal_25.c /Fobuild\temp.win32-2.5\Release\unpyc/marshal_25.obj
marshal_25.c
unpyc\marshal_25.c(401) : warning C4273: 'PyMarshal_WriteLongToFile' : inconsistent dll linkage
unpyc\marshal_25.c(413) : warning C4273: 'PyMarshal_WriteObjectToFile' : inconsistent dll linkage
unpyc\marshal_25.c(1004) : warning C4273: 'PyMarshal_ReadShortFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1015) : warning C4273: 'PyMarshal_ReadLongFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1044) : warning C4273: 'PyMarshal_ReadLastObjectFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1087) : warning C4273: 'PyMarshal_ReadObjectFromFile' : inconsistent dll linkage
unpyc\marshal_25.c(1101) : warning C4273: 'PyMarshal_ReadObjectFromString' : inconsistent dll linkage
unpyc\marshal_25.c(1116) : warning C4273: 'PyMarshal_WriteObjectToString' : inconsistent dll linkage

f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:F:\Python25\libs /LIBPATH:F:\Python25\PCBuild /EXPORT:initunpyc/marshal_25 build\temp.win32-2.5\Release\unpyc/marshal_25.obj /OUT:build\lib.win32-2.5\unpyc/marshal_25.pyd /IMPLIB:build\temp.win32-2.5\Release\unpyc\marshal_25.lib
marshal_25.obj : error LNK2001: unresolved external symbol initunpyc/marshal_25
build\temp.win32-2.5\Release\unpyc\marshal_25.lib : fatal error LNK1120: 1 unresolved externals
LINK : fatal error LNK1141: failure during build of exports file
error: command '"f:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe"' failed with exit status 1141

2. Using command: python setup.py build -c mingw32

F:\unpyc>python setup.py build -c mingw32
running build
running build_py
running build_ext
building 'unpyc/marshal_25' extension
F:\mingw\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IF:\Python25\include -IF:\Python25\PC -c unpyc/marshal_25.c -o build\temp.win32-2.5\Release\unpyc\marshal_25.o
unpyc/marshal_25.c:1087: warning: 'PyMarshal_ReadObjectFromFile' defined locally after being referenced with dllimport linkage
unpyc/marshal_25.c:1101: warning: 'PyMarshal_ReadObjectFromString' defined locally after being referenced with dllimport linkage
writing build\temp.win32-2.5\Release\unpyc\marshal_25.def
F:\mingw\bin\gcc.exe -mno-cygwin -shared -s build\temp.win32-2.5\Release\unpyc\marshal_25.o build\temp.win32-2.5\Release\unpyc\marshal_25.def -LF:\Python25\libs -LF:\Python25\PCBuild -lpython25 -lmsvcr71 -o build\lib.win32-2.5\unpyc/marshal_25.pyd
F:\Python25\libs/libpython25.a(dcbbs00336.o):(.text+0x0): multiple definition of `PyMarshal_ReadObjectFromStr
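One likely cause of the MSVC link error, offered only as a hypothesis since unpyc's setup.py is not shown here: Python 2's distutils (`build_ext.get_export_symbols`) exports the symbol `"init" + ext.name.split('.')[-1]`. An extension declared with the path-like name 'unpyc/marshal_25' therefore asks the linker for a symbol literally called `initunpyc/marshal_25`, which is exactly the `/EXPORT:` in the log and cannot be a C identifier. The helper below is hypothetical; it just mimics that naming rule to make the mismatch visible.

```python
def export_symbol_for(ext_name):
    """Mimic Python 2 distutils: the init symbol a .pyd must export on Windows."""
    return "init" + ext_name.split(".")[-1]


print(export_symbol_for("unpyc/marshal_25"))  # initunpyc/marshal_25 (not a valid C name)
print(export_symbol_for("unpyc.marshal_25"))  # initmarshal_25
```

If that is the cause, declaring the extension with a dotted name, e.g. `Extension('unpyc.marshal_25', ['unpyc/marshal_25.c'])`, should make the exported name match, assuming marshal_25.c defines an `initmarshal_25` function. The MinGW "multiple definition" error is a separate issue: the C file defines `PyMarshal_*` functions that libpython25 also exports.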

Re: failed to build decompyle/unpyc project on WindowsXP

2009-06-12 Thread higer

question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?

2009-06-16 Thread higer
My Python version is 2.5.2. When reading the bytecode of some .pyc
files, I always find many jump instructions at different positions
that all jump to the same position. You can see this in the
following code (this bytecode is from a .pyc file, and I don't
have its source .py file):

.
526     POP_TOP           ''
527     LOAD_FAST         'imeHandle'
530     LOAD_ATTR         'isCnInput'
533     CALL_FUNCTION_0   ''
536     JUMP_IF_FALSE     '574'
539     POP_TOP           ''
540     LOAD_FAST         'GUIDefine'
543     LOAD_ATTR         'CandidateIsOpen'
546     JUMP_IF_TRUE      '574'
549     POP_TOP           ''
550     LOAD_FAST         'GUIDefine'
553     LOAD_ATTR         'CompositionWndIsOpen'
556     JUMP_IF_TRUE      '574'
559     POP_TOP           ''
560     LOAD_FAST         'isWanNengWB'
563     JUMP_IF_FALSE     '574'
566     POP_TOP           ''
567     LOAD_FAST         'state'
570     LOAD_CONST        1
573     BINARY_AND        ''
574_0   COME_FROM         ''
574_1   COME_FROM         ''
574_2   COME_FROM         ''
574_3   COME_FROM         ''
...

From the above bytecode, we can see that offset 574 is the point that many
positions jump to. So it looks just like 'goto' in C, but
we know that Python has no such statement.
Each 'JUMP*' instruction is paired with a 'COME_FROM' entry, so more
than one 'COME_FROM' op is listed at offset 574...

But the question is: I have tried many ways (e.g. for loops, while
loops, and combinations of them) to reproduce 'goto'-style bytecode like this, but
the results disappointed me.
So I think maybe it is just a compiler optimization in Python 2.5? I'm
not sure, so I would appreciate it if anyone could help me.


Re: question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?

2009-06-17 Thread higer
Hi all,

I'm sorry that I did not make my question clear. What I mean is:
what would the source code look like that compiles to such
bytecode?
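One kind of source that produces this pattern is a chain of `and`/`or`: each short-circuit test jumps to the single point right after the whole expression, so several jumps share one target. Below is a Python 3 sketch using the standard `dis` module. The expression is a hypothetical reconstruction from the names in the listing, not the original source. Opcode names differ from 2.5 (there, `x and y` compiles to `JUMP_IF_FALSE` followed by `POP_TOP`; newer versions use e.g. `JUMP_IF_FALSE_OR_POP` or `POP_JUMP_IF_FALSE`), and `COME_FROM` is a marker inserted by the decompiler, not a real CPython opcode.

```python
import dis
from collections import Counter


def candidate(imeHandle, GUIDefine, isWanNengWB, state):
    # Hypothetical reconstruction: X and (A or B or (C and D))
    result = imeHandle.isCnInput() and (
        GUIDefine.CandidateIsOpen
        or GUIDefine.CompositionWndIsOpen
        or (isWanNengWB and state & 1)
    )
    return result


def shared_jump_targets(func):
    """Map each jump target offset to the number of jump instructions aimed at it."""
    targets = Counter(
        ins.argval for ins in dis.get_instructions(func) if "JUMP" in ins.opname
    )
    return dict(targets)


dis.dis(candidate)                     # listing shows several jumps "to" the same offset
print(shared_jump_targets(candidate))  # one offset collects most of the jumps
```

No `goto` is involved: the compiler simply emits one forward jump per short-circuit test, and all of them land where the expression's value is stored.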


Regards,
higer


Re: question about a command like 'goto ' in Python's bytecode or it's just a compiler optimization?

2009-06-17 Thread higer
On Jun 17, 8:29 pm, John Machin  wrote:
> On Jun 17, 1:40 pm, higer  wrote:
>
> > My Python version is 2.5.2. When reading the bytecode of some .pyc
> > files, I always find many jump instructions at different positions
> > that all jump to the same position. You can see this in the
> > following code (this bytecode is from a .pyc file, and I don't
> > have its source .py file):
>
> Why don't you (a) read the answers you got on stackoverflow to the
> identical question (b) WRITE some code instead of inspecting the
> entrails of the code of others?

Thanks, I have just read the answer.
And thanks, everybody, for your suggestions!