EuGeNe Van den Bulke <[EMAIL PROTECTED]> wrote:

> >>> import base64
> >>> base64.decode(file("hebrew.b64","r"),file("hebrew.lang","w"))
>
> It runs but the result is not correct: some of the lines in hebrew.lang
> are correct but not all of them (hebrew.expected.lang is the correct
> file). I guess it is a unicode problem but can't seem to find out how to
> fix it.

My guess would be that your problem is that you wrote the file in text mode, so (assuming you are on Windows) every newline byte in the output is converted to a carriage return/linefeed pair. However, the decoded text looks as though it is UTF-16 encoded, so it should be written as binary, i.e. the output mode should be "wb".

Simpler than using the base64 module, you can just use the base64 codec. This will decode a string to a byte sequence, and you can then decode that to get the unicode string:

    from __future__ import with_statement  # needed on Python 2.5

    with file("hebrew.b64","r") as f:
        text = f.read().decode('base64').decode('utf16')

You can then write the text to a file through any desired codec, or process it first.

BTW, you may just have shortened your example too much, but depending on Python to close files for you is risky behaviour. If an exception is thrown before the file goes out of scope, the file may not get closed when you expect, and that can lead to some fairly hard to track problems. It is much better either to call the close method explicitly or to use Python 2.5's 'with' statement.
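
For anyone reading this on Python 3, where the file() builtin and str.decode('base64') are not available, here is a rough sketch of the same idea: decode the base64 data to bytes, write those bytes with mode "wb", and only then interpret them as UTF-16. The names decode_b64_file and demo are made up for illustration, and the sample text and temporary files stand in for the real hebrew.b64, which I have not seen.

```python
import base64
import os
import tempfile


def decode_b64_file(src_path, dst_path):
    """Decode a base64 file and write the resulting bytes unchanged."""
    with open(src_path, "rb") as src:
        # b64decode skips the line breaks inside the encoded data.
        raw = base64.b64decode(src.read())
    # "wb" matters: in text mode on Windows every 0x0A byte would be
    # written as 0x0D 0x0A, which corrupts UTF-16 data.
    with open(dst_path, "wb") as dst:
        dst.write(raw)
    return raw


def demo():
    """Round-trip a small Hebrew sample through base64 and UTF-16."""
    sample = "\u05e9\u05dc\u05d5\u05dd\n\u05e2\u05d5\u05dc\u05dd\n"
    tmpdir = tempfile.mkdtemp()
    src_path = os.path.join(tmpdir, "hebrew.b64")
    dst_path = os.path.join(tmpdir, "hebrew.lang")

    # Build a stand-in input file: UTF-16 bytes, base64 encoded.
    with open(src_path, "wb") as f:
        f.write(base64.encodebytes(sample.encode("utf-16")))

    decode_b64_file(src_path, dst_path)

    # Read the output back as bytes and decode it explicitly.
    with open(dst_path, "rb") as f:
        return f.read().decode("utf-16")


if __name__ == "__main__":
    print(repr(demo()))
```

The 'with' blocks also take care of the closing issue mentioned above: each file is closed as soon as its block ends, even if an exception is raised inside it.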