encoding problem

2008-12-19 Thread digisat...@gmail.com
The snippet below raises a UnicodeDecodeError.
#!/usr/bin/env python
#--*-- coding: utf-8 --*--
s = 'äöü'
u = unicode(s)


It seems that the system uses the default encoding (ASCII) to decode the
UTF-8 encoded string literal, which causes the error.

The question is why the Python interpreter uses the default encoding
instead of "utf-8", which I explicitly declared in the source.
--
http://mail.python.org/mailman/listinfo/python-list


Re: encoding problem

2008-12-19 Thread digisat...@gmail.com
On Dec 19, 9:34 pm, Marc 'BlackJack' Rintsch wrote:
> On Fri, 19 Dec 2008 04:05:12 -0800, digisat...@gmail.com wrote:
> > The snippet below raises a UnicodeDecodeError.
> > #!/usr/bin/env python
> > #--*-- coding: utf-8 --*--
> > s = 'äöü'
> > u = unicode(s)
>
> > It seems that the system uses the default encoding (ASCII) to decode the
> > UTF-8 encoded string literal, which causes the error.
>
> > The question is why the Python interpreter uses the default encoding
> > instead of "utf-8", which I explicitly declared in the source.
>
> Because the declaration is only for decoding unicode literals in that
> very source file.
>
> Ciao,
>         Marc 'BlackJack' Rintsch

Thanks for the answer.
I believe the declaration applies not only to unicode literals but to
all literals in the source, and even to comments. To check, run a
source file that has no encoding declaration and contains a single
comment line with non-ASCII characters. That raises a SyntaxError whose
message points to the PEP 263 URL.

I read PEP 263; the relevant passage is quoted below:

 Python's tokenizer/compiler combo will need to be updated to work as
 follows:
   1. read the file
   2. decode it into Unicode assuming a fixed per-file encoding
   3. convert it into a UTF-8 byte string
   4. tokenize the UTF-8 content
   5. compile it, creating Unicode objects from the given Unicode data
      and creating string objects from the Unicode literal data
      by first reencoding the UTF-8 data into 8-bit string data
      using the given file encoding

The process described above indicates that step 2 uses the declared
encoding to decode all literals in the source, while step 5 re-encodes
string literals with that same encoding.

That is why we have to declare an encoding explicitly whenever the
source contains non-ASCII characters.
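
The quoted steps can be walked through by hand. This is only a rough
sketch, written for Python 3 (where bytes and text are separate types),
not the interpreter's actual code. FILE_ENCODING and all variable names
are assumptions made up for the illustration; Latin-1 is chosen so that
the file bytes and the intermediate UTF-8 bytes visibly differ.

```python
# Illustrative walk-through of the PEP 263 steps for a Python 2 byte-string
# literal. FILE_ENCODING and every name here are hypothetical.
FILE_ENCODING = "latin-1"

# 1. read the file: raw bytes as stored on disk
raw = "s = 'äöü'".encode(FILE_ENCODING)

# 2. decode it into Unicode using the declared per-file encoding
text = raw.decode(FILE_ENCODING)

# 3. convert it into a UTF-8 byte string for the tokenizer
utf8_source = text.encode("utf-8")

# 4./5. a plain (byte) string literal is re-encoded from UTF-8 back into
# the file encoding, so it ends up holding the bytes that were on disk
literal_utf8 = utf8_source[len(b"s = '"):-1]
byte_literal = literal_utf8.decode("utf-8").encode(FILE_ENCODING)

print(byte_literal)  # b'\xe4\xf6\xfc'
```

The net effect for a byte-string literal is a round trip: it contains
the same bytes as the source file, which is why unicode(s) later has to
be told which encoding those bytes are in.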

Bruno explained perfectly why we need to specify an encoding when
decoding a byte string. Thank you very much.
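
A minimal sketch of that point, written for Python 3, where the byte
string has to be built explicitly. In the Python 2 snippet from the
original post, unicode(s) without an encoding argument decodes with the
default encoding (ASCII), which corresponds to the first decode call
below. The name ascii_failed is made up for the illustration.

```python
# The bytes a UTF-8 source file stores for the literal 'äöü'
s = "äöü".encode("utf-8")

try:
    s.decode("ascii")       # what unicode(s) attempts under the ASCII default
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding works, like unicode(s, 'utf-8') in Python 2
u = s.decode("utf-8")

print(ascii_failed, u == "äöü")  # True True
```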


expandtabs acts unexpectedly

2009-08-19 Thread digisat...@gmail.com
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> ' test\ttest'.expandtabs(4)
' test   test'
>>> 'test \ttest'.expandtabs(4)
'test    test'

1st example: I expected 4 spaces between the two 'test' words, but 3
were returned.
2nd example: I expected 5 spaces between the two 'test' words, but 4
were returned.

Is this a bug? Please advise.


Re: expandtabs acts unexpectedly

2009-08-19 Thread digisat...@gmail.com
On Aug 19, 4:16 pm, Peter Brett  wrote:
> "digisat...@gmail.com"  writes:
> > Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> > [GCC 4.3.3] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> ' test\ttest'.expandtabs(4)
> > ' test   test'
> > >>> 'test \ttest'.expandtabs(4)
> > 'test    test'
>
> > 1st example: I expected 4 spaces between the two 'test' words, but 3
> > were returned.
> > 2nd example: I expected 5 spaces between the two 'test' words, but 4
> > were returned.
>
> > Is this a bug? Please advise.
>
> Consider where the 4-space tabstops are relative to those strings:
>
>  test   test
> test    test
> ^   ^   ^
>
> So no, it's not a bug.
>
> If you just want to replace the tab characters by spaces, use:
>
>   >>> " test\ttest".replace("\t", "    ")
>   ' test    test'
>   >>> "test \ttest".replace("\t", "    ")
>   'test     test'
>
> HTH,
>
>                                Peter
>
> --
> Peter Brett 
> Remote Sensing Research Group
> Surrey Space Centre

You corrected my understanding of tab stops. Great explanation.
Thank you so much.
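
The tab stop rule Peter describes can be sketched in a few lines. This
is a hypothetical re-implementation for illustration only, not the
actual code behind str.expandtabs; the name expand_tabs is made up, and
the sketch assumes single-line input (it does not reset the column at
newlines).

```python
def expand_tabs(text, tabsize=4):
    """Expand tabs by padding to the next tab stop (single-line input)."""
    out = []
    column = 0
    for ch in text:
        if ch == "\t":
            # Pad up to the next multiple of tabsize; this is always at
            # least one space and at most tabsize spaces.
            pad = tabsize - (column % tabsize)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(expand_tabs(' test\ttest')))  # ' test   test'
print(repr(expand_tabs('test \ttest')))  # 'test    test'
```

In both examples the tab sits at column 5, so it expands to the 3
spaces needed to reach column 8, regardless of where the other space is.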