Re: a simple unicode question

2009-10-28 Thread Tim Arnold
"Chris Jones" wrote in message news:mailman.2149.1256707687.2807.python-l...@python.org... > On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: >> Chris Jones wrote: > > [..] > >>> Best part of Unicode is that there are multiple encodings, right? ;-) >> >> No, the best part about Unicode is

Re: a simple unicode question

2009-10-28 Thread Gabriel Genellina
On Wed, 28 Oct 2009 02:28:01 -0300, Chris Jones wrote: On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: Chris Jones wrote: Best part of Unicode is that there are multiple encodings, right? ;-) No, the best part about Unicode is there is no encoding! Unicode does not define any enco

Re: a simple unicode question

2009-10-27 Thread Chris Jones
On Tue, Oct 27, 2009 at 06:21:11AM EDT, Lie Ryan wrote: > Chris Jones wrote: [..] >> Best part of Unicode is that there are multiple encodings, right? ;-) > > No, the best part about Unicode is there is no encoding! > Unicode does not define any encoding; RFC 3629: "ISO/IEC 10646 and Unicode
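
To make the "multiple encodings" exchange concrete, here is a minimal sketch, assuming Python 3 (the thread itself is on Python 2). It takes one code point, the degree sign that started the thread, and serializes it with several codecs. The variable names are only illustrative.

```python
# One abstract code point, several byte serializations.
ch = '\N{DEGREE SIGN}'          # U+00B0
code_point = ord(ch)

# Map each codec name to the bytes it produces for this single character.
encodings = {
    codec: ch.encode(codec)
    for codec in ('utf-8', 'utf-16-le', 'utf-32-le', 'iso-8859-1')
}

print(hex(code_point))
for codec, data in encodings.items():
    print(codec, data)
```

The code point stays the same in every case; only the bytes on the wire differ, which is why the decoder has to be told which codec was used.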

Re: a simple unicode question

2009-10-27 Thread Lie Ryan
Chris Jones wrote: On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: I knew nothing about UTF-16 & friends before this thre

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Thu, 22 Oct 2009 17:08:21 -0300, wrote: On 10/22/2009 03:23 AM, Gabriel Genellina wrote: On Wed, 21 Oct 2009 15:14:32 -0300, wrote: On Oct 21, 4:59 am, Bruno Desthuilliers wrote: beSTEfar wrote: (snip) > When parsing strings, use Regular Expressions. And now you have _two_ p

Re: a simple unicode question

2009-10-22 Thread rurpy
On 10/22/2009 03:23 AM, Gabriel Genellina wrote: > On Wed, 21 Oct 2009 15:14:32 -0300, wrote: > >> On Oct 21, 4:59 am, Bruno Desthuilliers > 42.desthuilli...@websiteburo.invalid> wrote: >>> beSTEfar wrote: >>> (snip) >>> > When parsing strings, use Regular Expressions. >>> >>> And now you h

Re: a simple unicode question

2009-10-22 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote: [..] > Characters outside the 16-bit range aren't supported on all builds. > They won't be supported on most Windows builds, as Windows uses 16-bit > Unicode extensively: I knew nothing about UTF-16 & friends before this thread. Best part of

Re: a simple unicode question

2009-10-22 Thread Gabriel Genellina
On Wed, 21 Oct 2009 15:14:32 -0300, wrote: On Oct 21, 4:59 am, Bruno Desthuilliers wrote: beSTEfar wrote: (snip) > When parsing strings, use Regular Expressions. And now you have _two_ problems For some simple parsing problems, Python's string methods are powerful enough to make REs

Re: a simple unicode question

2009-10-21 Thread Terry Reedy
Nobody wrote: Just curious, why did you choose to set the upper boundary at 0x? Characters outside the 16-bit range aren't supported on all builds. They won't be supported on most Windows builds, as Windows uses 16-bit Unicode extensively: Python 2.5.1 (r251:54863, Apr 18 2007, 08
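
A short sketch of what "outside the 16-bit range" means in practice, assuming Python 3.3 or later, where every build handles such characters as single code points. The example character (U+1D11E, MUSICAL SYMBOL G CLEF) is my own choice and does not come from the thread.

```python
import struct

ch = '\U0001D11E'               # a code point above U+FFFF (illustrative choice)
beyond_16_bits = ord(ch) > 0xFFFF

# In UTF-16 such a character needs two 16-bit code units (a surrogate pair).
utf16 = ch.encode('utf-16-le')
high, low = struct.unpack('<2H', utf16)

print(beyond_16_bits, len(utf16) // 2, hex(high), hex(low))
```

On the narrow Python 2 builds discussed here, a string holding this character would have reported a length of 2, because the two surrogates were stored as separate units.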

Re: a simple unicode question

2009-10-21 Thread rurpy
On Oct 21, 4:59 am, Bruno Desthuilliers wrote: > beSTEfar wrote: > (snip) >  > When parsing strings, use Regular Expressions. > > And now you have _two_ problems > > For some simple parsing problems, Python's string methods are powerful > enough to make REs overkill. And for any complex enough

Re: a simple unicode question

2009-10-21 Thread Nobody
On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote: >> > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? >> >> You can get them from the unicodedata module, e.g.: >> >> import unicodedata >> for i in xrange(0x1): >>n = unicodedata.name(unichr(i),None) >>

Re: a simple unicode question

2009-10-21 Thread Bruno Desthuilliers
beSTEfar wrote: (snip) > When parsing strings, use Regular Expressions. And now you have _two_ problems For some simple parsing problems, Python's string methods are powerful enough to make REs overkill. And for any complex enough parsing (any recursive construct for example - think XML, H
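
As an illustration of the "string methods are enough" point, here is a minimal sketch in Python 3 that parses the coordinate from the original question without a regular expression. The function name `parse_dms` is hypothetical, and the sketch assumes the input always has exactly four whitespace-separated fields.

```python
def parse_dms(text):
    """Split a string such as 48° 13' 16.80" N into its four parts.

    Assumes exactly four whitespace-separated fields; raises ValueError otherwise.
    """
    degrees, minutes, seconds, hemisphere = text.split()
    return (
        int(degrees.rstrip('\N{DEGREE SIGN}')),  # strip the trailing degree sign
        int(minutes.rstrip("'")),                # strip the minutes mark
        float(seconds.rstrip('"')),              # strip the seconds mark
        hemisphere,
    )

result = parse_dms('48\N{DEGREE SIGN} 13\' 16.80" N')
print(result)
```

For input this regular, `split` and `rstrip` keep the intent readable; a regular expression would start to pay off only if the field layout varied.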

Re: a simple unicode question

2009-10-21 Thread Chris Jones
On Wed, Oct 21, 2009 at 12:20:35AM EDT, Nobody wrote: > On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: [..] > > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? > > You can get them from the unicodedata module, e.g.: > > import unicodedata > for i in xrange(0x100

Re: a simple unicode question

2009-10-21 Thread Scott David Daniels
George Trojan wrote: Scott David Daniels wrote: ... And if you are unsure of the name to use: >>> import unicodedata >>> unicodedata.name(u'\xb0') 'DEGREE SIGN' > Thanks for all suggestions. It took me a while to find out how to > configure my keyboard to be able to type the degree sign. I

Re: a simple unicode question

2009-10-20 Thread Mark Tolonen
"George Trojan" wrote in message news:hbktk6$8b...@news.nems.noaa.gov... Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') d

Re: a simple unicode question

2009-10-20 Thread Martin v. Löwis
> Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found > http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt > Is that the place to look? Correct - you are supposed to fill in a Unicode character name into the \N escape. The specific list of names depends on the version of the UCD
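
A small sketch of how to check this from Python itself, assuming Python 3. The standard `unicodedata` module reports which UCD version the interpreter was built with and maps between names and characters in both directions.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
ucd_version = unicodedata.unidata_version

# Name -> character, and character -> name.
degree = unicodedata.lookup('DEGREE SIGN')
name = unicodedata.name(degree)

print(ucd_version, repr(degree), name)
```

A name accepted by `unicodedata.lookup` is also accepted inside a `\N{...}` escape on the same interpreter.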

Re: a simple unicode question

2009-10-20 Thread Nobody
On Tue, 20 Oct 2009 17:56:21 +, George Trojan wrote: > Thanks for all suggestions. It took me a while to find out how to > configure my keyboard to be able to type the degree sign. I prefer to > stick with pure ASCII if possible. > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I

Re: a simple unicode question

2009-10-20 Thread George Trojan
Thanks for all suggestions. It took me a while to find out how to configure my keyboard to be able to type the degree sign. I prefer to stick with pure ASCII if possible. Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt Is

Re: a simple unicode question

2009-10-20 Thread Scott David Daniels
Mark Tolonen wrote: Is there a better way of getting the degrees? It seems your string is UTF-8. \xc2\xb0 is UTF-8 for DEGREE SIGN. If you type non-ASCII characters in source code, make sure to declare the encoding the file is *actually* saved in: # coding: utf-8 s = '''48° 13' 16.80" N'
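
The claim that \xc2\xb0 is UTF-8 for DEGREE SIGN can be verified directly. This is a minimal sketch assuming Python 3, where byte strings and text are separate types; the variable names are illustrative.

```python
import unicodedata

utf8_bytes = b'\xc2\xb0'                 # the two bytes seen in the thread
char = utf8_bytes.decode('utf-8')        # decode them as UTF-8

char_name = unicodedata.name(char)
round_trip = '\N{DEGREE SIGN}'.encode('utf-8')

print(char_name, round_trip)
```

Writing the character as `'\N{DEGREE SIGN}'` keeps the source file pure ASCII, so no coding declaration is needed for it.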

Re: a simple unicode question

2009-10-19 Thread Mark Tolonen
"George Trojan" wrote in message news:hbidd7$i9...@news.nems.noaa.gov... A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80" N'''. I know the charset is "iso-8859-1". To get the degrees I did >>> encoding='iso-8859-1' >>> q=s

Re: a simple unicode question

2009-10-19 Thread beSTEfar
On 19 Oct, 21:07, George Trojan wrote: > A trivial one, this is the first time I have to deal with Unicode. I am > trying to parse a string s='''48° 13' 16.80" N'''. I know the charset is > "iso-8859-1". To get the degrees I did >  >>> encoding='iso-8859-1' >  >>> q=s.decode(encoding) >  >>> q.spl

Re: a simple unicode question

2009-10-19 Thread Diez B. Roggisch
George Trojan wrote: A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80" N'''. I know the charset is "iso-8859-1". To get the degrees I did >>> encoding='iso-8859-1' >>> q=s.decode(encoding) >>> q.split() [u'48\xc2\xb0', u"13

a simple unicode question

2009-10-19 Thread George Trojan
A trivial one, this is the first time I have to deal with Unicode. I am trying to parse a string s='''48° 13' 16.80" N'''. I know the charset is "iso-8859-1". To get the degrees I did >>> encoding='iso-8859-1' >>> q=s.decode(encoding) >>> q.split() [u'48\xc2\xb0', u"13'", u'16.80"', u'N'] >>> r=
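
The u'48\xc2\xb0' in that output is the symptom of decoding with the wrong codec. The sketch below reproduces it in Python 3, assuming (as the replies in the thread conclude) that the bytes are actually UTF-8 rather than iso-8859-1.

```python
# Python 3 sketch; the thread itself works with Python 2 byte strings.
# Assumption: the input bytes are UTF-8, as the replies conclude.
raw = '48\N{DEGREE SIGN} 13\' 16.80" N'.encode('utf-8')

wrong = raw.decode('iso-8859-1')  # every byte becomes its own character
right = raw.decode('utf-8')       # the bytes C2 B0 become one character

wrong_first = wrong.split()[0]
right_first = right.split()[0]

print(ascii(wrong_first), ascii(right_first))
```

Decoding as iso-8859-1 never fails, because every byte value is valid in that charset, so a wrong guess shows up only as a stray extra character like this one.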