Re: How do I display unicode value stored in a string variable using ord()

2012-08-22 Thread Hans Mulder
On 19/08/12 19:48:06, Paul Rubin wrote: > Terry Reedy writes: >> py> s = chr(0x + 1) >> py> a, b = s > That looks like a 3.2- narrow build. Such which treat unicode strings > as sequences of code units rather than sequences of codepoints. Not an > implementation bug, but compromise d
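For comparison, here is a minimal sketch assuming CPython 3.3 or later (PEP 393), where a str is always a sequence of code points. U+10000, the first code point above the BMP, is an illustrative choice and not necessarily the value in the elided snippet above. The two UTF-16 code units that a narrow 3.2 build exposed now appear only after an explicit encode.

```python
# Assumes CPython 3.3+ (PEP 393): str is a sequence of code points.
s = chr(0x10000)  # first code point outside the BMP (illustrative choice)

print(len(s))  # 1: one code point, even though UTF-16 needs two code units

# A narrow 3.2 build let you unpack this into two surrogates; 3.3+ refuses.
try:
    a, b = s
    unpacked = True
except ValueError:
    unpacked = False
print(unpacked)  # False

# The two code units only exist in an explicit UTF-16 encoding.
code_units = len(s.encode('utf-16-le')) // 2
print(code_units)  # 2
```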

Re: How do I display unicode value stored in a string variable using ord()

2012-08-21 Thread Neil Hodgson
Steven D'Aprano: Using variable-sized strings like UTF-8 and UTF-16 for in-memory representations is a terrible idea because you can't assume that people will only ever want to index the first or last character. On average, you need to scan half the string, one character at a time. In Big-Oh, w
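To make that cost concrete, here is a sketch of locating the n-th code point in raw UTF-8 bytes. `utf8_index` is a hypothetical helper, not a library function. Without an auxiliary index it has to walk the buffer from the start and count lead bytes, which is the linear-time behaviour described above.

```python
def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th (0-based) code point of UTF-8 data by linear scan."""
    count = -1
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == n:
                end = pos + 1
                # Extend over this code point's continuation bytes.
                while end < len(buf) and (buf[end] & 0xC0) == 0x80:
                    end += 1
                return buf[pos:end].decode('utf-8')
    raise IndexError('code point index out of range')


sample = 'a\u00e9\u20ac\U0001F600b'   # 1-, 2-, 3-, 4- and 1-byte UTF-8 sequences
data = sample.encode('utf-8')
print(ascii(utf8_index(data, 3)))      # '\U0001f600'
```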

Re: How do I display unicode value stored in a string variable using ord()

2012-08-20 Thread Piet van Oostrum
"Blind Anagram" writes: > This is an average slowdown by a factor of close to 2.3 on 3.3 when > compared with 3.2. > > I am not posting this to perpetuate this thread but simply to ask > whether, as you suggest, I should report this as a possible problem with > the beta? Being a beta release, is

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Steven D'Aprano writes: > Paul Rubin already told you about his experience using OCR to generate > multiple terabytes of text, and how he would not be happy if that was > stored in UCS-4. That particular text was stored on disk as compressed XML that had UTF-8 in the data fields, but I think R

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread rusi
On Aug 19, 11:11 pm, wxjmfa...@gmail.com wrote: > On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote: > > > > > But they are not ascii pages, they are (as stated) MOSTLY ascii. > > > E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses > > > a much more memory-expensive enco

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Mon, 20 Aug 2012 00:44:22 -0400, Roy Smith wrote: > In article <5031bb2f$0$29972$c3e8da3$54964...@news.astraweb.com>, > Steven D'Aprano wrote: > >> > So it may be with utf-8 someday. >> >> Only if you believe that people's ability to generate data will remain >> lower than people's ability

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Roy Smith
In article <5031bb2f$0$29972$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > > So it may be with utf-8 someday. > > Only if you believe that people's ability to generate data will remain > lower than people's ability to install more storage. We're not talking *data*, we're talki

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 19:24:30 -0400, Roy Smith wrote: > In the primordial days of computing, using 8 bits to store a character > was a profligate waste of memory. What on earth did people need with > TWO cases of the alphabet That's obvious, surely? We need two cases so that we can distinguish

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Chris Angelico
On Mon, Aug 20, 2012 at 10:35 AM, Terry Reedy wrote: > On 8/19/2012 6:42 PM, Chris Angelico wrote: >> However, Python goes a bit further by making it VERY clear that this >> is a mere optimization, and that Unicode strings and bytes strings are >> completely different beasts. In Pike, it's possibl

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Terry Reedy
On 8/19/2012 6:42 PM, Chris Angelico wrote: On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy wrote: Python has often copied or borrowed, with adjustments. This time it is the first. I should have added 'that I know of' ;-) Maybe it wasn't consciously borrowed, but whatever innovation is done,

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread 88888 Dihedral
On Monday, August 20, 2012 1:03:34 AM UTC+8, Blind Anagram wrote: > "Steven D'Aprano" wrote in message > > news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... > > > > On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: > > > > [...] > > If you can consistently replicate a 100% to

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Roy Smith
In article , Chris Angelico wrote: > Really, the only viable alternative to PEP 393 is a fixed 32-bit > representation - it's the only way that's guaranteed to provide > equivalent semantics. The new storage format is guaranteed to take no > more memory than that, and provide equivalent function

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Chris Angelico
On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy wrote: > On 8/19/2012 4:04 AM, Paul Rubin wrote: >> I realize the folks who designed and implemented PEP 393 are very smart >> cookies and considered stuff carefully, while I'm just an internet user >> posting an immediate impression of something I hadn

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Terry Reedy
On 8/19/2012 2:11 PM, wxjmfa...@gmail.com wrote: Well, it seems some software producers know what they are doing. '€'.encode('cp1252') b'\x80' '€'.encode('mac-roman') b'\xdb' '€'.encode('iso-8859-1') Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'latin-1'
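Terry's interactive session can be reproduced as a short script. This sketch assumes only the standard codecs shipped with Python 3 (`cp1252`, `mac-roman`, `iso-8859-1`). It shows that the euro sign has a single-byte code in the first two but none in Latin-1.

```python
euro = '\u20ac'  # EURO SIGN

print(euro.encode('cp1252'))     # b'\x80'
print(euro.encode('mac-roman'))  # b'\xdb'

try:
    euro.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False  # U+20AC lies outside Latin-1's range U+0000..U+00FF
print(latin1_ok)  # False
```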

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Terry Reedy
On 8/19/2012 1:03 PM, Blind Anagram wrote: Running Python from a Windows command prompt, I got the following on Python 3.2.3 and 3.3 beta 2: python33\python" -m timeit "('abc' * 1000).replace('c', 'de')" 1 loops, best of 3: 39.3 usec per loop python33\python" -m timeit "('ab…' * 1000).repl
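The same kind of measurement can be run from inside Python with the standard `timeit` module. This is only a sketch: `best_time` is a hypothetical helper, and the statements mirror ones quoted in this thread. Absolute numbers depend on machine, build and version, so they only mean something when both interpreters are compared on the same box.

```python
import timeit


def best_time(stmt: str, number: int = 1000, repeat: int = 3) -> float:
    """Best of `repeat` runs, in seconds per single execution of `stmt`."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number


ascii_stmt = "('abc' * 1000).replace('c', 'de')"
mixed_stmt = "('ab\u2026' * 1000).replace('\u2026', '\u2026\u2026')"

for label, stmt in [('ASCII', ascii_stmt), ('non-ASCII', mixed_stmt)]:
    print(f'{label}: {best_time(stmt) * 1e6:.1f} usec per loop')
```

Running the identical script under each interpreter keeps the comparison fair, since both then time exactly the same statements.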

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 18:03:34 +0100, Blind Anagram wrote: > "Steven D'Aprano" wrote in message > news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... > > > If you can consistently replicate a 100% to 1000% slowdown in string > > handling, please report it as a performance bug: > > > > htt

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 11:50:12 -0600, Ian Kelly wrote: > On Sun, Aug 19, 2012 at 12:33 AM, Steven D'Aprano > wrote: [...] >> The PEP explicitly states that it only uses a 1-byte format for ASCII >> strings, not Latin-1: > > I think you misunderstand the PEP then, because that is empirically > fals
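Ian's empirical point can be checked with a sketch along these lines. It is CPython-specific (other implementations may not report `sys.getsizeof` meaningfully for strings). It assumes that a compact string's size is a fixed header plus one fixed-width slot per character, so doubling a freshly built string isolates the per-character width. `bytes_per_char` is a hypothetical helper.

```python
import sys


def bytes_per_char(s: str) -> int:
    """Estimate CPython's PEP 393 storage width (1, 2 or 4) for s."""
    # Build fresh copies so no cached UTF-8 buffer inflates getsizeof.
    base = s * 2
    doubled = base * 2
    # Same header, len(base) extra characters: the difference is pure payload.
    return (sys.getsizeof(doubled) - sys.getsizeof(base)) // len(base)


for sample in ['abc', 'ab\u00e9', 'ab\u20ac', 'ab\U0001F600']:
    print(ascii(sample), bytes_per_char(sample))
```

On CPython 3.3+ the Latin-1 sample ('ab\u00e9') should report the same 1-byte width as the pure-ASCII one.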

Abuse of Big Oh notation [was Re: How do I display unicode value stored in a string variable using ord()]

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 10:48:06 -0700, Paul Rubin wrote: > Terry Reedy writes: >> I would call it O(k), where k is a selectable constant. Slowing access >> by a factor of 100 is hardly acceptable to me. > > If k is constant then O(k) is the same as O(1). That is how O notation > works. You might

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Ian Kelly writes: print (type(bytes(range(256)).decode('latin1'))) > Thanks.

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Mark Lawrence
On 19/08/2012 19:11, wxjmfa...@gmail.com wrote: On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote: But they are not ascii pages, they are (as stated) MOSTLY ascii. E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses a much more memory-expensive encoding than UTF-8.

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Ian Kelly
On Sun, Aug 19, 2012 at 11:50 AM, Ian Kelly wrote: > Note that this only describes the structure of "compact" string > objects, which I have to admit I do not fully understand from the PEP. > The wording suggests that it only uses the PyASCIIObject structure, > not the derived structures. It the

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Ian Kelly
On Sun, Aug 19, 2012 at 12:20 PM, Paul Rubin wrote: > Ian Kelly writes: > sys.getsizeof(bytes(range(256)).decode('latin1')) >> 329 > > Please try: > >print (type(bytes(range(256)).decode('latin1'))) > > to make sure that what comes back is actually a unicode string rather > than a byte st
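Paul's check can be written out as below. This sketch uses only the built-ins already mentioned in the exchange. Decoding all 256 byte values as Latin-1 returns a `str` whose code points equal the byte values. The `getsizeof` figure Ian reported varies with Python version and platform, so it is printed but not asserted.

```python
import sys

s = bytes(range(256)).decode('latin1')

print(type(s))                                   # <class 'str'>, not bytes
print(len(s))                                    # 256
print([ord(c) for c in s] == list(range(256)))   # True: U+0000..U+00FF
print(sys.getsizeof(s))                          # implementation-specific size in bytes
```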

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Ian Kelly writes: sys.getsizeof(bytes(range(256)).decode('latin1')) > 329 Please try: print (type(bytes(range(256)).decode('latin1'))) to make sure that what comes back is actually a unicode string rather than a byte string.

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote: > > > But they are not ascii pages, they are (as stated) MOSTLY ascii. > > E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses > > a much more memory-expensive encoding than UTF-8. > > Imagine an us banking applicat

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Dave Angel
On 08/19/2012 01:03 PM, Blind Anagram wrote: > "Steven D'Aprano" wrote in message > news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... > > On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: > > [...] > If you can consistently replicate a 100% to 1000% slowdown in string > handling, plea

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Blind Anagram
wrote in message news:5dfd1779-9442-4858-9161-8f1a06d56...@googlegroups.com... On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote: "Steven D'Aprano" wrote in message news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Ian Kelly
On Sun, Aug 19, 2012 at 12:33 AM, Steven D'Aprano wrote: > On Sat, 18 Aug 2012 09:51:37 -0600, Ian Kelly wrote about PEP 393: >> There is some additional benefit for Latin-1 users, but this has nothing >> to do with Python. If Python is going to have the option of a 1-byte >> representation (and

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Terry Reedy writes: >> Meanwhile, an example of the 393 approach failing: > I am completely baffled by this, as this example is one where the 393 > approach potentially wins. What? The 393 approach is supposed to avoid memory bloat and that does the opposite. >> I was involved in a project that
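The arithmetic behind Paul's complaint can be sketched as follows. The page contents are hypothetical, and `pep393_width` models the PEP 393 rule (the widest character in a string decides 1, 2 or 4 bytes for every character) rather than measuring CPython's actual memory use.

```python
def pep393_width(s: str) -> int:
    """Bytes per character PEP 393 would pick, based on the widest code point."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


# Hypothetical mostly-ASCII page: 990 ASCII letters and 10 euro signs (1%).
page = 'a' * 990 + '\u20ac' * 10

utf8_size = len(page.encode('utf-8'))          # 990 * 1 + 10 * 3
pep393_size = pep393_width(page) * len(page)   # 2 bytes for all 1000 characters
print(utf8_size, pep393_size)                  # 1020 2000
```

If the 1% were Latin-1 letters such as 'é' instead, the same model gives 1000 bytes for PEP 393 against 1010 for UTF-8, which is the flip side of the argument.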

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote: > "Steven D'Aprano" wrote in message > > news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... > > > > On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: > > > > [...] > > If you can consistently replicate a 100% to

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Terry Reedy
On 8/19/2012 4:04 AM, Paul Rubin wrote: Meanwhile, an example of the 393 approach failing: I am completely baffled by this, as this example is one where the 393 approach potentially wins. I was involved in a project that dealt with terabytes of OCR data of mostly English text. So the char

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Blind Anagram
"Steven D'Aprano" wrote in message news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: [...] If you can consistently replicate a 100% to 1000% slowdown in string handling, please report it as a performance bug: http://bugs.python.or

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Terry Reedy
On 8/19/2012 4:54 AM, wxjmfa...@gmail.com wrote: About the examples contested by Steven: eg: timeit.timeit("('ab…' * 10).replace('…', 'œ…')") And it is good enough to show the problem. Period. Repeating a false claim over and over does not make it true. Two people on pydev claim that 3.3 is *f

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread DJC
On 19/08/12 15:25, Steven D'Aprano wrote: Not necessarily. Presumably you're scanning each page into a single string. Then only the pages containing a supplementary plane char will be bloated, which is likely to be rare. Especially since I don't expect your OCR application would recognise many n

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 01:04:25 -0700, Paul Rubin wrote: > Steven D'Aprano writes: >> This standard data structure is called UCS-2 ... There's an extension >> to UCS-2 called UTF-16 > > My own understanding is UCS-2 simply shouldn't be used any more. Pretty much. But UTF-16 with lax support for
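How UTF-16 extends UCS-2 can be sketched with the standard surrogate-pair arithmetic: subtract 0x10000, then split the remaining 20 bits across a high code unit (based at 0xD800) and a low code unit (based at 0xDC00). `to_surrogates` and `codec_units` are hypothetical helpers; the second one just reads the pair back out of Python's own UTF-16 codec for comparison.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('only code points above U+FFFF use surrogates')
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def codec_units(cp: int) -> tuple:
    """The same pair as produced by Python's UTF-16 encoder."""
    data = chr(cp).encode('utf-16-be')
    return int.from_bytes(data[:2], 'big'), int.from_bytes(data[2:], 'big')


print([hex(u) for u in to_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
```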

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 01:11:56 -0700, Paul Rubin wrote: > Steven D'Aprano writes: >> result = text[end:] > > if end not near the end of the original string, then this is O(N) even > with fixed-width representation, because of the char copying. Technically, yes. But it's a straight copy of a c

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread lipska the kat
On 19/08/12 11:19, Chris Angelico wrote: On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat wrote: The date stamp is different but the Python version is the same Check out what 'sys.maxunicode' is in each of those Pythons. It's possible that one is a wide build and the other narrow. Ah ... I

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Mark Lawrence
On 19/08/2012 09:54, wxjmfa...@gmail.com wrote: About the examples contested by Steven: eg: timeit.timeit("('ab…' * 10).replace('…', 'œ…')") And it is good enough to show the problem. Period. The rest (you have to do this, you should not do this, why are you using these characters - amazing an

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Chris Angelico
On Sun, Aug 19, 2012 at 8:13 PM, lipska the kat wrote: > The date stamp is different but the Python version is the same Check out what 'sys.maxunicode' is in each of those Pythons. It's possible that one is a wide build and the other narrow. ChrisA
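Chris's check, as a small sketch. On 3.2 and earlier a narrow build reports 0xFFFF and a wide build 0x10FFFF. From 3.3 on (PEP 393) every CPython build reports 0x10FFFF, so the narrow/wide distinction only matters for older interpreters.

```python
import sys

if sys.maxunicode == 0x10FFFF:
    kind = 'wide build (or any 3.3+ build)'
else:
    kind = 'narrow build (UTF-16 code units)'
print(hex(sys.maxunicode), kind)
```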

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread lipska the kat
On 19/08/12 07:09, Steven D'Aprano wrote: This is a long post. If you don't feel like reading an essay, skip to the very bottom and read my last few paragraphs, starting with "To recap". Thank you for this excellent post, it has certainly cleared up a few things for me [snip] incidentally >

Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, 19 August 2012 10:56:36 UTC+2, Steven D'Aprano wrote: > > internal implementation, and strings which fit exactly in Latin-1 will > And this is the crucial point. latin-1 is an obsolete and unusable coding scheme (esp. for European languages). We come back to the point I mentioned ab

Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 09:43:13 +0200, Peter Otten wrote: > Steven D'Aprano wrote: >> I don't know where people are getting this myth that PEP 393 uses >> Latin-1 internally, it does not. Read the PEP, it explicitly states >> that 1-byte formats are only used for ASCII strings. > > From > > Python

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
About the examples contested by Steven: eg: timeit.timeit("('ab…' * 10).replace('…', 'œ…')") And it is good enough to show the problem. Period. The rest (you have to do this, you should not do this, why are you using these characters - amazing and stupid question -) does not count. The real pro

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Chris Angelico writes: > And of course, taking the *entire* rest of the string isn't the only > thing you do. What if you want to take the next six characters after > that index? That would be constant time with a fixed-width storage > format. How often is this an issue in practice? I wonder how

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Chris Angelico
On Sun, Aug 19, 2012 at 6:11 PM, Paul Rubin wrote: > Steven D'Aprano writes: >> result = text[end:] > > if end not near the end of the original string, then this is O(N) > even with fixed-width representation, because of the char copying. > > if it is near the end, by knowing where the string

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Steven D'Aprano writes: > result = text[end:] if end not near the end of the original string, then this is O(N) even with fixed-width representation, because of the char copying. if it is near the end, by knowing where the string data area ends, I think it should be possible to scan backward
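Paul's backward-scan idea can be sketched for raw UTF-8 bytes. Starting from the end of the buffer, step back over continuation bytes (those of the form 0b10xxxxxx) until a lead byte is reached, once per code point wanted. `utf8_tail` is a hypothetical helper; its cost is proportional to the size of the tail, not of the whole string.

```python
def utf8_tail(buf: bytes, n: int) -> str:
    """Return the last n code points of UTF-8 data, scanning backward."""
    pos = len(buf)
    for _ in range(n):
        if pos == 0:
            break                      # fewer than n code points: return them all
        pos -= 1
        # Step back over continuation bytes (0b10xxxxxx) to the lead byte.
        while pos > 0 and (buf[pos] & 0xC0) == 0x80:
            pos -= 1
    return buf[pos:].decode('utf-8')


sample = 'ab\u00e9\u20ac\U0001F600'
data = sample.encode('utf-8')
print(ascii(utf8_tail(data, 3)))       # '\xe9\u20ac\U0001f600'
```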

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Paul Rubin
Steven D'Aprano writes: > This is a long post. If you don't feel like reading an essay, skip to the > very bottom and read my last few paragraphs, starting with "To recap". I'm very flattered that you took the trouble to write that excellent exposition of different Unicode encodings in response

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sat, 18 Aug 2012 19:35:44 -0700, Paul Rubin wrote: > Scanning 4 characters (or a few dozen, say) to peel off a token in > parsing a UTF-8 string is no big deal. It gets more expensive if you > want to index far more deeply into the string. I'm asking how often > that is done in real code. It

New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Peter Otten
Steven D'Aprano wrote: > On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote: > >> "a" will be stored as 1 byte/codepoint. >> >> Adding "é", it will still be stored as 1 byte/codepoint. > > Wrong. It will be 2 bytes, just like it already is in Python 3.2. > > I don't know where people are getting t

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sat, 18 Aug 2012 19:59:32 +0100, MRAB wrote: > The problem with strings containing surrogate pairs is that you could > inadvertently slice the string in the middle of the surrogate pair. That's the *least* of the problems with surrogate pairs. That would be easy to fix: check the point of the
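The hazard MRAB describes can be reproduced by treating a UTF-16 encoding as a stand-in for a narrow build's internal code units. That stand-in is an assumption made for illustration; a 3.3+ str itself cannot be sliced that way. Cutting after the high surrogate leaves data that no longer decodes.

```python
text = 'a\U0001F600b'
units = text.encode('utf-16-le')      # 2 bytes per UTF-16 code unit

print(len(text), len(units) // 2)     # 3 code points, 4 code units

# Keep 'a' plus only the high surrogate of U+1F600.
cut = units[:4]
try:
    cut.decode('utf-16-le')
    decodes = True
except UnicodeDecodeError:
    decodes = False
print(decodes)                        # False

# Slicing the str itself works in code points and cannot split the pair.
print(ascii(text[:2]))                # 'a\U0001f600'
```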

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote: > "a" will be stored as 1 byte/codepoint. > > Adding "é", it will still be stored as 1 byte/codepoint. Wrong. It will be 2 bytes, just like it already is in Python 3.2. I don't know where people are getting this myth that PEP 393 uses Latin-1 int

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 09:51:37 -0600, Ian Kelly wrote about PEP 393: > The change does not just benefit ASCII users. It primarily benefits > anybody using a wide unicode build with strings mostly containing only > BMP characters. Just to be clear: If you have many strings which are *mostly* BMP,

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 11:05:07 -0700, wxjmfauth wrote: > As I understand (I think) the underlying mechanism, I can only say, it is > not a surprise that it happens. > > Imagine an editor, I type an "a", internally the text is saved as ascii, > then I type an "é", the text can only be saved in at lea

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 11:30:19 -0700, wxjmfauth wrote: >> > I'm aware of this (and all the blah blah blah you are explaining). >> > This always the same song. Memory. >> >> >> >> Exactly. The reason it is always the same song is because it is an >> important song. >> >> > No offense here. But t

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
This is a long post. If you don't feel like reading an essay, skip to the very bottom and read my last few paragraphs, starting with "To recap". On Sat, 18 Aug 2012 11:26:21 -0700, Paul Rubin wrote: > Steven D'Aprano writes: >> (There is an extension to UCS-2, UTF-16, which encodes non-BMP >>

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Paul Rubin
Chris Angelico writes: > Generally, I'm working with pure ASCII, but port those same algorithms > to Python and you'll easily be able to read in a file in some known > encoding and manipulate it as Unicode. If it's pure ASCII, you can use the bytes or bytearray type. > It's not so much 'random
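Here is a quick sketch of Paul's suggestion for pure-ASCII data, using only built-in types. Slicing `bytes` gives `bytes`, indexing gives an `int`, and `bytearray` is the mutable counterpart.

```python
data = b'asdfqwer'
print(data[4:])              # b'qwer'  (slicing gives bytes)
print(data[4])               # 113      (indexing gives an int: the byte value of 'q')

buf = bytearray(data)
buf[0:4] = b'ASDF'           # bytearray is mutable in place
print(buf.decode('ascii'))   # ASDFqwer
```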

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 1:10 PM, Paul Rubin wrote: > Chris Angelico writes: >> I don't have a Python example of parsing a huge string, but I've done >> it in other languages, and when I can depend on indexing being a cheap >> operation, I'll happily do exactly that. > > I'd be interested to know

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Paul Rubin
Chris Angelico writes: > Sure, four characters isn't a big deal to step through. But it still > makes indexing and slicing operations O(N) instead of O(1), plus you'd > have to zark the whole string up to where you want to work. I know some systems chop the strings into blocks of (say) a few hund
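One way to read the "blocks with an index" idea, sketched as a hypothetical class rather than any real system's design: keep the UTF-8 bytes plus the byte offset of every `step`-th code point. Indexing then jumps to the nearest checkpoint and scans at most `step - 1` code points, so the cost is bounded by the chosen block size, at the price of extra memory for the offsets. Only non-negative indexes are supported.

```python
class CheckpointedUTF8:
    """Hypothetical sketch: UTF-8 bytes plus the byte offset of every
    `step`-th code point, so indexing scans at most `step - 1` code points."""

    def __init__(self, text: str, step: int = 64):
        self.step = step
        self.data = text.encode('utf-8')
        self.length = len(text)
        self.offsets = []
        pos = 0
        for i, ch in enumerate(text):
            if i % step == 0:
                self.offsets.append(pos)      # byte offset of code point i
            pos += len(ch.encode('utf-8'))

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError('index out of range')
        pos = self.offsets[i // self.step]    # jump to the nearest checkpoint
        for _ in range(i % self.step):        # then skip whole code points
            pos += 1
            while (self.data[pos] & 0xC0) == 0x80:
                pos += 1
        end = pos + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[pos:end].decode('utf-8')


text = 'ab\u00e9\u20ac\U0001F600' * 50
s = CheckpointedUTF8(text, step=7)
print(len(s), ascii(s[104]))           # 250 '\U0001f600'
```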

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Terry Reedy
On 8/18/2012 4:09 PM, Terry Reedy wrote: print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # .6 in 3.2.3, 1.2 in 3.3.0 This does not make sense to me and I will ask about it. I did ask on pydev list and paraphrased responses include: 1. 'My system gives opposite ratios.' 2. 'With a default

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 12:35 PM, Paul Rubin wrote: > Chris Angelico writes: > "asdfqwer"[4:] >> 'qwer' >> >> That's a not uncommon operation when parsing strings or manipulating >> data. You'd need to completely rework your algorithms to maintain a >> position somewhere. > > Scanning 4 chara

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Paul Rubin
Chris Angelico writes: "asdfqwer"[4:] > 'qwer' > > That's a not uncommon operation when parsing strings or manipulating > data. You'd need to completely rework your algorithms to maintain a > position somewhere. Scanning 4 characters (or a few dozen, say) to peel off a token in parsing a UTF

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 12:11 PM, Paul Rubin wrote: > Chris Angelico writes: >> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a >> few thousand bytes, how do you locate the 273rd character? > > How often do you need to do that, as opposed to traversing the string by > iteratio

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Paul Rubin
Chris Angelico writes: > UTF-8 is highly inefficient for indexing. Given a buffer of (say) a > few thousand bytes, how do you locate the 273rd character? How often do you need to do that, as opposed to traversing the string by iteration? Anyway, you could use a rope-like implementation, or an i

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 4:26 AM, Paul Rubin wrote: > Can you explain the issue of "breaking surrogate pairs apart" a little > more? Switching between encodings based on the string contents seems > silly at first glance. Strings are immutable so I don't understand why > not use UTF-8 or UTF-16 fo

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Mark Lawrence
On 18/08/2012 21:22, wxjmfa...@gmail.com wrote: On Saturday, 18 August 2012 20:40:23 UTC+2, rusi wrote: On Aug 18, 10:59 pm, Steven D'Aprano wrote: On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: Is there any reason why non ascii users are somehow penalized compared to ascii users?

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, 18 August 2012 20:40:23 UTC+2, rusi wrote: > On Aug 18, 10:59 pm, Steven D'Aprano > +comp.lang.pyt...@pearwood.info> wrote: > > > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > > > > Is there any reason why non ascii users are somehow penalized compared > > > > to ascii user

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Terry Reedy
On 8/18/2012 12:38 PM, wxjmfa...@gmail.com wrote: Sorry guys, I'm not stupid (I think). I can open IDLE with Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is always slower. Period. You have not tried enough tests ;-). On my Win7-64 system: from timeit import timeit print(timeit("

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Mark Lawrence
On 18/08/2012 19:40, rusi wrote: On Aug 18, 10:59 pm, Steven D'Aprano wrote: On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: Is there any reason why non ascii users are somehow penalized compared to ascii users? Of course there is a reason. If you want to represent 1114111 different ch

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Mark Lawrence
On 18/08/2012 19:30, wxjmfa...@gmail.com wrote: On Saturday, 18 August 2012 19:59:18 UTC+2, Steven D'Aprano wrote: On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: [...] The problem with UCS-4 is that every character re

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread MRAB
On 18/08/2012 19:26, Paul Rubin wrote: Steven D'Aprano writes: (There is an extension to UCS-2, UTF-16, which encodes non-BMP characters using two code points. This is fragile and doesn't work very well, because string-handling methods can break the surrogate pairs apart, leaving you with inval

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread rusi
On Aug 18, 10:59 pm, Steven D'Aprano wrote: > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > > Is there any reason why non ascii users are somehow penalized compared > > to ascii users? > > Of course there is a reason. > > If you want to represent 1114111 different characters in a string,
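The numbers in Steven's argument can be checked directly with built-ins only, as a minimal sketch. The largest code point, U+10FFFF, needs 21 bits, and since there is no 21-bit machine integer, a fixed-width representation rounds up to 32 bits (UCS-4).

```python
max_cp = 0x10FFFF
print(max_cp)                 # 1114111
print(max_cp.bit_length())    # 21

print(len(chr(max_cp)))       # 1: the top code point is a valid one-character str
try:
    chr(max_cp + 1)
    beyond = True
except ValueError:
    beyond = False            # chr() rejects anything above U+10FFFF
print(beyond)                 # False

print(len(chr(max_cp).encode('utf-32-le')))  # 4 bytes per character in UTF-32
```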

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, 18 August 2012 19:59:18 UTC+2, Steven D'Aprano wrote: > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > > > > > On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: > > >> [...] > > >> The problem with UCS-4 is that every character requires four bytes. > > >> [..

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread MRAB
On 18/08/2012 19:05, wxjmfa...@gmail.com wrote: On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote: Proof that is acceptable to everybody please, not just yourself. I can't, I'm only facing the fact it works slower on my Windows platform. As I understand (I think) the underlying

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Paul Rubin
Steven D'Aprano writes: > (There is an extension to UCS-2, UTF-16, which encodes non-BMP characters > using two code points. This is fragile and doesn't work very well, > because string-handling methods can break the surrogate pairs apart, > leaving you with invalid unicode string. Not good.) .

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote: > > Proof that is acceptable to everybody please, not just yourself. > > I can't, I'm only facing the fact it works slower on my Windows platform. As I understand (I think) the underlying mechanism, I can only say, it is not a surpr

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: >> [...] >> The problem with UCS-4 is that every character requires four bytes. >> [...] > > I'm aware of this (and all the blah blah blah you are explaining). This > always the

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Mark Lawrence
On 18/08/2012 17:38, wxjmfa...@gmail.com wrote: Sorry guys, I'm not stupid (I think). I can open IDLE with Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is always slower. Period. Proof that is acceptable to everybody please, not just yourself. Now, the reason. I think it is due

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 2:38 AM, wrote: > Sorry guys, I'm not stupid (I think). I can open IDLE with > Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is > always slower. Period. Ah, but what about all those other operations that use strings under the covers? As mentioned, namespace l

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
Sorry guys, I'm not stupid (I think). I can open IDLE with Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is always slower. Period. Now, the reason. I think it is due to the "flexible representation". Deeper reason. The "boss" does not wish to hear from a (pure) ucs-4/utf-32 "engine" (this h

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Ian Kelly
On Sat, Aug 18, 2012 at 9:07 AM, wrote: > On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: >> [...] >> The problem with UCS-4 is that every character requires four bytes. >> [...] > > I'm aware of this (and all the blah blah blah you are > explaining). This always the same song. M

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Chris Angelico
On Sun, Aug 19, 2012 at 1:07 AM, wrote: > I'm aware of this (and all the blah blah blah you are > explaining). This always the same song. Memory. > > Let me ask. Is Python an 'american" product for us-users > or is it a tool for everybody [*]? > Is there any reason why non ascii users are somehow

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Mark Lawrence
On 18/08/2012 16:07, wxjmfa...@gmail.com wrote: On Saturday 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: [...] The problem with UCS-4 is that every character requires four bytes. [...] I'm aware of this (and all the blah blah blah you are explaining). This is always the same song. Memory.

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Ian Kelly
(Resending this to the list because I previously sent it only to Steven by mistake. Also showing off a case where top-posting is reasonable, since this bit requires no context. :-) On Sat, Aug 18, 2012 at 1:41 AM, Ian Kelly wrote: > > On Aug 17, 2012 10:17 PM, "Steven D'Aprano" > wrote: >> >> U

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote: > [...] > The problem with UCS-4 is that every character requires four bytes. > [...] I'm aware of this (and all the blah blah blah you are explaining). This is always the same song. Memory. Let me ask. Is Python an "American" product
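The "four bytes per character" cost quoted above can be checked directly by encoding the same mostly-ASCII text as UTF-8, UTF-16 and UTF-32 and comparing byte counts. This is a minimal sketch: the sample text (one ellipsis per 100 characters) and the helper name `encoded_sizes` are illustrative assumptions, and the numbers are encoded sizes, not CPython's in-memory layout.

```python
def encoded_sizes(text):
    """Return the byte count of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Hypothetical sample: 99 ASCII letters plus one HORIZONTAL ELLIPSIS (U+2026),
# repeated 10 times, so the text is 99% ASCII.
sample = ("a" * 99 + "\u2026") * 10

sizes = encoded_sizes(sample)
print(len(sample), sizes)
```

For this sample UTF-32 always spends 4 bytes per character, while UTF-8 spends 1 byte on each ASCII letter and 3 on each ellipsis.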

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread Steven D'Aprano
On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: sys.version > '3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]' timeit.timeit("('ab…' * 1000).replace('…', '……')") > 37.32762490493721 > timeit.timeit("('ab…' * 10).replace('…', 'œ…')") 0.8158757139801764 > sys

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
>>> sys.version
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
>>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
37.32762490493721
timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
0.8158757139801764
>>> sys.version
'3.3.0b2 (v3.3.0b2:4972a8f1b2aa, Aug 12 2012, 15:02:36)
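The benchmark above reports one `timeit.timeit` run per interpreter. When comparing two Python versions, timing several repeats and keeping the fastest reduces noise from other processes. This is a minimal sketch; the helper name `best_of` and the `number`/`repeat` defaults are assumptions, not values used in the thread.

```python
import sys
import timeit

# Statement from the benchmark above; the ellipsis makes the string non-Latin-1.
STMT = "('ab…' * 1000).replace('…', '……')"

def best_of(stmt, number=1000, repeat=5):
    """Return the fastest per-loop time in seconds over several repeats."""
    times = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(times) / number

if __name__ == "__main__":
    # Run the same script under each interpreter and compare the results.
    print(sys.version)
    print("best per-loop time: %.3g s" % best_of(STMT))
```

Taking the minimum rather than the mean is the usual choice for micro-benchmarks, since slower runs mostly reflect interference rather than the code being measured.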

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Steven D'Aprano
On Fri, 17 Aug 2012 23:30:22 -0400, Dave Angel wrote: > On 08/17/2012 08:21 PM, Ian Kelly wrote: >> On Aug 17, 2012 2:58 PM, "Dave Angel" wrote: >>> The internal coding described in PEP 393 has nothing to do with >>> latin-1 encoding. >> It certainly does. PEP 393 provides for Unicode strings to

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Steven D'Aprano
On Fri, 17 Aug 2012 11:45:02 -0700, wxjmfauth wrote: > On Friday 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote: >> On Fri, Aug 17, 2012 at 1:49 PM, wrote: >> >> > The character '…', Unicode name 'HORIZONTAL ELLIPSIS', >> > is one of these characters existing in the cp1252, mac-roman >> > c

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Dave Angel
On 08/17/2012 08:21 PM, Ian Kelly wrote: > On Aug 17, 2012 2:58 PM, "Dave Angel" wrote: >> The internal coding described in PEP 393 has nothing to do with latin-1 >> encoding. > It certainly does. PEP 393 provides for Unicode strings to be represented > internally as any of Latin-1, UCS-2, or UCS-

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Ian Kelly
On Aug 17, 2012 2:58 PM, "Dave Angel" wrote: > > The internal coding described in PEP 393 has nothing to do with latin-1 > encoding. It certainly does. PEP 393 provides for Unicode strings to be represented internally as any of Latin-1, UCS-2, or UCS-4, whichever is smallest and sufficient to con
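The rule Ian describes (Latin-1, UCS-2 or UCS-4, whichever is smallest and sufficient) depends only on the largest code point in the string. The function below is a hypothetical helper that mirrors that rule; it does not query CPython's internals. `sys.getsizeof` is used only to show that the real memory footprint grows in the same order on CPython 3.3 and later.

```python
import sys

def pep393_width(s):
    """Hypothetical helper: bytes per character PEP 393 would pick for s."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1   # Latin-1 range (includes pure ASCII)
    if widest <= 0xFFFF:
        return 2   # UCS-2: anything in the Basic Multilingual Plane
    return 4       # UCS-4: astral characters such as U+1F600

for text in ("abc" * 100, "ab\u2026" * 100, "ab\U0001F600" * 100):
    # getsizeof includes a fixed object header, so compare growth, not totals.
    print(pep393_width(text), sys.getsizeof(text))
```

A single '…' (U+2026) is enough to push a whole string from 1 to 2 bytes per character, which is the case the timings in this thread are arguing about.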

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Dave Angel
On 08/17/2012 02:45 PM, wxjmfa...@gmail.com wrote: > On Friday 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote: >> I don't understand what any of this has to do with Python. Just >> output your text in UTF-8 like any civilized person in the 21st >> century, and none of that is a pr

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread wxjmfauth
On Friday 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote: > On Fri, Aug 17, 2012 at 1:49 PM, wrote: >> The character '…', Unicode name 'HORIZONTAL ELLIPSIS', >> is one of these characters existing in the cp1252, mac-roman >> coding schemes and not in iso-8859-1 (latin-1) and obvio

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread Jerry Hill
On Fri, Aug 17, 2012 at 1:49 PM, wrote: > The character '…', Unicode name 'HORIZONTAL ELLIPSIS', > is one of these characters existing in the cp1252, mac-roman > coding schemes and not in iso-8859-1 (latin-1) and obviously > not in ascii. It causes Py3.3 to work a few 100% slower > than Py<3.3 ve
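The claim quoted above, that '…' exists in cp1252 and Mac Roman but not in Latin-1, is easy to verify with Python's standard codecs. A minimal sketch; the helper name `encodable` is an illustrative assumption.

```python
import unicodedata

ch = "\u2026"  # '…', the character discussed in this thread

def encodable(ch, codec):
    """Return True if ch has a representation in the given codec."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

print(unicodedata.name(ch), ord(ch), hex(ord(ch)))  # HORIZONTAL ELLIPSIS 8230 0x2026
print(ch.encode("cp1252"))      # b'\x85'
print(ch.encode("mac_roman"))   # b'\xc9'
print(ch.encode("utf-8"))       # b'\xe2\x80\xa6'
print(encodable(ch, "latin-1")) # False: 8230 is above 255
```

Because its code point is above 0xFF, a PEP 393 string containing '…' cannot use the 1-byte Latin-1 form, regardless of how the character is encoded in legacy code pages.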

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread wxjmfauth
On Friday 17 August 2012 01:59:31 UTC+2, Terry Reedy wrote: > a = '…' > print(ord(a)) > >>> > 8230 > Most things with unicode are easier in 3.x, and some are even better in > 3.3. The current beta is good enough for most informal work. 3.3.0 will > be out in a month.

Re: How do I display unicode value stored in a string variable using ord()

2012-08-16 Thread Alister
On Thu, 16 Aug 2012 15:09:47 -0700, Charles Jensen wrote: > Everyone knows that the python command > > ord(u'…') > > will output the number 8230 which is the unicode character for the > horizontal ellipsis. > > How would I use ord() to find the unicode value of a string stored in a > varia

Re: How do I display unicode value stored in a string variable using ord()

2012-08-16 Thread Terry Reedy
a = '…'
print(ord(a))
>>>
8230
Most things with unicode are easier in 3.x, and some are even better in 3.3. The current beta is good enough for most informal work. 3.3.0 will be out in a month. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
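In Python 3, ord() treats a variable exactly like a literal, because both are just str objects. The only constraint is that ord() accepts a single character, so longer strings need to be processed one character at a time. A minimal Python 3 sketch; the sample word is an illustrative assumption.

```python
a = "\u2026"                 # '…' stored in a variable
print(ord(a))                # 8230
print("U+%04X" % ord(a))     # U+2026
print(chr(ord(a)) == a)      # True: chr() is the inverse of ord()

# ord() takes one character; for a longer string, apply it per character.
word = "caf\u00e9\u2026"     # 'café…'
print([ord(c) for c in word])                      # [99, 97, 102, 233, 8230]
print(" ".join("U+%04X" % ord(c) for c in word))   # U+0063 U+0061 U+0066 U+00E9 U+2026
```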

Re: How do I display unicode value stored in a string variable using ord()

2012-08-16 Thread Dave Angel
On 08/16/2012 06:09 PM, Charles Jensen wrote: > Everyone knows that the python command > > ord(u'…') > > will output the number 8230 which is the unicode character for the horizontal > ellipsis. > > How would I use ord() to find the unicode value of a string stored in a > variable? > > So

Re: How do I display unicode value stored in a string variable using ord()

2012-08-16 Thread Chris Angelico
On Fri, Aug 17, 2012 at 8:09 AM, Charles Jensen wrote: > How would I use ord() to find the unicode value of a string stored in a > variable? > > So the following 2 lines of code will give me the ascii value of the variable > a. How do I specify ord to give me the unicode value of a? > > a

How do I display unicode value stored in a string variable using ord()

2012-08-16 Thread Charles Jensen
Everyone knows that the python command ord(u'…') will output the number 8230 which is the unicode character for the horizontal ellipsis. How would I use ord() to find the unicode value of a string stored in a variable? So the following 2 lines of code will give me the ascii value of th
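The question is truncated, so it is an assumption that the variable held encoded bytes (as a Python 2 str read from a UTF-8 source would) rather than a unicode object. If so, ord() sees individual byte values, not the character. This Python 3 sketch reproduces that situation with a bytes object and shows that decoding first yields the expected code point.

```python
raw = "\u2026".encode("utf-8")  # '…' as UTF-8 bytes: what a Python 2 str would hold
print(len(raw), list(raw))       # 3 [226, 128, 166]
print(ord(raw[:1]))              # 226: ord() of a one-byte bytes object is a byte value

text = raw.decode("utf-8")       # decode to a str (unicode) first
print(len(text), ord(text))      # 1 8230
```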