Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Marko Rauhamaa
BartC : > Usually anything that is defined can be changed at run-time so that the > compiler can never assume anything. The compiler can't assume anything permanent, but it could heuristically make excellent guesses at runtime. It needs to verify its guesses at the boundaries of compiled code and

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread BartC
On 04/07/2016 15:46, Ned Batchelder wrote: On Monday, July 4, 2016 at 10:36:54 AM UTC-4, BartC wrote: On 04/07/2016 13:47, Ned Batchelder wrote: This is a huge change. I've used a kind of 'weak' import scheme elsewhere, corresponding to C's '#include'. I think that could work in Python p

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Ned Batchelder
On Monday, July 4, 2016 at 10:36:54 AM UTC-4, BartC wrote: > On 04/07/2016 13:47, Ned Batchelder wrote: > > On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote: > >> On 04/07/2016 03:30, Steven D'Aprano wrote: > > >>> You're still having problems with the whole Python-as-a-dynamic-language >

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread BartC
On 04/07/2016 13:47, Ned Batchelder wrote: On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote: On 04/07/2016 03:30, Steven D'Aprano wrote: You're still having problems with the whole Python-as-a-dynamic-language thing, aren't you? :-) Most Pythons seem to pre-compile code before exec

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Ned Batchelder
On Monday, July 4, 2016 at 6:05:20 AM UTC-4, BartC wrote: > On 04/07/2016 03:30, Steven D'Aprano wrote: > > On Mon, 4 Jul 2016 10:17 am, BartC wrote: > > > >> On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: > >>> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: > Python lacks a m

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Rustom Mody
On Monday, July 4, 2016 at 3:56:43 PM UTC+5:30, BartC wrote: > On 04/07/2016 02:15, Lawrence D’Oliveiro wrote: > > On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote: > >> The structure of such a parser doesn't need to exactly match the grammar > >> with a dedicated block of code for each o

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Jussi Piitulainen
BartC writes: > A simpler approach is to treat user-defined operators as aliases for > functions: > > def myadd(a,b): > return a+b > > operator ∇: >(myadd,2,+3) # map to myadd, 2 operands, prio 3, LTR > > x = y ∇ z > > is then equivalent to: > > x = myadd(y,z) > > However you will usua

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread BartC
On 04/07/2016 02:15, Lawrence D’Oliveiro wrote: On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote: The structure of such a parser doesn't need to exactly match the grammar with a dedicated block of code for each operator precedence. It can be table-driven so that an operator precedence

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Jussi Piitulainen
Lawrence D’Oliveiro writes: > On Monday, July 4, 2016 at 6:08:51 PM UTC+12, Jussi Piitulainen wrote: >> Something could be done, but if the intention is to allow >> mathematical notation, it needs to be done with care. > > Mathematics uses single-character variable names so that > multiplication c

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread BartC
On 04/07/2016 03:30, Steven D'Aprano wrote: On Mon, 4 Jul 2016 10:17 am, BartC wrote: On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: Python lacks a mechanism to add user-defined operators. (R has this capability.) Maybe this feat

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Marko Rauhamaa
Lawrence D’Oliveiro : > Mathematics uses single-character variable names so that > multiplication can be implicit. I don't think anybody developed mathematical notation systematically. Rather, over the centuries, various masters came up with personal abbreviations and shorthand, which spread amon

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-04 Thread Lawrence D’Oliveiro
On Monday, July 4, 2016 at 6:08:51 PM UTC+12, Jussi Piitulainen wrote: > Something could be done, but if the intention is to allow > mathematical notation, it needs to be done with care. Mathematics uses single-character variable names so that multiplication can be implicit. An old, stillborn la

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Jussi Piitulainen
Rustom Mody writes: > Subscripts OTOH as part of identifier-lexemes doesn't seem to have any > issues They have the general issue that one might *want* them interpreted as indexes, so that a₁ would mean the same as a[1]. Mathematical symbols face similar issues. One would not *want* them all be

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Rustom Mody
On Monday, July 4, 2016 at 8:03:47 AM UTC+5:30, Steven D'Aprano wrote: > On Mon, 4 Jul 2016 07:28 am, Lawrence D’Oliveiro wrote: > > > On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote: > >> Here's another worm for the can. Would you rather read this... > >> > >> d = sqrt(x**2 + y

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Steven D'Aprano
On Mon, 4 Jul 2016 07:28 am, Lawrence D’Oliveiro wrote: > On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote: >> Here's another worm for the can. Would you rather read this... >> >> d = sqrt(x**2 + y**2) >> >> ...or this? >> >> d = √(x² + y²) > > Neither. I would rather see > >

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Steven D'Aprano
On Mon, 4 Jul 2016 10:17 am, BartC wrote: > On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: >> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: >>> Python lacks a mechanism to add user-defined operators. (R has this >>> capability.) Maybe this feature could be added. >> >> That would

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832
On Sun, Jul 3, 2016, at 21:15, Lawrence D’Oliveiro wrote: > On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote: > > The structure of such a parser doesn't need to exactly match the grammar > > with a dedicated block of code for each operator precedence. It can be > > table-driven so that

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832
On Sun, Jul 3, 2016, at 20:00, Lawrence D’Oliveiro wrote: > That would be neat. But remember, you would have to define the operator > precedence as well. So you could no longer use a recursive-descent > parser. You could use a recursive-descent parser if you monkey-patch the parser when adding a n

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Monday, July 4, 2016 at 12:40:14 PM UTC+12, BartC wrote: > The structure of such a parser doesn't need to exactly match the grammar > with a dedicated block of code for each operator precedence. It can be > table-driven so that an operator precedence value is just an attribute. Of course. But

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC
On 04/07/2016 01:24, Lawrence D’Oliveiro wrote: On Monday, July 4, 2016 at 12:17:47 PM UTC+12, BartC wrote: On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: Python lacks a mechanism to add user-defined operators. (R has this capa

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Monday, July 4, 2016 at 12:17:47 PM UTC+12, BartC wrote: > > On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: >> >> On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: >>> >>> Python lacks a mechanism to add user-defined operators. (R has this >>> capability.) Maybe this feature could

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC
On 04/07/2016 01:00, Lawrence D’Oliveiro wrote: On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: Python lacks a mechanism to add user-defined operators. (R has this capability.) Maybe this feature could be added. That would be neat. But remember, you would have to define the oper

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Monday, July 4, 2016 at 11:47:26 AM UTC+12, eryk sun wrote: > Python lacks a mechanism to add user-defined operators. (R has this > capability.) Maybe this feature could be added. That would be neat. But remember, you would have to define the operator precedence as well. So you could no longer

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread eryk sun
On Sun, Jul 3, 2016 at 6:58 AM, John Ladasky wrote: > The nabla symbol (∇) is used in the naming of gradients. Python isn't having > it. > The interpreter throws a "SyntaxError: invalid character in identifier" when > it > encounters the ∇. Del is a mathematical operator to take the gradient. I

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Monday, July 4, 2016 at 6:39:45 AM UTC+12, John Ladasky wrote: > Here's another worm for the can. Would you rather read this... > > d = sqrt(x**2 + y**2) > > ...or this? > > d = √(x² + y²) Neither. I would rather see d = math.hypot(x, y) Much simpler, don’t you think? -- https://mail
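A minimal sketch of the comparison made in this message, assuming sample values `x = 3.0` and `y = 4.0` (they are not in the thread). It shows that the explicit square-root form and `math.hypot` agree:

```python
import math

# Sample values chosen for illustration; they do not come from the thread.
x, y = 3.0, 4.0

# The spelled-out form quoted in the message.
d_explicit = math.sqrt(x**2 + y**2)

# The standard-library function suggested as the simpler alternative.
d_hypot = math.hypot(x, y)

print(d_explicit, d_hypot)
```
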

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Random832 : > Being able to put any character in a symbol doesn't make those strings > identifiers, any more than passing them to getattr/setattr (or > __import__, something's __name__, etc) does in Python. From R7RS, the newest Scheme standard (p. 61-62): 7.1.1. Lexical structure [...

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Sunday, July 3, 2016 at 11:50:52 PM UTC+12, BartC wrote: > Otherwise you can be looking at: > >a b c d e f g h > > (not Scheme) and wondering which are names and which are operators. I did a language design for my MSc thesis where all “functions” were operators. So a construct like “f(a,

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Sunday, July 3, 2016 at 9:02:05 PM UTC+12, Marko Rauhamaa wrote: > Lawrence D’Oliveiro: > >> On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote: >> >>> Personally, I don't think even π should be used in identifiers. >> > > Why not? > > 1. It can't be typed easily. I have a cus

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Random832
On Sun, Jul 3, 2016, at 07:22, Marko Rauhamaa wrote: > Christian Gollwitzer : > > On 03.07.16 at 13:01, Marko Rauhamaa wrote: > >> Scheme allows *any* characters whatsoever in identifiers. > > > > Parentheses? > > Yes. > > Hint: Python allows *any* characters whatsoever in strings. Being able t

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread MRAB
On 2016-07-03 19:39, John Ladasky wrote: On Sunday, July 3, 2016 at 12:42:14 AM UTC-7, Chris Angelico wrote: On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky wrote: Very good question! The detaily answer is here: https://docs.python.org/3/reference/lexical_analysis.html#identifiers > A philosop

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread John Ladasky
On Sunday, July 3, 2016 at 12:42:14 AM UTC-7, Chris Angelico wrote: > On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky wrote: > Very good question! The detaily answer is here: > > https://docs.python.org/3/reference/lexical_analysis.html#identifiers > > > A philosophical question. Why should any ch

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread John Ladasky
Lawrence, I trust you understand that I didn't post a complete working program, just a few lines showing the intended usage? -- https://mail.python.org/mailman/listinfo/python-list

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Chris Angelico
On Sun, Jul 3, 2016 at 7:01 PM, Marko Rauhamaa wrote: > Lawrence D’Oliveiro : > >> On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote: >> >>> Personally, I don't think even π should be used in identifiers. >> >> Why not? > > 1. It can't be typed easily. > > 2. It can look like an n

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Christian Gollwitzer : > On 03.07.16 at 13:22, Marko Rauhamaa wrote: >> Christian Gollwitzer : >>> On 03.07.16 at 13:01, Marko Rauhamaa wrote: Scheme allows *any* characters whatsoever in identifiers. >>> Parentheses? >> Yes. > > My knowledge of Scheme is rusty. How do you do that? More

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Christian Gollwitzer
On 03.07.16 at 13:22, Marko Rauhamaa wrote: Christian Gollwitzer : On 03.07.16 at 13:01, Marko Rauhamaa wrote: Alain Ketterlin : It would be very confusing to have a variable named ∇f, as confusing as naming a variable a+b or √x. Scheme allows *any* characters whatsoever in identifiers.

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread BartC
On 03/07/2016 12:01, Marko Rauhamaa wrote: Alain Ketterlin : It would be very confusing to have a variable named ∇f, as confusing as naming a variable a+b or √x. Scheme allows *any* characters whatsoever in identifiers. I think it's one of those languages that has already dispensed with mos

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Christian Gollwitzer : > On 03.07.16 at 13:01, Marko Rauhamaa wrote: >> Alain Ketterlin : >> >>> It would be very confusing to have a variable named ∇f, as confusing >>> as naming a variable a+b or √x. >> >> Scheme allows *any* characters whatsoever in identifiers. > > Parentheses? Yes. Hint: P

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Christian Gollwitzer
On 03.07.16 at 13:01, Marko Rauhamaa wrote: Alain Ketterlin : It would be very confusing to have a variable named ∇f, as confusing as naming a variable a+b or √x. Scheme allows *any* characters whatsoever in identifiers. Parentheses? Christian

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Alain Ketterlin : > It would be very confusing to have a variable named ∇f, as confusing > as naming a variable a+b or √x. Scheme allows *any* characters whatsoever in identifiers. Marko

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Alain Ketterlin
John Ladasky writes: > from math import pi as π > [...] > c = 2 * π * r > Up until today, every character I've tried has been accepted by the > Python interpreter as a legitimate character for inclusion in a > variable name. Now I'm copying a formula which defines a gradient. The > nabla symbol

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Robert Kern
On 2016-07-03 08:29, Jussi Piitulainen wrote: (Hm. Python seems to understand that the character occurs in what is intended to be an identifier. Perhaps that's a default error message.) I suspect that "identifier" is the final catch-all token in the lexer. Comments and strings are clearly deli

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Lawrence D’Oliveiro : > On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote: > >> Personally, I don't think even π should be used in identifiers. > > Why not? 1. It can't be typed easily. 2. It can look like an n. 3. Single-character identifiers should not be promoted, especially

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Sunday, July 3, 2016 at 7:27:04 PM UTC+12, Marko Rauhamaa wrote: > Personally, I don't think even π should be used in identifiers. Why not? Python already has all the other single-character constants in what probably the most fundamental identity in all of mathematics: $$e^{i \pi} + 1 =

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Chris Angelico
On Sun, Jul 3, 2016 at 4:58 PM, John Ladasky wrote: > Up until today, every character I've tried has been accepted by the Python > interpreter as a legitimate character for inclusion in a variable name. Now > I'm copying a formula which defines a gradient. The nabla symbol (∇) is used > in th

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Rustom Mody
On Sunday, July 3, 2016 at 12:29:14 PM UTC+5:30, John Ladasky wrote: > A while back, I shared my love for using Greek letters as variable names in > my Python (3.4) code -- when, and only when, they are warranted for improved > readability. For example, I like to see the following: > > > from

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Marko Rauhamaa
Lawrence D’Oliveiro : > It wasn’t the “π” it was complaining about... The question is why π is accepted but ∇ is not. The immediate reason is that π is a letter while ∇ is not. But the question, then, is why bother excluding nonletters from identifiers. Personally, I don't think even π should b
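The letter-versus-symbol distinction drawn in this message can be checked from Python 3 itself. The sketch below is illustrative only: the helper name `describe` is made up here, and it uses just `unicodedata.category` and `str.isidentifier` from the standard library.

```python
import unicodedata


def describe(ch):
    """Return the Unicode general category of ch and whether it is a valid identifier."""
    return unicodedata.category(ch), ch.isidentifier()


# GREEK SMALL LETTER PI is a lowercase letter (category Ll).
print("π", describe("π"))

# NABLA is a math symbol (category Sm), so it cannot start or continue an identifier.
print("∇", describe("∇"))

# A name that merely contains the symbol is rejected as well.
print("∇f", "∇f".isidentifier())
```
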

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Jussi Piitulainen
John Ladasky writes: [- -] > The nabla symbol (∇) is used in the naming of gradients. Python isn't > having it. The interpreter throws a "SyntaxError: invalid character > in identifier" when it encounters the ∇. > > I am now wondering what constitutes a valid character for an > identifier, and

Re: Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread Lawrence D’Oliveiro
On Sunday, July 3, 2016 at 6:59:14 PM UTC+12, John Ladasky wrote: > from math import pi as π > > c = 2 * π * r ldo@theon:~> python3 Python 3.5.1+ (default, Jun 10 2016, 09:03:40) [GCC 5.4.0 20160603] on linux Type "help", "copyright", "credits" or "license" for more information.

Well, I finally ran into a Python Unicode problem, sort of

2016-07-03 Thread John Ladasky
A while back, I shared my love for using Greek letters as variable names in my Python (3.4) code -- when, and only when, they are warranted for improved readability. For example, I like to see the following: from math import pi as π c = 2 * π * r When I am copying mathematical formulas from

Re: How to work around a unicode problem?

2012-01-24 Thread tinnews
Chris Rebert wrote: > On Tue, Jan 24, 2012 at 3:57 AM, wrote: > > I have a small python program that uses the pyexiv2 package to view > > exif data in image files. > > > > I've hit a problem because I have a filename with accented characters > > in its path and the pyexiv2 code traps as follows:

Re: How to work around a unicode problem?

2012-01-24 Thread tinnews
Peter Otten <__pete...@web.de> wrote: > tinn...@isbd.co.uk wrote: > > > I have a small python program that uses the pyexiv2 package to view > > exif data in image files. > > > > I've hit a problem because I have a filename with accented characters > > in its path and the pyexiv2 code traps as fol

Re: How to work around a unicode problem?

2012-01-24 Thread Peter Otten
tinn...@isbd.co.uk wrote: > I have a small python program that uses the pyexiv2 package to view > exif data in image files. > > I've hit a problem because I have a filename with accented characters > in its path and the pyexiv2 code traps as follows:- > > Traceback (most recent call last): >

Re: How to work around a unicode problem?

2012-01-24 Thread Chris Rebert
On Tue, Jan 24, 2012 at 3:57 AM, wrote: > I have a small python program that uses the pyexiv2 package to view > exif data in image files. > > I've hit a problem because I have a filename with accented characters > in its path and the pyexiv2 code traps as follows:- > >    Traceback (most recent c

How to work around a unicode problem?

2012-01-24 Thread tinnews
I have a small python program that uses the pyexiv2 package to view exif data in image files. I've hit a problem because I have a filename with accented characters in its path and the pyexiv2 code traps as follows:- Traceback (most recent call last): File "/home/chris/bin/eview.py", lin

Re: Re: unicode problem?

2010-10-09 Thread hidura
I had a similar problem: I can't encode the bytes of an uploaded file without damaging the data. If I use utf-8 to encode the file, its size doubles. I tried changing the codec to raw_unicode_escape, which gives almost the correct size but still damages the file. I used

Re: unicode problem?

2010-10-09 Thread Chris Rebert
On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais wrote: > This may be a stemming from my complete ignorance of unicode, but when I do > this (Python 2.6): > > s='\xc2\xa9 2008 \r\n' > > and I want the ascii version of it, ignoring any non-ascii chars, I thought I > could do: > > s.encode('ascii','ign

Re: unicode problem?

2010-10-09 Thread Benjamin Kaplan
On Sat, Oct 9, 2010 at 7:59 PM, Brian Blais wrote: > This may be a stemming from my complete ignorance of unicode, but when I do > this (Python 2.6): > > s='\xc2\xa9 2008 \r\n' > > and I want the ascii version of it, ignoring any non-ascii chars, I thought I > could do: > > s.encode('ascii','ign

unicode problem?

2010-10-09 Thread Brian Blais
This may be stemming from my complete ignorance of unicode, but when I do this (Python 2.6): s='\xc2\xa9 2008 \r\n' and I want the ascii version of it, ignoring any non-ascii chars, I thought I could do: s.encode('ascii','ignore') but it gives the error: In [20]:s.encode('ascii','ignore')
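The error text is cut off above, so the following is only a plausible reading: in Python 2, calling `.encode()` on a byte string first decodes it implicitly as ASCII, which fails on the `\xc2\xa9` bytes. A hedged sketch of the usual fix, written in Python 3 syntax and assuming the bytes are UTF-8, is to decode explicitly and only then encode to ASCII:

```python
# The byte string from the question; the two leading bytes are the UTF-8
# encoding of the copyright sign (assumption: the data is UTF-8).
raw = b"\xc2\xa9 2008 \r\n"

# Step 1: bytes -> text, naming the encoding explicitly.
text = raw.decode("utf-8")

# Step 2: text -> ASCII bytes, dropping anything that has no ASCII form.
ascii_only = text.encode("ascii", "ignore")

print(repr(text))
print(repr(ascii_only))
```
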

Re: Unicode problem in ucs4

2009-03-25 Thread abhi
On Mar 24, 4:55 am, "Martin v. Löwis" wrote: > > So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3 > > \0s after a char, printf or wprintf is only printing one letter. > > No. printf indeed will see a terminating character. However, wprintf > should correctly know that a wchar_t

Re: Unicode problem in ucs4

2009-03-23 Thread Martin v. Löwis
> So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3 > \0s after a char, printf or wprintf is only printing one letter. No. printf indeed will see a terminating character. However, wprintf should correctly know that a wchar_t has four bytes per character, and print it correctly. M

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 12:57, abhi wrote: >>> Is there any way >>> by which I can force wchar_t to be 2 bytes, or can I convert this UCS4 >>> data to UCS2 explicitly? >> Sure: just use the appropriate UTF-16 codec for this. >> >> /* Generic codec based encoding API. >> >>object is passed through the enc

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 14:05, abhi wrote: > Hi Marc, >Is there any way to ensure that wchar_t size would always be 2 > instead of 4 in ucs4 configured python? Googling gave me the > impression that there is some logic written in PyUnicode_AsWideChar() > which can take care of ucs4 to ucs2 conversion

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 4:57 pm, abhi wrote: > On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote: > > > > > On 2009-03-23 11:50, abhi wrote: > > > > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > > > Thanks Marc, John, > > >          With your help, I am at least somewhere. I re-wrote the code > > > to compare Py_Unic

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote: > On 2009-03-23 11:50, abhi wrote: > > > > > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > > Thanks Marc, John, > >          With your help, I am at least somewhere. I re-wrote the code > > to compare Py_Unicode and wchar_t outputs and they both look exac

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 11:50, abhi wrote: > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > Thanks Marc, John, > With your help, I am at least somewhere. I re-wrote the code > to compare Py_Unicode and wchar_t outputs and they both look exactly > the same. > > #include > > static PyObject *unicode_

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > On 2009-03-23 08:18, abhi wrote: > > > > > On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: > >>> unicodeTest.c > >>> #include > >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){ > >>>    PyObject *sampleObj = NULL; > >>>            Py_UNIC

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 08:18, abhi wrote: > On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: >>> unicodeTest.c >>> #include >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){ >>>PyObject *sampleObj = NULL; >>>Py_UNICODE *sample = NULL; >>> if (!PyArg_ParseTuple(args, "O", &

Re: Unicode problem in ucs4

2009-03-23 Thread John Machin
On Mar 23, 6:41 pm, John Machin had a severe attack of backslashitis: > [presuming littleendian] The ucs4 string will look like "\t\0\0\0e > \0\0\0s\0\0\0t\0\0\0" in memory. I suspect that your wprintf is > grokking only 16-bit doodads -- "\t\0" is printed and then "\0\0" is > end-of-string. Try
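To illustrate the memory layout discussed here (presumably `t` followed by three zero bytes, and so on), the sketch below encodes the string as little-endian UTF-32 and then reads the same bytes as 16-bit units, which is what a routine expecting 2-byte characters would do. The variable names are illustrative and the little-endian assumption follows the post.

```python
import struct

text = "test"

# Four bytes per character, least significant byte first.
ucs4_le = text.encode("utf-32-le")
print(ucs4_le)

# Reinterpret the same buffer as unsigned 16-bit units.
units16 = struct.unpack("<%dH" % (len(ucs4_le) // 2), ucs4_le)
print(units16)

# A 16-bit string routine stops at the first zero unit,
# so it would see only this many characters.
chars_before_nul = units16.index(0)
print(chars_before_nul)
```
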

Re: Unicode problem in ucs4

2009-03-23 Thread John Machin
On Mar 23, 6:18 pm, abhi wrote: [snip] > Hi Mark, >      Thanks for the help. I tried PyUnicode_AsWideChar() but I am > getting the same result i.e. only the first letter. > > sample code: > > #include > > static PyObject *unicode_helper(PyObject *self,PyObject *args){ >         PyObject *sampleO

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: > On 2009-03-20 12:13, abhi wrote: > > > > > > > On Mar 20, 11:03 am, "Martin v. Löwis" wrote: > >>> Any idea on why this is happening? > >> Can you provide a complete example? Your code looks correct, and should > >> just work. > > >> How do you know th

Re: Unicode problem in ucs4

2009-03-20 Thread M.-A. Lemburg
On 2009-03-20 12:13, abhi wrote: > On Mar 20, 11:03 am, "Martin v. Löwis" wrote: >>> Any idea on why this is happening? >> Can you provide a complete example? Your code looks correct, and should >> just work. >> >> How do you know the result contains only 't' (i.e. how do you know it >> does not c

Re: Unicode problem in ucs4

2009-03-20 Thread abhi
On Mar 20, 11:03 am, "Martin v. Löwis" wrote: > > Any idea on why this is happening? > > Can you provide a complete example? Your code looks correct, and should > just work. > > How do you know the result contains only 't' (i.e. how do you know it > does not contain 'e', 's', 't')? > > Regards, >

Re: Unicode problem in ucs4

2009-03-19 Thread Martin v. Löwis
> Any idea on why this is happening? Can you provide a complete example? Your code looks correct, and should just work. How do you know the result contains only 't' (i.e. how do you know it does not contain 'e', 's', 't')? Regards, Martin

Unicode problem in ucs4

2009-03-19 Thread abhi
Hi, I have a C extension, which takes a unicode or string value from python and convert it to unicode before doing more operations on it. The skeleton looks like: static PyObject *unicode_helper( PyObject *self, PyObject *args){ PyObject *sampleObj = NULL; Py_UNICODE *sample = NULL

Re: Unicode Problem

2008-10-30 Thread Bard Aase
On Thu, Oct 30, 2008 at 8:28 AM, Seid Mohammed <[EMAIL PROTECTED]> wrote: > I am new to python. > I want to print Amharic character using the Python IDLE. > here goes somple code > == abebe = 'አበበ በሶ በላ' abebe > '\xe1\x8a\xa0\xe1

Re: Unicode Problem

2008-10-30 Thread Ulrich Eckhardt
Seid Mohammed wrote: > I am new to python. Welcome! :) abebe = 'አበበ በሶ በላ' abebe > '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0 \xe1\x89\xa0\xe1\x88\xb6 > \xe1\x89\xa0\xe1\x88\x8b' print abebe > አበበ በሶ በላ abeba = ['አበበ','በሶ','በላ'] abeba > ['\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0',

Re: Unicode Problem

2008-10-30 Thread Marc 'BlackJack' Rintsch
On Thu, 30 Oct 2008 10:28:39 +0300, Seid Mohammed wrote: > I am new to python. > I want to print Amharic character using the Python IDLE. here goes > somple code > == abebe = 'አበበ በሶ በላ' abebe > '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89

Unicode Problem

2008-10-30 Thread Seid Mohammed
I am new to Python. I want to print Amharic characters using the Python IDLE. Here goes some sample code: >>> abebe = 'አበበ በሶ በላ' >>> abebe '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0 \xe1\x89\xa0\xe1\x88\xb6 \xe1\x89\xa0\xe1\x88\x8b' >>> print abeb
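The escaped output in the session above is the UTF-8 byte sequence of the Amharic text, shown because Python 2 echoes the `repr` of a byte string. A small Python 3 sketch (variable names are illustrative) makes the relationship between the text and those bytes explicit:

```python
word = "አበበ"

# Encoding the three-character word yields the nine bytes seen in the session.
utf8_bytes = word.encode("utf-8")
print(repr(utf8_bytes))

# Decoding the bytes gives the readable text back.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip)

print(len(word), len(utf8_bytes))
```
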

Re: Logging library unicode problem

2008-08-20 Thread Vinay Sajip
On 13 Aug, 11:08, Victor Lin <[EMAIL PROTECTED]> wrote: > Hi, > I'm writing an application using python standard logging system. I > encounter some problem with unicode message passed to logging library. > I found that unicode message will be messed up by logging handler. > > piece of StreamHandler: > >

Re: Logging library unicode problem

2008-08-13 Thread Patrol Sun
What's your system? Simple Chinese Windows??? 2008/8/13 Victor Lin <[EMAIL PROTECTED]> > Hi, > I'm writting a application using python standard logging system. I > encounter some problem with unicode message passed to logging library. > I found that unicode message will be messed up by logging ha

Logging library unicode problem

2008-08-13 Thread Victor Lin
Hi, I'm writing an application using the Python standard logging system. I encountered a problem with unicode messages passed to the logging library: I found that a unicode message gets messed up by the logging handler. A piece of StreamHandler: try: self.stream.write(fs %

Unicode Problem

2008-01-28 Thread Victor Subervi
Hi; New to unicode. Got this error: Traceback (most recent call last): File "", line 1, in File "", line 29, in tagWords File "/usr/local/lib/python2.5/codecs.py", line 303, in write data, consumed = self.encode(object, self.errors) UnicodeDecodeError: 'ascii' codec can't decode byte 0x

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-26 Thread oren . tsur
On Jul 26, 4:34 pm, Stefan Behnel <[EMAIL PROTECTED]> wrote: > [EMAIL PROTECTED] wrote: > > On Jul 26, 3:13 pm, John Machin <[EMAIL PROTECTED]> wrote: > >> On Jul 26, 9:24 pm, [EMAIL PROTECTED] wrote: > > >>> OK, I solved the problem but I still don't get what went wrong. > >>> Solution - use tree

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-26 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > On Jul 26, 3:13 pm, John Machin <[EMAIL PROTECTED]> wrote: >> On Jul 26, 9:24 pm, [EMAIL PROTECTED] wrote: >> >>> OK, I solved the problem but I still don't get what went wrong. >>> Solution - use tree builder in order to create the new xml file >>> (previously I was "ma

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-26 Thread oren . tsur
On Jul 26, 3:13 pm, John Machin <[EMAIL PROTECTED]> wrote: > On Jul 26, 9:24 pm, [EMAIL PROTECTED] wrote: > > > OK, I solved the problem but I still don't get what went wrong. > > Solution - use tree builder in order to create the new xml file > > (previously I was "manually" creating it). > > > I

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-26 Thread John Machin
On Jul 26, 9:24 pm, [EMAIL PROTECTED] wrote: > OK, I solved the problem but I still don't get what went wrong. > Solution - use tree builder in order to create the new xml file > (previously I was "manually" creating it). > > I'm still curious so I'm adding a link to a short and very simple > scri

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-26 Thread oren . tsur
OK, I solved the problem but I still don't get what went wrong. Solution - use tree builder in order to create the new xml file (previously I was "manually" creating it). I'm still curious so I'm adding a link to a short and very simple script that gets an xml (containing non ascii chars) from th
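
The fix described here (letting ElementTree build and serialise the file instead of assembling the XML by hand) might look roughly like the sketch below. It is a minimal Python 3 illustration, not the poster's script: the element names and the sample text are hypothetical, and an in-memory buffer stands in for the local file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data: one field containing non-ASCII characters.
root = ET.Element('items')
title = ET.SubElement(root, 'title')
title.text = 'Caf\u00e9 \u2019quoted\u2019'

# Let ElementTree serialise the tree; it encodes the text and writes a
# matching XML declaration, so the bytes and the declared encoding agree.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding='utf-8', xml_declaration=True)

# Parsing the saved bytes gives back the same text.
reparsed = ET.fromstring(buffer.getvalue())
print(reparsed.find('title').text == title.text)
```

When XML is written by hand, a common failure is that the bytes on disk are in one encoding while the declaration (or the default, UTF-8) says another; serialising through the library avoids that mismatch.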

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: >> How about trying >> root = ElementTree.parse(urlopen(query), encoding ='utf-8') That doesn't work. > this specific thing is not working, however, parsing the url is not > problematic. So you tried parsing the complete XML file and it works? Then it's the way you stri

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread oren . tsur
> How about trying > root = ElementTree.parse(urlopen(query), encoding ='utf-8') > this specific thing is not working, however, parsing the url is not problematic. the problem is that after parsing the xml at the url I save some of the fields to a local file and the local file is not being parsed

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread André
On Jul 23, 11:29 am, [EMAIL PROTECTED] wrote: > (this question was also posted in the devshed python > forum:http://forums.devshed.com/python-programming-11/parsing-xml-with-elem... > ). > - > > (it's a bit longish but I hope I give all the information) > > 1. here is m

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread Steve Holden
Marc 'BlackJack' Rintsch wrote: > On Tue, 24 Jul 2007 05:57:26 +, oren.tsur wrote: > >> but the thing is that the parser parses it all right from the web (the >> amazon response) but fails to parse the locally saved file. > > I've just used wget to fetch that URL and `ElementTree` parses that

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread Marc 'BlackJack' Rintsch
On Tue, 24 Jul 2007 05:57:26 +, oren.tsur wrote: > but the thing is that the parser parses it all right from the web (the > amazon response) but fails to parse the locally saved file. I've just used wget to fetch that URL and `ElementTree` parses that local file without problems. Maybe you s

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-24 Thread Stefan Behnel
[EMAIL PROTECTED] wrote: > On Jul 23, 4:46 pm, "Richard Brodie" <[EMAIL PROTECTED]> wrote: >> <[EMAIL PROTECTED]> wrote in message >> >> news:[EMAIL PROTECTED] >> >>> so what's the difference? how comes parsing is fine >>> in the first case but erroneous in the second case? >> You may have guessed

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-23 Thread oren . tsur
On Jul 23, 4:46 pm, "Richard Brodie" <[EMAIL PROTECTED]> wrote: > <[EMAIL PROTECTED]> wrote in message > > news:[EMAIL PROTECTED] > > > so what's the difference? how comes parsing is fine > > in the first case but erroneous in the second case? > > You may have guessed the encoding wrong. It probabl

Re: Parsing XML with ElementTree (unicode problem?)

2007-07-23 Thread Richard Brodie
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > so what's the difference? how comes parsing is fine > in the first case but erroneous in the second case? You may have guessed the encoding wrong. It probably wasn't utf-8 to start with but iso8859-1 or similar. What actual byte valu
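
One way to answer the "what actual byte values" question is to try decoding the raw data as UTF-8 and report the first offending byte. The helper below is a hypothetical sketch in Python 3 (the function name and sample data are not from the thread).

```python
def describe_bytes(raw):
    """Report whether raw is valid UTF-8, and if not, where it first fails."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return 'invalid utf-8 at offset %d: byte 0x%02x' % (exc.start, raw[exc.start])
    return 'valid utf-8'


# The same text saved in two encodings (sample data is an assumption).
as_latin1 = 'caf\u00e9 au lait'.encode('iso-8859-1')
as_utf8 = 'caf\u00e9 au lait'.encode('utf-8')

print(describe_bytes(as_latin1))
print(describe_bytes(as_utf8))
```

A lone byte such as 0xe9 in otherwise ASCII text is a strong hint that the data is ISO-8859-1 or a similar single-byte encoding rather than UTF-8.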

Parsing XML with ElementTree (unicode problem?)

2007-07-23 Thread oren . tsur
(this question was also posted in the devshed python forum: http://forums.devshed.com/python-programming-11/parsing-xml-with-elementtree-unicode-problem-461518.html ). - (it's a bit longish but I hope I give all the information) 1. here is my problem: I'm

Re: Unicode problem

2007-07-08 Thread [EMAIL PROTECTED]
> > What software did you use to make that so? The Python codec certainly > never would do such a thing. > > Are you sure it was latin-1 and \x27, and not windows-1252 and \x92? > > Regards, > Martin you're right...the source of text are html pages and obviously webmasters have poor knowledge o

Re: Unicode problem

2007-07-07 Thread Erik Max Francis
[EMAIL PROTECTED] wrote: > Hi to all, I have a little problem with unicode handling under Python. > > I have this code > > s = u'A unicode string with this damn apostrophe \x2019' > > outf = codecs.open('filename.txt', 'w', 'iso-8859-15') > outf.write(s) > > what I obtain is a UnicodeEncodeErr
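
A note on the literal as quoted: Python reads a `\x` escape as exactly two hex digits, so `\x2019` is a space followed by the text "19", while `\u2019` is the typographic apostrophe. The Python 3 sketch below shows the difference and checks which encodings can represent U+2019; the helper name `can_encode` is hypothetical.

```python
suspect = 'apostrophe \x2019'   # '\x20' (a space) followed by the text "19"
intended = 'apostrophe \u2019'  # U+2019 RIGHT SINGLE QUOTATION MARK


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(repr(suspect))
print(can_encode(suspect, 'iso-8859-15'))
print(can_encode(intended, 'iso-8859-15'))
print(can_encode(intended, 'utf-8'))
```

ISO-8859-15 has no code point for U+2019, so encoding the intended string to it fails unless the character is replaced first or a different target encoding is chosen.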

Re: Unicode problem

2007-07-07 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: ... Ah, I answered you on the Italian NG before seeing you had also posted the same request here. What I proposed there was (untested): import codecs _rimedi = { u'\x2019': "'" } def rimedia(exc): if isinstance(exc, (UnicodeEncodeError, Unic
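
The snippet above is cut off, so the following is not a reconstruction of it. It is a separate Python 3 sketch of the same general idea, using `codecs.register_error` to substitute plain ASCII for typographic quotes when encoding fails. The handler name, the error-handler name 'typographic' and the replacement table are all assumptions made for illustration.

```python
import codecs

# Hypothetical replacement table: typographic quotes to ASCII equivalents.
REPLACEMENTS = {
    '\u2018': "'",
    '\u2019': "'",
    '\u201c': '"',
    '\u201d': '"',
}


def replace_typographic(exc):
    """Error handler: swap unencodable characters for ASCII stand-ins."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    fixed = ''.join(REPLACEMENTS.get(ch, '?') for ch in bad)
    # Return the replacement text and the position at which to resume.
    return fixed, exc.end


codecs.register_error('typographic', replace_typographic)

encoded = 'It\u2019s here'.encode('iso-8859-15', errors='typographic')
print(encoded)
```

The registered name can then be passed as the `errors` argument wherever an encoding is applied, including `codecs.open`.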

Re: Unicode problem

2007-07-07 Thread Martin v. Löwis
> I agree, but the problem is much more subtle. I have converted a text from > iso-8859-1 to utf-8 and the codecs have translated \x27 ( the iso > apostrophe ) to \xe28099 in utf-8 ( or u'2019' in unicode code point > notation ) What software did you use to make that so? The Python codec certainly never
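
The distinction raised here can be checked directly. In this small Python 3 sketch (the variable names are illustrative), byte 0x27 is the plain apostrophe in every ASCII-compatible encoding, while byte 0x92 is the typographic apostrophe only when read as windows-1252.

```python
# 0x27 is the ASCII apostrophe; latin-1 decodes it to the same character.
ascii_apostrophe = b'\x27'.decode('latin-1')

# 0x92 means different things depending on the assumed encoding.
as_cp1252 = b'\x92'.decode('cp1252')    # U+2019, typographic apostrophe
as_latin1 = b'\x92'.decode('latin-1')   # U+0092, a C1 control character

# U+2019 is what becomes the three bytes e2 80 99 in UTF-8.
utf8_bytes = as_cp1252.encode('utf-8')

print(repr(ascii_apostrophe), repr(as_cp1252), repr(as_latin1), utf8_bytes)
```

So a UTF-8 sequence of e2 80 99 in the output points to U+2019 in the decoded text, which is consistent with a 0x92 byte read as windows-1252 and not with 0x27 read as latin-1.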
