Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Steve D'Aprano
On Wed, 28 Dec 2016 02:05 am, Skip Montanaro wrote: > I am trying to parse some XML which doesn't specify an encoding (Python > 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII > data. No great surprise there, but I'm having trouble getting it to use > another encoding. F

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Peter Otten wrote: > works, but to go back to the bytes that the XML parser needs the > "preferred encoding", in your case ASCII, will be used. Correction: it's probably sys.getdefaultencoding() rather than locale.getdefaultencoding(). So all systems with a sane configuration will behave the sa

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Skip Montanaro wrote: > Peter> Isn't UTF-8 the default? > > Apparently not. Sorry, I meant the default for XML. > I believe in my reading it said that it used whatever > locale.getpreferredencoding() returned. That's problematic when you > live in a country that thinks ASCII is everything. Per
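A minimal sketch of the "default for XML" point, assuming Python 3 and the standard-library xml.etree.ElementTree (the thread itself concerns Python 2.7, where passing a unicode string behaves differently). The sample documents are made up for illustration. It suggests that bytes without an XML declaration are read as UTF-8, and that other encodings need either a declaration or an explicit parser override:

```python
import xml.etree.ElementTree as ET

# Bytes with no XML declaration: the parser assumes UTF-8.
utf8_bytes = "<doc>caf\u00e9</doc>".encode("utf-8")
text_no_decl = ET.fromstring(utf8_bytes).text

# Latin-1 bytes need an explicit declaration in the document...
latin1_decl = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'
text_decl = ET.fromstring(latin1_decl).text

# ...or a parser whose encoding argument overrides the document's.
parser = ET.XMLParser(encoding="iso-8859-1")
parser.feed(b"<doc>caf\xe9</doc>")
text_override = parser.close().text
```

Feeding the parser bytes rather than an already-decoded string keeps the decoding decision in one place.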

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Skip Montanaro
Peter> Isn't UTF-8 the default? Apparently not. I believe in my reading it said that it used whatever locale.getpreferredencoding() returned. That's problematic when you live in a country that thinks ASCII is everything. Personally, I think UTF-8 should be the default, but that train's long left t

Re: encoding="utf8" ignored when parsing XML

2016-12-27 Thread Peter Otten
Skip Montanaro wrote: > I am trying to parse some XML which doesn't specify an encoding (Python > 2.7.12 via Anaconda on RH Linux), so it barfs when it encounters non-ASCII > data. No great surprise there, but I'm having trouble getting it to use > another encoding. First, I tried specifying the e

Re: Encoding of Python 2 string literals

2015-07-22 Thread Chris Angelico
On Thu, Jul 23, 2015 at 3:58 PM, dieter wrote: > Steven D'Aprano writes: >> On Wed, 22 Jul 2015 08:17 pm, anatoly techtonik wrote: >>> Is there a way to know encoding of string (bytes) literal >>> defined in source file? For example, given that source: >>> >>> # -*- coding: utf-8 -*- >>>

Re: Encoding of Python 2 string literals

2015-07-22 Thread dieter
Steven D'Aprano writes: > On Wed, 22 Jul 2015 08:17 pm, anatoly techtonik wrote: >> Is there a way to know encoding of string (bytes) literal >> defined in source file? For example, given that source: >> >> # -*- coding: utf-8 -*- >> from library import Entry >> Entry("текст") >> >>

Re: Encoding of Python 2 string literals

2015-07-22 Thread dieter
Mark Lawrence writes: > On 22/07/2015 11:17, anatoly techtonik wrote: > ... > Without question you are the most appalling person who should by > definition be excluded from the Python community. You refuse point > blank to contribute to the bug tracker, you've already been banned > from several

Re: Encoding of Python 2 string literals

2015-07-22 Thread Steven D'Aprano
On Thursday 23 July 2015 07:26, Mark Lawrence wrote: > Oh, and please do not tell me to back off. It has been shown in recent > days that despite my problems I have contributed to core Python. > Anatoly will contribute to any project, but on his terms, and his terms > only. I have never done any

Re: Encoding of Python 2 string literals

2015-07-22 Thread Mark Lawrence
On 22/07/2015 21:52, Terry Reedy wrote: On 7/22/2015 4:30 PM, Mark Lawrence wrote: http://www.scons.org/ In particular, he is, according to him, trying to make it possible to port it to 3.x. This is something we both want to encourage. I didn't say he was banned from the Python list. He has

Re: Encoding of Python 2 string literals

2015-07-22 Thread Terry Reedy
On 7/22/2015 4:30 PM, Mark Lawrence wrote: Mark, back off. Anatoly has not been banned from python-list. In fact, he has been told that this is where he *should* post, and not be off-topic. Not having the temperament to work with core Python, he is, apparently, trying to contribute to another

Re: Encoding of Python 2 string literals

2015-07-22 Thread Mark Lawrence
On 22/07/2015 11:17, anatoly techtonik wrote: Hi, Is there a way to know encoding of string (bytes) literal defined in source file? For example, given that source: # -*- coding: utf-8 -*- from library import Entry Entry("текст") Is there any way for Entry() constructor to know t

Re: Encoding of Python 2 string literals

2015-07-22 Thread Chris Angelico
On Thu, Jul 23, 2015 at 12:38 AM, Steven D'Aprano wrote: > On Wed, 22 Jul 2015 08:17 pm, anatoly techtonik wrote: > >> Hi, >> >> Is there a way to know encoding of string (bytes) literal >> defined in source file? For example, given that source: >> >> # -*- coding: utf-8 -*- >> from librar

Re: Encoding of Python 2 string literals

2015-07-22 Thread Steven D'Aprano
On Wed, 22 Jul 2015 08:17 pm, anatoly techtonik wrote: > Hi, > > Is there a way to know encoding of string (bytes) literal > defined in source file? For example, given that source: > > # -*- coding: utf-8 -*- > from library import Entry > Entry("текст") > > Is there any way for Entr

Re: Encoding of Python 2 string literals

2015-07-22 Thread Laura Creighton
In a message of Wed, 22 Jul 2015 22:39:56 +1000, Chris Angelico writes: >On Wed, Jul 22, 2015 at 8:17 PM, anatoly techtonik wrote: >> Is there a way to know encoding of string (bytes) literal >> defined in source file? For example, given that source: >> >> # -*- coding: utf-8 -*- >> from l

Re: Encoding of Python 2 string literals

2015-07-22 Thread Chris Angelico
On Wed, Jul 22, 2015 at 8:17 PM, anatoly techtonik wrote: > Is there a way to know encoding of string (bytes) literal > defined in source file? For example, given that source: > > # -*- coding: utf-8 -*- > from library import Entry > Entry("текст") > > Is there any way for Entry() cons

Re: encoding name mappings in codecs.py with email/charset.py

2014-12-15 Thread Stefanos Karasavvidis
I played around with changing the names in the aliases.py and locale.py files (from iso8859 to iso-88559), but this broke mailman. I ended up changing the charset.py file input_charset = codecs.lookup(input_charset).name except LookupError: pass if (inpu

Re: encoding name mappings in codecs.py with email/charset.py

2014-12-14 Thread gst
On Sunday, 14 December 2014 14:10:22 UTC-5, Stefanos Karasavvidis wrote: > thanks for replying gst. > > I've thought already of patching the Charset class, but hoped for a cleaner > solution. > > > This ALIASES dict has already all the iso names *with* a dash. So it must get > stripped som

Re: encoding name mappings in codecs.py with email/charset.py

2014-12-14 Thread Stefanos Karasavvidis
Thanks for replying, gst. I had already thought of patching the Charset class, but hoped for a cleaner solution. This ALIASES dict already has all the ISO names *with* a dash, so the dash must get stripped somewhere else. sk On Sun, Dec 14, 2014 at 7:21 PM, gst wrote: > On Friday, 12 December 2014 04

Re: encoding name mappings in codecs.py with email/charset.py

2014-12-14 Thread gst
On Friday, 12 December 2014 04:21:14 UTC-5, Stefanos Karasavvidis wrote: > I've hit a wall with mailman which seems to be caused by Python's character > encoding names. > > I've narrowed the problem down to the email/charset.py file. Basically the > following happens: > Hi, it's all in th

Re: Encoding trouble when script called from application

2014-01-14 Thread Peter Otten
Florian Lindner wrote: > Hello! > > I'm using python 3.2.3 on debian wheezy. My script is called from my mail > delivery agent (MDA) maildrop (like procmail) through its xfilter > directive. > > Script works fine when used interactively, e.g. ./script.py < testmail but > when called from maildr

Re: Encoding of surrogate code points to UTF-8

2013-10-09 Thread Neil Cerutti
On 2013-10-09, Ned Batchelder wrote: > On 10/9/13 4:22 AM, wxjmfa...@gmail.com wrote: >> and what Unicode.org does not say is that these coding schemes >> (like any coding scheme) should be used in an exclusive way. > > Can you clarify what you mean by "in an exclusive way"? Ned, pay no attention

Re: Encoding of surrogate code points to UTF-8

2013-10-09 Thread Ned Batchelder
On 10/9/13 4:22 AM, wxjmfa...@gmail.com wrote: On Wednesday, 9 October 2013 08:20:05 UTC+2, Steven D'Aprano wrote: http://www.unicode.org/versions/Unicode6.2.0/ch02.pdf#G13708 "All three encoding forms can be used to represent the full range of encoded characters in the Unicode Standard; ...

Re: Encoding of surrogate code points to UTF-8

2013-10-09 Thread wxjmfauth
On Wednesday, 9 October 2013 08:20:05 UTC+2, Steven D'Aprano wrote: > > > > http://www.unicode.org/versions/Unicode6.2.0/ch02.pdf#G13708 "All three > > > encoding forms can be used to represent the full range of encoded > > > characters in the Unicode Standard; ... Each of the three Unicode >

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Steven D'Aprano
On Tue, 08 Oct 2013 21:28:25 -0400, Terry Reedy wrote: > On 10/8/2013 6:30 PM, Steven D'Aprano wrote: >> On Tue, 08 Oct 2013 15:14:33 +, Neil Cerutti wrote: >> >>> In any case, "\ud800\udc01" isn't a valid unicode string. >> >> I don't think this is correct. Can you show me where the standard

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Terry Reedy
On 10/8/2013 6:30 PM, Steven D'Aprano wrote: On Tue, 08 Oct 2013 15:14:33 +, Neil Cerutti wrote: In any case, "\ud800\udc01" isn't a valid unicode string. I don't think this is correct. Can you show me where the standard says that Unicode strings[1] may not contain surrogates? I think tha

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Steven D'Aprano
On Tue, 08 Oct 2013 15:14:33 +, Neil Cerutti wrote: > In any case, "\ud800\udc01" isn't a valid unicode string. I don't think this is correct. Can you show me where the standard says that Unicode strings[1] may not contain surrogates? I think that is a critical point, and the FAQ conflates

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Steven D'Aprano
On Tue, 08 Oct 2013 18:00:58 +0100, MRAB wrote: > The only time you should get a surrogate pair in a Unicode string is in > a narrow build, which doesn't exist in Python 3.3 and later. Incorrect. py> sys.version '3.3.0rc3 (default, Sep 27 2012, 18:44:58) \n[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Terry Reedy
On 10/8/2013 5:47 PM, Terry Reedy wrote: On 10/8/2013 9:52 AM, Steven D'Aprano wrote: But reading the previous entry in the FAQs: http://www.unicode.org/faq/utf_bom.html#utf8-4 I interpret this as meaning that I should be able to encode valid pairs of surrogates. It says you should be able

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Terry Reedy
On 10/8/2013 9:52 AM, Steven D'Aprano wrote: I think this is a bug in Python's UTF-8 handling, but I'm not sure. If I've read the Unicode FAQs correctly, you cannot encode *lone* surrogate code points into UTF-8: http://www.unicode.org/faq/utf_bom.html#utf8-5 Sure enough, using Python 3.3: py

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread wxjmfauth
>>> sys.version '3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]' >>> '\ud800'.encode('utf-8') Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
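The behaviour discussed in this thread can be sketched as follows, assuming Python 3.4 or later (the "surrogatepass" error handler for the UTF-16 codecs is newer than the 3.3 sessions quoted here). Variable names are illustrative only. A lone surrogate is rejected by the strict UTF-8 codec, while the supplementary character that a surrogate pair stands for encodes without trouble:

```python
# A lone surrogate code point cannot be encoded with the strict UTF-8 codec.
lone = "\ud800"
try:
    lone.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "surrogatepass" lets the code point through; the result is not valid UTF-8.
raw = lone.encode("utf-8", errors="surrogatepass")
round_trip = raw.decode("utf-8", errors="surrogatepass")

# The real supplementary character (U+10001) encodes normally.
char = "\N{LINEAR B SYLLABLE B038 E}"
utf8 = char.encode("utf-8")
utf16 = char.encode("utf-16-be")

# Two surrogate code points in a str are not the same thing as U+10001;
# one way to combine them is a UTF-16 round trip.
pair = "\ud800\udc01"
combined = pair.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")
```

The round trip through UTF-16 is only one possible way to merge a pair; it is shown here because it mirrors the utf-16be bytes quoted elsewhere in the thread.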

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread MRAB
On 08/10/2013 16:23, Pete Forman wrote: Steven D'Aprano writes: I think this is a bug in Python's UTF-8 handling, but I'm not sure. [snip] py> s = '\ud800\udc01' py> s.encode('utf-8') Traceback (most recent call last): File "", line 1, in UnicodeEncodeError: 'utf-8' codec can't encode cha

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Neil Cerutti
On 2013-10-08, Neil Cerutti wrote: > In any case, "\ud800\udc01" isn't a valid unicode string. In a > perfect world it would automatically get converted to > '\u00010001' without intervention. This last paragraph is erroneous. I must have had a typo in my testing. -- Neil Cerutti -- https://ma

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Pete Forman
Steven D'Aprano writes: > I think this is a bug in Python's UTF-8 handling, but I'm not sure. [snip] > py> s = '\ud800\udc01' > py> s.encode('utf-8') > Traceback (most recent call last): > File "", line 1, in > UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in > position 0:

Re: Encoding of surrogate code points to UTF-8

2013-10-08 Thread Neil Cerutti
On 2013-10-08, Steven D'Aprano wrote: > py> c = '\N{LINEAR B SYLLABLE B038 E}' > py> surr_pair = c.encode('utf-16be') > py> print(surr_pair) > b'\xd8\x00\xdc\x01' > > and then use those same values as the code points, I ought to be able to > encode to UTF-8, as if it were the same \N{LINEAR B SYL

Re: Encoding problem in python

2013-08-21 Thread electron
If you use Arabic frequently on your system, I suggest changing your Windows system locale under "Region and Language" in the Control Panel (Administrative tab) and setting it to Arabic. -- http://mail.python.org/mailman/listinfo/python-list

Re: Encoding questions (continuation)

2013-06-12 Thread Larry Hudson
On 06/12/2013 01:20 AM, Larry Hudson wrote: On 06/11/2013 01:09 PM, Νικόλαος Κούρας wrote: On Tuesday, 11 June 2013 10:52:02 AM UTC+3, user Larry Hudson wrote: On 06/10/2013 06:56 AM, Νικόλαος Κούρας wrote: I forgot to specify I'm talking about using Thunderbird Newsgroups, not the E

Re: Encoding questions (continuation)

2013-06-12 Thread Larry Hudson
On 06/11/2013 01:09 PM, Νικόλαος Κούρας wrote: On Tuesday, 11 June 2013 10:52:02 AM UTC+3, user Larry Hudson wrote: On 06/10/2013 06:56 AM, Νικόλαος Κούρας wrote: i think your suggestions works only if you have a mail handy in TB and you hit follow-up what if you dont have the mail

Re: OT: e-mail reply to old/archived message (was Re: Encoding questions (continuation))

2013-06-11 Thread Νικόλαος Κούρας
On Tuesday, 11 June 2013 2:21:50 PM UTC+3, user Andreas Perstinger wrote: > > sending the mail to python-list@python.org will just open anew > > subject intead of replyign to an opened thread. > You would need to find out the Message-Id of the post you want to reply > to and then add m

Re: Encoding questions (continuation)

2013-06-11 Thread Νικόλαος Κούρας
On Tuesday, 11 June 2013 10:52:02 AM UTC+3, user Larry Hudson wrote: > On 06/10/2013 06:56 AM, Νικόλαος Κούρας wrote: > > > > >>> ps. i tried to post a reply to the thread i opend via thunderbird mail > > >>> client, but not as a reply to somne other reply but as new mail send to > >

Re: Encoding questions (continuation)

2013-06-11 Thread Νικόλαος Κούρας
On Tuesday, 11 June 2013 1:19:25 AM UTC+3, user Lele Gaifax wrote: > Maybe he just want to prove we are smart enough... > Or maybe his encoding algorithm needs some refinement > :-) I already know you are smart enough, the latter is what needs some more refinement work :-)

OT: e-mail reply to old/archived message (was Re: Encoding questions (continuation))

2013-06-11 Thread Andreas Perstinger
On 10.06.2013 15:56, Νικόλαος Κούρας wrote: On Monday, 10 June 2013 2:41:07 PM UTC+3, user Steven D'Aprano wrote: On Mon, 10 Jun 2013 14:13:00 +0300, Νικόλαος Κούρας wrote: ps. i tried to post a reply to the thread i opend via thunderbird mail client, but not as a reply to somne oth

Re: Encoding questions (continuation)

2013-06-11 Thread Larry Hudson
On 06/10/2013 06:56 AM, Νικόλαος Κούρας wrote: ps. i tried to post a reply to the thread i opend via thunderbird mail client, but not as a reply to somne other reply but as new mail send to python list. because of that a new thread will be opened. How can i tell thunderbird to reply to the orig

Re: Encoding questions (continuation)

2013-06-10 Thread Lele Gaifax
Steven D'Aprano writes: >> I did but docs confuse me even more. Can you pleas ebut it simple. > > Nikos, if you can't be bothered to correct your spelling mistakes, why > should we be bothered to answer your questions? Maybe he just want to prove we are smart enough... http://www.foxnews.com/s

Re: Encoding questions (continuation)

2013-06-10 Thread Fábio Santos
On 10 Jun 2013 15:04, "Νικόλαος Κούρας" wrote: > > On Monday, 10 June 2013 2:41:07 PM UTC+3, user Steven D'Aprano wrote: > > On Mon, 10 Jun 2013 14:13:00 +0300, Νικόλαος Κούρας wrote: > > > > > > > > > On Monday, 10 June 2013 1:42:25 PM UTC+3, user Andreas > > > > > Persting

Re: Encoding questions (continuation)

2013-06-10 Thread Νικόλαος Κούρας
On Monday, 10 June 2013 2:41:07 PM UTC+3, user Steven D'Aprano wrote: > On Mon, 10 Jun 2013 14:13:00 +0300, Νικόλαος Κούρας wrote: > > > > > On Monday, 10 June 2013 1:42:25 PM UTC+3, user Andreas > > > Perstinger wrote: > > > > > > > >>> s = b'\xce\xb1' > > > > >

Re: Encoding questions (continuation)

2013-06-10 Thread Steven D'Aprano
On Mon, 10 Jun 2013 14:13:00 +0300, Νικόλαος Κούρας wrote: > On Monday, 10 June 2013 1:42:25 PM UTC+3, user Andreas > Perstinger wrote: > > > >>> s = b'\xce\xb1' > > > > >>> s[0] > > > > 206 > > 's' is a byte object, how can you treat it as a string asking to present > you its
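The quoted session can be expanded into a small sketch, assuming Python 3 (where indexing a bytes object yields an integer). It shows that the two bytes b'\xce\xb1' are the UTF-8 encoding of a single character, and that indexing works on the bytes and on the decoded text in different units:

```python
s = b"\xce\xb1"

# Indexing bytes gives integers in range(256), not one-character strings.
first = s[0]            # 0xce
second = s[1]           # 0xb1

# Decoding turns the two bytes into one character of text.
text = s.decode("utf-8")
byte_count = len(s)
char_count = len(text)

# Encoding goes the other way and restores the original bytes.
back = text.encode("utf-8")
```

Slicing (s[0:1]) would return a bytes object of length one rather than an integer, which is sometimes the more convenient form.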

Re: Encoding NaN in JSON

2013-04-22 Thread Wayne Werner
On Sat, 20 Apr 2013, Chris “Kwpolska” Warrick wrote: On Fri, Apr 19, 2013 at 9:42 PM, Grant Edwards wrote: The OP asked for a string, and I thought you were proposing the string 'null'. If one is to use a string, then 'NaN' makes the most sense, since it can be converted back into a floating

Re: Encoding NaN in JSON

2013-04-20 Thread Chris “Kwpolska” Warrick
On Fri, Apr 19, 2013 at 9:42 PM, Grant Edwards wrote: > The OP asked for a string, and I thought you were proposing the string > 'null'. If one is to use a string, then 'NaN' makes the most sense, > since it can be converted back into a floating point NaN object. > > I infer that you were proposi

Re: Encoding NaN in JSON

2013-04-19 Thread Miki Tebeka
> > You understand that this will result in a chunk of text that is not JSON? > I think he means something like this: > >>> json.dumps([float('nan')]) > '["N/A"]' That's exactly what I mean :)

Re: Encoding NaN in JSON

2013-04-19 Thread Grant Edwards
On 2013-04-19, Chris “Kwpolska” Warrick wrote: > On Fri, Apr 19, 2013 at 4:54 PM, Grant Edwards > wrote: >> On 2013-04-18, Wayne Werner wrote: >>> On Wed, 17 Apr 2013, Miki Tebeka wrote: >>> >> I'm trying to find a way to have json emit float('NaN') as 'N/A'. > No. There is no way

Re: Encoding NaN in JSON

2013-04-19 Thread Chris “Kwpolska” Warrick
On Fri, Apr 19, 2013 at 4:54 PM, Grant Edwards wrote: > On 2013-04-18, Wayne Werner wrote: >> On Wed, 17 Apr 2013, Miki Tebeka wrote: >> > I'm trying to find a way to have json emit float('NaN') as 'N/A'. No. There is no way to represent NaN in JSON. It's simply not part of the sp

Re: Encoding NaN in JSON

2013-04-19 Thread Grant Edwards
On 2013-04-18, Wayne Werner wrote: > On Wed, 17 Apr 2013, Miki Tebeka wrote: > I'm trying to find a way to have json emit float('NaN') as 'N/A'. >>> No. There is no way to represent NaN in JSON. It's simply not part of the >>> specification. >> I know that. I'm trying to emit the *string* '

Re: Encoding NaN in JSON

2013-04-18 Thread Robert Kern
On 2013-04-19 10:34, Tim Roberts wrote: Miki Tebeka wrote: I'm trying to find a way to have json emit float('NaN') as 'N/A'. No. There is no way to represent NaN in JSON. It's simply not part of the specification. I know that. I'm trying to emit the *string* 'N/A' for every NaN. You un

Re: Encoding NaN in JSON

2013-04-18 Thread Tim Roberts
Miki Tebeka wrote: > >>> I'm trying to find a way to have json emit float('NaN') as 'N/A'. >> No. There is no way to represent NaN in JSON. It's simply not part of the >> specification. > >I know that. I'm trying to emit the *string* 'N/A' for every NaN. You understand that this will result in

Re: Encoding NaN in JSON

2013-04-18 Thread Wayne Werner
On Wed, 17 Apr 2013, Miki Tebeka wrote: I'm trying to find a way to have json emit float('NaN') as 'N/A'. No. There is no way to represent NaN in JSON. It's simply not part of the specification. I know that. I'm trying to emit the *string* 'N/A' for every NaN. Why not use `null` instead? I

Re: Encoding NaN in JSON

2013-04-18 Thread Roland Koebler
On Thu, Apr 18, 2013 at 11:46:37AM +1000, Chris Angelico wrote: > Wait... you can do that? It's internal to iterencode, at least in > Python 3.3 and 2.7 that I'm looking at here. In Python 2.6 it wasn't internal to iterencode; in Python 2.7 and 3.x you probably would have to monkey-patch iterencode

Re: Encoding NaN in JSON

2013-04-17 Thread Chris Angelico
On Thu, Apr 18, 2013 at 11:39 AM, Roland Koebler wrote: > as a quickhack, you > could even monkey patch json.encoder.floatstr with a wrapper which > returns "N/A" for NaN. (I've tested it: It works.) Wait... you can do that? It's internal to iterencode, at least in Python 3.3 and 2.7 that I'm loo

Re: Encoding NaN in JSON

2013-04-17 Thread Chris Angelico
On Thu, Apr 18, 2013 at 11:01 AM, Miki Tebeka wrote: > [Roland] >> yes, there is: subclass+extend the JSON-encoder, see pydoc json. > Please read the original post before answering. What you suggested does not > work since NaN is of float type. You may be able to override a bit more of the code,

Re: Encoding NaN in JSON

2013-04-17 Thread Roland Koebler
Hi, > > yes, there is: subclass+extend the JSON-encoder, see pydoc json. > Please read the original post before answering. What you suggested does not > work since NaN is of float type. ok, right, default does not work this way. But I would still suggest to extend the JSON-encoder, since that is

Re: Encoding NaN in JSON

2013-04-17 Thread Miki Tebeka
[Roland] > yes, there is: subclass+extend the JSON-encoder, see pydoc json. Please read the original post before answering. What you suggested does not work since NaN is of float type.
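A small sketch of why overriding default() does not help here, assuming Python 3 and the standard json module. The Probe class is a hypothetical name used only for this illustration. The encoder handles floats natively, so the default() hook is never consulted for NaN:

```python
import json


class Probe(json.JSONEncoder):
    """Hypothetical encoder that records every object passed to default()."""

    seen = []

    def default(self, obj):
        Probe.seen.append(obj)
        return super().default(obj)


# Floats (including NaN) are serialised directly; default() is not called.
out = json.dumps([float("nan"), 1.5], cls=Probe)

# With allow_nan=False the encoder refuses NaN instead of emitting it.
try:
    json.dumps(float("nan"), allow_nan=False)
    refused = False
except ValueError:
    refused = True
```

The literal NaN in the first result is a JavaScript-style extension that json emits by default; it is not part of the JSON specification, which is the point made earlier in the thread.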

Re: Encoding NaN in JSON

2013-04-17 Thread Roland Koebler
Hi, > > Easiest way is probably to transform your object before you try to write > Yeah, that's what I ended up doing. Wondered if there's a better way ... yes, there is: subclass+extend the JSON-encoder, see pydoc json. e.g.: class JsonNanEncoder(json.JSONEncoder): def default(self, obj):

Re: Encoding NaN in JSON

2013-04-17 Thread Dave Angel
On 04/17/2013 03:05 PM, Johann Hibschman wrote: Miki Tebeka writes: I'm trying to find a way to have json emit float('NaN') as 'N/A'. No. There is no way to represent NaN in JSON. It's simply not part of the specification. I know that. I'm trying to emit the *string* 'N/A' for every NaN.

Re: Encoding NaN in JSON

2013-04-17 Thread Miki Tebeka
> >>> I'm trying to find a way to have json emit float('NaN') as 'N/A'. > Easiest way is probably to transform your object before you try to write Yeah, that's what I ended up doing. Wondered if there's a better way ... Thanks, -- Miki
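The "transform your object before you write" approach might look roughly like this, assuming Python 3 and data built only from dicts, lists, tuples and scalars. The helper name replace_nan and the sample data are made up for this sketch, not taken from the thread:

```python
import json
import math


def replace_nan(obj, marker="N/A"):
    """Return a copy of obj with every float NaN replaced by marker."""
    if isinstance(obj, float) and math.isnan(obj):
        return marker
    if isinstance(obj, dict):
        return {key: replace_nan(value, marker) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_nan(value, marker) for value in obj]
    return obj


data = {"a": [1.0, float("nan")], "b": float("nan")}
encoded = json.dumps(replace_nan(data))
```

The cost is one extra pass over the data and a copy of every container, which is usually acceptable next to the serialisation itself.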

Re: Encoding NaN in JSON

2013-04-17 Thread Johann Hibschman
Miki Tebeka writes: >>> I'm trying to find a way to have json emit float('NaN') as 'N/A'. >> No. There is no way to represent NaN in JSON. It's simply not part of the >> specification. > I know that. I'm trying to emit the *string* 'N/A' for every NaN. Easiest way is probably to transform your

Re: Encoding NaN in JSON

2013-04-17 Thread John Gordon
In Miki Tebeka writes: > >> I'm trying to find a way to have json emit float('NaN') as 'N/A'. > > No. There is no way to represent NaN in JSON. It's simply not part of the > > specification. > I know that. I'm trying to emit the *string* 'N/A' for every NaN. import math x = possibly_NaN()

Re: Encoding NaN in JSON

2013-04-17 Thread Miki Tebeka
>> I'm trying to find a way to have json emit float('NaN') as 'N/A'. > No. There is no way to represent NaN in JSON. It's simply not part of the > specification. I know that. I'm trying to emit the *string* 'N/A' for every NaN.

Re: Encoding NaN in JSON

2013-04-16 Thread Tim Roberts
Miki Tebeka wrote: > >I'm trying to find a way to have json emit float('NaN') as 'N/A'. >I can't seem to find a way since NaN is a float, which means overriding >"default" won't help. > >Any simple way to do this? No. There is no way to represent NaN in JSON. It's simply not part of the specif

Re: Encoding problem in python

2013-03-04 Thread Vlastimil Brom
2013/3/4 : > I have a problem with encoding in python 27 shell. > > when i write this in the python shell: > > w=u'العربى' > > It gives me the following error: > > Unsupported characters in input > > any help? Hi, I guess, you are using

Re: Encoding problem in python

2013-03-04 Thread Steven D'Aprano
On Mon, 04 Mar 2013 01:37:42 -0800, yomnasalah91 wrote: > I have a problem with encoding in python 27 shell. > > when i write this in the python shell: > > w=u'العربى' > > It gives me the following error: > > Unsupported characters in input > > any help? Firstly, please show the COMPLETE err

Re: Encoding problem in python

2013-03-04 Thread Laszlo Nagy
On 2013-03-04 10:37, yomnasala...@gmail.com wrote: I have a problem with encoding in python 27 shell. when i write this in the python shell: w=u'العربى' It gives me the following error: Unsupported characters in input any help? Maybe it is not Python related. Did you get an exception? Can yo

Re: encoding error in python 27

2013-02-24 Thread Peter Otten
Hala Gamal wrote: > Thank you :) It worked well for a small file, but when I enter a big file, I > obtain this error: "Traceback (most recent call last): > File "D:\Python27\yarab (4).py", line 46, in > writer.add_document(**doc) > File "build\bdist.win32\egg\whoosh\filedb\filewriting.py", lin

Re: encoding error in python 27

2013-02-23 Thread Hala Gamal
Thank you :) It worked well for a small file, but when I enter a big file, I obtain this error: "Traceback (most recent call last): File "D:\Python27\yarab (4).py", line 46, in writer.add_document(**doc) File "build\bdist.win32\egg\whoosh\filedb\filewriting.py", line 369, in add_document

Re: encoding error in python 27

2013-02-22 Thread MRAB
On 2013-02-22 14:55, Hala Gamal wrote: My code works well with an English file, but when I use a text file encoded as "utf-8" (my file contains some Arabic letters) it doesn't work. My code: # encoding: utf-8 from whoosh import fields, index import os.path import re,string import codecs from whoosh.qparse

Re: encoding error in python 27

2013-02-22 Thread Peter Otten
Hala Gamal wrote: > My code works well with an English file, but when I use a text file > encoded as "utf-8" (my file contains some Arabic letters) it doesn't work. My > code: > with codecs.open("tt.txt",encoding='utf-8') as txtfile: Try encoding="utf-8-sig" in the above to remove the byte order mark (B
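A short sketch of the utf-8-sig suggestion, assuming Python 3 and an in-memory byte string standing in for the contents of the file (the sample digits are made up). Decoding with plain "utf-8" leaves U+FEFF at the front of the text, while "utf-8-sig" drops it:

```python
import codecs

# Simulated file contents: a UTF-8 byte order mark followed by digits.
raw = codecs.BOM_UTF8 + "123".encode("utf-8")

plain = raw.decode("utf-8")        # BOM survives as the first character
clean = raw.decode("utf-8-sig")    # BOM is stripped if present

# The leftover U+FEFF is what breaks numeric conversion of the first field.
try:
    int(plain)
    converted = True
except ValueError:
    converted = False

bom_length = len(codecs.BOM_UTF8)
```

In UTF-8 the mark occupies three bytes (EF BB BF); it is two bytes only in UTF-16. "utf-8-sig" is also safe on files without a mark, since it only removes one when it is there.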

Re: encoding error

2013-02-19 Thread Terry Reedy
On 2/19/2013 8:07 PM, halagamal2...@gmail.com wrote: UnicodeEncodeError: 'decimal' codec can't encode character u'\ufeff' in position 0: invalid decimal Unicode string I believe that is a byte-order mark, which should only be the first 2 bytes in the file and which should be removed if you use

Re: Encoding conundrum

2012-11-21 Thread Dave Angel
On 11/21/2012 06:24 AM, danielk wrote: > On Tuesday, November 20, 2012 6:03:47 PM UTC-5, Ian wrote: >>> >> >> In Linux, your terminal encoding is probably either UTF-8 or Latin-1, >> >> and either way it has no problems encoding that data for output. In a >> >> Windows cmd terminal, the default t

Re: Encoding conundrum

2012-11-21 Thread Nobody
On Wed, 21 Nov 2012 03:24:01 -0800, danielk wrote: >> >>> import sys >> >>> sys.stdout.encoding >> 'cp437' > > Hmmm. So THAT'S why I am only able to use 'cp437'. I had (mistakenly) > thought that I could just indicate whatever encoding I wanted, as long as > the codec supported it. sys.stdout.enc
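To illustrate what a cp437 console can and cannot represent, here is a hedged sketch assuming Python 3. The sample text is made up; the encode() calls stand in for what happens when text is written to a stream whose encoding is cp437:

```python
text = "caf\u00e9 \u20ac5"   # "é" exists in cp437, the euro sign does not

# Strict encoding fails on the first character the code page lacks.
try:
    text.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Error handlers trade fidelity for robustness.
replaced = text.encode("cp437", errors="replace")
escaped = text.encode("cp437", errors="backslashreplace")
```

Which handler is appropriate depends on whether the output is meant for a person to read or for a program to parse back.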

Re: Encoding conundrum

2012-11-21 Thread danielk
On Tuesday, November 20, 2012 6:03:47 PM UTC-5, Ian wrote: > On Tue, Nov 20, 2012 at 2:49 PM, Daniel Klein wrote: > > > With the assistance of this group I am understanding unicode encoding issues > > > much better; especially when handling special characters that are outside of > > > the ASCII

Re: Encoding conundrum

2012-11-20 Thread Ian Kelly
On Tue, Nov 20, 2012 at 2:49 PM, Daniel Klein wrote: > With the assistance of this group I am understanding unicode encoding issues > much better; especially when handling special characters that are outside of > the ASCII range. I've got my application working perfectly now :-) > > However, I am

Re: Encoding conundrum

2012-11-20 Thread Dave Angel
On 11/20/2012 04:49 PM, Daniel Klein wrote: > With the assistance of this group I am understanding unicode encoding > issues much better; especially when handling special characters that are > outside of the ASCII range. I've got my application working perfectly now > :-) > > However, I am still co

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-08 Thread Nobody
On Wed, 05 Oct 2011 21:39:17 -0700, Greg wrote: > Here is the final code for those who are struggling with similar > problems: > > ## open and decode file > # In this case, the encoding comes from the charset argument in a meta > tag > # e.g. > fileObj = open(filePath,"r").read() > fileContent =

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread John Gordon
In xDog Walker writes: > What is this io of which you speak? It was introduced in Python 2.6. -- John Gordon

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread xDog Walker
On Thursday 2011 October 06 10:41, jmfauth wrote: > or  (Python2/Python3) > > >>> import io > >>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f: > > ...     r = f.read() > ... > > >>> repr(r) > > u'a\nb\nc\n' > > >>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f: > > ...     t

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread jmfauth
On 6 oct, 06:39, Greg wrote: > Brilliant! It worked. Thanks! > > Here is the final code for those who are struggling with similar > problems: > > ## open and decode file > # In this case, the encoding comes from the charset argument in a meta > tag > # e.g. > fileObj = open(filePath,"r").read() >

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Chris Angelico
On Thu, Oct 6, 2011 at 8:29 PM, Ulrich Eckhardt wrote: > Just wondering, why do you split the latter two parts? I would have used > codecs.open() to open the file and define the encoding in a single step. Is > there a downside to this approach? > Those two steps still happen, even if you achieve
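The point that encoding and writing are two steps even when one call performs both can be sketched like this, assuming Python 3 (io.open also exists from Python 2.6 on, as noted elsewhere in this thread). The sample text and file names are made up for the illustration:

```python
import io
import os
import tempfile

text = "za\u017c\u00f3\u0142\u0107"          # sample non-ASCII text
folder = tempfile.mkdtemp()
one_step = os.path.join(folder, "one_step.txt")
two_steps = os.path.join(folder, "two_steps.txt")

# One call: the file object encodes to UTF-8 as it writes.
with io.open(one_step, "w", encoding="utf-8") as handle:
    handle.write(text)

# Two explicit steps: encode to bytes, then write the bytes.
data = text.encode("utf-8")
with open(two_steps, "wb") as handle:
    handle.write(data)

# Both files end up with the same bytes on disk.
with open(one_step, "rb") as handle:
    first = handle.read()
with open(two_steps, "rb") as handle:
    second = handle.read()
```

The single-call form is usually the more convenient one; the explicit form mainly helps when explaining where the conversion from text to bytes happens.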

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-06 Thread Ulrich Eckhardt
On 06.10.2011 05:40, Steven D'Aprano wrote: (4) Do all your processing in Unicode, not bytes. (5) Encode the text into bytes using UTF-8 encoding. (6) Write the bytes to a file. Just wondering, why do you split the latter two parts? I would have used codecs.open() to open the file and defi

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Chris Angelico
On Thu, Oct 6, 2011 at 3:39 PM, Greg wrote: > Brilliant! It worked. Thanks! > > Here is the final code for those who are struggling with similar > problems: > > ## open and decode file > # In this case, the encoding comes from the charset argument in a meta > tag > # e.g. > fileContent = fileObj.

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Greg
Brilliant! It worked. Thanks! Here is the final code for those who are struggling with similar problems: ## open and decode file # In this case, the encoding comes from the charset argument in a meta tag # e.g. fileObj = open(filePath,"r").read() fileContent = fileObj.decode("iso-8859-2") fileSo
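The decode step in the snippet above can be illustrated with a small self-contained sketch. It is an assumption-laden example, not Greg's full program: the file name and sample text are hypothetical, and the file is opened in binary mode so that the same code yields bytes on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

# Hypothetical input: text stored on disk as ISO-8859-2 bytes.
original = u"\u0142\u00f3d\u017a"
raw = original.encode("iso-8859-2")

path = os.path.join(tempfile.mkdtemp(), "page.html")  # hypothetical file name
with open(path, "wb") as f:
    f.write(raw)

# Read the raw bytes, then decode them with the charset the page declares.
with open(path, "rb") as f:
    file_bytes = f.read()
file_content = file_bytes.decode("iso-8859-2")

# The same bytes are not valid UTF-8, which is why the charset has to be known.
try:
    file_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```

Once `file_content` is Unicode text, it can be processed and written out in any target encoding.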

Re: encoding problem with BeautifulSoup - problem when writing parsed text to file

2011-10-05 Thread Steven D'Aprano
On Wed, 05 Oct 2011 16:35:59 -0700, Greg wrote: > Hi, I am having some encoding problems when I first parse stuff from a > non-english website using BeautifulSoup and then write the results to a > txt file. If you haven't already read this, you should do so: http://www.joelonsoftware.com/article

Re: Encoding problem when launching Python27 via DOS

2011-04-11 Thread Jean-Pierre M
Thanks a lot for this quick answer! It is very clear! To better understand the difference between encoding and decoding, I found the following website: http://www.evanjones.ca/python-utf8.html I changed the program to (changes are in bold): *# -*- c

Re: Encoding problem when launching Python27 via DOS

2011-04-10 Thread MRAB
On 10/04/2011 13:22, Jean-Pierre M wrote: > I created a simple program which writes some French text with accents to a Unicode file! [snip] This line: l.p("premier message de Log à accents") passes a bytestring to the method, and inside the method, this line: unicode_str=u'%s : %s \n
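A hedged sketch of the point being made: when a bytestring meets a Unicode format string, it is safer to decode the bytes explicitly first. The UTF-8 source encoding and the `u"INFO"` label are assumptions for illustration; the thread's actual first format argument is not shown in the snippet.

```python
# -*- coding: utf-8 -*-

# The message as a bytestring, assuming the source file was saved as UTF-8.
message_bytes = u"premier message de Log \u00e0 accents".encode("utf-8")

# Decode explicitly before mixing it with Unicode text.
message_text = message_bytes.decode("utf-8")
line = u"%s : %s \n" % (u"INFO", message_text)  # u"INFO" is a hypothetical label

# ASCII cannot decode the accented character, which is what an implicit
# bytes-to-unicode conversion in Python 2 would attempt.
try:
    message_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Passing a `u"..."` literal to the method in the first place would avoid the decode step altogether.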

Re: encoding hell - any chance of salvation ?

2011-03-08 Thread southof40
Thanks for both the suggestions. I haven't yet had time to try them out but will do so and report back.

Re: encoding hell - any chance of salvation ?

2011-03-07 Thread Terry Reedy
On 3/7/2011 6:24 AM, southof40 wrote: Hi - I've got some code which uses array (http://docs.python.org/library/array.html) to store characters read from a file (it's not my code; it comes from here: http://sourceforge.net/projects/pygold/). The read is done, in GrammarReader.py, like this ...

Re: encoding hell - any chance of salvation ?

2011-03-07 Thread Tom Zych
southof40 wrote: > ... > result = array('u') > ... > ... and results in the error "TypeError: array item must be unicode > character" being raised (full stack trace at bottom). > ... > Can anyone make a suggestion as to the best way to allow the array > object to accept what is in essence a bi

Re: encoding

2011-02-14 Thread Adam Tauno Williams
On Mon, 2011-02-14 at 13:03 -0500, Verde Denim wrote: > On Mon, Feb 14, 2011 at 12:46 PM, MRAB > wrote: > On 14/02/2011 17:10, Verde Denim wrote: > All > I'm a bit new to py coding and need to setup some code to > encode/decode > base 128. > Anyone here have some info they can poi

Re: encoding

2011-02-14 Thread MRAB
On 14/02/2011 18:03, Verde Denim wrote: On Mon, Feb 14, 2011 at 12:46 PM, MRAB wrote: On 14/02/2011 17:10, Verde Denim wrote: All I'm a bit new to py coding and need to set up some code to encode/decode base 128. A

Re: encoding

2011-02-14 Thread Verde Denim
On Mon, Feb 14, 2011 at 12:46 PM, MRAB wrote: > On 14/02/2011 17:10, Verde Denim wrote: > >> All >> I'm a bit new to py coding and need to set up some code to encode/decode >> base 128. >> Anyone here have some info they can point me to to get this done? I've >> been looking around on the web for

Re: encoding

2011-02-14 Thread Verde Denim
On Mon, Feb 14, 2011 at 12:35 PM, Ian Kelly wrote: > On Mon, Feb 14, 2011 at 10:10 AM, Verde Denim wrote: > > All > > I'm a bit new to py coding and need to set up some code to encode/decode > base > > 128. > > Anyone here have some info they can point me to to get this done? I've > been > > look

Re: encoding

2011-02-14 Thread MRAB
On 14/02/2011 17:10, Verde Denim wrote: All I'm a bit new to py coding and need to set up some code to encode/decode base 128. Anyone here have some info they can point me to to get this done? I've been looking around on the web for a few days and can't seem to lay my hands on anything definitive.
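The thread does not say which "base 128" scheme is meant. As one possible reading only, the sketch below assumes a variable-length integer encoding that stores 7 bits per byte and uses the high bit as a continuation flag (LEB128-style). The function names are hypothetical and not taken from any reply in the thread.

```python
def encode_base128(n):
    """Encode a non-negative int as 7-bit groups, least significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = bytearray()
    while True:
        group = n & 0x7F          # lowest 7 bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more groups follow
        else:
            out.append(group)         # high bit clear: last group
            return bytes(out)


def decode_base128(data):
    """Decode the first base-128 integer found at the start of data."""
    result = 0
    shift = 0
    for byte in bytearray(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated input")


encoded = encode_base128(300)
decoded = decode_base128(encoded)
```

If the intended scheme is something else, such as a text armor analogous to base64, this sketch does not apply.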
