subject:"Re\: Character encoding"

Re: Character encoding & the copyright symbol

2009-08-13 Thread Ben Finney

Dave Angel writes: > But I wanted to comment on the (c) remark. If you're in the US, > that's the wrong abbreviation for copyright. The only recognized > abbreviation is (copr). More reading on this: http://en.wikipedia.org/wiki/Universal_Copyright_Convention http://en.wikipedia.org/

Re: Character encoding & the copyright symbol

2009-08-06 Thread Dave Angel

Robert Dailey wrote: Hello, I'm loading a file via open() in Python 3.1 and I'm getting the following error when I try to print the contents of the file that I obtained through a call to read(): UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in position 1650: character maps t

Re: Character encoding & the copyright symbol

2009-08-06 Thread Benjamin Kaplan

On Thu, Aug 6, 2009 at 12:41 PM, Robert Dailey wrote: > On Aug 6, 11:31 am, "Richard Brodie" wrote: >> "Robert Dailey" wrote in message >> >> news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com... >> >> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in >> >

Re: Character encoding & the copyright symbol

2009-08-06 Thread Philip Semanchuk

On Aug 6, 2009, at 3:14 PM, Martin v. Löwis wrote: As a side note, you should probably use something other than "file" for the parameter name in GetFileContentsAsString() since file() is a Python function. Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42) [GCC 4.3.3] on linux2 Type "help

Re: Character encoding & the copyright symbol

2009-08-06 Thread Martin v. Löwis

> As a side note, you should probably use something other than "file" for > the parameter name in GetFileContentsAsString() since file() is a Python > function. Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42) [GCC 4.3.3] on linux2 Type "help", "copyright", "credits" or "license" for more inform

Re: Character encoding & the copyright symbol

2009-08-06 Thread Nobody

On Thu, 06 Aug 2009 09:14:08 -0700, Robert Dailey wrote: > I'm loading a file via open() in Python 3.1 and I'm getting the > following error when I try to print the contents of the file that I > obtained through a call to read(): > > UnicodeEncodeError: 'charmap' codec can't encode character '\xa

Re: Character encoding & the copyright symbol

2009-08-06 Thread Richard Brodie

"Robert Dailey" wrote in message news:f64f9830-c416-41b1-a510-c1e486271...@g19g2000vbi.googlegroups.com... > As you can see, I am trying to load the file with encoding 'cp1252' > which, according to the python 3.1 docs, translates to windows-1252. I > also tried 'latin_1', which translates to I

Re: Character encoding & the copyright symbol

2009-08-06 Thread Philip Semanchuk

On Aug 6, 2009, at 12:41 PM, Robert Dailey wrote: On Aug 6, 11:31 am, "Richard Brodie" wrote: "Robert Dailey" wrote in message news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com ... UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in position 1650: c

Re: Character encoding & the copyright symbol

2009-08-06 Thread Albert Hopkins

On Thu, 2009-08-06 at 09:14 -0700, Robert Dailey wrote: > Hello, > > I'm loading a file via open() in Python 3.1 and I'm getting the > following error when I try to print the contents of the file that I > obtained through a call to read(): > > UnicodeEncodeError: 'charmap' codec can't encode char

Re: Character encoding & the copyright symbol

2009-08-06 Thread Robert Dailey

On Aug 6, 11:31 am, "Richard Brodie" wrote: > "Robert Dailey" wrote in message > > news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com... > > > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in > > position 1650: character maps to <undefined> > > > The file is defined a
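
On the ASCII remark quoted in this exchange, a minimal sketch (assuming Python 3; the sample string is made up for illustration) of why the copyright sign cannot be ASCII: its code point is U+00A9, i.e. 169, and ASCII stops at 127. The single-byte Windows and Latin-1 encodings do have a slot for it.

```python
# Hypothetical sample text; '\xa9' is the copyright sign, code point 169.
text = "Copyright \xa9 2009"

# ASCII only covers code points 0..127, so a strict encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# cp1252 and Latin-1 both map U+00A9 to the single byte 0xA9.
as_cp1252 = text.encode("cp1252")
as_latin1 = text.encode("latin-1")

# An explicit error handler avoids the exception, at the cost of
# replacing the character with an escape sequence.
fallback = text.encode("ascii", errors="backslashreplace")
```

So a file that really is ASCII cannot contain this character at all; if the byte 0xA9 is in the file, the file is in some eight-bit encoding and has to be read as such.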

Re: Character encoding & the copyright symbol

2009-08-06 Thread Richard Brodie

"Robert Dailey" wrote in message news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com... > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in > position 1650: character maps to <undefined> > > The file is defined as ASCII. That's the problem: ASCII is a seven bit code.

Re: Character encoding & the copyright symbol

2009-08-06 Thread Philip Semanchuk

On Aug 6, 2009, at 12:14 PM, Robert Dailey wrote: Hello, I'm loading a file via open() in Python 3.1 and I'm getting the following error when I try to print the contents of the file that I obtained through a call to read(): UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
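
A rough sketch of the situation described in the question, assuming Python 3. The error is an *encode* error, so it is raised when printing, not when reading. The file contents, the temporary file, the helper name `printable` and the choice of cp437 as the console code page are all assumptions made for illustration; none of them come from the thread.

```python
import os
import tempfile

# Hypothetical file contents: 0xA9 is the copyright sign in cp1252.
raw = b"Copyright \xa9 2009\n"

with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(raw)
    path = handle.name

try:
    # Reading succeeds once the file's real encoding is named explicitly.
    with open(path, encoding="cp1252") as source:
        contents = source.read()
finally:
    os.remove(path)


def printable(text, console_encoding):
    """Return text with characters the console cannot show replaced by '?'."""
    encoded = text.encode(console_encoding, errors="replace")
    return encoded.decode(console_encoding)


# cp437 (the assumed console code page) has no copyright sign, so a plain
# print() to such a console would raise UnicodeEncodeError.
try:
    contents.encode("cp437")
    console_can_encode = True
except UnicodeEncodeError:
    console_can_encode = False

safe = printable(contents, "cp437")
```

Passing `safe` to print() should then work on such a console, with the copyright sign shown as a question mark.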

Re: Character encoding

2006-11-08 Thread Frederic Rentsch

mp wrote: > I have html document titles with characters like >, , and > ‡. How do I decode a string with these values in Python? > > Thanks > > This is definitely the most FAQ. It comes up about once a week. The stream-editing way is like this: >>> import SE >>> HTM_Decoder = SE.SE ('htm2is

Re: Character encoding

2006-11-08 Thread [EMAIL PROTECTED]

Dennis Lee Bieber wrote: > On 7 Nov 2006 11:34:32 -0800, "mp" <[EMAIL PROTECTED]> declaimed the > following in comp.lang.python: > > > I have html document titles with characters like >, , and > > ‡. How do I decode a string with these values in Python? > > > > Wouldn't HTMLParser be suit

Re: Character encoding

2006-11-07 Thread Gabriel Genellina

At Tuesday 7/11/2006 17:10, mp wrote: I'd prefer a more generalized solution which takes care of all possible ampersand characters. I assume that there is code already written which does this. Try the htmlentitydefs module -- Gabriel Genellina Softlab SRL _
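
For readers on Python 3, where htmlentitydefs was renamed to html.entities, a minimal sketch of the generalized decoding asked for above. It uses the standard library's html.unescape (available from Python 3.4); the helper name `decode_entities` and the sample title are made up for illustration.

```python
import html
from html.entities import name2codepoint


def decode_entities(title):
    """Replace named and numeric character references with the characters."""
    return html.unescape(title)


# Hypothetical document title containing three named entities.
decoded = decode_entities("a &gt; b&nbsp;&Dagger;")

# The entity table itself is available if a custom lookup is needed.
dagger = chr(name2codepoint["Dagger"])
```

This handles every entity in the table, so no per-entity replace call is needed.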

Re: Character encoding

2006-11-07 Thread mp

I'd prefer a more generalized solution which takes care of all possible ampersand characters. I assume that there is code already written which does this. Thanks i80and wrote: > I would suggest using string.replace. Simply replace ' ' with ' ' > for each time it occurs. It doesn't take too much

Re: Character encoding

2006-11-07 Thread i80and

I would suggest using string.replace. Simply replace ' ' with ' ' for each time it occurs. It doesn't take too much code. On Nov 7, 1:34 pm, "mp" <[EMAIL PROTECTED]> wrote: > I have html document titles with characters like >, , and > ‡. How do I decode a string with these values in Python? >

Re: character encoding conversion

2004-12-13 Thread "Martin v. Löwis"

Max M wrote: A simple way to try out different encodings in a given order: The loop is fine - although ('UTF-8', 'Latin-1', 'ASCII') is somewhat redundant. The 'ASCII' case is never considered, since Latin-1 effectively works as a catch-all encoding (as all byte sequences can be considered Latin-1
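
The catch-all behaviour of Latin-1 can be checked directly. A small sketch, assuming Python 3 bytes objects (the thread itself predates Python 3); the sample bytes are made up:

```python
# Every possible byte value decodes under Latin-1 without error ...
every_byte = bytes(range(256))
as_latin1 = every_byte.decode("latin-1")

# ... and encodes back to exactly the same bytes.
round_trip = as_latin1.encode("latin-1")

# The catch is that "no error" does not mean "right answer": cp1252
# curly quotes decode silently to C1 control characters under Latin-1.
curly = b"\x93hi\x94"
wrong = curly.decode("latin-1")
right = curly.decode("cp1252")
```

Any encoding listed after Latin-1 in a try-in-order loop is therefore never reached.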

Re: character encoding conversion

2004-12-13 Thread "Martin v. Löwis"

Christian Ergh wrote: Once more, indentation should be correct now, and the 128 is gone too. So, something like this? Yes, something like this. The tricky part is, of course, the fragments which you didn't implement. Also, it might be possible to do this in a for loop, e.g. for encoding in (pag

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

Forgot a part... You need the encoding list: encodings = [ 'utf-8', 'latin-1', 'ascii', 'cp1252', ] Christian Ergh wrote: Dylan wrote: Here's what I'm trying to do: - scrape some html content from various sources The issue I'm running into: - some of the sources have incorrectly e

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

Dylan wrote: Here's what I'm trying to do: - scrape some html content from various sources The issue I'm running into: - some of the sources have incorrectly encoded characters... for example, cp1252 curly quotes that were likely the result of the author copying and pasting content from Word Finally:

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

- snip - def get_encoded(st, encodings): "Returns an encoding that doesn't fail" for encoding in encodings: try: st_encoded = st.decode(encoding) return st_encoded, encoding except UnicodeError: pass -snip- This works fine, but after this
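
The flattened snippet above, laid out as a sketch for Python 3, where decoding starts from bytes. The explicit `(None, None)` return for the case where nothing matches is an addition for illustration; the quoted version falls off the end of the loop and returns None.

```python
def get_encoded(data, encodings):
    """Return (text, encoding) for the first encoding that decodes data."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeError:
            # This candidate failed; try the next one in the list.
            continue
    # No candidate worked (cannot happen if 'latin-1' is in the list).
    return None, None


# Hypothetical inputs: the same word stored as UTF-8 and as Latin-1.
utf8_result = get_encoded(b"caf\xc3\xa9", ["utf-8", "latin-1"])
latin1_result = get_encoded(b"caf\xe9", ["utf-8", "latin-1"])
```

Order matters: strict encodings such as UTF-8 should come first, since a permissive one would accept almost anything.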

Re: character encoding conversion

2004-12-13 Thread Max M

Christian Ergh wrote: A simple way to try out different encodings in a given order: # -*- coding: latin-1 -*- def get_encoded(st, encodings): "Returns an encoding that doesn't fail" for encoding in encodings: try: st_encoded = st.decode(encoding) return st_en

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

Once more, indentation should be correct now, and the 128 is gone too. So, something like this? Chris import urllib2 url = 'www.someurl.com' f = urllib2.urlopen(url) data = f.read() # if it is not in the pagecode, how do i get the encoding of the page? pageencoding = '???' xmlencoding = 'whatever

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

Peter Otten wrote: Steven Bethard wrote: Christian Ergh wrote: flag = true for char in data: if 127 < ord(char) < 128: flag = false if flag: try: data = data.encode('latin-1') except: pass A little OT, but (assuming I got your indentation right[1]) this kind of loop i

Re: character encoding conversion

2004-12-13 Thread Christian Ergh

Martin v. Löwis wrote: Dylan wrote: Things I have tried include encode()/decode() This should work. If you somehow manage to guess the encoding, e.g. guess it as cp1252, then htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace") will give you a file that contains only ASCII charact

Re: character encoding conversion

2004-12-13 Thread Peter Otten

Steven Bethard wrote: > Christian Ergh wrote: >> flag = true >> for char in data: >> if 127 < ord(char) < 128: >> flag = false >> if flag: >> try: >> data = data.encode('latin-1') >> except: >> pass > > A little OT, but (assuming I got your indentation right[1]

Re: character encoding conversion

2004-12-13 Thread Steven Bethard

Christian Ergh wrote: flag = true for char in data: if 127 < ord(char) < 128: flag = false if flag: try: data = data.encode('latin-1') except: pass A little OT, but (assuming I got your indentation right[1]) this kind of loop is exactly what the else clause of a
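
A sketch of the flag loop rewritten with a for/else, assuming Python 3 and assuming the intent of the quoted code was "act only if every character is below 128". The quoted test `127 < ord(char) < 128` can never be true for an integer, so the comparison is simplified here. The function name is made up.

```python
def is_seven_bit(data):
    """Return True if every character in data has a code point below 128."""
    for char in data:
        if ord(char) > 127:
            # Found a non-ASCII character; leave the loop early.
            break
    else:
        # Runs only when the loop finished without hitting break.
        return True
    return False
```

The else branch replaces the separate flag variable: it runs exactly when no break occurred.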

Re: character encoding conversion

2004-12-12 Thread "Martin v. Löwis"

Christian Ergh wrote: - it works with the characters i mentioned It does. - what encoding do you have in the end US-ASCII - and how exactly are you doing all this? All with somestring.decode() or... Can you please give an example for these 7 steps? I could, but I don't have the time - just try to

Re: character encoding conversion

2004-12-12 Thread Christian Ergh

Martin v. Löwis wrote: Dylan wrote: Things I have tried include encode()/decode() This should work. If you somehow manage to guess the encoding, e.g. guess it as cp1252, then htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace") will give you a file that contains only ASCII charact

Re: character encoding conversion

2004-12-12 Thread "Martin v. Löwis"

Dylan wrote: Things I have tried include encode()/decode() This should work. If you somehow manage to guess the encoding, e.g. guess it as cp1252, then htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace") will give you a file that contains only ASCII characters, and character refer
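
The decode/encode chain above, sketched for Python 3, where the downloaded page is a bytes object. The sample bytes are made up: 0x93 and 0x94 are the cp1252 curly quotes mentioned in the question.

```python
# Hypothetical scraped fragment containing cp1252 curly quotes and a
# copyright sign.
raw = b"\x93quoted\x94 \xa9"

# Step 1: interpret the bytes using the guessed encoding.
text = raw.decode("cp1252")

# Step 2: write pure ASCII, turning everything else into numeric
# character references that a browser will render correctly.
ascii_bytes = text.encode("us-ascii", "xmlcharrefreplace")
```

The result is safe to embed in an HTML page regardless of the charset the page declares.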

31 matches
