Dave Angel writes:
> But I wanted to comment on the (c) remark. If you're in the US,
> that's the wrong abbreviation for copyright. The only recognized
> abbreviation is (copr).
More reading on this:
http://en.wikipedia.org/wiki/Universal_Copyright_Convention
http://en.wikipedia.org/
Robert Dailey wrote:
Hello,
I'm loading a file via open() in Python 3.1 and I'm getting the
following error when I try to print the contents of the file that I
obtained through a call to read():
UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
position 1650: character maps t
On Thu, Aug 6, 2009 at 12:41 PM, Robert Dailey wrote:
> On Aug 6, 11:31 am, "Richard Brodie" wrote:
>> "Robert Dailey" wrote in message
>>
>> news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com...
>>
>> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
>> >
On Aug 6, 2009, at 3:14 PM, Martin v. Löwis wrote:
As a side note, you should probably use something other than "file" for
the parameter name in GetFileContentsAsString() since file() is a Python
function.
Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
[GCC 4.3.3] on linux2
Type "help
> As a side note, you should probably use something other than "file" for
> the parameter name in GetFileContentsAsString() since file() is a Python
> function.
Python 3.1.1a0 (py3k:74094, Jul 19 2009, 13:39:42)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more inform
On Thu, 06 Aug 2009 09:14:08 -0700, Robert Dailey wrote:
> I'm loading a file via open() in Python 3.1 and I'm getting the
> following error when I try to print the contents of the file that I
> obtained through a call to read():
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\xa
"Robert Dailey" wrote in message
news:f64f9830-c416-41b1-a510-c1e486271...@g19g2000vbi.googlegroups.com...
> As you can see, I am trying to load the file with encoding 'cp1252'
> which, according to the python 3.1 docs, translates to windows-1252. I
> also tried 'latin_1', which translates to I
On Aug 6, 2009, at 12:41 PM, Robert Dailey wrote:
On Aug 6, 11:31 am, "Richard Brodie" wrote:
"Robert Dailey" wrote in message
news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com...
UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
position 1650: c
On Thu, 2009-08-06 at 09:14 -0700, Robert Dailey wrote:
> Hello,
>
> I'm loading a file via open() in Python 3.1 and I'm getting the
> following error when I try to print the contents of the file that I
> obtained through a call to read():
>
> UnicodeEncodeError: 'charmap' codec can't encode char
On Aug 6, 11:31 am, "Richard Brodie" wrote:
> "Robert Dailey" wrote in message
>
> news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com...
>
> > UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> > position 1650: character maps to
>
> > The file is defined a
"Robert Dailey" wrote in message
news:29ab0981-b95d-4435-91bd-a7a520419...@b15g2000yqd.googlegroups.com...
> UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
> position 1650: character maps to
>
> The file is defined as ASCII.
That's the problem: ASCII is a seven bit code.
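To illustrate the seven-bit point, here is a minimal Python 3 sketch. The helper name can_encode is made up for this example, and cp437 is only an assumed stand-in for a console code page; the thread does not say which one was in use. The copyright sign U+00A9 has code point 169, above the ASCII limit of 127, so an ASCII encoder rejects it while cp1252 and UTF-8 accept it.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


copyright_sign = "\xa9"  # U+00A9, code point 169

print(ord(copyright_sign) > 127)            # True: outside 7-bit ASCII
print(can_encode(copyright_sign, "ascii"))   # False
print(can_encode(copyright_sign, "cp1252"))  # True
print(can_encode(copyright_sign, "utf-8"))   # True
# Assumption: cp437 stands in for a console code page without this character.
print(can_encode(copyright_sign, "cp437"))   # False
```

Since the error in the thread is a UnicodeEncodeError raised while printing, the encoding that matters may be the one used for output rather than the one passed to open().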
On Aug 6, 2009, at 12:14 PM, Robert Dailey wrote:
Hello,
I'm loading a file via open() in Python 3.1 and I'm getting the
following error when I try to print the contents of the file that I
obtained through a call to read():
UnicodeEncodeError: 'charmap' codec can't encode character '\xa9' in
mp wrote:
> I have html document titles with characters like &gt;, &nbsp;, and
> ‡. How do I decode a string with these values in Python?
>
> Thanks
>
>
This is definitely the most frequently asked question. It comes up about once a week.
The stream-editing way is like this:
>>> import SE
>>> HTM_Decoder = SE.SE ('htm2is
Dennis Lee Bieber wrote:
> On 7 Nov 2006 11:34:32 -0800, "mp" <[EMAIL PROTECTED]> declaimed the
> following in comp.lang.python:
>
> > I have html document titles with characters like &gt;, &nbsp;, and
> > ‡. How do I decode a string with these values in Python?
> >
>
> Wouldn't HTMLParser be suit
At Tuesday 7/11/2006 17:10, mp wrote:
I'd prefer a more generalized solution which takes care of all possible
ampersand characters. I assume that there is code already written which
does this.
Try the htmlentitydefs module
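
For readers on Python 3, a rough equivalent might look like the sketch below. This is an assumption about a newer Python than the one in this thread: htmlentitydefs was renamed html.entities, and html.unescape() (available from Python 3.4) decodes named and numeric references in one call. The sample title string is invented for illustration.

```python
import html
from html.entities import name2codepoint

# Invented sample title containing named and numeric character references.
title = "a &gt; b&nbsp;&amp; c &#8225;"

decoded = html.unescape(title)
print(repr(decoded))

# The entity table itself maps names to code points.
print(name2codepoint["nbsp"])  # 160
print(name2codepoint["gt"])    # 62
```

Note that &nbsp; decodes to U+00A0 (a no-break space), not to an ordinary space.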
--
Gabriel Genellina
Softlab SRL
_
I'd prefer a more generalized solution which takes care of all possible
ampersand characters. I assume that there is code already written which
does this.
Thanks
i80and wrote:
> I would suggest using string.replace. Simply replace '&nbsp;' with ' '
> for each time it occurs. It doesn't take too much
I would suggest using string.replace. Simply replace '&nbsp;' with ' '
for each time it occurs. It doesn't take too much code.
On Nov 7, 1:34 pm, "mp" <[EMAIL PROTECTED]> wrote:
> I have html document titles with characters like &gt;, &nbsp;, and
> ‡. How do I decode a string with these values in Python?
>
Max M wrote:
A simple way to try out different encodings in a given order:
The loop is fine - although ('UTF-8', 'Latin-1', 'ASCII') is
somewhat redundant. The 'ASCII' case is never considered, since
Latin-1 effectively works as a catch-all encoding (as all byte
sequences can be considered Latin-1
Christian Ergh wrote:
Once more, indentation should be correct now, and the 128 is gone too. So,
something like this?
Yes, something like this. The tricky part is, of course, the
fragments which you didn't implement.
Also, it might be possible to do this in a for loop, e.g.
for encoding in (pag
Forgot a part... You need the encoding list:
encodings = [
'utf-8',
'latin-1',
'ascii',
'cp1252',
]
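
A minimal Python 3 sketch of trying such a list in order follows. It is an adaptation, not the code from this thread: the function name decode_first_match is hypothetical, and it works on bytes objects, which Python 2's str.decode() did not require. It also shows why the entries after 'latin-1' are never reached: every byte sequence decodes as Latin-1.

```python
def decode_first_match(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes raw, else None."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes, try the next one
    return None


encodings = ['utf-8', 'latin-1', 'ascii', 'cp1252']

print(decode_first_match(b"caf\xc3\xa9", encodings))  # ('café', 'utf-8')
print(decode_first_match(b"caf\xe9", encodings))      # ('café', 'latin-1')

# Latin-1 accepts all 256 byte values, so 'ascii' and 'cp1252' are dead entries here.
all_bytes = bytes(range(256))
print(decode_first_match(all_bytes, encodings)[1])    # latin-1
```

If cp1252 should win over Latin-1 for bytes such as curly quotes, it would have to come earlier in the list.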
Christian Ergh wrote:
Dylan wrote:
Here's what I'm trying to do:
- scrape some html content from various sources
The issue I'm running into:
- some of the sources have incorrectly e
Dylan wrote:
Here's what I'm trying to do:
- scrape some html content from various sources
The issue I'm running into:
- some of the sources have incorrectly encoded characters... for
example, cp1252 curly quotes that were likely the result of the author
copying and pasting content from Word
Finally:
- snip -
def get_encoded(st, encodings):
    "Returns an encoding that doesn't fail"
    for encoding in encodings:
        try:
            st_encoded = st.decode(encoding)
            return st_encoded, encoding
        except UnicodeError:
            pass
-snip-
This works fine, but after this
Christian Ergh wrote:
A simple way to try out different encodings in a given order:
# -*- coding: latin-1 -*-
def get_encoded(st, encodings):
    "Returns an encoding that doesn't fail"
    for encoding in encodings:
        try:
            st_encoded = st.decode(encoding)
            return st_en
Once more, indentation should be correct now, and the 128 is gone too. So,
something like this?
Chris
import urllib2
url = 'http://www.someurl.com'
f = urllib2.urlopen(url)
data = f.read()
# if it is not in the pagecode, how do i get the encoding of the page?
pageencoding = '???'
xmlencoding = 'whatever
Peter Otten wrote:
Steven Bethard wrote:
Christian Ergh wrote:
flag = true
for char in data:
    if 127 < ord(char) < 128:
        flag = false
if flag:
    try:
        data = data.encode('latin-1')
    except:
        pass
A little OT, but (assuming I got your indentation right[1]) this kind of
loop i
Martin v. Löwis wrote:
Dylan wrote:
Things I have tried include encode()/decode()
This should work. If you somehow manage to guess the encoding,
e.g. guess it as cp1252, then
htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace")
will give you a file that contains only ASCII charact
Steven Bethard wrote:
> Christian Ergh wrote:
>> flag = true
>> for char in data:
>>     if 127 < ord(char) < 128:
>>         flag = false
>> if flag:
>>     try:
>>         data = data.encode('latin-1')
>>     except:
>>         pass
>
> A little OT, but (assuming I got your indentation right[1]
Christian Ergh wrote:
flag = true
for char in data:
    if 127 < ord(char) < 128:
        flag = false
if flag:
    try:
        data = data.encode('latin-1')
    except:
        pass
A little OT, but (assuming I got your indentation right[1]) this kind of
loop is exactly what the else clause of a
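
The for/else idiom referred to here can be sketched as follows. This is an illustrative Python 3 rewrite, not the poster's code; the function name is_seven_bit and the "greater than 127" test are assumptions about what the flag loop was meant to check. The else clause runs only when the loop finishes without hitting break, so no flag variable is needed.

```python
def is_seven_bit(data):
    """Return True if no character in data has a code point above 127."""
    for char in data:
        if ord(char) > 127:
            break        # non-ASCII character found; the else clause is skipped
    else:
        return True      # loop ran to completion without a break
    return False


print(is_seven_bit("plain text"))  # True
print(is_seven_bit("caf\xe9"))     # False
```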
Christian Ergh wrote:
- it works with the characters i mentioned
It does.
- what encoding do you have in the end
US-ASCII
- and how exactly are you doing all this? All with somestring.decode()
or... Can you please give an example for these 7 steps?
I could, but I don't have the time - just try to
Dylan wrote:
Things I have tried include encode()/decode()
This should work. If you somehow manage to guess the encoding,
e.g. guess it as cp1252, then
htmlstring.decode("cp1252").encode("us-ascii", "xmlcharrefreplace")
will give you a file that contains only ASCII characters, and
character refer
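
A small Python 3 sketch of the decode-then-encode step quoted above. The sample bytes are invented for illustration, and in Python 3 the input has to be a bytes object rather than the str used in the thread. Characters outside ASCII come out as numeric character references.

```python
# Invented cp1252 sample: curly quotes (0x93, 0x94), en dash (0x96), e-acute (0xe9).
raw = b"\x93quoted\x94 \x96 caf\xe9"

text = raw.decode("cp1252")
ascii_bytes = text.encode("us-ascii", "xmlcharrefreplace")

print(ascii_bytes)  # b'&#8220;quoted&#8221; &#8211; caf&#233;'
```

The result contains only ASCII bytes, so it survives being written out under any ASCII-compatible encoding.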