Martin v. Löwis wrote:
> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.
Thanks. This and:
http://en.wikipedia.org/wiki/UTF-8
solved my problem with understanding the encoding.
Anton
Proof that I understand it now (anyone, please prove me wrong if you can):
from z
Anton Vredegoor wrote:
>> So if that is the case: What is the problem then? If you interpret
>> the document as cp1252, and it contains \x93 and \x94, what is
>> it that you don't like about that? In yet other words: what actions
>> are you performing, what are the results you expect to get, and
>>
Martin v. Löwis wrote:
> So if that is the case: What is the problem then? If you interpret
> the document as cp1252, and it contains \x93 and \x94, what is
> it that you don't like about that? In yet other words: what actions
> are you performing, what are the results you expect to get, and
> wha
Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > Anton Vredegoor wrote:
> >> In fact there are a lot of printable things that haven't got a text
> >> attribute, for example some items with tag ()s.
> >
> > In my sample file I see , is that what you're talking
> > about? Since my file is small I ca
Anton Vredegoor wrote:
> Anton Vredegoor wrote:
>
> > So, probably yes. If it doesn't have a text attribute if you iterate
> > over it using OOopy for example:
>
> Sorry about that, I meant if the text attribute is None, but there *is*
> some text.
OK, I think I understand what you're talking ab
Anton Vredegoor wrote:
> So, probably yes. If it doesn't have a text attribute if you iterate
> over it using OOopy for example:
Sorry about that, I meant if the text attribute is None, but there *is*
some text.
Anton
--
http://mail.python.org/mailman/listinfo/python-list
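The case described above, where the text attribute is None but there *is* some text, matches how ElementTree-style APIs split text between an element's text and tail. A minimal sketch, assuming the standard xml.etree.ElementTree module (OOopy may behave differently); the markup and the <s/> tag are made up for illustration and are not taken from the real document:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <s/> stands in for an empty tag inside a paragraph.
para = ET.fromstring("<p>one<s/>two</p>")
marker = para[0]

# The empty element has no text of its own ...
marker_text = marker.text        # None
# ... but the text that follows it is stored in its tail.
marker_tail = marker.tail        # 'two'

# itertext() walks text and tail in document order, so nothing is lost.
full_text = "".join(para.itertext())
print(repr(marker_text), repr(marker_tail), repr(full_text))
```

Iterating over elements and reading only .text would skip "two" here; collecting .tail as well (or using itertext() on the parent) picks it up.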
Serge Orlov wrote:
> Anton Vredegoor wrote:
>> In fact there are a lot of printable things that haven't got a text
>> attribute, for example some items with tag ()s.
>
> In my sample file I see , is that what you're talking
> about? Since my file is small I can say for sure this tag represents
> t
Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > I extracted content.xml from a test file and the header is:
> >
> >
> > So any xml library should handle it just fine, without you trying to
> > guess the encoding.
>
> Yes my header also says UTF-8. However some kind person sent me an
> e-mail sta
Richard Brodie wrote:
> "Anton Vredegoor" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> Yes my header also says UTF-8. However some kind person sent me an e-mail
>> stating that
>> since I am getting \x94 and such output when using repr (even if str is
>> giving correct
Anton Vredegoor wrote:
> In fact there are a lot of printable things that haven't got a text
> attribute, for example some items with tag ()s.
In my sample file I see , is that what you're talking
about? Since my file is small I can say for sure this tag represents
two space characters.
JM>> No, not quite. If you saw \x94 in the repr() output, but it looked
"OK" when displayed using str(), then the only reasonable hypotheses are
(a) the data was in an 8-bit string, presumably encoded as cp1252
(definitely NOT UTF-8), rather than a Unicode string (b) yo
"Anton Vredegoor" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Yes my header also says UTF-8. However some kind person sent me an e-mail
> stating that
> since I am getting \x94 and such output when using repr (even if str is
> giving correct
> output) there could be some pr
Serge Orlov wrote:
> I extracted content.xml from a test file and the header is:
>
>
> So any xml library should handle it just fine, without you trying to
> guess the encoding.
Yes my header also says UTF-8. However some kind person sent me an
e-mail stating that since I am getting \x94 and s
4 codes were left inside the document.
But that was an *artifact*, because if one prints something using
s.__repr__() as is used for example when printing a list of strings
(duh) the output is not the same as when one prints with 'print s'. I
guess what is called then is str(s).
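The difference between the two ways of printing can be sketched as follows. This is written for Python 3, where the 8-bit string of the original discussion corresponds to a bytes object; the sample bytes are made up for illustration:

```python
# cp1252 bytes for a pair of curly double quotes around a word (made-up sample).
raw = b"\x93quoted\x94"

# repr() escapes the non-ASCII bytes, so \x93 and \x94 show up literally.
raw_repr = repr(raw)

# Printing a list calls repr() on each item, not str().
list_repr = str([raw])

# Decoding with the right codec gives the actual characters.
text = raw.decode("cp1252")

print(raw_repr)
print(list_repr)
print(ascii(text))   # ascii() keeps the output safe on any console
```

The escapes are only a display form produced by repr(); the data itself contains single bytes, not four-character codes.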
On 27/04/2006 12:49 AM, Anton Vredegoor wrote:
> Fredrik Lundh wrote:
>
>> Anton Vredegoor wrote:
>>
>>> I'm trying to import text from an open office document (save as .sxw and
>>> read the data from content.xml inside the sxw-archive using
>>> elementtree and such tools).
>>>
>>> The encoding t
Anton Vredegoor wrote:
>> Not sure I understand the question. If you process data in cp1252,
>> then \x93 and \x94 are legal characters, and the Python codec should
>> support them just fine.
>
> Tell that to the guys from open-office.
Ok, I'll rephrase: Can you please explain your problem again,
Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfe
Martin v. Löwis wrote:
> Not sure I understand the question. If you process data in cp1252,
> then \x93 and \x94 are legal characters, and the Python codec should
> support them just fine.
Tell that to the guys from open-office.
Anton
Anton Vredegoor wrote:
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfect because there are still characters
> in it like \x93 or \x94. Has anyone handled this before? I'd rather not
> reinvent the wheel and start translating strings 'by hand'.
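For reference, \x93 and \x94 are the cp1252 bytes for the left and right curly double quotes, so no translation by hand is needed once the right codec is used. A small Python 3 sketch; the helper name decodes_as_utf8 is invented for this example:

```python
LEFT, RIGHT = "\u201c", "\u201d"   # left and right curly double quotes

# The same two characters as bytes in each encoding.
cp1252_bytes = (LEFT + RIGHT).encode("cp1252")
utf8_bytes = (LEFT + RIGHT).encode("utf-8")

def decodes_as_utf8(data):
    """Return True if data is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(cp1252_bytes)                   # one byte per quote
print(utf8_bytes)                     # three bytes per quote
print(decodes_as_utf8(cp1252_bytes))  # cp1252 quote bytes are not valid UTF-8
```

If a document really is UTF-8, the quotes arrive as three-byte sequences; seeing single \x93 and \x94 bytes suggests the data has already been encoded to cp1252 somewhere along the way.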
Fredrik Lundh wrote:
> Anton Vredegoor wrote:
>
>> I'm trying to import text from an open office document (save as .sxw and
>> read the data from content.xml inside the sxw-archive using
>> elementtree and such tools).
>>
>> The encoding that gives me the least problems seems to be cp1252,
>> ho
Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfec
I'm trying to import text from an open office document (save as .sxw and
read the data from content.xml inside the sxw-archive using
elementtree and such tools).
The encoding that gives me the least problems seems to be cp1252,
however it's not completely perfect because there are still chara