Martin v. Löwis wrote:
> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.
Thanks. This and:
http://en.wikipedia.org/wiki/UTF-8
solved my problem with understanding the encoding.
Anton
proof that I understand it now (please anyone, prove me wrong if you can):
from z
Anton Vredegoor wrote:
>> So if that is the case: What is the problem then? If you interpret
>> the document as cp1252, and it contains \x93 and \x94, what is
>> it that you don't like about that? In yet other words: what actions
>> are you performing, what are the results you expect to get, and
>>
Martin v. Löwis wrote:
> So if that is the case: What is the problem then? If you interpret
> the document as cp1252, and it contains \x93 and \x94, what is
> it that you don't like about that? In yet other words: what actions
> are you performing, what are the results you expect to get, and
> wha
Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > Anton Vredegoor wrote:
> >> In fact there are a lot of printable things that haven't got a text
> >> attribute, for example some items with tag ()s.
> >
> > In my sample file I see , is that what you're talking
> > about? Since my file is small I ca
Anton Vredegoor wrote:
> Anton Vredegoor wrote:
>
> > So, probably yes. If it doesn't have a text attribute if you iterate
> > over it using OOopy for example:
>
> Sorry about that, I meant if the text attribute is None, but there *is*
> some text.
OK, I think I understand what you're talking ab
Anton Vredegoor wrote:
> So, probably yes. If it doesn't have a text attribute if you iterate
> over it using OOopy for example:
Sorry about that, I meant if the text attribute is None, but there *is*
some text.
Anton
--
http://mail.python.org/mailman/listinfo/python-list
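The case described above (an element whose text attribute is None although text visibly follows it) can be sketched as follows. This is a minimal illustration, assuming Python 3 and the standard-library xml.etree.ElementTree; the markup is a made-up stand-in, not taken from the poster's document. In ElementTree, text that follows a child's closing tag is stored on that child's tail attribute rather than on anyone's text attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only: <s/> is an empty element
# that is followed by text, so its .text is None but its .tail is not.
root = ET.fromstring("<p>one<s/>two<b>three</b>four</p>")

# Walk every element and show where each piece of text is stored.
for elem in root.iter():
    print(elem.tag, repr(elem.text), repr(elem.tail))

# itertext() yields .text and .tail pieces in document order,
# so nothing is lost when an element's .text is None.
full_text = "".join(root.itertext())
print(full_text)
```

Collecting text with itertext(), or reading both text and tail, avoids dropping text that sits after an empty element.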
Serge Orlov wrote:
> Anton Vredegoor wrote:
>> In fact there are a lot of printable things that haven't got a text
>> attribute, for example some items with tag ()s.
>
> In my sample file I see , is that what you're talking
> about? Since my file is small I can say for sure this tag represents
> t
Anton Vredegoor wrote:
> Serge Orlov wrote:
>
> > I extracted content.xml from a test file and the header is:
> >
> >
> > So any xml library should handle it just fine, without you trying to
> > guess the encoding.
>
> Yes my header also says UTF-8. However some kind person sent me an
> e-mail sta
Richard Brodie wrote:
> "Anton Vredegoor" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> Yes my header also says UTF-8. However some kind person sent me an e-mail
>> stating that
>> since I am getting \x94 and such output when using repr (even if str is
>> giving correct
Anton Vredegoor wrote:
> In fact there are a lot of printable things that haven't got a text
> attribute, for example some items with tag ()s.
In my sample file I see , is that what you're talking
about? Since my file is small I can say for sure this tag represents
two space characters.
On 28/04/2006 9:21 PM, Anton Vredegoor wrote:
> Serge Orlov wrote:
>
>> I extracted content.xml from a test file and the header is:
>>
>>
>> So any xml library should handle it just fine, without you trying to
>> guess the encoding.
>
> Yes my header also says UTF-8. However some kind person sen
"Anton Vredegoor" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Yes my header also says UTF-8. However some kind person sent me an e-mail
> stating that
> since I am getting \x94 and such output when using repr (even if str is
> giving correct
> output) there could be some pr
Serge Orlov wrote:
> I extracted content.xml from a test file and the header is:
>
>
> So any xml library should handle it just fine, without you trying to
> guess the encoding.
Yes my header also says UTF-8. However some kind person sent me an
e-mail stating that since I am getting \x94 and s
John Machin wrote:
> Firstly, this should be 'content.xml', not 'contents.xml'.
Right, the code doesn't do *anything* :-( Thanks for pointing that out.
At least it doesn't do much harm either :-|
> Secondly, as pointed out by Sergei, the data is encoded by OOo as UTF-8
> e.g. what is '\x94' in
On 27/04/2006 12:49 AM, Anton Vredegoor wrote:
> Fredrik Lundh wrote:
>
>> Anton Vredegoor wrote:
>>
>>> I'm trying to import text from an open office document (save as .sxw and
>>> read the data from content.xml inside the sxw-archive using
>>> elementtree and such tools).
>>>
>>> The encoding t
Anton Vredegoor wrote:
>> Not sure I understand the question. If you process data in cp1252,
>> then \x93 and \x94 are legal characters, and the Python codec should
>> support them just fine.
>
> Tell that to the guys from open-office.
Ok, I'll rephrase: Can you please explain your problem again,
Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfe
Martin v. Löwis wrote:
> Not sure I understand the question. If you process data in cp1252,
> then \x93 and \x94 are legal characters, and the Python codec should
> support them just fine.
Tell that to the guys from open-office.
Anton
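The point about cp1252 can be checked directly. This is a minimal sketch, assuming Python 3 (the thread predates it); the sample bytes are invented for illustration. In cp1252 the bytes 0x93 and 0x94 are the left and right curly double quotes, and the same characters take three bytes each in UTF-8, so the codec has to match how the bytes were actually written.

```python
# Invented sample: the word "quoted" wrapped in cp1252 curly quotes.
raw_cp1252 = b"\x93quoted\x94"

# Decoding with the matching codec gives U+201C and U+201D.
text = raw_cp1252.decode("cp1252")
print(repr(text))

# The same text encoded as UTF-8 uses three bytes per curly quote.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Decoding UTF-8 bytes with a mismatched single-byte codec does not
# fail, it silently produces different (wrong) characters.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))

# Decoding with the right codec restores the original text.
roundtrip = utf8_bytes.decode("utf-8")
```

Seeing \x93 or \x94 in a repr of undecoded bytes is therefore not an error in itself; it only says which bytes are present, not which encoding they belong to.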
Anton Vredegoor wrote:
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfect because there are still characters
> in it like \x93 or \x94. Has anyone handled this before? I'd rather not
> reinvent the wheel and start translating strings 'by hand'.
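One way to avoid translating strings by hand is to pass the raw bytes of content.xml to the XML parser and let it honour the encoding declared in the XML header. The following is a rough sketch under stated assumptions: Python 3, the standard-library zipfile and xml.etree.ElementTree modules, a hypothetical helper named read_content_xml, and a tiny in-memory archive standing in for a real .sxw file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_content_xml(archive):
    """Parse content.xml from a zip archive (path or file-like object).

    Hypothetical helper for illustration; not code from the thread.
    """
    with zipfile.ZipFile(archive) as zf:
        data = zf.read("content.xml")  # raw bytes, not decoded by hand
    # Given bytes, the parser reads the encoding from the XML declaration.
    return ET.fromstring(data)


# Build a small stand-in archive in memory (made-up content).
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<doc><p>\u201chello\u201d</p></doc>"
).encode("utf-8")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", sample)
buf.seek(0)

root = read_content_xml(buf)
text = root.find("p").text
print(repr(text))
```

The resulting text is already a Unicode string, so any cp1252 or other byte encoding only comes into play when writing the text back out.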
Fredrik Lundh wrote:
> Anton Vredegoor wrote:
>
>> I'm trying to import text from an open office document (save as .sxw and
>> read the data from content.xml inside the sxw-archive using
>> elementtree and such tools).
>>
>> The encoding that gives me the least problems seems to be cp1252,
>> ho
Anton Vredegoor wrote:
> I'm trying to import text from an open office document (save as .sxw and
> read the data from content.xml inside the sxw-archive using
> elementtree and such tools).
>
> The encoding that gives me the least problems seems to be cp1252,
> however it's not completely perfec