Carsten Haese:
> If you want to convey an arbitrary sequence of bytes as if they were
> characters, you need to pick a character encoding that can handle an
> arbitrary sequence of bytes. utf-8 can not do that. ISO-8859-1 can, but
> you need to specify the encoding explicitly. Observe what happens
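The quoted message is cut off here. The following is not the original demonstration, only a minimal sketch of the claim under stated assumptions: it is written for Python 3 (the thread used Python 2), and the names `raw`, `round_trip_ok` and `utf8_ok` are made up for illustration.

```python
# Python 3 sketch; all variable names here are illustrative only.
raw = bytes(range(256))  # every possible byte value, i.e. "arbitrary" bytes

# ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
text = raw.decode("iso-8859-1")
round_trip_ok = text.encode("iso-8859-1") == raw

# UTF-8 has structural rules (lead bytes, continuation bytes), so an
# arbitrary byte sequence is usually rejected.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_trip_ok, utf8_ok)
```

Note that the ISO-8859-1 round trip only preserves the bytes; it says nothing about whether they mean what the sender intended.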
On Fri, 2007-05-25 at 17:30 +0300, Maksim Kasimov wrote:
> I insist: my message is correct and does not contradict any point of the w3.org
> XML specification.
The fact that you believe this so strongly and we disagree just as
strongly indicates a fundamental misunderstanding. Your fundamental
misund
Jarek Zgoda:
>
> No, it is not part of a string. It's part of a byte stream, split in the
> middle of a multibyte-encoded character.
>
> You can't take only the dot from a small letter "i" and ask the parser to
> treat it as a complete "i".
>
... I know it :))
Can you propose something to solve it? ;)
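One possible approach, offered as a sketch rather than as what the thread finally settled on: if the byte stream arrives in pieces, an incremental decoder can hold back an incomplete multibyte sequence until the next piece arrives. This assumes Python 3 and that the rest of the stream does eventually arrive; `decode_chunks` is a hypothetical helper name.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Yield decoded text for each byte chunk, carrying an incomplete
    multibyte sequence over to the next chunk (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        yield decoder.decode(chunk)
    # final=True raises UnicodeDecodeError if bytes are still pending,
    # i.e. if the stream really did end in the middle of a character.
    yield decoder.decode(b"", final=True)

# Six Cyrillic letters, two bytes each in UTF-8.
data = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")
chunks = [data[:3], data[3:]]  # split in the middle of the second character
result = "".join(decode_chunks(chunks))
```

Only complete text produced this way should be handed to the XML parser; a piece that ends in half a character is not decodable on its own.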
Maksim Kasimov wrote:
>> 'utf8' codec can't decode bytes in position 176-177: invalid data
> iMessage[176:178]
>> '\xd1]'
>>
>> And that's your problem. In general you can't just truncate a utf-8
>> encoded string anywhere and expect the result to be valid utf-8. The
>> \xd1 at the very e
Richard Brodie writes:
> "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> Web browsers are in the very business of reasonably rendering
>> ill-formed mark-up. It's one of the things that makes
>> implementing a browser take forever. ;)
>
> For HTML, yes. it accept
[EMAIL PROTECTED] :
> You need to explicitly convert the string of UTF8 encoded bytes to a
> Unicode string before parsing e.g.
> unicodestring = unicode(encodedbytes, 'utf8')
It is only a part of a string, not the whole string; I wrote that before.
That means that the content can not be converted t
Carsten Haese:
> On Fri, 2007-05-25 at 04:03 -0700, sim.sim wrote:
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 176-177:
> invalid data
iMessage[176:178]
> '\xd1]'
>
> And that's your problem. In general you can't just truncate a utf-8
> encoded string anywhere and ex
Richard Brodie wrote:
> For HTML, yes. it accepts all sorts of garbage, like most
> browsers; I've never, before now, seen it accept an invalid
> XML document though.
It *could* depend on Content-Type. I've seen that Firefox treats XHTML
as HTML (i.e. not trying to validate it) if you set Co
"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Web browsers are in the very business of reasonably rendering
> ill-formed mark-up. It's one of the things that makes
> implementing a browser take forever. ;)
For HTML, yes. it accepts all sorts of garbage, like most
On 2007-05-25, Richard Brodie <[EMAIL PROTECTED]> wrote:
>
> "Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>> How did you verify that it is well formed?
>
> It appears to have a more fundamental problem, which is
> that it isn't correctly encoded (pre
"Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> How did you verify that it is well formed?
It appears to have a more fundamental problem, which is
that it isn't correctly encoded (presumably because the
CDATA is truncated in mid-character). I'm surpris
On May 25, 12:03 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > In <[EMAIL PROTECTED]>, sim.sim wrote:
> > > Below is the code that tries to parse a well-formed XML, but it fails
> > > with error message:
> > > "not well-formed (
On Fri, 2007-05-25 at 04:03 -0700, sim.sim wrote:
> my CDATA-section contains only symbols in the range specified for
> Char:
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
>
>
> filter(lambda x: ord(x) not in range(0x20, 0xD7FF), iMessage)
That test is me
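For comparison, here is a sketch of a check that works on decoded characters rather than on individual bytes. It assumes Python 3 and that `iMessage` is a UTF-8 byte string, as elsewhere in the thread; `is_xml_char` and `invalid_xml_chars` are hypothetical helper names, and the ranges are those of the XML 1.0 Char production.

```python
def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def invalid_xml_chars(raw, encoding="utf-8"):
    """Decode raw bytes first, then list the characters that are not
    XML Chars. Raises UnicodeDecodeError if raw is not valid in the
    given encoding, e.g. when it was cut in the middle of a character."""
    text = raw.decode(encoding)
    return [ch for ch in text if not is_xml_char(ord(ch))]
```

With this ordering, a truncated sequence such as `b'\xd1]'` fails at the decoding step, before any range check is made.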
On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, sim.sim wrote:
> > Below is the code that tries to parse a well-formed XML, but it fails
> > with error message:
> > "not well-formed (invalid token): line 3, column 85"
>
> How did you verify that it is
In <[EMAIL PROTECTED]>, sim.sim wrote:
> Below is the code that tries to parse a well-formed XML, but it fails
> with error message:
> "not well-formed (invalid token): line 3, column 85"
How did you verify that it is well formed? `xmllint` barfs on it too.
> The "problem" within CDATA-section:
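A quick way to ask the same question from Python, as a sketch: let the parser itself decide. This assumes Python 3 and the standard-library `xml.dom.minidom`; `is_well_formed` is a hypothetical helper name, and the two sample documents are made up to mirror the `'\xd1]'` case from the thread, not taken from the original message.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(xml_bytes):
    """Return True if expat (via minidom) accepts the document."""
    try:
        minidom.parseString(xml_bytes)
        return True
    except ExpatError:
        return False

# U+0441 is encoded in UTF-8 as the two bytes D1 81.
good = ('<?xml version="1.0" encoding="utf-8"?>'
        '<m><![CDATA[\u0441]]></m>').encode("utf-8")
# Drop the second byte, leaving a lone lead byte followed by ']'.
bad = good.replace("\u0441".encode("utf-8"), b"\xd1")

print(is_well_formed(good), is_well_formed(bad))
```

Being inside a CDATA section does not exempt the bytes from having to be valid in the declared encoding.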
On 22 May, 16:45, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> I'm having trouble using minidom:
>
> # I have a string (XML) with a CDATA section, and the section includes
> "\r\n":
> iInStr = '\n\n'
>
> # After I create the DOM object, I get the value of "Data" without "\r\n"
>
> from xml.dom impo
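The behavior described can be reproduced with a small sketch. This is not the poster's original code (which is cut off above); the `Data` element content is made up, and Python 3 is assumed. XML 1.0 (section 2.11) requires parsers to normalize a literal "\r\n" to "\n" before handing text to the application, so this is not specific to minidom. A carriage return written as the character reference `&#13;` is not subject to that normalization, though character references are not recognized inside CDATA sections.

```python
from xml.dom import minidom

def element_text(xml_bytes):
    """Concatenate the text/CDATA children of the document element."""
    doc = minidom.parseString(xml_bytes)
    return "".join(node.data for node in doc.documentElement.childNodes)

# A literal CR LF inside CDATA is normalized to a single LF by the parser.
in_cdata = element_text(b"<Data><![CDATA[line1\r\nline2]]></Data>")

# A carriage return given as a character reference survives parsing.
with_charref = element_text(b"<Data>line1&#13;\nline2</Data>")

print(repr(in_cdata), repr(with_charref))
```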
16 matches