Re: just a bug (done)

2007-05-25 Thread Maksim Kasimov
Carsten Haese: > If you want to convey an arbitrary sequence of bytes as if they were > characters, you need to pick a character encoding that can handle an > arbitrary sequence of bytes. utf-8 can not do that. ISO-8859-1 can, but > you need to specify the encoding explicitly. Observe what happens
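
The difference between the two encodings can be seen with a small sketch in modern Python 3 syntax. The thread itself used Python 2, and the variable names here are illustrative only. ISO-8859-1 maps each of the 256 byte values to a code point, so any byte sequence decodes; UTF-8 rejects sequences that break its structure.

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# ISO-8859-1 maps each byte to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
text = raw.decode("iso-8859-1")
round_trip_ok = text.encode("iso-8859-1") == raw

# UTF-8 has structural rules (lead bytes, continuation bytes),
# so an arbitrary byte sequence is usually rejected.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_trip_ok, utf8_ok)
```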

Re: just a bug

2007-05-25 Thread Carsten Haese
On Fri, 2007-05-25 at 17:30 +0300, Maksim Kasimov wrote: > I insist - my message is correct and not contradicts no any point of w3.org > xml-specification. The fact that you believe this so strongly and we disagree just as strongly indicates a fundamental misunderstanding. Your fundamental misund

Re: just a bug

2007-05-25 Thread Maksim Kasimov
Jarek Zgoda: > > No, it is not a part of string. It's a part of byte stream, split in a > middle of multibyte-encoded character. > > You can't get only dot from small letter "i" and ask the parser to > treat it as a complete "i". > ... i know it :)) can you propose something to solve it? ;)
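
One possible approach, when the missing bytes are expected to arrive in a later chunk, is an incremental decoder that buffers an incomplete trailing character until the rest shows up. This is only a sketch under that assumption, not something proposed in the thread; the sample text and chunk boundary are made up, and it uses Python 3 syntax.

```python
import codecs

data = "файл".encode("utf-8")      # 4 Cyrillic letters, 2 bytes each
chunks = [data[:3], data[3:]]      # the split falls inside the second letter

decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
for i, chunk in enumerate(chunks):
    is_last = (i == len(chunks) - 1)
    # Incomplete trailing bytes are kept inside the decoder
    # and completed by the next chunk.
    pieces.append(decoder.decode(chunk, final=is_last))

text = "".join(pieces)
print(pieces, text)
```

If the stream really ends in the middle of a character, the final call still raises UnicodeDecodeError, so the truncation has to be fixed at the sender.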

Re: just a bug

2007-05-25 Thread Jarek Zgoda
Maksim Kasimov wrote: >> 'utf8' codec can't decode bytes in position 176-177: invalid data > iMessage[176:178] >> '\xd1]' >> >> And that's your problem. In general you can't just truncate a utf-8 >> encoded string anywhere and expect the result to be valid utf-8. The >> \xd1 at the very e
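
The truncation problem described here can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax with a made-up sample word; it is not the poster's actual data.

```python
encoded = "привет".encode("utf-8")   # 6 Cyrillic letters, 2 bytes each
truncated = encoded[:-1]             # cut in the middle of the last letter

# The cut leaves a lone lead byte at the end of the data.
last_byte = truncated[-1:]

try:
    truncated.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Cutting on a character boundary instead gives valid UTF-8.
clean_prefix = encoded[:-2].decode("utf-8")

print(last_byte, error, clean_prefix)
```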

Re: just a bug

2007-05-25 Thread Maksim Kasimov
Richard Brodie wrote: > "Neil Cerutti" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] > >> Web browsers are in the very business of reasonably rendering >> ill-formed mark-up. It's one of the things that makes >> implementing a browser take forever. ;) > > For HTML, yes. it accept

Re: just a bug

2007-05-25 Thread Maksim Kasimov
[EMAIL PROTECTED] : > You need to explicitly convert the string of UTF8 encoded bytes to a > Unicode string before parsing e.g. > unicodestring = unicode(encodedbytes, 'utf8') it is only a part of a string - not the whole string, I wrote that before. That means that the content can not be converted t

Re: just a bug

2007-05-25 Thread Maksim Kasimov
Carsten Haese: > On Fri, 2007-05-25 at 04:03 -0700, sim.sim wrote: > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 176-177: > invalid data iMessage[176:178] > '\xd1]' > > And that's your problem. In general you can't just truncate a utf-8 > encoded string anywhere and ex

Re: just a bug

2007-05-25 Thread Mattia Gentilini
Richard Brodie wrote: > For HTML, yes. it accepts all sorts of garbage, like most > browsers; I've never, before now, seen it accept an invalid > XML document though. It *could* depend on Content-Type. I've seen that Firefox treats XHTML as HTML (i.e. not trying to validate it) if you set Co

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread Richard Brodie
"Neil Cerutti" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Web browsers are in the very business of reasonably rendering > ill-formed mark-up. It's one of the things that makes > implementing a browser take forever. ;) For HTML, yes. it accepts all sorts of garbage, like most

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread Neil Cerutti
On 2007-05-25, Richard Brodie <[EMAIL PROTECTED]> wrote: > > "Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message > news:[EMAIL PROTECTED] > >> How did you verified that it is well formed? > > It appears to have a more fundamental problem, which is > that it isn't correctly encoded (pre

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread Richard Brodie
"Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > How did you verified that it is well formed? It appears to have a more fundamental problem, which is that it isn't correctly encoded (presumably because the CDATA is truncated in mid-character). I'm surpris

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread harvey . thomas
On May 25, 12:03 pm, "sim.sim" <[EMAIL PROTECTED]> wrote: > On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > > > In <[EMAIL PROTECTED]>, sim.sim wrote: > > > Below the code that tryes to parse an well-formed xml, but it fails > > > with error message: > > > "not well-formed (

Re: just a bug

2007-05-25 Thread Carsten Haese
On Fri, 2007-05-25 at 04:03 -0700, sim.sim wrote: > my CDATA-section contains only symbols in the range specified for > Char: > Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | > [#x10000-#x10FFFF] > > > filter(lambda x: ord(x) not in range(0x20, 0xD7FF), iMessage) That test is me
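
The Char production in XML 1.0 is defined over decoded characters (code points), not over raw bytes. The quoted filter is applied to what appears to be a byte string, where every value is at most 0xFF, so it passes even for data that is not valid UTF-8. The following is a hedged sketch in Python 3 syntax; the function names are hypothetical, and the sample bytes are the two bytes shown in the thread's error message.

```python
def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_utf8_payload(raw):
    """Return (decodes_ok, all_chars_allowed) for a UTF-8 byte string."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return (False, False)
    return (True, all(is_xml_char(ch) for ch in text))


sample = b"\xd1]"   # lead byte followed by an ASCII character

# A byte-level range test passes, because no byte exceeds 0xFF ...
bytes_in_range = all(0x20 <= b < 0xD7FF for b in sample)

# ... but the data does not decode, so it contains no valid characters.
result = check_utf8_payload(sample)

print(bytes_in_range, result)
```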

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread sim.sim
On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote: > In <[EMAIL PROTECTED]>, sim.sim wrote: > > Below the code that tryes to parse an well-formed xml, but it fails > > with error message: > > "not well-formed (invalid token): line 3, column 85" > > How did you verified that it is

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, sim.sim wrote: > Below the code that tryes to parse an well-formed xml, but it fails > with error message: > "not well-formed (invalid token): line 3, column 85" How did you verified that it is well formed? `xmllint` barf on it too. > The "problem" within CDATA-section:
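
The reported failure can be reproduced with a much smaller document. This is a sketch in Python 3 syntax; the element name and document are invented, not the poster's message. The CDATA section contains a UTF-8 lead byte that is not followed by a continuation byte.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# \xd1 starts a two-byte UTF-8 sequence, but the next byte is "]".
doc = b'<?xml version="1.0" encoding="utf-8"?>\n<a><![CDATA[\xd1]]></a>'

try:
    minidom.parseString(doc)
    parse_error = None
except ExpatError as exc:
    parse_error = exc

print(parse_error)
```

The parser rejects the document because the encoding is broken, regardless of the fact that the bytes sit inside CDATA.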

just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread sim.sim
On 22 May, 16:45, "sim.sim" <[EMAIL PROTECTED]> wrote: > Hi all. > i'm faced to trouble using minidom: > > #i have a string (xml) within CDATA section, and the section includes > "\r\n": > iInStr = '\n\n' > > #After i create DOM-object, i get the value of "Data" without "\r\n" > > from xml.dom impo
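
The behaviour described in this first message follows from XML end-of-line handling: a conforming parser normalizes "\r\n" to "\n" before parsing, including inside CDATA. The sketch below uses Python 3 syntax and a made-up element name, since the original string is not preserved in this listing.

```python
from xml.dom import minidom

source = '<Data><![CDATA[line one\r\nline two]]></Data>'
dom = minidom.parseString(source)

# The only child of <Data> is the CDATA section node.
value = dom.documentElement.firstChild.data

print(repr(value))
```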