Fredrik Lundh added the comment:
Note that this was fixed in upstream 1.3 (and verified by the selftests), but
the fix and test were apparently lost when that code was merged into 2.7. Since
2.7 is supposed to ship with 1.3, this is a regression, not a feature request.
(But 2.7 is in rc, and
Fredrik Lundh added the comment:
Namespaces are a fundamental part of the XML information model (both xpath and
infoset) and all modern XML document formats, so I'm not sure what problem
you're trying to solve by pretending that they don't exist.
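To illustrate how namespaces surface in ElementTree, here is a minimal sketch using the standard library's xml.etree.ElementTree. The sample Atom document and the "atom" prefix are assumptions chosen for illustration; they do not come from the issue under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the Atom namespace is used only for illustration.
doc = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
)

# Element tags carry the namespace URI in "{uri}local" form.
root_tag = doc.tag

# A prefix map lets a search path refer to the namespace by a short local prefix.
ns = {"atom": "http://www.w3.org/2005/Atom"}
title = doc.find("atom:title", ns)

# Searching for the bare local name does not match the namespaced element.
bare = doc.find("title")
```

The prefix in the map is local to the search call; only the URI has to match the document.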
It's a bit like modif
Fredrik Lundh added the comment:
The missing/extra words in the findtext description are just a case of sloppy
copy-editing, most likely after a quick reformatting. Not sure why you're
spending all this energy arguing about commas, t
Fredrik Lundh added the comment:
> As per PEP 257, “Returns” should become “Return” (it’s a command, not a
> description).
Upstream ET uses JavaDoc conventions, where the conventions are
designed by technical writers, not hackers. In JavaDoc, descriptions
are 3rd person declarative
Fredrik Lundh added the comment:
Hmm. I'm not entirely sure about giving False a meaning when None has
traditionally had a different (and documented) meaning. And sleeping on it
hasn't convinced me in either direction :-(
(well, I'd say no, but the compatibility argum
Fredrik Lundh added the comment:
Oops :) Yeah, that was a pretty lousy way to show what encoding I was using for
that test:
>>> import locale
>>> locale.getpreferredencoding()
'cp1252'
>>>
(Somewhat related, it would be nice if Python actually normalized
Fredrik Lundh added the comment:
Interesting. But isn't the problem with 3.1 that it relies on the standard
encoding, which results in code that may or may not work depending on a global
platform setting? Who's doing the encoding in the new version? And what ends
up i
Fredrik Lundh added the comment:
"I wouldn't raise much opposition against tobytes() as an alias for tostring(),
although that sounds more like duplicating an otherwise simple API."
Adding an alias would be a way to address the 2.X/3.X terminology overlap; string
traditionally i
Changes by Fredrik Lundh :
--
___
Python tracker
<http://bugs.python.org/issue8047>
___
___
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/m
Fredrik Lundh added the comment:
"Yes, the feature has been implemented deep down in the _encode() helper
function, so it impacts the entire serialiser, not only its API"
Ouch.
>>> import locale
>>> locale.getpreferredencoding() == "utf-8"
False
>>
Fredrik Lundh added the comment:
(what's the Python 3 replacement for the array module, btw?)
--
___
Python tracker
<http://bugs.python.org/issue8047>
___
___
Fredrik Lundh added the comment:
"'None' has always been the documented default for the encoding parameter"
That's probably mostly by accident at least in original ET, but the 1.3 draft
docs at effbot.org/elementtree do spell it out explicitly for the 'write
Fredrik Lundh added the comment:
So now it's the domain experts against some hypothetical people that might
exist? Tricky.
--
___
Python tracker
<http://bugs.python.org/i
Fredrik Lundh added the comment:
>>> import array
>>> array.array("i", [1, 2, 3]).tostring()
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
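For readers on current Python versions: array.tostring() was deprecated in Python 3.2 and removed in 3.9, and the same data is now obtained with tobytes(). A minimal sketch follows; the exact bytes depend on the platform's int size and byte order, so no output is shown.

```python
import array

values = array.array("i", [1, 2, 3])
raw = values.tobytes()  # bytes object holding the machine representation

# Round-trip the bytes back into an array with the same type code.
restored = array.array("i")
restored.frombytes(raw)
```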
--
___
Python trac
Fredrik Lundh added the comment:
W00t!
--
___
Python tracker
<http://bugs.python.org/issue6472>
___
Fredrik Lundh added the comment:
> if I don't specify an encoding, I get unicode. If I do specify an encoding,
> I get encoded bytes.
You're confusing the XML document encoding with character set encoding.
A serialized (unparsed) XML document is a byte stream, not a s
Fredrik Lundh added the comment:
Footnote: "iterparse" does things this way mostly to keep the implementation
simple and fast; due to buffering, the tree builder is usually ahead of the
event generation by up to 16k. See the note on this page:
http://effbot.org/zone/element-ite
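One practical consequence, sketched here with the standard library's xml.etree.ElementTree: code that needs an element's text or children can wait for the "end" event, at which point the element is complete. The sample document and variable names are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input; any file-like object opened in binary mode works the same way.
data = io.BytesIO(b"<root><item>one</item><item>two</item></root>")

texts = []
for event, elem in ET.iterparse(data, events=("end",)):
    # On "end" the element has been fully parsed, so elem.text is reliable here.
    if elem.tag == "item":
        texts.append(elem.text)
```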
Fredrik Lundh added the comment:
And to clarify, XHTML is a reformulation of HTML4 using XML syntax, so you
should use an XML parser to parse it, not an HTML parser. The formats are
related, but not identical.
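As a minimal sketch of that advice, the following parses a small XHTML fragment with the standard library's XML parser. The fragment is a made-up example and deliberately has no DOCTYPE or named entities, which is an assumption that keeps the sketch self-contained; real pages may need extra handling for those.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment used only for illustration.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Hello</title></head>"
    "<body><p>First<br/>Second</p></body>"
    "</html>"
)

root = ET.fromstring(xhtml)

# XHTML elements live in the XHTML namespace, so searches need a prefix map.
ns = {"x": "http://www.w3.org/1999/xhtml"}
title = root.find("x:head/x:title", ns).text

# itertext() yields the paragraph text, including the text after <br/>.
paragraph = "".join(root.find("x:body/x:p", ns).itertext())
```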
--
___
Python tracker
<h
Fredrik Lundh added the comment:
The "no header" thing is very much done on purpose, and it's documented in the
upstream ElementTree documentation.
I suggest dropping this "Python 3 exists in its own universe" nonsense; it's
not very professional, and it's
Fredrik Lundh added the comment:
Thanks Florent!
> Are there any simple, common cases that are made slower by this patch?
The original fastsearch implementation has a couple of special cases to make
sure it's faster than the original code in all cases. The reason it wasn't
im
Fredrik Lundh added the comment:
Note that "fail silently" is a bit of a misnomer - if the embedded import
doesn't work, portions of the library will fail pretty loudly. Feel free
to use some variation of the suggested patch, or just wait until the next
upstream release ge
Fredrik Lundh added the comment:
The real problem here is that XML attributes weren't really designed
to hold data that doesn't survive normalization. One would have
thought that making it difficult to do that, and easy to store such
things as character data, would have made people t
Fredrik Lundh added the comment:
PIL is completely thread-agnostic, so I'm not sure there's anything PIL can
do to fix this.
(and ImageQt is of course an interface to PyQt, which is an interface to
Qt, which consists of a *lot* more than 50
Fredrik Lundh added the comment:
For ET, that's very much on purpose. Validating data provided by every
single application would kill performance for all of them, even if only a
small minority would ever try to serialize data that cannot be represented
i
Fredrik Lundh added the comment:
That's backwards, unless I'm missing something here: charrefs represent
Unicode characters, not UTF-8 byte values. The character "LATIN SMALL
LETTER A WITH TILDE" with the character value 227 should be represented as
"ã" if
Fredrik Lundh added the comment:
Did you look at the 1.3 alpha code base when you came up with this idea?
Unfortunately, 1.3's _encode is used for a different purpose...
I don't have time to test it tonight, but I suspect that 1.3's
escape_data/escape_attrib functions mi
Fredrik Lundh added the comment:
Umm. Isn't _encode used to encode tags and attribute names? The charref
syntax is only valid in CDATA sections and attribute values, which are
encoded by the corresponding _escape functions. I suspect this patch will
make things blow up on a non-ASCI
Fredrik Lundh added the comment:
It should definitely give what's intended (either a Unicode string, or, if
the content is plain ASCII, an 8-bit string). What did you get instead?
--
___
Python tracker
<http://bugs.python.org/i
Fredrik Lundh added the comment:
Converting from UTF-8 to Unicode is the right thing to do, but
converting back to Latin-1 is not correct -- note that ET returns a
Unicode string, not an 8-bit string. There's a "makestring" helper that
does the right thing in the library
Fredrik Lundh added the comment:
sgmlop doesn't do much validation; to quote the homepage: "[sgmlop] is
tolerant, and happily accepts XML-like data that are not well-formed. If
you need strictness, use another parser."
But given that Python ships with cElementTree
Fredrik Lundh added the comment:
In the upstream 1.0.6, the ParseError exception has a position attribute
that contains a (line, column) tuple.
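The comment describes upstream cElementTree 1.0.6. As a hedged illustration, the standard library's xml.etree.ElementTree in current Python 3 exposes an attribute of the same name on its ParseError; the malformed input below is an assumption chosen for the sketch.

```python
import xml.etree.ElementTree as ET

position = None
try:
    ET.fromstring("<a><b></a>")  # deliberately malformed: <b> is never closed
except ET.ParseError as exc:
    # position is a (line, column) tuple describing where parsing failed.
    position = exc.position
    line, column = position
```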
--
___
Python tracker
<http://bugs.python.org/issue1538
Fredrik Lundh added the comment:
ET 1.3 is still in alpha, though. Hopefully, that'll sort itself out
over the next few weeks.
--
___
Python tracker
<http://bugs.python.org/i
Fredrik Lundh added the comment:
Forgot to mention that this is fixed in the cElementTree trunk (public
as of today's 1.0.6 preview release). Will merge with Python trunk when
I find the time...
___
Python tracker
<http://bugs.python.org/i
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Roland's right - "iterparse" only guarantees that it has seen the ">"
character of a starting tag when it emits a "start" event, so the
attributes are defined, but the contents of the text and
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Yes, this refers to the POSIX character classes as described here:
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
(Ideally, there should be an (internal) API that lets you register class
definitions from the
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
"Do" should be "does", right. Not enough coffee today :)
___
Python tracker <[EMAIL PROTECTED]>
&
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Looks fine to me, except for the comment in the test suite. Should
+# MS compilers do NOT combine c_short and c_int into
+# one field, gcc doesn't.
perhaps be
+# MS compilers do NOT combine c_short and
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
(the reason this is extra bad for C modules is that the profilers
introduce overhead for Python code, but not for C-level functions. For
example, using the standard profiler to benchmark parser performance for
xml.etree.ElementT
New submission from Fredrik Lundh <[EMAIL PROTECTED]>:
You often see people using the profiler for benchmarking instead of
profiling. I suggest adding a note that explains that the profiler
modules are designed to provide an execution profile for a given
program, not for benchmarking dif
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
A bit more information on the changes to the core engine that are
responsible for the 2x speedup (on what?) would be nice to have, I think
(especially since you seem to have removed the KMP prefix scanner).
(Isn't there a RE benc
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
The patch looks fine to me (assuming that I didn't miss something
critical hidden among the large table diffs).
(I'd probably have named the "NODELTA" flag after what it is rather than what
it isn't, but I cann
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
It's a missing feature, not a bug in the existing code. But if you're
desperate, why not just use the transport implementation that's attached
to this issue?
___
Python tracker <
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
This is fixed in the ET 1.3-compatible codebase. Since it's too late to
add ET 1.3 to 2.6, I guess it's time to make a new 1.2 bugfix release
for 2.6.
___
Python tracker <[EMAI
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
That should be all that's needed to expose the existing API, as is.
If you want to verify the build, you can grab the pytoken.c and setup.py
files from this directory, and try building the module.
http://svn.effbot.org/publ
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Hmm. That's embarrassing. What was I thinking?
Guess it's time to update the 2.X codebase to ET 1.2.8.
___
Python tracker <[EMAIL PROTECTED]>
<http://
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
There are a few things in the struct that need to be public, but that's
nothing that cannot be handled by documentation. No need to complicate
the API just in case.
___
Python tracker <[E
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Reducing priority to normal; this bug has been around since Python 2.2,
and only affects code that doesn't work anyway when running on debug builds.
--
priority: critical -> normal
New submission from Fredrik Lundh <[EMAIL PROTECTED]>:
CPython provides a Python-level API to the parser, but not to the
tokenizer itself. Somewhat annoyingly, it does provide a nice C API,
but that's not properly exposed for external modules.
To fix this, the tokenizer.h fil
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
This report makes no sense to me; at least in Python 2.X, PyObject_Del
removes a chunk of memory from the object heap. It's designed to be
used from dealloc implementations, to release the actual memory (either
directly, or as
Changes by Fredrik Lundh <[EMAIL PROTECTED]>:
--
nosy: -effbot
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2842>
__
Fredrik Lundh <[EMAIL PROTECTED]> added the comment:
Eh? Why did you add *everyone* involved in the project to the nosy list?
(I'll leave explaining why breaking almost all Python programs in the
name of "consistency" is an absurd idea to someone else).
--
n
Fredrik Lundh added the comment:
Can you switch on verbose mode in xmlrpclib, so you can see *where* the
transfer hangs?
Arguing that a hanging Python program must be caused by a bug in the
code that *executes* the Python program isn't that meaningful, really.
After all, that code is us
Fredrik Lundh added the comment:
Looks like the mechanisms used to decide when to invoke the full
ElementPath machinery differ somewhat. I've added this to the TODO
list for ET 1.3; in the meantime, my advice is "don't do that".
(adding a check for '.' to the PAT
Fredrik Lundh added the comment:
For the record, $ is defined to match "before a newline at the end of
the string, or at the end of the string" in normal mode, and "before any
newline, or at the end of the string" in multiline mode.
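The two definitions can be checked with a short sketch using the standard re module; the sample text and pattern are assumptions chosen for illustration.

```python
import re

text = "foo\nbar\n"

# Normal mode: $ matches at the very end, or just before a final newline.
normal = re.findall(r"\w+$", text)

# Multiline mode: $ also matches before every newline.
multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \Z matches only at the very end of the string, so the trailing newline blocks it.
strict = re.search(r"bar\Z", text)
```

In this sketch only "bar" matches in normal mode, both words match in multiline mode, and the \Z search finds nothing.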
(and I have a vague memory that t
Fredrik Lundh added the comment:
re.findall has the same behaviour. Without looking at the code, I'm not
sure if this is a bug in the code or in the documentation, really.
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python
Fredrik Lundh added the comment:
That changes to ceval should have introduced some kind of XML-RPC
package limit seems a bit unlikely. If you can still reproduce this,
can you try instrumenting the xmlrpclib.py library to see where it gets
stuck?
(passing in verbose=True to the Server[Proxy
Fredrik Lundh added the comment:
This is fixed in the development version, so I'm closing this for now.
The updated docs can be found here:
http://docs.python.org/dev/library/xml.etree.elementtree.html
--
resolution: -> fixed
status: open -
Fredrik Lundh added the comment:
Looks like the wrong execution flags are being passed to the function
that creates the actual pattern object; the SRE compiler does the right
thing, but the engine isn't running with the right flags in the last
case. Changing the call to _sre.compi
Fredrik Lundh added the comment:
Well, I'm not sure 81k qualifies as "medium sized", really. If you look
at the size distribution for typical RE:s (which are usually
handwritten, not machine generated), that's one or two orders of
magnitude larger than "medium
Changes by Fredrik Lundh:
--
type: -> behavior
versions: +Python 2.4, Python 2.5
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue814253>
___
P
Fredrik Lundh added the comment:
I'm trying to think of a reason for actually providing __repr__ over
RPC, but I cannot find any. Not quite as sure about __str__, though; I
suggest adding a __repr__ method, but leaving the rest as is.
--
assignee: effbot -> coll
Fredrik Lundh added the comment:
A proper patch, including tests (if possible) and documentation, would
be nice.
(also note that SimpleXMLRPCServer was written by Brian Quinlan.)
--
assignee: effbot ->
_
Tracker <[EMAIL PROTECTED]&
Changes by Fredrik Lundh:
--
status: open -> closed
_
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1602189>
_
Changes by Fredrik Lundh:
--
title: Updated to latest ElementTree in 2.6 -> Update to latest ElementTree in
Python 2.6
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.o
Fredrik Lundh added the comment:
ElementTree 1.3 provides a variant of this (tentatively called "itertext").
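The name was tentative at the time of this comment. In the standard library's xml.etree.ElementTree today the method is available as Element.itertext(); a minimal sketch with a made-up sample element:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element with mixed content.
root = ET.fromstring("<p>Hello <b>big</b> world</p>")

# itertext() walks the element and its children in document order,
# yielding text and tail strings.
pieces = list(root.itertext())
joined = "".join(pieces)
```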
--
resolution: -> accepted
superseder: -> Updated to latest ElementTree in 2.6
_
Tracker <[EMAIL PROTECTED]>
&
New submission from Fredrik Lundh:
The xml.etree package should be updated to ElementTree 1.3/cElementTree
1.0.6 (or later).
--
assignee: effbot
components: XML
messages: 55811
nosy: effbot
priority: normal
severity: minor
status: open
title: Updated to latest ElementTree in 2.6
type
Fredrik Lundh added the comment:
But wasn't your complaint that the implementation didn't match the
documentation?
As I said, the *implementation* treats "runs of whitespace" as
separators, except for whitespace at the beginning or end (or in other
words, it never returns e
Fredrik Lundh added the comment:
Looks like a *documentation* bug to me; at the implementation level,
None just means "no empty parts, treat runs of whitespace as separators".
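The behaviour described can be seen in a short sketch contrasting the default separator with an explicit one; the sample string is an assumption chosen for illustration.

```python
sample = " a  b "

# sep=None (the default): runs of whitespace separate fields, no empty parts.
default_parts = sample.split()
none_parts = sample.split(None)

# An explicit separator splits on every occurrence and keeps empty strings.
explicit_parts = sample.split(" ")
```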
--
nosy: +effbot
__
Tracker <[EMAIL PROTECTED]>
<htt
Changes by Fredrik Lundh:
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1140>
__
Fredrik Lundh added the comment:
Well, I spent a minute hunting around for a "comment" field or an "add
comment" button. Guess this is a "you only need to learn this once"
thing...
__
Tracker <[EMAIL PROTECTED]&g
Fredrik Lundh added the comment:
(is there a way to just add a comment in the new tracker, btw, or is
everything a "change note", even if nothing has changed?)
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.p
Fredrik Lundh added the comment:
Looks good to me. I still subscribe to the idea that
robust code should accept 8-bit *ASCII* strings
anywhere it accepts Unicode (especially when the 8-bit
string is empty), but that's me.
Feel free to check this in (or assign back to you if
you don