Jean-Paul Calderone <exar...@divmod.com> added the comment:

> It's indeed possible to provide that as a third-party module; one
> would have to implement an EntityResolver, and applications would
> have to use it. If there was a need for such a thing, somebody would
> have done it years ago.

I don't think this is true, for several reasons.

First, most people never notice that they are writing or using an application which has this behavior. This is because the behavior is transparent in almost all cases, manifesting only as a slowdown. Often, no one is paying close attention to whether a function takes 0.1s or 0.5s. So code gets written which fetches resources from the network by accident. Similarly, users generally don't have any idea that this kind of defect is possible, or they don't think it's unusual behavior. In general, they're not equipped to understand why this is a bad thing. At best, they may decide a program is slow and be upset, but out of the myriad reasons a program might be slow, they have no particular reason to settle on this one as the real cause.

Second, it is *difficult* to implement the non-network behavior. Seriously, seriously difficult. The documentation for these APIs is obscure and incomplete in places. It takes a long time to puzzle out what it means and how to achieve the desired behavior. I wouldn't be surprised if many people simply gave up and either switched to another parser or decided they could live with the slowdown (perhaps not realizing that it could be arbitrarily long and might add a network dependency to a program which doesn't already have one).

Third, there are several pitfalls on the way to a correct implementation of the non-network behavior which may lead a developer to decide they have succeeded when they have actually failed. The most obvious is that simply turning off the external-general-entities feature appears to solve the problem but actually changes the parser's behavior so that it will silently drop named character entities. This is quite surprising behavior to anyone who hasn't spent a lot of time with the XML specification.

So I think it would be a significant improvement if there were a simple, documented way to switch from network retrieval to local retrieval from a cache.

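For illustration, here is a minimal sketch of what such a local-only resolver might look like with xml.sax. It is not an existing stdlib facility: the names LOCAL_DTDS, LocalOnlyResolver, TextCollector and parse_offline are hypothetical, the one-entity DTD stands in for a real cached copy of (say) the XHTML DTDs, and the system identifier is made up. Recent Python versions leave external general entities off by default, so the sketch turns the feature on explicitly so that the resolver is actually consulted.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical local cache: system identifier -> DTD bytes.
# A real cache would hold complete DTDs and entity sets, probably on disk.
LOCAL_DTDS = {
    "http://example.invalid/mini.dtd": b'<!ENTITY nbsp "&#160;">',
}


class LocalOnlyResolver(EntityResolver):
    """Serve external entities from LOCAL_DTDS and refuse everything else."""

    def __init__(self):
        self.requested = []  # system ids the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        try:
            data = LOCAL_DTDS[systemId]
        except KeyError:
            # Fail loudly rather than fall through to a network fetch.
            raise xml.sax.SAXException("no local copy of %r" % (systemId,))
        source = InputSource(systemId)
        # Supplying a byte stream means the parser never opens the URL.
        source.setByteStream(io.BytesIO(data))
        return source


class TextCollector(ContentHandler):
    """Collect character data so the expanded entity can be inspected."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_offline(document):
    """Parse bytes, resolving external entities from the local cache only."""
    parser = xml.sax.make_parser()
    # Without this feature the resolver is bypassed and entities declared
    # in the external DTD are skipped instead of expanded.
    parser.setFeature(feature_external_ges, True)
    resolver = LocalOnlyResolver()
    collector = TextCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document))
    return "".join(collector.chunks), resolver.requested


DOCUMENT = (
    b'<!DOCTYPE p SYSTEM "http://example.invalid/mini.dtd">'
    b"<p>a&nbsp;b</p>"
)

if __name__ == "__main__":
    text, requested = parse_offline(DOCUMENT)
    print(repr(text), requested)
```

Raising for an unknown system identifier is a deliberate choice in this sketch: a silent fallback would reintroduce the network dependency that the cache is meant to remove.
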
I also think that the current default behavior is wrong. The default should not be to go out to the network, even if there is a well-behaved HTTP caching client involved. So the current behavior should be deprecated. After a sufficient period of time, the local-only behavior should be made the default. I don't see any problem with making it easy to re-enable the old behavior, though.

> -1 on issuing a warning. I really cannot see much of a problem in
> this entire issue. XML was designed to "be straightforwardly usable
> over the Internet" (XML rec., section 1.1), and this issue is a
> direct consequence of that design decision. You might just as well
> warn people against using XML in the first place.

Quoting part of the XML design goals isn't a strong argument for the current behavior. Transparently requesting network resources in order to process local data isn't a necessary consequence of the "straightforwardly usable over the Internet" goal. Allowing this behavior to be explicitly enabled, but not enabled by default, easily meets this goal. Straightforwardly supporting a local cache of DTDs is even better, since it improves application performance and removes a large number of security concerns.

With the general disfavor of DTDs (in favor of other validation techniques, such as RELAX NG) and the general disfavor of named character entities (basically only XHTML uses them), I find it extremely difficult to justify Python's current default behavior.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2124>
_______________________________________