[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (07/24/09 - 07/31/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.

 2292 open (+33) / 16143 closed (+18) / 18435 total (+51)

Open issues with patches: 899

Average duration of open issues: 660 days.
Median duration of open issues: 414 days.

Open Issues Breakdown
   open  2261 (+32)
pending    30 ( +0)

Issues Created Or Reopened (52)
___

string.lowercase/uppercase/letters not affected by locale change  07/25/09
CLOSED http://bugs.python.org/issue6525    reopened ezio.melotti

inserting None into sys.modules does not raise ImportError with   07/24/09
       http://bugs.python.org/issue6563    created  brett.cannon

Error in Sec. 8.4 of Tutorial                                     07/24/09
CLOSED http://bugs.python.org/issue6564    created  gmarsden

improper use of __setitem__ in ElementTree for Python 3.1         07/24/09
       http://bugs.python.org/issue6565    created  aroberge

json.dumps converts None to "null" (not null)                     07/24/09
       http://bugs.python.org/issue6566    created  srid

Make inf be almost equal to inf                                   07/25/09
       http://bugs.python.org/issue6567    reopened lucaspmelo

unittest test discovery improvements                              07/24/09
       http://bugs.python.org/issue6568    created  michael.foord

unittest document bug (random.shuffle sequence)                   07/25/09
CLOSED http://bugs.python.org/issue6569    created  hsmtkk
       patch

Documentation Clarity                                             07/25/09
       http://bugs.python.org/issue6570    created  StMark

Help index                                                        07/25/09
CLOSED http://bugs.python.org/issue6571    created  cjw

Manas Thapliyal wants to chat                                     07/25/09
CLOSED http://bugs.python.org/issue6572    created  gravitywarrior1

set union method ignores arguments appearing after the original   07/25/09
CLOSED http://bugs.python.org/issue6573    created  ssmout
       patch

List the __future__ features in a table                           07/26/09
       http://bugs.python.org/issue6574    created  ezio.melotti
       patch, easy

Can't download docs                                               07/26/09
CLOSED http://bugs.python.org/issue6575    created  zuo

re docs: wrong link targets                                       07/26/09
CLOSED http://bugs.python.org/issue6576    created  zuo

Links wrongly targeting to builtin functions' instead of module   07/26/09
CLOSED http://bugs.python.org/issue6577    created  zuo

2 problems with 'Docs for other versions' section on left HTML d  07/26/09
CLOSED http://bugs.python.org/issue6578    created  zuo

No update about automatic numbering of fields in format strings   07/26/09
       http://bugs.python.org/issue6579    reopened georg.brandl

No deprecation warning for list comprehension leak conflict       07/26/09
       http://bugs.python.org/issue6580    created  mrbax
[Python-Dev] standard library mimetypes module pathologically broken?
Hi all,

In an attempt to figure out some twisted.web code, I was reading
through the Python Standard Library’s mimetypes module today, and
was shocked at the poor quality of the code. I wonder how the
mimetypes code made it into the standard library, and whether anyone
has ever bothered to read it or update it: it is an embarrassment.
Much of the code is redundant, portions fail to execute, control
flow is routed through a horribly confusing mess of spaghetti, and
most of the complexity has no clear benefit as far as I can tell. I
probably should drop the subject and get back to work, but as a good
citizen, it’s hard to just ignore this sort of thing.

mimetypes.py stores its types in a pair of dictionaries, one for
"strict" use, and the other for "non-standard types". It creates the
strict dictionary by default out of apache's mime.types file, and
then overrides the entries it finds with a set of exceptions. Then
it creates the non-standard dictionary, which is set to match if the
strict parameter is set to False when guessing types. Just in this
basic design, and in the list of types in the file, there are
several problems:

* Various apache mime types files are read, if found, but the
  ordering of the files is such that older versions of apache are
  sometimes read after newer ones, overriding updated mime types
  with out-of-date versions if multiple versions of apache are
  installed on the system.

* The vast majority of types declared in mimetypes.py are
  duplicates of types already declared by Apache. In a few cases
  this is to change the apache default (make an exception, that
  is), but in most cases the mime type and extension are
  completely identical. This huge number of redundant types makes
  the file substantially harder to follow. No comments are
  provided to explain why various sets of exceptions are made to
  Apache's default mime types, and in several cases mimetypes.py
  seems to just be out of date as compared to recent versions of
  Apache, for instance not knowing about the 'text/troff' type
  which was registered in January 2006 in RFC 4263.

* The 'non-standard' type dictionary is nearly useless, because
  all of the types it declares are already in apache's mime.types
  file, meaning that types are, as far as I can tell trying to
  follow ugly program flow, *never* drawn from the non-strict
  dictionary, except in the improbable situation where the
  mimetypes module is initialized with a custom set of
  apache-mime.types–like files, which does not include those
  'non-standard' types. I personally cannot see a use case for
  initializing the module with a custom set of mime types, but
  then leaving the very few types included as non-strict to the
  defaults: this seems like a fragile and pathological use case.
  Given this, I don’t see any benefit to dragging the 'strict'
  parameter along all the way through the code, and would advise
  getting rid of it altogether. Does anyone know of any code that
  uses the mimetypes module with strict set to False, where the
  non-strict code path ever *actually* is executed?

But though these problems, which affect actual use of the code and
are therefore probably most important, are significant, they really
pale in comparison to the awful quality of implementation. I'll try
to briefly outline my understanding of how code flows in
mimetypes.py, and what the problems are. I haven't stepped through
the code in a debugger, this is just from reading it, so I apologize
in advance if I get something wrong. This is, however, some of the
worst code I’ve seen in the standard library or anywhere else.

* It defines __all__: I didn’t even realize __all__ could be used
  for single-file modules (w/o submodules), but it definitely
  shouldn’t be here. This specific __all__ oddly does not include
  all of the documented variables and functions in the mimetypes
  class. It’s not clear why someone calling import * here wouldn’t
  want the bits not included.

* It creates a _default_mime_types() function which declares a
  bunch of global variables, and then immediately calls
  _default_mime_types() below the definition. There is literally
  no difference in result between this and just putting those
  variables at the top level of the file, so I have no idea why
  this function exists, except to make the code more confusing.

* It allows command line usage: I don’t think this is necessary
  for a part of the standard library like this. There are better
  tools for finding mime types from the command line which ship
  with most operating systems.

* Its API is pretty poorly designed. It offers 6 functions when
  about 3 are needed, and it takes a couple reads-through of the
  code to figure out exactly what any of them are supposed to do.

* The operation is crazy: It defines a MimeType
Re: [Python-Dev] standard library mimetypes module pathologically broken?
On Fri, Jul 31, 2009 at 14:16, Jacob Rus wrote:
> Hi all,
>
> In an attempt to figure out some twisted.web code, I was reading
> through the Python Standard Library’s mimetypes module today, and
> was shocked at the poor quality of the code. I wonder how the
> mimetypes code made it into the standard library, and whether anyone
> has ever bothered to read it or update it: it is an embarrassment.
> Much of the code is redundant, portions fail to execute, control
> flow is routed through a horribly confusing mess of spaghetti, and
> most of the complexity has no clear benefit as far as I can tell. I
> probably should drop the subject and get back to work, but as a good
> citizen, it’s hard to just ignore this sort of thing.
>

I have not looked at the code nor ever used it (that I can remember) so I
can't directly address the quality. But I can say the code was added in 1997
which puts it as an addition in Python 1.4. That's way before Python took off
mainstream and began to tighten up the quality control on the standard
library.

I also would like to say that I am not embarrassed by anything in Python.
It's unfortunate if the mimetypes module's code is a mess, but I think
calling it embarrassing is taking it a little far and is borderline insulting
(which I don't think you meant to do).

>
> mimetypes.py stores its types in a pair of dictionaries, one for
> "strict" use, and the other for "non-standard types". It creates the
> strict dictionary by default out of apache's mime.types file, and
> then overrides the entries it finds with a set of exceptions. Then
> it creates the non-standard dictionary, which is set to match if the
> strict parameter is set to False when guessing types. Just in this
> basic design, and in the list of types in the file, there are
> several problems:
>
> * Various apache mime types files are read, if found, but the
>   ordering of the files is such that older versions of apache are
>   sometimes read after newer ones, overriding updated mime types
>   with out-of-date versions if multiple versions of apache are
>   installed on the system.
>
> * The vast majority of types declared in mimetypes.py are
>   duplicates of types already declared by Apache. In a few cases
>   this is to change the apache default (make an exception, that
>   is), but in most cases the mime type and extension are
>   completely identical. This huge number of redundant types makes
>   the file substantially harder to follow. No comments are
>   provided to explain why various sets of exceptions are made to
>   Apache's default mime types, and in several cases mimetypes.py
>   seems to just be out of date as compared to recent versions of
>   Apache, for instance not knowing about the 'text/troff' type
>   which was registered in January 2006 in RFC 4263.
>
> * The 'non-standard' type dictionary is nearly useless, because
>   all of the types it declares are already in apache's mime.types
>   file, meaning that types are, as far as I can tell trying to
>   follow ugly program flow, *never* drawn from the non-strict
>   dictionary, except in the improbable situation where the
>   mimetypes module is initialized with a custom set of
>   apache-mime.types–like files, which does not include those
>   'non-standard' types. I personally cannot see a use case for
>   initializing the module with a custom set of mime types, but
>   then leaving the very few types included as non-strict to the
>   defaults: this seems like a fragile and pathological use case.
>   Given this, I don’t see any benefit to dragging the 'strict'
>   parameter along all the way through the code, and would advise
>   getting rid of it altogether. Does anyone know of any code that
>   uses the mimetypes module with strict set to False, where the
>   non-strict code path ever *actually* is executed?
>
> But though these problems, which affect actual use of the code and
> are therefore probably most important, are significant, they really
> pale in comparison to the awful quality of implementation. I'll try
> to briefly outline my understanding of how code flows in
> mimetypes.py, and what the problems are. I haven't stepped through
> the code in a debugger, this is just from reading it, so I apologize
> in advance if I get something wrong. This is, however, some of the
> worst code I’ve seen in the standard library or anywhere else.
>
> * It defines __all__: I didn’t even realize __all__ could be used
>   for single-file modules (w/o submodules), but it definitely
>   shouldn’t be here.

__all__ is used to control what a module exports when used in an import *,
nothing more. Thus its use in a module compared to a package is completely
legitimate.

>   This specific __all__ oddly does not include
>   all of the documented variables and functions in the mimetypes
>   class. It’s not clear why someone calling import * here wouldn’t
>   want the bits not included.

If something is documented but not listed i
Re: [Python-Dev] standard library mimetypes module pathologically broken?
On Fri, Jul 31, 2009 at 09:16:02PM +, Jacob Rus wrote:
>
> * The operation is crazy: It defines a MimeTypes class which
>   actually stores the type mappings, but this class is designed to
>   be a singleton. The way that such a design is enforced is
>   through the use of the module-global 'init' function, which
>   makes an instance of the class, and then maps all of the
>   functions in the module global namespace to instance methods.
>   But confusingly, all such functions are also defined
>   independently of the init function, with definitions such as:
>
>     def guess_type(url, strict=True):
>         if not inited:
>             init()
>         return guess_type(url, strict)

I can't speak for any of your other complaints, but I know that this
weird init stuff is fixed in trunk.

For the other stuff, you seem to have some very good points. I'm sure
a patch would be welcome.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standard library mimetypes module pathologically broken?
On Fri, 31 Jul 2009 at 15:17, Brett Cannon wrote:
>> * It creates a _default_mime_types() function which declares a
>>   bunch of global variables, and then immediately calls
>>   _default_mime_types() below the definition. There is literally
>>   no difference in result between this and just putting those
>>   variables at the top level of the file, so I have no idea why
>>   this function exists, except to make the code more confusing.
>
> It could potentially be used for testing, but that's a guess.

regrtest calls it from dash_R_cleanup as part of "clear[ing] assorted
module caches".

--David
Re: [Python-Dev] standard library mimetypes module pathologically broken?
On Fri, Jul 31, 2009 at 15:38, Jacob Rus wrote:
> Brett Cannon wrote:
> > Jacob Rus wrote:
> >> * It defines __all__: I didn’t even realize __all__ could be used
> >>for single-file modules (w/o submodules), but it definitely
> >>shouldn’t be here.
> >
> > __all__ is used to control what a module exports when used in an import *,
> > nothing more. Thus its use in a module compared to a package is completely
> > legitimate.
> >
> >> This specific __all__ oddly does not include
> >>all of the documented variables and functions in the mimetypes
> >>class. It’s not clear why someone calling import * here wouldn’t
> >>want the bits not included.
> >
> > If something is documented but not listed in __all__ that is a bug.
>
> In this case, everything in the module is documented, including parts
> that should be private, but only a small number are in __all__. My
> recommendation would be to make those private parts be _ variables and
> remove them from the docs (using them has no legitimate use cases I
> can see), and rip out __all__.
>
Well, if the module had stuff that did not lead with an underscore then you
can't remove it. You can deprecate it under the old name and rename it with
an underscore, but backwards-compatibility says someone out there is using
those functions so you can't just batch rename them w/o the proper warning.
>
> >> * It creates a _default_mime_types() function which declares a
> >>bunch of global variables, and then immediately calls
> >>_default_mime_types() below the definition. There is literally
> >>no difference in result between this and just putting those
> >>variables at the top level of the file, so I have no idea why
> >>this function exists, except to make the code more confusing.
> >
> > It could potentially be used for testing, but that's a guess.
>
> Here's an abridged version of this function. I don’t think there’s any
> reason for this that I can see.
>
>    def _default_mime_types():
>        global suffix_map
>        global encodings_map
>        global types_map
>        global common_types
>
>        suffix_map = {
>            '.tgz': '.tar.gz', #...
>            }
>
>        encodings_map = {
>            '.gz': 'gzip', #...
>            }
>
>        types_map = {
>            '.a' : 'application/octet-stream', #...
>            }
>
>        common_types = {
>            '.jpg' : 'image/jpg', #...
>            }
>
>    _default_mime_types()
>
As R. David pointed out, it is being used by regrtest to clean up after
running the test suite.
>
> > Probably came from someone who is very OO happy. Not everyone comes to
> > Python ready to embrace its procedural or slightly functional facets.
>
> Yes, it seems so to me too.
>
> > So the problem of changing fundamentally how the code works, even for a
> > cleanup, is that it will break someone's code out there because they
> > depended on the module's crazy way of doing things. Now if they are cheating
> > and looking at things that are meant to be hidden you might be able to clean
> > things up, but if the semantics are exposed to the user, then there is not
> > much we can do w/o breaking someone's code.
>
> The problem is that the semantics as documented are really ambiguous,
> and what I would consider the reasonable interpretation is different
> from what the code actually does. So anyone using this code naively is
> going to run into trouble, and anyone relying on how the code actually
> works is going behind the back of the docs, but they sort of have to
> in order to use much of the functionality of the module. I agree this
> puts us in a tricky spot.
>
Well, perhaps the docs can be updated to match the code where cleanup would
change the semantics.
>
> > Honestly, if the code is as bad as it seems -- including its API --, the
> > best bet would be to come up with a new module for handling MIME types from
> > scratch, put it up on the Cheeseshop/PyPI, and get the community behind it.
> > If the community picks it up as the de-facto replacement for mimetypes and
> > the code has settled we can then talk about adding it to the standard
> > library and begin deprecating mimetypes.
> > And thanks for being willing to volunteer to fix this.
>
> Okay. Well I'd still like to hear a bit about what people really need
> before trying to make a new API. I'm not such an experienced API
> designer, and I haven’t really plumbed the depths of mimetypes use
> cases (though it seems to me like quite a simple module of not more
> than 100 lines of code or so would suffice).
I'm sure you can get help from the community with any of this.
> At the very least, I
> think some changes can be made to this code without altering its basic
> function, which would clean up the actual mime types it returns,
> comment the exceptions to Apache and explain why they're there, and
> make the code flow understandable to someone reading the code.
That all sounds reasonable.
-B
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Brett Cannon wrote:
>>>> * It creates a _default_mime_types() function which declares a
>>>>   bunch of global variables, and then immediately calls
>>>>   _default_mime_types() below the definition. There is literally
>>>>   no difference in result between this and just putting those
>>>>   variables at the top level of the file, so I have no idea why
>>>>   this function exists, except to make the code more confusing.
>>>
>>> It could potentially be used for testing, but that's a guess.
>>
>> Here's an abridged version of this function. I don’t think there’s any
>> reason for this that I can see.
>>
>>     def _default_mime_types():
>>         global suffix_map
>>         global encodings_map
>>         global types_map
>>         global common_types
>>
>>         suffix_map = {
>>             '.tgz': '.tar.gz', #...
>>             }
>>
>>         encodings_map = {
>>             '.gz': 'gzip', #...
>>             }
>>
>>         types_map = {
>>             '.a' : 'application/octet-stream', #...
>>             }
>>
>>         common_types = {
>>             '.jpg' : 'image/jpg', #...
>>             }
>>
>>     _default_mime_types()
>
> As R. David pointed out, it is being used by regrtest to clean up after
> running the test suite.
Yeah, basically the issue is that the default mime types should be
separate objects from the final set after apache's files have been
parsed and custom additions have been made. If these ones at the top
level are renamed and not modified after creation, if new objects with
all the updated stuff are put at these names, and if the test code is
changed to instead reset the ones at these names based on the default
objects, I think that will maybe fix things. I'll try to write some
potential patches in the next day or two and submit them here for
advice.
>> The problem is that the semantics as documented are really ambiguous,
>> and what I would consider the reasonable interpretation is different
>> from what the code actually does. So anyone using this code naively is
>> going to run into trouble, and anyone relying on how the code actually
>> works is going behind the back of the docs, but they sort of have to
>> in order to use much of the functionality of the module. I agree this
>> puts us in a tricky spot.
>
> Well, perhaps the docs can be updated to match the code where cleanup would
> change the semantics.
I think that would make the docs extremely confusing, and I’m not even
sure it would be possible. The current semantics are vaguely okay if
an API consumer sticks to straight-forward use cases, such as any
which don’t break when the current docs are followed (anything
complicated is going to break unless the code is read a few times),
and assuming such uses it would be possible to swap out most of the
implementation for something relatively straight-forward. But if any
of the edges are pushed, the semantics quickly turn insane, to the
point I’m not sure they’re document-able. Anyone expecting the code to
work that way is going to have a buggy program anyway, so I’m not sure
it makes sense to bend over backwards leaving the particular set of
bugs unchanged.
Cheers,
Jacob Rus
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Brett Cannon wrote:
> Jacob Rus wrote:
>> * It defines __all__: I didn’t even realize __all__ could be used
>> for single-file modules (w/o submodules), but it definitely
>> shouldn’t be here.
>
> __all__ is used to control what a module exports when used in an import *,
> nothing more. Thus its use in a module compared to a package is completely
> legitimate.
>
>> This specific __all__ oddly does not include
>> all of the documented variables and functions in the mimetypes
>> class. It’s not clear why someone calling import * here wouldn’t
>> want the bits not included.
>
> If something is documented but not listed in __all__ that is a bug.
In this case, everything in the module is documented, including parts
that should be private, but only a small number are in __all__. My
recommendation would be to make those private parts be _ variables and
remove them from the docs (using them has no legitimate use cases I
can see), and rip out __all__.
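
For reference, here is a small self-contained sketch of how import * treats a single-file module with and without __all__. The two modules are built in memory; their names (demo_with_all, demo_without_all) and contents are made up for the demonstration.

```python
import sys
import types


def make_module(name, source):
    """Build a throwaway module from source text and register it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the sorted names that 'from <name> import *' would bind."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    namespace.pop("__builtins__", None)
    return sorted(namespace)


BODY = "def guess(): pass\ndef helper(): pass\n_private = 1\n"
make_module("demo_with_all", "__all__ = ['guess']\n" + BODY)
make_module("demo_without_all", BODY)

print(star_import("demo_with_all"))     # ['guess']
print(star_import("demo_without_all"))  # ['guess', 'helper']
```

Without __all__, underscore-prefixed names are already skipped, so renaming the private parts would by itself keep them out of import *; __all__ only adds an explicit list on top of that.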
>> * It creates a _default_mime_types() function which declares a
>> bunch of global variables, and then immediately calls
>> _default_mime_types() below the definition. There is literally
>> no difference in result between this and just putting those
>> variables at the top level of the file, so I have no idea why
>> this function exists, except to make the code more confusing.
>
> It could potentially be used for testing, but that's a guess.
Here's an abridged version of this function. I don’t think there’s any
reason for this that I can see.
    def _default_mime_types():
        global suffix_map
        global encodings_map
        global types_map
        global common_types

        suffix_map = {
            '.tgz': '.tar.gz', #...
            }

        encodings_map = {
            '.gz': 'gzip', #...
            }

        types_map = {
            '.a' : 'application/octet-stream', #...
            }

        common_types = {
            '.jpg' : 'image/jpg', #...
            }

    _default_mime_types()
> Probably came from someone who is very OO happy. Not everyone comes to
> Python ready to embrace its procedural or slightly functional facets.
Yes, it seems so to me too.
> So the problem of changing fundamentally how the code works, even for a
> cleanup, is that it will break someone's code out there because they
> depended on the module's crazy way of doing things. Now if they are cheating
> and looking at things that are meant to be hidden you might be able to clean
> things up, but if the semantics are exposed to the user, then there is not
> much we can do w/o breaking someone's code.
The problem is that the semantics as documented are really ambiguous,
and what I would consider the reasonable interpretation is different
from what the code actually does. So anyone using this code naively is
going to run into trouble, and anyone relying on how the code actually
works is going behind the back of the docs, but they sort of have to
in order to use much of the functionality of the module. I agree this
puts us in a tricky spot.
> Honestly, if the code is as bad as it seems -- including its API --, the
> best bet would be to come up with a new module for handling MIME types from
> scratch, put it up on the Cheeseshop/PyPI, and get the community behind it.
> If the community picks it up as the de-facto replacement for mimetypes and
> the code has settled we can then talk about adding it to the standard
> library and begin deprecating mimetypes.
> And thanks for being willing to volunteer to fix this.
Okay. Well I'd still like to hear a bit about what people really need
before trying to make a new API. I'm not such an experienced API
designer, and I haven’t really plumbed the depths of mimetypes use
cases (though it seems to me like quite a simple module of not more
than 100 lines of code or so would suffice). At the very least, I
think some changes can be made to this code without altering its basic
function, which would clean up the actual mime types it returns,
comment the exceptions to Apache and explain why they're there, and
make the code flow understandable to someone reading the code.
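
As a concrete illustration of the file-ordering concern raised at the start of the thread, the sketch below feeds two in-memory stand-ins for mime.types files to a MimeTypes instance. The file contents and the "newer"/"older" labels are made up; only MimeTypes, readfp and guess_type are the module's documented API.

```python
import io
import mimetypes

db = mimetypes.MimeTypes()   # starts from the module's built-in defaults

# Two in-memory stand-ins for mime.types files; the contents are made up.
newer = io.StringIO("text/troff tr\n")
older = io.StringIO("application/x-troff tr\n")

db.readfp(newer)
db.readfp(older)   # read last, so its mapping for ".tr" replaces the first

print(db.guess_type("manual.tr"))
```

Whichever file is read last wins for a given extension, so the order in which known files are visited decides whether the updated or the out-of-date type is reported.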
Cheers,
Jacob Rus
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Andrew McNabb wrote:
> Jacob Rus wrote:
>> * The operation is crazy: It defines a MimeTypes class which
>>   actually stores the type mappings, but this class is designed to
>>   be a singleton. The way that such a design is enforced is
>>   through the use of the module-global 'init' function, which
>>   makes an instance of the class, and then maps all of the
>>   functions in the module global namespace to instance methods.
>>   But confusingly, all such functions are also defined
>>   independently of the init function, with definitions such as:
>>
>>     def guess_type(url, strict=True):
>>         if not inited:
>>             init()
>>         return guess_type(url, strict)
>
> I can't speak for any of your other complaints, but I know that this
> weird init stuff is fixed in trunk.

Actually, this fix changes the semantics of the code quite
substantially (not in any way that is incompatible with the extremely
vague documentation, but in a way that might break any code that
relies on the Python <=2.6 behavior). If such a change is okay, then
we can do quite a bit of implementation change under these new
semantics.

Cheers,
Jacob Rus
Re: [Python-Dev] standard library mimetypes module pathologically broken?
Jacob Rus wrote:
> Okay. Well I'd still like to hear a bit about what people really need
> before trying to make a new API.

Try asking some specific question on python-list, such as "How do you
use the stdlib mimetypes module?"
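
As a starting point for that question, basic use of the module, including the strict flag debated in this thread, looks roughly like the sketch below. The '.demo' extension and 'application/x-demo' type are made up; a separate MimeTypes instance is used so the results do not depend on any mime.types files installed on the system.

```python
import mimetypes

db = mimetypes.MimeTypes()

# Register a made-up type as non-standard (strict=False).
db.add_type('application/x-demo', '.demo', strict=False)

print(db.guess_type('archive.tar.gz'))           # ('application/x-tar', 'gzip')
print(db.guess_type('file.demo'))                # (None, None)
print(db.guess_type('file.demo', strict=False))  # ('application/x-demo', None)
```

The non-standard entry is only consulted when the caller passes strict=False, which is the code path whose real-world use the original message asks about.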
