[issue7090] encoding unicode objects greater than FFFF
New submission from Mahmoud:

Odd behaviour with str.encode, codecs.Codec.encode, or similar functions when dealing with unicode objects above FFFF.

With 2.6:

    >>> u'\u10380'.encode('utf')
    '\xe1\x80\xb80'

With 3.x:

    >>> '\u10380'.encode('utf')
    '\xe1\x80\xb80'

The correct output must be:

    \xf0\x90\x8e\x80

--
components: Unicode
messages: 93780
nosy: msaghaei
severity: normal
status: open
title: encoding unicode objects greater than FFFF
type: behavior
versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1

___
Python tracker <http://bugs.python.org/issue7090>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
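For reference, a minimal Python 3 sketch of why the reported output looks the way it does. It relies on the language rule that a `\u` escape takes exactly four hex digits while `\U` takes exactly eight; the helper name `describe` is made up for this illustration.

```python
# "\u" consumes exactly 4 hex digits, "\U" consumes exactly 8.
four_digit = "\u10380"        # U+1038 followed by the literal character "0"
eight_digit = "\U00010380"    # the single code point U+10380


def describe(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


print(describe(four_digit))
print(describe(eight_digit))
print(four_digit.encode("utf-8"))
print(eight_digit.encode("utf-8"))
```

Written with the eight-digit escape, the string encodes to the four bytes the report expected.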
[issue6396] No conversion specifier in the string, no __getitem__ method in the right hand value
New submission from Mahmoud:

When a class instance is used as the mapping on the right-hand side of a string format expression that contains no conversion specifier, it seems logical to require that the class have a __getitem__ method. Therefore, the following format expression should raise an exception:

    >>> class AClass(object):
    ...     pass
    ...
    >>> c = AClass()
    >>> "a string with no conversion specifier" % c
    'a string with no conversion specifier'

--
messages: 89987
nosy: msaghaei
severity: normal
status: open
title: No conversion specifier in the string, no __getitem__ method in the right hand value
versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1

___
Python tracker <http://bugs.python.org/issue6396>
___
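A small sketch, assuming Python 3, for probing how the % operator treats different right-hand values when the template has no conversion specifier. The helper `try_percent` is hypothetical. The result for the AClass instance is deliberately not stated here, since it may depend on the interpreter version.

```python
class AClass(object):
    pass


def try_percent(template, value):
    """Apply template % value and report either the result or the TypeError."""
    try:
        return template % value
    except TypeError as exc:
        return 'TypeError: {}'.format(exc)


template = 'a string with no conversion specifier'
for value in ({}, (), 5, AClass()):
    print(type(value).__name__, '->', try_percent(template, value))
```

A dict and an empty tuple are accepted silently, while a plain int is rejected because it is an unused argument.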
[issue10970] "string".encode('base64') is not the same as base64.b64encode("string")
New submission from Mahmoud Abdelkader:

Given a string, encoding it with .encode('base64') is not the same as using base64's b64encode function. I think this is very unclear and unintuitive. Here's some example code to demonstrate the problem.

Before I attempt to submit a patch, is this done for legacy reasons? Are there any reasons to use one over the other?

    import hmac
    import hashlib
    import base64

    signature = hmac.new('secret', 'url', hashlib.sha512).digest()
    assert signature.encode('base64') == base64.b64encode(signature)

--
components: Library (Lib)
messages: 126696
nosy: mahmoudimus
priority: normal
severity: normal
status: open
title: "string".encode('base64') is not the same as base64.b64encode("string")
versions: Python 2.5, Python 2.6, Python 2.7

___
Python tracker <http://bugs.python.org/issue10970>
___
[issue10970] "string".encode('base64') is not the same as base64.b64encode("string")
Mahmoud Abdelkader added the comment:

Thanks for the clarification, Terry. This is indeed not a bug. For reference, the output of the code I pasted was line-wrapped after the 76th character, which was my main source of confusion.

After reading RFC 3548, I am now informed that the behavior of string.encode is the correct and expected result, as the documentation (section 7.8.3) states that it is MIME base64.

--

___
Python tracker <http://bugs.python.org/issue10970>
___
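The difference can be reproduced on Python 3 with a short sketch. It is an adaptation, not the original Python 2 snippet: the str method is replaced by `codecs.encode` with the "base64" codec, and the key and message are bytes.

```python
import base64
import codecs
import hashlib
import hmac

signature = hmac.new(b"secret", b"url", hashlib.sha512).digest()

# MIME-style output: wrapped after 76 characters, with a trailing newline.
mime_style = codecs.encode(signature, "base64")
# Plain output: one unbroken line.
plain = base64.b64encode(signature)

print(mime_style)
print(plain)
print(mime_style.replace(b"\n", b"") == plain)
```

Once the newlines are removed, the two results are byte-for-byte identical, which matches the conclusion that the codec is behaving as documented.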
[issue44121] Missing implementation for formatHeader and formatFooter methods of the BufferingFormatter class in the logging module.
New submission from Mahmoud Harmouch:

While browsing the source code of the logging package, I encountered missing implementations for the formatHeader and formatFooter methods of the BufferingFormatter class (in the __init__ file). Therefore, I'm going to implement them and push these changes in a pull request.

--
components: Library (Lib)
messages: 393565
nosy: Harmouch101
priority: normal
severity: normal
status: open
title: Missing implementation for formatHeader and formatFooter methods of the BufferingFormatter class in the logging module.
type: enhancement
versions: Python 3.11

___
Python tracker <https://bugs.python.org/issue44121>
___
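For context, the base class methods return an empty string and are meant to be overridden. Below is a minimal sketch of such an override. The subclass name `CountingFormatter` and the header and footer text are invented for illustration; this is not the change proposed in the pull request.

```python
import logging


class CountingFormatter(logging.BufferingFormatter):
    """Hypothetical subclass that wraps a batch of records in a header and footer."""

    def formatHeader(self, records):
        return "--- %d record(s) ---\n" % len(records)

    def formatFooter(self, records):
        return "--- end ---\n"


# The per-record formatter; the trailing newline keeps one record per line.
line_format = logging.Formatter("%(levelname)s: %(message)s\n")
formatter = CountingFormatter(line_format)
records = [
    logging.makeLogRecord({"levelname": "INFO", "msg": "started"}),
    logging.makeLogRecord({"levelname": "ERROR", "msg": "failed"}),
]
batch = formatter.format(records)
print(batch, end="")
```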
[issue44121] Missing implementation for formatHeader and formatFooter methods of the BufferingFormatter class in the logging module.
Change by Mahmoud Harmouch:

--
keywords: +patch
pull_requests: +24735
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/26095

___
Python tracker <https://bugs.python.org/issue44121>
___
[issue12949] Documentation of PyCode_New() lacks kwonlyargcount argument
Changes by Mahmoud Hashemi:

--
nosy: +mahmoud

___
Python tracker <http://bugs.python.org/issue12949>
___
[issue13787] PyCode_New not round-trippable (TypeError)
New submission from Mahmoud Hashemi:

On Python 3.1.4, attempting to create a code object will apparently result in a TypeError (must be str, not tuple), even when you're creating a code object from another, working code object:

    # co_what.py
    def foo():
        return 'bar'

    co = foo.__code__
    co_copy = type(co)(co.co_argcount,
                       co.co_kwonlyargcount,
                       co.co_nlocals,
                       co.co_stacksize,
                       co.co_flags,
                       co.co_code,
                       co.co_consts,
                       co.co_names,
                       co.co_varnames,
                       co.co_freevars,
                       co.co_cellvars,
                       co.co_filename,
                       co.co_name,
                       co.co_firstlineno,
                       co.co_lnotab)
    # EOF

    $ python3 co_what.py
    Traceback (most recent call last):
      File "co_what.py", line 20, in <module>
        co.co_lnotab)
    TypeError: must be str, not tuple

Looking at the PyCode_New function, all the arguments look correctly matched up according to the signature in my Python 3.1.4 build source (looks identical to the trunk source):

    # Objects/codeobject.c
    PyCode_New(int argcount, int kwonlyargcount,
               int nlocals, int stacksize, int flags,
               PyObject *code, PyObject *consts, PyObject *names,
               PyObject *varnames, PyObject *freevars, PyObject *cellvars,
               PyObject *filename, PyObject *name, int firstlineno,
               PyObject *lnotab)
    {
        PyCodeObject *co;
        Py_ssize_t i;

        /* Check argument types */
        if (argcount < 0 || nlocals < 0 ||
            code == NULL ||
            consts == NULL || !PyTuple_Check(consts) ||
            names == NULL || !PyTuple_Check(names) ||
            varnames == NULL || !PyTuple_Check(varnames) ||
            freevars == NULL || !PyTuple_Check(freevars) ||
            cellvars == NULL || !PyTuple_Check(cellvars) ||
            name == NULL || !PyUnicode_Check(name) ||
            filename == NULL || !PyUnicode_Check(filename) ||
            lnotab == NULL || !PyBytes_Check(lnotab) ||
            !PyObject_CheckReadBuffer(code)) {
            PyErr_BadInternalCall();
            return NULL;
        }

And, for the record, this same behavior works just fine in the equivalent Python 2.

--
components: Interpreter Core
files: co_what.py
messages: 151270
nosy: mahmoud
priority: normal
severity: normal
status: open
title: PyCode_New not round-trippable (TypeError)
type: behavior
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file24239/co_what.py

___
Python tracker <http://bugs.python.org/issue13787>
___
[issue13787] PyCode_New not round-trippable (TypeError)
Mahmoud Hashemi added the comment:

And here's the working Python 2 version (works fine on Python 2.7, and likely a few versions prior).

--
Added file: http://bugs.python.org/file24240/co_what2.py

___
Python tracker <http://bugs.python.org/issue13787>
___
[issue13787] PyCode_New not round-trippable (TypeError)
Mahmoud Hashemi added the comment:

Yes, I knew it was an issue with crossed wires somewhere. The Python 2 code doesn't translate well to Python 3 because the function signature changed to add kwonlyargcount. And I guess the argument order is substantially different, too, as described in Objects/codeobject.c#l291.

Thanks for clearing that up, though,

Mahmoud

--

___
Python tracker <http://bugs.python.org/issue13787>
___
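On newer interpreters the positional-order pitfall can be sidestepped entirely. The sketch below assumes Python 3.8 or later, where code objects have a `replace()` method that takes keyword arguments; it is offered as an alternative to calling the constructor, not as what the thread settled on.

```python
import types


def foo():
    return 'bar'


co = foo.__code__

# replace() copies the code object field by field using keyword
# arguments, so the constructor's positional order never matters.
co_copy = co.replace()
co_renamed = co.replace(co_name='foo_copy')

# Build a callable function from the copied code object.
foo_copy = types.FunctionType(co_renamed, globals())
print(foo_copy())
print(co_renamed.co_name)
```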
[issue17911] traceback: add a new thin class storing a traceback without storing local variables
Mahmoud Hashemi added the comment:

Hey all, great to see this being worked on so diligently for so long. Having worked in this area for a while (at home and at PayPal), we've got a few learnings to share:

1) linecache is textbook not-threadsafe. For example, https://hg.python.org/cpython/file/default/Lib/linecache.py#l38

For a lightweight traceback wrapper to be concurrency-friendly, we've had to catch KeyErrors, like so: https://github.com/mahmoud/boltons/blob/master/boltons/tbutils.py#L115

It's kind of a blanket approach, but maybe we could make a separate issue and help out with a linecache refresh?

2) We use something like (filename, lineno) in our DeferredLine class, but for very lightweight areas (e.g., greenlet creation) we just save a reference to the code object, as the additional attribute accesses do end up showing up in the profiles.

3) Generally we've found the APIs in TracebackInfo here to be pretty sufficient/functional: https://github.com/mahmoud/boltons/blob/master/boltons/tbutils.py#L134

Let me know if you've got any questions on that, and keep up the good work!

--
nosy: +mahmoud

___
Python tracker <http://bugs.python.org/issue17911>
___
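Points 1 and 2 can be sketched roughly as follows. This is an illustration only: the class name `LazyLine` is hypothetical and the code is not the boltons implementation linked above. It stores (filename, lineno), loads the text on first access, and treats a KeyError from linecache as a missing line.

```python
import linecache
import os
import tempfile


class LazyLine:
    """Hypothetical holder: keeps (filename, lineno) and loads text on demand."""

    def __init__(self, filename, lineno):
        self.filename = filename
        self.lineno = lineno
        self._text = None

    @property
    def text(self):
        if self._text is None:
            try:
                line = linecache.getline(self.filename, self.lineno)
            except KeyError:
                # The cache may have been modified by another thread
                # mid-lookup; fall back to an empty line.
                line = ''
            self._text = line.strip()
        return self._text


# Usage: write a small throwaway file and read lines from it lazily.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as handle:
    handle.write('first = 1\nsecond = 2\n')
    path = handle.name

second = LazyLine(path, 2).text
missing = LazyLine(path, 99).text
os.unlink(path)
print(repr(second), repr(missing))
```

Catching KeyError this broadly is the same "blanket approach" the comment describes, with the same trade-off: a genuine cache problem is reported as an empty line.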
[issue23479] str.format() breaks object duck typing
New submission from Mahmoud Hashemi:

While porting some old code, I found some interesting misbehavior in the new-style string formatting. When formatting objects which support int and float conversion, old-style percent formatting works great, but new-style formatting explodes hard. Here's a basic example:

    class MyType(object):
        def __init__(self, func):
            self.func = func

        def __float__(self):
            return float(self.func())

    print '%f' % MyType(lambda: 3)
    # Output (python2 and python3): 3.000000

    print '{:f}'.format(MyType(lambda: 3))
    # Output (python2):
    # Traceback (most recent call last):
    #   File "tmp.py", line 28, in <module>
    #     print '{:f}'.format(MyType(lambda: 3))
    # ValueError: Unknown format code 'f' for object of type 'str'
    #
    # Output (python3.4):
    # Traceback (most recent call last):
    #   File "tmp.py", line 30, in <module>
    #     print('{:f}'.format(MyType(lambda: 3)))
    # TypeError: non-empty format string passed to object.__format__

And the same holds true for int and so forth. I would expect these behaviors to be the same between the two formatting styles, and tangentially, expect a more python2-like error message for the python 3 case.

--
messages: 236192
nosy: mahmoud
priority: normal
severity: normal
status: open
title: str.format() breaks object duck typing
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

___
Python tracker <http://bugs.python.org/issue23479>
___
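One way to make such a type work with both formatting styles is to give it a `__format__` method that delegates to float. The sketch below assumes Python 3. The delegation rule, where an empty spec falls back to str(), is a design choice made for this illustration and not something prescribed in the thread.

```python
class MyType(object):
    def __init__(self, func):
        self.func = func

    def __float__(self):
        return float(self.func())

    def __format__(self, spec):
        # Assumption: an empty spec should behave like str(); any other
        # spec is handed to float's own formatting.
        if not spec:
            return str(self)
        return format(float(self), spec)


print('%f' % MyType(lambda: 3))
print('{:f}'.format(MyType(lambda: 3)))
print('{:.2f}'.format(MyType(lambda: 3)))
```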
[issue23479] str.format() breaks object duck typing
Changes by Mahmoud Hashemi:

--
nosy: +Mark.Williams

___
Python tracker <http://bugs.python.org/issue23479>
___
[issue23479] str.format() breaks object duck typing
Mahmoud Hashemi added the comment:

Well, thank you for the prompt and helpful replies everyone. Can't say I didn't wish the default behavior were more intuitive, but at least I think I have an idea how to work this. Thanks again!

--
resolution: not a bug ->
status: closed -> open

___
Python tracker <http://bugs.python.org/issue23479>
___
[issue25341] File mode wb+ appears as rb+
Changes by Mahmoud Hashemi:

--
nosy: +mahmoud

___
Python tracker <http://bugs.python.org/issue25341>
___
[issue7359] mailbox cannot modify mailboxes in system mail spool
Mahmoud Hashemi added the comment:

Got bit by this, and since it's not a bug, here's "not" a fix: http://boltons.readthedocs.org/en/latest/mboxutils.html#boltons.mboxutils.mbox_readonlydir

Been in production for a while, working like a charm. Might there be interest in including this in the standard lib?

--
nosy: +mahmoud

___
Python tracker <http://bugs.python.org/issue7359>
___
[issue26623] JSON encode: more informative error
New submission from Mahmoud Lababidi:

The json.dumps()/encode functionality will raise an Error when an object that cannot be json-encoded is encountered. The current Error message only shows the Object itself. I would like to enhance the error message by also providing the Type. This is useful when numpy.int objects are passed in and it is not clear that they are numpy objects.

--
components: Library (Lib)
messages: 262272
nosy: Mahmoud Lababidi
priority: normal
severity: normal
status: open
title: JSON encode: more informative error
type: enhancement
versions: Python 3.6

___
Python tracker <http://bugs.python.org/issue26623>
___
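The idea can be tried without patching the json module by passing a `default` hook to json.dumps. In this sketch the hook name `explain_unserializable` and the message wording are assumptions made for illustration, and a set stands in for the numpy object to keep the example dependency-free.

```python
import json


def explain_unserializable(obj):
    """Hypothetical `default` hook: report the type as well as the repr."""
    raise TypeError(
        "Object of type %s is not JSON serializable: %r"
        % (type(obj).__name__, obj)
    )


try:
    json.dumps({"values": {1}}, default=explain_unserializable)
except TypeError as exc:
    message = str(exc)

print(message)
```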
[issue26623] JSON encode: more informative error
Changes by Mahmoud Lababidi:

--
keywords: +patch
Added file: http://bugs.python.org/file42258/json_encode.patch

___
Python tracker <http://bugs.python.org/issue26623>
___
[issue26623] JSON encode: more informative error
Mahmoud Lababidi added the comment:

Is there a use case where the representation is too long? I think it may be useful to see the representation, but perhaps you are correct.

--

___
Python tracker <http://bugs.python.org/issue26623>
___
[issue26623] JSON encode: more informative error
Mahmoud Lababidi added the comment:

Serhiy, I've attached a patch without the Object representation. Choose whichever you feel is better.

--
Added file: http://bugs.python.org/file42366/json_encode.patch

___
Python tracker <http://bugs.python.org/issue26623>
___
[issue24019] str/unicode encoding kwarg causes exceptions
New submission from Mahmoud Hashemi:

The encoding keyword argument to the Python 3 str() and Python 2 unicode() constructors is excessively constraining to the practical use of these core types.

Looking at common usage, both these constructors' primary mode is to convert various objects into text:

    >>> str(2)
    '2'

But adding an encoding yields:

    >>> str(2, encoding='utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: coercing to str: need bytes, bytearray or buffer-like object, int found

While the error message is fine for an experienced developer, I would like to raise the question: is it necessary at all? Even harmlessly getting a str from a str is punished, but leaving off encoding is fine again:

    >>> str('hi', encoding='utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: decoding str is not supported
    >>> str('hi')
    'hi'

Merging and simplifying the two modes of these constructors would yield much more predictable results for experienced and beginning Pythonists alike. Basically, the encoding argument should be ignored if the argument is already a unicode/str instance, or if it is a non-string object. It should only be consulted if the primary argument is a bytestring. Bytestrings already have a .decode() method; another, more obscure version of it isn't necessary.

Furthermore, despite the core nature and widespread usage of these types, changing this behavior should break very little existing code and understanding. unicode() and str() will simply behave as expected more often, returning text versions of the arguments passed to them.

Appendix: To demonstrate the expected behavior of the proposed unicode/str, here is a code snippet we've employed to sanely and safely get a text version of an arbitrary object:

    def to_unicode(obj, encoding='utf8', errors='strict'):
        # the encoding default should look at sys's value
        try:
            return unicode(obj)
        except UnicodeDecodeError:
            return unicode(obj, encoding=encoding, errors=errors)

After many years of writing Python and teaching it to developers of all experience levels, I firmly believe that this is the right interaction pattern for Python's core text type. I'm also happy to expand on this issue, turn it into a PEP, or submit a patch if there is interest.

--
components: Unicode
messages: 241699
nosy: ezio.melotti, haypo, mahmoud
priority: normal
severity: normal
status: open
title: str/unicode encoding kwarg causes exceptions
type: behavior
versions: Python 2.7, Python 3.4, Python 3.5, Python 3.6

___
Python tracker <http://bugs.python.org/issue24019>
___
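The appendix helper is written for Python 2. A rough Python 3 counterpart might look like the sketch below. It is an adaptation made under assumptions, not the author's code: the name `to_text` is invented, and it checks the type up front instead of catching UnicodeDecodeError, because str() on a bytes object in Python 3 returns its repr rather than raising.

```python
def to_text(obj, encoding='utf-8', errors='strict'):
    """Return a text version of obj, consulting encoding only for bytes."""
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding, errors)
    return str(obj)


print(to_text(2))
print(to_text('hi'))
print(to_text(b'caf\xc3\xa9'))
```

This mirrors the proposal's rule that the encoding matters only when the primary argument is a bytestring.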
[issue24019] str/unicode encoding kwarg causes exceptions
Mahmoud Hashemi added the comment:

Python already has one approach that fails to decode non-bytestrings: the .decode() method.

This is about removing unicode barriers to entry and making the str constructor in Python 3 as succinctly useful as possible. There are several problems the helper does not solve:

1) Usage-wise, str/unicode is used to turn values into text. From a high-level perspective, the content does not change, only the representation format. Should this fundamental operation really require type inspection and explicit try/except blocks every single time? Or should it just work? sorted() does not raise an exception if the values are already sorted, so why does str() raise an exception when the value is already a str?*

2) By and large, among developers, keyword arguments are viewed as "optional" arguments that have defaults which can be overridden. However, that is not the case here; str is not simply str(obj, encoding=sys.getdefaultencoding()). Explicitly passing the keyword argument breaks the call.

3) The helper does not help promote Python adoption when it must be copied and pasted into new developers' projects. It does not help break down the misconception that unicode is a punishing concept to be around in Python.

* This question is posed here rhetorically, but I have gotten variations on it from multiple Python developers in training.

--
versions: +Python 2.7

___
Python tracker <http://bugs.python.org/issue24019>
___
[issue24019] str/unicode encoding kwarg causes exceptions
Mahmoud Hashemi added the comment:

Martin, it sounds that way because that is what is being proposed: "Merging and simplifying the two modes". Given the existence of .decode() on bytestrings, the only objects that generally need decoding in Python 2 and 3, the existence of str/unicode's second mode constitutes a design bug.

Without a doubt, Python has frequently preferred convenient idioms over EAFP. Look at dict.get for an excellent example of defaults being used instead of forcing users to catch KeyErrors. That conversation could have gone a different way, but Python is better off having stuck to its pragmatic roots.

In answer to your questions, Martin, 1) I'd expect str(b"123", encoding=None) to do the same thing as str(b"123") and 2) I'd expect str(obj) behavior to continue to depend on whether the object passed is string-like. Python is a duck-typed, dynamic language, and dynamic languages are most powerful when their core types reflect usability. Consistency is one of the foremost factors of usability, and having to frequently switch between two call patterns of the str constructor feels inconsistent and unusable.

--

___
Python tracker <http://bugs.python.org/issue24019>
___
[issue24019] str/unicode encoding kwarg causes exceptions
Mahmoud Hashemi added the comment:

I would urge you all to take a stronger look at usability, rather than parroting the current state of the design and docs. Python gained renown over the years for its ability to stay flexible while maturing. Focusing on purity and ignoring the needs of practical programmers is exactly how PEP #461 ended up coming into play so late. The inflexible arguments of str make a common task, turning data into text, an order of magnitude harder than it needs to be.

--

___
Python tracker <http://bugs.python.org/issue24019>
___
[issue24172] Errors in resource.getpagesize module documentation
New submission from Mahmoud Hashemi:

The resource module's description of resource.getpagesize is woefully misleading. Reproduced in full for convenience:

    resource.getpagesize()
        Returns the number of bytes in a system page. (This need not be the same as the hardware page size.) This function is useful for determining the number of bytes of memory a process is using. The third element of the tuple returned by getrusage() describes memory usage in pages; multiplying by page size produces number of bytes.

Besides being vague by not referring to the third element as ru_maxrss, the peak RSS for the process (i.e., not the current memory usage), tests on Linux, Darwin, and FreeBSD show the following:

* Linux: ru_maxrss is in kilobytes
* Darwin (OS X): ru_maxrss is in bytes
* FreeBSD: ru_maxrss is in kilobytes (same as Linux)

Knowing the page size is probably useful to someone, but the misinformation has definitely sent more than one person down the wrong path here. Additionally, the correct information should be up in the getrusage() method documentation, closer to relevant field descriptions.

Mahmoud

--
assignee: docs@python
components: Documentation
messages: 243043
nosy: docs@python, mahmoud
priority: normal
severity: normal
status: open
title: Errors in resource.getpagesize module documentation
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6

___
Python tracker <http://bugs.python.org/issue24172>
___
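Going by the units observed in the report, a peak-memory helper would scale ru_maxrss per platform rather than multiply by the page size. The sketch below is a minimal illustration of that; the function name `peak_rss_bytes` is made up, and the units for platforms other than Linux, Darwin, and FreeBSD are an unverified assumption.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    if sys.platform == 'darwin':
        # Reported above as already being in bytes on OS X.
        return usage.ru_maxrss
    # Reported above as kilobytes on Linux and FreeBSD; assumed elsewhere.
    return usage.ru_maxrss * 1024


print(peak_rss_bytes())
print(resource.getpagesize())
```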
[issue31561] difflib pathological behavior with mixed line endings
Mahmoud Al-Qudsi added the comment:

Attaching file2

--
Added file: https://bugs.python.org/file47165/file2

___
Python tracker <https://bugs.python.org/issue31561>
___
[issue31561] difflib pathological behavior with mixed line endings
New submission from Mahmoud Al-Qudsi:

While using the icdiff command line interface to difflib, I ran into an interesting issue where difflib took 47 seconds to compare two simple text documents (a PHP source code file that had been refactored via phptidy).

On subsequent analysis, it turned out to be some sort of pathological behavior triggered by the presence of mixed line endings. Normalizing the line endings in both files to \r\n via unix2dos and then comparing (making no other changes) resulted in the diff calculation completing in under 2 seconds.

I have attached the documents in question (file1 and file2) to this bug report.

--
components: Library (Lib)
files: file1
messages: 302788
nosy: Mahmoud Al-Qudsi
priority: normal
severity: normal
status: open
title: difflib pathological behavior with mixed line endings
versions: Python 3.6
Added file: https://bugs.python.org/file47164/file1

___
Python tracker <https://bugs.python.org/issue31561>
___
[issue31561] difflib pathological behavior with mixed line endings
Mahmoud Al-Qudsi added the comment:

@tim.peters

No, `icdiff` is not part of core and probably should be omitted from the remainder of this discussion.

I just checked and it's actually not a mix of line endings in each file; it's just that one file is \n and the other is \r\n.

You can actually just duplicate this bug by taking _any_ file and copying it, then executing `unix2dos file1; dos2unix file2`. You'll have two perfectly "correct" files that difflib will struggle to handle.

(As a preface to what follows, I've written a binary diff and incremental backup utility, so I'm familiar with the intricacies and pitfalls when it comes to diffing. I have not looked at difflib's source code, however. Looking at the documentation for difflib, it's not clear whether or not it should be considered a naive binary diffing utility, since it does seem to have the concept of "lines".)

Given that _both_ input files are "correct" without line ending errors, I think the correct optimization here would be for difflib to "realize" that two chunks are "identical" but with different line endings (aka just plain different, not asking for this to be treated as a special case), but instead of going on to search for a match to either buffer, it should assume that no better match will be found later on and simply move on to the next block/chunk.

Of course, in the event where file2 has a line from file1 that is first present with a different line ending and then repeated with the same line ending, difflib will not choose the correct line... but that's probably not something worth fretting over (like you said, mixed line endings == recipe for disaster).

Of course I can understand if all this is out of the scope of difflib and not an endeavor worth taking up.

--

___
Python tracker <https://bugs.python.org/issue31561>
___
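As a caller-side workaround, the inputs can be split with str.splitlines() so that the line terminators never reach difflib. This sketch only illustrates that normalization step, mirroring the unix2dos experiment in the original report. It is not the optimization proposed above, and it makes no claim about what causes the slowdown.

```python
import difflib

unix_text = "a\nb\nc\n"
dos_text = "a\r\nb\r\nc\r\n"

# Keeping the terminators: every line differs between the two inputs.
raw_diff = list(difflib.unified_diff(
    unix_text.splitlines(keepends=True),
    dos_text.splitlines(keepends=True),
))

# Dropping the terminators: the inputs compare as identical.
normalized_diff = list(difflib.unified_diff(
    unix_text.splitlines(),
    dos_text.splitlines(),
    lineterm='',
))

print(len(raw_diff), len(normalized_diff))
```

The trade-off is that a genuine line-ending change no longer shows up in the diff at all.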