[issue46173] float(x) with large x does not raise OverflowError
New submission from Cyker Way:

According to https://docs.python.org/3/library/functions.html#float :

> If the argument is outside the range of a Python float, an OverflowError
> will be raised.

It is well known that the maximum value in IEEE 754 binary64 format is:

    >>> M = ((1<<53)-1)<<(1023-52)

So `(M+1)` will be out of range. But `float(M+1)` gives me:

    >>> float(M+1)
    1.7976931348623157e+308

No OverflowError is thrown. Contrast this with:

    >>> float(M+M)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float

In another text, https://docs.python.org/3/tutorial/floatingpoint.html#representation-error :

> ...Almost all machines today (November 2000) use IEEE-754 floating point
> arithmetic, and almost all platforms map Python floats to IEEE-754 “double
> precision”...

Is Python not following the IEEE 754 binary64 format, or is something missing here?

--
components: Interpreter Core
messages: 409143
nosy: cykerway
priority: normal
severity: normal
status: open
title: float(x) with large x does not raise OverflowError
type: behavior
versions: Python 3.10, Python 3.9

___ Python tracker <https://bugs.python.org/issue46173> ___
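For reference, the resolution worked out later in this thread can be previewed with a quick check: under round-to-nearest, M + 1 rounds back down to M itself, so the rounded result is still in range:

    >>> import sys
    >>> M = ((1 << 53) - 1) << (1023 - 52)
    >>> float(M) == float(M + 1) == sys.float_info.max
    True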
[issue46173] float(x) with large x does not raise OverflowError
Cyker Way added the comment:

If Python's float strictly adheres to IEEE 754 binary64 (is that so?), then the problem may be caused by rounding. However, even in that case I still don't see the benefit of range-checking the rounded input rather than the input itself. Does IEEE 754 itself give any specification about range checks and rounding during such a type conversion?

--

___ Python tracker <https://bugs.python.org/issue46173> ___
[issue46173] float(x) with large x does not raise OverflowError
Cyker Way added the comment:

IEEE 754, 7.4 Overflow:

> The overflow exception shall be signaled if and only if the destination
> format’s largest finite number is exceeded in magnitude by what would have
> been the rounded floating-point result (see 4) were the exponent range
> unbounded...

Were this the basis of the float implementation, not raising an overflow exception seems to be the correct behavior? Then this is not a bug, but the Python float doc might better say "If the *rounded* argument is outside..."

--

___ Python tracker <https://bugs.python.org/issue46173> ___
[issue46173] float(x) with large x does not raise OverflowError
Cyker Way added the comment:

Alright, that helps. I guess I now understand what's happening here. Here are the two numbers in question:

    >>> M = int('1'*53 + '0'*971, 2)
    >>> N = int('1'*53 + '0' + '1'*970, 2)

M is the largest number in the binary64 range, while N is the largest number that does not emit an OverflowError when converted to binary64; N+1 will emit that error. N - M == 2**970 - 1, which means N is larger than M, and that is what caused the confusion.

--

___ Python tracker <https://bugs.python.org/issue46173> ___
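A quick interactive check of that boundary, using the values defined in the message above (output from a 64-bit CPython):

    >>> M = int('1'*53 + '0'*971, 2)
    >>> N = M + 2**970 - 1
    >>> float(N)                      # still rounds down to the largest binary64
    1.7976931348623157e+308
    >>> float(N + 1)                  # first integer that rounds past it
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float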
[issue39145] Innocuous parent class changes multiple inheritance MRO
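The attached mro.py is not reproduced in this archive; a minimal reconstruction consistent with the graph above would be:

    class A(object): pass
    class C(object): pass
    class B(A): pass
    class D(C): pass
    class X(object): pass

    class E(B, D, A): pass       # E.mro(): E B D A C object
    # class E(B, D, X, A): pass  # E.mro(): E B D C X A object

    print([c.__name__ for c in E.mro()])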
New submission from Cyker Way : With an inheritance graph like this: A C B D (X) A E Adding or removing class X in E's parents will change the order of A and C in E's MRO: EBDAC vs EBDCXA. I couldn't imagine what would be the "perfect" MRO. However, given X is completely independent from A and C, this behavior looks strange and problematic. -- components: Interpreter Core files: mro.py messages: 358921 nosy: cykerway priority: normal severity: normal status: open title: Innocuous parent class changes multiple inheritance MRO type: behavior versions: Python 3.8 Added file: https://bugs.python.org/file48804/mro.py ___ Python tracker <https://bugs.python.org/issue39145> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39145] Innocuous parent class changes multiple inheritance MRO
Cyker Way added the comment:

Thanks for the reply. It's not about Python's implementation of C3, but C3 itself being used as the MRO algorithm in Python. It bites when you remove an independent interface from your class definition and its method calls become something else.

I think I can propose an easy fix to C3. I would just call it C3.9 if it's going to be merged in Python 3.9. The difference between C3.9 and C3 is: during a merge, put the list of the parents in front of the linearizations of the parents. Namely, instead of:

    L[C(B1 ... BN)] = C + merge(L[B1], ..., L[BN], B1 ... BN)

we do:

    L[C(B1 ... BN)] = C + merge(B1 ... BN, L[B1], ..., L[BN])

This preserves the guarantees of C3 (for example, local precedence ordering and monotonicity), plus another property: when you remove a leaf node in the dependency graph, the MRO of the other nodes remains the same. Practically, this means when a developer removes an independent interface from his class, he knows the MRO of the remaining classes stays the same. Here the independent interface can even have its own class hierarchy, and the proof goes by removing each leaf one by one in its hierarchy. For example, using the same mro.py:

    E     + [BDA  BA  DC  A]
    EB    + [DA   A   DC  A]
    EBD   + [A    A   C   A]
    EBDA  + [C]
    EBDAC

    E     + [BDXA  BA  DC  X  A]
    EB    + [DXA   A   DC  X  A]
    EBD   + [XA    A   C   X  A]
    EBDX  + [A     A   C   A]
    EBDXA + [C]
    EBDXAC

You can see EBDAC is a sub-sequence of EBDXAC.

The proof is intuitive; I can give a sketch here. Assume, without loss of generality, that class E extends A, B, C, D, and we add a new base class X between B and C. Now we have:

    (f1) E + [ABXCD  L(A)  L(B)  X  L(C)  L(D)]

Compare this with:

    (f0) E + [ABCD  L(A)  L(B)  L(C)  L(D)]

We can carry out all the computation in (f1) just as in (f0) until X becomes the head of the first list item:

    (f1') E... + [XCD  ...  X  ...]
    (f0') E... + [CD   ...  ...]

At this point we know we can extract X in (f1') because X is not in any tail. After we extract X, (f1') becomes (f0') and we can carry out all the operations just as in (f0'). So the result of (f1) is merely an added X, with all other elements keeping the same order.

Intuitively, the performance of C3.9 should be almost the same as C3. The only drawback I can think of is that existing code may get a different MRO, but that also happened when Python adopted C3. Not sure if this is worth a PEP. If anyone is interested, feel free to leave a message.

--

___ Python tracker <https://bugs.python.org/issue39145> ___
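For concreteness, a small sketch implementing both merge orders over class names as strings (illustrative code, not CPython's actual implementation; `merge` and `linearize` are names invented here):

    def merge(seqs):
        # C3 merge: repeatedly pick a head that appears in no tail.
        seqs = [list(s) for s in seqs]
        result = []
        while any(seqs):
            for seq in seqs:
                if not seq:
                    continue
                head = seq[0]
                if not any(head in s[1:] for s in seqs):
                    break
            else:
                raise TypeError('inconsistent hierarchy')
            result.append(head)
            for s in seqs:
                if s and s[0] == head:
                    del s[0]
        return result

    def linearize(cls, bases, lin, parents_first=False):
        # lin maps a class name to its linearization; parents_first
        # switches from classic C3 to the proposed "C3.9" merge order.
        tails = [lin[b] for b in bases]
        seqs = [list(bases)] + tails if parents_first else tails + [list(bases)]
        return [cls] + merge(seqs)

    lin = {'A': ['A'], 'C': ['C'], 'X': ['X']}
    lin['B'] = linearize('B', ['A'], lin)
    lin['D'] = linearize('D', ['C'], lin)
    print(linearize('E', ['B', 'D', 'A'], lin))             # classic: E B D A C
    print(linearize('E', ['B', 'D', 'X', 'A'], lin))        # classic: E B D C X A
    print(linearize('E', ['B', 'D', 'A'], lin, True))       # C3.9:    E B D A C
    print(linearize('E', ['B', 'D', 'X', 'A'], lin, True))  # C3.9:    E B D X A C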
[issue39145] Innocuous parent class changes multiple inheritance MRO
Cyker Way added the comment:

A typo: "...at this time we know we can extract X in (f1') because X is NOT in any tail..." I missed the "NOT" in the previous text.

--

___ Python tracker <https://bugs.python.org/issue39145> ___
[issue39145] Innocuous parent class changes multiple inheritance MRO
Cyker Way added the comment:

Thank you for the links. I suspect this C3 variant could break EPG (extended precedence graph) consistency, making it a "C2". I may run some tests later and move on to the discussion board. Guess I'm done here.

--

___ Python tracker <https://bugs.python.org/issue39145> ___
[issue39145] Innocuous parent class changes multiple inheritance MRO
Cyker Way added the comment:

Ahhh, one second, I haven't really quit this. I could open another thread, but it's highly related to this one. I just came up with something that looks like a bug, not a feature, in the original C3:

    #!/usr/bin/env python3

    #    A        C
    #    B    D   X   A
    #      E        F
    #          G

    class A(object): pass
    class B(A): pass
    class C(object): pass
    class D(C): pass
    class X(object): pass
    class E(B, D, A): pass
    class F(B, D, X, A): pass
    class G(E, F): pass

    print(G.mro())

The script fails. It cannot find GEFBDCXA, which looks valid? You can close again if you feel this isn't a bug. I'm gone for a while.

--
resolution: not a bug ->
status: closed -> open

___ Python tracker <https://bugs.python.org/issue39145> ___
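For context, the conflict classic C3 reports here can be seen by inspecting the two sub-MROs (same class definitions as above): E and F disagree on the relative order of A and C, so any MRO for G must violate monotonicity with respect to one of them.

    print([c.__name__ for c in E.mro()])  # E B D A C object  (A before C)
    print([c.__name__ for c in F.mro()])  # F B D C X A object (C before A)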
[issue14074] argparse allows nargs>1 for positional arguments but doesn't allow metavar to be a tuple
Cyker Way added the comment:

Tuple support is documented:

https://github.com/python/cpython/commit/98047eb806227f11212f6a42c242030a51964e30#diff-9c4a053d29149ba40370fb3e34faR1059
https://docs.python.org/3/library/argparse.html#metavar

> Providing a tuple to ``metavar`` specifies a different display for each of
> the arguments:

From his words here I feel like our professor meant to support tuple metavar, but the implementation was not careful. He called for a patch eight years ago. Let's summon him and see if he has changed his mind?

--

___ Python tracker <https://bugs.python.org/issue14074> ___
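A minimal reproduction of the reported mismatch (argument names are hypothetical; per the comments in this issue, the failure surfaces in help formatting for the positional, while the optional form works):

    import argparse

    parser = argparse.ArgumentParser(prog='demo.py')
    parser.add_argument('--pair', nargs=2, metavar=('a', 'b'))    # optional: fine
    parser.add_argument('move', nargs=2, metavar=('src', 'dst'))  # positional
    parser.print_help()  # as reported, help formatting trips over the tuple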
[issue35314] fnmatch failed with leading caret (^)
New submission from Cyker Way:

In short, `fnmatch.fnmatch` doesn't match shell results. To test this, create a dir with 2 files: `a.py` and `b.py`. Then `ls [!b].py` and `ls [^b].py` will both show `a.py`. However, `fnmatch.fnmatch('a.py', '[!b].py')` returns `True` but `fnmatch.fnmatch('a.py', '[^b].py')` returns `False`.

The problem seems to come from an escaped caret:

https://github.com/python/cpython/blob/master/Lib/fnmatch.py#L124

I don't see why caret and exclamation mark should be treated differently; from `man bash`:

> ...If the first character following the [ is a ! or a ^ then any character
> not enclosed is matched...

Could someone please confirm whether it's a bug or intended behavior?

--
components: Library (Lib)
messages: 330417
nosy: cykerway
priority: normal
severity: normal
status: open
title: fnmatch failed with leading caret (^)
versions: Python 3.7

___ Python tracker <https://bugs.python.org/issue35314> ___
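The differing treatment is visible in `fnmatch.translate` (REPL output as on a CPython 3.7-era interpreter; the exact regex wrapper varies across versions):

    >>> import fnmatch
    >>> fnmatch.translate('[!b].py')
    '(?s:[^b]\\.py)\\Z'
    >>> fnmatch.translate('[^b].py')
    '(?s:[\\^b]\\.py)\\Z'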
[issue35314] fnmatch failed with leading caret (^)
Cyker Way added the comment:

Thank you for the confirmation. Knowing it is not fully POSIX-compliant helps with understanding. I'm asking because I had interoperability issues writing Python scripts providing shell-like utilities for filename expansion, and the result may surprise users.

The glibc fnmatch provides a flag named `FNM_PATHNAME`, which is missing in the Python fnmatch implementation. So I think there is currently no way to tell the Python library whether we are matching a filename or not. All right, so this is not a bug, but probably a good enhancement.

## TLDR

This is what POSIX says **for filename expansion**, in section 2.13.3: <http://pubs.opengroup.org/onlinepubs/9699919799.2008edition/>

> When pattern matching notation is used for filename expansion:
>
> 1. The <slash> character in a pathname shall be explicitly matched by
> using one or more <slash> characters in the pattern; it shall neither be
> matched by the <asterisk> or <question-mark> special characters nor by a
> bracket expression. <slash> characters in the pattern shall be identified
> before bracket expressions; thus, a <slash> cannot be included in a pattern
> bracket expression used for filename expansion. If a <slash> character is
> found following an unescaped <left-square-bracket> character before a
> corresponding <right-square-bracket> is found, the open bracket shall be
> treated as an ordinary character. For example, the pattern "a[b/c]d" does not
> match such pathnames as abd or a/d. It only matches a pathname of literally
> a[b/c]d.

Currently Python's fnmatch.fnmatch gives:

    >>> fnmatch('abd', 'a[b/c]d')
    True
    >>> fnmatch('a/d', 'a[b/c]d')
    True
    >>> fnmatch('a[b/c]d', 'a[b/c]d')
    False

Ideally we could call `fnmatch('a/d', 'a[b/c]d', fnm_pathname=True)` to correct the behavior.

--

___ Python tracker <https://bugs.python.org/issue35314> ___
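Until such a flag exists, a hedged workaround on glibc systems is calling fnmatch(3) through ctypes; the flag value and the wrapper name below are assumptions to check against your platform's <fnmatch.h>:

    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library('c'))
    FNM_PATHNAME = 1  # value from glibc <fnmatch.h>; assumed, verify locally

    def fnmatch_pathname(name, pattern):
        # fnmatch(3) returns 0 on a match
        return libc.fnmatch(pattern.encode(), name.encode(), FNM_PATHNAME) == 0

    print(fnmatch_pathname('abd', 'a[b/c]d'))       # False
    print(fnmatch_pathname('a/d', 'a[b/c]d'))       # False
    print(fnmatch_pathname('a[b/c]d', 'a[b/c]d'))   # True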
[issue14074] argparse allows nargs>1 for positional arguments but doesn't allow metavar to be a tuple
Cyker Way added the comment:

I can confirm this bug still exists on the master branch, python3.7, python3.6, and very likely other versions since it was reported. It seems only `_format_action_invocation` and `_get_action_name` need to be fixed, so the fix can be lightweight (<10 lines).

--
nosy: +cykerway
Added file: https://bugs.python.org/file47965/14074.patch

___ Python tracker <https://bugs.python.org/issue14074> ___
[issue36293] Nonblocking read of sys.stdin raises error
New submission from Cyker Way:

This piece of code will raise an error:

    import os
    import sys

    os.set_blocking(sys.stdin.fileno(), False)
    sys.stdin.read()

Error:

> TypeError: can't concat NoneType to bytes

Not sure if this is relevant (for a different version of Python): https://bugs.python.org/issue24560

--
components: Library (Lib), Unicode
messages: 337940
nosy: cykerway, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Nonblocking read of sys.stdin raises error
versions: Python 3.7

___ Python tracker <https://bugs.python.org/issue36293> ___
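The mismatch appears to sit between the I/O layers; a sketch (run interactively on a Unix-like OS with no pending input on stdin):

    import os
    import sys

    os.set_blocking(sys.stdin.fileno(), False)
    # The raw layer is documented to return None when no data is available:
    print(sys.stdin.buffer.raw.read())  # None
    # The layers above it presumably end up concatenating that None with
    # bytes, which would explain the TypeError reported above.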
[issue36294] `io.BufferedIOBase` returns `None`
New submission from Cyker Way:

The documentation of [BufferedIOBase](https://docs.python.org/3/library/io.html#io.BufferedIOBase) says:

> ...unlike their RawIOBase counterparts, they will never return None.

But this example shows the above statement is not true:

    import io
    import os
    import sys

    os.set_blocking(sys.stdin.fileno(), False)
    print(isinstance(sys.stdin.buffer, io.BufferedIOBase))
    print(sys.stdin.buffer.read())

Output:

    True
    None

--
components: IO, Library (Lib)
messages: 337941
nosy: cykerway
priority: normal
severity: normal
status: open
title: `io.BufferedIOBase` returns `None`
versions: Python 3.7

___ Python tracker <https://bugs.python.org/issue36294> ___
[issue33210] pkgutil.walk_packages gives incomplete results
New submission from Cyker Way:

The current implementation of `pkgutil.walk_packages()` is confusing. Users may be given incomplete results, with no exception raised, if they don't explicitly provide the `prefix` parameter. The doc says:

> prefix is a string to output on the front of every module name on output.

But in fact, `prefix` is not merely an output formatter at all. This function cannot handle an empty prefix (which is the default) and will often encounter import errors, which are then simply ignored without an `onerror` function (which is the default again). See the test program for details.

Doc: https://docs.python.org/3.6/library/pkgutil.html#pkgutil.walk_packages
Source: https://github.com/python/cpython/blob/3.6/Lib/pkgutil.py

--
components: Library (Lib)
files: test.py
messages: 314850
nosy: cykerway
priority: normal
severity: normal
status: open
title: pkgutil.walk_packages gives incomplete results
type: behavior
versions: Python 3.6
Added file: https://bugs.python.org/file47516/test.py

___ Python tracker <https://bugs.python.org/issue33210> ___
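The attached test.py is not reproduced here; a sketch of the symptom, where `mypkg` stands for a hypothetical package with subpackages whose directory is not itself on sys.path:

    import pkgutil
    import mypkg  # hypothetical package containing subpackages

    # Default empty prefix: for a subpackage 'mypkg.sub', walk_packages
    # attempts __import__('sub'); that import fails, the ImportError is
    # swallowed (onerror is None), and sub's children silently go missing.
    for info in pkgutil.walk_packages(mypkg.__path__):
        print(info.name)

    # Qualified prefix: __import__('mypkg.sub') succeeds, walk is complete.
    for info in pkgutil.walk_packages(mypkg.__path__,
                                      prefix=mypkg.__name__ + '.'):
        print(info.name)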
[issue33210] pkgutil.walk_packages gives incomplete results
Change by Cyker Way:

Removed file: https://bugs.python.org/file47516/test.py

___ Python tracker <https://bugs.python.org/issue33210> ___
[issue33210] pkgutil.walk_packages gives incomplete results
Cyker Way added the comment:

Update test program.

--
Added file: https://bugs.python.org/file47519/test.py

___ Python tracker <https://bugs.python.org/issue33210> ___
[issue34296] Speed up python startup by pre-warming the vm
New submission from Cyker Way:

I'm currently writing some shell tools with python3. While Python can definitely do the job succinctly, there is one problem which made me feel I might have to switch to another language or even pure bash scripts: Python startup time.

Shell tools are used very often, interactively. Users do feel the lag when they hit enter after their command. I did two implementations, in Python and in pure bash; the Python one takes about 500ms to run while bash is more than 10 times faster. I'm not saying bash is better than Python, but for this task bash, or even perl, is a better choice.

However, I think there is an easy way to improve Python, as I believe the lag is mostly due to its slow startup: pre-warm its VM. I can think of 2 scenarios for Python to do a better shell job:

1. Run a universal Python as a daemon, which reads scripts from a socket, runs them, and returns the results to a socket. Because it's running as a daemon, the startup time is avoided each time a user runs a script.

2. Warm a Python zygote during system boot. Every time a Python script is run, fork from the zygote instead of cold-booting the VM. This is a similar approach to the Android zygote (see the sketch after this message).

I haven't done experiments to see whether there will be obstacles in implementing these scenarios. But I think this should become a priority because it's real and tangible, and other people may face the slow startup problem as well. If there's ongoing work on these, I'd be happy to have a look. But I don't think these scenarios have already been put into released versions of Python.

--
components: Interpreter Core
messages: 322800
nosy: cykerway
priority: normal
severity: normal
status: open
title: Speed up python startup by pre-warming the vm
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8

___ Python tracker <https://bugs.python.org/issue34296> ___
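A minimal sketch of scenario 2, assuming a Unix-like OS with os.fork(); the socket path and the one-line protocol (client sends a script path) are illustrative, not an existing tool:

    import os
    import socket

    # Pre-warm: pay the import cost once, in the zygote.
    import argparse, json, re  # noqa: F401  examples of commonly used modules

    SOCK = '/tmp/pyzygote.sock'  # illustrative path

    try:
        os.unlink(SOCK)
    except FileNotFoundError:
        pass

    srv = socket.socket(socket.AF_UNIX)
    srv.bind(SOCK)
    srv.listen()

    while True:
        conn, _ = srv.accept()
        script = conn.recv(4096).decode()   # client sends a script path
        if os.fork() == 0:                  # child: inherits the warm VM
            srv.close()
            with open(script) as f:
                code = compile(f.read(), script, 'exec')
            exec(code, {'__name__': '__main__'})
            conn.close()
            os._exit(0)
        conn.close()                        # parent: keep serving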
[issue34296] Speed up python startup by pre-warming the vm
Cyker Way added the comment:

> VM startup + `import site` are done in 10~20ms on common Linux machine.
> While Python VM startup is not lightning fast, library import time is much
> slower than VM startup in many cases.

In my tests, a helloworld Python script generally takes about 30-40 ms, while a similar helloworld bash script takes about 3-4 ms. Adding some common library imports (`os`, `sys`, etc.), the Python script's running time goes up to 60-70 ms. I'm not sure how to compare this against bash because there are no library imports in bash.

The 500ms (Python) vs 50ms (bash) comparison is based on minimal implementations of the same simple job and is meant to reflect the minimal amount of time needed for such a job in different languages. While it doesn't cover everything and may not even be fair enough, the result does match that of the helloworld test (10 times faster/slower). Plus, your linked post shows importing pipenv takes about 700ms. Therefore I believe some hundreds of milliseconds are necessary for such scripts that do a simple but meaningful job.

I understand many things can happen while importing a library. But for a specific program, its imports are usually fixed and very likely the same between runs. That's why I believe a zygote/fork/snapshot feature would still help avoid the startup delay.

Finally, for simple and quick user scripts, the 30-40 ms startup time without any import statements may not be a huge problem, but it's still tangible and makes the program feel not that sleek.

--

___ Python tracker <https://bugs.python.org/issue34296> ___
[issue34296] Speed up python startup by pre-warming the vm
Cyker Way added the comment:

It was tested on an x86_64 Linux system. The machine is not quite new but is OK for building and running Python. The test script is actually a management tool for a larger project that is not released in public, so I don't have the right to disclose it here. When tested on Python 3.7 it did run faster than on Python 3.6, so there have indeed been great improvements.

While optimizing standard libraries definitely makes programs start up faster, we should also note that each user program has its own specific initialization procedure. This is in general out of the control of language platform developers, but forking a user-initialized VM, or using a snapshot chosen by the user, still helps.

--

___ Python tracker <https://bugs.python.org/issue34296> ___
[issue34296] Speed up python startup by pre-warming the vm
Cyker Way added the comment:

> While this issue is "pre warming VM", VM startup is not significant part of
> your 500ms.

10-20ms should be OK for shell scripts. But a fork is still faster.

> You're talking about application specific strategy now. It's different of
> this issue.

Actually, this issue was created to look for a generic approach that can optimize the running time of most, or even all, Python scripts. Different scripts may import different modules, but this doesn't mean there isn't a method that works for all of them.

> And many ideas like yours are already discussed on ML, again and again.

I browsed about 6-7 threads on python-dev. I think 2-3 of them provide great information, but I don't think any of them gives a concrete solution. So we are still facing this problem today.

> I want to close this issue. Please give us more concrete idea or patch with
> target sample application you want to optimize.

As said, I'm looking for a generic approach, so optimizing specific applications isn't really the goal of this issue (though work on specific modules still helps). I did implement a proof of concept (link: <https://github.com/cykerway/pyforkexec>) for the fork-exec startup approach. It's still very rough and elementary, but it proves this approach has its value. As Nick said:

> ...the CPython runtime is highly configurable, so it's far from clear what,
> if anything, could be shared from run to run...

What I hope is that we can inspect these configurations and figure out the invariants. This would help us make a clean environment as the forking base. If this is impossible, I think we can still fork from a known interpreter state chosen by the user script author.

You may close this issue if nobody has more to say, but I hope the fork-exec startup can be standardized one day because I believe, for quick scripts, however much you optimize the cold start it can't be faster than a fork.

--

___ Python tracker <https://bugs.python.org/issue34296> ___
[issue34296] Speed up python startup by pre-warming the vm
Cyker Way added the comment:

I'm fine with stdlib, 3rd-party tools, or whatever. My focus is to understand whether this idea can be correctly implemented on the Python VM or not. I've been searching for similar implementations on the standard JVM, but the results mostly come from research projects rather than industrial solutions. That being said, Android does have preloading implemented in its Dalvik/ART VM (which is more or less a variant of the JVM). Cited from <https://source.android.com/devices/tech/dalvik/configure>:

> The preloaded classes list is a list of classes the zygote will initialize
> on startup. This saves each app from having to run these class initializers
> separately, allowing them to start up faster and share pages in memory.

I was wondering what makes it difficult for the standard JVM (eg. HotSpot) to have such a feature, why Dalvik/ART is able to do it, and what the case would be for the Python VM.

A few more words about my original vision: I was hoping to speed up Python script execution using template VMs in which a list of selected modules is preloaded. For example, if you have one script for regex matching and another for dir listing, then you can create 2 template VMs with the `re` and `os` modules preloaded, respectively. The template VMs run as a system service so that you can always fork from them to create something like a runtime version of *virtualenv*, where only relevant modules are loaded. The preloaded modules can be standard modules or user modules; I don't really see what difference it makes here whether the module is standard or not.

> In particular Windows which doesn't have "fork" behaviour.

Forking helps the parent process keep a clean state since it basically does nothing after the fork. If the system doesn't natively support fork, then the parent process can do the job by itself instead of forking a child process to do so. But additional work might be needed to remove the artifacts resulting from the execution of the user script.

--

___ Python tracker <https://bugs.python.org/issue34296> ___
[issue35010] sort by partially reversed key tuple
New submission from Cyker Way:

The current `sorted` function is somewhat limited and doesn't cover a use case that frequently occurs in real applications: sorting by a tuple of keys where each key can be in asc or desc order.

For example, you may have a list of site configs where each specifies which user on which site gets how much quota. This data may be loaded from a SQL table with 3 columns: url, user, quota. Often you may want to sort by url, but when you want to check which users have been allocated the most quota, you probably want to sort by quota. Even more likely, when you are sorting by quota, you still want to sort by url for rows having the same quota.

Unfortunately, the current `sorted` function doesn't allow you to sort by desc quota and asc url at the same time, because the `reverse` parameter acts on the final result, regardless of columns. For numeric columns, there is a trick of using a minus sign on the data, but this approach usually doesn't apply to other types of data.

The general solution is to enhance the key function, allowing users to specify each column to be considered, with an asc/desc flag:

    keyspec = [
        (itemgetter('url'), False),
        (itemgetter('user'), True),
    ]

An example is worth 1k words; the full example is provided in the attachment. It's not a lot of code to write, but making this feature builtin can save a lot of work, since sorting a SQL table or other sheet data is quite common in real applications.

--
components: Library (Lib)
files: c.py
messages: 327886
nosy: cykerway
priority: normal
severity: normal
status: open
title: sort by partially reversed key tuple
type: enhancement
versions: Python 3.8
Added file: https://bugs.python.org/file47873/c.py

___ Python tracker <https://bugs.python.org/issue35010> ___
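The attached c.py is not reproduced in this archive. For reference, the multi-pass formulation discussed later in this thread (and in the stdlib Sorting HOWTO) relies on sort stability; `multisort` and the sample rows below are illustrative:

    from operator import itemgetter

    rows = [
        {'url': 'a.example', 'user': 'bob',   'quota': 20},
        {'url': 'b.example', 'user': 'alice', 'quota': 50},
        {'url': 'a.example', 'user': 'carol', 'quota': 50},
    ]

    keyspec = [
        (itemgetter('quota'), True),    # quota descending...
        (itemgetter('url'),   False),   # ...then url ascending for ties
    ]

    def multisort(items, keyspec):
        # Least significant key first; each subsequent pass is stable,
        # so ties preserve the order established by earlier passes.
        for key, reverse in reversed(keyspec):
            items.sort(key=key, reverse=reverse)
        return items

    multisort(rows, keyspec)
    # -> carol@a.example (50), alice@b.example (50), bob@a.example (20)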
[issue35010] sort by partially reversed key tuple
Cyker Way added the comment:

Multi-pass stable sorts should produce the correct result. But as the number of columns grows, the code gets messy. For brevity this example only has 2 columns, but it may be 10 or more in a real application. Furthermore, in some cases the application may need to remember which columns are sorted asc/desc, so you have to keep that piece of data (termed `keyspec`, but it can be any meaningful name) anyway. It would be ideal if a builtin function could directly operate on this data instead of requiring manually written multi-pass sorting logic.

The proposed prototype (`c.py`) is aimed at minimizing the amount of code needed to illustrate the problem and is not optimized for performance. There is a performance hit where it wraps one object into another, but this should be avoidable. If you want to compare real performance between single- and multi-pass sorts, you can run the script `performance.py` instead. My test result is that multi-pass sorting takes 75% more time; YMMV.

--
Added file: https://bugs.python.org/file47875/performance.py

___ Python tracker <https://bugs.python.org/issue35010> ___
[issue35010] sort by partially reversed key tuple
Cyker Way added the comment:

The previous upload `example.py` was missing `__eq__`. Updated in `example-1.py`.

--
Added file: https://bugs.python.org/file47876/example-1.py

___ Python tracker <https://bugs.python.org/issue35010> ___
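A hedged sketch of the scalar-wrapper approach being iterated on here (the class name `Rev` is invented), showing why `__eq__` matters: tuple comparison uses equality to skip over equal prefixes before falling back to `<`:

    class Rev:
        """Wraps a scalar so it sorts in reverse order inside a key tuple."""
        __slots__ = ('v',)
        def __init__(self, v):
            self.v = v
        def __lt__(self, other):
            return other.v < self.v   # reversed comparison
        def __eq__(self, other):
            return self.v == other.v  # required for tuple comparison

    rows = [('a', 1), ('b', 2), ('a', 2)]
    rows.sort(key=lambda r: (Rev(r[1]), r[0]))  # column 1 desc, column 0 asc
    print(rows)  # [('a', 2), ('b', 2), ('a', 1)]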
[issue35010] sort by partially reversed key tuple
Cyker Way added the comment:

As for performance, I think both single-pass and multi-pass sorts have worst-case time complexity `m * n * log(n)`, assuming the number of items is `n` and each item has dimension `m`. Whichever is faster seems to be data-dependent. So I made a more comprehensive performance evaluation in `performance-1.py`. It turns out either single-pass or multi-pass can be 80-100% faster than the other, on different inputs.

Since single-pass and multi-pass sorts are good at different inputs, and there are currently no statistics on real application data supporting either, both look OK to me. But I hope you can add something to the interface of the sorting functions (mainly `sorted` and `list.sort`) so that users don't have to write that multi-pass sort again and again. If the `keyspec` format is deemed too complicated, `keys` and `reverses` also look good to me:

    sorted(items, keys=(key0, key1, key2), reverses=(True, False, True))

And you are free to use whatever sorting algorithm in its implementation for this kind of task.

--
Added file: https://bugs.python.org/file47877/performance-1.py

___ Python tracker <https://bugs.python.org/issue35010> ___
[issue35010] sort by partially reversed key tuple
Cyker Way added the comment:

Thank you very much for the great discussion here, especially Tim's great threads in *python-ideas* that give neat and insightful answers to this problem in different ways:

- <https://mail.python.org/pipermail/python-ideas/2016-October/043045.html>
- <https://mail.python.org/pipermail/python-ideas/2016-October/043126.html>

Since this topic is closed, future discussions should probably go to other Python forums. But it might be good to draw some conclusions here for future reference.

First of all, either single-pass sort with a vector key or multi-pass sort with a scalar key may work better, depending on the input. However, in most cases, using multi-pass sort for such problems is the right way to go in the current Python implementation. The multi-pass sort typically runs 2x faster or so than a single-pass sort. This is likely due to constants rather than asymptotic complexity, but when measured in real time, the multi-pass sort clearly wins in most cases.

If your input is so special that it aligns much better with single-pass sorts (overwhelming the constant advantage of the multi-pass sort), you may use a single-pass sort. But there are actually different ways of implementing one. The algorithm posted in Tim's second thread on python-ideas is in fact different from mine in this bug thread: Tim used a wrapper class for the keys and I used a wrapper class for the scalars. Since there are `n` keys but can be as many as `n * m` scalars, my method would be using more wrapper objects, so I expected it to run slower than Tim's. To my surprise, sometimes it works better. The reason turned out to be easy to understand: wrapper objects are created only when a column needs to be reversed, so the number of columns to be reversed is a factor controlling the running time.

To give a fair evaluation, I created 50 random rows with 8 columns and controlled the number of columns to be reversed. The evaluation shows my algorithm can have a running time 80%-120% that of Tim's (excluding the special case where no column needs to be reversed). This shows a new direction for optimizing such algorithms.

Finally, this issue was rejected because the added benefits were deemed not enough to complicate the implementation. Considering the current multi-pass sort has a huge advantage and is easy to implement in Python, this decision is fair. Users who care less about performance may write a key adapter in their own code if they want to stick with the builtin sort functions. Users who do care about performance can use the single-pass sort techniques mentioned in this issue in case multi-pass sort doesn't work well with their data.

--
Added file: https://bugs.python.org/file47880/performance-2.py

___ Python tracker <https://bugs.python.org/issue35010> ___
[issue29030] argparse: choices override metavar
New submission from Cyker Way:

Using the `choices` option in the `argparse` module causes unwanted behavior.

# Without `choices`

    parser = argparse.ArgumentParser()
    parser.add_argument('length')
    parser.print_help()

## Output

    usage: demo.py [-h] length

    positional arguments:
      length

    optional arguments:
      -h, --help  show this help message and exit

# With `choices`

    parser = argparse.ArgumentParser()
    parser.add_argument('length', choices=[1,2,3])
    parser.print_help()

## Output

    usage: demo.py [-h] {1,2,3}

    positional arguments:
      {1,2,3}

    optional arguments:
      -h, --help  show this help message and exit

I think the doc says ArgumentParser objects use the `dest` value as the “name” of each object, and the doc doesn't mention anything about `dest` being changed by `choices`. The help text looks ugly when there is a long list of choices. Currently the problem can be remedied by adding a `metavar` option, but I hope it will be fixed in the library itself.

--
components: Library (Lib)
messages: 283713
nosy: cyker
priority: normal
severity: normal
status: open
title: argparse: choices override metavar
type: behavior
versions: Python 3.5

___ Python tracker <http://bugs.python.org/issue29030> ___
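For reference, the metavar workaround mentioned above (illustrative snippet; help output abbreviated):

    import argparse

    parser = argparse.ArgumentParser(prog='demo.py')
    # metavar restores the plain name in help/usage; choices still validates:
    parser.add_argument('length', type=int, choices=[1, 2, 3], metavar='length')
    parser.print_help()
    # usage: demo.py [-h] length
    # ...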