[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD

2017-10-02 Thread Tim Peters
Tim Peters added the comment: If someone opens a bug report with OpenBSD, or just for us to get more info, it could be useful to have a larger universe of troublesome tan inputs to stare at. So the attached tanny.py supplies them, testing all inputs within 100 ulps of math.pi/2 (or change N

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD

2017-10-02 Thread Tim Peters
Tim Peters added the comment: Thanks for tanny-openbsd.txt, Serhiy! OpenBSD didn't get anywhere close to the best answer on any of those 201 inputs. I was hoping we could, e.g., test something a little more removed from pi/2 - but even its best cases in this range are hundreds of mil
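For readers without the attachment, one way to enumerate such a window of inputs is sketched below. This is not tanny.py (which is not reproduced here); the helper names `step_ulps` and `neighbors` are made up for illustration, and the bit-pattern trick assumes positive, finite IEEE 754 doubles.

```python
import math
import struct

def step_ulps(x, k):
    """Return the double k ulps away from a positive finite x (k may be negative).

    Assumption: for positive IEEE 754 doubles, adjacent values have adjacent
    64-bit integer bit patterns, so stepping the integer steps the float.
    """
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<q", bits + k))[0]

def neighbors(center, n):
    """All doubles within n ulps of center, in increasing order (2*n + 1 values)."""
    return [step_ulps(center, k) for k in range(-n, n + 1)]

inputs = neighbors(math.pi / 2, 100)

# Show what the platform libm returns for a few inputs around the center.
for x in inputs[98:103]:
    print(repr(x), math.tan(x))
```

The printed tangents depend on the platform's libm, which is exactly what the report is about, so no expected values are shown.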

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD

2017-10-02 Thread Tim Peters
Tim Peters added the comment: When Sun was developing fdlibm, I was (among other things) working on a proprietary libm for Kendall Square Research. I corresponded with fdlibm's primary author (KC Ng) often at the time. There's no way he would have left errors this egregious s

[issue31327] bug in dateutil\tz\tz.py

2017-10-11 Thread Tim Peters
Tim Peters added the comment: The docs for the `time` module say: """ Although this module is always available, not all functions are available on all platforms. Most of the functions defined in this module call platform C library functions with the same name. It may sometime

[issue31327] bug in dateutil\tz\tz.py

2017-10-11 Thread Tim Peters
Tim Peters added the comment: Since this is a pretty common gotcha, I'd prefer to add it as an example to the text I already quoted; e.g., add: """ For example, the native Windows C libraries do not support times before the epoch, and `localtime(n)` for negative `n`
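A common way to sidestep the platform C library for pre-epoch values is to do the arithmetic in Python. The sketch below is only an illustration: `utc_from_timestamp` is a hypothetical helper, and it yields UTC rather than local time, so it is not a drop-in replacement for `localtime(n)`.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def utc_from_timestamp(n):
    """Convert seconds since the epoch (possibly negative) to an aware UTC datetime.

    Pure datetime arithmetic: no call into the platform's C time functions.
    """
    return EPOCH + timedelta(seconds=n)

print(utc_from_timestamp(-1))  # 1969-12-31 23:59:59+00:00
```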

[issue31327] bug in dateutil\tz\tz.py

2017-10-11 Thread Tim Peters
Tim Peters added the comment: I'll just add that it may be a different issue to argue about how `_naive_is_dst()` is implemented. -- nosy: +belopolsky

[issue31759] re wont recover nor fail on runaway regular expression

2017-10-11 Thread Tim Peters
Tim Peters added the comment: Well, the problem in the regexp is this part: "\d+,? ?". You're not _requiring_ that strings of digits be separated by a comma or blank, you're only _allowing_ them to be so separated. A solid string of digits is matched by this,
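The ambiguity can be made concrete by counting. Assuming the fragment sits under a repeating quantifier (the full pattern is not shown here, so that is an assumption), a solid run of n digits can be carved into `\d+` pieces in 2**(n-1) ways, and a backtracking engine may have to try each of them before giving up. `count_splits` is a hypothetical helper, not part of `re`.

```python
def count_splits(n):
    """Number of ways to cut a run of n digits into one or more non-empty chunks."""
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix can be "split" in exactly one way
    for end in range(1, n + 1):
        # The last chunk may start at any earlier position.
        ways[end] = sum(ways[start] for start in range(end))
    return ways[n]

for n in (5, 10, 20, 30):
    print(n, count_splits(n))
```

The count doubles with every extra digit, which is why a failing match on a long digit string can appear to hang.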

[issue31759] re wont recover nor fail on runaway regular expression

2017-10-11 Thread Tim Peters
Tim Peters added the comment: Sure! The OP was obviously asking about the engine that ships with Python, so that's what I talked about. Raphaël, Matthew develops an excellent replacement ("regex") for Python's re module, which you can install via, e.g., "pip insta

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD

2017-10-16 Thread Tim Peters
Tim Peters added the comment: On 16 Oct 2017, exactly the same test failures were reported on python-dev: https://mail.python.org/pipermail/python-dev/2017-October/149880.html From the test output posted there: """ == CPython 3.6.3 (default, Oct 16 2017, 14:42:21) [GCC 4.7

[issue31815] Make itertools iterators interruptible

2017-10-19 Thread Tim Peters
Tim Peters added the comment: Segfaults are different: they usually expose an error in CPython's implementation. We don't prioritize them because the user may have to restart their program (who cares? <0.5 wink>), but because they demonstrate the language implementation is

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD and NetBSD

2017-10-29 Thread Tim Peters
Tim Peters added the comment: BTW, has anyone tried running a tiny C program on these platforms to see what tan(1.5707963267948961) delivers? The kind of code fdlibm uses is sensitive not only to compiler (mis)optimization, but also to stuff like how the FPU's "precision contr

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD and NetBSD

2017-11-01 Thread Tim Peters
Tim Peters added the comment: Since fdlibm uses tan(x) ~= -1/(x-pi/2) in this range, and the reciprocals of the bad results have a whole bunch of trailing zero bits, my guess is that argument reduction (the "x-pi/2" part) is screwing up (losing bits of pi/2 beyond the long

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD and NetBSD

2017-11-01 Thread Tim Peters
Tim Peters added the comment: Oops! I mixed up `sin` and `cos` in that comment. If it's argument reduction that's broken, then for x near pi/2 cos(x) will be evaluated as -sin(x - pi/2), which is approximately -(x - pi/2), and so error in argument reduction (the "x - pi/2"

[issue31889] difflib SequenceMatcher ratio() still have unpredictable behavior

2017-11-03 Thread Tim Peters
Tim Peters added the comment: Pass "autojunk=False" to your SequenceMatcher constructor and the ratio you get back will continue to increase as `i` increases. The docs: """ Automatic junk heuristic: SequenceMatcher supports a heuristic that automatically treats

[issue32042] Option for comparing values instead of reprs in doctest

2017-11-19 Thread Tim Peters
Tim Peters added the comment: `doctest` is intended to be anal - there are few things more pointlessly confusing for a user than to see docs that don't match what they actually see when they run the doc's examples. "Is it a bug? Did I do it wrong? Why can't they docum

[issue32042] Option for comparing values instead of reprs in doctest

2017-11-19 Thread Tim Peters
Tim Peters added the comment: Tomáš, of course you can combine testing methods any way you like. Don't oversell this - there's nothing actually magical about comparing objects instead of strings ;-) I'm only -0 on this. It grates a bit against doctest's original intent

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD and NetBSD

2017-11-19 Thread Tim Peters
Tim Peters added the comment: Best I can tell, the fdlibm 5.3 on netlib was released in 2002, and essentially stopped existing as a maintained project then. Everyone else copied the source code, and made their own changes independently ever since :-( At least the folks behind the Julia

[issue31630] math.tan has poor accuracy near pi/2 on OpenBSD and NetBSD

2017-11-20 Thread Tim Peters
Tim Peters added the comment: I have no opinion about any version of xxxBSD, because I've never used one ;-) If current versions of those do have this failure, has anyone opened a bug report on _their_ tracker(s)? I've seen no reason yet to imagine these failures are a fault

[issue32099] Use range in itertools roundrobin recipe

2017-11-20 Thread Tim Peters
Tim Peters added the comment: I agree the current recipe strikes a very nice balance among competing interests, and is educational on several counts. s/pending/numactive/ # Remove the iterator we just exhausted from the cycle. numactive -= 1 nexts = cycle(islice(nexts, numactive

[issue32171] Inconsistent results for fractional power of -infinity

2017-11-29 Thread Tim Peters
Tim Peters added the comment: As a comment in the referenced patch says, the intent of the patch was to make behavior match the C99 spec. Among other things, C99's annex F (section F.9.4.4 "The pow functions") says: """ — pow(−∞, y) returns −0 for y an odd int
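The C99 rules for an infinite negative base can be probed from Python through `math.pow`, which is documented to follow Annex F as far as possible. This is only a probe of the four sign cases, not a restatement of the patch under discussion; `sign_and_value` is a hypothetical helper used to make the sign of zero visible.

```python
import math

def sign_and_value(x):
    """Return ('-' or '+', abs(x)) so that -0.0 and +0.0 can be told apart."""
    return ("-" if math.copysign(1.0, x) < 0 else "+", abs(x))

neg_inf = -math.inf

# y a negative odd integer, y negative otherwise,
# y a positive odd integer, y positive otherwise.
for y in (-3.0, -2.0, 3.0, 0.5):
    print(y, sign_and_value(math.pow(neg_inf, y)))
```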

[issue32171] Inconsistent results for fractional power of -infinity

2017-11-29 Thread Tim Peters
Tim Peters added the comment: No worries, Mark :-) Odd things happen sometimes when people are editing near the same time. BTW, of course I agree with closing this!

[issue32171] Inconsistent results for fractional power of -infinity

2017-11-30 Thread Tim Peters
Tim Peters added the comment: Mark, indeed, in the email from Vincent Lefevre you linked to, his entire argument was: (a) we already specified what happens when the base is a zero; so, (b) for each of the six pow(a_zero, y) cases we specified, derive a matching rule for an inf base via

[issue29710] Incorrect representation caveat on bitwise operation docs

2017-12-02 Thread Tim Peters
Tim Peters added the comment: To answer the old accusation ;-), no, this isn't my wording. I _always_ explain that Python's integer bit operations act as if the integers were stored in 2's-complement representation but with an infinite number of sign bits. That's

[issue32382] Python mulitiprocessing.Queue fail to get according to correct sequence

2017-12-20 Thread Tim Peters
Tim Peters added the comment: First thing: the code uses the global name `outputer` for two different things, as the name of a module function and as the global name given to the Process object running that function. At least on Windows under Python 3.6.4 that confusion prevents the

[issue32509] doctest syntax ambiguity between continuation line and ellipsis

2018-01-06 Thread Tim Peters
Tim Peters added the comment: Right, "..." immediately after a ">>>" line is taken to indicate a code continuation line, and there's no way to stop that short of rewriting the parser. The workaround you already found could be made more palatable if

[issue32509] doctest syntax ambiguity between continuation line and ellipsis

2018-01-06 Thread Tim Peters
Tim Peters added the comment: And I somehow managed to unsubscribe Steven :-( -- nosy: +steven.daprano

[issue32509] doctest syntax ambiguity between continuation line and ellipsis

2018-01-07 Thread Tim Peters
Tim Peters added the comment: Jason, an ellipsis will match an empty string. But if your expected output is: """ x... abcd ... """ you're asking for output that: - starts with "x" - followed by 0 or more of anything - FOLLOWED BY A NEWLINE (I t
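The matching rule can be tried directly with doctest's own output checker. This is a small sketch; `matches` is a hypothetical wrapper, and the sample strings are made up rather than taken from the report.

```python
import doctest

checker = doctest.OutputChecker()

def matches(want, got):
    """True if `got` satisfies the expected output `want` under ELLIPSIS."""
    return checker.check_output(want, got, doctest.ELLIPSIS)

# "..." may match the empty string.
print(matches("x...\n", "x\n"))
# "..." may match any run of characters.
print(matches("x...\n", "xyz\n"))
# Literal text after the ellipsis, including the newline, must still appear.
print(matches("x...\nabcd\n", "xabcd\n"))
```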

[issue32509] doctest syntax ambiguity between continuation line and ellipsis

2018-01-07 Thread Tim Peters
Tim Peters added the comment: By the way, going back to your original problem, "the usual" solution to the problem that different platforms can list directories in different orders is simply to sort the listing yourself. That's pretty easy in Python ;-) Then your test can verify the h

[issue33566] re.findall() dead locked whent the expected ending char not occur until end of string

2018-05-18 Thread Tim Peters
Tim Peters added the comment: Min, you need to give a complete example other people can actually run for themselves. Offhand, this part of the regexp (.|\s)* all by itself _can_ cause exponential-time behavior. You can run this for yourself: >>> import re >>> p = r"

[issue33572] False/True as dictionary keys treated as integers

2018-05-18 Thread Tim Peters
Tim Peters added the comment: I expect these docs date back to when ints, longs, and floats were the only hashable language-supplied types for which mixed-type comparison could ever return True. They could stand some updates ;-) `fractions.Fraction` and `decimal.Decimal` are more language

[issue33579] calendar.timegm not always an inverse of time.gmtime

2018-05-19 Thread Tim Peters
Tim Peters added the comment: They both look wrong to me. Under 3.6.5 on Win10, `one` and `three` are the same. Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32 time.struct_time(tm_year=2009, tm_mon=2, tm_mday=13, tm_hour=23, tm_min=31, tm_sec

[issue32832] doctest should support custom ps1/ps2 prompts

2018-05-27 Thread Tim Peters
Tim Peters added the comment: doctest was intended to deal with the standard CPython terminal shell. I'd like to keep it that way, but recognize that everyone wants to change everything into "a framework" ;-) How many other shells are there? As Sergey linked to, IPython alre

[issue32832] doctest should support custom ps1/ps2 prompts

2018-05-27 Thread Tim Peters
Tim Peters added the comment: Sergey, I understand that, but I don't care. The only people I've ever seen _use_ this are people writing an entirely different shell interface. They're rare. There's no value in complicating doctest to cater to theoretical use cases that

[issue32832] doctest should support custom ps1/ps2 prompts

2018-05-28 Thread Tim Peters
Tim Peters added the comment: You missed my point about IPython: forget "In/Out arrays, etc". What you suggest is inadequate for _just_ changing PS1/PS2 for IPython. Again, read their `parse()` function. They support _more than one_ set of PS1/PS2 conventions. So the code c

[issue21196] Name mangling example in Python tutorial

2018-06-06 Thread Tim Peters
Tim Peters added the comment: Berker Peksag's change (PR 5667) is very simple and, I think, helpful. -- nosy: +tim.peters

[issue33812] Different behavior between datetime.py and its C accelerator

2018-06-08 Thread Tim Peters
Tim Peters added the comment: The message isn't confusing - the definition of "aware" is confusing ;-) """ A datetime object d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does not return None. If d.tzinfo is None, or if d.tzinfo is not None but

[issue33812] Different behavior between datetime.py and its C accelerator

2018-06-08 Thread Tim Peters
Tim Peters added the comment: I copy/pasted the definitions of "aware" and "naive" from the docs. Your TZ's .utcoffset() returns None, so, yes, any datetime using an instance of that for its tzinfo is naive. In print(datetime(2000,1,1).astimezone(timezone.utc))
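The quoted definition translates almost word for word into code. In this sketch `is_aware` and `NoOffset` are hypothetical names; `NoOffset` stands in for a tzinfo whose `.utcoffset()` returns None, as described above.

```python
from datetime import datetime, timezone, tzinfo

def is_aware(d):
    """Apply the docs' definition: tzinfo is set AND utcoffset(d) is not None."""
    return d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None

class NoOffset(tzinfo):
    """A tzinfo that declines to supply an offset (illustrative only)."""
    def utcoffset(self, dt):
        return None

plain = datetime(2000, 1, 1)                          # no tzinfo at all
odd = datetime(2000, 1, 1, tzinfo=NoOffset())         # tzinfo set, offset None
utc = datetime(2000, 1, 1, tzinfo=timezone.utc)       # tzinfo set, offset known

print(is_aware(plain), is_aware(odd), is_aware(utc))  # False False True
```

So a datetime can carry a tzinfo object and still count as naive.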

[issue33814] exec() maybe has a memory leak

2018-06-09 Thread Tim Peters
Tim Peters added the comment: Dan, your bug report is pretty much incoherent ;-) This standard Stack Overflow advice applies here too: https://stackoverflow.com/help/mcve Guessing your complaint is that: sys.getrefcount(itertools.repeat) keeps increasing by 1 across calls to `leaks

[issue33812] Different behavior between datetime.py and its C accelerator

2018-06-09 Thread Tim Peters
Tim Peters added the comment: I'd call it a bug fix, but I'm really not anal about what people call things ;-)

[issue33089] Add multi-dimensional Euclidean distance function to the math module

2018-06-24 Thread Tim Peters
Tim Peters added the comment: Raymond, I'd say scaling is vital (to prevent spurious infinities), but complications beyond that are questionable, slowing things down for an improvement in accuracy that may be of no actual benefit. Note that your original "simple homework problem
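Why scaling matters can be seen with two large coordinates. The sketch below assumes nothing about how the `math` module does it; `naive_norm` and `scaled_norm` are hypothetical helpers that show the basic divide-by-the-largest idea only.

```python
import math

def naive_norm(xs):
    """sqrt(sum of squares) with no scaling: squares can overflow to inf."""
    return math.sqrt(sum(x * x for x in xs))

def scaled_norm(xs):
    """Divide by the largest magnitude first so every square is at most 1.0."""
    m = max(abs(x) for x in xs)
    if m == 0.0 or math.isinf(m):
        return m
    return m * math.sqrt(sum((x / m) * (x / m) for x in xs))

point = [3e200, 4e200]
print(naive_norm(point))   # inf: a spurious infinity
print(scaled_norm(point))  # close to 5e200
```

The true result, 5e200, is comfortably representable; only the intermediate squares are not.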

[issue24567] random.choice IndexError due to double-rounding

2018-06-24 Thread Tim Peters
Tim Peters added the comment: There are a couple bug reports here that have been open for years, and it's about time we closed them. My stance: if any platform still exists on which "double rounding" is still a potential problem, Python _configuration_ should be changed to

[issue24567] random.choice IndexError due to double-rounding

2018-06-25 Thread Tim Peters
Tim Peters added the comment: Mark, do you believe that 32-bit Linux uses a different libm? One that fails if, e.g., SSE2 were used instead? I don't know, but I'd sure be surprised if it did. Very surprised - compilers have been notoriously unpredictable in exactly when

[issue24567] random.choice IndexError due to double-rounding

2018-06-26 Thread Tim Peters
Tim Peters added the comment: Mark, ya, I agree it's most prudent to let sleeping dogs lie. In the one "real" complaint we got (issue 24546) the cause was never determined - but double rounding was ruled out in that specific case, and no _plausible_ cause was identified (sho

[issue24567] random.choice IndexError due to double-rounding

2018-06-26 Thread Tim Peters
Tim Peters added the comment: [Mark] > If we do this, can we also persuade Guido to Pronounce that > Python implementations assume IEEE 754 format and semantics > for floating-point? On its own, I don't think a change to force 53-bit precision _on_ 754 boxes would justify that

[issue24567] random.choice IndexError due to double-rounding

2018-06-26 Thread Tim Peters
Tim Peters added the comment: Victor, look at Raymond's patch. In Python 3, `randrange()` and friends already use the all-integer `getrandbits()`. He's changing three other lines, where some variant of `int(random() * someinteger)` is being used in an inner loop for speed. Pres

[issue24567] random.choice IndexError due to double-rounding

2018-06-26 Thread Tim Peters
Tim Peters added the comment: [Victor] > This method [shuffle()] has a weird API. What is > the point of passing a random function, > ... I proposed to deprecate this argument and remove it later. I don't care here. This is a bug report. Making backward-incompatible API

[issue34016] Bug in sort()

2018-07-01 Thread Tim Peters
Tim Peters added the comment: Lucas, as Mark said you're sorting _strings_ here, not sorting integers. Please study his reply. As strings, "10" is less than "9", because "1" is less than "9". >>> "10
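A short illustration of the difference, using made-up data rather than the list from the report:

```python
items = ["10", "9", "2", "33"]

# Compared as strings, character by character: "1" < "2" < "3" < "9".
print(sorted(items))            # ['10', '2', '33', '9']

# Compared by numeric value, while keeping the elements as strings.
print(sorted(items, key=int))   # ['2', '9', '10', '33']

print("10" < "9", 10 < 9)       # True False
```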

[issue34100] Same integers in a tuple of constant literals are not merged

2018-07-11 Thread Tim Peters
Tim Peters added the comment: The language doesn't define anything about this - any program relying on accidental identity is in error itself. Still, it's nice if a code object's co_consts vector is as short as reasonably possible. That's a matter of pragmatics

[issue34100] Same integers in a tuple of constant literals are not merged

2018-07-11 Thread Tim Peters
Tim Peters added the comment: Fine, Serhiy, so reword it a tiny bit: it's nice if a code object's co_consts vector references as few distinct objects as possible. Still a matter of pragmatics, not of correctness.

[issue34109] Accumulator bug

2018-07-13 Thread Tim Peters
Tim Peters added the comment: ? I expect your code to return -1 about once per 7**4 = 2401 times, which would be about 400 times per million tries, which is what your output shows. If you start with -5, and randint(1, 7) returns 1 four times in a row, r5 is left at -5 + 4 = -1
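The arithmetic behind that estimate, assuming the four `randint(1, 7)` calls are independent and uniform:

```python
from fractions import Fraction

# Probability that randint(1, 7) returns 1 four times in a row.
p = Fraction(1, 7) ** 4
print(p)                       # 1/2401

# Expected number of such outcomes in one million tries.
expected = p * 1_000_000
print(float(expected))         # a bit over 416
```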

[issue29710] Incorrect representation caveat on bitwise operation docs

2018-07-14 Thread Tim Peters
Tim Peters added the comment: Nick, that seems a decent compromise. "Infinite string of sign bits" is how Guido & I both thought of it when the semantics of longs were first defined, and others in this report apparently find it natural enough too. It also applies to all 6

[issue29710] Incorrect representation caveat on bitwise operation docs

2018-07-15 Thread Tim Peters
Tim Peters added the comment: Well, all 6 operations "are calculated as though carried out in two's complement with an infinite number of sign bits", so I'd float that part out of the footnote and into the main text. When, e.g., you're thinking of ints _as_ bit
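The "infinite number of sign bits" model is easy to check interactively. In this sketch `low_bits` is a hypothetical helper that shows only the lowest 8 bits of that conceptually infinite string.

```python
def low_bits(n, width=8):
    """Lowest `width` bits of n, viewed as infinite-width two's complement."""
    return format(n & ((1 << width) - 1), f"0{width}b")

print(low_bits(5))     # 00000101
print(low_bits(-5))    # 11111011
print(low_bits(~5))    # 11111010  (~5 == -6)

print(-5 & 0xFF)       # 251: masking keeps a finite slice of the sign bits
print(-8 >> 1)         # -4: right shift drags copies of the sign bit in
print(-1 >> 1000)      # -1: the sign bits never run out
```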

[issue29710] Incorrect representation caveat on bitwise operation docs

2018-07-16 Thread Tim Peters
Tim Peters added the comment: Ya, Mark's got a point there. Perhaps s/the internal/a finite two's complement/ ?

[issue34168] RAM consumption too high using concurrent.futures (Python 3.7 / 3.6 )

2018-07-20 Thread Tim Peters
Tim Peters added the comment: If your `bucket` has 30 million items, then for element in bucket: executor.submit(kwargs['function']['name'], element, **kwargs) is going to create 30 million Future objects (and all the under-the-covers objects needed to mana

[issue34168] RAM consumption too high using concurrent.futures (Python 3.7 / 3.6 )

2018-07-20 Thread Tim Peters
Tim Peters added the comment: Note that you can consume multiple gigabytes of RAM with this simpler program too, and for the same reasons: """ import concurrent.futures as cf bucket = range(30_000_000) def _dns_query(target): from time import sleep sleep(0.1) def
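One way to keep memory bounded is to cap how many Future objects exist at once. The sketch below is an assumption about a reasonable shape for that, not the fix proposed in the thread; `run_bounded`, `max_workers`, and `max_pending` are hypothetical names.

```python
import concurrent.futures as cf
from itertools import islice

def run_bounded(func, items, max_workers=4, max_pending=16):
    """Apply func to every item, never holding more than max_pending futures."""
    results = []
    it = iter(items)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Prime the pool with the first batch only.
        pending = {executor.submit(func, x) for x in islice(it, max_pending)}
        while pending:
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
            # Top up: one new submission per finished future.
            for x in islice(it, len(done)):
                pending.add(executor.submit(func, x))
    return results

squares = run_bounded(lambda x: x * x, range(50))
print(len(squares))
```

Results arrive in completion order, not submission order, so callers that care about order need to carry an index along.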

[issue34180] bool(Q) always return True for a priority queue Q

2018-07-22 Thread Tim Peters
Tim Peters added the comment: I'm sure Guido designed the API to discourage subtly bug-ridden code relying on the mistaken belief that it _can_ know the queue's current size. In the general multi-threaded context Queue is intended to be used, the only thing `.qsize()`'s cal

[issue29710] Incorrect representation caveat on bitwise operation docs

2018-07-23 Thread Tim Peters
Tim Peters added the comment: @CuriousLearner, does the PR also include Nick's first suggested change? Here: """ 1. Replace the opening paragraph of https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types (the one I originally quoted whe

[issue29710] Incorrect representation caveat on bitwise operation docs

2018-07-23 Thread Tim Peters
Tim Peters added the comment: Nick suggested two changes on 2018-07-15 (look above). Mark & I agreed about the first change, so it wasn't mentioned again after that. All the rest has been refining the second change.

[issue33113] Query performance is very low and can even lead to denial of service

2018-07-28 Thread Tim Peters
Tim Peters added the comment: Note: if you found a regexp like this _in_ the Python distribution, then a bug report would be appropriate. It's certainly possible to write regexps that can suffer catastrophic backtracking, and we've repaired a few of those, over the years, th

[issue33566] re.findall() dead locked whent the expected ending char not occur until end of string

2018-07-28 Thread Tim Peters
Tim Peters added the comment: Closing as not-a-bug - not enough info to reproduce, but the regexp looked prone to exponential-time backtracking to both MRAB and me, and there's been no response to requests for more info. -- components: +Regular Expressions nosy: +ezio.me

[issue34291] UnboundLocalError raised on call to global

2018-07-31 Thread Tim Peters
Tim Peters added the comment: Yes, the assignment does "hide the global definition of g". But this determination is made at compile time, not at run time: an assignment to `g` _anywhere_ inside `f()` makes _every_ appearance of `g` within `f()` local to `f`. -- nosy: +
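A minimal reproduction of the compile-time rule. The function bodies are made up for illustration and are not the reporter's code.

```python
def g():
    return "global g"

def f():
    # `g` is a local name throughout f() because of the assignment below,
    # so this call fails before the assignment is ever reached.
    result = g()
    g = None
    return result

# The compiler has already recorded `g` as a local variable of f.
print(f.__code__.co_varnames)

try:
    f()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```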

[issue34376] Improve accuracy of math.hypot() and math.dist()

2018-08-10 Thread Tim Peters
Tim Peters added the comment: Not that it matters: "ulp" is a measure of absolute error, but the script is computing some notion of relative error and _calling_ that "ulp". It can understate the true ulp error by up to a factor of 2 (the "wobble" of base 2 f

[issue34376] Improve accuracy of math.hypot() and math.dist()

2018-08-11 Thread Tim Peters
Tim Peters added the comment: Thanks for doing the "real ulp" calc, Raymond! It was intended to make the Kahan gimmick look better, and it succeeded ;-) I don't personally care whether adding 10K things ends up with 50 ulp error, but to each their own. Division can be most

[issue34376] Improve accuracy of math.hypot() and math.dist()

2018-08-12 Thread Tim Peters
Tim Peters added the comment: Sure, if we make more assumptions. For 754 doubles, e.g., scaling isn't needed if `1e-100 < absmax < 1e100` unless there are a truly ludicrous number of points. Because, if that holds, the true sum is between 1e-200 and number_of_points*1e200, bo

[issue34397] remove redundant overflow checks in tuple and list implementations

2018-08-14 Thread Tim Peters
Tim Peters added the comment: I agree there's pointless code now, but don't understand why the patch replaces it with mysterious asserts. For example, what's the point of this? assert(Py_SIZE(a) <= PY_SSIZE_T_MAX / sizeof(PyObject*)); assert(Py_SIZE(b) <= PY_SSIZE_T_

[issue34397] remove redundant overflow checks in tuple and list implementations

2018-08-14 Thread Tim Peters
Tim Peters added the comment: Bah - the relevant thing to assert is really assert((size_t)Py_SIZE(a) + (size_t)Py_SIZE(b) <= (size_t)PY_SSIZE_T_MAX); C sucks ;-)

[issue34561] Replace list sorting merge_collapse()?

2018-08-31 Thread Tim Peters
New submission from Tim Peters : The invariants on the run-length stack are uncomfortably subtle. There was a flap a while back when an attempt at a formal correctness proof uncovered that the _intended_ invariants weren't always maintained. That was easily repaired (as the resear

[issue34561] Replace list sorting merge_collapse()?

2018-09-01 Thread Tim Peters
Tim Peters added the comment: The attached runstack.py models the relevant parts of timsort's current merge_collapse and the proposed 2-merge. Barring conceptual or coding errors, they appear to behave much the same with respect to "total cost", with no clear overall win

[issue34561] Replace list sorting merge_collapse()?

2018-09-03 Thread Tim Peters
Tim Peters added the comment: Looks like all sorts of academics are exercised over the run-merging order now. Here's a paper that's unhappy because timsort's strategy, and 2-merge too, aren't always near-optimal with respect to the entropy of the distribution of

[issue34561] Replace list sorting merge_collapse()?

2018-09-04 Thread Tim Peters
Tim Peters added the comment: "Galloping" is the heart & soul of Python's sorting algorithm. It's explained in detail here: https://github.com/python/cpython/blob/master/Objects/listsort.txt The Java fork of the sorting code has had repeated bugs due to reducing

[issue34561] Replace list sorting merge_collapse()?

2018-09-04 Thread Tim Peters
Tim Peters added the comment: A new version of the file models a version of the `powersort` merge ordering too. It clearly dominates timsort and 2-merge in all cases tried, for this notion of "cost". Against it, its code is much more complex, and the algorithm is very far fro

[issue34561] Replace list sorting merge_collapse()?

2018-09-06 Thread Tim Peters
Tim Peters added the comment: The notion of cost is that merging runs of lengths A and B has "cost" A+B, period. Nothing to do with logarithms. Merge runs of lengths 1 and 1000, and it has cost 1001. They don't care about galloping, only about how the order in which merges
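Under that cost model, the order of merges alone changes the total. The sketch below is not runstack.py; `merge_cost` is a hypothetical helper that takes a merge order written as nested pairs of run lengths.

```python
def merge_cost(tree):
    """Return (merged_length, total_cost) for a merge order given as nested pairs.

    A leaf is a run length (int); a pair (left, right) merges two adjacent
    results, and that merge costs the sum of their lengths.
    """
    if isinstance(tree, int):
        return tree, 0
    left, right = tree
    left_len, left_cost = merge_cost(left)
    right_len, right_cost = merge_cost(right)
    merged = left_len + right_len
    return merged, left_cost + right_cost + merged

print(merge_cost((1, 1000)))          # (1001, 1001)

# Three adjacent runs of lengths 1000, 1, 1, merged in two different orders.
print(merge_cost(((1000, 1), 1)))     # (1002, 2003)
print(merge_cost((1000, (1, 1))))     # (1002, 1004)
```

Merging the two short runs first costs about half as much here, which is the kind of difference the merge-ordering strategies compete over.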

[issue34561] Replace list sorting merge_collapse()?

2018-09-06 Thread Tim Peters
Tim Peters added the comment: No, there's no requirement that run lengths on the stack be ordered in any way by magnitude. That's simply one rule timsort uses, as well as 2-merge and various other schemes discussed in papers. powersort has no such rule, and that's fine. Re

[issue34691] _contextvars missing in x64 master branch Windows build?

2018-09-14 Thread Tim Peters
New submission from Tim Peters : Using Visual Studio 2017 to build the current master branch of Python (something I'm trying for the first time in about two years - maybe I'm missing something obvious!), with the x64 target, under both the Release and Debug builds I get a Python

[issue34561] Replace list sorting merge_collapse()?

2018-09-15 Thread Tim Peters
Tim Peters added the comment: New version of runstack.py. - Reworked code to reflect that Python's sort uses (start_offset, run_length) pairs to record runs. - Two unbounded-integer power implementations, one using a loop and the other division. The loop version implies that, in Pyt

[issue34561] Replace list sorting merge_collapse()?

2018-09-16 Thread Tim Peters
Tim Peters added the comment: Another runstack.py adds a bad case for 2-merge, and an even worse (percentage-wise) bad case for timsort. powersort happens to be optimal for both. So they all have contrived bad cases now. powersort's bad cases are the least bad. So far ;-) But I e

[issue34691] _contextvars missing in x64 master branch Windows build?

2018-09-17 Thread Tim Peters
Tim Peters added the comment: FYI, I bet I didn't see a problem with the Win32 target because I followed instructions ;-) and did my first build using build.bat. Using that for the x64 target too makes the problem go away.

[issue34659] Inconsistency between functools.reduce & itertools.accumulate

2018-09-17 Thread Tim Peters
Tim Peters added the comment: Ya, I care: `None` was always intended to be an explicit way to say "nothing here", and using unique non-None sentinels instead for that purpose is needlessly convoluted. `initial=None` is perfect. But then I'm old & in the way ;

[issue34751] Hash collisions for tuples

2018-09-20 Thread Tim Peters
Tim Peters added the comment: @jdemeyer, please define exactly what you mean by "Bernstein hash". Bernstein has authored many hashes, and none on his current hash page could possibly be called "simple": https://cr.yp.to/hash.html If you're talking about the

[issue34751] Hash collisions for tuples

2018-09-20 Thread Tim Peters
Tim Peters added the comment: Ah! I see that the original SourceForge bug report got duplicated on this tracker, as PR #942952. So clicking on that is a lot easier than digging thru the mail archive. One message there noted that replacing xor with addition made collision statistics much

[issue34751] Hash collisions for tuples

2018-09-20 Thread Tim Peters
Change by Tim Peters : -- nosy: +ned.deily

[issue34751] Hash collisions for tuples

2018-09-20 Thread Tim Peters
Tim Peters added the comment: @jdemeyer, you didn't submit a patch, or give any hint that you _might_. It _looked_ like you wanted other people to do all the work, based on a contrived example and a vague suggestion. And we already knew from history that "a simple Bernstein has

[issue34751] Hash collisions for tuples

2018-09-20 Thread Tim Peters
Tim Peters added the comment: You said it yourself: "It's not hard to come up with ...". That's not what "real life" means. Here: >>> len(set(hash(1 << i) for i in range(100_000))) 61 Wow! Only 61 hash codes across 100 thousand distinct int
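The small count has a simple explanation in CPython: the hash of a non-negative int is its value reduced modulo `sys.hash_info.modulus`, a Mersenne prime, so powers of two cycle. The sketch assumes CPython; the figure 61 in the message corresponds to a 64-bit build, and the check below adapts to whatever modulus the running interpreter reports.

```python
import sys

# 2**61 - 1 on 64-bit CPython builds, 2**31 - 1 on 32-bit builds.
M = sys.hash_info.modulus

# hash(2**i) is 2**i reduced modulo M (CPython behavior for non-negative ints).
reduced_ok = all(hash(1 << i) == pow(2, i, M) for i in range(500))

# Since M == 2**k - 1, we have 2**k % M == 1, so only k distinct hashes occur.
distinct = len({hash(1 << i) for i in range(1000)})

print(reduced_ok, distinct, M.bit_length())
```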

[issue34751] Hash collisions for tuples

2018-09-21 Thread Tim Peters
Tim Peters added the comment: For me, it's largely because you make raw assertions with extreme confidence that the first thing you think of off the top of your head can't possibly make anything else worse. When it turns out it does make some things worse, you're equally con

[issue34751] Hash collisions for tuples

2018-09-21 Thread Tim Peters
Tim Peters added the comment: Oops! """ "j odd implies j^(-2) == -j, so that m*(j^(-2)) == -m" """ The tail end should say "m*(j^(-2)) == -m*j" instead.
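The corrected identity is quick to confirm, reading `^` as Python's XOR. The sample values of `j` and `m` are arbitrary.

```python
# -2 is ...11110 in two's complement, so XOR with it flips every bit except
# bit 0. For odd j that is ~j with the low bit restored: ~j + 1 == -j.
odd_values = [1, 3, 5, 12345, -7, 2**70 + 1]
m = 1000003  # any multiplier works for the second identity

for j in odd_values:
    assert (j ^ -2) == -j
    assert m * (j ^ -2) == -m * j

# The identity really does need j to be odd.
print(4 ^ -2, -4)
```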

[issue34561] Replace list sorting merge_collapse()?

2018-09-21 Thread Tim Peters
Tim Peters added the comment: Thank you, Vincent! I very much enjoyed - and appreciated - your paper I referenced at the start. Way back when, I thought I had a proof of O(N log N), but never wrote it up because some details weren't convincing - even to me ;-) . Then I had to move

[issue34751] Hash collisions for tuples

2018-09-21 Thread Tim Peters
Tim Peters added the comment: >> Why do you claim the original was "too small"? Too small for >> what purpose? > If the multiplier is too small, then the resulting hash values are > small too. This causes collisions to appear for smaller numbers: All right! An

[issue34397] remove redundant overflow checks in tuple and list implementations

2018-09-21 Thread Tim Peters
Tim Peters added the comment: Because the behavior of signed integer overflow isn't defined in C. Picture a 3-bit integer type, where the maximum value of the signed integer type is 3. 3+3 has no defined result. Cast them to the unsigned flavor of the integer type, though, and the r
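The 3-bit picture can be modeled in Python, where integers never overflow, by masking to emulate the unsigned type's wraparound. `sum_fits_signed` is a hypothetical helper mirroring the shape of a check done in unsigned arithmetic; it assumes both inputs are already valid non-negative signed values.

```python
BITS = 3
SIGNED_MAX = (1 << (BITS - 1)) - 1    # 3
UNSIGNED_MAX = (1 << BITS) - 1        # 7

def sum_fits_signed(a, b):
    """Check a + b <= SIGNED_MAX using only well-defined unsigned arithmetic.

    With 0 <= a, b <= SIGNED_MAX the unsigned sum is at most 2 * SIGNED_MAX,
    which is below UNSIGNED_MAX, so the masked sum is the true sum.
    """
    unsigned_sum = (a + b) & UNSIGNED_MAX
    return unsigned_sum <= SIGNED_MAX

print(sum_fits_signed(1, 2))   # True: 3 fits
print(sum_fits_signed(3, 3))   # False: 6 is representable unsigned, not signed
```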

[issue34751] Hash collisions for tuples

2018-09-22 Thread Tim Peters
Tim Peters added the comment: So you don't know of any directly relevant research either. "Offhand I can't see anything wrong" is better than nothing, but very far from "and we know it will be OK because [see references 1 and 2]". That Bernstein's DJBX3

[issue34751] Hash collisions for tuples

2018-09-22 Thread Tim Peters
Tim Peters added the comment: I strive not to believe anything in the absence of evidence ;-) FNV-1a supplanted Bernstein's scheme in many projects because it works better. Indeed, Python itself used FNV for string hashing before the security wonks got exercised over collision attacks

[issue34751] Hash collisions for tuples

2018-09-22 Thread Tim Peters
Tim Peters added the comment: Raymond, I share your concerns. There's no reason at all to make gratuitous changes (like dropping the "post-addition of a constant and incorporating length signature"), apart from that there's no apparent reason for them existing to begin

[issue34751] Hash collisions for tuples

2018-09-23 Thread Tim Peters
Tim Peters added the comment: Oh, I don't agree that it's "broken" either. There's still no real-world test case here demonstrating catastrophic behavior, neither even a contrived test case demonstrating that, nor a coherent characterization of what "the proble

[issue34751] Hash collisions for tuples

2018-09-23 Thread Tim Peters
Tim Peters added the comment: Has anyone figured out the real source of the degeneration when mixing in negative integers? I have not. XOR always permutes the hash range - it's one-to-one. No possible outputs are lost, and XOR with a negative int isn't "obviously degener

[issue34751] Hash collisions for tuples

2018-09-23 Thread Tim Peters
Tim Peters added the comment: [Raymond, on boosting the multiplier on 64-bit boxes] > Yes, that would be perfectly reasonable (though to some > extent the objects in the tuple also share some of the > responsibility for getting all bits into play). It's of value independent of

[issue34751] Hash collisions for tuples

2018-09-23 Thread Tim Peters
Tim Peters added the comment: FYI, using this for the guts of the tuple hash works well on everything we've discussed. In particular, no collisions in the current test_tuple hash test, and none either in the cases mixing negative and positive little ints. This all remains so usin

[issue34751] Hash collisions for tuples

2018-09-23 Thread Tim Peters
Tim Peters added the comment: BTW, those tests were all done under a 64-bit build. Some differences in a 32-bit build: 1. The test_tuple hash test started with 6 collisions. With the change, it went down to 4. Also changing to the FNV-1a 32-bit multiplier boosted it to 8. The test

[issue34751] Hash collisions for tuples

2018-09-24 Thread Tim Peters
Tim Peters added the comment: > when you do t ^= t << 7, then you are not changing > the lower 7 bits at all. I want to leave low-order hash bits alone. That's deliberate. The most important tuple component types, for tuples that are hashable, are strings and contiguous ra
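The quoted observation about the low bits can be verified directly. The sketch assumes a 64-bit unsigned word, emulated with a mask; `mix` is a hypothetical name for the single step being discussed, not the full tuple hash.

```python
MASK64 = (1 << 64) - 1

def mix(t):
    """One step of t ^= t << 7 on an emulated 64-bit unsigned word."""
    return (t ^ (t << 7)) & MASK64

samples = [0, 1, 0x7F, 0xDEADBEEF, MASK64, 1234567890123456789]

# t << 7 has zeros in its lowest 7 bits, so XOR leaves those bits of t alone.
for t in samples:
    assert mix(t) & 0x7F == t & 0x7F

print(hex(mix(0xDEADBEEF)))
```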

[issue34751] Hash collisions for tuples

2018-09-24 Thread Tim Peters
Tim Peters added the comment: Jeroen, I understood the part about -2 from your initial report ;-) That's why the last code I posted didn't use -2 at all (neither -1, which hashes to -2). None of the very many colliding tuples contained -2 in any form. For example, these 8 tuple

[issue34751] Hash collisions for tuples

2018-09-24 Thread Tim Peters
Tim Peters added the comment: > advantage of my approach is that high-order bits become more > important: I don't much care about high-order bits, beyond that we don't systematically _lose_ them. The dict and set lookup routines have their own strategies for incorporating

[issue34751] Hash collisions for tuples

2018-09-24 Thread Tim Peters
Tim Peters added the comment: Just noting that this Bernstein-like variant appears to work as well as the FNV-1a version in all the goofy ;-) endcase tests I've accumulated: while (--len >= 0) { y = PyObject_Hash(*p++); if (y == -1) r
