Tim Peters added the comment:
And one more:
x = (x * mult) ^ t;
also appears to work equally well. So, way back when, it appears we _could_
have wormed around the disaster du jour just by applying a shift-xor
permutation to the raw hash results.
Note the implication: if we …
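To make the shape concrete, here's a minimal sketch of a tuple-hash loop built
around that combining step. The 0x345678 seed, 1000003 multiplier, and 97531
final constant are the traditional CPython tuple-hash values; everything else
(fixed-width types, taking t as the raw item hash) is illustrative, not the
exact code under discussion:

#include <stdint.h>
#include <stddef.h>

/* Sketch: combine item hashes with x = (x * mult) ^ t.  In context, t is
   the item's hash, possibly already passed through a shift-xor
   permutation as described above. */
static uint64_t tuplehash_xor_variant(const uint64_t *hashes, size_t n)
{
    uint64_t x = 0x345678u;          /* traditional CPython seed */
    const uint64_t mult = 1000003u;  /* _PyHASH_MULTIPLIER */
    for (size_t i = 0; i < n; ++i) {
        uint64_t t = hashes[i];
        x = (x * mult) ^ t;          /* the variant from this comment */
    }
    return x + 97531u;               /* traditional final constant */
}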
Tim Peters added the comment:
> Suppose that there is a hash collision, say hash((3, 3)) ==
> hash((-3, -3)) and you change the hashing algorithm to fix
> this collision.
There are _two_ hash functions at play in that collision: the tuple hash
function, and the integer hash function …
Tim Peters added the comment:
>> j is even implies (j ^ -3) == -(j ^ 3)
> This follows from what I posted before: if j is even, then
> j ^ 3 is odd, so we can apply the rule x ^ -2 = -x to x = j ^ 3
> ...
Thanks! That helps a lot. I had a blind spot there. This kind of thing …
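For anyone who wants to see the identity in action, this quick check (C,
assuming the usual two's-complement integers, as CPython does) exercises it
across a range of even j:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    /* For even j, (j ^ -3) == -(j ^ 3): j ^ 3 is odd, and for odd x,
       x ^ -2 == -x, per the derivation quoted above. */
    for (int64_t j = -1000000; j <= 1000000; j += 2)
        assert((j ^ -3) == -(j ^ 3));
    return 0;
}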
Tim Peters added the comment:
High-order bit: please restore the original tuple hash test. You have the
worst case of "but I didn't write it" I've ever encountered ;-) Your new test
is valuable, but I've seen several cases now where it fails to detect any
problem
Tim Peters added the comment:
>> The two-liner above with the xor in the second line is
>> exactly Bernstein 33A, followed by a permutation
>> of 33A's _output_ space.
> Not output space, but internal state
? 33A's output _is_ its internal state at the end.
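For reference, here's Bernstein 33A in its classic byte-at-a-time form. Note
that what it returns is exactly its internal state after the final update,
which is the point being made:

#include <stdint.h>
#include <stddef.h>

static uint64_t djb33a(const unsigned char *data, size_t n)
{
    uint64_t h = 5381;              /* Bernstein's traditional seed */
    for (size_t i = 0; i < n; ++i)
        h = (h * 33) ^ data[i];     /* "33A": multiply by 33, xor in byte */
    return h;                       /* output == final internal state */
}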
Tim Peters added the comment:
I should have spelled this out before: these are all permutations, so in
general permuting the result space of `x * mult + y` (or any other permutation
involving x and y) is exactly the same as not permuting it but applying a
different permutation to y instead
Tim Peters added the comment:
Perhaps worth noting that FNV-1a works great if used as _intended_: on a
stream of unsigned bytes. All tests except the new tuple hash test suffer no
collisions; the new test suffers 14. Nothing is needed to try to worm around
nested tuple catastrophes, or
Tim Peters added the comment:
Also worth noting: other projects need to combine hashes too. Here's a 64-bit
version of the highly regarded C++ Boost[1] library's hash_combine() function
(I replaced its 32-bit int literal with "a random" 64-bit one):
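The code itself was cut off in this excerpt. As a stand-in, here's the shape
being described: Boost's 32-bit hash_combine() is
seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2), and the sketch below
widens it to 64 bits. The 64-bit literal here is the common golden-ratio
extension, an assumption in place of whatever "random" literal the original
message used:

#include <stdint.h>

static uint64_t hash_combine64(uint64_t seed, uint64_t h)
{
    /* Boost-style combine, widened to 64 bits; the constant is assumed. */
    seed ^= h + UINT64_C(0x9e3779b97f4a7c15) + (seed << 6) + (seed >> 2);
    return seed;
}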
Tim Peters added the comment:
[Tim]
> Perhaps worth noting that FNV-1a works great if used as
> _intended_: on a stream of unsigned bytes.
> ...
>
>Py_uhash_t t = (Py_uhash_t)y;
>for (int i = 0; i < sizeof(t); ++i) {
>    x = (x ^ (t & 0xff)) * (Py_uhash_t)1099511628211ULL;
>    t >>= 8;
>}
Tim Peters added the comment:
Jeroen, thanks for helping us fly slightly less blind! ;-) It's a lot of work.
I'd say you may as well pick a prime. It's folklore, but "a reason" is that
you've discovered that regular bit patterns in multipliers can hurt, and
Tim Peters added the comment:
An "aha!" moment for me: I noted before that replacing the tuple hash loop with
Py_uhash_t t = (Py_uhash_t)y;
t ^= t << 1;
x = (x * mult) + t;
does OK (64-bit build): most tests had no collisions, but the original tuple
test
Tim Peters added the comment:
Noting that the failure of high-order bits to propagate is one form of
"avalanche" failure. A recent 64-bit hash, SeaHash, takes this approach which
is said to have provably "perfect" avalanche behavior:
Py_uhash_t t = (Py_uhash_t)y;
Tim Peters added the comment:
If SeaHash is interesting to us (it is to me), I suggest dropping our DJB/FNV
combining part entirely and using a purer form, like so:
Py_uhash_t t = (Py_uhash_t)y;
x ^= t ^ (t << 1);
x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
x ^= (x >> 32) >> (x >> 60);
x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
Tim Peters added the comment:
>Py_uhash_t t = (Py_uhash_t)y;
>x ^= t ^ (t << 1);
>x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
>x ^= (x >> 32) >> (x >> 60);
>x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
Comment out either one of the multiplies …
Tim Peters added the comment:
A meta-comment: a great deal of work has been done on non-crypto hashes in
recent years. I'm not interested in rolling our own if one of the "winners"
from that process can be adapted. For that reason, I've only been looking at
those that scored 10 (best possible) on Appleby's SMHasher[1] test suite.
Tim Peters added the comment:
>> the author wants this transformation to be easily
>> invertible, so a prime is necessary
> A multiplication by any odd number modulo 2**64 is
> invertible.
Right! My mistake.
> As I argued before, the concept of primes is
> meaningless …
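Indeed, the inverse of any odd 64-bit multiplier is easy to compute, which
makes the invertibility concrete: Newton's iteration x *= 2 - a*x doubles the
number of correct low bits per pass. (The constant below is the SeaHash
multiplier from earlier in the thread, used here only as a test value.)

#include <assert.h>
#include <stdint.h>

static uint64_t inverse64(uint64_t a)   /* a must be odd */
{
    uint64_t x = a;                     /* already correct mod 2**3 */
    for (int i = 0; i < 5; ++i)
        x *= 2 - a * x;                 /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
    return x;
}

int main(void)
{
    uint64_t m = UINT64_C(0x6eed0e9da4d94a4f);
    assert(m * inverse64(m) == 1);      /* multiplication mod 2**64 */
    return 0;
}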
Tim Peters added the comment:
> So my suggestion remains
>
> for y in INPUT:
>     t = hash(y)
>     t ^= t * SOME_LARGE_EVEN_NUMBER
>     h ^= t
>     h *= MULTIPLIER
On the face of it, I'd be amazed if that passed SMHasher, because it only
propagates bits "to the left" …
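The "propagates left only" objection can be demonstrated directly: in the step
above, multiply, left shift, and xor never move information toward the low
bits, so flipping input bit k can never change any output bit below k. A small
check (the two constants are hypothetical stand-ins for
SOME_LARGE_EVEN_NUMBER and MULTIPLIER, which were left unspecified):

#include <stdint.h>
#include <stdio.h>

static uint64_t step(uint64_t h, uint64_t t)
{
    t ^= t * UINT64_C(0x2545f4914f6cdd1e);  /* hypothetical large even number */
    h ^= t;
    h *= UINT64_C(0x9e3779b97f4a7c15);      /* hypothetical odd multiplier */
    return h;
}

int main(void)
{
    for (int k = 0; k < 64; ++k) {
        uint64_t a = step(0, 12345);
        uint64_t b = step(0, 12345 ^ ((uint64_t)1 << k));
        /* The output difference never has a bit below position k. */
        if ((a ^ b) & (((uint64_t)1 << k) - 1))
            printf("bit %d propagated rightward?!\n", k);
    }
    printf("done: no rightward propagation\n");
    return 0;
}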
Tim Peters added the comment:
> If we, e.g., tested tuples of little floats instead ...
Speaking of which:
>>> from itertools import product
>>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
32
32 hash codes out of 1048576 distinct 20-tuples isn't exactly …
Tim Peters added the comment:
>> I've noted before, e.g., that sticking to a prime
>> eliminates a world of regular bit patterns in the
>> multiplier.
> Why do you think this? 0x1fff is prime :-)
Point taken ;-) But "a world of" is not the same …
Tim Peters added the comment:
>> For that reason, I've only been looking at those that
>> scored 10 (best possible) on Appleby's SMHasher[1] test suite
> Do you have a list of such hash functions?
A world of searching. The ones rated "Excellent" here (pre
Tim Peters added the comment:
> >>> from itertools import product
> >>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
> 32
> Good catch! Would you like me to add this to the testsuite?
It's in mine already ;-) I've added all the "bad examples" …
Tim Peters added the comment:
> So it seems that this SMHasher test suite doesn't
> catch the problem that we're seeing with negative integers.
Seems to be so, but I've never run SMHasher myself. I believe it's focused on
statistical properties, like avalanche …
Tim Peters added the comment:
FYI, this appears to be a good overview of what SMHasher is looking for:
https://github.com/aappleby/smhasher/wiki/SMHasher
Some of everything: performance, correctness, statistical measures, and
specific kinds of keysets that have proved problematic for other
Tim Peters added the comment:
> SeaHash seems to be designed for 64 bits.
Absolutely.
> I'm guessing that replacing the shifts by
>
> x ^= ((x >> 16) >> (x >> 29))
>
> would be what you'd do for a 32-bit hash.
My guess too. But "the pri
Tim Peters added the comment:
Thanks for looking at xxHash! An advantage is that it already comes in 32- and
64-bit versions.
> A (simplified and slightly modified version of) xxHash
> seems to work very well, much better than SeaHash.
? I've posted several SeaHash cores that …
Tim Peters added the comment:
Here's a complete xxHash-based implementation via throwing away C++-isms from
https://github.com/RedSpah/xxhash_cpp/blob/master/xxhash/xxhash.hpp
In the 64-bit build there are no collisions across my tests except for 11 in
the new tuple test.
The 32-bit …
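The implementation itself can't be recovered from this excerpt, but for
orientation, this is the canonical xxHash64 per-input round that such a
simplification builds on. The primes are xxHash64's published constants;
treating each item hash as one "lane" is my assumption about how it adapts to
tuples:

#include <stdint.h>

#define XXPRIME64_1 UINT64_C(0x9e3779b185ebca87)
#define XXPRIME64_2 UINT64_C(0xc2b2ae3d27d4eb4f)

static uint64_t xx_round(uint64_t acc, uint64_t lane)
{
    acc += lane * XXPRIME64_2;
    acc = (acc << 31) | (acc >> 33);   /* rotate left by 31 bits */
    acc *= XXPRIME64_1;
    return acc;
}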
Tim Peters added the comment:
> Note that I'm always considering parametrized
> versions of the hash functions that I'm testing.
> I'm replacing the fixed multiplier (all algorithms
> mentioned here have such a thing) by a random
> multiplier which is 3 mod 8.
Tim Peters added the comment:
>> In the 64-bit build there are no collisions across my
>> tests except for 11 in the new tuple test.
> That's pretty bad actually. With 64 bits, you statistically
> expect something in the order of 10**-8 collisions. So
> what you …
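For context, the 10**-8 figure is just the birthday bound: with n distinct
inputs and a uniform 64-bit hash, about n*(n-1)/2 / 2**64 collisions are
expected. Taking n = 2**20 as an assumed test size (the real counts aren't
preserved in this excerpt):

#include <stdio.h>

int main(void)
{
    double n = 1048576.0;   /* 2**20 inputs: an assumed test size */
    double expected = n * (n - 1.0) / 2.0 / 18446744073709551616.0;
    printf("expected collisions: %g\n", expected);   /* about 3e-8 */
    return 0;
}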
Change by Tim Peters :
--
components: +Interpreter Core -XML
Tim Peters added the comment:
>> people already wrote substantial test suites dedicated
>> to that sole purpose, and we should aim to be "mere
>> consumers" of functions that pass _those_ tests.
> There are hash functions that pass those tests which
> are
Tim Peters added the comment:
Add a comment along the lines you (Terry) suggested. Some people need to write
doctests that run under many versions of Python, so the info is still supremely
relevant to them.
Tim Peters added the comment:
Stephane, it's not deep. People who need to write doctests that work across N
versions of Python shouldn't need to read N versions of the documentation.
This is hardly unique to doctest. We routinely add "Changed in version m.n"
blurb
Tim Peters added the comment:
As a sanity check, I tried the 32-bit version of MurmurHash2 (version 3 doesn't
have a 32-bit version). All of my Python tests had collisions, and while the
new tuple test dropped to 15, product([0.5, 0.25], repeat=20) skyrocketed from
141 (32-bit xxHash) …
Tim Peters added the comment:
Thinking about the way xxHash works prompted me to try this initialization
change:
x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new
That was a pure win in my tests, slashing the number of collisions in the new
tuple test, 32-bit build, from
Tim Peters added the comment:
Attaching htest.py so we have a common way to compare what various things do on
the tests that have been suggested. unittest sucks for that. doctest too.
Here's current code output from a 32-bit build; "ideally" we want "got" values …
Tim Peters added the comment:
Here's htest.py output from
git pull https://github.com/jdemeyer/cpython.git bpo34751
and a variation. The number of collisions in the variation appear in square
brackets at the end of each line.
32-bit build:
range(100) by 3; 32-bit hash codes; mean 1
Tim Peters added the comment:
We need to worry about timing too :-( I'm using this as a test. It's very
heavy on using 3-tuples of little ints as dict keys. Getting hash codes for
ints is relatively cheap, and there's not much else going on, so this is
intended to be very …
Tim Peters added the comment:
>> Changes initialization to add in the length:
> What's the rationale for that change? You always asked
> me to stay as close as possible to the "official" hash function
> which adds in the length at the end. Is there an actual
Tim Peters added the comment:
This comes up every few years, but that's about it. Here's the iteration from
2 years ago:
https://mail.python.org/pipermail/python-ideas/2016-October/043039.html
Follow the thread. It contains easy-to-use wrappers for both "do it in
multiple …
Tim Peters added the comment:
This won't be changed. The effect on the iteration order of adding and/or
removing keys from a dict while iterating over it has never been defined in
Python, and never will be. The Python 3 docs for dict views say this clearly:
"""
I
Tim Peters added the comment:
Questions about Python should be asked, e.g., on the Python mailing list.
The short course is that it's desired that iterating over dicts be as fast as
possible, and nobody knows a way to make iteration robust in the face of
mutations that wouldn't …
Tim Peters added the comment:
Not without more expense. Which is why it hasn't been done. But this is a
wrong place to talk about it. If you want Python to change, go to the
python-ideas mailing list?
https://mail.python.org/mailman/listinfo/python-ideas
Tim Peters added the comment:
Sorry, I find this too hard to follow. At the end:
> ['TA', 'CA'] # Missing the substring 'CA'
the comment claims it's missing 'CA', while the output plainly shows it's _not_
missing 'CA'.
Tim Peters added the comment:
I don't object to spelling it out more, and your (Terry's) suggestions are
fine. On the other hand, this module has been around for a long time now,
and this is the first instance I've seen of someone surprised that it doesn't
produce
Tim Peters added the comment:
This doesn't actually matter - the code can never trigger. It would be fine to
replace it with an assert to that effect (see below for a specific suggestion).
The reason: The indices in this code are into vectors of PyObject*. These
vectors can't …
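The specific suggestion mentioned was cut off here; as a hedged reconstruction
of the idea only: since both index ranges refer to in-memory arrays of
PyObject*, each count is bounded by the address space divided by the pointer
size, so their sum can't overflow. Something like this (na and nb are
hypothetical run lengths, not names from the actual code):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static ptrdiff_t sum_of_run_lengths(ptrdiff_t na, ptrdiff_t nb)
{
    /* Each run occupies na * sizeof(void *) bytes of real memory, so
       na, nb <= PTRDIFF_MAX / sizeof(void *); hence na + nb can't
       overflow.  The assert documents (rather than guards) that fact. */
    assert(na <= (ptrdiff_t)(PTRDIFF_MAX / sizeof(void *)) - nb);
    return na + nb;
}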
Tim Peters added the comment:
I left the code in because it was harmless (a 100%-predictable branch), and it
was easier to show that overflow was _considered_ than to explain in full why
it was impossible. In the context of CPython. For example, the Java port of
this code couldn't re
Tim Peters added the comment:
As section 2.4.1. (String literals) of the Python reference manual says,
"... (even a raw string cannot end in an odd number of backslashes).
Specifically, a raw string cannot end in a single backslash (since the
backslash would escape the following quote character).
Tim Peters added the comment:
Looks fine to me. The third one uses native size and alignment (see the "Byte
Order, Size, and Alignment" section of the struct docs). After the first 5
B's, a pad byte is inserted so that the next H is properly aligned (to a 2-byte
boundary).
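The same native-alignment rule is easy to see in C itself, which is what
struct's native mode mimics. A sketch, assuming typical ABIs:

#include <stddef.h>
#include <stdio.h>

struct like_5BH {            /* models the format "5BH" in native mode */
    unsigned char b[5];
    unsigned short h;        /* needs 2-byte alignment on typical ABIs */
};

int main(void)
{
    printf("offset of h: %zu\n", offsetof(struct like_5BH, h)); /* 6, not 5 */
    printf("struct size: %zu\n", sizeof(struct like_5BH));      /* 8 */
    return 0;
}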
Tim Peters added the comment:
Well, a timedelta is a duration. timedelta // n is as close as possible to one
n'th of that duration, but "rounding down" (if necessary) so that the result is
representable as a timedelta. In the same way, if i and j are integers, i // j
is as close as possible to the exact quotient, but rounding down so the
result is an integer.
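Since C's integer division truncates toward zero while Python's // floors
toward minus infinity, here's a small sketch of the "as close as possible,
rounding down" rule the explanation relies on:

#include <assert.h>

static long long floordiv(long long i, long long j)
{
    long long q = i / j;                       /* C truncates toward zero */
    if ((i % j != 0) && ((i < 0) != (j < 0)))
        --q;                                   /* floor toward -infinity */
    return q;
}

int main(void)
{
    assert(floordiv(7, 2) == 3);
    assert(floordiv(-7, 2) == -4);   /* Python: -7 // 2 == -4 */
    return 0;
}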
Changes by Tim Peters :
--
nosy: +tim_one
Tim Peters added the comment:
Serhiy, I don't see the regexp '(?:.*$\n?)*' anywhere in doctest.py. Are you
talking about the _EXAMPLE_RE regexp? That's the closest I see.
If that's the case, the "nothing to repeat" error is incorrect: _EXAMPLE_RE
Tim Peters added the comment:
Serhiy, I'm asking you to be very explicit about which regexp in doctest.py
you're talking about. If you're talking about the _EXAMPLE_RE regexp, I
already explained what's going on with that. If you're talking about some
other regexp,
Tim Peters added the comment:
I'm afraid it's just too tricky for the code to deduce that a negative
lookahead assertion can imply that a later match can't be empty. But I don't
know how smart the re compilation code already is ;-)
It occurs to me now that the doctest …
Tim Peters added the comment:
Matching an empty string an unbounded number of times isn't a case of
exponential runtime, it's a case of infinite runtime, unless the regexp
internals are smart enough to cut the search off (I don't know enough about
re's internals to know).
Tim Peters added the comment:
FWIW, I like this. It would be a nice addition to the itertools module. +1
The `key` argument should be renamed to `pred`, as others have said.
As to the name, I like "first_true". Does what it says. Plain "first" is
misleading, and …
Tim Peters added the comment:
Matthew, yes, I agree that a regexp engine can be coded to be robust against
unboundedly repeated matching of an empty string. I don't know whether
Python's current engine is so coded. It's easy to trick the 2.7 engine into
accepting regexps
Tim Peters added the comment:
> skipping the underscore ("firsttrue" or "nexttrue")
> would arguably be more consistent with other itertools names.
Like izip_longest and combinations_with_replacement? ;-)
I care about readability here more than consistency with the …
Tim Peters added the comment:
> Python's current regex engine isn't so coded. That's
> the reason for the up-front check.
It's peculiar then that nobody noticed before now that the check was so badly
broken ;-)
Offhand, do you have an example that displays bad behavior?
Tim Peters added the comment:
Serhiy, yup, that regexp is slow, but it does finish - so the engine is doing
something to avoid _unbounded_ repetitive matching of an empty string.
Change it to
(?:.?.+)*y
and the group can no longer match an empty string, but it's still slow
(although
Tim Peters added the comment:
Serhiy, yes, I know the regexp you gave takes exponential time. But:
1. This appears to have nothing to do with repeated 0-length matches. I gave
you an example of a very similar regexp that also takes exponential time, but
never makes any 0-length sub-matches.
Tim Peters added the comment:
So does anyone believe this check serves a useful purpose _now_? Doesn't seem
so to me.
Tim Peters added the comment:
Victor, Wikipedia has a readable explanation:
http://en.wikipedia.org/wiki/NTFS_junction_point
I haven't used them much. From cmd.exe, I've been able to delete them, not
with "del" but with "rmdir". You can create one from cmd.
Tim Peters added the comment:
Yup - 2.7 evaluates this in a less precise way, as
log(10L) = log(10./16 * 2**4) = log(0.625) + log(2)*4
>>> log(10L) == log(0.625) + log(2)*4
True
This pattern works well even for longs that are far too large to represent as
a double; e.g., …
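The same decomposition is a one-liner in C via frexp(), which splits a double
into mantissa and exponent; this mirrors the log(m) + e*log(2) scheme
described above:

#include <math.h>
#include <stdio.h>

int main(void)
{
    int e;
    double m = frexp(10.0, &e);    /* 10 = 0.625 * 2**4, so m = 0.625, e = 4 */
    printf("%.17g\n", log(m) + e * log(2.0));
    printf("%.17g\n", log(10.0));  /* should agree closely */
    return 0;
}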
Tim Peters added the comment:
FYI, the improvement was made in these 2 changesets:
c8dc4b5e54fb
a9349fd3d0f7
If desired, those could presumably be grafted back into the 2.7 branch.
The commit messages refer to issue #9599, but that appears to be mistaken.
Tim Peters added the comment:
OK, the correct one is issue #9959.
Tim Peters added the comment:
+1 on fixing it in 2.7, for the reasons Mark gave.
Way back when I introduced the original scheme, log(a_long) raised an
exception, and the `int` and `long` types had a much higher wall between them.
The original scheme changed an annoying failure into a "p
Tim Peters added the comment:
The docs look correct to me. For example,
>>> from datetime import *
>>> td = datetime.now() - datetime(1,1,1)
>>> td
datetime.timedelta(735103, 61756, 484000)
>>>
Tim Peters added the comment:
Python's debug-mode memory allocators add some "magic values" before and after
each allocated chunk of memory, and check them when the chunk is freed to make
sure nobody overwrote them. In this case, someone did overwrite the byte at
p-5, where p …
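To illustrate the mechanism, here's a toy sketch of the guard-byte idea. It
is not CPython's actual debug-allocator layout, though CPython's
forbidden-byte marker really is 0xFB:

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAD 4
#define GUARD 0xFB   /* CPython's "forbidden byte" marker value */

static void *debug_malloc(size_t n)
{
    unsigned char *raw = malloc(n + 2 * PAD);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD, PAD);            /* leading pad before the block */
    memset(raw + PAD + n, GUARD, PAD);  /* trailing pad after the block */
    return raw + PAD;
}

static void debug_free(void *p, size_t n)
{
    unsigned char *raw = (unsigned char *)p - PAD;
    for (size_t i = 0; i < PAD; ++i) {
        assert(raw[i] == GUARD);           /* else: "bad leading pad byte" */
        assert(raw[PAD + n + i] == GUARD); /* else: bad trailing pad byte */
    }
    free(raw);
}

int main(void)
{
    unsigned char *p = debug_malloc(16);
    p[0] = 42;          /* in-bounds write: fine */
    debug_free(p, 16);  /* writing p[-5] before this would trip the assert */
    return 0;
}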
Tim Peters added the comment:
Impossible to know, but since everything in the traceback comes from
matplotlib, the error is most likely in matplotlib.
Tim Peters added the comment:
Memory corruption can be difficult to track down. Best thing you can do is
strive to find a test case as small and fast as possible that shows the same
kind of error.
By "rogue extension module" I just mean 3rd-party C code (like, for example,
matplotlib).
Tim Peters added the comment:
By the way, if memory serves, compiling with --with-pydebug changes the memory
layout of Python objects, so a Python compiled this way _cannot_ be used
successfully with extension modules that were compiled without the same
options. Did you rebuild your
Tim Peters added the comment:
Well, if you delete a giant list, and the list held the only references
remaining to the objects the list contained, then the memory for those objects
will be free'd, one object at a time. A debug build would then detect the
memory corruption in those objects …
Tim Peters added the comment:
Note that the same poster is also reporting memory corruption in issue 18843.
I suggest ignoring this one unless/until the earlier bug is resolved (memory
corruption can easily cause a segfault - or any other kind of error).
Tim Peters added the comment:
Did you read Misc/README.valgrind (in the Python tree)? The warnings you've
seen so far are probably all nonsense, and README.valgrind explains how to
avoid getting them.
Tim Peters added the comment:
I don't know why there isn't a configure switch for this - but then I've never
used valgrind - LOL ;-)
Other developers use valgrind on Python routinely, though. So it's unlikely
you'll find a legitimate problem _in Python_ …
New submission from Tim Peters:
In issue 18843 a user noted that Misc/README.valgrind doesn't mention the
--with-valgrind configure option. It probably should. But since I've never
used valgrind, I'm not the guy to do it ;-)
Tim Peters added the comment:
I opened issue 18859 about the lack of --with-valgrind info in
Misc/README.valgrind. Thanks for noticing!
Changes by Tim Peters :
--
keywords: +easy
Tim Peters added the comment:
Hmm. I don't quite know what you're doing: you said you're getting away from
--with-pydebug, but these "bad leading pad byte" messages can't be generated
unless Python is compiled with (at least) PYMALLOC_DEBUG defined.
That said …
Tim Peters added the comment:
Yet Another Tool ;-) Python's "small object" allocator grabs memory in chunks
of 256KB from the system, and carves up the space itself. Other memory tools
(like Valgrind ...) only see that Python has grabbed 256KB chunks, so can't
detect any
Tim Peters added the comment:
It would be a severely lame OS that allowed a process to overwrite another
process's memory ;-) "Bad C or C++ code", in the process you're running, is
still the best guess.
A memory module that sometimes dropped the last bit _could_ be at fault …
New submission from Tim Peters:
In
http://bugs.python.org/issue18843
a user reported a debug PyMalloc "bad leading pad byte" memory
corruption death while running their code. After some thrashing, they
decided to rebuild Python, and got the same kind of error while
rebuilding Python.
Changes by Tim Peters :
--
resolution: -> invalid
Changes by Tim Peters :
--
stage: -> committed/rejected
Changes by Tim Peters :
--
status: open -> closed
Tim Peters added the comment:
I sent a msg to Python-Dev, asking for a Gentoo user to try to reproduce this.
Thanks!
Tim Peters added the comment:
Thanks for chiming in, Stephen! I can't answer your questions, but agree the
original poster needs to tell you exactly what he did -- else we'd just be
thrashing at random. There's been enough
Tim Peters added the comment:
Martin, can you please supply exact commands Stephen can use to try to
reproduce the problem you saw running `emerge`? And try them yourself first,
to make sure you can reproduce the problem too.
Any detail may be important. For example, it's possible that …
Tim Peters added the comment:
The matplotlib people won't care about this one either. matplotlib allocated
the memory, and the error message at the end says it's _trying_ to call
_PyMem_DebugFree (which may well be trying to release the memory), but the
binaries aren't …
Tim Peters added the comment:
Martin, would it be possible to borrow someone else's machine and try to
reproduce this? If you can, that would greatly reduce the probability of this
being a HW error. It would also leave us with an exact set of commands to
share so others can try it on
Tim Peters added the comment:
Now you have something to show the matplotlib folks - although they're not
likely to get excited about leaking 40 bytes.
There is nothing Python can do about this. matplotlib is responsible for
free'ing the memory matplotlib allocates, just as
Tim Peters added the comment:
OK, it sounds to me like you do not have a reproducible test case, of any kind.
It that's true, this bug report isn't going anywhere :-(
Python isn't a memory-testing program, so it would be insanely inefficient for
it to (for example) read u
Tim Peters added the comment:
As issue 18843 has evolved, seems more likely now that it's flaky HW, but
agreed in any case there's really no evidence of a Python problem here. So
closing it. Martin, we can revisit this if there's real progress on the other
issue.
Tim Peters added the comment:
Nasty problem ;-)
I don't understand the need for all the indirections in the second patch.
Like, why use a weakref? It's not like we have to worry about an immortal
tstate keeping a measly little lock object alive forever, right? Seems to me
that …
Tim Peters added the comment:
Someone may find the new stress.valgrind.stderr interesting, but - since I've
never used valgrind - it doesn't mean much to me.
I _expected_ you'd run the little stress program under a debug Python and
without valgrind, since that's the only …
Tim Peters added the comment:
I'm getting a headache now - LOL ;-) Thanks for the explanation!
What I still don't understand: the new lock is an internal implementation
detail. How would it gain a weakref with a callback? Users aren't going to
mess with this lock, and if you …
Tim Peters added the comment:
Note this line in your first post:
DUMA Aborting: mprotect() failed: Cannot allocate memory.
Python never calls mprotect(), but DUMA() probably does. Also note what it
said after that:
Check README section 'MEMORY USAGE AND EXECUTION SPEED'
Tim Peters added the comment:
Thanks for that, Stephen! I don't know of anything else you could try that
would be helpful. The OP doesn't appear able to reproduce his problems either,
and last I heard he was off running `emerge` under DUMA:
http://duma.sourceforge.net/
Tim Peters added the comment:
All the buildbots are failing due to changeset 868ad6fa8e68 - I'm going to back
it out.
Tim Peters added the comment:
Suggest caution here. test_sax fails the same way for me too (Windows Vista),
under both the released 3.3.2 and a Python built from the current hg default
branch.
However, these files (test.xml and test.xml.out) have not changed since the
Python 2.7 line - the