Jeroen Demeyer added the comment:
> Which was your original suggestion. Which you appear to be opposed to now?
> I'm unclear about why, if so.
I'm not strictly opposed to that. It's just that I have less confidence in the
current ad-hoc hash compared to something
Jeroen Demeyer added the comment:
> I'm not aware of any research papers about picking multipliers in this
> context, but would love to see one.
The only real condition that I can think of is that the order should be large:
we do not want MULTIPLIER**n = 1 (mod 2**N) for a sma
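The order condition above can be checked by brute force for a small word size. This is only an illustrative sketch: the helper name `multiplicative_order` and the choice N = 16 are assumptions made to keep the loop short, and the sample multipliers are arbitrary.

```python
def multiplicative_order(multiplier, nbits):
    """Smallest n >= 1 with multiplier**n == 1 (mod 2**nbits).

    Hypothetical helper for illustration. The multiplier must be odd,
    otherwise it is not invertible modulo a power of 2.
    """
    if multiplier % 2 == 0:
        raise ValueError("multiplier must be odd")
    modulus = 1 << nbits
    power = multiplier % modulus
    n = 1
    while power != 1:
        power = (power * multiplier) % modulus
        n += 1
    return n


N = 16  # small word size, assumed only so that the search stays fast
for m in (3, 33, 2**N - 1):
    print(m, multiplicative_order(m, N))
```

Modulo 2**N the largest possible order is 2**(N-2), which 3 reaches here, while 2**N - 1 (that is, -1) has order 2.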
Jeroen Demeyer added the comment:
Thanks for the reference, I never heard of the FNV hash. Unfortunately, I
haven't seen a reference for the rationale of how they pick their multiplier.
It's not clear what you are suggesting though. Keep using the FNV-ish hash
somehow? Or using
Jeroen Demeyer added the comment:
> the made-up hacks Python used to worm around a class of gross flaws its prior
> DJBX33X approach suffered, taking DJBX33X out of its original context and
> applying it in an area it wasn't designed for.
But we know why the DJBX33A hash didn&
Jeroen Demeyer added the comment:
> We shouldn't feel shoved into altering something that we don't agree is broken
It *is* broken. You are just denying the reality.
That's also the reason that I'm insisting so much: I don't want to push my
personal fix (despite
Jeroen Demeyer added the comment:
> Has anyone figured out the real source of the degeneration when mixing in
> negative integers?
The underlying reason for the collisions is the following mathematical relation:
x ^ -x = -2 << i
where i is the index of the smallest set
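The relation quoted above can be spot-checked with Python's arbitrary-precision integers. This sketch assumes that the truncated sentence refers to the lowest set bit of x; the helper `lowest_set_bit_index` is a hypothetical name introduced for the check.

```python
def lowest_set_bit_index(x):
    """Index of the least significant 1 bit of a nonzero integer."""
    # x & -x isolates the lowest set bit, also for negative x.
    return (x & -x).bit_length() - 1


# Check x ^ -x == -2 << i for a range of positive and negative x.
for x in list(range(1, 1000)) + [-x for x in range(1, 1000)]:
    i = lowest_set_bit_index(x)
    assert x ^ -x == -2 << i
```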
Jeroen Demeyer added the comment:
While writing up the analysis above, it occurred to me that collisions already
happen for 2-tuples:
>>> hash((3, -2)) == hash((-3, 0))
True
These kinds of 2-tuples of small integers don't look contrived at all. I can
easily see them
Jeroen Demeyer added the comment:
> stuff like "t += t >> 16" is a many-to-one function, not a permutation
Yes, I am aware of that. However, the number of collisions here is really quite
small. It's very unlikely to hit one by accident.
I also chose >> over <
Jeroen Demeyer added the comment:
> For example, FNV-1a has far better "avalanche" behavior than Bernstein
We don't care about that though. We want to have no collisions, not a random
output.
--
___
Python tracker
<ht
Jeroen Demeyer added the comment:
> In the absence of a real analysis, the intuition is simply that "t ^= t << 7"
> will clear masses of leading sign bits when hashing "small" negative integers.
That's a clever solution. If you want to go that route, I wo
Change by Jeroen Demeyer :
--
pull_requests: +8937
___
Python tracker
<https://bugs.python.org/issue34751>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
I created a new PR based on Tim's t ^= t << 7 idea, except that I'm using << 1
instead, to have more mixing in the lower bits.
With the standard FNV multiplier on 64 bits, I did get collisions while
testing. I haven't figured out
Change by Jeroen Demeyer :
--
pull_requests: +8942
___
Python tracker
<https://bugs.python.org/issue32797>
___
Jeroen Demeyer added the comment:
> It would be good if PyType_Ready() will check that base class of static type
> is static.
What's the rationale for this change? It's not explained in this bug report nor
in the code.
--
Jeroen Demeyer added the comment:
> The low bits are already un-improvable in the most important cases.
Maybe you misunderstood me. Suppose that there is a hash collision, say
hash((3, 3)) == hash((-3, -3)) and you change the hashing algorithm to fix this
collision. If that change does
Jeroen Demeyer added the comment:
> When testing what, specifically? And the standard 32-bit FNV multiplier, or
> the standard 64-bit FNV multiplier?
FNV-1a with the t ^= 2 * t mangling running my new testsuite on either PR 9471
or PR 9534 using the 64-bit FNV multiplier to produce
Jeroen Demeyer added the comment:
> j is even implies (j ^ -3) == -(j ^ 3)
This follows from what I posted before: if j is even, then j ^ 3 is odd, so we
can apply the rule x ^ -2 = -x to x = j ^ 3:
(j ^ 3) ^ -2 = -(j ^ 3)
which implies
j ^ (3 ^ -2) = -(j ^ 3)
or equivalently
j ^
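Both identities used in this derivation are easy to confirm numerically. The sketch below is only a brute-force check over an arbitrary range of small integers, not a proof.

```python
# Rule quoted above: for odd x, x ^ -2 == -x.
for x in range(-999, 1000, 2):      # odd values only
    assert x ^ -2 == -x

# Special case used in the derivation.
assert 3 ^ -2 == -3

# Consequence: for even j, j ^ -3 == -(j ^ 3).
for j in range(-1000, 1001, 2):     # even values only
    assert j ^ -3 == -(j ^ 3)
```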
Jeroen Demeyer added the comment:
> And one more:
x = (x * mult) ^ t;
also appears to work equally well.
The order of operations does not really matter: you can write the loop as
x *= mult # Appears only in FNV-1
x ^= t[0]
x *= mult
x ^= t[1]
x *= mult
x ^= t[2]
x *= mult
x ^
Jeroen Demeyer added the comment:
> I want to leave low-order hash bits alone. That's deliberate.
Since I didn't understand the rationale for this and since shifting << 1 also
seems to work well, I edited PR 9471 to use DJBX33A with t ^= t << 1.
Since you insist
Jeroen Demeyer added the comment:
Regarding t ^= t << 7: I tested PR 9471 with all shift values from 1 to 20. The
new testsuite passed for all shifts from 1 to 13, except for 6. It failed for 6
and for 14 to 20. This indicates that smaller shift values are better (even
when looking
Jeroen Demeyer added the comment:
> There are _two_ hash functions at play in that collision: the tuple hash
> function, and the integer hash function. Regardless of whether the _tuple_
> hash function does [anything involving just `t`], that only directly affects
> the r
Jeroen Demeyer added the comment:
> Do you believe any other multiplier would work better toward that end?
Absolutely. Ideally, the multiplier should just be a random 64-bit number.
--
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> please restore the original tuple hash test.
Sure. I wasn't sure what to do and I was afraid that having 2 tests for tuple
hashes would be too much. If that's OK for you, then surely I will rest
Jeroen Demeyer added the comment:
> The two-liner above with the xor in the second line is exactly Bernstein 33A,
> followed by a permutation of 33A's _output_ space.
Not output space, but internal state (I assume that you do that operation
inside the loop). It's replac
Jeroen Demeyer added the comment:
> Replacing DJBX33A's multiplier of 33 is also a different algorithm. So is
> working with inputs other than unsigned bytes.
I would argue that this is just extending the parameters of the algorithm. If
the algorithm is general enough, then tha
Jeroen Demeyer added the comment:
I spent about 2 days doing an extensive study of the FNV and DJB algorithms. I
share my conclusions below.
To be very clear what I mean, I am talking about the following algorithms (t is
a tuple and m is the multiplier which is always assumed to be odd
Jeroen Demeyer added the comment:
> the author wants this transformation to be easily invertible, so a prime is
> necessary
A multiplication by any odd number modulo 2**64 is invertible. As I argued
before, the concept of primes is meaningless (except for the prime 2) when
computing
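The invertibility claim can be illustrated by actually computing the inverse. This is a minimal sketch using Newton iteration modulo 2**64; the helper name `inverse_mod_2_64` and the sample multipliers are assumptions chosen for illustration only.

```python
MASK = (1 << 64) - 1


def inverse_mod_2_64(m):
    """Return inv with (m * inv) % 2**64 == 1, for any odd m."""
    if m % 2 == 0:
        raise ValueError("even numbers are not invertible modulo 2**64")
    # For odd m, m * m == 1 (mod 8), so m is its own inverse to 3 bits.
    inv = m & MASK
    # Each Newton step doubles the number of correct low bits:
    # 3 -> 6 -> 12 -> 24 -> 48 -> 96, so 5 steps cover 64 bits.
    for _ in range(5):
        inv = (inv * (2 - m * inv)) & MASK
    return inv


for m in (3, 33, 1000003, 2**64 - 1):
    assert (m * inverse_mod_2_64(m)) & MASK == 1
```

None of this depends on m being prime: 33 = 3 * 11 is composite and still invertible.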
Jeroen Demeyer added the comment:
This weekend I realized something important which I didn't realize before: some
hash functions which I assumed to be good (i.e. small chance of collisions
between any given two tuples) turned out to often fail the tests. This is
because you don't
Jeroen Demeyer added the comment:
SeaHash seems to be designed for 64 bits. I'm guessing that replacing the
shifts by
x ^= ((x >> 16) >> (x >> 29))
would be what you'd do for a 32-bit hash. Alternatively, we could always
compute the hash with 64 bits (using ui
Jeroen Demeyer added the comment:
Correction: the FNV variant of SeaHash only fails the new testsuite, not the
old one. The DJB variant of SeaHash fails both.
--
___
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
> 100% pure SeaHash does x ^= t at the start first, instead of `t ^ (t << 1)`
> on the RHS.
Indeed. Some initial testing shows that this kind of "input mangling" (applying
such a permutation on the inputs) actually plays a much more im
Jeroen Demeyer added the comment:
> I've noted before, e.g., that sticking to a prime eliminates a world of
> regular bit patterns in the multiplier.
Why do you think this? 0x1fff is prime :-)
Having regular bit patterns and being prime are independent properties.
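Both halves of that remark can be checked directly. The sketch below uses a simple trial-division test (the helper `is_prime` is a hypothetical name, adequate only for small numbers) and prints the binary representation.

```python
def is_prime(n):
    """Trial division; adequate for small n only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# 0x1fff is prime, yet its bit pattern is as regular as it gets:
# thirteen consecutive 1 bits.
print(is_prime(0x1fff), bin(0x1fff))
```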
Jeroen Demeyer added the comment:
> For that reason, I've only been looking at those that scored 10 (best
> possible) on Appleby's SMHasher[1] test suite, which is used by everyone who
> does recognized work in this field.
So it seems that this SMHasher test suite doesn&
Jeroen Demeyer added the comment:
>>> from itertools import product
>>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
32
Good catch! Would you like me to add this to the testsuite?
--
___
Python tracker
<https://bugs.
Jeroen Demeyer added the comment:
> For that reason, I've only been looking at those that scored 10 (best
> possible) on Appleby's SMHasher[1] test suite
Do you have a list of such hash functions?
--
___
Python tracker
<http
Jeroen Demeyer added the comment:
> I know of no such hash functions short of crypto-strength ones.
Being crypto-strength and having few collisions statistically are different
properties.
For non-crypto hash functions it's typically very easy to generate collisions
once you
Jeroen Demeyer added the comment:
I'm having a look at xxHash, the second-fastest hash mentioned on
https://docs.rs/seahash/3.0.5/seahash/
--
___
Python tracker
<https://bugs.python.org/is
Jeroen Demeyer added the comment:
A (simplified and slightly modified version of) xxHash seems to work very well,
much better than SeaHash. Just like SeaHash, xxHash also works in parallel. But
I'm not doing that and just using this for the loop:
for y in t:
y ^= y * (PRIM
Jeroen Demeyer added the comment:
> I've posted several SeaHash cores that suffer no collisions at all in any of
> our tests (including across every "bad example" in these 100+ messages),
> except for "the new" tuple test. Which it also passed, most recent
Jeroen Demeyer added the comment:
> Note: I'm assuming that by "PRIME32_2" you mean 2246822519U
Yes indeed.
> and that "MULTIPLIER" means 2654435761U.
No, I mean a randomly chosen multiplier which is 3 mod 8.
--
___
Jeroen Demeyer added the comment:
> people already wrote substantial test suites dedicated to that sole purpose,
> and we should aim to be "mere consumers" of functions that pass _those_ tests.
There are hash functions that pass those tests which are still bad in practice
wh
Jeroen Demeyer added the comment:
> Taking an algorithm in wide use that's already known to get a top score on
> SMHasher and fiddling it to make a "slight" improvement in one tiny Python
> test doesn't make sense to me.
What I'm doing is the most inno
Jeroen Demeyer added the comment:
> In the 64-bit build there are no collisions across my tests except for 11 in
> the new tuple test.
That's pretty bad actually. With 64 bits, you statistically expect something in
the order of 10**-8 collisions. So what you're seein
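The kind of estimate referred to here can be sketched with the usual birthday approximation: if hashes behaved like independent uniform N-bit values, n items would give about n(n-1)/2 / 2**N colliding pairs. The item count below is hypothetical, since the excerpt does not state how many tuples the test hashes.

```python
def expected_collisions(num_items, hash_bits):
    """Expected number of colliding pairs for an ideal random hash."""
    pairs = num_items * (num_items - 1) // 2
    return pairs / 2**hash_bits


# Hypothetical test size, chosen only to show the order of magnitude.
print(expected_collisions(600_000, 64))
```

Under that assumption, even a single observed collision in a 64-bit build would be far more than chance predicts.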
Jeroen Demeyer added the comment:
> Taking an algorithm in wide use that's already known to get a top score on
> SMHasher and fiddling it to make a "slight" improvement in one tiny Python
> test doesn't make sense to me.
OK, I won't do that. The diff
Jeroen Demeyer added the comment:
I updated PR 9471 with a tuple hash function based on xxHash. The only change
w.r.t. the official xxHash specification is that I'm not using parallelism and
just using 1 accumulator. Please have a
Jeroen Demeyer added the comment:
I pushed a documentation-only patch on PR 9540 to better document the status quo.
Can somebody please review either PR 6653 or PR 9540?
--
___
Python tracker
<https://bugs.python.org/issue32
Jeroen Demeyer added the comment:
> Changes initialization to add in the length:
What's the rationale for that change? You always asked me to stay as close as
possible to the "official" hash function which adds in the length at the end.
Is there an actual benefit fr
Jeroen Demeyer added the comment:
I pushed an update at PR 9471. I think I took into account all your comments,
except for moving the length addition from the end to the beginning of the function.
--
___
Python tracker
<https://bugs.python.
Change by Jeroen Demeyer :
--
pull_requests: +9154
___
Python tracker
<https://bugs.python.org/issue25592>
___
Change by Jeroen Demeyer :
Removed file: https://bugs.python.org/file40993/data_files_doc.patch
___
Python tracker
<https://bugs.python.org/issue25592>
___
Jeroen Demeyer added the comment:
Can somebody please review PR 6448?
--
___
Python tracker
<https://bugs.python.org/issue33261>
___
Jeroen Demeyer added the comment:
> If you’re not sure about the reason for that sentence, I think you should not
> remove it from the docs
If the docs are wrong, their history doesn't matter that much: the docs should
be fixed regardless.
> test the conditions (package
Jeroen Demeyer added the comment:
Just for fun, let's look at the history. That piece of documentation goes back
to
commit 632bda3aa06879396561dde5ed3d93ee8fb8900c
Author: Fred Drake
Date: Fri Mar 8 22:02:06 2002 +
Add more explanation of how data_files is used (esp. wher
Jeroen Demeyer added the comment:
There is also
commit fa2f4b6d8e297eda09d8ee52dc4a3600b7d458e7
Author: Greg Ward
Date: Sat Jun 24 17:22:39 2000 +
Changed the default installation directory for data files (used by
the "install_data" command to the installation base
Jeroen Demeyer added the comment:
Well, I did try it on a minimal Python project. I also read the distutils
sources and understood why it installs data_files in sys.prefix by default. So
what more do you need to be convinced?
--
___
Python
Jeroen Demeyer added the comment:
> Did you try with a minimal project containing a C extension?
> Did you install in a system where sys.prefix != sys.exec_prefix?
Yes to both questions.
--
___
Python tracker
<https://bugs.python.org/i
Jeroen Demeyer added the comment:
> it will typically change only the last two bits of the final result
Which is great if all that you care about is avoiding collisions.
--
___
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> Is it necessary to use METH_FASTCALL?
In Python 3, the bug only occurs with METH_FASTCALL. The issue is a reference
counting bug and the temporary tuple used for a METH_VARARGS method avoids the
Jeroen Demeyer added the comment:
Many thanks!
--
___
Python tracker
<https://bugs.python.org/issue34751>
___
Jeroen Demeyer added the comment:
The problem on the machine that I mentioned was a regression from 2.7.4 to
2.7.5, probably due to #17086. Whether you consider a patch a "bugfix" or "new
feature" is quite subjective, right? If #17086 is a bugfix, then this c
Jeroen Demeyer added the comment:
This is causing breakage, see #17990 and #18000.
--
nosy: +jdemeyer
___
Python tracker
<http://bugs.python.org/issue17
New submission from Jeroen Demeyer :
The find_library_function() in Lib/distutils/unixccompiler.py does a very
simple-minded check to determine the existence of a library. It basically only
checks that a certain .so file exists. This may lead to false positives: the
mere existence of a .so
New submission from Jeroen Demeyer:
There is a serious security problem with Python's default sys.path. If I
execute
$ python /tmp/somescript.py
then Python will add /tmp as sys.path[0], such that an "import foobar" will
cause Python to read /tmp/foobar (or variations). Thi
Jeroen Demeyer added the comment:
Robert: I don't think that running scripts in /tmp is inherently unsafe. It is
Python's sys.path handling which makes it unsafe. That being said, I am not
against distutils being "fixed" but I do think the root issue should be fixed.
Jeroen Demeyer added the comment:
If you don't plan any further Python-2 releases, it would be a pity that this
cannot be fixed. If you do plan a further Python-2 release, I find backwards
compatibility a poor excuse. I'm not saying that backwards compatibility
should be totally ig
Jeroen Demeyer added the comment:
It's sort of the same as #946373, except that bug report deals with other bad
consequences of sys.path[0], unrelated to security.
#5753 is specifically about the C API, not about running "pla
Jeroen Demeyer added the comment:
I should point out that there is also dangerous code in
Lib/test/test_subprocess.py in the test_cwd() function. There, the following
is executed from /tmp:
python -c 'import sys,os; sys.stdout.write(os.getcwd())'
As Python luckily knows where
Changes by Jeroen Demeyer :
Added file: http://bugs.python.org/file27923/sys_path_security.patch
___
Python tracker
<http://bugs.python.org/issue16202>
___
Changes by Jeroen Demeyer :
Removed file: http://bugs.python.org/file27536/sys_path_security.patch
___
Python tracker
<http://bugs.python.org/issue16202>
___
Jeroen Demeyer added the comment:
I updated sys_path_security.patch with a newer version. This version will be
merged in the Python package of Sage (http://www.sagemath.org/).
I realise that it looks unlikely that it will be merged in CPython, but at
least it's here for refe
Jeroen Demeyer added the comment:
Sorry for the late answer, but yes: I found this out using an actual
compilation. I must admit it was in a bit of an unusual situation (32-bit
userspace on a mixed 32/64-bit multilib installation), but most other software
packages have no problems configuring
New submission from Jeroen Habraken :
The parse_qs and parse_qsl functions in the urlparse module seem to be new
since version 2.6, though this is not documented, please add "New in version
2.6.".
--
assignee: georg.brandl
components: Documentation
messages: 94860
nosy
New submission from Jeroen Habraken :
The urllib.urlencode documentation is unclear with regard to the 'doseq'
option. In my opinion it does not clearly state what its functionality is.
--
assignee: d...@python
components: Documentation
messages: 106311
nosy: VeXocide, d
Jeroen Habraken added the comment:
An elaboration as requested on IRC: It appears to make claims about 'the
sequence', but doesn't make clear that 'doseq' matters when *v* is a sequence.
It is easy to assume it refers to the query sequence, which is of
Jeroen Demeyer added the comment:
Here is a more minimal breaking example. This clearly shows that this patch
breaks backwards compatibility.
```
$ cat obj.pyx
cdef class OBJ(object):
pass
$ ipython
Python 2.7.13rc1 (default, Dec 11 2016, 14:21:24)
Type "copyright", "credi
Jeroen Demeyer added the comment:
@serhiy.storchaka: yes, changing the order of the base classes fixes the issue
with __new__. Also manually assigning __new__ works, like
class C(A, B):
__new__ = B.__new__
What is broken by this patch is only the auto-detection of which __new__
(really
Jeroen Demeyer added the comment:
Wouldn't it be possible to fix assignment of __new__ without breaking backwards
compatibility (and then apply the same patch for all Python versions)? I have a
feeling that breaking the auto-detection of tp_new is a new bug introduced by
this patch and
Jeroen Demeyer added the comment:
If you are on POSIX, you could also use cysignals to get a
traceback (simply import cysignals, which will install a handler for fatal
signals like SIGSEGV).
--
___
Python tracker
<http://bugs.python.
Jeroen Demeyer added the comment:
It worries me that nothing in the Python docs nor in any PEP describes how
tp_new is inherited. In my opinion, this patch makes a significant change
which should be subject to a PEP. However, neither the old nor new behaviour is
described anywhere. This also
Jeroen Demeyer added the comment:
Let me add that this "low-level opaque object" would be rather easy to
implement on POSIX systems (I have no clue about other systems such as
Windows). I could implement it, but it would be good to have some pre-approval
from Python devs that it
Jeroen Demeyer added the comment:
Here is a proposal for an API:
* getsignal: return the Python-level signal handler (this is an existing
function)
* setsignal: set the Python-level signal handler (but not the OS-level signal
handler)
* getossignal: get the OS-level signal handler as opaque
New submission from Jeroen Demeyer:
This is a regression introduced in Python 2.7.13:
Importing the ssl module can fail with
>>> import ssl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jdemeyer/sage/local/lib/python/ssl.py", line
New submission from Jeroen Demeyer:
```
>>> class zero(object):
... def __index__(self):
... return 0
...
>>> z = zero()
>>> import re
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
>>> m.group(0)
Jeroen Demeyer added the comment:
I would still argue that it's a bug. The intention of PEP 357 is that __index__
should be used whenever some object needs to be converted to a Py_ssize_t,
which is exactly what you do here.
--
___
Python tr
Jeroen Demeyer added the comment:
My use case is SageMath: http://trac.sagemath.org/ticket/20750
--
___
Python tracker
<http://bugs.python.org/issue27177>
___
___
Changes by Jeroen Demeyer :
--
nosy: +jdemeyer
___
Python tracker
<http://bugs.python.org/issue1222585>
___
New submission from Jeroen Demeyer:
On Linux Ubuntu 13.04, i686:
$ uname -a
Linux arando 3.5.0-26-generic #42-Ubuntu SMP Fri Mar 8 23:20:06 UTC 2013 i686
i686 i686 GNU/Linux
$ python
Python 2.7.5 (default, May 17 2013, 18:43:24)
[GCC 4.7.3] on linux2
Type "help", "copyright
New submission from Jeroen Demeyer:
I have an Itanium Linux system where compiling Python's _ssl module fails for
some reason, with the consequence that there is no md5 support at all in the
resulting Python 2.7.5 installation.
With Python 2.7.4, setup.py didn't even try to co
Jeroen Demeyer added the comment:
Sure, building _ssl fails with:
building '_ssl' extension
gcc -pthread -fPIC -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-I. -IInclude -I./Include -I/usr/include
-I/home/buildbot/build/sage/iras-1/iras_full/build
/sage-5.10.beta4/local/
New submission from Jeroen Demeyer:
The documentation for distutils claims that sys.exec_prefix is used in certain
cases to install data_files, but this is simply not true (maybe it was true in
the past or this sentence was copy/pasted from somewhere else?)
--
assignee: docs@python
New submission from Jeroen Demeyer:
`type_getattro()` calls `tp_descr_get(self, obj, type)` without actually owning
a reference to "self". In very rare cases, this can cause a segmentation fault
if "self" is deleted by the descriptor.
Downstream: [http://trac.sagem
Jeroen Demeyer added the comment:
Just a comment: if you need really robust signal handling, you just cannot do
it with pure Python. I would recommend using Cython, where one has complete
control over when signals are checked.
--
nosy: +jdemeyer
Changes by Jeroen Demeyer :
--
type: -> crash
___
Python tracker
<http://bugs.python.org/issue25750>
___
Jeroen Demeyer added the comment:
Thanks for the pointer. My patch does fix the crash in
Lib/test/crashers/borrowed_ref_2.py on Python 2.7.10.
--
___
Python tracker
<http://bugs.python.org/issue25
Jeroen Demeyer added the comment:
Follow-up: #26476
--
nosy: +jdemeyer
___
Python tracker
<http://bugs.python.org/issue4949>
___
New submission from Jeroen Demeyer:
PyErr_BadInternalCall() calls _PyErr_BadInternalCall(__FILE__, __LINE__). Since
__FILE__ is a string constant, the first argument of _PyErr_BadInternalCall
should be a "const char*" instead of a "char*".
This is a follow-up to #4949. Mo
Jeroen Demeyer added the comment:
> It is questionable wherever it should be backported to 2.7.
It violates the C++ standard (for extension modules written in C++), so it's
clearly a bug.
--
___
Python tracker
<http://bugs.python.org
Jeroen Demeyer added the comment:
> CPython is written on C and provides C API.
If you look at the title of https://docs.python.org/2/extending/extending.html
clearly C++ extensions are also supported.
> Even if change the signature of one function, this will not help much,
> beca
Changes by Jeroen Demeyer :
--
keywords: +patch
Added file: http://bugs.python.org/file44464/no_strict_proto.patch
___
Python tracker
<http://bugs.python.org/issue5
New submission from Jeroen Van Goey:
The sample code in the itertools.count documentation should be indented by 4
spaces.
For 2.7.4: lines 3429 till 3432 in
http://hg.python.org/releasing/2.7.4/file/026ee0057e2d/Modules/itertoolsmodule.c#l3429
For 3.4.0a1: lines 3981 till 3984 in
http