[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-04 Thread Brandt Bucher
Change by Brandt Bucher: -- stage: patch review -> resolved, status: open -> closed

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher
Change by Brandt Bucher: -- pull_requests: +26586, stage: resolved -> patch review, pull_request: https://github.com/python/cpython/pull/28147

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher
Brandt Bucher added the comment: Found it. This particular build is configured with HAVE_ALIGNED_REQUIRED=1, which forces it to use fnv instead of siphash24 as its string hashing algorithm.
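To check which string hashing algorithm a given interpreter was built with, something like the following sketch may help. It uses only `sys.hash_info` from the standard library; the helper name `describe_string_hash` is made up for illustration and does not come from the thread.

```python
import sys


def describe_string_hash() -> str:
    """Return a short description of the str/bytes hash algorithm in use."""
    info = sys.hash_info
    # info.algorithm names the algorithm chosen at build time, for example
    # "siphash24" or "fnv" (hypothetical helper, not part of CPython).
    return f"{info.algorithm} ({info.hash_bits}-bit hash, {info.seed_bits}-bit seed)"


if __name__ == "__main__":
    print(describe_string_hash())
```

If two builds report different algorithms, the same strings will generally hash differently on each, so a set of strings can iterate in a different order even when PYTHONHASHSEED is fixed to the same value.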

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher
Brandt Bucher added the comment: I'm compiling Clang now to try to reproduce using a UBSan build (I'm on Ubuntu, though). I'm not entirely familiar with how these sanitizer builds work... could the implication be that we're hitting undefined behavior at some point? Or is it just a red herri

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread Brandt Bucher
Brandt Bucher added the comment: Thanks for finding this, Victor. That failure is surprising to me. Is it really possible for the order of the elements in a set to vary based on platform or build configuration (even with a fixed PYTHONHASHSEED at runtime)? Really, this looks like it’s only a

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread STINNER Victor
STINNER Victor added the comment: The test failed at:
    def test_deterministic_sets(self):
        # bpo-37596: To support reproducible builds, sets and frozensets need to
        # have their elements serialized in a consistent order (even when they
        # have been scrambled by hash ran

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-09-03 Thread STINNER Victor
STINNER Victor added the comment: I am reopening the issue. test_marshal failed on AMD64 Arch Linux Usan 3.x: https://buildbot.python.org/all/#/builders/719/builds/108 == FAIL: test_deterministic_sets (test.test_marshal.BugsTestCas

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-31 Thread Brandt Bucher
Brandt Bucher added the comment: New changeset 51999c960e7fc45feebd629421dec6524a5fc803 by Brandt Bucher in branch 'main': bpo-37596: Clean up the set/frozenset marshalling code (GH-28068) https://github.com/python/cpython/commit/51999c960e7fc45feebd629421dec6524a5fc803 --

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-30 Thread Guido van Rossum
Guido van Rossum added the comment: Thanks! This comes just in time, because we're working on freezing many more modules, and modules containing frozensets didn't have a consistent frozen representation. Now they do! (See issue45019, issue45020) -- nosy: +gvanrossum

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-30 Thread Brandt Bucher
Change by Brandt Bucher: -- pull_requests: +26512, pull_request: https://github.com/python/cpython/pull/28068

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: Looking again, I think the code is correct as-is (though I am not sure about the depth adjustment). Stylistically, it differs from the other blocks in w_complex_object(), which always have a "return" after setting p->error. The new code jumps to "anyset_done"

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Brandt Bucher
Brandt Bucher added the comment: Hm, not quite sure what you mean. Are you talking about just replacing each of the new gotos with “Py_DECREF(pairs); return;”? Error handling for this whole module is a bit unconventional. Some of the error paths in this function decrement the recursion depth

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: Should the error paths decref the key and return NULL as they do elsewhere in the function? -- status: closed -> open

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa
Łukasz Langa added the comment: This is a bona fide enhancement and thus out of scope for backports. Since this is merged for 3.11, I'm closing the issue. Thanks, everyone, this was some non-trivial design and implementation effort! -- resolution: -> fixed stage: patch review -> reso

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa
Łukasz Langa added the comment: New changeset 33d95c6facdfda3c8c0feffa7a99184e4abc2f63 by Brandt Bucher in branch 'main': bpo-37596: Make `set` and `frozenset` marshalling deterministic (GH-27926) https://github.com/python/cpython/commit/33d95c6facdfda3c8c0feffa7a99184e4abc2f63 -- no

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-25 Thread Łukasz Langa
Change by Łukasz Langa: -- versions: +Python 3.11 -Python 3.9

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher
Change by Brandt Bucher: -- pull_requests: +26377, pull_request: https://github.com/python/cpython/pull/27926

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger
Change by Raymond Hettinger: -- Removed message: https://bugs.python.org/msg400182

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's pure Python code for experimentation:
    from marshal import dumps, loads
    def marshal_set(s):
        return dumps(sorted(s, key=dumps))
    def unmarshal_set(m):
        return frozenset(loads(m))
    def test(s):
        assert unmarshal_se

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger
Raymond Hettinger added the comment: Here's pure Python code for experimentation:
    from marshal import dumps, loads
    def marshal_set(s):
        return dumps(sorted((dumps(value), value) for value in s))
    def unmarshal_set(m):
        return {value for dump, value in loads(m)}
    d

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> I can clean it up and convert it to a PR if we decide we want to go this route.
+1 This is by far the smallest intervention that has been discussed.

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher
Brandt Bucher added the comment: This rough proof-of-concept seems to have the desired effect:
    diff --git a/Python/marshal.c b/Python/marshal.c
    index 1260704c74..70f9c4b109 100644
    --- a/Python/marshal.c
    +++ b/Python/marshal.c
    @@ -503,9 +503,23 @@ w_complex_object(PyObject *v, char flag, WFILE

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher
Brandt Bucher added the comment: Ah, yeah. Could we add a flag to disable the reference mechanism, just for frozenset elements? It would make marshalled frozensets a bit bigger (unless we re-marshalled each one after sorting)... but I still prefer that to adding more logic/subclasses to fro

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: No, it cannot be fixed in marshal itself. s = {("string", 1), ("string", 2), ("string", 3)} All tuples contain references to the same string. The first serialized tuple will contain the serialization of the string; all others will contain references to it. So
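The reference mechanism described above can be observed from Python. The following is a small sketch, not code from the thread: it marshals three tuples that share one string object, once in the default format and once in format version 2, which predates back-references. The variable names are illustrative only.

```python
import marshal

s = "string"
items = [(s, i) for i in (1, 2, 3)]  # three tuples sharing one str object

# Default format (version 3 or later): the shared string is written in full
# the first time, and later occurrences are emitted as back-references.
with_refs = marshal.dumps(items)

# Version 2 has no reference mechanism, so the string is written in full
# every time it occurs.
without_refs = marshal.dumps(items, 2)

# Count how many times the raw string payload appears in each byte stream.
print(with_refs.count(b"string"), without_refs.count(b"string"))

# Both encodings round-trip to the same value.
assert marshal.loads(with_refs) == marshal.loads(without_refs) == items
```

Because only the first tuple written carries the full string, the bytes produced for any one tuple depend on where it falls in the output, which is the complication the comment points at.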

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher
Brandt Bucher added the comment: Could this issue be fixed in marshal itself? Off the top of my head, one possible option could be to use the marshalled bytes of each element as a sort key, rather than the elements themselves. So serialize, *then* sort?
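A brief sketch of why sorting on marshalled bytes is attractive, compared with sorting the elements themselves: elements of mixed types cannot be compared with each other, while bytes objects always can. The helper name `canonical_order` is hypothetical, and this illustrates the idea only; it is not the code that was eventually merged.

```python
import marshal


def canonical_order(items):
    """Return the items sorted by their individually marshalled bytes."""
    return sorted(items, key=marshal.dumps)


mixed = frozenset({1, "a", (2, 3), None})

# Sorting the elements directly fails, because int, str, tuple and None
# do not define an ordering between one another.
try:
    sorted(mixed)
    directly_sortable = True
except TypeError:
    directly_sortable = False

# Sorting by serialized bytes works for any marshallable elements.
ordered = canonical_order(mixed)
print(directly_sortable, ordered)
```

The resulting order has no meaning beyond being a function of the element encodings, which is all a reproducible serialization needs.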

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-23 Thread Brandt Bucher
Change by Brandt Bucher: -- nosy: +brandtbucher

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-08-14 Thread Filipe Laíns
Change by Filipe Laíns: -- keywords: +patch, pull_requests: +26244, stage: -> patch review, pull_request: https://github.com/python/cpython/pull/27769

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: The only way forward I can see is a strategy similar to the one Serhiy proposes, which seems to carry a non-trivial complication (and a new type, which I am not very fond of) but is a bit cleaner than changing the semantics of the type, which affec

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment:
> I understand, the proposal would be to make frozensets keep the creation order.
That would increase the memory consumption of all frozenset instances, which is likely not going to fly. -- nosy: +pablogsal

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-07-30 Thread Felix C. Stegerman
Change by Felix C. Stegerman: -- nosy: +obfusk

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-27 Thread Filipe Laíns
Filipe Laíns added the comment: I understand; the proposal would be to make frozensets keep the creation order.

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread STINNER Victor
Change by STINNER Victor: -- nosy: -vstinner

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Inada Naoki
Inada Naoki added the comment:
> If that's the case, then the argument Raymond provided against preserving order does not seem that relevant, as we would only need to preserve the order in the creation operation.
Note that PYC files are marshalled from code objects including frozenset i

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Filipe Laíns
Filipe Laíns added the comment: Ah, my bad! Though, thinking about it, it does make sense. If that's the case, then the argument Raymond provided against preserving order does not seem that relevant, as we would only need to preserve the order in the creation operation. What do you think? Is

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-26 Thread Inada Naoki
Inada Naoki added the comment:
> What about normal sets?
pyc files don't contain regular sets, so they are out of scope for this issue. -- nosy: +methane

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Raymond Hettinger
Change by Raymond Hettinger: -- Removed message: https://bugs.python.org/msg394414

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: Is it possible to defer hash randomization until after pycs are generated? The underlying problem here is an intentional scrambling of data. If determinism is what is desired, then deferring that action addresses the actual cause of non-determinism rathe

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Filipe Laíns
Filipe Laíns added the comment: What about normal sets? They also suffer from the same issue.

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Possible solution: add an ordered subtype of frozenset which would keep an array of items in the original order. The compiler only creates a frozenset when it optimizes "x in {1, 2}" or "for x in {1, 2}". It should now create an ordered frozenset from a list of
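The compiler optimization mentioned above can be inspected from Python. This is a minimal sketch, assuming a CPython version that folds a set display of constants in a membership test into a frozenset constant (true of the CPython releases I am aware of; other implementations may differ). The file name "<example>" is arbitrary.

```python
# Compile a membership test against a set display made only of constants.
code = compile("x in {1, 2}", "<example>", "eval")

# On CPython the optimizer replaces the set display with a frozenset stored
# in the code object's constants, which is what ends up marshalled into .pyc.
frozen_constants = [c for c in code.co_consts if isinstance(c, frozenset)]
print(frozen_constants)
```

This is why frozensets, rather than plain sets, are the objects whose serialization order matters for .pyc files.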

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread Filipe Laíns
Filipe Laíns added the comment: I would not expect SOURCE_DATE_EPOCH to sacrifice performance. During packaging, SOURCE_DATE_EPOCH is always set, and sometimes we need to perform expensive operations. We only need this behavior during cache generation, so that solution is not optimal. Backtr

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-25 Thread STINNER Victor
STINNER Victor added the comment:
> Another idea, would it be possible to add a flag to turn on reproducibility, sacrificing performance?
The flag is the SOURCE_DATE_EPOCH env var, no?

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-05-11 Thread Filipe Laíns
Filipe Laíns added the comment: Another idea: would it be possible to add a flag to turn on reproducibility, sacrificing performance? This flag could be set when generating bytecode, where the performance hit shouldn't be that relevant.

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Chih-Hsuan Yen
Change by Chih-Hsuan Yen: -- nosy: -yan12125

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Filipe Laíns
Filipe Laíns added the comment: s/is can/can/

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Raymond Hettinger
Raymond Hettinger added the comment: s/hundred/hundred thousand/

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Filipe Laíns
Filipe Laíns added the comment:
> No, it would not. We would also have to maintain order across set operations such as intersection, which would become dramatically more expensive if they had to maintain order. For example intersecting a million element set with a ten element set

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-15 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> Would it be reasonable to make it so that sets are always created with the definition order?
No, it would not. We would also have to maintain order across set operations such as intersection, which would become dramatically more expensive if th

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2021-04-14 Thread Filipe Laíns
Filipe Laíns added the comment: Normal sets have the same issue, see bpo-43850. Would it be reasonable to make it so that sets are always created with the definition order? Looking at the set implementation, this seems perfectly possible. -- nosy: +FFY00

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2020-04-10 Thread Chih-Hsuan Yen
Chih-Hsuan Yen added the comment: issue34722 also talks about frozenset, nondeterministic order and sorting. Maybe this ticket and that one are for the same issue? -- nosy: +yan12125

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2020-04-08 Thread Jeffery To
Change by Jeffery To: -- nosy: +jefferyto

[issue37596] Reproducible pyc: frozenset is not serialized in a deterministic order

2019-07-15 Thread STINNER Victor
New submission from STINNER Victor: See the bpo-29708 meta issue and https://reproducible-builds.org/ for reproducible builds. pyc files are not fully reproducible yet: frozenset items are not serialized in a deterministic order. One solution would be to modify marshal to sort frozenset items be
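One way to check whether a given interpreter has the problem described in this report is to marshal the same frozenset in fresh processes under different PYTHONHASHSEED values and compare the bytes. The sketch below does that. The names `CHILD`, `marshal_under_seed` and `is_deterministic` are hypothetical helpers written for this illustration, and the result depends on the Python version being run.

```python
import marshal
import os
import subprocess
import sys

# Child script: marshal a small frozenset of strings and write the raw bytes.
CHILD = (
    "import marshal, sys;"
    "sys.stdout.buffer.write(marshal.dumps(frozenset(['spam', 'eggs', 'ham'])))"
)


def marshal_under_seed(seed: int) -> bytes:
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD], env=env, capture_output=True, check=True
    )
    return result.stdout


def is_deterministic(seeds=range(1, 6)) -> bool:
    """Report whether every seed produced byte-identical output."""
    return len({marshal_under_seed(seed) for seed in seeds}) == 1


if __name__ == "__main__":
    print("deterministic across seeds:", is_deterministic())
```

A result of False would mean the serialized order still follows the hash-dependent iteration order of the set, which is the behaviour this issue was opened to change.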