Scott Eilerman <scott.j.eiler...@gmail.com> added the comment:

Raymond, Tim, thanks for your replies so far. I understand (and for the most 
part, agree with) your points about not being able to list every behavior, and 
not wanting to cause uncertainty in users. However, let me argue my case one 
more time, and if you still disagree, feel free to close this.

1. It is expected (in fact, one might argue it's the entire point) that 
calling random.seed() with a fixed value will produce a repeatable set of 
results for a traditional random number generator. A user expects that calling 
the following should always produce the same sequence of numbers:

import random
random.seed(22)
random.random()
random.random()
random.random()

2. Based on that behavior for one of the most typical/traditional functions in 
the random module, a naive user (me) might assume that random.sample() is 
drawing from its population in a similar manner (i.e. that the sequence of 
returned items, regardless of how many you ask the function to return, is 
uniquely determined by the seed). While this is certainly an assumption...

2a. This assumption is somewhat validated by the introductory section of the 
random module docs, which states "Almost all module functions depend on the 
basic function random()..."

2b. More importantly, a user can "validate" this assumption by doing some 
simple tests, e.g.:

choices = range(100)
random.seed(22)
random.sample(choices,1)
random.seed(22)
random.sample(choices,2)
random.seed(22)
random.sample(choices,3)
... and so on

Because of the nature of the set/list optimization, it is VERY possible that a 
user could do due diligence in testing like this (a few different seeds, a few 
different sets of "choices", testing up to k=10) and never uncover the 
problematic behavior. You'd pretty much have to set up some loops like I did 
earlier in this thread, which I don't think many users would do unless they 
expect to find a problem. Even then, with certain selections of "choices", you 
might still get the "expected" results.
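
For illustration, here is a minimal sketch of the kind of loop it takes to 
surface the behavior. The helper name first_prefix_break is hypothetical (it is 
not part of the random module), and where it reports a break, if anywhere, 
depends on the seed, the population size, and the Python version:

```python
import random


def first_prefix_break(population, seed, max_k):
    """Return the smallest k whose seeded sample does not begin with the
    seeded sample for k - 1, or None if every sample up to max_k simply
    extends the previous one. Hypothetical helper, for illustration only."""
    rng = random.Random()
    previous = []
    for k in range(1, max_k + 1):
        rng.seed(seed)                      # same seed before every call
        current = rng.sample(population, k)
        if current[:len(previous)] != previous:
            return k                        # sample(k) no longer extends sample(k - 1)
        previous = current
    return None


choices = range(100)
for seed in range(5):
    # Prints the seed and the first k at which the prefix property fails.
    print(seed, first_prefix_break(choices, seed, len(choices)))
```

A check that stops at a small max_k can return None for every seed tried even 
though a larger max_k would report a break, which is the trap described above.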

2c. If you suspected a problem, or really wanted to be sure the function does 
what you assume it will do, obviously you can open up random.py and take a 
look. However, I doubt many users do this for every built-in module and 
function they use; clearly the point of documentation is to avoid this scenario.

3. As Raymond mentioned, this does not appear to be a "common" problem, and 
perhaps that is enough to not add anything to the docs. However, due to the 
somewhat elusive nature of the behavior, it could certainly go undetected in 
many cases, potentially causing problems without anyone noticing. Perhaps I 
chose a very unorthodox implementation to get the results I desired; I easily 
could have used random.shuffle() or random.sample(pop, len(pop)) and picked the 
nth element. However, one could imagine cases in which you have a very large 
population and you want to optimize by using sample() to get the nth random 
draw rather than randomizing the entire list, so I don't think it's an entirely 
unjustified approach.
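
As a sketch of the alternative mentioned above, assuming the goal is "the nth 
draw for a given seed": always request a full-length sample and index into it, 
so the answer cannot depend on how many items were asked for. The name nth_draw 
is hypothetical.

```python
import random


def nth_draw(population, seed, n):
    """Return the n-th (0-based) item of a seeded full-length sample.
    Hypothetical helper, for illustration only."""
    rng = random.Random(seed)
    # k is always len(population), so the result for a given seed does not
    # vary with how many draws the caller is interested in.
    return rng.sample(population, len(population))[n]


population = list(range(1000))
print(nth_draw(population, 22, 0), nth_draw(population, 22, 1))
```

This permutes the whole population on every call, which is exactly the cost one 
might want to avoid for a very large population.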

4. Given the above points, I'd argue that a one-line insertion into the docs 
would help users steer clear of a hard-to-anticipate, potentially costly 
pitfall. My suggested language is a more direct identification of the possible 
consequences, though I agree that it is perhaps too worry-inducing without 
specifying the "cause" of the problem. Raymond's algorithmic note may be a 
better choice and would have been enough of an indicator for me to avoid the 
mistake I made.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33114>
_______________________________________