Brian Quinlan wrote:
> The fastest algorithm that I have been able to devise for doing so is
> O(n * log(len(lst))). Can anyone think of a solution with a better time
> complexity? If not, is there an obviously better way to do this
> (assuming n is big and the list size is small)?
numpy.searchsorted() can do all of the n lookups in C.

# Untested.
import numpy as np

def draw(n, lst):
    # Cumulative distribution with a leading 0.0, so that
    # [ps[i], ps[i+1]) brackets item i.
    ps = np.cumsum([0.0] + [x[1] for x in lst])
    # Watch for floating point errors here. It's likely that the last
    # item won't be quite 1.0, so pin it down; otherwise an r very close
    # to 1.0 could index past the end of lst.
    ps[-1] = 1.0
    r = np.random.random(n)  # n pseudorandom numbers uniform on [0, 1)
    # side='right' together with the -1 maps each r in [ps[i], ps[i+1])
    # to i, giving valid 0-based indices into lst (the default side='left'
    # would be off by one).
    indices = np.searchsorted(ps, r, side='right') - 1
    # Now do what you like with indices, which is an array of indices
    # into lst.
    return indices

Ed Schofield has an implementation of an algorithm by Marsaglia [1]
which turns sampling into a fast table lookup. If your probabilities
have limited precision (2**-30 or so rather than the full double
precision 2**-52 or so), then this might be an attractive option.

[1] http://www.jstatsoft.org/v11/i03/v11i03.pdf

The implementation is currently in the scipy sandbox, but I don't think
it builds painlessly at the moment.

http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/montecarlo/

Ask on one of the scipy mailing lists if you need help building it.

http://www.scipy.org/Mailing_Lists

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth."
  -- Umberto Eco
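
For concreteness, here is a quick sanity check of the searchsorted
sketch above. The weights are made up for illustration; draw() is the
function defined in the post:

lst = [('a', 0.5), ('b', 0.25), ('c', 0.25)]
indices = draw(1000000, lst)
# The empirical frequencies should come out close to the weights:
counts = np.bincount(indices)
print(counts / float(len(indices)))  # roughly [0.5, 0.25, 0.25]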
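The post doesn't spell out how Marsaglia's method works. As a rough
sketch of the naive full-table idea it builds on -- assuming every
probability is an exact multiple of 2**-k, and with build_table and
draw_table being names invented here for illustration, not anything in
the scipy sandbox code:

import numpy as np

def build_table(lst, k=10):
    # Quantize each probability to an integer count of 2**-k slots and
    # lay the items out in a flat lookup table: item i fills
    # round(p_i * 2**k) consecutive entries.
    size = 2 ** k
    counts = [int(round(x[1] * size)) for x in lst]
    table = np.repeat(np.arange(len(lst)), counts)
    # This only works if the quantized counts exhaust the table, i.e.
    # the probabilities really were k-bit precise.
    assert len(table) == size
    return table

def draw_table(n, table):
    # One uniform integer and one array lookup per sample: O(1) each.
    return table[np.random.randint(0, len(table), size=n)]

table = build_table([('a', 0.5), ('b', 0.25), ('c', 0.25)], k=10)
indices = draw_table(1000000, table)

The price is a table of 2**k entries, which is why the limited-precision
caveat above matters. As I read the paper, Marsaglia's contribution is
condensing that table: the random bits are split into digit groups and
each group indexes a much smaller table, trading a little per-sample
work for far less memory.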