Steve Holden wrote: > Dave Angel wrote: >> elsa wrote: >>> Hi guys, >>> >>> I've got a problem with my program, in that the code just takes too >>> long to run. Here's what I'm doing. If anyone has any tips, they'd be >>> much appreciated! >>> >>> So, say I have a list of lists that looks something like this (I'm >>> using a list of lists, rather than a list of tuples, as I need it to >>> be mutable): >>> >>> myList = [[786,0],[45, 1],[673,1],...................[23,46]] >>> >>> there are enough entries in the outer list, that the sum of myList[i] >>> [0] across all i could be as high as 10^7. >>> >>> Now, what I need to do is randomly choose one myList[i], however the >>> distribution of my random choice needs to be proportional to the >>> values of myList[i][0]. So, for this list above, I'd have a much >>> higher chance of choosing myList[0] than myList[1]. >>> >>> Here is how I'm doing it at the moment: >>> >>> def chooseI(myList): >>> mySum=0 >>> choice = random.choice(range(1,sum([i[0] for i in myList])+1)) >>> for i in range(len(myList)): >>> mySum+=myList[i][0] >>> if mySum>=choice: >>> return i >>> break >>> >>> This works just fine if sum([i[0] for i in myList]) is less than >>> 10,000, say. However if its up around 10^7, the whole thing crashes. >>> Is there a faster way of doing this, that doesn't involve as many >>> computational steps? >>> >>> Thanks! >>> >>> elsa >>> >>> >> "the whole thing crashes" -- give us the specific crash you see, >> including the stack trace. "Crash" is far too vague. >> >> At first glance you could be running out of memory. But not with the >> values you're quoting. If a typical average value for i[0] is 50, >> that's only 200k elements. Anyway, you could print out the len(myList) >> to see how big the list is. >> >> There are quicker ways to do the sum, but the first thing to change is >> the random.choice() stuff. Once you have a sum, you can use >> random.randint() on it directly. No need to make a range list. >> >> Another approach might be to build a new list with a running sum in it. >> Then after doing the randint() on the last (highest) item, you can do a >> bisect.bisect on that list. The index that returns will be your return >> value. >> > Your approach seems to validate an informal impression I had that with > N choices one ought to be able to build a binary tree where each > decision went to left or right according to whether a number drawn from > a linear [0,1) distribution was greater that some arbitrary margin, or not. > > There are then arithmetical questions of exactly how to arrive at the > correct threshold values for each bifurcation. If the most probable > paths are deliberately placed at the top of the tree then the most > frequent cases will be fastest to generate, also. > Bad form, I know, to reply to oneself, but since I was egotistical enough to read what I wrote again when it was published I must also record the conjecture that the resulting alphabet, expressed as the binary history of the bifurcations chosen for any given symbol, must surely be a Hamming code to be most efficient.
No proof is adduced. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 PyCon is coming! Atlanta, Feb 2010 http://us.pycon.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ -- http://mail.python.org/mailman/listinfo/python-list