elsa <kerensael...@hotmail.com> writes:

> Hello,
>
> I'm trying to find a way to collect a set of values from real data,
> and then sample values randomly from this data, so that the data I'm
> collecting becomes a kind of probability distribution. For instance, I
> might have age data for some children. It's very easy to collect this
> data using a list, where the index gives the value of the data, and
> the number in the list gives the number of times that value occurs:
>
> [0,0,10,20,5]
>
> could mean that there are no people aged 0, no people aged 1,
> 10 people aged 2, 20 people aged 3, and 5 people aged 4
> in my data collection.
>
> I then want to make a random sample that would be representative of
> these proportions. Is there any easy and fast way to select an entry
> weighted by its value? Or are there any Python packages that allow you
> to easily create your own distribution based on collected data? Two
> other things to bear in mind: in reality I'm collating data
> from up to around 5 million individuals, so just making one long list
> with a new entry for each individual won't work. Also, it would be
> good if I didn't have to decide beforehand what the possible range of
> values is (which unfortunately I have to do with the approach I'm
> currently working on).
>
> Thanks in advance for your help,
>
> elsa.
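The question asks for two things: collecting counts without fixing the range of values in advance, and drawing values weighted by those counts. The following is a minimal sketch of both. It assumes Python 3.6 or later, which provides `random.choices`. The helper names `tally` and `weighted_sample` are hypothetical, and the counts are the ones from the example list above, keyed by age. A `Counter` stores only values that actually occur, so its memory grows with the number of distinct ages rather than with the number of individuals.

```python
import random
from collections import Counter


def tally(observations):
    """Count how often each value occurs (hypothetical helper).

    No range of values has to be chosen in advance: the Counter only
    holds keys for values that were actually observed.
    """
    return Counter(observations)


def weighted_sample(counts, k):
    """Draw k values with replacement, each with probability count / total."""
    values = list(counts)
    weights = [counts[v] for v in values]
    return random.choices(values, weights=weights, k=k)


# The counts from the post, [0,0,10,20,5], expressed as value -> count.
ages = Counter({2: 10, 3: 20, 4: 5})
print(weighted_sample(ages, 10))
```

When drawing many times from the same counts, you can compute the running totals once and pass them as `cum_weights=`. This saves `random.choices` from rebuilding them on every call.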
If you want to keep it simple, you can do:

>>> import random
>>> t = [0,0,10,20,5]
>>> expanded = sum([[x]*f for x, f in enumerate(t)], [])
>>> random.sample(expanded, 10)
[3, 2, 2, 3, 2, 3, 2, 2, 3, 3]
>>> random.sample(expanded, 10)
[3, 3, 4, 3, 2, 3, 3, 3, 2, 2]
>>> random.sample(expanded, 10)
[3, 3, 3, 3, 3, 2, 3, 2, 2, 3]

Is that what you need?

--
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list
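The expansion step builds a list with one entry per individual. That is what the original poster said would not scale to around 5 million individuals, and `sum()` over lists also copies the growing result on every addition.

The following is a minimal sketch that samples straight from the count list instead. It assumes Python 3, for `itertools.accumulate`. It also assumes sampling with replacement is wanted: `random.sample` above draws without replacement, so it can never return more than `sum(t)` items. The sketch builds the running totals once, then maps a uniform random integer to its bucket with `bisect`. `make_sampler` is a hypothetical helper, not a library function.

```python
import bisect
import itertools
import random


def make_sampler(counts):
    """Return a function that draws one index of `counts`, weighted by its value.

    counts[i] is the number of individuals with value i, e.g. [0, 0, 10, 20, 5].
    """
    cumulative = list(itertools.accumulate(counts))  # [0, 0, 10, 30, 35] for the example
    total = cumulative[-1]

    def draw():
        # Pick a position 0 .. total-1 among all individuals, then find the
        # first bucket whose running total exceeds it.
        r = random.randrange(total)
        return bisect.bisect_right(cumulative, r)

    return draw


t = [0, 0, 10, 20, 5]
draw = make_sampler(t)
print([draw() for _ in range(10)])
```

Each draw costs O(log n) in the number of distinct values, and memory holds only one running total per value. Zero-count entries such as ages 0 and 1 are never returned, because `bisect_right` skips past equal running totals.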