On Mon, Oct 18, 2010 at 11:40 PM, Arnaud Delobelle <arno...@gmail.com> wrote:
> elsa <kerensael...@hotmail.com> writes:
>> Hello,
>>
>> I'm trying to find a way to collect a set of values from real data,
>> and then sample values randomly from this data - so, the data I'm
>> collecting becomes a kind of probability distribution. For instance, I
>> might have age data for some children. It's very easy to collect this
>> data using a list, where the index gives the value of the data, and
>> the number in the list gives the number of times that value occurs:
>>
>> [0,0,10,20,5]
>>
>> could mean that there are no people aged 0, no people aged 1,
>> 10 people aged 2, 20 people aged 3, and 5 people aged 4
>> in my data collection.
>>
>> I then want to make a random sample that would be representative of
>> these proportions - is there any easy and fast way to select an entry
>> weighted by its value? Or are there any Python packages that allow you
>> to easily create your own distribution based on collected data?
<snip>
> If you want to keep it simple, you can do:
>
> >>> import random
> >>> t = [0,0,10,20,5]
> >>> expanded = sum([[x]*f for x, f in enumerate(t)], [])
> >>> random.sample(expanded, 10)
> [3, 2, 2, 3, 2, 3, 2, 2, 3, 3]
> >>> random.sample(expanded, 10)
> [3, 3, 4, 3, 2, 3, 3, 3, 2, 2]
> >>> random.sample(expanded, 10)
> [3, 3, 3, 3, 3, 2, 3, 2, 2, 3]
>
> Is that what you need?

The OP explicitly ruled that out:

>> Two
>> other things to bear in mind are that in reality I'm collating data
>> from up to around 5 million individuals, so just making one long list
>> with a new entry for each individual won't work.

Cheers,
Chris
--
The internet is wrecking people's attention spans and reading comprehension.
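
For reference, here is a minimal sketch of one way to sample straight from the list of counts without building a list with one entry per individual. It is not taken from the thread: the function name `weighted_sample` is hypothetical, the counts are assumed to be non-negative integers, and it draws with replacement, which is not the same as `random.sample` but should be close to it when the sample is small compared with the population.

```python
import bisect
import itertools
import random


def weighted_sample(counts, k, rng=random):
    """Return k indices, each drawn with probability proportional to counts[index].

    Hypothetical helper: assumes non-negative integer counts and
    samples WITH replacement.
    """
    # Running totals, e.g. [0, 0, 10, 20, 5] -> [0, 0, 10, 30, 35].
    cumulative = list(itertools.accumulate(counts))
    total = cumulative[-1] if cumulative else 0
    if total <= 0:
        raise ValueError("counts must contain at least one positive entry")

    picks = []
    for _ in range(k):
        # Pick one "individual" by position in [0, total) ...
        r = rng.randrange(total)
        # ... and find the first index whose running total exceeds it.
        # Entries with a count of zero can never be selected this way.
        picks.append(bisect.bisect_right(cumulative, r))
    return picks


counts = [0, 0, 10, 20, 5]
sample = weighted_sample(counts, 10)
print(sample)
```

Memory use grows with the number of distinct values rather than the number of individuals, and each draw costs one binary search. On Python 3.6 or later, `random.choices(range(len(counts)), weights=counts, k=10)` from the standard library does the same kind of weighted draw with replacement.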