How would you go about taking a weighted random sample of a dataset? I.E. I have a table with value and frequency columns, and will be taking random samples of 20 unique rows. A row with a frequency of 10 should appear in 10 times as many sample sets as a row with a frequency of 1.
Denormalization by creating multiple rows based on frequency is not an option, since frequencies can be as high as several million, and as low as 100. Also, any recommendations for improving performance of random selections in MySQL? This table has a few hundred thousand rows currently, and although IDs are sequential, there have been enough deletions to leave holes that make it impractical to generate selections in the script. Thanks for your help. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-users?hl=en -~----------~----~----~----~------~----~------~--~---