How would you go about taking a weighted random sample of a dataset?

I.E. I have a table with value and frequency columns, and will be
taking random samples of 20 unique rows.  A row with a frequency of 10
should appear in 10 times as many sample sets as a row with a
frequency of 1.

Denormalization by creating multiple rows based on frequency is not an
option, since frequencies can be as high as several million, and as
low as 100.

Also, any recommendations for improving performance of random
selections in MySQL?  This table has a few hundred thousand rows
currently, and although IDs are sequential, there have been enough
deletions to leave holes that make it impractical to generate
selections in the script.

Thanks for your help.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to