On Tue, 8 Mar 2005 14:13:01 +0000, Simon Brunning
<[EMAIL PROTECTED]> wrote:
> On 7 Mar 2005 06:38:49 -0800, gry@ll.mit.edu <gry@ll.mit.edu> wrote:
> > As far as I can tell, what you ultimately want is to be able to extract
> > a random ("representative?") subset of sentences.
> 
> If this is what's wanted, then perhaps some variation on this cookbook
> recipe might do the trick:
> 
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/59865

I couldn't resist. ;-)

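import random

def randomLines(filename, lines=1):
    # One slot per requested line; each slot runs its own independent selection.
    selected_lines = [None] * lines

    with open(filename) as f:
        for line_index, line in enumerate(f):
            for selected_line_index in range(lines):
                # Replace the slot with probability 1/(line_index + 1), so that
                # every line ends up equally likely to be the one kept.
                # (Without the + 1, the first line is always overwritten.)
                if random.uniform(0, line_index + 1) < 1:
                    selected_lines[selected_line_index] = line

    return selected_lines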

This has the advantage that every line has the same chance of being
picked, regardless of its length. Each slot is filled independently,
though, so it can pick the same line more than once.
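
If duplicates are a problem, one possible variation is a single shared
reservoir (often called "Algorithm R") instead of independent slots. This
is only a sketch and not part of the cookbook recipe; the name
random_lines_unique and its count parameter are made up for illustration.
It assumes the file is read once, front to back, and it returns fewer
than count lines if the file is shorter than that.

```python
import random

def random_lines_unique(filename, count=1):
    """Return up to `count` distinct lines, each line equally likely."""
    reservoir = []
    with open(filename) as f:
        for line_index, line in enumerate(f):
            if line_index < count:
                # Fill the reservoir with the first `count` lines.
                reservoir.append(line)
            else:
                # randint is inclusive at both ends, so the new line lands
                # in the reservoir with probability count / (line_index + 1).
                slot = random.randint(0, line_index)
                if slot < count:
                    reservoir[slot] = line
    return reservoir
```

Every position in the file appears at most once in the result, although
two lines with identical text can of course both be selected.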

-- 
Cheers,
Simon B,
[EMAIL PROTECTED],
http://www.brunningonline.net/simon/blog/
