On Tue, 8 Mar 2005 14:13:01 +0000, Simon Brunning <[EMAIL PROTECTED]> wrote:
> On 7 Mar 2005 06:38:49 -0800, gry@ll.mit.edu <gry@ll.mit.edu> wrote:
> > As far as I can tell, what you ultimately want is to be able to extract
> > a random ("representative?") subset of sentences.
>
> If this is what's wanted, then perhaps some variation on this cookbook
> recipe might do the trick:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/59865

I couldn't resist. ;-)

import random

def randomLines(filename, lines=1):
    # One independent slot per requested line; slots may repeat a line.
    selected_lines = [None] * lines
    with open(filename) as f:
        for line_index, line in enumerate(f):
            for selected_line_index in range(lines):
                # Replace the slot with probability 1/(line_index + 1),
                # so every line is equally likely to end up in it.
                if random.uniform(0, line_index + 1) < 1:
                    selected_lines[selected_line_index] = line
    return selected_lines

This has the advantage that every line has the same chance of being picked
regardless of its length. There is a chance that it'll pick the same line
more than once, though.

-- 
Cheers,
Simon B,
[EMAIL PROTECTED],
http://www.brunningonline.net/simon/blog/
-- 
http://mail.python.org/mailman/listinfo/python-list
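
If picking the same line more than once is a problem, one possible variation
is reservoir sampling (often called Algorithm R), which keeps a pool of k
distinct lines in a single pass. This is only a sketch under that assumption,
not the cookbook recipe's method; the names random_distinct_lines, k and rng
are made up for illustration.

```python
import random

def random_distinct_lines(filename, k=1, rng=random):
    """Return up to k distinct lines, each subset equally likely.

    Hypothetical helper: reservoir sampling (Algorithm R) in one pass.
    """
    reservoir = []
    with open(filename) as f:
        for line_index, line in enumerate(f):
            if line_index < k:
                # The first k lines fill the reservoir directly.
                reservoir.append(line)
            else:
                # Keep this line with probability k / (line_index + 1),
                # evicting a uniformly chosen earlier pick.
                j = rng.randrange(line_index + 1)
                if j < k:
                    reservoir[j] = line
    return reservoir
```

Calling random_distinct_lines("corpus.txt", k=5) (the file name is a
placeholder) would return five different lines, or the whole file if it has
fewer than five. Lines are distinct by position, so a file that contains
duplicate text can still yield equal strings.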