On Tuesday 08 March 2005 15:28, Simon Brunning wrote:
> This has the advantage that every line had the same chance of being
> picked regardless of its length. There is the chance that it'll pick
> the same line more than once, though.

Problem being: if the file the OP is talking about really is 80GB in size, and 
you consider a sentence to have 80 bytes on average (it's likely to have fewer 
than that), that makes at least 10^9 sentences in the file. Now multiply that 
by the per-item memory overhead of storing a list of 10^9 None(s), and 
reconsider whether that algorithm really works for the posted conditions. I 
don't think that any machine I have access to has anywhere near enough memory 
just to store this list... ;)
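
To put rough numbers on that, here is a back-of-the-envelope sketch. It 
assumes 80GB means 80 * 10^9 bytes and that a list costs one pointer per 
slot (true for CPython, but an assumption about the interpreter in use); the 
function names are made up for illustration.

```python
import struct
import sys


def estimate_sentence_count(file_size_bytes, avg_sentence_bytes):
    """Rough number of sentences, given an average sentence length."""
    return file_size_bytes // avg_sentence_bytes


def estimate_none_list_bytes(n_items):
    """Lower bound on the memory used by [None] * n_items.

    Assumes one pointer per slot plus the fixed header of an empty list.
    None itself is a singleton, so the elements add nothing extra.
    """
    pointer_size = struct.calcsize("P")  # 4 on 32-bit, 8 on 64-bit builds
    return sys.getsizeof([]) + n_items * pointer_size


FILE_SIZE_BYTES = 80 * 10**9   # assumption: 80GB read as 80 * 10^9 bytes
AVG_SENTENCE_BYTES = 80        # the average assumed in the post

n_sentences = estimate_sentence_count(FILE_SIZE_BYTES, AVG_SENTENCE_BYTES)
list_bytes = estimate_none_list_bytes(n_sentences)

print("sentences:", n_sentences)
print("list of None, in GB: %.1f" % (list_bytes / 10**9))
```

So the list of placeholders alone needs on the order of 4 to 8 GB, depending 
on pointer size, before a single sentence has been read.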

-- 
--- Heiko.

