On Tuesday 08 March 2005 15:28, Simon Brunning wrote:
> This has the advantage that every line had the same chance of being
> picked regardless of its length. There is the chance that it'll pick
> the same line more than once, though.

The problem is this: if the file the OP is talking about really is 80GB in size, and you assume a sentence has 80 bytes on average (it's likely to have fewer than that), that makes 10^9 sentences in the file. Now multiply that by the memory overhead of storing a list of 10^9 Nones, and reconsider whether that algorithm really works for the posted conditions. I don't think any machine I have access to has anywhere near enough memory just to store this list... ;)

--
--- Heiko.
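
To put rough numbers on the estimate above, here is a minimal sketch. It assumes CPython, where a list holds one pointer per element, and it reuses the 80GB and 80-byte figures from this thread. The helper name estimate_list_bytes and the sample size are made up for illustration; the function measures a small list with sys.getsizeof and extrapolates, so nothing big is ever allocated.

```python
import sys

FILE_SIZE_BYTES = 80 * 10**9   # 80GB, the figure quoted in the thread
AVG_SENTENCE_BYTES = 80        # assumed average sentence length from the post


def estimate_list_bytes(n_items, sample_size=100_000):
    """Extrapolate the memory taken by a list of n_items references to None.

    Hypothetical helper: it measures a small sample list and scales up,
    so the huge list itself is never built.
    """
    empty = sys.getsizeof([])
    sample = [None] * sample_size
    # None is a singleton, so the only per-item cost is the list slot itself.
    per_item = (sys.getsizeof(sample) - empty) / sample_size
    return empty + per_item * n_items


n_sentences = FILE_SIZE_BYTES // AVG_SENTENCE_BYTES
list_bytes = estimate_list_bytes(n_sentences)

print(f"{n_sentences:,} sentences")
print(f"list of Nones alone: roughly {list_bytes / 2**30:.1f} GiB")
```

The result depends on the pointer size of the interpreter build: each slot should cost 4 bytes on a 32-bit build and 8 bytes on a 64-bit one, so the list by itself lands somewhere around 4 to 8 GB before a single line of the file has been read.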