On Sun, Nov 7, 2010 at 9:56 AM, chad <cdal...@gmail.com> wrote:
> On Nov 7, 9:47 am, Chris Rebert <c...@rebertia.com> wrote:
>> On Sun, Nov 7, 2010 at 9:34 AM, chad <cdal...@gmail.com> wrote:
>> <snip>
>> > #!/usr/local/bin/python
>>
>> > import sys
>>
>> > def construct_set(data):
>> >     for line in data:
>> >         lines = line.splitlines()
>> >         for curline in lines:
>> >             if curline.strip():
>> >                 key = curline.split(' ')
>> >                 value = int(key[0])
>> >                 yield value
>>
>> > def approximate(first, second):
>> >     midpoint = (first + second) / 2
>> >     return midpoint
>>
>> > def format(input):
>> >     prev = 0
>> >     value = int(input)
>>
>> >     with open("/home/cdalten/oakland/freq") as f:
>> >         for next in construct_set(f):
>> >             if value > prev:
>> >                 current = prev
>> >                 prev = next
>>
>> >             middle = approximate(current, prev)
>> >             if middle < prev and value > middle:
>> >                 return prev
>> >             elif value > current and current < middle:
>> >                 return current
>> <snip>
>> > The question is about the construct_set() function.
>> <snip>
>> > I have it yield on 'value' instead of 'curline'. Will the program
>> > still read the input file named freq line by line even though I don't
>> > have it yielding on 'curline'? Or since I have it yield on 'value',
>> > will it read the entire input file into memory at once?
>>
>> The former. The yield has no effect at all on how the file is read.
>> The "for line in data:" iteration over the file object is what makes
>> Python read from the file line-by-line. Incidentally, the use of
>> splitlines() is pointless; you're already getting single lines from
>> the file object by iterating over it, so splitlines() will always
>> return a single-element list.
>
> But what happens if the input file is say 250MB? Will all 250MB be
> loaded into memory at once?
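The point about "for line in data:" doing the line-by-line reading can be checked with a small sketch. It assumes Python 3 and uses io.StringIO as a stand-in for the "freq" file; count_lines and construct_list are hypothetical helpers added only for illustration, and split() (any whitespace) is swapped in for split(' ') so that leading or repeated spaces don't produce empty fields. The generator pulls lines only as values are requested, while the list-returning variant has consumed every line before it returns.

```python
import io


def count_lines(lines, counter):
    """Hypothetical wrapper: pass lines through, counting how many were read."""
    for line in lines:
        counter["read"] += 1
        yield line


def construct_set(data):
    """Lazily yield the leading integer of each non-blank line.

    Iterating over the file object already hands back one line at a
    time, so no splitlines() call is needed.
    """
    for line in data:
        if line.strip():
            yield int(line.split()[0])


def construct_list(data):
    """Hypothetical eager variant: still iterates line by line, but
    keeps every parsed value in one list before returning."""
    return [int(line.split()[0]) for line in data if line.strip()]


if __name__ == "__main__":
    sample = "10 alpha\n\n20 beta\n30 gamma\n"

    lazy = {"read": 0}
    values = construct_set(count_lines(io.StringIO(sample), lazy))
    print(next(values), lazy["read"])   # 10 1  (only the first line consumed)
    print(list(values), lazy["read"])   # [20, 30] 4

    eager = {"read": 0}
    print(construct_list(count_lines(io.StringIO(sample), eager)), eager["read"])
    # [10, 20, 30] 4  (whole input consumed before returning)
```

With a real 250MB file the behaviour is the same: Python reads through a fixed-size buffer, so memory use depends on the current line (and, in the eager variant, on the size of the resulting list), not on the size of the file.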
No. As I said, the file will be read one line at a time, on an as-needed
basis; which is to say, "line-by-line".

> Just curious, because I thought maybe
> using something like 'yield curline' would prevent this scenario.

Using "for line in data:" is what prevents that scenario.
The "yield" is only relevant to how the file is read insofar as the
alternative to yield-ing would be to return a list, which would
necessitate going through the entire file in one continuous go and then
returning a very large list; but even then, the file's content would
still be read line-by-line, not all at once as one humongous string.

Cheers,
Chris
--
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list