In article <10be5c62-4c58-4b4f-b00a-82d85ee4e...@googlegroups.com>, Bryan Britten <britten.br...@gmail.com> wrote:
> If I use the following code:
>
> <code>
> import urllib
>
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
>
> fileHandle = urllib.urlopen(urlStr)
>
> twtrText = fileHandle.readlines()
> </code>
>
> It takes hours (upwards of 6 or 7, if not more) to finish computing the
> last command.

I'm not surprised!  readlines() reads in the ENTIRE file in one gulp.
That's a lot of tweets!

> With that being said, my question is whether there is a more efficient
> manner to do this.

In general, when reading a large file, you want to iterate over the lines
of the file and process each one.  Something like:

    for line in urllib.urlopen(urlStr):
        twtrDict = json.loads(line)

You still need to download and process all the data, but at least you
don't need to store it all in memory at once.

There is an assumption here that there's exactly one JSON object per
line.  If that's not the case, things might get a little more
complicated.
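If it helps, here is a slightly more defensive version of the same idea.
This is just a sketch (not tested against the live API); it assumes
Python 2, the same URL, and that each non-blank line is one JSON object.
The streaming endpoint also sends blank keep-alive lines, which this
skips:

    import json
    import urllib

    urlStr = "https://stream.twitter.com/1/statuses/sample.json"

    stream = urllib.urlopen(urlStr)
    try:
        for line in stream:
            line = line.strip()
            if not line:
                # blank keep-alive line; nothing to parse
                continue
            try:
                twtrDict = json.loads(line)
            except ValueError:
                # malformed or truncated line; skip it rather than crash
                continue
            # handle one tweet at a time instead of collecting them all;
            # not every object has a "text" key (e.g. delete notices)
            print twtrDict.get("text")
    finally:
        stream.close()

You'd also want some condition to break out of the loop eventually,
since a streaming endpoint never ends on its own.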