On Jan 4, 5:25 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> jo3c wrote:
> > I have 2000 files, each with a header and data.
> > I need to get the date information from the header,
> > then insert it into my database.
> > I am doing it in batch, so I use glob.glob('/mydata/*/*/*.txt').
> > To get the date on line 4 of the txt file, I use
> > linecache.getline('/mydata/myfile.txt', 4)
> >
> > but if I use
> > linecache.getline(glob.glob('/mydata/*/*/*.txt'), 4) it won't work.
>
> glob.glob returns a list of filenames, so you need to call getline once
> for each file in the list.
>
> But using linecache is absolutely the wrong tool for this; it's designed
> for *repeated* access to arbitrary lines in a file, so it keeps all the
> data in memory. That is, all the lines, for all 2000 files.
>
> If the files are small, and you want to keep the code short, it's easier
> to just grab the file's content and use indexing on the resulting list:
>
>     for filename in glob.glob('/mydata/*/*/*.txt'):
>         line = list(open(filename))[4-1]
>         ... do something with line ...
>
> (Note that line numbers usually start with 1, but Python's list indexing
> starts at 0.)
>
> If the files might be large, use something like this instead:
>
>     for filename in glob.glob('/mydata/*/*/*.txt'):
>         f = open(filename)
>         # skip first three lines
>         f.readline(); f.readline(); f.readline()
>         # grab the line we want
>         line = f.readline()
>         ... do something with line ...
>
> </F>
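For reference, here is a minimal sketch of the large-file approach from the quoted reply, written for Python 3 and assuming the date sits on line 4 of every header. It swaps the three readline() calls for itertools.islice, which skips lines without keeping them, and it uses a `with` block so each file is closed before the next one is opened. The names get_line and header_lines are hypothetical helpers, not part of any library, and parsing the date and inserting it into the database are left out.

```python
import glob
import itertools


def get_line(path, lineno):
    """Return line `lineno` (1-based) of the file at `path`, including its
    trailing newline, or None if the file has fewer lines.
    Hypothetical helper: reads only up to the requested line."""
    with open(path) as f:
        # islice discards the first lineno-1 lines without storing them
        return next(itertools.islice(f, lineno - 1, lineno), None)


def header_lines(pattern, lineno=4):
    """Yield (filename, line) for every file matching the glob pattern.
    Hypothetical helper: lineno=4 assumes the date is on line 4."""
    for filename in glob.glob(pattern):
        yield filename, get_line(filename, lineno)


if __name__ == "__main__":
    for filename, line in header_lines('/mydata/*/*/*.txt'):
        if line is not None:
            # ... parse the date and insert it into the database here ...
            print(filename, line.rstrip('\n'))
```

Because only one file is open at a time and at most four lines of it are read, memory use stays flat no matter how many files the glob matches.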
Thank you, guys. I did hit a wall using linecache, because it loaded the large files into memory. I think this last solution works well for me.

Thanks

--
http://mail.python.org/mailman/listinfo/python-list