p. wrote:
> I need to take a series of ascii files and transform the data
> contained therein so that it can be inserted into an existing
> database. The ascii files are just a series of lines, each line
> containing fields separated by '|' character. Relations amongst the
> data in the various files are denoted through an integer identifier, a
> pseudo key if you will. Unfortunately, the relations in the ascii file
> do not match up with those in the database in which I need to insert
> the data, i.e., I need to transform the data from the files before
> inserting into the database. Now, this would all be relatively simple
> if not for the following fact: The ascii files are each around 800MB,
> so pulling everything into memory and matching up the relations before
> inserting the data into the database is impossible.
>
> My questions are:
> 1. Has anyone done anything like this before,

More than once, yes.

> and if so, do you have
> any advice?

1/ Use the csv module to parse your text files (with '|' as the
delimiter).

2/ Use a temporary database (whose schema mimics the one in the flat
files), so you can work with the appropriate tools, i.e. the RDBMS will
take care of disk/memory management, and you'll have a specialized,
high-level language (namely, SQL) to reassemble your data the right way.

> 2. In the abstract, can anyone think of a way of amassing all the
> related data for a specific identifier from all the individual files
> without pulling all of the files into memory and without having to
> repeatedly open, search, and close the files over and over again?

Answer above.
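
To make the two steps concrete, here is a minimal sketch. It assumes
SQLite (via the standard sqlite3 module) as the temporary database, which
is my choice for illustration and not something your setup requires. The
table and column names (customers, orders, customer_id) are hypothetical
stand-ins for whatever your flat files actually contain. Rows are streamed
from the csv reader straight into the staging table, so no file is ever
loaded whole.

```python
import csv
import sqlite3


def load_pipe_file(conn, path, table, columns):
    """Stream one '|'-separated file into a staging table, row by row."""
    col_defs = ", ".join(f"{name} TEXT" for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        # Skip blank lines; executemany pulls rows from the generator
        # one at a time, so memory use stays flat.
        rows = (row for row in reader if row)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


def related_rows(conn):
    """Reassemble related records with a join on the pseudo key.

    Hypothetical schema: customers(id, name), orders(id, customer_id, item).
    """
    # An index on the pseudo key lets the database do the matching on disk.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id)"
    )
    query = """
        SELECT c.name, o.item
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        ORDER BY c.name, o.item
    """
    return conn.execute(query).fetchall()
```

Typical use would be to open an on-disk staging file with
sqlite3.connect("staging.db"), call load_pipe_file() once per flat file,
then run whatever SELECT statements produce rows in the shape the target
database expects. A line whose field count differs from the column list
will raise an error on insert, which is probably what you want to know
about anyway. For 800MB files, iterating over the cursor is safer than
fetchall(), which is used here only to keep the example short.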