I need to take a series of ASCII files and transform the data they contain so that it can be inserted into an existing database. Each file is simply a series of lines, and each line contains fields separated by the '|' character. Relations among the data in the various files are denoted by an integer identifier, a pseudo-key if you will. Unfortunately, the relations in the ASCII files do not match those in the database into which I need to insert the data, i.e., I need to transform the data from the files before inserting it. This would all be relatively simple if not for one fact: the ASCII files are each around 800 MB, so pulling everything into memory and matching up the relations before inserting the data into the database is impossible.
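Because each line is a self-contained record, a file can be read one line at a time so that only the current record is held in memory. A minimal sketch using the standard `csv` module with '|' as the delimiter, assuming the fields are never quoted (hence `csv.QUOTE_NONE`) and that the files really are pure ASCII; `iter_records` is a hypothetical helper name:

```python
import csv
from typing import Iterator, List


def iter_records(path: str) -> Iterator[List[str]]:
    """Lazily yield the fields of each line of a '|'-delimited file.

    Assumes no quoting in the data and pure ASCII content.
    """
    with open(path, newline="", encoding="ascii") as f:
        # QUOTE_NONE: quote characters are treated as ordinary data.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row:  # csv.reader yields [] for blank lines; skip them
                yield row
```

Looping over `iter_records(path)` processes an 800 MB file in roughly constant memory, since the generator never builds a list of all the rows.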
My questions are:

1. Has anyone done anything like this before, and if so, do you have any advice?
2. In the abstract, can anyone think of a way to gather all the related data for a specific identifier from all of the individual files without pulling all of the files into memory and without having to open, search, and close the files over and over again?
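One well-known way to gather related records without loading the files or reopening them repeatedly is a sort-merge join: sort each file once by the identifier (for example with Unix `sort -t '|' -k1,1n file > file.sorted`, a numeric sort on the first field), then stream all the sorted files in parallel and collect the lines that share an identifier. The sketch below assumes the identifier is field 0 of every file and that each file is already sorted ascending by it numerically; `keyed_rows` and `group_by_key` are hypothetical helper names. It relies on `heapq.merge` (whose `key` argument needs Python 3.5+) and `itertools.groupby`:

```python
import csv
import heapq
import itertools
from typing import Dict, Iterator, List, Tuple

Row = List[str]


def keyed_rows(path: str, key_field: int = 0) -> Iterator[Tuple[int, str, Row]]:
    """Stream (identifier, source path, fields) tuples from one '|'-delimited file."""
    with open(path, newline="", encoding="ascii") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if row:  # skip blank lines
                yield int(row[key_field]), path, row


def group_by_key(paths: List[str], key_field: int = 0) -> Iterator[Tuple[int, Dict[str, List[Row]]]]:
    """Yield (identifier, {path: [rows with that identifier]}) in ascending order.

    Assumes every file is already sorted ascending (numerically) by the
    identifier field; unsorted input gives wrong groupings.
    """
    streams = [keyed_rows(p, key_field) for p in paths]
    # heapq.merge lazily interleaves the sorted streams, reading each file
    # once, sequentially, from start to end.
    merged = heapq.merge(*streams, key=lambda item: item[0])
    # groupby collects consecutive items that share an identifier, so only
    # the records for one identifier are held in memory at a time.
    for ident, items in itertools.groupby(merged, key=lambda item: item[0]):
        bundle: Dict[str, List[Row]] = {p: [] for p in paths}
        for _, path, row in items:
            bundle[path].append(row)
        yield ident, bundle
```

A caller would loop over `group_by_key(["a.sorted", "b.sorted"])`, transform each bundle into rows that fit the target schema, and insert them before the next identifier is read. The one-time sort is the price paid for reading every file exactly once afterwards.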