Hello everyone, I have a challenging issue I need to overcome and was hoping I might gain some insights from this group.
I am trying to speed up the process I am using, which is as follows:

1) I have roughly 700 files that are modified throughout the day by users of a separate application.
2) As modifications are made to the files, a polling service picks them up, mimicking the lock-file strategy used by that application.
3) I generate a single 'load' file and bulk insert it into a load table.
4) I update/insert/delete from the load table.

This is just too time-consuming, in my opinion. At present, users of the separate application can run recalculation functions that modify all 700 files at once, which forces my code to process the whole ball of wax rather than just the data that has actually changed.

What I would like to do is spawn separate processes and load only the delta data. The data must be 100% reliable, so I'm leery of using something like difflib. I also want to make sure my code scales, since the number of files is ever-increasing.

I would be grateful for any feedback you could provide.

Thank you,
Chris Nethery
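
P.S. To make "load only the delta data" a little more concrete, here is a rough sketch of the direction I'm leaning toward: checksum each file so I can tell exactly which of the ~700 files really changed, then hand just those files to a multiprocessing pool. The directory path, the MD5 choice, and the load_one stub are only placeholders for illustration, not my actual code.

import hashlib
import multiprocessing
import os

WATCH_DIR = r"C:\path\to\shared\files"   # placeholder - real location differs
CHUNK = 1024 * 1024

def file_digest(path):
    # Hash the file in chunks so large files don't blow up memory.
    # MD5 is just for illustration; a stronger hash or a byte-for-byte
    # compare against a cached copy could be used if needed.
    h = hashlib.md5()
    with open(path, "rb") as f:
        block = f.read(CHUNK)
        while block:
            h.update(block)
            block = f.read(CHUNK)
    return h.hexdigest()

def changed_files(previous):
    # Return (list of changed file paths, new {path: digest} snapshot).
    snapshot = {}
    changed = []
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if not os.path.isfile(path):
            continue
        digest = file_digest(path)
        snapshot[path] = digest
        if previous.get(path) != digest:
            changed.append(path)
    return changed, snapshot

def load_one(path):
    # Stub: the real version would honor the application's lock-file
    # convention, parse the file, and write its rows to the load table.
    return path

if __name__ == "__main__":
    previous = {}   # in practice this snapshot would be persisted between polls
    changed, snapshot = changed_files(previous)
    pool = multiprocessing.Pool()
    for done in pool.imap_unordered(load_one, changed):
        print("loaded:", done)
    pool.close()
    pool.join()

The idea is that even when a recalculation rewrites all 700 files, any file whose contents come out byte-for-byte identical hashes the same and gets skipped; only the genuinely changed files get parsed and loaded, and that part runs in parallel. Does this seem like a reasonable direction?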