I only have a passing acquaintance with Python and I need to modify some existing code. This code is going to be run over 10GB of data, so it needs to be fairly fast.
http://cvs2svn.tigris.org/ is code for converting a CVS repository to Subversion. I'm working on changing it to convert from CVS to git. The existing Python RCS parser provides me with the CVS deltas as strings. I need to get these deltas into an array of lines so that I can apply the diff commands that add/delete lines (like 10 d20, etc.). What is the most efficient way to do this? The data structure needs to be able to apply the diffs efficiently too.

The strings have embedded @'s doubled as an escape sequence. Is there an efficient way to convert these back to single @'s?

After each diff is applied I need to convert the array of lines back into a string, generate a SHA-1 over it, compress it with zlib and finally write it to disk. The 10GB of data is the Mozilla CVS repository when fully expanded.

Thanks for any tips on how to do this.

Jon Smirl
[EMAIL PROTECTED]

--
http://mail.python.org/mailman/listinfo/python-list
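A minimal Python 3 sketch of the pipeline described in the post follows. It rests on several assumptions: the delta text is available as bytes (encode it first if the parser hands you str), the commands use RCS's `a<line> <count>` / `d<line> <count>` form with line numbers referring to the previous revision and appearing in increasing order, and the target is git loose blob objects, which git hashes and zlib-compresses together with a `blob <size>\0` header. The function names are hypothetical and not part of cvs2svn.

```python
import hashlib
import io
import zlib


def unescape_rcs(text: bytes) -> bytes:
    """Turn RCS's doubled '@@' escape back into a single '@'."""
    # bytes.replace runs in C, so this is one fast pass over the data.
    return text.replace(b"@@", b"@")


def split_lines(text: bytes) -> list:
    """Split on b'\\n' only, keeping the newline at the end of each line."""
    # BytesIO.readlines() treats only b'\n' as a line terminator,
    # unlike bytes.splitlines(), which also splits on b'\r'.
    return io.BytesIO(text).readlines()


def apply_rcs_delta(old_lines: list, delta: bytes) -> list:
    """Apply an RCS 'a'/'d' delta to old_lines and return a new list.

    Line numbers in the commands are 1-based and refer to old_lines.
    Assuming they appear in increasing order, the new revision can be
    built by copying slices instead of deleting/inserting in place.
    """
    new_lines = []
    pos = 0  # number of old lines consumed so far
    cmds = split_lines(delta)
    i = 0
    while i < len(cmds):
        cmd = cmds[i]
        i += 1
        kind = cmd[:1]
        start, count = map(int, cmd[1:].split())
        if kind == b"d":
            # Keep the old lines before the deleted range, then skip it.
            new_lines.extend(old_lines[pos:start - 1])
            pos = start - 1 + count
        elif kind == b"a":
            # Keep old lines up to and including `start`, then add the
            # next `count` lines of the delta verbatim.
            new_lines.extend(old_lines[pos:start])
            pos = start
            new_lines.extend(cmds[i:i + count])
            i += count
        else:
            raise ValueError("bad RCS delta command: %r" % cmd)
    new_lines.extend(old_lines[pos:])
    return new_lines


def git_blob(lines: list):
    """Return (SHA-1 hex digest, zlib-compressed loose object) for lines."""
    content = b"".join(lines)
    # Git hashes and compresses a "blob <size>\0" header plus the content.
    obj = b"blob %d\x00" % len(content) + content
    return hashlib.sha1(obj).hexdigest(), zlib.compress(obj)
```

For example, `apply_rcs_delta(split_lines(b"one\ntwo\n"), b"d1 1\na1 1\nONE\n")` replaces the first line and yields `[b"ONE\n", b"two\n"]`. Building a fresh list from slices keeps each delta linear in the file size, whereas repeated `del`/`insert` on a long Python list shifts every later element on each command. Writing the compressed bytes to disk (for git, under `objects/` using the first two hex digits of the digest as the directory) is left to the caller.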