On Thu, Mar 24, 2016 at 10:57 AM, Bruce Kirk <bruce.kir...@gmail.com> wrote:
> I agree, the challenge is the volume of the data to compare is 13 million
> records. So it needs to be very fast.
13M records is a good lot. To what extent can the data change?

You may find it easiest to do some sort of conversion to text, throwing away any information that isn't "interesting", and then use the standard 'diff' utility to compare the text files. It's up to you to figure out which differences are "uninteresting"; that'll depend on your exact data. As long as you can do the conversion to text in a simple and straightforward way, the overall operation will be reasonably fast.

If this is a periodic thing (eg you're constantly checking today's file against yesterday's), saving the dumped text file will mean you generally need to convert just one file, halving your workload.

This isn't a solution so much as a broad pointer... hope it's at least a start!

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
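For concreteness, here's a rough, untested sketch of the dump-then-diff idea described above. The record layout is made up for the example (tuples whose third field is a volatile timestamp, standing in for whatever is "uninteresting" in your data); adapt the normalization step to what your 13M records actually look like.

    import subprocess

    def dump_records(records, path):
        # Normalize each record to one line of text: drop the
        # volatile timestamp (field index 2 here, purely
        # illustrative) and sort the lines so that record order
        # doesn't create spurious diffs.
        lines = sorted(
            "\t".join(str(f) for i, f in enumerate(rec) if i != 2)
            for rec in records
        )
        with open(path, "w") as fh:
            fh.write("\n".join(lines) + "\n")

    if __name__ == "__main__":
        yesterday = [(1, "alice", "2016-03-23"), (2, "bob", "2016-03-23")]
        today     = [(1, "alice", "2016-03-24"), (2, "bobby", "2016-03-24")]
        dump_records(yesterday, "yesterday.txt")
        dump_records(today, "today.txt")
        # Let the battle-tested diff utility do the heavy lifting;
        # it exits with status 1 when the files differ.
        subprocess.call(["diff", "yesterday.txt", "today.txt"])

If the comparison runs daily, keep yesterday.txt around and you only have to regenerate today's dump, as noted above.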