On Jun 13, 11:06 am, Zachary Dziura <zcdzi...@gmail.com> wrote:
> Hi all.
>
> I'm writing a Python script that will be used to compare two database
> tables. Currently, those two tables are dumped into .csv files,
> whereby my code goes through both files and makes comparisons. Thus
> far, I only have functionality coded to make comparisons on the
> headers to check for similarities and differences. Here is the code
> for that functionality:
>
>     similar_headers = 0
>     different_headers = 0
>     source_headers = sorted(source_mapping.headers)
>     target_headers = sorted(target_mapping.headers)
>
>     # Check if the headers between the two mappings are the same
>     if set(source_headers) == set(target_headers):
>         similar_headers = len(source_headers)
>     else:
>         # We're going to do two run-throughs of the tables, to find the
>         # different and similar header names. Start with the source
>         # headers...
>         for source_header in source_headers:
>             if source_header in target_headers:
>                 similar_headers += 1
>             else:
>                 different_headers += 1
>         # Now check target headers for any differences
>         for target_header in target_headers:
>             if target_header in source_headers:
>                 pass
>             else:
>                 different_headers += 1
>
> As you can probably tell, I make two iterations: one for the
> 'source_headers' list, and another for the 'target_headers' list.
> During the first iteration, if a specific header (mapped to a variable
> 'source_header') exists in both lists, then the 'similar_headers'
> variable is incremented by one. Similarly, if it doesn't exist in both
> lists, 'different_headers' is incremented by one. For the second
> iteration, it only checks for different headers.
>
> My code works as expected and there are no bugs; however, I get the
> feeling that I'm not doing this comparison in the most efficient way
> possible. Is there another way that I can make this same comparison
> while making my code more Pythonic and efficient? I would prefer not
> to have to install an external module from elsewhere, though if I have
> to then I will.
>
> Thanks in advance for any and all answers!
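How about:

    # Count headers shared by both mappings and headers found in only one
    source_headers_set = set(source_headers)
    target_headers_set = set(target_headers)
    similar_headers = len(source_headers_set & target_headers_set)
    different_headers = len(source_headers_set ^ target_headers_set)

Here & is set intersection (headers present in both) and ^ is
symmetric difference (headers present in exactly one), which is what
your two loops count, as long as no header name repeats within a table.

--
http://mail.python.org/mailman/listinfo/python-list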
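A minimal, self-contained sketch of the set-based approach from the reply, wrapped in a hypothetical helper `compare_headers`. The sample header names are made up for illustration, and the one-sided differences (`-`) that report *which* headers are missing on each side go slightly beyond what was asked.

```python
def compare_headers(source_headers, target_headers):
    """Compare two header lists using set operations.

    Returns (similar_count, different_count, only_in_source, only_in_target).
    Assumes header names are hashable (e.g. strings); duplicate names within
    one list are collapsed by set().
    """
    source_set = set(source_headers)
    target_set = set(target_headers)

    common = source_set & target_set       # intersection: in both
    only_source = source_set - target_set  # difference: source only
    only_target = target_set - source_set  # difference: target only

    # The symmetric difference is the union of the two one-sided
    # differences, so its size is the "different headers" count.
    different = source_set ^ target_set
    return len(common), len(different), sorted(only_source), sorted(only_target)


if __name__ == "__main__":
    # Hypothetical header names, for illustration only.
    source = ["id", "name", "email", "created_at"]
    target = ["id", "name", "phone", "created_at", "updated_at"]

    similar, different, src_only, tgt_only = compare_headers(source, target)
    print("similar:", similar)          # similar: 3
    print("different:", different)      # different: 3
    print("only in source:", src_only)  # only in source: ['email']
    print("only in target:", tgt_only)  # only in target: ['phone', 'updated_at']
```

On efficiency: `x in some_list` scans the list, so the two loops do roughly len(source) * len(target) comparisons, while building the sets and combining them takes time roughly proportional to the total number of headers on average. Everything used here is built in, so no external module is needed.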