John Machin wrote:
> John Henry wrote:
> > Hi list,
> >
> > I am sure there are many ways of doing comparison but I like to see
> > what you would do if you have 2 dictionary sets (containing lots of
> > data - like 20000 keys and each key contains a dozen or so of records)
> > and you want to build a list of differences about these two sets.
> >
> > I like to end up with 3 lists: what's in A and not in B, what's in B
> > and not in A, and of course, what's in both A and B.
> >
> > What do you think is the cleanest way to do it? (I am sure you will
> > come up with ways that astonish me :=) )
> >
>
> Paddy has already pointed out a necessary addition to your requirement
> definition: common keys with different values.
>
> Here's another possible addition: you say that "each key contains a
> dozen or so of records". I presume that you mean like this:
>
> a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']} # "dozen" -> 2 to save typing :-)
>
> Now what happens if the other dictionary contains:
>
> b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}
>
> Key 42 would be marked as different by Paddy's classification, but the
> values are the same, just not in the same order. How do you want to
> treat that? avalue == bvalue? sorted(avalue) == sorted(bvalue)? Oh, and
> are you sure the buckets don't contain duplicates? Maybe you need
> set(avalue) == set(bvalue). What about 'rec1a' vs 'Rec1a' vs 'REC1A'?
>
> All comparisons are equal, but some comparisons are more equal than
> others :-)
>
> Cheers,
> John
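To make the choices John lists concrete, here is a minimal sketch of each candidate comparison as its own function. The function names are hypothetical (they are not from the thread), and the case-insensitive variant assumes the records are strings.

```python
# Hypothetical helper names; each one encodes a different notion of "the same".

def same_exact(avalue, bvalue):
    """Order matters and duplicates matter."""
    return avalue == bvalue

def same_ignoring_order(avalue, bvalue):
    """Order is ignored, but duplicates still count."""
    return sorted(avalue) == sorted(bvalue)

def same_as_sets(avalue, bvalue):
    """Order and duplicates are both ignored."""
    return set(avalue) == set(bvalue)

def same_ignoring_case(avalue, bvalue):
    """Like same_ignoring_order, but 'rec1a' and 'REC1A' match (assumes str records)."""
    return (sorted(rec.lower() for rec in avalue)
            == sorted(rec.lower() for rec in bvalue))

# The two dictionaries from John's example.
a = {1: ['rec1a', 'rec1b'], 42: ['rec42a', 'rec42b']}
b = {1: ['rec1a', 'rec1b'], 42: ['rec42b', 'rec42a']}

# Apply every checker to key 42, where only the order differs.
checkers = (same_exact, same_ignoring_order, same_as_sets, same_ignoring_case)
results = [(check.__name__, check(a[42], b[42])) for check in checkers]
# Only same_exact reports a difference for key 42.
```

Which of these is "right" depends entirely on what the records mean, so it seems safest to pass the chosen checker in as a parameter rather than hard-code one.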
Hi Johns,

The following is my attempt to give more/deeper comparison info.
Assume you have your data parsed and presented as two dicts a and b,
each having as values a dict representing a record.
Further assume you have a function that can compute if two record-level
dicts are the same, and another function that can compute if two values
in a record-level dict are the same.

With a slight modification of my earlier prog we get:

    def komparator(a, b, check_equal):
        # Classify the keys of dicts a and b into four sets.
        keya = set(a.keys())
        keyb = set(b.keys())
        a_xclusive = keya - keyb      # keys only in a
        b_xclusive = keyb - keya      # keys only in b
        _common = keya & keyb         # keys in both
        # Common keys whose values compare equal under check_equal.
        common_eq = set(k for k in _common
                        if check_equal(a[k], b[k]))
        common_neq = _common - common_eq
        return (a_xclusive, b_xclusive, common_eq, common_neq)

    a_xclusive, b_xclusive, common_eq, common_neq = komparator(
        a, b, record_dict__equality_checker)

    # Re-run the same comparison inside each differing record.
    common_neq = [(key,
                   komparator(a[key], b[key], value__equality_checker))
                  for key in common_neq]

Now we get extra info on intra-record differences with little extra
code.

Look out though, you could get swamped with data :-)

- Paddy.
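For anyone who wants to try the idea, here is a minimal runnable sketch with made-up data. The sample dicts and the bodies of record_dict__equality_checker and value__equality_checker are assumptions for illustration only; the post above leaves both checkers undefined. The second pass is collected into a dict called details (a hypothetical name) instead of overwriting common_neq.

```python
def komparator(a, b, check_equal):
    """Split the keys of a and b into: only-a, only-b, common-equal, common-unequal."""
    keya, keyb = set(a), set(b)
    common = keya & keyb
    common_eq = set(k for k in common if check_equal(a[k], b[k]))
    return (keya - keyb, keyb - keya, common_eq, common - common_eq)

# Assumed checker: two records are the same if their dicts compare equal.
def record_dict__equality_checker(rec_a, rec_b):
    return rec_a == rec_b

# Assumed checker: field values match case-insensitively as stripped strings.
def value__equality_checker(val_a, val_b):
    return str(val_a).strip().lower() == str(val_b).strip().lower()

# Made-up sample data: each value is a record-level dict.
a = {
    'k1': {'name': 'alpha', 'size': 1},
    'k2': {'name': 'beta', 'size': 2},
    'k3': {'name': 'gamma', 'size': 3, 'note': 'only in a'},
}
b = {
    'k2': {'name': 'beta', 'size': 2},
    'k3': {'name': 'GAMMA', 'size': 30, 'colour': 'red'},
    'k4': {'name': 'delta', 'size': 4},
}

# First pass: compare whole records.
a_xclusive, b_xclusive, common_eq, common_neq = komparator(
    a, b, record_dict__equality_checker)

# Second pass: for each differing record, compare field by field.
details = dict(
    (key, komparator(a[key], b[key], value__equality_checker))
    for key in common_neq)

# details['k3'] holds, in order: fields only in a, fields only in b,
# fields judged equal, and fields judged different.
```

With this data only 'k3' needs the second pass. Its 'name' field counts as equal under the case-insensitive checker, while 'size' is reported as different.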