Chad Kellerman wrote:

> On Fri, May 21, 2010 at 8:07 AM, Chad Kellerman <sunck...@gmail.com>
> wrote:
>
>> On Fri, May 21, 2010 at 7:50 AM, Peter Otten <__pete...@web.de> wrote:
>>
>>> Chad Kellerman wrote:
>>>
>>> > Python users,
>>> >      I am parsing an AIX trace file and creating a dictionary
>>> > containing keys (PIDS) and values (a list of TIDS).  With PIDS
>>> > being unique process ids and TIDS, being a list of thread ids.
>>> > My function populates the keys so that they are unique, but my
>>> > list contains duplicates.
>>> >
>>> >      Can someone point me in the right direction so that my
>>> > dictionary value does not contain duplicate elements?
>>> >
>>> > here is what I got.
>>> >
>>> > --------------<portion of code that is relevant>------------------
>>> >
>>> > pidtids = {}
>>> >
>>> > # --- function to add pid and tid to a dictionary
>>> > def addpidtids(pid,tid):
>>> >     pidtids.setdefault(pid,[]).append(tid)
>>>
>>> Use a set instead of a list (and maybe a defaultdict):
>>>
>>> from collections import defaultdict
>>>
>>> pidtids = defaultdict(set)
>>>
>>> def addpidtids(pid, tid):
>>>     pidtids[pid].add(tid)
>>>
>>> Peter
>>
>> Thanks.  I guess I should have posted this in my original question.
>>
>> I'm on 2.4.3  looks like defaultdict is new in 2.5.
>>
>> I'll see if I can upgrade.
>>
>> Thanks again.
>
> instead of upgrading..  (probably be faster to use techniques available
> in 2.4.3)
>
> Couldn't I check to see if the pid exists (has_key I believe) and then
> check if the tid is a value, in the list for that key, prior to
> passing it to the function?
>
> Or would that be too 'expensive'?

No.

pidtids = {}

def addpidtids(pid, tid):
    if pid in pidtids:
        pidtids[pid].add(tid)
    else:
        pidtids[pid] = set((tid,))

should be faster than

def addpidtids(pid, tid):
    pidtids.setdefault(pid, set()).add(tid)

and both should work in python2.4.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
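
For anyone who wants to check that the two variants above really are interchangeable, here is a minimal sketch. The helper names and the sample (pid, tid) pairs are made up for illustration, and the dictionary is passed in as an argument (instead of the module-level pidtids) only so that both versions can be run side by side. It makes no timing claim; it only shows that both end up with the same de-duplicated thread ids.

```python
# Sketch only: helper names and sample records are hypothetical.

def add_with_membership_test(pidtids, pid, tid):
    # Explicit "pid in dict" test: a new set is built only for a new pid.
    if pid in pidtids:
        pidtids[pid].add(tid)
    else:
        pidtids[pid] = set((tid,))

def add_with_setdefault(pidtids, pid, tid):
    # The set() argument is evaluated on every call, even when pid
    # is already a key and the fresh empty set is thrown away.
    pidtids.setdefault(pid, set()).add(tid)

# Made-up trace records, including repeated (pid, tid) pairs.
records = [(100, 1), (100, 2), (100, 1), (200, 7), (200, 7)]

by_test = {}
by_setdefault = {}
for pid, tid in records:
    add_with_membership_test(by_test, pid, tid)
    add_with_setdefault(by_setdefault, pid, tid)

# Both dictionaries hold the same unique thread ids per process id.
assert by_test == by_setdefault

# If a list is needed afterwards, sorted() turns each set into one.
tids_for_100 = sorted(by_test[100])
```

Sets do not keep insertion order, so if the order in which thread ids first appear in the trace matters, a list with an explicit "tid not in list" test would be needed instead, at the cost of a linear scan on each insert.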