On 2/6/2010 1:24 PM, Chris Colbert wrote:
I'm working on a naive K-nearest-neighbors selection criterion for an
optical character recognition problem.
After I build my training set, I test each new image against the
trained feature vectors and record the scores as follows:
match_vals = [(match_val_1, identifier_a), (match_val_2, identifier_b)
.... ] and so on..
Then I sort the list so the smallest match_vals appear first
(indicating a strong match), so I may end up with something like this:
[(match_val_291, identifier_b), (match_val_23, identifier_b),
(match_val_22, identifier_k) .... ]
Now, what I would like to do is step through this list and find the
identifier which appears first a K number of times.
Naively, I could make a dict and iterate through the list AND the dict
at the same time and keep a tally, breaking when the criterion is met,
such as:
def getnn(match_vals):
    tallies = defaultdict(lambda: 0)
    for match_val, ident in match_vals:
        tallies[ident] += 1
        for ident, tally in tallies.iteritems():
            if tally == 5:
                return ident
I would think there is a better way to do this. Any ideas?
You only need to check that the incremented tally is 5, which is to say,
that the about-to-be-incremented tally is 4.
    t = tallies[ident]
    if t < 4: tallies[ident] = t+1
    else: return ident
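
Put together, the whole function might look something like the sketch below. Only the tally that was just incremented can have newly reached the threshold, so the inner loop over the dict is not needed. The k parameter (defaulting to the 5 used above) and the None return when no identifier reaches k are my own assumptions, not part of the original code.

```python
from collections import defaultdict

def getnn(match_vals, k=5):
    """Return the first identifier seen k times in match_vals, or None.

    match_vals is assumed to be already sorted, best match first.
    """
    tallies = defaultdict(int)
    for match_val, ident in match_vals:
        tallies[ident] += 1
        # Only the tally just incremented can have newly reached k.
        if tallies[ident] == k:
            return ident
    # Assumption: return None when no identifier appears k times.
    return None
```

For example, with hypothetical scores [(0.1, 'b'), (0.2, 'b'), (0.3, 'k'), (0.4, 'b')] and k=3, the function returns 'b' as soon as the third 'b' is seen.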