On 2/6/2010 1:24 PM, Chris Colbert wrote:
I'm working on a naive K-nearest-neighbors selection criterion for an
optical character recognition problem.

After I build my training set, I test each new image against the
trained feature vectors and record the scores as follows:

match_vals = [(match_val_1, identifier_a), (match_val_2, identifier_b), ...]

Then I sort the list so the smallest match_vals appear first
(indicating a strong match), so I may end up with something like this:

[(match_val_291, identifier_b), (match_val_23, identifier_b),
(match_val_22, identifier_k), ...]
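A minimal sketch of the sorting step, assuming each score is a plain number where smaller means a closer match. The scores and identifiers below are made up for illustration. Sorting with `key=itemgetter(0)` compares only the scores, so two equal scores never fall back to comparing identifiers.

```python
from operator import itemgetter

# Hypothetical (score, identifier) pairs; smaller score = stronger match.
match_vals = [(0.91, "a"), (0.12, "b"), (0.35, "k"), (0.08, "b")]

# Sort in place by score only, strongest match first.
match_vals.sort(key=itemgetter(0))
print(match_vals)
# [(0.08, 'b'), (0.12, 'b'), (0.35, 'k'), (0.91, 'a')]
```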

Now, what I would like to do is step through this list and find the
first identifier to appear K times.

Naively, I could make a dict and iterate through the list AND the dict
at the same time, keeping a tally and breaking when the criterion is met.

For example:

from collections import defaultdict

def getnn(match_vals):
    tallies = defaultdict(int)
    for match_val, ident in match_vals:
        tallies[ident] += 1
        # Rescan every tally after each increment (this is the slow part).
        for name, tally in tallies.items():
            if tally == 5:
                return name

I would think there is a better way to do this. Any ideas?

You only need to check the tally you just incremented: it reaches 5 exactly
when its value before incrementing is 4. So the loop becomes:

    for match_val, ident in match_vals:
        t = tallies[ident]
        if t < 4:
            tallies[ident] = t + 1
        else:
            return ident
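A self-contained sketch of the same idea as a complete function, generalizing the hard-coded 5 to a parameter. The name `first_to_k` and the parameter `k` are hypothetical, not from the thread, and the sample list is made up. It assumes the list is already sorted strongest match first, and it returns None when no identifier reaches k.

```python
def first_to_k(match_vals, k):
    """Return the first identifier to appear k times in match_vals.

    match_vals is assumed to be sorted already, strongest match first.
    Returns None if no identifier reaches k occurrences.
    """
    tallies = {}
    for _score, ident in match_vals:
        t = tallies.get(ident, 0)  # tally before counting this match
        if t < k - 1:
            tallies[ident] = t + 1
        else:
            # This occurrence is the k-th one for ident.
            return ident
    return None


# Made-up, already-sorted (score, identifier) pairs.
sorted_vals = [(0.08, "b"), (0.12, "k"), (0.20, "b"), (0.31, "k"), (0.40, "k")]
print(first_to_k(sorted_vals, 2))  # b
print(first_to_k(sorted_vals, 3))  # k
print(first_to_k(sorted_vals, 4))  # None
```

Each step reads and updates a single dict entry, so the whole thing is one pass over the sorted list instead of a rescan of every tally after each match.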

--
http://mail.python.org/mailman/listinfo/python-list
