On 04/12/15 23:06, Peter Otten wrote:
> duncan smith wrote:
>
>> Hello,
>> I'm trying to find a computationally efficient way of identifying
>> unique subarrays, counting them and returning an array containing only
>> the unique subarrays and a corresponding 1D array of counts. The
>> following code works, but is a bit slow.
>>
>> ###############
>>
>> from collections import Counter
>> import numpy
>>
>> def bag_data(data):
>>     # data (a numpy array) is bagged along axis 0
>>     # returns concatenated array and corresponding array of counts
>>     vec_shape = data.shape[1:]
>>     counts = Counter(tuple(arr.flatten()) for arr in data)
>>     data_out = numpy.zeros((len(counts),) + vec_shape)
>>     cnts = numpy.zeros((len(counts,)))
>>     for i, (tup, cnt) in enumerate(counts.iteritems()):
>>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>>         cnts[i] = cnt
>>     return data_out, cnts
>>
>> ###############
>>
>> I've been looking through the numpy docs, but don't seem to be able to
>> come up with a clean solution that avoids Python loops.
>
> Me neither :(
>
>> TIA for any
>> useful pointers. Cheers.
>
> Here's what I have so far:
>
> def bag_data(data):
>     counts = numpy.zeros(data.shape[0])
>     seen = {}
>     for i, arr in enumerate(data):
>         sarr = arr.tostring()
>         if sarr in seen:
>             counts[seen[sarr]] += 1
>         else:
>             seen[sarr] = i
>             counts[i] = 1
>     nz = counts != 0
>     return numpy.compress(nz, data, axis=0), numpy.compress(nz, counts)
>

Three times as fast as what I had, and a bit cleaner. Excellent. Cheers.

Duncan
-- 
https://mail.python.org/mailman/listinfo/python-list
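
A note for anyone reading this thread later: the loop-free version asked for above can probably be had from numpy.unique, assuming a NumPy recent enough to accept the axis argument (1.13 or later, which postdates this thread). The following is only a sketch under that assumption. The name bag_data_unique is made up for illustration and is not from either post, and it has not been timed against the versions above.

```python
import numpy as np

def bag_data_unique(data):
    # Hypothetical helper name; assumes NumPy >= 1.13 (axis= support in
    # numpy.unique) and a numeric (non-object) dtype.
    data = np.asarray(data)
    # Treat each subarray along axis 0 as one item, drop duplicates,
    # and count how many times each distinct subarray occurred.
    uniq, counts = np.unique(data, axis=0, return_counts=True)
    return uniq, counts

# Small usage example: the row [1, 2] appears three times.
sample = np.array([[1, 2], [3, 4], [1, 2], [1, 2]])
uniq, counts = bag_data_unique(sample)
# uniq is [[1, 2], [3, 4]] and counts is [3, 1]
```

One difference to be aware of: numpy.unique returns the unique subarrays in sorted order, whereas both versions quoted above return them in an order tied to the input (dictionary order or first occurrence). The counts also come back as integers rather than floats.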