Hello, I'm trying to find a computationally efficient way of identifying unique subarrays, counting them and returning an array containing only the unique subarrays and a corresponding 1D array of counts. The following code works, but is a bit slow.
###############
from collections import Counter

import numpy


def bag_data(data):
    # data (a numpy array) is bagged along axis 0
    # returns concatenated array and corresponding array of counts
    vec_shape = data.shape[1:]
    # key each subarray by its flattened contents so it is hashable
    counts = Counter(tuple(arr.flatten()) for arr in data)
    data_out = numpy.zeros((len(counts),) + vec_shape)
    cnts = numpy.zeros(len(counts))
    # items() runs on both Python 2 and 3; iteritems() is Python 2 only
    for i, (tup, cnt) in enumerate(counts.items()):
        data_out[i] = numpy.array(tup).reshape(vec_shape)
        cnts[i] = cnt
    return data_out, cnts
###############

I've been looking through the numpy docs, but don't seem to be able to come up with a clean solution that avoids Python loops. TIA for any useful pointers.

Cheers.

Duncan
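
One possible direction for avoiding the Python loop, offered as a rough sketch rather than a tested drop-in replacement: numpy.unique accepts axis and return_counts arguments (the axis argument needs NumPy 1.13 or later, which is an assumption about the installed version). The function name bag_data_unique is hypothetical, and the sketch assumes data holds at least one subarray. Each subarray is flattened to a row, unique rows are found in a single call, and the result is reshaped back.

```python
import numpy


def bag_data_unique(data):
    # Hypothetical name; a sketch assuming NumPy >= 1.13 and non-empty data.
    data = numpy.asarray(data)
    vec_shape = data.shape[1:]
    # Flatten every subarray into one row so whole rows can be compared.
    flat = data.reshape(data.shape[0], -1)
    # axis=0 treats each row as a single item; return_counts adds the tallies.
    uniq, cnts = numpy.unique(flat, axis=0, return_counts=True)
    # Restore the original subarray shape for each unique row.
    return uniq.reshape((len(uniq),) + vec_shape), cnts
```

This differs from the loop version in a few ways: the unique subarrays come back in sorted (lexicographic) order rather than in Counter order, the output keeps the input dtype instead of being converted to float, and the counts are integers.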