Michael Spencer wrote:
>>> def resample2(data):
...     bag = {}
...     random.shuffle(data)
...     return [[(item, label)
...                 for item, label in group
...                 if bag.setdefault(label,[]).append(item)
...                 or len(bag[label]) < 3]
...                     for group in data if not
...which failed to calculate the minimum count of labels. Try this instead (while I was at it, I removed the insane list comprehension):
>>> import random
>>> def resample3(data):
...     bag = {}
...     sample = []
...     # Flatten all labels, then find how often the rarest one occurs
...     labels = [label for group in data for item, label in group]
...     min_count = min(labels.count(label) for label in set(labels))
...     random.shuffle(data)
...     for subgroup in data:
...         random.shuffle(subgroup)
...         subgroupsample = []
...         for item, label in subgroup:
...             bag.setdefault(label,[]).append(item)
...             # Keep the pair only while this label is within its quota
...             if len(bag[label]) <= min_count:
...                 subgroupsample.append((item,label))
...         sample.append(subgroupsample)
...     return sample
...
>>>
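For anyone who wants to run this outside the interactive prompt, here is a rough sketch of the same idea as a standalone script. It assumes data is a list of groups, each a list of (item, label) pairs. The name resample_balanced, the rng parameter and the sample data are made up for illustration. It differs from resample3 in two ways: it counts labels with collections.Counter instead of list.count, and it shuffles copies so the caller's lists are left alone.

```python
import random
from collections import Counter


def resample_balanced(data, rng=None):
    """Keep at most min_count (item, label) pairs per label across all groups.

    Hypothetical variant of resample3: same selection rule, but the input
    is copied rather than shuffled in place.
    """
    if rng is None:
        rng = random.Random()

    # Count every label once; min_count is the size of the rarest label.
    # Like resample3, this raises ValueError if there are no labels at all.
    totals = Counter(label for group in data for _item, label in group)
    min_count = min(totals.values())

    groups = [list(group) for group in data]  # shallow copies of each group
    rng.shuffle(groups)

    kept = Counter()  # number of pairs kept so far, per label
    sample = []
    for group in groups:
        rng.shuffle(group)
        picked = []
        for item, label in group:
            if kept[label] < min_count:
                kept[label] += 1
                picked.append((item, label))
        sample.append(picked)
    return sample


# Made-up data: label "x" occurs 3 times, "y" twice, "z" once.
data = [
    [("a1", "x"), ("a2", "x"), ("a3", "y")],
    [("b1", "x"), ("b2", "y"), ("b3", "z")],
]
sample = resample_balanced(data, random.Random(0))
counts = Counter(label for group in sample for _item, label in group)
print(dict(counts))  # every label appears min_count times
```

Passing a seeded random.Random makes a run repeatable, which helps when checking that every label ends up with the same count.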
Cheers
Michael
-- http://mail.python.org/mailman/listinfo/python-list