Michael Spencer wrote:

>>> def resample2(data):
...     bag = {}
...     random.shuffle(data)
...     return [[(item, label)
...              for item, label in group
...              if bag.setdefault(label,[]).append(item)
...              or len(bag[label]) < 3]
...             for group in data if not

...which failed to calculate the minimum count of labels (the limit of 3 was hard-coded). Try this instead (while I was at it, I removed the insane list comprehension):


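 >>> import random
 >>> def resample3(data):
 ...     bag = {}
 ...     sample = []
 ...     # every label in the data, once per occurrence
 ...     labels  = [label for group in data for item, label in group]
 ...     # size of the rarest label; each label is capped at this count
 ...     min_count = min(labels.count(label) for label in set(labels))
 ...     # note: data and its subgroups are shuffled in place
 ...     random.shuffle(data)
 ...     for subgroup in data:
 ...         random.shuffle(subgroup)
 ...         subgroupsample = []
 ...         for item, label in subgroup:
 ...             bag.setdefault(label,[]).append(item)
 ...             # keep the item only while its label is under the cap
 ...             if len(bag[label]) <= min_count:
 ...                 subgroupsample.append((item,label))
 ...         sample.append(subgroupsample)
 ...     return sample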
 ...
 >>>
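
For anyone who wants to try this outside the interpreter, here is a rough standalone sketch of the same idea as resample3 above. The name resample_balanced, the rng parameter and the sample data are my own inventions for illustration, not part of the original post. It assumes collections.Counter is available, counts each label once rather than calling labels.count() per label, and shuffles copies so the caller's lists are left alone.

```python
import random
from collections import Counter


def resample_balanced(data, rng=None):
    """Downsample so every label occurs equally often, keeping the groups.

    `data` is a list of groups; each group is a list of (item, label) pairs.
    `rng` is a hypothetical hook for passing a seeded random.Random.
    """
    if rng is None:
        rng = random.Random()

    # One pass over the data gives the total count of every label.
    totals = Counter(label for group in data for _item, label in group)
    # Raises ValueError if data contains no pairs at all.
    min_count = min(totals.values())

    # Work on copies so the caller's lists are not reordered.
    groups = [list(group) for group in data]
    rng.shuffle(groups)

    taken = Counter()  # how many items of each label were kept so far
    sample = []
    for group in groups:
        rng.shuffle(group)
        kept = []
        for item, label in group:
            if taken[label] < min_count:
                taken[label] += 1
                kept.append((item, label))
        sample.append(kept)
    return sample


# Made-up data: label "x" occurs 5 times, "y" and "z" twice each.
data = [
    [("a1", "x"), ("a2", "x"), ("a3", "y")],
    [("b1", "x"), ("b2", "y"), ("b3", "z")],
    [("c1", "x"), ("c2", "z"), ("c3", "x")],
]
balanced = resample_balanced(data, rng=random.Random(0))
counts = Counter(label for group in balanced for _item, label in group)
```

With this data every label should end up exactly twice in counts, since the rarest labels occur twice. Which particular "x" items survive depends on the shuffle, and the groups come back in shuffled order.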

Cheers

Michael

--
http://mail.python.org/mailman/listinfo/python-list
