Steven Bethard wrote:
So, I have a list of lists, where the items in each sublist are of basically the same form. It looks something like:...
Can anyone see a simpler way of doing this?
Steve
You just make these up to keep us amused, don't you? ;-)
Heh heh. I wish. It's actually about resampling data read in the Yamcha data format:
http://chasen.org/~taku/software/yamcha/
So each sublist is a "sentence" and each tuple is the feature vector for a "word". The point is to even out the number of positive and negative examples because support vector machines typically work better with balanced data sets.
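To make the imbalance concrete, here is a minimal sketch that counts how often each label occurs. It assumes each tuple is a two-element (item, label) pair, which is how the reply further down unpacks them; the sample data and the name `label_counts` are made up for illustration and are not taken from the Yamcha format itself.

```python
from collections import Counter

def label_counts(data):
    """Count how often each label occurs across all sentences."""
    return Counter(label for sentence in data for _item, label in sentence)

# Hypothetical data: two "sentences" of (item, label) pairs.
data = [[("a", "O"), ("b", "B"), ("c", "O")],
        [("d", "O"), ("e", "O")]]
counts = label_counts(data)
print(counts)
```

A large gap between the counts is the situation the resampling is meant to even out.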
[snip]
If you don't need to preserve the ordering, would the following work?
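>>> import random
>>> def resample2(data):
...     bag = {}
...     random.shuffle(data)
...     # append() and shuffle() both return None, so they are used here
...     # only for their side effects inside the conditions.
...     return [[(item, label)
...              for item, label in group
...              # keep an item only while fewer than 3 of its label
...              # (counting itself) have been seen so far
...              if bag.setdefault(label, []).append(item)
...              or len(bag[label]) < 3]
...             for group in data if not random.shuffle(group)]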
It would be preferable to preserve ordering, but it's not absolutely crucial. Thanks for the suggestion!
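For what it's worth, an order-preserving variant could be sketched along these lines. This is only an illustration under assumptions: the name `resample_ordered`, the `per_label` cap, and the (item, label) tuple layout are hypothetical, and it is not the original resampling function from the start of the thread.

```python
import random
from collections import defaultdict

def resample_ordered(data, per_label, rng=random):
    """Keep at most per_label items of each label, preserving order."""
    # Record the (sentence index, word index) of every item, by label.
    positions = defaultdict(list)
    for i, sentence in enumerate(data):
        for j, (_item, label) in enumerate(sentence):
            positions[label].append((i, j))
    # Randomly choose which positions survive for each label.
    keep = set()
    for where in positions.values():
        keep.update(rng.sample(where, min(per_label, len(where))))
    # Rebuild the sentences in their original order, dropping the rest.
    return [[pair for j, pair in enumerate(sentence) if (i, j) in keep]
            for i, sentence in enumerate(data)]
```

Because the random choice is made over positions rather than by shuffling the lists, neither the sentences nor the words within them are reordered, and the input is left unmodified.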
STeVe