Steven Bethard wrote:
Michael Spencer wrote:
Steven Bethard wrote:
So, I have a list of lists, where the items in each sublist are of
basically the same form. It looks something like:
...
Can anyone see a simpler way of doing this?
Steve
You just make these up to keep us amused, don't you? ;-)
Heh heh. I wish. It's actually about resampling data read in the
Yamcha data format:
http://chasen.org/~taku/software/yamcha/
So each sublist is a "sentence" and each tuple is the feature vector for
a "word". The point is to even out the number of positive and negative
examples because support vector machines typically work better with
balanced data sets.
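To make "evening out" concrete, a first step is measuring how unbalanced the labels are. This is a minimal sketch, assuming the list-of-sentences shape used in this thread (each sentence a list of (item, label) tuples). The function name label_counts and the toy data with labels 'P' and 'N' are hypothetical; the string items stand in for real feature vectors.

```python
from collections import Counter


def label_counts(data):
    """Count how often each label occurs across all sentences.

    Hypothetical helper: `data` is a list of sentences, each a list of
    (item, label) tuples, the shape used by resample2 in this thread.
    """
    return Counter(label for sentence in data for _, label in sentence)


# Toy data (made up for illustration): 'P' = positive, 'N' = negative.
data = [
    [("the", "N"), ("cat", "P"), ("sat", "N")],
    [("a", "N"), ("dog", "P"), ("ran", "N"), ("off", "N")],
]
counts = label_counts(data)
print(sorted(counts.items()))   # [('N', 5), ('P', 2)]
# The rarest label's count is the most each label could keep
# if the majority label were undersampled down to it.
print(min(counts.values()))     # 2
```

Undersampling the majority label down to that minimum is one common way to balance a training set; it is not necessarily the exact scheme used in the original code.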
If you don't need to preserve the ordering, would the following work?:
[snip]
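>>> import random
>>> def resample2(data):
...     bag = {}    # label -> items seen so far
...     random.shuffle(data)    # note: shuffles the caller's list in place
...     # list.append() returns None, so the test reduces to
...     # len(bag[label]) < 3 after the append: only the first 2 items of
...     # each label are kept. random.shuffle() also returns None, so
...     # "not random.shuffle(group)" is always true and just shuffles each group
...     return [[(item, label)
...              for item, label in group
...              if bag.setdefault(label, []).append(item)
...              or len(bag[label]) < 3]
...             for group in data if not random.shuffle(group)]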
It would be preferable to preserve ordering, but it's not absolutely
crucial. Thanks for the suggestion!
STeVe
Maybe combine this with a DSU (decorate-sort-undecorate) pattern? I'm not sure whether the result would be
better than what you started with, though.
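One possible reading of the DSU suggestion, sketched under assumptions: tag each sentence and each word with its original index (decorate), shuffle and apply the same cap of two pairs per label that resample2 uses, then sort on the tags and strip them off (undecorate). The name resample_ordered and the per_label and seed parameters are hypothetical. Unlike resample2, this version shuffles copies, so the caller's data is left as it was.

```python
import random


def resample_ordered(data, per_label=2, seed=None):
    """Hypothetical order-preserving variant of resample2 using DSU.

    `data` is a list of sentences, each a list of (item, label) tuples.
    At most `per_label` pairs per label are kept, chosen at random, and
    the survivors come back in their original order.
    """
    rng = random.Random(seed)
    kept_so_far = {}  # label -> number of pairs kept so far

    # Decorate: pair each sentence with its index and each word with its
    # position; list(enumerate(...)) builds new lists, so `data` is untouched.
    decorated = [(i, list(enumerate(sentence)))
                 for i, sentence in enumerate(data)]

    # Shuffle the order of sentences and of words within each sentence.
    rng.shuffle(decorated)
    for _, words in decorated:
        rng.shuffle(words)

    # Select at most `per_label` pairs per label, in shuffled order.
    selected = []
    for i, words in decorated:
        kept = []
        for j, (item, label) in words:
            if kept_so_far.get(label, 0) < per_label:
                kept_so_far[label] = kept_so_far.get(label, 0) + 1
                kept.append((j, (item, label)))
        selected.append((i, kept))

    # Sort on the decorations to restore the original order, then undecorate.
    selected.sort(key=lambda entry: entry[0])
    return [[pair for _, pair in sorted(kept, key=lambda entry: entry[0])]
            for _, kept in selected]
```

The sort keys are the integer indices only, so the items themselves (feature vectors) never need to be comparable. The extra sorting costs O(n log n) per sentence; whether that beats the original approach is the open question above.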
Michael
--
http://mail.python.org/mailman/listinfo/python-list