One more test result to add. If I use your first method to unique:

seen = set()
uniqued = []
for x in original:
    if x not in seen:
        seen.add(x)
        uniqued.append(x)
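(For reference, the same idea can be packaged as a reusable generator. This is a sketch along the lines of the unique_everseen recipe in the itertools documentation; the function and variable names here are just illustrative:)

def unique_everseen(iterable):
    # Yield each element the first time it is seen, preserving order.
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

original = [1, 1, 2, 3, 2, 4]           # example input
print(list(unique_everseen(original)))  # [1, 2, 3, 4]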
With that change, the result pops up in a few seconds. It makes a dramatic difference. Thanks. Here is the fastest version of the code:

>>> import nltk
>>> from nltk.corpus import wordnet as wn
>>> def average_polysemy(pos):
...     synset_list = list(wn.all_synsets(pos))
...     sense_number = 0
...     lemma_list = []
...     for synset in synset_list:
...         lemma_list.extend(synset.lemma_names)
...     unique_lemma_list = []
...     seen = set()
...     for w in lemma_list:
...         if w not in seen:
...             seen.add(w)
...             unique_lemma_list.append(w)
...     for lemma in unique_lemma_list:
...         sense_number_new = len(wn.synsets(lemma, pos))
...         sense_number = sense_number + sense_number_new
...     return sense_number / len(unique_lemma_list)
>>> average_polysemy('n')
1

On Sun, Sep 9, 2012 at 3:18 PM, John H. Li <typeto...@gmail.com> wrote:

> Thanks again. What you explained is reasonable. I tried the second method
> to unique the list. It turned out that Python just worked and worked
> without producing a result, maybe because it iterates over a long list in
> my example and is therefore slow.
>
> >>> def average_polysemy(pos):
> ...     synset_list = list(wn.all_synsets(pos))
> ...     sense_number = 0
> ...     lemma_list = []
> ...     for synset in synset_list:
> ...         lemma_list.extend(synset.lemma_names)
> ...     unique_lemma_list = []
> ...     for w in lemma_list:
> ...         if w not in unique_lemma_list:
> ...             unique_lemma_list.append(w)
> ...     for lemma in unique_lemma_list:
> ...         sense_number_new = len(wn.synsets(lemma, pos))
> ...         sense_number = sense_number + sense_number_new
> ...     return sense_number / len(unique_lemma_list)
> >>> average_polysemy('n')
>
> On Sun, Sep 9, 2012 at 2:36 PM, Donald Stufft <donald.stu...@gmail.com> wrote:
>
>> For a short list the difference is going to be negligible.
>>
>> For a long list, the difference is that checking whether an item is in a
>> list requires iterating over the list internally to find it, whereas
>> checking whether an item is in a set uses a faster method that doesn't
>> require iterating over the whole collection. This doesn't matter if you
>> have 20 or 30 items, but imagine if instead you had 50 million items.
>> You're going to be iterating over the list a lot, and that can introduce
>> a significant slowdown.
>>
>> On the other hand, using a set is faster in that case, but because you
>> are storing an additional copy of the data, you are using more memory to
>> store extra copies of everything.
>>
>> On Sunday, September 9, 2012 at 2:31 AM, John H. Li wrote:
>>
>> Thanks first. I could understand the second approach easily, but the
>> first approach is a bit puzzling. Why are seen = set() and seen.add(x)
>> still necessary there if we can use uniqued.append(x) alone? Thanks for
>> your enlightenment.
>>
>> On Sun, Sep 9, 2012 at 1:59 PM, Donald Stufft <donald.stu...@gmail.com> wrote:
>>
>> seen = set()
>> uniqued = []
>> for x in original:
>>     if x not in seen:
>>         seen.add(x)
>>         uniqued.append(x)
>>
>> or
>>
>> uniqued = []
>> for x in original:
>>     if x not in uniqued:
>>         uniqued.append(x)
>>
>> The difference between them is that option #1 is more efficient
>> speed-wise but uses more memory (an extra set hanging around), whereas
>> the second is slower (``in`` is slower on lists than on sets) but uses
>> less memory.
>>
>> On Sunday, September 9, 2012 at 1:56 AM, John H. Li wrote:
>>
>> Many thanks. If I want to keep the order, how can I deal with it?
>> Or can we use list(set([1, 1, 2, 3, 4])) == [1, 2, 3, 4]?
>>
>> On Sun, Sep 9, 2012 at 1:47 PM, Donald Stufft <donald.stu...@gmail.com> wrote:
>>
>> If you don't need to retain order you can just use a set:
>>
>> set([1, 1, 2, 3, 4]) == set([1, 2, 3, 4])
>>
>> But sets don't retain order.
>>
>> On Sunday, September 9, 2012 at 1:43 AM, Token Type wrote:
>>
>> Is there a unique method in Python to unique a list? Thanks.
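P.S. To make the list-vs-set membership difference described above concrete, here is a small timing sketch using the standard timeit module; the data size and the element probed are arbitrary choices:

import timeit

setup = "data = list(range(100000)); as_list = list(data); as_set = set(data)"

# Probing an element near the end of the collection: a list does an
# O(n) scan, while a set does an average O(1) hash lookup.
print(timeit.timeit("99999 in as_list", setup=setup, number=100))
print(timeit.timeit("99999 in as_set", setup=setup, number=100))

On a typical machine the set lookup should come out several orders of magnitude faster.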
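P.P.S. One thing worth noting about the session above: average_polysemy('n') printing exactly 1 is what Python 2's integer division would produce, since / between two ints floors the result and discards the fractional part of the average. A minimal sketch of the fix:

# At the top of the module:
from __future__ import division  # on Python 2, / becomes true division

# The final line of average_polysemy then returns a float:
#     return sense_number / len(unique_lemma_list)
# Without the __future__ import, an explicit conversion also works:
#     return float(sense_number) / len(unique_lemma_list)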
--
http://mail.python.org/mailman/listinfo/python-list