(newbie) N-uples from list of lists
Hello,

I think this could be done with itertools functions, even if I cannot see the trick. I would like to get all available "n-uples" (n-tuples) from a list of lists. Here is an example for a list of 3 lists, but I should also be able to handle any number of lists (any len(lol)):

    lol = (['a0', 'a1', 'a2'], ['b0', 'b1'], ['c0', 'c1', 'c2', 'c3'])

    =>

    [('a0', 'b0', 'c0'), ('a0', 'b0', 'c1'), ('a0', 'b0', 'c2'), ('a0', 'b0', 'c3'),
     ('a0', 'b1', 'c0'), ('a0', 'b1', 'c1'), ('a0', 'b1', 'c2'), ('a0', 'b1', 'c3'),
     ('a1', 'b0', 'c0'), ('a1', 'b0', 'c1'), ('a1', 'b0', 'c2'), ('a1', 'b0', 'c3'),
     ('a1', 'b1', 'c0'), ('a1', 'b1', 'c1'), ('a1', 'b1', 'c2'), ('a1', 'b1', 'c3'),
     ('a2', 'b0', 'c0'), ('a2', 'b0', 'c1'), ('a2', 'b0', 'c2'), ('a2', 'b0', 'c3'),
     ('a2', 'b1', 'c0'), ('a2', 'b1', 'c1'), ('a2', 'b1', 'c2'), ('a2', 'b1', 'c3')]

Maybe tee(lol, len(lol)) can help? It could be done with a recursive call, but I am interested in using and understanding generators.

I have also found a convenient function here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65285
(pasted below), but I am curious how you would do it, or how you would refactor this one with generators...

    def permuteflat(*args):
        outs = []
        olen = 1
        tlen = len(args)
        # total number of output tuples
        for seq in args:
            olen = olen * len(seq)
        for i in range(olen):
            outs.append([None] * tlen)
        plq = olen
        for i in range(len(args)):
            seq = args[i]
            # '//' keeps this an integer division on every Python version
            plq = plq // len(seq)
            for j in range(olen):
                si = (j // plq) % len(seq)
                outs[j][i] = seq[si]
        for i in range(olen):
            outs[i] = tuple(outs[i])
        return outs

Many thanks.

--
http://mail.python.org/mailman/listinfo/python-list
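For readers on a recent Python, here is a minimal sketch of the itertools route the question asks about. It assumes Python 3 and itertools.product, which was added in Python 2.6 and so may not have been available when this thread was written. The helper name n_tuples is made up for illustration.

```python
from itertools import product


def n_tuples(list_of_lists):
    """Return a lazy iterator over every pick of one item per inner list."""
    # product(*iterables) produces tuples one at a time;
    # the last list varies fastest, like the expected output above.
    return product(*list_of_lists)


lol = (['a0', 'a1', 'a2'], ['b0', 'b1'], ['c0', 'c1', 'c2', 'c3'])
result = list(n_tuples(lol))
print(len(result))
print(result[:3])
```

Because product is lazy, it can also be consumed in a for loop without building the whole list, which matters once the inner lists get large.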
Re: (newbie) N-uples from list of lists
Great thanks to all. Actually I had not seen that it was a cross product... :)

There are already a few other ideas on the web, so I paste what I have found below. BTW I was unable to choose the best one: speaking about performance, which one should be preferred?

    ### --
    ### from title: variable X procuct - [(x,y) for x in list1 for y in list2]
    ### by author: steindl fritz
    ### 28 mai 2002
    ### reply by: Jeff Epler

    def cross(l=None, *args):
        if l is None:
            # The product of no lists is 1 element long,
            # it contains an empty list
            yield []
            return
        # Otherwise, the product is made up of each
        # element in the first list concatenated with each of the
        # products of the remaining items of the list
        for i in l:
            for j in cross(*args):
                yield [i] + j

    ### reply by: Raymond Hettinger

    def CartesianProduct(*args):
        ans = [()]
        for arg in args:
            ans = [ list(x)+[y] for x in ans for y in arg]
        return ans

    """
    print CartesianProduct([1,2], list('abc'), 'do re mi'.split())
    """

    ### from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159975
    ### by: Raymond Hettinger

    def cross(*args):
        ans = [[]]
        for arg in args:
            ans = [x+[y] for x in ans for y in arg]
        return ans

    ### from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159975
    ### by: Steven Taschuk

    """
    Iterator version, Steven Taschuk, 2003/05/24
    """
    def cross(*sets):
        wheels = map(iter, sets) # wheels like in an odometer
        digits = [it.next() for it in wheels]
        while True:
            yield digits[:]
            for i in range(len(digits)-1, -1, -1):
                try:
                    digits[i] = wheels[i].next()
                    break
                except StopIteration:
                    wheels[i] = iter(sets[i])
                    digits[i] = wheels[i].next()
            else:
                break
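On the performance question, one hedged way to find out is to measure. The sketch below assumes Python 3 and uses timeit to compare two of the approaches quoted above. The names cross_list and cross_gen are made up for illustration, and cross_gen is a small variation on the recursive generator (it tests for "no arguments left" instead of a None default). Timings depend on the machine and input sizes, so no numbers are claimed here.

```python
import timeit
from itertools import product


def cross_list(*args):
    """List-building version (same idea as the list-comprehension recipe)."""
    ans = [[]]
    for arg in args:
        ans = [x + [y] for x in ans for y in arg]
    return ans


def cross_gen(*args):
    """Recursive generator version; yields one combination at a time."""
    if not args:
        # The product of no lists is a single empty combination.
        yield []
        return
    first, rest = args[0], args[1:]
    for i in first:
        for j in cross_gen(*rest):
            yield [i] + j


sets = (range(10), range(10), range(10))

# Check both against itertools.product before trusting any timing.
expected = [list(t) for t in product(*sets)]
assert cross_list(*sets) == expected
assert list(cross_gen(*sets)) == expected

candidates = [
    ("cross_list", lambda: cross_list(*sets)),
    ("cross_gen", lambda: list(cross_gen(*sets))),
]
for name, func in candidates:
    seconds = timeit.timeit(func, number=20)
    print("%s: %.4f s for 20 runs" % (name, seconds))
```

The list version holds every combination in memory at once, while the generator version only pays that cost if the caller wraps it in list(), so memory use may matter as much as speed when choosing.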
advice : how do you iterate with an acc ?
Hello,

I'm wondering how people here handle this, as I often encounter something like:

    acc = []  # accumulator ;)
    for line in fileinput.input():
        if condition(line):
            if acc:  #1
                doSomething(acc)  #1
            acc = []
        else:
            acc.append(line)
    if acc:  #2
        doSomething(acc)  #2

BTW I am particularly annoyed by #1 and #2, as they are a repetition and I think that is quite error prone. How would you do it in a pythonic way?

Regards
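One possible way to avoid repeating the "flush the accumulator" step is to let itertools.groupby do the grouping, so the flush happens in a single place. This is only a sketch, assuming Python 3. The names chunks and is_separator are made up for illustration, with is_separator playing the role of condition() in the question.

```python
from itertools import groupby


def chunks(lines, is_separator):
    """Yield lists of consecutive lines found between separator lines."""
    # groupby splits the input into runs that share the same key value;
    # bool() makes every separator line map to the same key (True).
    for sep, group in groupby(lines, key=lambda line: bool(is_separator(line))):
        if not sep:
            yield list(group)


lines = ["a", "b", "", "c", "", "", "d", "e"]
for acc in chunks(lines, lambda line: line == ""):
    print(acc)  # stands in for doSomething(acc)
```

As in the original loop, runs of consecutive separators produce no empty accumulator, and a trailing group with no closing separator is still handled.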
ZODB for inverted index?
Hello,

While experimenting with writing an inverted index (see: http://en.wikipedia.org/wiki/Inverted_index), I ran out of memory with a classic dict. (I have thousands of documents and millions of terms; stemming and other filtering are not considered yet, as I wanted to understand how to handle GB of text first.)

I found ZODB and tried to use it a bit, but I think I must be misunderstanding how to use it, even after reading http://www.zope.org/Wikis/ZODB/guide/node3.html...

I would like to use it once to build my inverted index and save it to disk via a FileStorage, and then reuse this previously created inverted index from the same FileStorage. But it looks like I am unable to reread/reload it into memory, or I am missing how to do it...

Firstly, each time I run the code below, it looks like everything is added another time. Is there a way to rewrite/replace it instead? And how am I supposed to use it after the initial creation? I thought that using the same FileStorage would reload my object inside dbroot, but it doesn't.

I am also interested in the cache mechanisms: are they transparent? Or maybe you know a good tutorial for understanding ZODB?

Thanks for any help, regards.
Here is some sample code:

    import sys
    from BTrees.OOBTree import OOBTree
    from BTrees.OIBTree import OIBTree
    from persistent import Persistent
    # the names FileStorage and DB used below come from ZODB
    # (this import was missing from the paste)
    from ZODB import FileStorage, DB

    class IDF2:
        def __init__(self):
            self.docs = OIBTree()
            self.idfs = OOBTree()

        def add(self, term, fromDoc):
            self.docs[fromDoc] = self.docs.get(fromDoc, 0) + 1
            if not self.idfs.has_key(term):
                self.idfs[term] = OIBTree()
            self.idfs[term][fromDoc] = self.idfs[term].get(fromDoc, 0) + 1

        def N(self, term):
            "total number of occurrences of 'term'"
            return sum(self.idfs[term].values())

        def n(self, term):
            "number of documents containing 'term'"
            return len(self.idfs[term])

        def ndocs(self):
            "number of documents"
            return len(self.docs)

        def __getitem__(self, key):
            return self.idfs[key]

        def iterdocs(self):
            for doc in self.docs.iterkeys():
                yield doc

        def iterterms(self):
            for term in self.idfs.iterkeys():
                yield term

    storage = FileStorage.FileStorage("%s.fs" % sys.argv[1])
    db = DB(storage)
    conn = db.open()
    dbroot = conn.root()
    if not dbroot.has_key('idfs'):
        dbroot['idfs'] = IDF2()
    idfs = dbroot['idfs']

    import transaction
    for i, line in enumerate(open(sys.argv[1])):
        # considering doc is linenumber...
        for word in line.split():
            idfs.add(word, i)
    # Commit the change
    transaction.commit()

---
I was expecting:

    storage = FileStorage.FileStorage("%s.fs" % sys.argv[1])
    db = DB(storage)
    conn = db.open()
    dbroot = conn.root()
    print dbroot.has_key('idfs')

=> to return True
Re: ZODB for inverted index?
Thanks for your reply. Anyway, can someone help me with how to "rewrite" and "reload" a class instance when using ZODB?

Regards
Sorted and reversed on huge dict ?
Hello,

I would like to sort(ed) and reverse(d) the contents of many huge dictionaries (a single dictionary will contain ~ 15 entries). Keys are words, values are counts (integers).

I'm wondering if I can have tens of these in memory, or if I should process them one after the other.

Moreover, I'm interested in saving these as (value, key) pairs, sorted and reversed (i.e. most frequent first). I can do it with the Unix sort command, but I wonder how I should do it in Python to be memory friendly.

Can it be done by using:

    from itertools import izip
    pairs = izip(d.itervalues(), d.iterkeys())
    for v, k in reversed(sorted(pairs)):
        print k, v

or will it be the same as building the whole list?
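On the last question: sorted() always builds a full list of whatever it is given, so feeding it a lazy iterator does not save that memory. The sketch below assumes Python 3 and uses a small made-up counts dict. It shows a direct "most frequent first" sort, and heapq.nlargest as an option when only the top few entries are needed.

```python
import heapq
from operator import itemgetter

# Made-up sample data standing in for one word-count dictionary.
counts = {"the": 120, "python": 42, "dict": 7, "sorted": 42}

# Full ordering, most frequent first. This materializes one list of
# (word, count) pairs, whatever kind of iterable is passed in.
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)
for word, count in by_count:
    print(word, count)

# If only the n most frequent words matter, nlargest avoids sorting
# the whole dictionary and keeps just n candidates at a time.
top2 = heapq.nlargest(2, counts.items(), key=itemgetter(1))
print(top2)
```

Using key= with reverse=True also avoids building the swapped (value, key) pairs just to sort by count.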
Re: Sorted and reversed on huge dict ?
Thanks for your replies :)

So I have just tried, even if I think it will not run to the end. => I was wrong: it is around 1.400.000 entries per dict...

But maybe it can be done if the keys of the dicts are not duplicated in memory (as all dicts will have the same keys, with different (count) values)?

The machine has 4 GB of RAM. Is there a good way to know how much RAM is used directly from Python, or should I rely on 'top' and other Unix commands? So far around 220 MB is used for around 200.000 words handled in 15 dicts.
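For measuring memory from inside Python, one option on a recent interpreter is the standard tracemalloc module. This is a sketch that assumes Python 3.4 or later, which postdates this thread. The three small dicts are made-up stand-ins for the real word-count dictionaries.

```python
import sys
import tracemalloc

tracemalloc.start()

# Made-up stand-in data: 3 dicts of 10000 entries each. Every dict
# builds its own key strings here, so the keys are duplicated.
dicts = [dict(("word%d" % i, i) for i in range(10000)) for _ in range(3)]

# Bytes allocated by Python code since start(): current and peak.
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("traced: current=%d bytes, peak=%d bytes" % (current, peak))

# getsizeof reports the dict container only, not its keys and values.
print("one dict container alone: %d bytes" % sys.getsizeof(dicts[0]))
```

tracemalloc only sees allocations made through Python's allocator while tracing is on, so its figures will generally be lower than what 'top' reports for the whole process.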
Re: Sorted and reversed on huge dict ?
So it is still unfinished :) Around 1 GB for 1033268 words :) (this comes from the Unix top command).

Paul > I was also thinking of doing it like that, by piping to 'sort | uniq -c | sort -nr', but I would be pleased if Python can handle it. (Well, maybe Python is slower? I will check later...)

Klaas > I do not know about the intern construct; I will have a look. When googling I first found a post from Raymond Hettinger, so I'm going to mess up my mental space :)
http://mail.python.org/pipermail/python-dev/2003-November/040433.html

Best regards.
Re: Sorted and reversed on huge dict ?
So it worked :) and it took 12h4:56, for 15 dicts with 1133755 keys. I do not know how much RAM was used, as I was not always monitoring it.

Thanks for all the replies. I'm going to study intern and the other suggestions, and I hope someone will also bring a pythonic way to know the memory usage :)

Best.
Re: Sorted and reversed on huge dict ?
Just to be sure about intern, it is used like this:

    >>> d, f = {}, {}
    >>> s = "this is a string"
    >>> d[intern(s)] = 1
    >>> f[intern(s)] = 1

So the keys in d and f are actually pointers to the same interned string? If so, it can be interesting.

    >>> print intern.__doc__
    intern(string) -> string

    ``Intern'' the given string. This enters the string in the (global)
    table of interned strings whose purpose is to speed up dictionary lookups.
    Return the string itself or the previously interned string object with the
    same value.

About the comment here: "(Changed in version 2.3: Interned strings used to be immortal, but you now need to keep a reference to the interned string around.)": if the string is used as a key, it is still referenced, am I right?
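A small sketch of how this can be checked, assuming Python 3, where the built-in intern() lives at sys.intern. The two strings are built at run time so that they start out as separate objects with equal values.

```python
import sys

d, f = {}, {}

# Build two equal strings at run time; they are separate objects at first.
s1 = "".join(["this is ", "a string"])
s2 = "".join(["this is ", "a string"])

d[sys.intern(s1)] = 1
f[sys.intern(s2)] = 1

# Fetch the actual key objects stored in each dict.
key_d = next(iter(d))
key_f = next(iter(f))

# Both dicts now hold the very same string object as their key.
print(key_d is key_f)
```

As long as a dict that uses the interned string as a key is alive, the dict itself holds a reference to that string, so it stays in the interned table.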