(newbie) N-uples from list of lists

2005-11-23 Thread vd12005
Hello,

i think it could be done with itertools functions, even if i cannot
see the trick. i would like to get all possible "n-uples" (n-tuples)
from a list of lists.
here is an example for a list of 3 lists, but i should also be able to
handle any number of lists (any len(lol))

lol = (['a0', 'a1', 'a2'], ['b0', 'b1'], ['c0', 'c1', 'c2', 'c3'])

=>


[('a0', 'b0', 'c0'), ('a0', 'b0', 'c1'), ('a0', 'b0', 'c2'), ('a0',
'b0', 'c3'), ('a0', 'b1', 'c0'), ('a0', 'b1', 'c1'), ('a0', 'b1',
'c2'), ('a0', 'b1', 'c3'), ('a1', 'b0', 'c0'), ('a1', 'b0', 'c1'),
('a1', 'b0', 'c2'), ('a1', 'b0', 'c3'), ('a1', 'b1', 'c0'), ('a1',
'b1', 'c1'), ('a1', 'b1', 'c2'), ('a1', 'b1', 'c3'), ('a2', 'b0',
'c0'), ('a2', 'b0', 'c1'), ('a2', 'b0', 'c2'), ('a2', 'b0', 'c3'),
('a2', 'b1', 'c0'), ('a2', 'b1', 'c1'), ('a2', 'b1', 'c2'), ('a2',
'b1', 'c3')]

maybe tee(lol, len(lol)) can help ?

it could be done by a recursive call, but i am interested in using and
understanding generators.

i have also found a convenient function here:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65285 (pasted
below),
but i am curious how you would do it, or how you would refactor this
one with generators...

def permuteflat(*args):
    outs = []
    olen = 1
    tlen = len(args)
    # total number of output tuples = product of all input lengths
    for seq in args:
        olen = olen * len(seq)
    for i in range(olen):
        outs.append([None] * tlen)
    plq = olen
    for i in range(len(args)):
        seq = args[i]
        # // keeps the division integral on any Python version
        plq = plq // len(seq)
        for j in range(olen):
            si = (j // plq) % len(seq)
            outs[j][i] = seq[si]
    for i in range(olen):
        outs[i] = tuple(outs[i])
    return outs

many thanx
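
One possible generator-based answer to the question above, as a minimal sketch rather than a definitive implementation. It is written in Python 3 syntax, and the name `iter_product` is made up for this illustration. From Python 2.6 onward, the standard library's `itertools.product` yields the same tuples in the same order.

```python
from itertools import product


def iter_product(*seqs):
    """Lazily yield every tuple that takes one item from each sequence."""
    if not seqs:
        # The product of no sequences is a single empty tuple.
        yield ()
        return
    first, rest = seqs[0], seqs[1:]
    for item in first:
        for tail in iter_product(*rest):
            yield (item,) + tail


lol = (['a0', 'a1', 'a2'], ['b0', 'b1'], ['c0', 'c1', 'c2', 'c3'])
combos = list(iter_product(*lol))
print(len(combos))   # 3 * 2 * 4 = 24 tuples
print(combos[:3])
```

Because nothing is computed until the generator is consumed, the caller can stop early without ever building the full list.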

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: (newbie) N-uples from list of lists

2005-11-29 Thread vd12005
many thanks to all.

actually i had not seen that it was a cross product... :) but then there
are already a few other ideas on the web, i paste what i have found
below...

BTW i was unable to choose the best one: speaking about performance,
which one should be preferred?

### --

### from title: variable X procuct - [(x,y) for x in list1 for y in list2]
### by author:  steindl fritz
### 28 May 2002
### reply by:   Jeff Epler

def cross(l=None, *args):
    if l is None:
        # The product of no lists is 1 element long,
        # it contains an empty list
        yield []
        return
    # Otherwise, the product is made up of each
    # element in the first list concatenated with each of the
    # products of the remaining items of the list
    for i in l:
        for j in cross(*args):
            yield [i] + j

### reply by:   Raymond Hettinger

def CartesianProduct(*args):
    ans = [()]
    for arg in args:
        ans = [ list(x)+[y] for x in ans for y in arg]
    return ans

"""
print CartesianProduct([1,2], list('abc'), 'do re mi'.split())
"""

### from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159975
### by: Raymond Hettinger

def cross(*args):
    ans = [[]]
    for arg in args:
        ans = [x+[y] for x in ans for y in arg]
    return ans

### from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159975
### by: Steven Taschuk
"""
Iterator version, Steven Taschuk, 2003/05/24
"""
def cross(*sets):
    wheels = map(iter, sets) # wheels like in an odometer
    digits = [it.next() for it in wheels]
    while True:
        yield digits[:]
        for i in range(len(digits)-1, -1, -1):
            try:
                digits[i] = wheels[i].next()
                break
            except StopIteration:
                wheels[i] = iter(sets[i])
                digits[i] = wheels[i].next()
        else:
            break
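
On the performance question, measuring is more reliable than guessing. The sketch below is a hedged illustration in Python 3 syntax: `cross_listcomp` is a copy of the list-comprehension recipe above, `cross_lazy` simply wraps `itertools.product` (in the standard library since Python 2.6), and the input sizes are arbitrary assumptions. Timings depend on the machine and interpreter, so none are shown.

```python
import timeit
from itertools import product


def cross_listcomp(*args):
    """List-comprehension recipe: builds every intermediate list in memory."""
    ans = [[]]
    for arg in args:
        ans = [x + [y] for x in ans for y in arg]
    return ans


def cross_lazy(*args):
    """Generator over itertools.product, yielding lists like the recipes above."""
    for combo in product(*args):
        yield list(combo)


# Hypothetical test input: 4 sequences of 10 items -> 10**4 combinations.
data = [list(range(10))] * 4

# Both approaches must agree before their speed is worth comparing.
same = cross_listcomp(*data) == list(cross_lazy(*data))

t_listcomp = timeit.timeit(lambda: cross_listcomp(*data), number=5)
t_lazy = timeit.timeit(lambda: list(cross_lazy(*data)), number=5)
print("same results:", same)
print("list comprehension: %.4fs, itertools.product: %.4fs" % (t_listcomp, t_lazy))
```

Speed is only one criterion: the list-based recipes hold the complete result in memory, while the generator versions produce one combination at a time.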



advice : how do you iterate with an acc ?

2005-12-02 Thread vd12005
hello,

i'm wondering how people here handle this, as i often end up writing
something like:

import fileinput

acc = []    # accumulator ;)
for line in fileinput.input():
    if condition(line):
        if acc:    #1
            doSomething(acc)    #1
        acc = []
    else:
        acc.append(line)
if acc:    #2
    doSomething(acc)    #2

BTW i am particularly annoyed by #1 and #2, as it is a repetition and i
think it is quite error-prone. how would you do it in a pythonic way?

regards
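
One way to avoid the repeated #1/#2 block is to let `itertools.groupby` (available since Python 2.4) do the accumulation. This is a minimal sketch in Python 3 syntax, not the only pythonic answer; the names `chunks` and `is_separator` and the blank-line condition are assumptions made for the example.

```python
from itertools import groupby


def chunks(lines, is_separator):
    """Yield lists of consecutive lines that lie between separator lines."""
    # bool() makes sure every separator line maps to the same group key.
    for separator, group in groupby(lines, key=lambda l: bool(is_separator(l))):
        if not separator:
            yield list(group)


# Hypothetical input: blank lines play the role of condition(line).
lines = ["a\n", "b\n", "\n", "\n", "c\n", "\n", "d\n", "e\n"]
for chunk in chunks(lines, lambda line: not line.strip()):
    print(chunk)
```

The final flush disappears because `groupby` closes the last group when the input ends, and consecutive separators never produce an empty chunk.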



ZODB for inverted index?

2006-10-23 Thread vd12005
Hello,

While experimenting with writing an inverted index (see:
http://en.wikipedia.org/wiki/Inverted_index), i ran out of memory with
a classic dict (i have thousands of documents and millions of terms;
stemming and other filtering are not considered yet, i wanted to
understand how to handle GBs of text first). i found ZODB and tried to
use it a bit, but i think i must be misunderstanding how to use it, even
after reading http://www.zope.org/Wikis/ZODB/guide/node3.html...

i would like to use it once to build my inverted index and save it to
disk via a FileStorage,

and then reuse this inverted index from the same FileStorage, but it
looks like i am unable to reread/reload it in memory, or i am missing
how to do it...

firstly, each time i run the code below, it looks like everything is
added a second time. is there a way to rewrite/replace it instead? and
how am i supposed to use it after the initial creation? i thought that
using the same FileStorage would reload my object inside dbroot, but it
doesn't. i am also interested in the cache mechanisms: are they
transparent?

or maybe you know a good tutorial for understanding ZODB?

thx for any help, regards.

here is some sample code:

import sys
from ZODB import FileStorage, DB    # used by the storage setup below
from BTrees.OOBTree import OOBTree
from BTrees.OIBTree import OIBTree
from persistent import Persistent

class IDF2:
    def __init__(self):
        self.docs = OIBTree()
        self.idfs = OOBTree()
    def add(self, term, fromDoc):
        self.docs[fromDoc] = self.docs.get(fromDoc, 0) + 1
        if not self.idfs.has_key(term):
            self.idfs[term] = OIBTree()
        self.idfs[term][fromDoc] = self.idfs[term].get(fromDoc, 0) + 1
    def N(self, term):
        "total number of occurrences of 'term'"
        return sum(self.idfs[term].values())
    def n(self, term):
        "number of documents containing 'term'"
        return len(self.idfs[term])
    def ndocs(self):
        "number of documents"
        return len(self.docs)
    def __getitem__(self, key):
        return self.idfs[key]
    def iterdocs(self):
        for doc in self.docs.iterkeys():
            yield doc
    def iterterms(self):
        for term in self.idfs.iterkeys():
            yield term

storage = FileStorage.FileStorage("%s.fs" % sys.argv[1])
db = DB(storage)
conn = db.open()
dbroot = conn.root()
if not dbroot.has_key('idfs'):
    dbroot['idfs'] = IDF2()
idfs = dbroot['idfs']

import transaction
for i, line in enumerate(open(sys.argv[1])):
    # considering doc is linenumber...
    for word in line.split():
        idfs.add(word, i)
# Commit the change
transaction.commit()

---
i was expecting :

storage = FileStorage.FileStorage("%s.fs" % sys.argv[1])
db = DB(storage)
conn = db.open()
dbroot = conn.root()
print dbroot.has_key('idfs')

=> to return True
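
The build-once, reopen-later cycle described above can be illustrated without ZODB. The sketch below deliberately swaps in the standard-library `shelve` module as a stand-in, so it says nothing about ZODB's own API; it is written in Python 3 syntax, and `build_index`, the file name, and the sample documents are assumptions made for the example. It also shows why assigning a freshly built index replaces the stored one, whereas calling `add` again on a reloaded index increments counts that are already there.

```python
import os
import shelve
import tempfile


def build_index(lines):
    """Map each term to {doc_id: count}; here a document is one line."""
    index = {}
    for doc_id, line in enumerate(lines):
        for term in line.split():
            postings = index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


# Hypothetical storage location and sample documents.
path = os.path.join(tempfile.mkdtemp(), "index_demo")
docs = ["the cat sat", "the dog sat down", "the end"]

# First run: build the index and store it under one key.
with shelve.open(path) as store:
    had_index_before = "idfs" in store
    store["idfs"] = build_index(docs)   # assigning replaces, it does not append

# Later run: reopen the same file and read the index back.
with shelve.open(path) as store:
    has_index_after = "idfs" in store
    index = store["idfs"]

print(had_index_before, has_index_after)
print(index["the"])
```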



Re: ZODB for inverted index?

2006-10-24 Thread vd12005

thanks for your reply,

anyway, can someone help me with how to "rewrite" and "reload" a class
instance when using ZODB?

regards



Sorted and reversed on huge dict ?

2006-11-03 Thread vd12005
Hello,

i would like to sort(ed) and reverse(d) the contents of many huge
dictionaries (a single dictionary will contain ~ 15 entries). Keys
are words, values are counts (integers).

i'm wondering if i can keep tens of these in memory, or if i should
process them one after the other.

moreover, i'm interested in saving these as values and keys, sorted and
reversed (i.e. most frequent first). i can do it with the unix sort
command, but i wonder how i should do it in python in a memory-friendly
way.

can it be done by using :

from itertools import izip
pairs = izip(d.itervalues(), d.iterkeys())
for v, k in reversed(sorted(pairs)):
    print k, v

or will it be the same as building the whole list ?
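
On that last question: `sorted()` has to hold every pair at once before it can return, so a full list is built whichever iterator feeds it. The sketch below, in Python 3 syntax with made-up sample counts, shows a full ordering by count and, as an alternative that only helps when just the most frequent entries are needed, `heapq.nlargest`.

```python
import heapq
from operator import itemgetter

# Hypothetical word counts standing in for one of the large dictionaries.
counts = {"the": 120, "cat": 7, "sat": 7, "dog": 3, "end": 1}

# Full ordering, most frequent first: sorted() materializes one list of pairs.
by_count = sorted(counts.items(), key=itemgetter(1), reverse=True)

# Only the top entries: nlargest tracks n candidates instead of sorting everything.
top2 = heapq.nlargest(2, counts.items(), key=itemgetter(1))

for word, count in by_count:
    print(word, count)
```

Sorting with `key=itemgetter(1)` avoids building swapped (value, key) tuples, although words with equal counts are then not ordered among themselves by key as they are in the snippet above.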



Re: Sorted and reversed on huge dict ?

2006-11-03 Thread vd12005
thanks for your replies :)

so i have just tried it, even if i think it will not run to the end => i
was wrong: it is around 1.400.000 entries per dict...

but maybe it can be done if the keys of the dicts are not duplicated in
memory (as all dicts will have the same keys, with different (count)
values)?

the machine has 4 GB of RAM. is there a good way to know how much RAM is
used directly from python (or should i rely on 'top' and other unix
commands)? so far around 220 MB is used for around 200.000 words handled
in 15 dicts.
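
For measuring memory from inside Python, later versions ship a tool that did not exist when this was posted: the `tracemalloc` module, added in Python 3.4. The following is a minimal sketch under that assumption, with an invented workload of 15 dicts sharing 10,000 keys. It counts only allocations made by Python while tracing is on, so the figure will be lower than the process size shown by `top`.

```python
import sys
import tracemalloc

tracemalloc.start()

# Hypothetical workload: 15 dicts sharing the same 10,000 keys.
words = ["word%d" % i for i in range(10000)]
dicts = [dict.fromkeys(words, 0) for _ in range(15)]

current, peak = tracemalloc.get_traced_memory()   # both in bytes
tracemalloc.stop()

print("traced now: %d bytes, peak: %d bytes" % (current, peak))
# getsizeof reports one container only, not the keys and values it refers to.
print("one dict object: %d bytes" % sys.getsizeof(dicts[0]))
```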



Re: Sorted and reversed on huge dict ?

2006-11-03 Thread vd12005

so it is still unfinished :) around 1 GB for 1033268 words :) (as
reported by the unix top command)

Paul > i was also thinking of doing it like that, by piping to 'sort |
uniq -c | sort -nr', but i'm pleased if Python can handle it. (well,
maybe Python is slower? will check later...)

Klaas > i did not know about the intern construct, i will have a look,
but when googling i first found a post from Raymond Hettinger, so i'm
going to mess up my mental space :)
http://mail.python.org/pipermail/python-dev/2003-November/040433.html

best regards.



Re: Sorted and reversed on huge dict ?

2006-11-04 Thread vd12005

so it worked :) and took 12h4:56 for 15 dicts with 1133755 keys. i do
not know how much RAM was used, as i was not always monitoring it.

thanks for all the replies. i'm going to study intern and the other
suggestions, and i hope someone will also suggest a pythonic way to
measure memory usage :)

best.



Re: Sorted and reversed on huge dict ?

2006-11-04 Thread vd12005

just to be sure about intern, it is used as :

>>> d, f = {}, {}
>>> s = "this is a string"
>>> d[intern(s)] = 1
>>> f[intern(s)] = 1

so the keys in d and f are actually pointers to the same interned
string? if so, it could be interesting.

>>> print intern.__doc__
intern(string) -> string

``Intern'' the given string.  This enters the string in the (global)
table of interned strings whose purpose is to speed up dictionary lookups.
Return the string itself or the previously interned string object with the
same value.

regarding the comment in the docs, "(Changed in version 2.3: Interned
strings used to be immortal, but you now need to keep a reference to the
interned string around.)": if the string is used as a key, it is still
referenced, am i right?
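
Both points can be checked directly. A dict holds a reference to each of its keys, so an interned key stays alive for as long as the dict does. The sketch below uses Python 3 syntax, where the built-in `intern` has moved to `sys.intern`; the two strings are assembled at run time purely so that they start out as separate objects.

```python
import sys

# Build two equal strings at run time so they start as separate objects.
s1 = "".join(["this is ", "a string"])
s2 = "".join(["this is a ", "string"])

d, f = {}, {}
d[sys.intern(s1)] = 1
f[sys.intern(s2)] = 1

key_d = next(iter(d))
key_f = next(iter(f))
print(s1 == s2, s1 is s2)     # equal values, distinct objects
print(key_d is key_f)         # both dicts refer to one interned string
```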
