Re: a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2009-12-10 Thread Klauss
On Dec 9, 11:58 am, Valery  wrote:
> Hi all,
>
> Q: how to organize parallel accesses to a huge common read-only Python
> data structure?
>
> Details:
>
> I have a huge data structure that takes >50% of RAM.
> My goal is to have many computational threads (or processes) that can
> have an efficient read-access to the huge and complex data structure.
>
> 
>
> 1. multi-processing
>  => a. child-processes get their own *copies* of huge data structure
> -- bad and not possible at all in my case;

How's the layout of your data, in terms of # of objects vs. bytes used?
Just to have an idea of the overhead involved in refcount
externalization (you know, what I mentioned here:
http://groups.google.com/group/unladen-swallow/browse_thread/thread/9d2af1ac3628dc24
)


Re: a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2009-12-14 Thread Klauss
On Dec 11, 11:00 am, Antoine Pitrou  wrote:
> I was going to suggest memcached but it probably serializes non-atomic
> types.
Atomic types as well.
memcached communicates through sockets[3] (albeit possibly Unix
sockets, which are faster than TCP ones).
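
Just to make that cost concrete, a minimal sketch (assuming the
third-party python-memcached client; any client works the same way):
every value is serialized on set() and deserialized again on get(),
and each call crosses a socket.

import memcache

mc = memcache.Client(['127.0.0.1:11211'])
mc.set('record:1', {'state': 3, '1': '10'})  # pickled, then sent over the wire
rec = mc.get('record:1')                     # read over the wire, then unpickled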

multiprocessing has shared memory schemes, but it does a lot of internal
copying (it uses ctypes)... and they are particularly unhelpful when your
shared data is highly structured, since you can't share objects, only
primitive types.
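
For instance, this is roughly all you get (a Python 2 sketch, names
purely illustrative): flat arrays of C primitives, shared without
pickling, but nothing like a shared dict of dicts.

from multiprocessing import Process, Array, Value

def worker(counts, total):
    # reads/writes go straight to the shared memory block, no pickling
    with total.get_lock():
        total.value += sum(counts[:])

if __name__ == '__main__':
    counts = Array('i', [1, 2, 3, 4])   # shared array of C ints
    total = Value('d', 0.0)             # shared C double
    p = Process(target=worker, args=(counts, total))
    p.start(); p.join()
    print total.value                   # 10.0

Structured objects have to be flattened into primitives like these, or
serialized on every access.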


I finished a patch that pushes reference counters into packed pools.
It has lots of drawbacks, but it manages to solve this particular
problem, if the data is predominantly non-numeric (ie: lists and dicts,
as mentioned before). Of the drawbacks, perhaps the biggest is a larger
memory footprint - yep... I don't believe there's anything that can be
done to change that. It can be optimized to make the overhead a
little smaller, though.
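
To give an idea of the approach (a toy Python model only, hypothetical
names, not the actual C-level patch): the counters live in a separate,
densely packed pool, so an incref/decref dirties the pool's pages
instead of the object's own pages, and the pages holding the read-only
objects can stay copy-on-write shared after fork().

import array

class RefcountPool(object):
    def __init__(self):
        self.counts = array.array('l')   # densely packed counters
        self.slot_of = {}                # object id -> slot in the pool

    def incref(self, obj):
        slot = self.slot_of.get(id(obj))
        if slot is None:
            slot = len(self.counts)
            self.slot_of[id(obj)] = slot
            self.counts.append(0)
        self.counts[slot] += 1           # the pool gets written, not the object

    def decref(self, obj):
        self.counts[self.slot_of[id(obj)]] -= 1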

This test code[1] consumes roughly 2G of RAM on an x86_64 with Python
2.6.1. With the patch, it *should* use 2.3G of RAM (as reported by
its output), so you can see the footprint overhead... but better page
sharing makes it consume about 6 times less - roughly 400M, which is
the size of the dataset. Ie: near-optimal data sharing.

This patch[2] has other optimizations intermingled - if there's
interest in the patch without those (which are both unproven and
nonportable) I could try to separate them. I will have to anyway, to
upload it for inclusion into CPython (if I manage to fix the
shortcomings, and if it gets approved).

The most important shortcomings of the refcount patch are:
 1) Tripled memory overhead of reference counting. Before, it was a
single Py_ssize_t per object. Now, it's two pointers plus the
Py_ssize_t. This could perhaps be optimized (by getting rid of the
arena pointer, for instance).
 2) Increased code output for Py_INCREF/DECREF. It's small, but it
adds up to a lot. Timings on test_decimal.py (a small numeric
benchmark I use, which might not be representative at all) show a 10%
performance loss in CPU time. Again, this might be optimized with a
lot of work and creativity.
 3) Breaks binary compatibility, and in weird cases source
compatibility with extension modules. The PyObject layout is different, so
statically-initialized variables need to stick to using CPython's
macros (I've seen cases where they don't), and code should use
Py_REFCNT() for accessing the refcount, but many just do ob->ob_refcnt,
which will break with the patch.
 4) I'm also not really sure (haven't tested) what happens when
CPython runs out of memory - I tried real hard not to segfault, and
even to recover nicely, but you know how hard that is...

[3] 
http://code.google.com/p/memcached/wiki/FAQ#How_does_it_compare_to_a_server_local_cache?_(PHP%27s_APC,_mm
[2] http://www.deeplayer.com/claudio/misc/Python-2.6.1-refcount.patch
[1] test code below

import time
from multiprocessing import Pool

def usoMemoria():
    # Sum RSS (in KB) for this process and all of its children,
    # as reported by ps.
    import os
    import subprocess
    pid = os.getpid()
    cmd = "ps -o vsz=,rss=,share= -p %s --ppid %s" % (pid, pid)
    p = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    info = p.stdout.readlines()
    s = sum(int(r) for v, r, s in map(str.split, map(str.strip, info)))
    return s

def f(_):
    # my sophisticated formula goes here
    return sum(int(x) for d in huge_global_data for x in d if x != "state")

if __name__ == '__main__':
    # Build the shared (read-only) dataset in the parent before the
    # worker processes are forked.
    huge_global_data = []
    for i in xrange(50):
        d = {}
        d[str(i)] = str(i*10)
        d[str(i+1)] = str(i)
        d["state"] = 3
        huge_global_data.append(d)

    p = Pool(7)
    res = list(p.map(f, xrange(20)))

    print "%.2fM" % (usoMemoria() / 1024.0)



Re: a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

2010-01-07 Thread Klauss
On Dec 31 2009, 6:36 pm, garyrob  wrote:
> One thing I'm not clear on regarding Klauss' patch. He says it's
> applicable where the data is primarily non-numeric. In trying to
> understand why that would be the case, I'm thinking that the increased
> per-object memory overhead for reference-counting would outweigh the
> space gains from the shared memory.
>
> Klauss's test code stores a large number of dictionaries which each
> contain just 3 items. The stored items are strings, but short ones...
> it looks like they take up less space than double floats(?).
>
> So my understanding is that the point is that the overhead for the
> dictionaries is big enough that the patch is very helpful even though
> the stored items are small. And that the patch would be less and less
> effective as the number of items stored in each dictionary became
> greater and greater, until eventually the patch might use more
> space for reference counting than it saved by shared memory.

Not really.
The real difference is that numbers (ints and floats) are allocated
out of small contiguous pools. So even if a great percentage of those
objects would remain read-only, there are probably holes in those pools
left by the irregular access pattern during initialization, and those
holes would be written to eventually as the pool gets used.

In essence, those pools aren't read-only, for reasons other than
reference counting.

Dictionaries, tuples and lists (and many other types) don't exhibit
that behavior.
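
A rough Python 2 illustration of that pool reuse (a CPython
implementation detail, not guaranteed behaviour): freed floats go back
on a free list, so a newly created float usually lands in the hole the
old one left - i.e. the pool's memory keeps getting rewritten even
though "your" numbers never change.

a = 3.14
hole = id(a)
del a                  # leaves a hole in the float pool
b = 2.71
print id(b) == hole    # usually True on CPython 2.x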