News123 wrote:
Hi.

I started playing with PIL.

I'm performing operations on multiple images and would like a compromise
between speed and memory requirements.

The fast approach would load all images upfront and then create multiple
result files. The problem is that I do not have enough memory to load
all files.

The slow approach is to load each potential source file only when it is
needed and to release it immediately afterwards (leaving it up to the gc
to free memory when needed).



My question is whether there is any way to tell Python that certain
objects may be garbage collected if needed, and to ask Python at a
later time whether the object has been collected (image has to be
reloaded) or not (image does not have to be reloaded).


# Fastest approach:
imgs = {}
for fname in all_image_files:
    imgs[fname] = Image.open(fname)
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        img = do_somethingwith(img,imgs[img_file])
    img.save()


# Slowest approach:
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        src_img = Image.open(img_file)
        img = do_somethingwith(img,src_img)
    img.save()



# What I'd like to do is something like:
imgs = GarbageCollectable_dict()
for creation_rule in all_creation_rules():
    img = Image.new(...)
    for img_file in creation_rule.input_files():
        if img_file in imgs: # if I'm lucky the object is still there
            src_img = imgs[img_file]
        else:
            src_img = Image.open(img_file)
            imgs[img_file] = src_img # remember it for later rules
        img = do_somethingwith(img,src_img)
    img.save()



Is this possible?

Thanks in advance for an answer or any other ideas on
how I could do smart caching without hogging all the system's
memory.


You don't say what implementation of Python, nor on what OS platform. Yet you're asking how to influence that implementation.

In CPython, version 2.6 (and probably most other versions, but somebody else would have to chime in) an object is freed as soon as its reference count goes to zero. So the garbage collector is only there to catch cycles, and it runs relatively infrequently.
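To make the reference-counting point concrete, here is a minimal sketch, assuming CPython. The Payload class is a hypothetical stand-in for a large object such as a loaded image, and weakref.ref is used only as a probe to observe when the object goes away.

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as a loaded image."""


freed = []  # filled in when the weak reference's callback fires

obj = Payload()
# The callback is invoked when the referent is about to be finalized.
probe = weakref.ref(obj, lambda ref: freed.append("freed"))

alias = obj   # a second strong reference
del obj       # one strong reference remains, so the object stays alive
still_alive = probe() is not None

del alias     # the last strong reference is gone
# On CPython the object is released right here, without waiting for
# the cyclic garbage collector. Other implementations may differ.
gone_immediately = probe() is None
```

On other implementations the last check may well be false until a collection happens to run.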

So, if you keep a reference to an object, it'll not be freed. Theoretically, you can use the weakref module to keep a reference without inhibiting the garbage collection, but I don't have any experience with the module. You could start by studying its documentation. But probably you want a weakref.WeakValueDictionary. Use that in your third approach to store the cache.
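A rough sketch of what a weakref.WeakValueDictionary cache could look like follows. FakeImage, load_image and get_image are hypothetical names made up for illustration; in real code load_image would presumably wrap Image.open(fname).

```python
import weakref


class FakeImage:
    """Hypothetical stand-in for a PIL image; remembers its source file."""

    def __init__(self, fname):
        self.fname = fname


load_count = 0  # counts how often the (pretend) expensive load runs


def load_image(fname):
    """Hypothetical loader; real code would call Image.open(fname) here."""
    global load_count
    load_count += 1
    return FakeImage(fname)


cache = weakref.WeakValueDictionary()


def get_image(fname):
    """Return the cached image if it is still alive, otherwise reload it."""
    img = cache.get(fname)   # None if never loaded or already collected
    if img is None:
        img = load_image(fname)
        cache[fname] = img   # weak entry: does not keep img alive by itself
    return img


a = get_image("a.png")       # first load
b = get_image("a.png")       # cache hit, because `a` still holds the object
same_object = a is b

del a, b                     # no strong references remain
c = get_image("a.png")       # on CPython the entry is already gone: reload
```

Note the consequence on CPython: because the entry vanishes as soon as the last strong reference is dropped, this cache only helps while some other part of the program is still holding the image. It does not keep images around "until memory gets tight".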

If you're using IronPython or Jython, or one of many other implementations, the rules will be different.

The real key to efficiency is usually managing locality of reference. If a given image is going to be used for many output files, you might try to do all the work with it before going on to the next image. That might mean searching all_creation_rules for the rules which reference the file you've currently loaded. Either way, measurement is key.
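One way to find the rules that reference a given file is to invert the rules into an index once, up front. The sketch below uses a made-up rules dictionary (output name mapped to its input files) in place of all_creation_rules(). It assumes the partial output images can all be held in memory and that the order in which inputs are applied to an output does not matter, neither of which may be true in your case.

```python
from collections import defaultdict

# Hypothetical rules: output file -> input files it is built from.
rules = {
    "out1.png": ["a.png", "b.png"],
    "out2.png": ["b.png", "c.png"],
    "out3.png": ["b.png"],
}

# Invert the mapping: input file -> outputs that need it.
users = defaultdict(list)
for out_name, inputs in rules.items():
    for fname in inputs:
        users[fname].append(out_name)

# Visit each input exactly once, most widely shared first
# (ties broken by file name so the order is deterministic).
order = sorted(users, key=lambda f: (-len(users[f]), f))

for fname in order:
    # Real code would load fname once here, apply it to every output
    # listed in users[fname], and then drop the reference.
    pass
```

With this layout each source image is opened once, at the cost of keeping every unfinished output alive until its last input has been processed.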


--
http://mail.python.org/mailman/listinfo/python-list
