Martin MOKREJŠ wrote:
> Duncan Booth wrote:
>> Almost anything you do copies references.
>
> But what does this?:
>
> x = 'xxxxx'

Copies a reference to the existing string 'xxxxx' and stores the new
reference in the variable x. If it's a global variable then this will
involve creating a new reference to the existing string 'x' (the name,
which becomes the dictionary key) and storing both references in the
globals() dictionary for the module.

> a = x[2:]

Creates a new string by slicing the one referenced by x, then creates a
new reference to that string and stores it in the variable 'a' (as
above), then releases the first reference to the new string.

> b = z + len(x)

Creates a new reference to 'z'; creates a new reference to 'x'; creates
a new reference to 'len'; calls the function (which involves creating
and releasing a tuple); releases the references to 'len' and 'x';
invokes the add operator; releases the reference to 'z'; then creates a
new reference to the result while storing it in the variable 'b'. The
result of the addition could be a new object, or it may just be a new
reference created to an existing object.

> dict[a] = b

If 'dict' is a variable of type 'dict' (and it's a bad idea to use the
type name as a variable name), then this creates new references to each
of 'a' and 'b' and stores them in the object 'dict'.

As I said, almost everything you do involves creating and releasing
references to existing objects.
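If you want to watch this happening, sys.getrefcount() reports the
number of references the interpreter currently holds on an object (the
count includes a temporary reference created by the call itself). A
quick sketch; the exact numbers will vary between interpreter versions
because of interning and internal caches:

    import sys

    x = 'xxxxx'
    print sys.getrefcount(x)   # some baseline count
    y = x                      # binding another name adds a reference
    print sys.getrefcount(x)   # one higher than before
    del y                      # unbinding releases the reference
    print sys.getrefcount(x)   # back to the baseline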
>> The member variables are all still accessible as member variables
>> until you run your loop at the end to clear them, so no way could
>> Python release them.
>
> OK, I wanted to know if there's some assignment using a reference
> which makes the internal garbage collector not recycle the memory,
> as, for example, when the target dictionary still keeps a reference
> to the temporary dictionary.

So long as the object is still accessible it won't be freed. You aren't
making anything new reference the temporary dictionary, but so long as
the old references are still there it can't free the memory.

>> Even if they are Shelf objects, I see no reason here why you have to
>
> I gathered from previous discussion it's faster to use bsddb directly,
> so no shelve.

Until you know that the speed of accessing the file is a problem, I
would go for the simple-to-use solution. At present there is no
evidence that the speed of accessing the file is going to be a problem:
your problems seem to stem from excessive memory use, which means that
page faults are likely to swamp everything else as far as execution
time is concerned.

>> What on earth is the point of an explicit call to __add__? If Guido
>> had meant us to use __add__ he wouldn't have created '+'.
>
> To invoke addition directly on the object. It's faster than letting
> Python figure out that I'm summing int() plus int(). It definitely
> helped a lot when using Decimal(a) + Decimal(b), where I got rid of
> thousands of calls to Decimal.__new__, __init__ and I think a few
> other methods of decimal as well; I think getcontext too.

The same point as above: it is most unlikely that these additions are
going to make a noticeable difference to the running time, so leave
them as easy-to-read additions until you have profiled the code and
determined that they really are a hotspot that needs optimising. The
less time you spend optimising prematurely, the sooner you get a
program that meets your needs.
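If you do want to settle that kind of micro-question, the timeit module
is the tool for it. A rough sketch (the absolute numbers are machine-
and version-dependent; for plain ints the explicit __add__ call is
usually slower, not faster, because of the extra attribute lookup):

    import timeit

    # One million iterations of each statement by default.
    plain = timeit.Timer('a + b', setup='a = 12345; b = 67890')
    explicit = timeit.Timer('a.__add__(b)', setup='a = 12345; b = 67890')
    print plain.timeit()
    print explicit.timeit()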
>> tuple of your values then you don't have to use split or cast the
>> strings
>
> bsddb complains that I can store only a string as a key or value.
> I'd love to store a tuple.
>
> >>> import bsddb
> >>> _words1 = bsddb.btopen('db1.db', 'c')
> >>> _words1['a'] = 1
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/bsddb/__init__.py", line 120, in __setitem__
>     self.db[key] = value
> TypeError: Key and Data values must be of type string or None.
>
> How can I record a number then?

Either by wrapping your database access in code which converts the
values to and from strings, or by using a package which already does
that for you (such as shelve).

> Can I use the mmap() feature on bsddb or any .db file? Most of the
> time I do updates, not inserts! I don't want to rewrite the whole
> 300MB file all the time; I want to update it. What do I need for
> that? To know the maximal length of a string value kept in the .db
> file? Can I get rid of locking support in those huge files?

Don't worry about it: updates won't rewrite the entire file.

> Definitely I can improve my algorithm. But I believe I'll always have
> to work with those huge files.

The point of using something like a bsddb file is that you get a system
which does the hard work for you of figuring out which bits to keep in
memory and which bits to leave on disc. The sort of processing you
showed, which updates the database in a random order but probably
doesn't access every record, should work well with a bsddb file.

The code you show is dumping the contents of the temporary dictionary
into the database and then clearing the dictionary. Why are you waiting
until the dictionaries occupy 300MB before doing this? What happens to
the runtime if you clear out the dictionaries when they occupy 30MB or
3MB?

Questions about what will run fastest are almost impossible to answer
without actually timing the results on some realistic data. Dumping the
dictionaries frequently, while they are comparatively small, will incur
higher runtime overheads, since you will be updating some keys much
more often, but you save by reducing the working set of your process.
If your working set gets too large then expect massive increases in
runtime, so it is worth making sure that you aren't hitting that point
before you consider any other optimisations.
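To make that concrete, here is a minimal sketch of the
flush-at-a-threshold pattern, assuming a word-count-style workload.
FLUSH_AT and add_word are made-up names, the threshold counts entries
rather than megabytes because that is cheap to test, and the values are
converted to and from strings on the way through, since that is all
bsddb will store:

    import bsddb

    FLUSH_AT = 100000      # tune this; it controls your working set

    _words1 = bsddb.btopen('db1.db', 'c')
    counts = {}            # temporary in-memory tallies

    def flush(db, pending):
        # bsddb keys and values must be strings, so convert with str()
        # on the way in and int() when reading back out.
        for word, n in pending.iteritems():
            if db.has_key(word):
                n += int(db[word])
            db[word] = str(n)
        pending.clear()

    def add_word(word):
        counts[word] = counts.get(word, 0) + 1
        if len(counts) >= FLUSH_AT:
            flush(_words1, counts)

    # ... call add_word() for each word, call flush() once more at the
    # end of the run, then _words1.sync() or close() as appropriate.

Trying a few different values of FLUSH_AT against realistic data is the
only reliable way to pick one.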