On Apr 22, 11:09 am, Steven D'Aprano <[EMAIL PROTECTED]> wrote:
> On Sat, 21 Apr 2007 20:13:44 -0700, Prateek wrote:
> > I have a bit of a specialized request.
>
> > I'm reading a table of strings (specifically fixed length 36 char
> > uuids generated via uuid.uuid4() in the standard library) from a file
> > and creating a set out of it.
> > Then my program is free to make whatever modifications to this set.
>
> > When I go back to save this set, I'd like to be able to only save the
> > new items.
>
> This may be a silly question, but why? Why not just save the modified set,
> new items and old, and not mess about with complicated transactions?

I tried just that. Basically I ignored all the difficulties of difference
calculation and just overwrote the entire tablespace with the new set.
At about 3000 entries per file (and 3 files), along with all the indexing
etc., just the extra I/O cost me about 28% in run time: I got 3000 entries
committed in 53s with difference calculation but in 68s when writing the
whole thing.

>
> After all, you say:
>
> > PS: Yes - I need blazing fast performance - simply pickling/unpickling
> > won't do. Memory constraints are important but definitely secondary.
> > Disk space constraints are not very important.
>
> Since disk space is not important, I think that you shouldn't care that
> you're duplicating the original items. (Although maybe I'm missing
> something.)
>
> Perhaps what you should be thinking about is writing a custom pickle-like
> module optimized for reading/writing sets quickly.

I already did this. I'm not using the pickle module at all. Since I'm
guaranteed that my sets contain a variable number of fixed length
strings, I write a header at the start of each tablespace (using
struct.pack) marking the number of rows, and then simply save each string
one after the other without delimiters. I can do this simply by issuing
"".join(list(set_in_question)) and then saving the string after the
header. There are a few more things that I handle (such as automatic
tablespace overflow).

Prateek
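
A minimal sketch of the kind of layout described above (a struct-packed row count followed by fixed-length strings with no delimiters), plus a set difference to find the new items. This is not the actual code from the post. The header format (a single little-endian unsigned 32-bit count), the function names dump_table, load_table and append_rows, and the use of Python 3 bytes held in memory instead of a real file are all assumptions made for illustration. The append step only covers added rows, not deletions or tablespace overflow.

```python
import struct
import uuid

ROW_LEN = 36                    # len(str(uuid.uuid4()))
HEADER = struct.Struct("<I")    # assumed header: one little-endian uint32 row count


def _encode_rows(rows):
    """Concatenate fixed-length ASCII rows without delimiters."""
    rows = list(rows)
    if any(len(row) != ROW_LEN for row in rows):
        raise ValueError("every row must be exactly %d characters" % ROW_LEN)
    return len(rows), "".join(rows).encode("ascii")


def dump_table(rows):
    """Serialise a set of rows as header + concatenated rows."""
    count, body = _encode_rows(rows)
    return HEADER.pack(count) + body


def load_table(data):
    """Read the row count from the header and slice the rows back out."""
    (count,) = HEADER.unpack_from(data, 0)
    start = HEADER.size
    body = data[start:start + count * ROW_LEN].decode("ascii")
    return {body[i:i + ROW_LEN] for i in range(0, len(body), ROW_LEN)}


def append_rows(data, new_rows):
    """Return a table with new_rows appended and the header count updated."""
    (count,) = HEADER.unpack_from(data, 0)
    added, body = _encode_rows(new_rows)
    used = HEADER.size + count * ROW_LEN
    return HEADER.pack(count + added) + data[HEADER.size:used] + body


# Usage: load a table, modify the set, then save only the difference.
original = {str(uuid.uuid4()) for _ in range(5)}
table = dump_table(original)

current = load_table(table)
current.update(str(uuid.uuid4()) for _ in range(3))

new_items = current - original          # the difference calculation
table = append_rows(table, new_items)   # only the new rows are written
```

With a real file, the same idea would presumably mean seeking to the end to write the new rows and then rewriting just the header, so that the I/O scales with the number of new items and not with the size of the whole set.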