On Wed, 2009-08-12 at 18:42 +0100, Ed W wrote:
> > Something like that. In dbox you have one storage directory containing
> > all mailboxes' mails (so that copying can be done by simple index
> > updates). Then you have a bunch of files, each about n MB (configurable,
> > 2 MB by default). Expunging initially only marks the message as expunged
> > in index. Then later (or immediately, configurable) you run a cronjob
> > that goes through all dboxes and actually removes the used space by
> > recreating those dbox files.
> >
>
> Yeah, sounds good.
>
> You might consider some kind of "head optimisation", where we can
> already assume that the latest chunk of mails will be noisy and have a
> mixture of deletes/appends, etc. Typically mail arrives, gets
> responded to, gets deleted quickly, but I would *guess* that if a mail
> survives for XX hours in a mailbox then likely it's going to continue
> to stay there for quite a long time until some kind of purge event
> happens (user goes on a purge, archive task, etc)

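To make the quoted scheme concrete, here is a rough sketch of "expunge only marks, a later purge rewrites the files". It is an in-memory toy, not Dovecot code: the names DboxFile, Store, expunge and purge are all made up for illustration, and a real dbox keeps the expunged flag in its index files and rewrites files on disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DboxFile:
    """Hypothetical stand-in for one dbox storage file: message ID -> raw mail."""
    messages: Dict[int, bytes] = field(default_factory=dict)

    def size(self) -> int:
        return sum(len(raw) for raw in self.messages.values())


@dataclass
class Store:
    """Hypothetical storage directory: a list of files plus an expunge index."""
    files: List[DboxFile] = field(default_factory=list)
    expunged: Set[int] = field(default_factory=set)

    def expunge(self, msg_id: int) -> None:
        # Cheap step: only the index changes, the file itself is untouched.
        self.expunged.add(msg_id)

    def purge(self) -> int:
        """Recreate every file holding expunged mail; return bytes reclaimed."""
        reclaimed = 0
        for f in self.files:
            dead = {i for i in f.messages if i in self.expunged}
            if not dead:
                continue  # nothing to reclaim, so this file is not rewritten
            reclaimed += sum(len(f.messages[i]) for i in dead)
            # "Recreating" the file: keep only the surviving messages.
            f.messages = {i: raw for i, raw in f.messages.items() if i not in dead}
        self.expunged.clear()
        return reclaimed
```

Calling expunge() during the day costs almost nothing; the space only comes back when purge() runs, e.g. from a nightly cronjob. Files without expunged mail are skipped, which is why long-lived mail sitting in older files would rarely be rewritten.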
If disk space usage isn't such a huge problem, I think the nightly
purges solve this issue too. During the day the user may get mails and
delete them, and at night the deleted mails are purged. Perhaps it could
help a bit if new mails were all stored in separate file(s) and then
appended at night to some larger existing file, but that optimization
can be left until later. :)

> Oh, have you considered some "optional" api calls in the storage API?
> The logic might be to assume that someone wanted to do something
> clever and split the message up in some way, eg store headers
> separately to bodies or bodies carved up into mime parts. The
> motivation would be if there was a certain access pattern to optimise.
> Eg for an SQL database it may well be sensible to split headers and
> the message body in order to optimise searching - the current API may
> not take advantage of that?

Well, files have paths. I think the storage backend can determine from
the path what type the data is. So if you're writing to
mails/foo/bar/123 it means you're storing a message with ID 123 to
mailbox "foo/bar". The backend could then internally parse the message
and store its header/body/mime parts separately.
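
A minimal sketch of that idea, assuming the path layout from the example above (a "mails" root, then the mailbox name, then a numeric message ID). The function names and the dict used as a backend are hypothetical, and only the header/body split is shown, not MIME parts:

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple


class MailPath(NamedTuple):
    mailbox: str
    msg_id: int


def parse_mail_path(path: str) -> Optional[MailPath]:
    """Return (mailbox, ID) for paths like 'mails/foo/bar/123', else None."""
    parts = path.split("/")
    # Need the 'mails' root, at least one mailbox component and a numeric ID.
    if len(parts) < 3 or parts[0] != "mails" or not all(parts[1:-1]):
        return None
    if not re.fullmatch(r"[0-9]+", parts[-1]):
        return None
    return MailPath("/".join(parts[1:-1]), int(parts[-1]))


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw message into header and body at the first empty line."""
    pieces = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    return pieces[0], (pieces[1] if len(pieces) == 2 else b"")


def store_file(path: str, raw: bytes, backend: Dict) -> None:
    """Hypothetical backend write: mails are split, anything else is opaque."""
    target = parse_mail_path(path)
    if target is None:
        backend[path] = raw  # not a mail path, store the data as-is
        return
    header, body = split_message(raw)
    backend[(target.mailbox, target.msg_id, "header")] = header
    backend[(target.mailbox, target.msg_id, "body")] = body
```

The caller keeps using a plain "write this file" call; whether the header and body end up in separate SQL columns or in one blob is entirely the backend's decision, so no extra optional API calls are needed for this case.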