On 27 Oct, 17:10, Bryan <bryanv...@gmail.com> wrote: > I'm designing a system and wanted to get some feedback on a potential > performance problem down the road while it is still cheap to fix. > > The system is similar to an accounting system where a system tracks > "Things" > which move between different "Buckets". The system answers these > questions: > - How many Things are in this Bucket today? > - Move Things from Bucket1 to Bucket2... > - Now how many Things are in each Bucket? > > So each time a Thing is moved between Buckets, I imagine a DB row like > this: > | id | thingId | toBucket | fromBucket | qty | > > Then to find how many Things are in a certain Bucket: > 1. Start with initial qty in the bucket at the beginning of time > 2. *Add* all qty moved *to* the bucket since beginning of time > 3. *Subtract* all qty moved *from* the bucket since beginning of time > > Simple system to "account" for Things. > > My problem is this. This design will inherantly get slower as time > goes on. > As the number of rows recording a transfer between buckets increases, > the query > to see how many Things are in a Bucket will get slower. I experience > this when > I use gnucash (which I love). I don't do "closing entries" at the end > of the > year, so each account has every transaction I have every made. I see > it getting > slower. It is nothing I am going to do anything about, because it is > still fast > enough for me. But I have to wonder how big companies with thousands > of > transactions a day do this? > > One solution would be to do a "closing entry" at certain periods in > time, so old > info would be archived. Each bucket would start the new time period > with a > balance of Things equal to what it was at the point in time we > "closed". > > How else to keep a record of every transaction, but not have the speed > of the > question "How many Things in Bucket x" depend on looking @ every > transaction > record ever made?
As well as what Jonathan has said, the following might be worth researching: () "Table Partitioning" - a specific example for postgres is here: http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html. (Although other DB's support similar techniques) () Shared Nothing Architecture (aka. 'Sharding') with appropriate 'buckets' which is massively scalable, but I'm guessing overkill :) http://en.wikipedia.org/wiki/Shared_nothing_architecture - IIRC SQLAlchemy has a basic implementation of this in its examples. Cheers, Jon. -- http://mail.python.org/mailman/listinfo/python-list