I think those best-practices threads are a treat to follow (we might even consider archiving some of them in a sort of best-practices FAQ), so here's one more.

In coding a game asset server I want to keep a large number of file revisions of varying sizes (1 KB to 50 MB) inside the database.

Naturally I want to avoid having to allocate whole 50 MB buffers too often.

I am also throwing around rsync-style deltas (with librsync), which I run through functions to generate new objects. The librsync functions have an elegant buffer-oriented approach to reading and writing data, so I need a way to read through my objects inside the database, again without having them fully allocated in RAM too often.

I tried browsing around but couldn't find any good discussion of this (or just plain good advice), though that may be due to Google-impairment on my part.

I see several options:


1) Bytea column. Seems the cleanest solution, but:
*) It involves lots of escaping and unescaping, with the associated allocation of huge buffers.
*) I seem to remember reading in a discussion on the [hackers] list that the TOAST column substring operation had been optimized, so that partial fetches might be viable (see the sketch below).
*) Incremental appends (a series of "update file set data = data || moredata where id=666") probably won't scale nicely.
*) However, it's fully ACID,
*) and recommended by the docs, if that's a good reason.
*) TOAST compression. Can't quite figure out if this kicks in, but in many cases it would save a lot of disk space and disk access.
*) Passing them as arguments to the librsync functions may be a hassle.
Open questions: I wonder what the timeframe is for this to become viable. It seems two things are being worked on that would fix it: wire-protocol changes to avoid having to escape all communication, and methods for fast incremental updates. Both seem to be at least a year away (Pg 7.5, and someone even suggested jumping to 8.0 with the wire-protocol changes).
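
To make the partial-fetch and incremental-append points concrete, here's roughly what I mean, sketched against the same "file" table as in the update above (names made up; I'm assuming the substring() trick only helps when the column's storage is set to EXTERNAL, i.e. uncompressed):

    CREATE TABLE file (
        id   serial PRIMARY KEY,
        data bytea
    );
    -- Keep the bytea uncompressed so substring() can fetch a slice without
    -- detoasting the whole value (as far as I understand the optimization).
    ALTER TABLE file ALTER COLUMN data SET STORAGE EXTERNAL;

    -- Partial fetch of a 64 KB window:
    SELECT substring(data FROM 1 FOR 65536) FROM file WHERE id = 666;

    -- Incremental append: rewrites the whole value every time, which is why
    -- I doubt it scales.
    UPDATE file SET data = data || 'moredata'::bytea WHERE id = 666;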


2) Bytea column with many rows per file.
*) Also some escaping and unescaping, but with small buffers.
*) Simple and fast to do partial inserts/selects (see the sketch after this list).
*) Full ACID (and with a couple of triggers, also easy to keep completely correct).
*) Butt-ugly solution, but it's what we used to do with ye olde versions of MSQL.
*) Compression, but probably smaller space savings with smaller chunks.
Open questions: what should the chunk size be? I would imagine 5-20 KB per row.
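
For reference, this is roughly the chunked layout I have in mind (table name and the 8 KB chunk size are made up for the example):

    CREATE TABLE file_chunk (
        file_id  integer NOT NULL,
        chunk_no integer NOT NULL,  -- 0-based, 8192 bytes per chunk
        data     bytea   NOT NULL,
        PRIMARY KEY (file_id, chunk_no)
    );

    -- Appending is just inserting the next chunk; nothing already stored
    -- gets rewritten, and no huge buffers are needed.
    INSERT INTO file_chunk (file_id, chunk_no, data)
        VALUES (666, 42, 'moredata'::bytea);

    -- Reading bytes 100000..200000 means fetching chunks 12..24 and
    -- trimming the ends on the client:
    SELECT chunk_no, data
      FROM file_chunk
     WHERE file_id = 666 AND chunk_no BETWEEN 100000/8192 AND 200000/8192
     ORDER BY chunk_no;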


3) Large Objects.
*) Requires a special interface to talk to, not that it's so bad.
*) Need a trigger to garbage-collect them (simple to do, though; sketched below).
*) Fully ACID.
*) I've actually implemented the librsync functions using this already, and it seems to work nicely with large files.
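
The garbage-collection trigger amounts to no more than this (a sketch with made-up names, assuming plpgsql is installed; the data itself goes in through the lo_* interface, lo_creat/lo_open/etc.):

    CREATE TABLE file_revision (
        id      serial PRIMARY KEY,
        content oid NOT NULL   -- OID of the large object holding the data
    );

    -- Unlink the large object when the row referencing it goes away.
    CREATE FUNCTION file_revision_unlink() RETURNS trigger AS '
    BEGIN
        PERFORM lo_unlink(OLD.content);
        RETURN OLD;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER file_revision_gc
        AFTER DELETE ON file_revision
        FOR EACH ROW EXECUTE PROCEDURE file_revision_unlink();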


4) External files. I really don't like the idea of this.
*) As high-performance as anything.
*) Need an alternate mechanism to access the server, with security implications etc.
*) Complications trying to maintain more-or-less full ACIDity.
Open questions: what tricks should I use to keep this ACID-ish? I won't be updating content once it's committed, so just keeping the files read-only most of the time would do most of the trick. Also, I imagine


I'm going with 3) for now, but I'm even tempted by 2) when looking this over.

At one point I was playing around with 4) and made a plperlu server running inside the database to ease this. Security was implemented by the client receiving a ticket from the database when requesting to create a file, and then having to present this ticket before being allowed to send any data to the Perl server. Not an unclever system, I think, and one I'd be happy to share.

I wonder what other people are doing and if anyone has other arguments.



David Helgason,
Over the Edge Entertainments

