I think those best-practices threads are a treat to follow (we might even consider archiving some of them in a sort of best-practices FAQ), so here's one more.

In coding a game asset server I want to keep a large number of file revisions of varying sizes (1 KB to 50 MB) inside the database.

Naturally I want to avoid having to allocate whole 50 MB buffers too often.

I am also throwing around rsync-style deltas (with librsync), which I run through functions to generate new objects. The librsync functions have an elegant buffer-oriented approach to reading and writing data, so I need a way to read through my objects inside the database, again without having them fully allocated in RAM too often.

I tried browsing around but couldn't find any good discussion of this (or just plain good advice), though that may be due to Google-impairment on my part.

I see several options:


1) Bytea column. Seems the cleanest solution, but:
*) It involves lots of escaping and unescaping, with the associated allocation of huge buffers.
*) I seem to remember reading in a discussion on the [hackers] list that the TOAST column substring operation had been optimized, so that partial fetches might be viable (see the sketch below).
*) Incremental appends (a series of "update file set data = data || moredata where id=666") probably won't scale nicely.
*) However, it's fully ACID,
*) and recommended by the docs, if that's a good reason.
*) TOAST compression. Can't quite figure out if this kicks in, but in many cases it would save a lot of disk space and disk access.
*) Passing them as arguments to the librsync functions may be a hassle.
Open questions: I wonder what the timeframe is for this to become viable. It seems two things are being worked on that would fix it: wire-protocol changes to avoid having to escape all communication, and methods for fast incremental updates. Both seem to be at least a year away (Pg 7.5, and someone even suggested jumping to 8.0 with the wire-protocol changes).
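
To make the partial-fetch and incremental-append points concrete, here's roughly what I mean, sketched against the same "file" table as in the update above (names made up; I'm assuming the substring() trick only helps when the column's storage is set to EXTERNAL, i.e. uncompressed):

    CREATE TABLE file (
        id   serial PRIMARY KEY,
        data bytea
    );
    -- Keep the bytea uncompressed so substring() can fetch a slice without
    -- detoasting the whole value (as far as I understand the optimization).
    ALTER TABLE file ALTER COLUMN data SET STORAGE EXTERNAL;

    -- Partial fetch of a 64 KB window:
    SELECT substring(data FROM 1 FOR 65536) FROM file WHERE id = 666;

    -- Incremental append: rewrites the whole value every time, which is why
    -- I doubt it scales.
    UPDATE file SET data = data || 'moredata'::bytea WHERE id = 666;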


2) Bytea column with many rows per file.
*) Also some escaping and unescaping, but with small buffers.
*) Simple and fast to do partial inserts/selects (see the sketch after this list).
*) Full ACID (and with a couple of triggers, also easy to keep completely correct).
*) Butt-ugly solution, but it's what we used to do with ye olde versions of MSQL.
*) Compression, but probably smaller space savings with smaller chunks.
Open questions: what should the chunk size be? I would imagine 5-20 KB per row.
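
For reference, this is roughly the chunked layout I have in mind (table name and the 8 KB chunk size are made up for the example):

    CREATE TABLE file_chunk (
        file_id  integer NOT NULL,
        chunk_no integer NOT NULL,  -- 0-based, 8192 bytes per chunk
        data     bytea   NOT NULL,
        PRIMARY KEY (file_id, chunk_no)
    );

    -- Appending is just inserting the next chunk; nothing already stored
    -- gets rewritten, and no huge buffers are needed.
    INSERT INTO file_chunk (file_id, chunk_no, data)
        VALUES (666, 42, 'moredata'::bytea);

    -- Reading bytes 100000..200000 means fetching chunks 12..24 and
    -- trimming the ends on the client:
    SELECT chunk_no, data
      FROM file_chunk
     WHERE file_id = 666 AND chunk_no BETWEEN 100000/8192 AND 200000/8192
     ORDER BY chunk_no;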


3) Large Objects.
*) Requires a special interface to talk to, not that it's so bad.
*) Need a trigger to garbage-collect them (simple to do, though; sketched below).
*) Fully ACID.
*) I've actually implemented the librsync functions using this already, and it seems to work nicely with large files.
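
The garbage-collection trigger amounts to no more than this (a sketch with made-up names, assuming plpgsql is installed; the data itself goes in through the lo_* interface, lo_creat/lo_open/etc.):

    CREATE TABLE file_revision (
        id      serial PRIMARY KEY,
        content oid NOT NULL   -- OID of the large object holding the data
    );

    -- Unlink the large object when the row referencing it goes away.
    CREATE FUNCTION file_revision_unlink() RETURNS trigger AS '
    BEGIN
        PERFORM lo_unlink(OLD.content);
        RETURN OLD;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER file_revision_gc
        AFTER DELETE ON file_revision
        FOR EACH ROW EXECUTE PROCEDURE file_revision_unlink();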


4) External files. I really don't like the idea of this.
*) As high-performance as anything.
*) Need an alternate mechanism to access the server, with security implications etc.
*) Complications trying to maintain more-or-less full ACIDity.
Open questions: what tricks should I use to keep this ACID-ish? I won't be updating content once it's committed, so just keeping the files read-only most of the time would do most of the trick. Also, I imagine


I'm going with 3) for now, but I'm even tempted by 2) when looking this over.

At one point I was playing around with 4) and made a plperlu server running inside the database to ease this. Security was implemented by the client receiving a ticket from the database when requesting to create a file, and then having to present this ticket before being allowed to send any data to the Perl server. Not an unclever system, I think, and one I'd be happy to share.

I wonder what other people are doing and if anyone has other arguments.



David Helgason,
Over the Edge Entertainments

