On Wed, Jun 4, 2014 at 9:21 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> If your input data is uniformly incompressible, that's not too surprising.
> pg_largeobject tuples hold BLCKSZ/4 bytes of data, plus some overhead, so
> the only way that 4 of them will fit on a page is if compression saves
> more than the overhead. You only need a couple percent compression
> savings to make that work, but precompressed data might resist being
> compressed even that much. So you end up with only 3 tuples per page
> which makes it right about 33.33% overhead.
>

I have several filters compressing the data (shuffle, nbit, gzip), and I
could just disable gzip. Would that result in more efficient storage
overall (i.e. fewer total GB used), beyond the per-block utilization bump?
If it does, that would be counterintuitive, and it would use more bandwidth
(which I'm OK with), so it's a strange tradeoff.
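For what it's worth, a quick way to sanity-check the packing on my side
(just reltuples/relpages out of pg_class, so it is only an estimate after
a recent ANALYZE/VACUUM) would be something like:

-- Estimate how many pg_largeobject tuples land on each page.
-- ~3 per page matches the ~33% overhead described above; ~4 would mean
-- the rows compress enough for a fourth tuple to fit.
SELECT relpages,
       reltuples::bigint AS est_tuples,
       round((reltuples / nullif(relpages, 0))::numeric, 2) AS tuples_per_page
FROM pg_class
WHERE relname = 'pg_largeobject';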
> According to the commentary in the source code (which you've evidently
> read to some extent), a smaller block size was chosen to reduce the
> overhead of partial updates of large-object data. You're probably not
> going to do that, but PG doesn't know it. In any case, you can't choose
> a LOBLKSIZE exceeding, or even equaling, the page size; so there's not
> room for a huge change here. According to the above analysis, if you
> want to pack more LO data per page, you'd actually be better off with
> a *smaller* LOBLKSIZE to limit the per-page wastage.
>

I only looked at a header briefly :-). The rest came from the docs:
http://www.postgresql.org/docs/9.3/static/catalog-pg-largeobject.html

> What would likely be the best bet for you is to run a modified build with
> LOBLKSIZE held at 2K and BLCKSZ bumped up to 32K. That would cut the
> wastage from 2K per 8K to 2K per 32K. Operationally, it'd be safer since
> there *is* a pg_control cross-check on BLCKSZ, so you could not
> accidentally start a standard build against the database.
>
> Or you could just live with it. 10GB of disk is cheap insurance against
> human error, and you're definitely taking some risk of human error if
> you run a locally-patched build.
>
>                         regards, tom lane
>

The overhead is noticeable, since storage for me is a little hard to come
by (budget-limited R&D), but I respect that point. I do think something
should be added in the future to bring storage efficiency closer to that
of OS files, though, given that the new large object limit is 4TB per
entry - 30% is hefty overhead. Perhaps multiple large object tables, so
the user can adjust each table's properties, or making the relevant
parameters runtime-specifiable, would allow more efficient storage and
higher utilization. I may experiment with some custom builds in the future
just to see how it goes.

-Jason
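P.S. If I do try a custom build, the rough plan (untested, just a sketch)
would be to configure with --with-blocksize=32 and pin LOBLKSIZE at 2K in
the source (by default it is derived from BLCKSZ), then confirm from SQL
that the running server matches what I think I compiled:

-- block_size reports the BLCKSZ the server was built with;
-- expecting 32768 from a --with-blocksize=32 build.
SHOW block_size;

-- The chunk size actually written to pg_largeobject; it should stay at
-- 2048 bytes if LOBLKSIZE is still held at 2K.
SELECT max(octet_length(data)) AS max_chunk_bytes
FROM pg_largeobject;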