[GENERAL] Storing large files in multiple schemas: BLOB or BYTEA

2012-10-10 Thread tigran2-postgres
Hi,

 

I need to store large files (from several MB up to 1GB) in a Postgres database.
The database has multiple schemas. It looks like Postgres has two options for
storing large objects: LOB and BYTEA. However, we seem to hit problems with
each of these options.

 

1. LOB. This works almost ideally: it can store up to 2GB and allows streaming,
so we do not hit memory limits in our PHP backend when reading the LOB.
However, all blobs are stored in pg_catalog and are not part of any schema. This
leads to a big problem when you try to use pg_dump with options -n and -b to
dump just one schema with its blobs. It dumps the schema data correctly, but
then it includes ALL blobs in the database, not just the blobs that belong to
the particular schema.

Is there a way to dump a single schema together with its blobs using pg_dump or
some other utility?
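For reference, the streaming that works so nicely with LOBs looks roughly like
this from PHP with the pgsql extension's large object functions (the connection
string and the example OID are illustrative, not from our actual code):

<?php
// Rough sketch: stream a large object in fixed-size chunks so the whole
// file never has to fit inside PHP's memory_limit.
$conn = pg_connect('host=localhost dbname=mydb user=myuser');
$oid  = 16401;                            // OID of the large object (illustrative)

pg_query($conn, 'BEGIN');                 // lo_* calls must run inside a transaction
$lob = pg_lo_open($conn, $oid, 'r');
while (($chunk = pg_lo_read($lob, 262144)) !== false && $chunk !== '') {
    echo $chunk;                          // or fwrite() to a file / HTTP response
}
pg_lo_close($lob);
pg_query($conn, 'COMMIT');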

 

2. BYTEA. These are correctly stored per schema, so pg_dump -n works correctly;
however, I cannot seem to find a way to stream the data. This means there is no
way to access the data from PHP if it is larger than the memory limit.
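The closest thing to streaming I can think of for bytea is reading the value in
slices with substring(), so PHP never holds the whole column at once; it costs
one query per slice, though. A rough sketch, with a hypothetical files(id, data)
table:

<?php
// Sketch: read a bytea column in 256KB slices. Table and column names
// (files, data, id) are illustrative.
$conn = pg_connect('host=localhost dbname=mydb user=myuser');
$id   = 1;                                                 // example row id
$size = (int) pg_fetch_result(
    pg_query_params($conn, 'SELECT octet_length(data) FROM files WHERE id = $1', array($id)),
    0, 0);

$step = 262144;
for ($offset = 1; $offset <= $size; $offset += $step) {    // substring() is 1-based
    $res = pg_query_params($conn,
        'SELECT encode(substring(data FROM $1 FOR $2), \'base64\') FROM files WHERE id = $3',
        array($offset, $step, $id));
    echo base64_decode(pg_fetch_result($res, 0, 0));
}

The extra round trips make this slower than real streaming, but memory stays
flat.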

 

Is there any other way to store large data in Postgres that allows streaming
and works correctly with multiple schemas per database?

 

Thanks.

 

(Sorry if this double-posts on pgsql-php; I did not know which is the best
list for this question.)



Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA

2012-10-10 Thread tigran2-postgres
>Large Objects and bytea are the only ways.

> 

>If you want to pg_dump only certain large objects, that won't work as far
as I know (maybe using permissions and a non-superuser can help).

> 

>Do you absolutely need to pg_dump parts of the database regularly?

> 

>Yours,

>Laurenz Albe

 

It is not an absolute requirement, but it would be really nice to have. We have
a multi-tenant database with each tenant's data stored in a separate schema.
Using pg_dump seems to be the ideal way to migrate tenant data from one
database to another when we need to rebalance the load.

 



Re: [GENERAL] Storing large files in multiple schemas: BLOB or BYTEA

2012-10-10 Thread tigran2-postgres
>I believe the general consensus around here is to not do that, if you can
avoid it. File systems are much better equipped to handle files of that
magnitude, especially when it comes to retrieving them, scanning through
their contents, or really, any access pattern aside from simple storage.

> 

>You're better off storing the blob on disk somewhere and storing a row that
refers to its location. Either key pieces for a naming scheme or the full
path.

> 

>This is especially true if you mean to later access that data with PHP.

> 

>--

>Shaun Thomas

 

Using files stored outside the database creates all sorts of problems. For
starters, you lose ACID guarantees. I would prefer to keep them in the database.
We did a lot of experiments with Large Objects and they really worked fine (we
stored hundreds of LOBs ranging from a few MB up to 1GB). Postgres does a
really good job with Large Objects. If it were not for the pg_dump problem I
would not hesitate to use LOBs.

 

 



[GENERAL] Storing large files in multiple schemas: BLOB or BYTEA

2012-10-10 Thread tigran2-postgres
>Yeah, a pg_dump mode that dumped everything but large objects would be
nice.

There is an option -b for pg_dump which controls whether large objects are
dumped or not. The problem is that with -b it dumps all large objects regardless
of which schema you requested with -n. Otherwise it works fine.
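A possible workaround would be to let pg_dump -n dump the schema's tables (with
-n, blobs are skipped by default) and export that schema's large objects
separately from a small script, roughly like the sketch below. The
tenant_a.files table and its file_oid column are just placeholders for wherever
the application keeps the OIDs:

<?php
// Sketch: export only the large objects referenced by one tenant schema,
// to accompany "pg_dump -n tenant_a mydb".
// tenant_a.files / file_oid are hypothetical names.
$conn = pg_connect('host=localhost dbname=mydb user=myuser');
pg_query($conn, 'BEGIN');

$res = pg_query($conn, 'SELECT file_oid FROM tenant_a.files');
while ($row = pg_fetch_assoc($res)) {
    $oid = (int) $row['file_oid'];
    // Writes the object to a local file; restore with pg_lo_import()
    // (or psql's \lo_import) on the target database.
    pg_lo_export($conn, $oid, "/backup/tenant_a/{$oid}.lob");
}
pg_query($conn, 'COMMIT');

The part we would still have to handle is making sure the rows in
tenant_a.files point at the right objects after the re-import.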


>I'm now wondering about the idea of implementing a pg_dump option that 
>dumped large objects into a directory tree like
>   lobs/[loid]/[lob_md5]
>and wrote out a restore script that loaded them using `lo_import`.
>
>During dumping temporary copies could be written to something like 
>lobs/[loid]/.tmp. with the md5 being calculated on the fly as the 
>byte stream is read. If the dumped file had the same md5 as the existing 
>one it'd just delete the tempfile; otherwise the tempfile would be 
>renamed to the calculated md5.
>
>That way incremental backup systems could manage the dumped LOB tree 
>without quite the same horrible degree of duplication as is currently 
>faced when using lo in the database with pg_dump.
>
>A last_modified timestamp on `pg_largeobject_metadata` would be even 
>better, allowing the cost of reading and discarding rarely-changed large 
>objects to be avoided.

Definitely interesting idea with incremental backups.
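Just to picture it, the md5-rename step could be scripted along these lines
(paths, chunk size and connection details are illustrative only):

<?php
// Sketch of the proposed lobs/[loid]/[lob_md5] layout: stream each large object
// to a temp file while hashing it, then rename the file to its md5, or drop it
// if an identical dump already exists.
$conn = pg_connect('host=localhost dbname=mydb user=myuser');
pg_query($conn, 'BEGIN');

$res = pg_query($conn, 'SELECT oid FROM pg_largeobject_metadata');
while ($row = pg_fetch_assoc($res)) {
    $oid = (int) $row['oid'];
    $dir = "lobs/{$oid}";
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    $tmp  = "{$dir}/.tmp";
    $out  = fopen($tmp, 'wb');
    $hash = hash_init('md5');

    $lob = pg_lo_open($conn, $oid, 'r');
    while (($chunk = pg_lo_read($lob, 262144)) !== false && $chunk !== '') {
        hash_update($hash, $chunk);            // md5 calculated on the fly
        fwrite($out, $chunk);
    }
    pg_lo_close($lob);
    fclose($out);

    $md5 = hash_final($hash);
    if (file_exists("{$dir}/{$md5}")) {
        unlink($tmp);                          // unchanged since the last dump
    } else {
        rename($tmp, "{$dir}/{$md5}");
    }
}
pg_query($conn, 'COMMIT');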


