Re: [GENERAL] How to keep pg_largeobject from growing endlessly
On Wed, Apr 15, 2015 at 4:49 AM, Andreas Joseph Krogh wrote:
>
> In other words: Does vacuumlo cause diskspace used by pg_largeobject to be
> freed to the OS (after eventually vacuumed by autovacuum)?

No. But that shouldn't matter in your scenario: if you create more large
objects than you delete, you aren't wasting space anyway.

A longer-term problem that may apply in your scenario: pg_largeobject can't
grow beyond your tablespace's disk size. Unlike other tables, it's very hard
to move pg_largeobject to a new database/tablespace without downtime. If your
table is constantly growing and you're worrying about how much space it's
taking, other storage strategies (bytea, S3, NFS, etc.) might inspire more
confidence. I had this problem a few months ago; since then, I only use
pg_largeobject in prototyping and low-growth situations.

http://www.postgresql.org/message-id/camwjz6gf9tm+vwm_0ymqypi4xk_bv2nyaremwr1ecsqbs40...@mail.gmail.com

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
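
[A minimal SQL sketch of that behaviour, with a made-up large object OID of
12345: unlinking an object (which is all vacuumlo does, in bulk) marks its
rows as dead, but the size reported for pg_largeobject stays the same until
the table is rewritten.]

-- How much disk the large-object store occupies right now:
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

-- Unlink one large object (12345 is a made-up OID); vacuumlo does this in
-- bulk for OIDs that no user table references any more:
SELECT lo_unlink(12345);

-- A plain VACUUM makes the dead rows reusable for future large objects...
VACUUM pg_largeobject;

-- ...but the on-disk size reported here does not shrink:
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));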
Re: [GENERAL] How to keep pg_largeobject from growing endlessly
On Wed, Apr 15, 2015 at 9:57 AM, Andreas Joseph Krogh wrote:
>
> On Wednesday, 15 April 2015 at 15:50:36, Adam Hooper wrote:
>
>> On Wed, Apr 15, 2015 at 4:49 AM, Andreas Joseph Krogh wrote:
>>
>>> In other words: Does vacuumlo cause diskspace used by pg_largeobject to be
>>> freed to the OS (after eventually vacuumed by autovacuum)?
>>
>> No.
>
> Ok. Out of curiosity: when does it get freed? When VACUUM FULL'ed?

Yes. VACUUM FULL or CLUSTER will free the space. (Of course, you need a lot
of free disk space to perform those operations.)

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
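
[A rough sketch of the two rewrite options mentioned above; both take an
ACCESS EXCLUSIVE lock on pg_largeobject for the duration, and both need
enough free disk to hold a complete copy of the table.]

-- Rewrites the table and returns the freed space to the OS:
VACUUM FULL pg_largeobject;

-- Alternative rewrite, ordered by the system index (sometimes faster):
CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;

-- Check the on-disk size afterwards:
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));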
[GENERAL] VACUUM FULL pg_largeobject without (much) downtime?
Hi list,

We run a website. We once stored all sorts of files in pg_largeobject, which
grew to 266GB. This is on an m1.large on Amazon EC2, on a single, magnetic,
non-provisioned-IO volume. In that context, 266GB is a lot.

We've since moved all but 60GB of that data to S3. We plan to reduce that to
1GB by deleting old, unused data. Of course, pg_largeobject will still take
up 266GB, because autovacuum doesn't return disk space to the OS.

We want to move our entire database to an SSD volume, with as little downtime
as possible. My tentative plan:

1. Use CREATE TABLESPACE and pg_repack to move user tables to a temporary
   volume (see the sketch below)
2. Take down Postgres, copy system-table files to the new volume, and start
   up Postgres from the new volume
3. Use pg_repack to move everything to the new volume

This plan won't work: Step 2 will be too slow because pg_largeobject still
takes 266GB. We tested `VACUUM FULL pg_largeobject` on our staging database:
it took two hours, during which pg_largeobject was locked. When
pg_largeobject is locked, lots of our website doesn't work.

How can we move our database without much downtime? Is there a way to
`VACUUM FULL` pg_largeobject without locking it for very long? Aside from
that problem, is the rest of my upgrade plan sane?

For what it's worth, here's some info from VACUUM VERBOSE:

overview=# VACUUM (VERBOSE, ANALYZE) pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 1112630 row versions
DETAIL:  CPU 3.38s/9.89u sec elapsed 69.02 sec.
INFO:  "pg_largeobject": removed 1112630 row versions in 374889 pages
DETAIL:  CPU 7.48s/2.22u sec elapsed 150.44 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 29373858 row versions in 370470 pages
DETAIL:  1112630 index row versions were removed.
279489 index pages have been deleted, 276070 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  "pg_largeobject": found 1112622 removable, 231974 nonremovable row versions in 3189820 out of 34522175 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10318306 unused item pointers.
0 pages are entirely empty.
CPU 69.81s/28.83u sec elapsed 1402.53 sec.
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 3 of 34522175 pages, containing 25085 live rows and 0 dead rows; 25085 rows in sample, 24203398 estimated total rows
VACUUM

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
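
[A rough sketch of the SQL side of step 1, with hypothetical mount points and
tablespace names; the actual online table moves are done by the pg_repack
command-line tool, which needs its extension installed in the database
first.]

-- Register the temporary volume and the SSD volume as tablespaces.
-- The paths are hypothetical.
CREATE TABLESPACE tmp_magnetic LOCATION '/mnt/tmp_volume/pgdata';
CREATE TABLESPACE ssd LOCATION '/mnt/ssd_volume/pgdata';

-- pg_repack needs its extension in the database before the command-line
-- tool can rewrite tables into another tablespace without long locks:
CREATE EXTENSION IF NOT EXISTS pg_repack;

-- A quick look at what step 1 would have to move, largest tables first:
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;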
Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?
On Tue, Feb 3, 2015 at 12:58 PM, Bill Moran wrote:
> On Tue, 3 Feb 2015 10:53:11 -0500
> Adam Hooper wrote:
>
>> This plan won't work: Step 2 will be too slow because pg_largeobject
>> still takes 266GB. We tested `VACUUM FULL pg_largeobject` on our
>> staging database: it took two hours, during which pg_largeobject was
>> locked. When pg_largeobject is locked, lots of our website doesn't
>> work.
>
> Sometimes CLUSTER is faster than VACUUM FULL ... have you tested CLUSTERing
> of pg_largeobject on your test system to see if it's fast enough?

On the 30GB that's left on staging, it takes 50 minutes. (Unfortunately, our
staging database is down to 30GB because we already ran VACUUM FULL on it,
and that's hard to undo.) But I need an orders-of-magnitude improvement, and
this clearly isn't it.

> How big is the non-lo data?

It's 65GB, but I've used pg_repack to move it to a separate tablespace, so it
won't affect downtime.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
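
[A small sketch for verifying a move like that; it lists disk usage per
tablespace and the tables still sitting in the default tablespace.]

-- Disk usage per tablespace after the pg_repack move:
SELECT spcname, pg_size_pretty(pg_tablespace_size(oid)) AS size
FROM pg_tablespace
ORDER BY pg_tablespace_size(oid) DESC;

-- Tables still in the default tablespace (reltablespace = 0), system
-- catalogs included:
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r' AND reltablespace = 0
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;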
Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?
On Tue, Feb 3, 2015 at 2:29 PM, Bill Moran wrote:
> On Tue, 3 Feb 2015 14:17:03 -0500
> Adam Hooper wrote:
>
> My recommendation here would be to use Slony to replicate the data to a
> new server, then switch to the new server once the data has synchronized.

Looks exciting. But then I notice: "Slony-I does not automatically replicate
changes to large objects (BLOBS)." [1] Does that still apply?

It's doable for us to VACUUM FULL and add a notice to our website saying,
"you can't upload files for the next two hours." Maybe that's a better idea?

Enjoy life,
Adam

[1] http://slony.info/documentation/2.2/limitations.html

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?
On Tue, Feb 3, 2015 at 3:12 PM, Bill Moran wrote:
> On Tue, 3 Feb 2015 14:48:17 -0500
> Adam Hooper wrote:
>
>> It's doable for us to VACUUM FULL and add a notice to our website
>> saying, "you can't upload files for the next two hours." Maybe that's
>> a better idea?
>
> It's really going to depend on what options you have available. Keep
> in mind that users won't be able to read large objects either, so
> you'll need to disable whatever features of the site that view the files
> as well. Whether that's easier or harder depends on how much work it
> would be to disable those features of the site.

For the record, this is what we went with.

To those looking to use large objects (who, um, are already searching for how
to VACUUM FULL pg_largeobject), my advice: if we were to do it all over
again, we'd have used a separate database per "bucket". For instance, imagine
you store uploaded files and processed data in pg_largeobject, and some day
you migrate the processed data elsewhere. If uploaded files went in one
database and processed data in a second, the uploaded-files database's
pg_largeobject table would stay slim, and you could simply DROP DATABASE on
the other once all clients stopped using it. There wouldn't be any downtime.

My take-away, though, is to avoid the pg_largeobject table whenever possible.
You can move BYTEA data with zero downtime using pg_repack, but the same
can't be said for large objects.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
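
[A minimal sketch of the one-database-per-bucket idea, with hypothetical
database names; the point is that dropping a database discards its bloated
pg_largeobject without locking anything the remaining clients use.]

-- One database per "bucket"; names are hypothetical. Each database carries
-- its own pg_largeobject table.
CREATE DATABASE uploaded_files;
CREATE DATABASE processed_data;

-- Later, once the processed data has moved elsewhere and no client connects
-- to that database any more, its bloated pg_largeobject goes away with it.
-- Nothing served from the uploaded_files database is locked:
DROP DATABASE processed_data;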
Re: [GENERAL] Row-level Security vs Application-level authz
On Tue, Feb 24, 2015 at 8:37 PM, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> So I guess my last question is if you are inserting rows into a table to
>> track user connections, how do you clean them out when the client does
>> not disconnect cleanly? Or is this table intended to be append-only?
>
> It wouldn't be intended to be append-only but I agree that, ideally,
> there'd be a way to address clients disconnecting uncleanly.

This is starting to sound like a web app, which I have experience with. The
cardinal rule: assume everybody disconnects randomly, and code accordingly :).

The goal here isn't to make the session table reflect the number of users who
are currently logged in. Rather, it's to ensure the session table doesn't
grow without bound. A few options (a rough SQL sketch of each follows below):

* You can set a restriction like, "a user can only be logged in once." During
  login, delete other sessions associated with that user. The session table's
  maximum size is the size of the user table.

* You can use an expiry-check function. Best is a definitive "this session is
  disconnected"; if you can't do that, you can try a rule such as "a user
  cannot be logged in more than 20 times, and sessions older than two weeks
  are invalid." During login, run the expiry checker on that user and delete
  expired rows. With the right function, you can constrain the session table
  to a reasonable size.

* You can simply limit the size of the session table. If your limit is 100
  and a user starts a 101st session, delete the oldest session.

The world of websites involves lots of users and loads of short-lived
sessions. A website doesn't check whether the user has access to a row: it
checks whether the user has access to an endpoint with the given parameters.
Postgres RLS seems like a bad approach for that use case.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
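
[A rough SQL sketch of those three strategies. Table and column names are
hypothetical, a users table is assumed to exist, and 42 stands in for the id
of the user who is logging in.]

-- Hypothetical schema.
CREATE TABLE sessions (
  id         bigserial PRIMARY KEY,
  user_id    integer NOT NULL REFERENCES users (id),
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Strategy 1: one session per user. At login, drop the user's other
-- sessions before inserting the new one:
DELETE FROM sessions WHERE user_id = 42;
INSERT INTO sessions (user_id) VALUES (42);

-- Strategy 2: expire by age at login time (two weeks is an arbitrary rule):
DELETE FROM sessions
WHERE user_id = 42
  AND created_at < now() - interval '2 weeks';

-- Strategy 3: cap the number of sessions per user (keep the newest 100):
DELETE FROM sessions
WHERE user_id = 42
  AND id NOT IN (
    SELECT id FROM sessions
    WHERE user_id = 42
    ORDER BY created_at DESC
    LIMIT 100
  );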
Re: [GENERAL] Hex characters in COPY input
On Fri, Feb 27, 2015 at 9:39 AM, Melvin Call wrote:
> On 2/26/15, Vick Khera wrote:
>> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call wrote:
>>
>>> I get an error "ERROR: invalid byte sequence for
>>> encoding "UTF8": 0xe9616c"
>>
>> Character code E9 is not UTF8. Don't tell Postgres you're importing UTF8 if
>> you're not.
>
> Regardless, can you point me to some reading that would have clued
> me in that e9 is not a UTF8 character? Or is the clue the fact that it was
> not preceded with 0x00?

The error message Postgres gave you is a pretty darned good clue :). But your
question has an error, and it's worth expounding a bit.

0xe9 is not a UTF8 character because 0xe9 is not a _character_. 0xe9 is a
_byte_. Characters are not bytes. Characters can be _encoded_ into bytes, and
that's not the same thing.

UTF-8 encoding works like this: any byte in the range [0x00-0x7f] represents
the same character as in ASCII encoding (an extremely popular encoding). Any
byte in the range [0xc0-0xff] is the "leading byte" of a multi-byte sequence
that represents a single character, and the number of leading 1 bits in that
byte says how many bytes the sequence contains. "Continuation" bytes are in
the range [0x80-0xbf]. (Why not the full range [0x00-0xff]? Because UTF-8 was
designed to generate errors when fed non-UTF8 byte sequences.) If you care to
read up on the how and why of UTF-8 (a red herring in this discussion), try:

http://en.wikipedia.org/wiki/UTF-8

Back to 0xe9. 0xe9 is '1110 1001' in binary. Postgres' UTF-8 decoder sees
that initial '1110' and determines that it needs to inspect three bytes to
read one character. The second byte is 0x61, which is not in the continuation
range [0x80-0xbf], so Postgres reports an invalid byte sequence. Hooray: it
produces exactly the error message it should.

You don't need to concern yourself with the complications of UTF-8. You only
need to know that bytes are not characters; if you don't know the encoding of
a sequence of bytes, you've made a logic error. Postgres told you about the
error, though it didn't hint at how to fix it. (Text editors and web browsers
use heuristics to guess file encodings, and they're usually right, though
it's impossible to create a perfect heuristic. See
http://stackoverflow.com/questions/4198804/how-to-reliably-guess-the-encoding-between-macroman-cp1252-latin1-utf-8-and
for further discussion.)

If you're looking for take-away lessons, the main one is: "read the error
message" :). Next time you have the "wrong encoding" problem, you have two
options: 1) figure out the encoding and tell Postgres; or 2) regenerate the
file in the correct encoding (UTF-8). The "why" is here:

http://www.joelonsoftware.com/articles/Unicode.html

We on this list jumped straight to option 1. We've memorized 0xe9 in
particular, because we've been through your pain before: in the Americas and
Western Europe, if a file contains the byte 0xe9, it probably contains the
character "é" encoded as windows-1252/ISO-8859-1/ISO-8859-15. That's very
common. MySQL in particular is a ghastly Internet denizen, in that it
defaults to latin1 in an apparent crusade against globalization and modern
standards.

Enjoy life,
Adam

--
Adam Hooper
+1-613-986-3339
http://adamhooper.com
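
[A short sketch of option 1, using hypothetical file and table names.
convert_from() reproduces the decoding failure, and COPY's ENCODING option
tells Postgres what the bytes really are.]

-- The failure, reproduced: 0xe9 0x61 0x6c is "éal" in windows-1252/Latin-1,
-- but it is not a valid UTF-8 byte sequence, so decoding it as UTF-8 fails
-- with "invalid byte sequence for encoding UTF8":
SELECT convert_from('\xe9616c'::bytea, 'UTF8');

-- Option 1: name the file's real encoding when importing. File and table
-- names are hypothetical; WIN1252 (or LATIN1/LATIN9) is the usual suspect:
COPY my_table FROM '/tmp/import.csv' WITH (FORMAT csv, ENCODING 'WIN1252');

-- The same bytes decode fine once the source encoding is named:
SELECT convert_from('\xe9616c'::bytea, 'WIN1252');  -- 'éal'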