Re: [GENERAL] How to keep pg_largeobject from growing endlessly

2015-04-15 Thread Adam Hooper
On Wed, Apr 15, 2015 at 4:49 AM, Andreas Joseph Krogh
 wrote:
>
>
> In other words: Does vacuumlo cause diskspace used by pg_largeobject to be 
> freed to the OS (after eventually vacuumed by autovacuum)?

No.

But that shouldn't matter in your scenario: if you create more large
objects than you delete, you aren't wasting space anyway.
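
For completeness, a rough sketch of the vacuumlo workflow (the database
name here is a placeholder):

  vacuumlo -n -v mydb    # dry run: report orphaned large objects without touching them
  vacuumlo -v mydb       # unlink the orphans; autovacuum later marks that space reusable

The space freed this way can be reused by new large objects, but the
files on disk don't shrink.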

A longer-term problem that may apply in your scenario: pg_largeobject
can't grow beyond your tablespace's disk size. Unlike other tables,
it's very hard to move pg_largeobject to a new database/tablespace
without downtime. If your table is constantly growing and you're
worrying about how much space it's taking, other storage strategies
(bytea, S3, NFS, etc.) might inspire more confidence. I had this
problem a few months ago; since then, I only use pg_largeobject in
prototyping and low-growth situations.
http://www.postgresql.org/message-id/camwjz6gf9tm+vwm_0ymqypi4xk_bv2nyaremwr1ecsqbs40...@mail.gmail.com
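
(If you want to keep an eye on it, a one-liner like this shows how much
disk pg_largeobject and its index are currently using:

  SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

pg_total_relation_size counts the table plus its indexes.)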

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] How to keep pg_largeobject from growing endlessly

2015-04-15 Thread Adam Hooper
On Wed, Apr 15, 2015 at 9:57 AM, Andreas Joseph Krogh
 wrote:
>
> On Wednesday, 15 April 2015 at 15:50:36, Adam Hooper wrote:
>
> On Wed, Apr 15, 2015 at 4:49 AM, Andreas Joseph Krogh
>  wrote:
> >
> > In other words: Does vacuumlo cause diskspace used by pg_largeobject to be 
> > freed to the OS (after eventually vacuumed by autovacuum)?
>
> No.
>
> Ok. Out of curiosity: when does it get freed, when VACUUM FULL'ed?

Yes. VACUUM FULL or CLUSTER will free the space. (Of course, you need
a lot of free disk space to perform those operations.)
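
For the record, both options boil down to one statement (each takes an
exclusive lock on pg_largeobject for the whole run, so large-object
reads and writes block until it finishes):

  VACUUM FULL pg_largeobject;
  -- or
  CLUSTER pg_largeobject USING pg_largeobject_loid_pn_index;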

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




[GENERAL] VACUUM FULL pg_largeobject without (much) downtime?

2015-02-03 Thread Adam Hooper
Hi list,

We run a website. We once stored all sorts of files in pg_largeobject,
which grew to 266GB. This is on an m1.large on Amazon EC2 on a single,
magnetic, non-provisioned-IO volume. In that context, 266GB is a lot.

We've since moved all but 60GB of that data to S3. We plan to reduce
that to 1GB by deleting old, unused data. Of course, pg_largeobject
will still take up 266GB because autovacuum doesn't reduce disk space.

We want to move our entire database to an SSD volume, with as little
downtime as possible. My tentative plan:

1. Use CREATE TABLESPACE and pg_repack to move user tables to a temporary volume
2. Take down Postgres, copy system-table files to the new volume, and
start up Postgres from the new volume
3. Use pg_repack to move everything to the new volume
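
Step 1 looks roughly like this (the tablespace name, path, and table
name are placeholders, and this assumes a pg_repack version that has
the --tablespace and --moveidx options):

  CREATE TABLESPACE tmp_ts LOCATION '/mnt/tmp-volume/pgdata';

  pg_repack --table=some_user_table --tablespace=tmp_ts --moveidx overview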

This plan won't work: Step 2 will be too slow because pg_largeobject
still takes 266GB. We tested `VACUUM FULL pg_largeobject` on our
staging database: it took two hours, during which pg_largeobject was
locked. When pg_largeobject is locked, lots of our website doesn't
work.

How can we move our database without much downtime? Is there a way to
`VACUUM FULL` pg_largeobject without locking it for very long? Aside
from that problem, is the rest of my upgrade plan sane?

For what it's worth, here's some info from VACUUM VERBOSE:

overview=# VACUUM (VERBOSE, ANALYZE) pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 1112630
row versions
DETAIL:  CPU 3.38s/9.89u sec elapsed 69.02 sec.
INFO:  "pg_largeobject": removed 1112630 row versions in 374889 pages
DETAIL:  CPU 7.48s/2.22u sec elapsed 150.44 sec.
INFO:  index "pg_largeobject_loid_pn_index" now contains 29373858 row
versions in 370470 pages
DETAIL:  1112630 index row versions were removed.
279489 index pages have been deleted, 276070 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO:  "pg_largeobject": found 1112622 removable, 231974 nonremovable
row versions in 3189820 out of 34522175 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10318306 unused item pointers.
0 pages are entirely empty.
CPU 69.81s/28.83u sec elapsed 1402.53 sec.
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 3 of 34522175 pages, containing
25085 live rows and 0 dead rows; 25085 rows in sample, 24203398
estimated total rows
VACUUM

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?

2015-02-03 Thread Adam Hooper
On Tue, Feb 3, 2015 at 12:58 PM, Bill Moran  wrote:
> On Tue, 3 Feb 2015 10:53:11 -0500
> Adam Hooper  wrote:
>
>> This plan won't work: Step 2 will be too slow because pg_largeobject
>> still takes 266GB. We tested `VACUUM FULL pg_largeobject` on our
>> staging database: it took two hours, during which pg_largeobject was
>> locked. When pg_largeobject is locked, lots of our website doesn't
>> work.
>
> Sometimes CLUSTER is faster than VACUUM FULL ... have you tested CLUSTERing
> of pg_largeobject on your test system to see if it's fast enough?

On the 30GB that's left on staging, it takes 50 minutes. Unfortunately,
our staging database is only 30GB now because we already completed a
VACUUM FULL on it, and reverting that operation would be difficult. But
I need an orders-of-magnitude difference, and this clearly isn't it.

> How big is the non-lo data?

It's 65GB, but I've used pg_repack to move it to a separate tablespace
so it won't affect downtime.

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?

2015-02-03 Thread Adam Hooper
On Tue, Feb 3, 2015 at 2:29 PM, Bill Moran  wrote:
> On Tue, 3 Feb 2015 14:17:03 -0500
> Adam Hooper  wrote:
>
> My recommendation here would be to use Slony to replicate the data to a
> new server, then switch to the new server once the data has synchronized.

Looks exciting. But then I notice: "Slony-I does not automatically
replicate changes to large objects (BLOBS)." [1]

Does that still apply?

It's doable for us to VACUUM FULL and add a notice to our website
saying, "you can't upload files for the next two hours." Maybe that's
a better idea?

Enjoy life,
adam

[1] http://slony.info/documentation/2.2/limitations.html

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] VACUUM FULL pg_largeobject without (much) downtime?

2015-02-04 Thread Adam Hooper
On Tue, Feb 3, 2015 at 3:12 PM, Bill Moran  wrote:
> On Tue, 3 Feb 2015 14:48:17 -0500
> Adam Hooper  wrote:
>
>> It's doable for us to VACUUM FULL and add a notice to our website
>> saying, "you can't upload files for the next two hours." Maybe that's
>> a better idea?
>
> It's really going to depend on what options you have available. Keep
> in mind that users won't be able to read large objects either, so
> you'll need to disable whatever features of the site view the files
> as well. Whether that's easier or harder depends on how much work it
> would be to disable those features of the site.

For the record, this is what we went with.

To those looking to use large objects (who, um, are already searching
for how to VACUUM FULL pg_largeobject), my advice: if we were to do it
all over again, we'd have used a separate database per "bucket".

For instance, imagine you store uploaded files and processed data in
pg_largeobject. Then some day you migrate the processed data
elsewhere. If uploaded files went in one database and processed data
went into the second, then the uploaded-files database's
pg_largeobject table would remain slim, and you could simply DROP
DATABASE on the other after all clients stopped using it. There
wouldn't be any downtime.
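
In sketch form (the database names here are made up):

  CREATE DATABASE uploaded_files;   -- long-lived large objects; stays slim
  CREATE DATABASE processed_data;   -- bulky, migratable large objects
  -- ... later, once every client has stopped using the processed data:
  DROP DATABASE processed_data;     -- all of its pg_largeobject disk space comes back at once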

My take-away, though, is to avoid the pg_largeobject table whenever
possible. You can move BYTEA data with zero downtime using pg_repack,
but the same can't be said for large objects.

Enjoy life,
Adam


-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] Row-level Security vs Application-level authz

2015-02-24 Thread Adam Hooper
On Tue, Feb 24, 2015 at 8:37 PM, Stephen Frost  wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> So I guess my last question is if you are inserting rows into a table to
>> track user connections, how do you clean them out when the client does
>> not disconnect cleanly?  Or is this table intended to be append-only?
>
> It wouldn't be intended to be append-only but I agree that, ideally,
> there'd be a way to address clients that disconnect uncleanly.

This is starting to sound like a web app, which I have experience
with. The cardinal rule: assume everybody disconnects randomly, and
code accordingly :).

The goal here isn't to make the session table reflect the number of
users who are currently logged in. Rather, it's to ensure the session
table doesn't grow infinitely.

* You can set a restriction like, "a user can only be logged in once".
During login, delete other sessions associated with that user (a SQL
sketch of this follows the list). The session table's maximum size is
then the size of the user table.

* You can use an expiry-check function. The best check is a definitive
"this session is disconnected" signal; if you can't get that, you can
fall back to a rule such as "a user cannot be logged in more than 20
times, and sessions older than two weeks are invalid". During login,
run the expiry checker on that user and delete expired rows. With the
right function, you can constrain the session table to a reasonable size.

* You can simply limit the size of the session table. If your limit is
100 and a user starts a 101st session, delete the first session.
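
A rough SQL sketch of the first two strategies (the sessions table and
its columns are invented for illustration):

  -- "only logged in once": clear any existing sessions at login time
  DELETE FROM sessions WHERE user_id = $1;

  -- expiry rule: purge this user's sessions older than two weeks
  DELETE FROM sessions
   WHERE user_id = $1
     AND created_at < now() - interval '2 weeks';

  -- then record the new session
  INSERT INTO sessions (user_id, token, created_at) VALUES ($1, $2, now());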

The world of websites involves lots of users and loads of short-lived
sessions. A website doesn't check whether the user has access to a
row: it checks whether the user has access to an endpoint with the
given parameters. Postgres RLS seems like a bad approach for that use
case.

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com




Re: [GENERAL] Hex characters in COPY input

2015-02-27 Thread Adam Hooper
On Fri, Feb 27, 2015 at 9:39 AM, Melvin Call  wrote:
> On 2/26/15, Vick Khera  wrote:
>> On Thu, Feb 26, 2015 at 9:50 PM, Melvin Call 
>> wrote:
>>
>>> I get an error "ERROR:  invalid byte sequence for
>>> encoding "UTF8": 0xe9616c"
>>
>> Character code E9 is not UTF8. Don't tell Postgres you're importing UTF8 if
>> you're not.
>
> Regardless, can you point me to some reading that would have clued
> me in that e9 is not a UTF8 character? Or is the clue the fact that it was not
> preceded with 0x00?

The error message Postgres gave you is a pretty darned good clue :).

But your question contains a misconception, and it's worth expounding
on it a bit. 0xe9 is not a UTF8 character because 0xe9 is not a
_character_ at all; it's a _byte_. Characters are not bytes. Characters
can be _encoded_ into bytes, and that's not the same thing.

UTF-8 encoding is a bit confusing: any byte in the range [0x00-0x7f]
represents the same character as in ASCII encoding (an extremely
popular encoding). A byte in the range [0xc0-0xff] is the "leading
byte" of a multi-byte sequence that represents a single character, and
the "continuation" bytes that follow it are in the range [0x80-0xbf].
(Why not the full range [0x00-0xff]? Because UTF-8 was designed to
generate errors when fed non-UTF-8 byte sequences.) The number of
leading 1 bits in the leading byte tells you how many bytes make up
the sequence. If you care to read up on the how and why of UTF-8 (a
red herring in this discussion), see
http://en.wikipedia.org/wiki/UTF-8

Back to 0xe9. 0xe9 is '1110 1001' in binary. Postgres' UTF-8 decoder
sees that initial '1110' and determines that it needs to inspect three
bytes to read one character. The second byte is 0x61, which is not in
the range [0x80-0xbf], so Postgres reports an invalid byte sequence.
Hooray: it produces exactly the error message it should.
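
You can watch the same logic play out from psql (a sketch; the byte
string is the one from your error message):

  SELECT convert_from('\xe9616c'::bytea, 'UTF8');     -- fails with the same invalid-byte-sequence error
  SELECT convert_from('\xe9616c'::bytea, 'WIN1252');  -- succeeds: those bytes are 'éal' in windows-1252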

You don't need to concern yourself with the complications of UTF-8.
You only need to know that bytes are not characters; if you're
handling a sequence of bytes without knowing its encoding, you've made
a logic error. Postgres told you about the error, though it didn't
hint at how to fix it. (Text editors and web browsers use heuristics
to guess file encodings, and they're usually right, though it's
impossible to create a perfect heuristic. See
http://stackoverflow.com/questions/4198804/how-to-reliably-guess-the-encoding-between-macroman-cp1252-latin1-utf-8-and
for further discussion.)

If you're looking for take-away lessons, the main one is: "read the
error message" :).

Next time you have the "wrong encoding" problem, you have two options:
1) figure out the encoding and tell Postgres; or 2) regenerate the
file in the correct encoding (UTF-8). The "why" is here:
http://www.joelonsoftware.com/articles/Unicode.html
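
Concretely, option 1 can be as simple as telling COPY the file's real
encoding, and option 2 is a one-line iconv call (the paths, the table
name, and the windows-1252 guess below are assumptions):

  COPY my_table FROM '/path/to/data.csv' WITH (FORMAT csv, ENCODING 'WIN1252');

  -- or, outside the database:
  -- iconv -f WINDOWS-1252 -t UTF-8 data.csv > data-utf8.csv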

We on this list jumped straight to option 1. We've memorized 0xe9 in
particular, because we've been through your pain before. In the
Americas and Western Europe, if a file contains the byte 0xe9 it
probably contains the character "é" encoded as
windows-1252/ISO-8859-1/ISO-8859-15. That's very common. MySQL in
particular is a ghastly Internet denizen, in that it defaults to
ISO-8859-15 in an apparent crusade against globalization and modern
standards.

Enjoy life,
Adam

-- 
Adam Hooper
+1-613-986-3339
http://adamhooper.com

