Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
Hi,

I have two applications A and B. A runs SELECT statements only and only on
tableA. B actively updates tableB, A never looks into tableB. B has nothing
to do with tableA.

Still, if A is inside a long running transaction, autovacuum does not
handle tableB. Why is it so?

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Pavel Stehule
Hi

út 8. 9. 2020 v 9:32 odesílatel Michael Holzman 
napsal:

> Hi,
>
> I have two applications A and B. A runs SELECT statements only and only on
> tableA. B actively updates tableB, A never looks into tableB. B has nothing
> to do with tableA.
>
> Still, if A is inside a long running transaction, autovacuum does not
> handle tableB. Why is it so?
>

autovacuum does cleaning of changes related to finished transactions. It
does nothing if possible dead tuples are assigned to open transactions.

Regards

Pavel


>
> --
> Regards,
> Michael Holzman
>


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 10:46 AM Pavel Stehule wrote:

>
> autovacuum does cleaning of changes related to finished transactions. It
> does nothing if possible dead tuples are assigned to open transactions.
>

This is the point.
Autovacuum does not clean dead tuples of closed transactions in tableB
while there is an open transaction on tableA.
But the tables have nothing in common. They are handled by separate
applications and there are no transactions that touch both tables
simultaneously.
Why does autovacuum create an artificial dependency on the tables?

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Paquier
On Tue, Sep 08, 2020 at 11:16:04AM +0300, Michael Holzman wrote:
> Autovacuum does not clean dead tuples of closed transactions in tableB
> while there is an open transaction on tableA.
> But the tables have nothing in common. They are handled by separate
> applications and there are no transactions that touch both tables
> simultaneously.
> Why does autovacuum create an artificial dependency on the tables?

This is called MVCC, which applies to a session as a whole.  The point
here is that even if your application knows that only tableA is used
by a given transaction, Postgres cannot know that, as it could be
possible that data from tableB is needed in this same transaction, so
old versions of the rows from tableB matching with the snapshot hold
by this long-running transaction still have to be around.
--
Michael


signature.asc
Description: PGP signature


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 11:28 AM Michael Paquier wrote:

>
> This is called MVCC, which applies to a session as a whole.  The point
> here is that even if your application knows that only tableA is used
> by a given transaction, Postgres cannot know that, as it could be
> possible that data from tableB is needed in this same transaction, so
> old versions of the rows from tableB matching with the snapshot hold
> by this long-running transaction still have to be around.
>
>  Yes, I thought so. I just hoped there may be a workaround decoupling the
tables.
Thanks.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Pavel Stehule
út 8. 9. 2020 v 10:42 odesílatel Michael Holzman 
napsal:

>
>
> On Tue, Sep 8, 2020 at 11:28 AM Michael Paquier wrote:
>
>>
>> This is called MVCC, which applies to a session as a whole.  The point
>> here is that even if your application knows that only tableA is used
>> by a given transaction, Postgres cannot know that, as it could be
>> possible that data from tableB is needed in this same transaction, so
>> old versions of the rows from tableB matching with the snapshot hold
>> by this long-running transaction still have to be around.
>>
>>  Yes, I thought so. I just hoped there may be a workaround decoupling the
> tables.
> Thanks.
>

You can try to reduce length of transactions, if possible.

Regards

Pavel




>
> --
> Regards,
> Michael Holzman
>


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 12:28 PM Pavel Stehule wrote:

>
> You can try to reduce length of transactions, if possible.
>
>
> This is the problem. A and B were developed for Oracle where SELECT does
not open a transaction. We moved them to PG and now we have to very
accurately add COMMITs without breaking the flow. It is quite a complex
thing. I hoped we can avoid that.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Ravi Krishna
This is the problem. A and B were developed for Oracle where SELECT does not 
open a transaction. We moved them to PG 
and now we have to very accurately add COMMITs without breaking the flow. It is 
quite a complex thing. I hoped we can 
avoid that.

 

Interesting. Are you telling the Oracle version of the code had no intermittent 
COMMIT and relied on one
final COMMIT at the end. Even in Oracle developers must have planned for commit 
since a long running 
open transaction can lead to “snapshot too old” error.



Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 1:52 PM Ravi Krishna wrote:

>
>
> Interesting. Are you telling the Oracle version of the code had no
> intermittent COMMIT and relied on one
> final COMMIT at the end. Even in Oracle developers must have planned for
> commit since a long running
> open transaction can lead to “snapshot too old” error.
>
Yes, I am saying just that. With one important clarification: there were no
transactions as SELECT does not open them and the application does not
change anything on that connection.
So, no 'snapshot too old' and no COMMITs.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Ravi Krishna
>>Interesting. Are you telling the Oracle version of the code had no 

>>intermittent COMMIT and relied on one final COMMIT at the end. Even 

>>in Oracle developers must have planned for commit since a long running 

>>open transaction can lead to “snapshot too old” error.

>Yes, I am saying just that. With one important clarification: there were 

>no transactions as SELECT does not open them and the application does not 

>change anything on that connection. So, no 'snapshot too old' and no COMMITs. 

 

It's been a while since I worked with Oracle as a developer.  But my 
understanding

is that even a read-only transaction, like the one you described above, requires

a point in time consistent image of the database. This would imply that if your

transaction runs for a long time and meanwhile other DML sessions change lot of 
blocks,

resulting in undo tablespace getting totally turned over, then Oracle can no 
longer 

gurantee PIT consistent view of the database to your session and barf out with 

snapshot-too-old error.

 

I have no way of confirming this and I am writing this based on my limited 
experience

with oracle.  So I may be wrong.

 



Re: Autovacuum of independent tables

2020-09-08 Thread Ravi Krishna
This is assuming other sessions change the same block your session is trying to 
read.

=== 

It's been a while since I worked with Oracle as a developer.  But my 
understanding

is that even a read-only transaction, like the one you described above, requires

a point in time consistent image of the database. This would imply that if your

transaction runs for a long time and meanwhile other DML sessions change lot of 
blocks,

resulting in undo tablespace getting totally turned over, then Oracle can no 
longer 

gurantee PIT consistent view of the database to your session and barf out with 

snapshot-too-old error.

 

I have no way of confirming this and I am writing this based on my limited 
experience

with oracle.  So I may be wrong.


 



Re: Autovacuum of independent tables

2020-09-08 Thread Magnus Hagander
(Please don't drop the mailinglist from CC, as others are likely interested
in the responses)

On Tue, Sep 8, 2020 at 3:06 PM Michael Holzman 
wrote:

>
>
> On Tue, Sep 8, 2020 at 3:03 PM Magnus Hagander wrote:
>
>> A PostgreSQL SELECT does *not* open a transaction past the end of the
>> statement, if it's run independently on a connection.
>>
> This sounds like you are using a client on PostgreSQL that uses an
>> "autocommit off" mode, since that's the only case where you'd need to add
>> COMMITs (or ROLLBACKs) to close a transaction after a SELECT.
>>
>
> Yes, this is correct. We do not use autocommit. Everything is controlled
> explicitly. We run quite complex multi-statement multi-table transactions
> and cannot work with "autocommit on".
>

That is not what autocommit means.

Whether you have autocommit on or off, you can *always* control things
explicitly. And you can certainly run "multi-statement transactions" in
autocommit on -- in fact, it's what most people do since it's the default
configuration of the system (and I don't see why multi-table would even be
relevant).

Autocommit on/off only controls what happens when you *don't* control
things explicitly.



> Therefore, this is what we have
>
> > psql
> psql (11.2)
> Type "help" for help.
>
> pg-11.2 rw => COMMIT;
> WARNING:  25P01: there is no transaction in progress
> LOCATION:  EndTransactionBlock, xact.c:3675
> COMMIT
> Time: 0.745 ms
> pg-11.2 rw => select 2*2;
>  ?column?
> --
> 4
> (1 row)
>
> Time: 0.347 ms
> pg-11.2 rw => COMMIT;
> COMMIT
> Time: 0.525 ms
>
> The first COMMIT (immediately after connect) fails as there is no
> transaction.
> The second one works as even this SELECT opened one. We have a transaction
> (and a snapshot) when no table is touched!
>

So just to be clear,  here is how PostgreSQL behaves by default:

postgres=# commit;
WARNING:  there is no transaction in progress
COMMIT
postgres=# select 2*2;
 ?column?
--
4
(1 row)

postgres=# commit;
WARNING:  there is no transaction in progress
COMMIT



But yes, if you explicitly ask that a query shall keep a transaction open
across multiple statements, by turning off autocommit, it will.

In fact, *PostgreSQL* will always behave that way. The *psql client* will
behave differently depending on how you configure it, and the same will of
course apply to any other client that you have. In the example above, psql.


You cannot both have a transaction existing and not existing at the same
time. You do have to separate the idea of transactions from snapshots
though, as they can differ quite a bit depending on isolation levels.


> And how much a running transaction blocks autovacuum is also dependent on
> what isolation level you're running it in. In the default isolation level,
> a snapshot is taken for each individual select, so does not block vacuuming
> past the end of the individual select. Higher isolation levels will.
>
> > We use default isolation mode and we proved that SELECTs block
autovacuum. As soon as we added COMMITs after SELECTS in several places
(not all as we still have not fixed all the code), autovacuum started
working properly in the fixed flows.


As I said yes, a running SELECT will, because of the snapshot. An open
transaction will not, past the individual select, because a new snapshot is
taken for each SELECT.

If you have an open transaction that runs regular selects but as separate
queries then it will not block autovacuum, unless it also does something
else.

Of course if it's a big query that runs the whole time it will, but then
there would also not be a way to "add commits" into the middle of it, so
clearly that's not what's going on here.

//Magnus


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 4:25 PM Magnus Hagander wrote:

>
>
> Whether you have autocommit on or off, you can *always* control things
> explicitly. And you can certainly run "multi-statement transactions" in
> autocommit on -- in fact, it's what most people do since it's the default
> configuration of the system (and I don't see why multi-table would even be
> relevant).
>
> Autocommit on/off only controls what happens when you *don't* control
> things explicitly.
>
I know that we can control things explicitly with "autocommit on". But we
would need to add "BEGIN" statements to the code which is an even bigger
change than adding COMMITs. We considered it and found that the development
cost is too high.

It seems I was not clear enough. I do not complain. I have been a PG fan
since 2000 when I worked with it for the first time. I just wanted to
understand it deeper and, fortunately, find a work around that would
simplify our current development.

Thanks to all.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Magnus Hagander
On Tue, Sep 8, 2020 at 4:01 PM Michael Holzman 
wrote:

>
>
> On Tue, Sep 8, 2020 at 4:25 PM Magnus Hagander wrote:
>
>>
>>
>> Whether you have autocommit on or off, you can *always* control things
>> explicitly. And you can certainly run "multi-statement transactions" in
>> autocommit on -- in fact, it's what most people do since it's the default
>> configuration of the system (and I don't see why multi-table would even be
>> relevant).
>>
>> Autocommit on/off only controls what happens when you *don't* control
>> things explicitly.
>>
> I know that we can control things explicitly with "autocommit on". But we
> would need to add "BEGIN" statements to the code which is an even bigger
> change than adding COMMITs. We considered it and found that the development
> cost is too high.
>
> It seems I was not clear enough. I do not complain. I have been a PG fan
> since 2000 when I worked with it for the first time. I just wanted to
> understand it deeper and, fortunately, find a work around that would
> simplify our current development.
>
>
Oh sure, but there is clearly *something* going on, so we should try to
figure that out. Because a transaction running multiple independent selects
with the defaults settings will not actually block autovacuum. So clearly
there is something else going on -- something else must be non-default, or
it's something that the driver layer does.

To show that, something as simple as the following, with autovacuum logging
enabled:

session 1:
CREATE TABLE test AS SELECT * FROM generate_series(1,1);

session 2:
begin;
SELECT count(*) FROM test;
\watch 1

session 1:
delete from test;


In this case, you will see autovacuum firing just fine, even though there
is an open transaction that queries the table test. As you're running you
can use a third session to see that session 2 flips between "active" and
"idle in transaction". The log output in my case was:

2020-09-08 16:13:12.271 CEST [26753] LOG:  automatic vacuum of table
"postgres.public.test": index scans: 0
pages: 0 removed, 45 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 56 removed, 0 remain, 0 are dead but not yet removable, oldest
xmin: 241585
buffer usage: 112 hits, 4 misses, 5 dirtied
avg read rate: 0.006 MB/s, avg write rate: 0.008 MB/s
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 5.01 s

It is failing to *truncate* the table, but the general autovacuum is
running.

Are you by any chance specifically referring to the truncation step?

However, if you change the session 2 to select from a *different* table,
the truncation also works, so I'm guessing that's not it?

//Magnus


Re: Autovacuum of independent tables

2020-09-08 Thread Tom Lane
Magnus Hagander  writes:
> Oh sure, but there is clearly *something* going on, so we should try to
> figure that out. Because a transaction running multiple independent selects
> with the defaults settings will not actually block autovacuum.

I don't think the OP is claiming that autovacuum is blocked, only that
it's failing to remove recently-dead rows that he thinks could be removed.

The reason that's not so is that whether or not transaction A *has*
touched table B is irrelevant.  It *could* read table B at any moment,
for all autovacuum knows.  Therefore we cannot remove rows that should
still be visible to A's snapshot.

There are some approximations involved in figuring out which rows are
potentially still visible to someone.  So perhaps this is a situation
where an approximation is being used and tighter analysis would have
shown that indeed a row could be removed.  But we haven't seen any
evidence of that so far.  The basic fact that A's snapshot is limiting
removal of rows from a table it has not touched is not a bug.

regards, tom lane




Re: Autovacuum of independent tables

2020-09-08 Thread Ron




On 9/8/20 3:27 AM, Michael Paquier wrote:

On Tue, Sep 08, 2020 at 11:16:04AM +0300, Michael Holzman wrote:

Autovacuum does not clean dead tuples of closed transactions in tableB
while there is an open transaction on tableA.
But the tables have nothing in common. They are handled by separate
applications and there are no transactions that touch both tables
simultaneously.
Why does autovacuum create an artificial dependency on the tables?

This is called MVCC, which applies to a session as a whole.  The point
here is that even if your application knows that only tableA is used
by a given transaction, Postgres cannot know that, as it could be
possible that data from tableB is needed in this same transaction, so
old versions of the rows from tableB matching with the snapshot hold
by this long-running transaction still have to be around.


Too bad the START TRANSACTION statement doesn't have a RESERVING clause 
where you can enumerate the tables you'll be using.


--
Angular momentum makes the world go 'round.




Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 5:38 PM Tom Lane wrote:

> Magnus Haganderwrites:
> > Oh sure, but there is clearly *something* going on, so we should try to
> > figure that out. Because a transaction running multiple independent
> selects
> > with the defaults settings will not actually block autovacuum.
>
> I don't think the OP is claiming that autovacuum is blocked, only that
> it's failing to remove recently-dead rows that he thinks could be removed.
>
Yes, this is exactly what happens.

The reason that's not so is that whether or not transaction A *has*
> touched table B is irrelevant.  It *could* read table B at any moment,
> for all autovacuum knows.  Therefore we cannot remove rows that should
> still be visible to A's snapshot.
>
> There are some approximations involved in figuring out which rows are
> potentially still visible to someone.  So perhaps this is a situation
> where an approximation is being used and tighter analysis would have
> shown that indeed a row could be removed.  But we haven't seen any
> evidence of that so far.  The basic fact that A's snapshot is limiting
> removal of rows from a table it has not touched is not a bug.
>
It's obviously not a bug. I was just surprised when I figured that out.
It's also quite complex to explain to my colleagues. Actually, this is the
main reason I started this thread: I tried to explain to someone and felt
that I miss something.


-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Magnus Hagander
On Tue, Sep 8, 2020 at 4:38 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > Oh sure, but there is clearly *something* going on, so we should try to
> > figure that out. Because a transaction running multiple independent
> selects
> > with the defaults settings will not actually block autovacuum.
>
> I don't think the OP is claiming that autovacuum is blocked, only that
> it's failing to remove recently-dead rows that he thinks could be removed.
>
> The reason that's not so is that whether or not transaction A *has*
> touched table B is irrelevant.  It *could* read table B at any moment,
> for all autovacuum knows.  Therefore we cannot remove rows that should
> still be visible to A's snapshot.
>
> There are some approximations involved in figuring out which rows are
> potentially still visible to someone.  So perhaps this is a situation
> where an approximation is being used and tighter analysis would have
> shown that indeed a row could be removed.  But we haven't seen any
> evidence of that so far.  The basic fact that A's snapshot is limiting
> removal of rows from a table it has not touched is not a bug.
>

Right. But in the default isolation level, the snapshot of A gets reset
between each SELECT, and does not persist to the end of the transaction. So
adding COMMIT between each select shouldn't change that part, should it?
That is, it's the snapshot age that decides it, not the transaction age.

I feel there is still some piece of information missing there, that could
explain the problem better...

//Magnus


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 5:47 PM Magnus Hagander  wrote:

>
> I feel there is still some piece of information missing there, that could
> explain the problem better...
>
I gave all the information I have (without real application and table
names, of course).
Both applications are C++ demons working with PG via ODBC on RHEL. We use
default ODBC settings.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Sep 8, 2020 at 4:38 PM Tom Lane  wrote:
>> The reason that's not so is that whether or not transaction A *has*
>> touched table B is irrelevant.  It *could* read table B at any moment,
>> for all autovacuum knows.  Therefore we cannot remove rows that should
>> still be visible to A's snapshot.

> Right. But in the default isolation level, the snapshot of A gets reset
> between each SELECT, and does not persist to the end of the transaction.

Well, we don't know what isolation level the OP is using.  We also don't
know what PG version he's using.  From memory, it hasn't been that long
since we fixed things so that an idle read-committed transaction
advertises no xmin.  It's also possible that the transaction isn't really
idle between statements (eg, if it's holding open cursors, or the like).

regards, tom lane




Re: Autovacuum of independent tables

2020-09-08 Thread Magnus Hagander
On Tue, Sep 8, 2020 at 5:16 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Tue, Sep 8, 2020 at 4:38 PM Tom Lane  wrote:
> >> The reason that's not so is that whether or not transaction A *has*
> >> touched table B is irrelevant.  It *could* read table B at any moment,
> >> for all autovacuum knows.  Therefore we cannot remove rows that should
> >> still be visible to A's snapshot.
>
> > Right. But in the default isolation level, the snapshot of A gets reset
> > between each SELECT, and does not persist to the end of the transaction.
>
> Well, we don't know what isolation level the OP is using.  We also don't
>

Per the thread, he's using the default, which should be read committed.



> know what PG version he's using.  From memory, it hasn't been that long
>

Per his session list, 11.2.



> since we fixed things so that an idle read-committed transaction
> advertises no xmin.  It's also possible that the transaction isn't really
> idle between statements (eg, if it's holding open cursors, or the like).
>

Oh, now *cursors* is definitely something I didn't think of. And especially
in the context of ODBC, I wonder if it might be creating cursors
transparently, and that this somehow causes the problems.

Michael, do you know if that might be the case? Or try enabling
log_statements to check if it is?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 6:15 PM Tom Lane  wrote:

> Well, we don't know what isolation level the OP is using.

We use the default. We do not explicitly set the isolation level.

We also don't know what PG version he's using.

it's 12.1

>From memory, it hasn't been that long
> since we fixed things so that an idle read-committed transaction
> advertises no xmin.  It's also possible that the transaction isn't really
> idle between statements (eg, if it's holding open cursors, or the like).
>
There are no open cursors.

-- 
Regards,
Michael Holzman


Re: Autovacuum of independent tables

2020-09-08 Thread Michael Holzman
On Tue, Sep 8, 2020 at 6:33 PM Magnus Hagander  wrote:

>
> Per his session list, 11.2.
>
11.2 is on my personal testing machine. The real tests run with 12.1

Oh, now *cursors* is definitely something I didn't think of. And especially
> in the context of ODBC, I wonder if it might be creating cursors
> transparently, and that this somehow causes the problems.
>
> Michael, do you know if that might be the case? Or try enabling
> log_statements to check if it is?
>
This is very interesting. I'll try to check it. I'll post here a reply if
we do have open cursors. Thanks.

-- 
Regards,
Michael Holzman