Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

Martin Kováčik Thu, 25 Apr 2019 21:45:24 -0700

I’ve got it now, thank you for the clarification. You are right.

Martin Kováčik
+421904236791


> On 26 Apr 2019, at 00:25, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> 
>> On 4/25/19 1:28 PM, Martin Kováčik wrote:
>> I'm not sure if I understand what you mean. My initial thought was that 
>> stats are fixed per transaction, i.e. analyze from comitted transaction 
>> doesn't interfere with another running transaction. Maybe I was confused by 
>> this because analyze can be run inside a transaction, so my assumption was 
>> it is isolated like other statements.
> 
> To make it clearer I would take a look at:
> 
> https://www.postgresql.org/docs/11/sql-analyze.html
> https://www.postgresql.org/docs/11/catalog-pg-statistic.html
> 
> 
> The gist is currently there is one entry(with exception noted in second link) 
> per column in each table. This is the most recent information available 
> subject to the conditions Tom pointed out in his post. To make it MVCC aware 
> like you suggest would mean tracking the state of all transactions currently 
> open and recording that information, so each transaction could find its 
> stats. This means the planning problem could then move to the stats table as 
> it would need to be analyzed itself to work efficiently. This would probably 
> also need to be MVCC aware to be relevant, which would add to the overhead. I 
> could see this turn into a 'hall of mirrors' problem quickly.
> 
>> *Martin Kováčik*
>> /CEO/
>> *redByte*, s.r.o.
>> +421 904 236 791
>> kova...@redbyte.eu <mailto:kova...@redbyte.eu>, www.redbyte.eu 
>> <http://redbyte.eu>
>> On Thu, Apr 25, 2019 at 9:58 PM Adrian Klaver <adrian.kla...@aklaver.com 
>> <mailto:adrian.kla...@aklaver.com>> wrote:
>>    On 4/25/19 12:47 PM, Martin Kováčik wrote:
>>     > As my example shows you don't have to import a lot of rows - 1000 is
>>     > enough to make a difference - it all depends on the query. When a
>>     > cartesian product is involved only a few records is enough.
>>     > I think that stats should be MVCC versioned otherwise the planner is
>>     > using wrong statistics and chooses wrong plans.
>>    Then you are looking at moving the choke point to looking up the
>>    correct
>>    stats across possibly hundreds/thousands of transactions in flight.
>>     > *Martin Kováčik*
>>     > /CEO/
>>     > *redByte*, s.r.o.
>>     > +421 904 236 791
>>     > kova...@redbyte.eu <mailto:kova...@redbyte.eu>
>>    <mailto:kova...@redbyte.eu <mailto:kova...@redbyte.eu>>,
>>    www.redbyte.eu <http://www.redbyte.eu>
>>     > <http://redbyte.eu>
>>     >
>>     >
>>     > On Thu, Apr 25, 2019 at 9:28 PM Michael Lewis <mle...@entrata.com
>>    <mailto:mle...@entrata.com>
>>     > <mailto:mle...@entrata.com <mailto:mle...@entrata.com>>> wrote:
>>     >
>>     >
>>     >
>>     >     On Thu, Apr 25, 2019, 11:34 AM Martin Kováčik
>>    <kova...@redbyte.eu <mailto:kova...@redbyte.eu>
>>     >     <mailto:kova...@redbyte.eu <mailto:kova...@redbyte.eu>>> wrote:
>>     >
>>     >         Turning off autovacuum for the tests is a valid option and I
>>     >         will definitely do this as a workaround. Each test pretty
>>    much
>>     >         starts with empty schema and data for it is generated
>>    during the
>>     >         run and rolled back at the end. I have a lot of tests and
>>    at the
>>     >         moment it is not feasible to modify them.
>>     >
>>     >         The real workload for the application is different, but there
>>     >         are some cases, when we import data from remote web
>>    service in a
>>     >         transaction do some work with it and then we do a commit. If
>>     >         there is an autovacuum during this process I assume there
>>    will
>>     >         be similar problem regarding planner statistics.
>>     >
>>     >
>>     >     Unless you are importing a huge amount of data relative to
>>    what is
>>     >     already there, it seems likely to be significantly less impactful
>>     >     than adding data to a completely empty table. The stats on a
>>    table
>>     >     with 0 rows and then 5000 rows is going to be night and day,
>>    while
>>     >     the difference between stats on 100,000 rows and 105,000 is
>>    not as
>>     >     impactful. Musing here. I expect others will chime in.
>>     >
>>     >     Stats are not versioned with MVCC so it would expected that a
>>    commit
>>     >     in another transaction that is updating stats would influence the
>>     >     query plan for another transaction that is active.
>>     >
>>    --     Adrian Klaver
>>    adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com

Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction

Reply via email to