Measuring the Query Optimizer Effect: Turning off the QO?
Hi All As we know, a query goes through number of stages before it is executed. One of these stages is query optimization (QO). There are various parameters to try and influence optimizer decisions and costs. But I wanted to measure the effect of such a stage by turning it off completely and I can't find such a parameter which explicitly does that. Then I could execute a query to get the effect of "QO active and "QO inactive" and compare. Obviously, I know well what the results would generally look like but I am just interested in measuring the differences for various types of queries. I am also aware that this is a simple comparison - there are more interesting comparisons to perform with QO tweaks, but right now I am interested in something basic. So how would one shut down QO? Or at least, obtaining the guarantee of generating the worst plan possible, ideally without touching many parameters? Best, Tom
Re: Measuring the Query Optimizer Effect: Turning off the QO?
On 08/07/2019 01:46, Rob Sargent wrote: > > >> On Jul 7, 2019, at 5:22 PM, Tom Mercha wrote: >> >> Hi All >> >> As we know, a query goes through number of stages before it is executed. >> One of these stages is query optimization (QO). >> >> There are various parameters to try and influence optimizer decisions >> and costs. But I wanted to measure the effect of such a stage by turning >> it off completely and I can't find such a parameter which explicitly >> does that. Then I could execute a query to get the effect of "QO active >> and "QO inactive" and compare. >> >> Obviously, I know well what the results would generally look like but I >> am just interested in measuring the differences for various types of >> queries. I am also aware that this is a simple comparison - there are >> more interesting comparisons to perform with QO tweaks, but right now I >> am interested in something basic. >> >> So how would one shut down QO? Or at least, obtaining the guarantee of >> generating the worst plan possible, ideally without touching many >> parameters? >> >> Best, >> Tom > > Drop all indices? > Sorry, maybe my question wasn't clear enough. A query can be rewritten in various ways by applying rules and costs of relational algebra operators, as well as their parallelisation. I am talking about turning off this query optimization, so I am already assuming that indexes aren't present.
Re: Measuring the Query Optimizer Effect: Turning off the QO?
On 08/07/2019 02:04, Rob Sargent wrote: > > >> On Jul 7, 2019, at 6:02 PM, Rob Sargent wrote: >> >> >> >>> On Jul 7, 2019, at 6:01 PM, Rob Sargent >> <mailto:robjsarg...@gmail.com>> wrote: >>> >>> >>> >>>> On Jul 7, 2019, at 5:49 PM, Tom Mercha >>> <mailto:merch...@hotmail.com>> wrote: >>>> >>>> On 08/07/2019 01:46, Rob Sargent wrote: >>>>> >>>>> >>>>>> On Jul 7, 2019, at 5:22 PM, Tom Mercha >>>>> <mailto:merch...@hotmail.com>> wrote: >>>>>> >>>>>> Hi All >>>>>> >>>>>> As we know, a query goes through number of stages before it is executed. >>>>>> One of these stages is query optimization (QO). >>>>>> >>>>>> There are various parameters to try and influence optimizer decisions >>>>>> and costs. But I wanted to measure the effect of such a stage by turning >>>>>> it off completely and I can't find such a parameter which explicitly >>>>>> does that. Then I could execute a query to get the effect of "QO active >>>>>> and "QO inactive" and compare. >>>>>> >>>>>> Obviously, I know well what the results would generally look like but I >>>>>> am just interested in measuring the differences for various types of >>>>>> queries. I am also aware that this is a simple comparison - there are >>>>>> https://gitlab.com/camplab/jpsgcs <https://gitlab.com/camplab/jpsgcs> >>>>>> interesting comparisons to perform with QO tweaks, but right now I >>>>>> am interested in something basic. >>>>>> >>>>>> So how would one shut down QO? Or at least, obtaining the guarantee of >>>>>> generating the worst plan possible, ideally without touching many >>>>>> parameters? >>>>>> >>>>>> Best, >>>>>> Tom >>>>> >>>>> Drop all indices? >>>>> >>>> >>>> Sorry, maybe my question wasn't clear enough. >>>> >>>> A query can be rewritten in various ways by applying rules and costs of >>>> relational algebra operators, as well as their parallelisation. I am >>>> talking about turning off this query optimization, so I am already >>>> assuming that indexes aren't present. >>> >>> Have you played with any of these settings? >>> >>> postgres=# select version(); >>> version >>> - >>> PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 >>> 20150623 (Red Hat 4.8.5-36), 64-bit >>> (1 row) >>> >>> postgres=# select name, setting, unit,short_desc from pg_settings where >>> name ~ 'para'; >>>name | setting | unit | >>> short_desc >>> -+-+--+ >>> force_parallel_mode | off | | Forces use of parallel >>> query facilities. >>> max_parallel_workers| 16 | | Sets the maximum >>> number of parallel workers that can be active at one time. >>> max_parallel_workers_per_gather | 8 | | Sets the maximum >>> number of parallel processes per executor node. >>> min_parallel_index_scan_size| 64 | 8kB | Sets the minimum >>> amount of index data for a parallel scan. >>> min_parallel_table_scan_size| 1024| 8kB | Sets the minimum >>> amount of table data for a parallel scan. >>> parallel_setup_cost | 1000| | Sets the planner's >>> estimate of the cost of starting up worker processes for parallel query. >>> parallel_tuple_cost | 0.1 | | Sets the planner's >>> estimate of the cost of passing each tuple (row) from worker to master >>> backend. >>> ssl_dh_params_file | | | Location of the SSL DH >>> parameters file. >>> (8 rows) >>> >> Well not the last one of course. > > Better yet, “where category ~* ‘planner’" > Hi Yes, I've taken a look. I'm just interested in turning off the whole module and there is no parameter to do just that - an on and off switch - from what I can understand. What I can do instead is to go over the entire list of parameters and configure them each to generate a bad plan, but I'm not sure in general how to make a configuration which gives me the worst plan possible under all situations. I was hoping that someone has the on/off switch I'm looking for or a parameter template which has been used for the same purpose or something along these lines... This document describes the module I'm talking about: https://www.postgresql.org/docs/current/planner-optimizer.html
Re: Measuring the Query Optimizer Effect: Turning off the QO?
On 08/07/2019 18:29, Tom Lane wrote: > Andrew Gierth writes: >> "Tom" == Tom Lane writes: >> Tom> Two I'd particularly draw your attention to are >> Tom> join_collapse_limit and from_collapse_limit --- if you set both to >> Tom> 1, that'll effectively disable searching for a good join order, >> Tom> causing the join order to match the syntactic structure of the >> Tom> FROM clause. For instance "FROM a,b,c" will always be done by >> Tom> joining a to b first > >> FROM a,b,c can always be planned in any join order. > > Ah, right, -ENOCAFFEINE. What from_collapse_limit really does is > prevent flattening sub-SELECTs when doing so would add more join-order > freedom in the parent query. But ignoring sub-SELECTs, using explicit > JOIN syntax with join_collapse_limit=1 will let you control the > join order. > > regards, tom lane > Thanks for the clarification. I really appreciate taking the time to provide such in depth info about my query!
Re: Measuring the Query Optimizer Effect: Turning off the QO?
On 08/07/2019 16:23, Andrew Gierth wrote: >>>>>> "Tom" == Tom Mercha writes: > > Tom> Hi All > > Tom> As we know, a query goes through number of stages before it is > Tom> executed. One of these stages is query optimization (QO). > > That's not really true at all. One of the stages is query _planning_, > which takes the (rewritten) query as input and converts it to something > that the executor can take action on. There isn't actually any separate > "optimization" phase. > Hi I was just loosely speaking - I am merely referring to the concept that a query can be optimized vs unoptimized. Of course it follows that we have a phase or a subset of stages which are responsible for this purpose.
Re: Is is safe to use SPI in multiple threads?
On 23/12/2016 13:41, Peter J. Holzer wrote: > On 2016-12-09 16:52:05 +0800, Qiu Xiafei wrote: >> I'm new to PG and want to implement my domain-specific system based on PG. I >> wish to arrange my data as several tables in database and translate my DSL >> into >> SQL statements for query. Since one DSL statement may be mapped to several >> SQL >> statements, it's better to push the DSL server as close to the PG server as >> possible. I found PG's backgroud worker meet my needs. I can setup a >> background >> worker bounded to PG server and listen to a port for network requests. >> >> But I encounter a problem that the Server Programing Interfaces are not >> THREAD >> SAFE. There are some global variables defined like: SPI_processed, >> SPI_tuptable, etc. This limit to my DSL server to work in single thread mode >> which is quite inefficient. > > I had a similar requirement. I solved it by moving the application logic > out of the stored procedures. All the stored procedure does is an RPC > call (I use ØMQ for that) to a server process and send the result back > to the client. The server process converts the request into multiple SQL > queries which can be processed in parallel. > > The downside is of course that the communication overhead is much > higher (A minimum of 4 network messages per request). That's not a > problem in my case, but you mileage may vary. > > The advantages in my opinion are: > > * A standalone server process is easier to test and debug than a bunch >of stored procedures. > * I can easily scale out if necessary: Currently my database and server >process run on the same machine, but I could distribute them over >several machines with (almost) no change in logic. > > hp > Sorry to revive such an old topic. I am facing a similar requirement where I am running multiple queries concurrently. Like Qiu Xiafei, I am looking at SPI, and dynamic background workers. In particular, I am using SPI_execq(...) on each dynamic background workers I spawn. What I am experiencing is that I am not seeing a speedup, and I am beginning to wonder if I have done something wrong, if the overheads are too big, or if there are some limitations I am not aware of. As I see that none of the comments here make much of a reference to performance/speedup, would you be so kind as to tell me how satisfied you were with performance? Any insights would be greatly appreciated. Thanks, Tom