Multiple startup messages over the same connection
Hello, A question about protocol design - would it be possible to extend the protocol so it can handle multiple startup / authentication messages over a single connection? Are there any serious obstacles? (possible issues with re-initialization of backends, I guess?) If that is possible, it could improve one important edge case: when you have to talk to multiple databases on a single host, you currently need to open a separate connection to each of them. In some cases (multitenancy, for example), you may have thousands of databases on a host, which leads to inefficient connection utilization on clients (and on the db side too). A lot of other RDBMSes don't have this limitation. thank you, -Vladimir Churyukin
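For concreteness, here is a minimal sketch (in Python, with made-up parameter values) of the v3 StartupMessage wire format under discussion: an int32 total length, the int32 protocol version 196608 (3.0), then NUL-terminated key/value pairs ending with a final NUL. Today a client sends exactly one of these per connection; the idea is to allow sending another one later, e.g. to switch databases:

```python
import struct

def startup_message(params):
    """Encode a PostgreSQL v3.0 StartupMessage: int32 total length
    (including the length field itself), int32 protocol version 196608
    (3 << 16), then NUL-terminated key/value pairs and a final NUL."""
    body = struct.pack("!i", 196608)
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"
    return struct.pack("!i", len(body) + 4) + body

# Parameter values here are invented for illustration; in the proposal a
# second such message on a live connection would switch the backend (or
# a replacement backend) to another tenant database.
msg = startup_message({"user": "app", "database": "tenant_0042"})
```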
Re: Multiple startup messages over the same connection
On Mon, Jan 22, 2024 at 11:43 PM Heikki Linnakangas wrote: > On 22/01/2024 21:58, Vladimir Churyukin wrote: > > A question about protocol design - would it be possible to extend the > > protocol, so it can handle multiple startup / authentication messages > > over a single connection? Are there any serious obstacles? (possible > > issues with re-initialization of backends, I guess?) > > If that is possible, it could improve one important edge case - where > > you have to talk to multiple databases on a single host currently, you > > need to open a separate connection to each of them. In some cases > > (multitenancy for example), you may have thousands of databases on a > > host, which leads to inefficient connection utilization on clients (on > > the db side too). A lot of other RDBMSes don't have this limitation. > > The protocol and the startup message are the least of your problems. > Yeah, it would be nice if you could switch between databases, but the > assumption that one backend operates on one database is pretty deeply > ingrained in the code. Yes, I suspected that's the reason why it was not implemented so far, but what's the main problem there? Is the issue with the global data cleanup / re-initialization after the database is changed? Is it in 3rd party extensions that assume the same and may break? Anything else? -Vladimir Churyukin
Re: Improving inferred query column names
On Wed, Feb 22, 2023, 12:40 PM Andres Freund wrote: > Hi, > > On 2023-02-11 12:47:04 -0800, Vladimir Churyukin wrote: > > That is a good idea for simple cases, I'm just curious how it would look > > like for more complex cases (you can have all kinds of expressions as > > parameters for aggregate function calls). > > If it works only for simple cases, I think it would be confusing and not > > very helpful. > > I don't think it needs to be perfect to be helpful. > It doesn't need to be perfect, but it needs to be consistent. So far you proposed a rule to replace () with _. What is the plan for expressions - how do we convert them to names (with deduplication, I guess, because two similar expressions could potentially map to the same name)? > > Wouldn't it make more sense to just deduplicate the names by adding > > numerical postfixes, like sum_1, sum_2? > > That'd be considerably worse than what we do today imo, because any > reordering > / added aggregate would lead to everything else changing as well. > Ok, that I kinda agree with. Not necessarily worse overall, but worse for some cases. Well, the proposal above about keeping the names exactly the same as the full expressions is probably the best we can do then. It will take care of possible duplications and won't be position-sensitive. And it will be consistent. The only issue is the somewhat unusual column names, which you would have to quote when referring to them. But is that a real issue? -Vladimir Churyukin
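The two schemes being weighed can be sketched quickly (a toy illustration, not the actual parser logic; the expression-sanitization rule here is a made-up placeholder):

```python
from collections import Counter

def dedup_numeric(names):
    """Position-sensitive scheme: names that occur more than once get
    numeric postfixes (sum, sum -> sum_1, sum_2), so adding or
    reordering an aggregate renames its siblings."""
    total = Counter(names)
    seen = Counter()
    out = []
    for n in names:
        if total[n] == 1:
            out.append(n)
        else:
            seen[n] += 1
            out.append(f"{n}_{seen[n]}")
    return out

def name_from_expression(expr):
    """Position-independent scheme: derive the name from the full
    expression text (made-up rule: lowercase, '(' becomes '_', ')' is
    dropped). Produces unusual names that need quoting."""
    return expr.lower().replace("(", "_").replace(")", "")

dedup_numeric(["sum", "sum", "avg"])   # -> ['sum_1', 'sum_2', 'avg']
name_from_expression("SUM(reads)")     # -> 'sum_reads'
```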
Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
As an end user that spends a lot of time optimizing pretty complicated queries, I'd say that something like this could be useful. Right now the optimizer is mostly a black box. Why it chooses one plan or the other is a mystery. I have some general ideas about that, and I can even read and sometimes debug the optimizer's code to dig deeper (although it's not always possible to reproduce the same behavior as in the production system anyway). I'm mostly interested to find where exactly the optimizer was wrong and what would be the best way to fix it. Currently Postgres is not doing a great job in that department. EXPLAIN output can tell you about mispredictions, but the logic of choosing particular plans is still obscure, because the reasons for the optimizer's decisions are not visible. If configuring OPTIMIZER_DEBUG through GUC can help with that, I think it would be a useful addition. Now, those are general considerations; I'm not somebody who actually uses OPTIMIZER_DEBUG regularly (but maybe I would if it were accessible through GUC), I'm just saying that this is an area where improvements would be very much welcomed. -Vladimir On Tue, Jan 3, 2023 at 4:57 PM David Rowley wrote: > On Tue, 3 Jan 2023 at 19:59, Ankit Kumar Pandey > wrote: > > > > > > On 03/01/23 08:38, David Rowley wrote: > > > Do you actually have a need for this or are you just trying to tick > > > off some TODO items? > > > > > I would say latter but reason I picked it up was more on side of > > learning optimizer better. > > I think it's better you leave this then. I think if someone comes > along and demonstrates the feature's usefulness and can sell us having > it so we can easily enable it by GUC then maybe that's the time to > consider it. I don't think ticking off a TODO item is reason enough. > > > Also from the thread, > > > > > https://www.postgresql.org/message-id/20120821.121611.501104647612634419.t-is...@sraoss.co.jp > > > > > +1. It would also be popular with our academic users.
> > > > > There could be potential for this as well. > I think the argument is best coming from someone who'll actually use it. > David
Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
On Tue, Jan 3, 2023 at 7:41 PM David Rowley wrote: > On Wed, 4 Jan 2023 at 16:15, Vladimir Churyukin > wrote: > > As an end user that spends a lot of time optimizing pretty complicated > queries, I'd say that something like this could be useful. > > I think we really need to at least see that it *is* useful, not that > it *could be* useful. For example, as an end user, you might not find > it great that the output is sent to stdout rather than to the window > that you execute the query in. > That's true, as an end user I would expect to see the output as a query output, not in stdout. > From what I can see here, the motivation to make this a useful feature > is backwards from what is normal. I think if you're keen to see a > feature that allows you better visibility into rejected paths then you > need to prove this is it rather than speculating that it might be > useful. > > You can't see people using the feature unless you make it useful. If it's not useful right now (because it's implemented as a compile-time flag with stdout prints for example), it doesn't mean it's not useful when it becomes more convenient. Probably the best way to find out is to create a *convenient* extension and see if people start using it. > There was a bit of work being done in [1] with the end goal of having > the ability for add_path to call a hook function before it outright > rejects a path. Maybe that would be a better place to put this and > then write some contrib module that provides some extended output in > EXPLAIN. That might require some additional fields so that we could > carry forward additional information that we'd like to show in > EXPLAIN. I imagine it's not ok just to start writing result lines in > the planner. The EXPLAIN format must be considered too and explain.c > seems like the place that should be done. add_path might need to > become a bit more verbose about the reason it rejected a certain path > for this to be useful. 
I agree, extended EXPLAIN output would be a much better solution than writing into stdout. It can be implemented as an extra EXPLAIN flag, something like EXPLAIN (TRACE). One of the issues here is that the result will be pretty long (and may consist of multiple parts), so something like returning multiple refcursors might be necessary so that a client can fetch multiple result sets; otherwise it won't be human-readable. Although that's not necessarily the purpose: if the purpose is to make it machine-readable and to create tools to interpret the results, json format and a single resultset would be ok. The result can be represented as a list of trace events that show the planner's logic (the traces can be generated by the hook you mentioned and/or by some other additional hooks). Is that what you were talking about? Another thing: since people react to this TODO item on https://wiki.postgresql.org/wiki/Todo, maybe it's better to modify or remove it, so they don't spend time working on something that is pretty much a dead end currently? -Vladimir
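As a sketch of what such machine-readable trace output could look like, the event names, fields, and values below are all invented for illustration; a hypothetical EXPLAIN (TRACE) mode could emit something in this shape as a single json result set:

```python
import json

# Invented trace-event shape: one record per planner decision.
trace = [
    {"event": "add_path", "rel": "orders", "path": "SeqScan",
     "startup_cost": 0.0, "total_cost": 12450.0, "kept": False,
     "reason": "dominated by IndexScan on orders_pkey"},
    {"event": "add_path", "rel": "orders", "path": "IndexScan(orders_pkey)",
     "startup_cost": 0.42, "total_cost": 8123.5, "kept": True},
]

# A consuming tool could filter the noise down to rejections and reasons:
rejected = [e for e in trace if e["event"] == "add_path" and not e["kept"]]
report = json.dumps(rejected, indent=2)
```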
Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
On Tue, Jan 3, 2023 at 9:55 PM Tom Lane wrote: > David Rowley writes: > > The thing I had in mind was some mode that would record additional > > details during planning that could be tagged onto the final plan in > > createplan.c so that EXPLAIN could display them. I just think that > > EXPLAIN is the place where people go to learn about _what_ the plan is > > and it might also be the place where they might expect to go to find > > out more details about _why_ that plan was chosen. I by no means have > > a fully baked idea on what that would look like, but I just think that > > dumping a bunch of lines to stdout is not going to be useful to many > > people and we need to think of something better in order to properly > > make this useful. > > There's a number of problems in this area, but I think the really > fundamental issue is that for speed reasons the planner wants to > reject losing plan alternatives as quickly as possible. So we simply > don't pursue those alternatives far enough to produce anything that > could serve as input for EXPLAIN (in its current form, anyway). > That's not necessarily a fundamental issue for EXPLAIN (well, in theory; I'm not sure if there are fundamental limitations in the current implementation). When somebody runs EXPLAIN, they don't necessarily care that much about its performance, as long as it returns results in reasonable time. So if the planner does some extra work in that mode to better display why the specific path was chosen, it should probably be ok from the performance perspective. > What that means is that a trace of add_path decisions just can't be > very useful to an end user: there isn't enough data to present the > decisions in a recognizable form, besides which there is too much > noise because most of the rejected options are in fact silly. > So indeed we find that even hard-core developers aren't interested > in consuming the data in that form.
> Even if the output is not very human-readable, it still can be useful, if there are tools that consume the output and extract meaningful data while omitting meaningless noise (if the meaningful data exists there of course). > Another issue is that frequently the problem is that we never > considered the desired plan at all, so that even if you had a > perfectly readable add_path trace it wouldn't show you what you want > to see. This might happen because the planner is simply incapable of > producing that plan shape from the given query, but often it happens > for reasons like "function F() used in the query is marked volatile, > so we didn't flatten a subquery or consider an indexscan or whatever". > I'm not sure how we could produce output that would help people > discover that kind of problem ... but I am sure that an add_path > trace won't do it. > > So, not only am I pretty down on exposing OPTIMIZER_DEBUG in > its current form, but I don't really believe that adding hooks > to add_path would allow an extension to produce anything of value. > I'd for sure want to see a convincing demonstration to the contrary > before we slow down that hot code path by adding hook calls. Pardon my ignorance, but I'm curious, how changes in planner code are currently validated? Let's say, you add some extra logic that introduces different paths in some cases, or adjust some constants. How do you validate this logic doesn't slow down something else dramatically? I see some EXPLAIN output checks in regression tests (not that many though), so I'm curious how regressions in planning are currently tested. Not the simple ones, when you have a small input and predictable plan/output, but something that can happen with more or less real data distribution on medium / large datasets. -Vladimir Churyukin
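To illustrate why rejected alternatives leave no trace: add_path throws away any dominated candidate immediately. A toy model (ignoring pathkeys, row counts, parameterization, and fuzzy cost comparison, all of which the real function also considers):

```python
def add_path(paths, new):
    """Toy model of the planner's add_path(): reject `new` outright if an
    existing path dominates it on both startup and total cost; otherwise
    keep it and drop any kept paths it dominates."""
    for p in paths:
        if p["startup"] <= new["startup"] and p["total"] <= new["total"]:
            return paths  # rejected immediately; nothing is recorded anywhere
    return [p for p in paths
            if not (new["startup"] <= p["startup"] and new["total"] <= p["total"])] + [new]

paths = []
for candidate in [{"name": "SeqScan",    "startup": 0.0, "total": 100.0},
                  {"name": "IndexScan",  "startup": 5.0, "total": 60.0},
                  {"name": "BitmapScan", "startup": 6.0, "total": 90.0}]:
    paths = add_path(paths, candidate)
# BitmapScan was dominated by IndexScan and silently discarded -- exactly
# the kind of decision a trace mode would have to capture before the fact.
```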
Making auto_explain more useful / convenient
Hello, I'm not sure if these ideas were circulated before or not. We use auto_explain a lot to investigate slow/problematic queries. One of the main issues with its usability comes from the fact that EXPLAIN output is logged rather than returned to the caller in some way. If you have a large cluster with lots of replicas, there is also the extra inconvenience of log accumulation, search, etc. Why not have an option to return EXPLAIN results as a NoticeResponse instead? That would make its usage more convenient. Another thing is tangentially related... I think it may be good to have a number of options to generate significantly shorter output similar to EXPLAIN. EXPLAIN is great, but sometimes people need more concise and specific information, for example the total number of buffers and reads for a certain query (this is pretty common), whether or not we had certain nodes in the plan (seq scan, scan of certain index(es)), how bad the cardinality misprediction was on certain nodes, etc. It's not totally clear yet what would be the best way to define those rules, but I think we can come up with something reasonable. Logging or returning shorter messages like that would cause less overhead than logging full EXPLAIN and can potentially allow for better query monitoring overall. Do you see any potential issues with implementing those? Of course there should be more details, like what kind of configuration parameters to add, how to define rules for the 2nd case, etc. Just wanted to check if there are any objections in general. Thank you, -Vladimir Churyukin
Re: Making auto_explain more useful / convenient
Thank you, that answers the first part of my question. On Sat, Nov 11, 2023 at 2:43 AM Sergei Kornilov wrote:
> Hello
>
> auto_explain.log_level is available since postgresql 12.
>
> postgres=# load 'auto_explain';
> LOAD
> postgres=# set auto_explain.log_min_duration to 0;
> SET
> postgres=# set auto_explain.log_level to 'notice';
> SET
> postgres=# select 1;
> NOTICE: duration: 0.010 ms plan:
> Query Text: select 1;
> Result (cost=0.00..0.01 rows=1 width=4)
>  ?column?
> ----------
>         1
>
> regards, Sergei
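The session above corresponds to a postgresql.conf setup along these lines (a sketch; the duration threshold is made up, and auto_explain.log_level requires PostgreSQL 12 or later):

```ini
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '250ms'  # only report queries slower than this
auto_explain.log_analyze = on            # include actual times and row counts
auto_explain.log_buffers = on            # include buffer usage (needs log_analyze)
auto_explain.log_format = 'json'         # easier for tooling to consume
auto_explain.log_level = 'notice'        # deliver plans to the client as notices
```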
Re: Making auto_explain more useful / convenient
On Sat, Nov 11, 2023 at 7:49 AM Tom Lane wrote: > Vladimir Churyukin writes: > > Why not have an option to return EXPLAIN results as a NoticeResponse > > instead? That would make its usage more convenient. > > That seems quite useless to me, and likely actually counterproductive. > If you are manually investigating query performance, you can just use > EXPLAIN directly. The point of auto_explain, ISTM, is to capture info > about queries issued by automated applications. So something like the > above could only work if you taught every one of your applications to > capture the NOTICE output, separate it from random other NOTICE > output, and then (probably) log it somewhere central for later > inspection. That's a lot of code to write, and at the end you'd > only have effectively duplicated existing tooling such as pgbadger. > Also, what happens in applications you forgot to convert? > > Sergei Kornilov just gave the right answer above in the thread for this one. Unfortunately, there are a lot of scenarios where you can't use pgbadger or any other log analysis tool, or where it's not convenient. There are a bunch of cloud-hosted forks of postgres, for example; not all of them give you this functionality. In AWS, for example, you need to download all the logs first, which complicates it significantly. The goal of this is not investigating the performance of a single query but rather constant monitoring of a bunch of (or all) queries, so you can detect plan degradations right away. > > Another thing is tangentially related... > > I think it may be good to have a number of options to generate > > significantly shorter output similar to EXPLAIN. EXPLAIN is great, but > > sometimes people need more concise and specific information, for example > > total number of buffers and reads by certain query (this is pretty > common), > > whether or not we had certain nodes in the plan (seq scan, scan of > certain > > index(es)), how bad was cardinality misprediction on certain nodes, etc.
> > Maybe, but again I'm a bit skeptical. IME you frequently don't know > what you're looking for until you've seen the bigger picture. Zeroing > in on details like this could be pretty misleading. > > If you don't know what you're looking for, then it's not very useful, I agree. But in many cases you know. There are certain generic "signs of trouble" that you can detect by the amount of data the query processor scans, by the cache hit rate for certain queries, by the presence of seq scans or scans of certain indexes, by large differences between predicted and actual rows, and by some other stuff that may be relevant to your app/queries specifically that you want to monitor. We're already doing similar analysis on our side (a multi-terabyte db cluster with hundreds of millions to billions of queries running daily). But it's not efficient enough because: 1. the problem I mentioned above - access to logs is limited in cloud environments; 2. EXPLAIN output can be huge, which causes performance issues because of its size; compact output is much more preferable for mass processing (it's even more important if this output goes to notice messages rather than to logs - that's why I said it's tangentially related). Since it seems the notice output is already possible, half of the problem is solved already. I'll try to come up with possible options for more compact output then, unless you think it's completely futile. thank you, -Vladimir Churyukin
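A sketch of the kind of compact signal extraction described above, operating on EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) output; the miniature plan below is made up, but it uses the real JSON key names:

```python
import json

def summarize(plan_node, acc=None):
    """Walk an EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) plan tree and pull
    out compact monitoring signals: total blocks read from disk, presence
    of Seq Scans, and the worst row-estimate error in the tree."""
    if acc is None:
        acc = {"read_blocks": 0, "seq_scans": 0, "worst_estimate_ratio": 1.0}
    acc["read_blocks"] += plan_node.get("Shared Read Blocks", 0)
    if plan_node.get("Node Type") == "Seq Scan":
        acc["seq_scans"] += 1
    est, act = plan_node.get("Plan Rows", 0), plan_node.get("Actual Rows", 0)
    if est and act:
        acc["worst_estimate_ratio"] = max(acc["worst_estimate_ratio"],
                                          max(est, act) / min(est, act))
    for child in plan_node.get("Plans", []):
        summarize(child, acc)
    return acc

# Made-up miniature plan in the real EXPLAIN JSON layout:
explain_json = json.loads("""[{"Plan": {
  "Node Type": "Nested Loop", "Plan Rows": 10, "Actual Rows": 9000,
  "Shared Read Blocks": 12,
  "Plans": [
    {"Node Type": "Seq Scan", "Plan Rows": 1000, "Actual Rows": 1000,
     "Shared Read Blocks": 500},
    {"Node Type": "Index Scan", "Plan Rows": 1, "Actual Rows": 9,
     "Shared Read Blocks": 30}
  ]}}]""")
stats = summarize(explain_json[0]["Plan"])
```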
Re: Improving inferred query column names
That is a good idea for simple cases, I'm just curious how it would look for more complex cases (you can have all kinds of expressions as parameters for aggregate function calls). If it works only for simple cases, I think it would be confusing and not very helpful. Wouldn't it make more sense to just deduplicate the names by adding numerical postfixes, like sum_1, sum_2? For backwards compatibility I guess you can have a GUC flag controlling that behavior that can be set into backwards compatibility mode if required. The previous functionality can be declared deprecated and removed (with the flag) once the current version becomes unsupported. (or with a different deprecation policy; I'm not sure what the general rule for breaking changes and deprecation is currently). If there is a clearly defined deprecation policy and a backwards compatibility option, it should be good, no? Just my 2 cents. -Vladimir Churyukin On Sat, Feb 11, 2023 at 11:24 AM Andres Freund wrote: > Hi, > > A common annoyance when writing ad-hoc analytics queries is column naming > once > aggregates are used. > > Useful column names: > SELECT reads, writes FROM pg_stat_io; > column names: reads, writes > > Not useful column names: > SELECT SUM(reads), SUM(writes) FROM pg_stat_io; > column names: sum, sum > > So i often end up manually writing: > SELECT SUM(reads) AS sum_reads, SUM(writes) AS sum_writes, ... FROM > pg_stat_io; > > > Of course we can't infer useful column names for everything, but for > something > like this, it shouldn't be too hard to do better. E.g. by combining the > function name with the column name in the argument, if a single plain > column > is the argument. > > I think on a green field it'd be clearly better to do something like the > above. What does give me pause is that it seems quite likely to break > existing queries, and to a lesser degree, might break applications relying > on > inferred column names. > > Can anybody think of a good way out of that?
It's not like that problem is > going to go away at some point... > > Greetings, > > Andres Freund
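Andres's combining rule above - function name plus the column name, when the argument is a single plain column - can be sketched like this (an illustration only; the function and its fallback behavior are invented, not PostgreSQL internals):

```python
import re

def infer_column_name(expr):
    """Sketch of the suggested rule: if a function is applied to a single
    plain column, combine the two into func_column; otherwise fall back
    to just the function name, as PostgreSQL does today."""
    simple = re.fullmatch(r"\s*(\w+)\s*\(\s*(\w+)\s*\)\s*", expr)
    if simple:
        return f"{simple.group(1).lower()}_{simple.group(2).lower()}"
    func = re.match(r"\s*(\w+)\s*\(", expr)
    return func.group(1).lower() if func else "?column?"

infer_column_name("SUM(reads)")      # -> 'sum_reads'
infer_column_name("sum(reads + 1)")  # -> 'sum' (complex argument: fallback)
```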
Re: Multiple startup messages over the same connection
On Mon, Jan 22, 2024 at 11:43 PM Heikki Linnakangas wrote: > On 22/01/2024 21:58, Vladimir Churyukin wrote: > > A question about protocol design - would it be possible to extend the > > protocol, so it can handle multiple startup / authentication messages > > over a single connection? Are there any serious obstacles? (possible > > issues with re-initialization of backends, I guess?) > > If that is possible, it could improve one important edge case - where > > you have to talk to multiple databases on a single host currently, you > > need to open a separate connection to each of them. In some cases > > (multitenancy for example), you may have thousands of databases on a > > host, which leads to inefficient connection utilization on clients (on > > the db side too). A lot of other RDBMSes don't have this limitation. > > The protocol and the startup message are the least of your problems. > Yeah, it would be nice if you could switch between databases, but the > assumption that one backend operates on one database is pretty deeply > ingrained in the code. > > -- > Heikki Linnakangas > Neon (https://neon.tech) > > Sorry to revive this old thread, just want to check on one thing: Let's say we keep the one-database-per-backend rule; I understand at this point it would be really hard to change. What if on a new startup message we just signal the postmaster about it, so it takes over the socket and spawns a new backend. After that we terminate the old one. How does that sound in terms of implementation complexity? I guess the process of passing control from child processes to the parent could be a bit tricky for that one, but doable? Is there anything I'm missing that could be a no-go for this? The end goal is to minimize the large overhead for clients having to deal with a large number of connections on multi-tenant systems (say, one client deals with thousands of databases on the same database server). -Vladimir Churyukin
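The "postmaster takes over the socket" step would essentially be file-descriptor passing between processes, which Unix supports via SCM_RIGHTS ancillary messages (wrapped by Python 3.9+'s socket.send_fds/recv_fds). A single-process sketch of just that mechanism, with a pipe standing in for the client connection:

```python
import os
import socket

# A pipe stands in for the client connection held by the old backend:
client_r, client_w = os.pipe()
os.write(client_w, b"startup")

# A Unix socket pair stands in for a backend<->postmaster control channel:
backend, postmaster = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# The "backend" hands the client descriptor over via SCM_RIGHTS:
socket.send_fds(backend, [b"takeover"], [client_r])
msg, fds, _flags, _addr = socket.recv_fds(postmaster, 1024, 1)

# The "postmaster" (which would spawn a fresh backend for the new
# database) can now read directly from the transferred descriptor:
data = os.read(fds[0], len(b"startup"))
```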
Bypassing shared_buffers
Hello, There is often a need to test particular queries executed in the worst-case scenario, i.e. right after a server restart or with no or minimal amount of data in shared buffers. In Postgres it's currently hard to achieve (other than to restart the server completely to run a single query, which is not practical). Is there a simple way to introduce a GUC variable that makes queries bypass shared_buffers and always read from storage? It would make testing like that orders of magnitude simpler. I mean, are there serious technical obstacles or any other objections to that idea in principle? Thanks, -Vladimir Churyukin
Re: Bypassing shared_buffers
To be clear, I'm talking about bypassing shared buffers for reading data / indexes only, not about disabling it completely (which I guess is impossible anyway). -Vladimir Churyukin On Wed, Jun 14, 2023 at 5:57 PM Vladimir Churyukin wrote: > Hello, > > There is often a need to test particular queries executed in the > worst-case scenario, i.e. right after a server restart or with no or > minimal amount of data in shared buffers. In Postgres it's currently hard > to achieve (other than to restart the server completely to run a single > query, which is not practical). Is there a simple way to introduce a GUC > variable that makes queries bypass shared_buffers and always read from > storage? It would make testing like that orders of magnitude simpler. I > mean, are there serious technical obstacles or any other objections to that > idea in principle? > > Thanks, > -Vladimir Churyukin >
Re: Bypassing shared_buffers
Ok, got it, thanks. Is there any alternative approach to measuring the performance as if the cache was empty? The goal is basically to calculate the max possible I/O time for a query, to get a range between min and max timing. It's ok if it's done during EXPLAIN ANALYZE call only, not for regular executions. One thing I can think of is even if the data in storage might be stale, issue read calls from it anyway, for measuring purposes. For EXPLAIN ANALYZE it should be fine as it doesn't return real data anyway. Is it possible that some pages do not exist in storage at all? Is there a different way to simulate something like that? -Vladimir Churyukin On Wed, Jun 14, 2023 at 6:22 PM Tom Lane wrote: > Vladimir Churyukin writes: > > There is often a need to test particular queries executed in the > worst-case > > scenario, i.e. right after a server restart or with no or minimal amount > of > > data in shared buffers. In Postgres it's currently hard to achieve (other > > than to restart the server completely to run a single query, which is not > > practical). Is there a simple way to introduce a GUC variable that makes > > queries bypass shared_buffers and always read from storage? It would make > > testing like that orders of magnitude simpler. I mean, are there serious > > technical obstacles or any other objections to that idea in principle? > > It's a complete non-starter. Pages on disk are not necessarily up to > date; but what is in shared buffers is. > > regards, tom lane >
Re: Bypassing shared_buffers
Do you foresee any difficulties in the implementation of the "unwarm" operation? It requires a cache flush operation, so I'm curious how complicated that is (probably there is a reason this is not supported by Postgres by now? MSSQL and Oracle have supported stuff like that for a long time). Cluster restart is not an option for us unfortunately, as it would be required for pretty much every query, and there are a lot of them. An ideal solution would be, if it's possible, to test it in parallel with other activities... Evicting all the other stuff using pg_prewarm is an interesting idea though (if a large prewarm operation really evicts all the previously stored data reliably). It's a bit hacky, but thanks, I think it's possible to make this work with some effort. It will require exclusive access just for that testing, which is not ideal but may work for us. -Vladimir Churyukin On Wed, Jun 14, 2023 at 7:29 PM Thomas Munro wrote: > On Thu, Jun 15, 2023 at 1:37 PM Vladimir Churyukin > wrote: > > Ok, got it, thanks. > > Is there any alternative approach to measuring the performance as if the > cache was empty? > > There are two levels of cache. If you're on Linux you can ask it to > drop its caches by writing certain values to /proc/sys/vm/drop_caches. > For PostgreSQL's own buffer pool, it would be nice if someone would > extend the pg_prewarm extension to have a similar 'unwarm' operation, > for testing like that. But one thing you can do is just restart the > database cluster, or use pg_prewarm to fill its buffer pool up with > other stuff (and thus kick out the stuff you didn't want in there).
Re: Bypassing shared_buffers
It could be cheaper, if the testing is done for many SELECT queries sequentially - you need to flush dirty buffers just once pretty much. -Vladimir Churyukin On Wed, Jun 14, 2023 at 7:43 PM Tom Lane wrote: > Thomas Munro writes: > > There are two levels of cache. If you're on Linux you can ask it to > > drop its caches by writing certain values to /proc/sys/vm/drop_caches. > > For PostgreSQL's own buffer pool, it would be nice if someone would > > extend the pg_prewarm extension to have a similar 'unwarm' operation, > > for testing like that. But one thing you can do is just restart the > > database cluster, or use pg_prewarm to fill its buffer pool up with > > other stuff (and thus kick out the stuff you didn't want in there). > > But that'd also have to push out any dirty buffers. I'm skeptical > that it'd be noticeably cheaper than stopping and restarting the > server. > > regards, tom lane >
Re: Bypassing shared_buffers
On Thu, Jun 15, 2023 at 12:32 AM Konstantin Knizhnik wrote: > > > On 15.06.2023 4:37 AM, Vladimir Churyukin wrote: > > Ok, got it, thanks. > > Is there any alternative approach to measuring the performance as if > > the cache was empty? > > The goal is basically to calculate the max possible I/O time for a > > query, to get a range between min and max timing. > > It's ok if it's done during EXPLAIN ANALYZE call only, not for regular > > executions. > > One thing I can think of is even if the data in storage might be > > stale, issue read calls from it anyway, for measuring purposes. > > For EXPLAIN ANALYZE it should be fine as it doesn't return real data > > anyway. > > Is it possible that some pages do not exist in storage at all? Is > > there a different way to simulate something like that? > > > > I do not completely understand what you want to measure: how fast cache > be prewarmed or what is the performance > when working set doesn't fit in memory? > > No, it's not about the working set or prewarming speed. We're trying to see what the worst performance in terms of I/O is, i.e. when the database just started up or the data/indexes being queried are not cached at all. > Why not changing `shared_buffers` size to some very small values (i.e. > 1MB) doesn't work? > As it was already noticed, there are levels of caching: shared buffers > and OS file cache. > By reducing size of shared buffers you rely mostly on OS file cache. > And actually there is no big gap in performance here - at most workloads > I didn't see more than 15% difference. > I thought about the option of setting minimal shared_buffers, but it requires a server restart anyway, something I'd like to avoid. > You can certainly flush OS cache `echo 3 > /proc/sys/vm/drop_caches` and > so simulate cold start.
> But the OS cache will be prewarmed quite fast (unlike shared buffers, because > of strange Postgres ring-buffer strategies which cause eviction of pages > from shared buffers even if there is a lot of free space). > > So please more precisely specify the goal of your experiment. > "max possible I/O time for a query" depends on so many factors... > Do you consider just one client working in isolation or there will be > many concurrent queries and background tasks like autovacuum and > checkpointer competing for the resources? > > My point is that if you need some deterministic result then you will > have to exclude a lot of different factors which may affect performance > and then ... you calculate the speed of a horse in a vacuum, which has almost no > relation to real performance. > > Exactly, we need more or less deterministic results for how bad I/O timings can be. Even though these are not necessarily the numbers we will be getting in real life, it gives us ideas about the distribution, and it's useful because we care about the long tail (p99+) of our queries. For simplicity let's say it will be a single client only (it would be hard to do the proposed solutions reliably with other stuff running in parallel anyway). -Vladimir Churyukin
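The pg_prewarm "kick everything out" trick discussed earlier in the thread can be illustrated with a toy model of shared-buffers clock-sweep eviction (vastly simplified: no ring strategies, no dirty-buffer writeback, invented block identifiers):

```python
class ClockSweepPool:
    """Vastly simplified shared-buffers model: a fixed number of slots,
    clock-sweep eviction, usage count capped at 5 as in PostgreSQL."""
    def __init__(self, nbuffers):
        self.slots = [None] * nbuffers   # each slot: (block_id, usage_count)
        self.hand = 0

    def read_block(self, block_id):
        for i, s in enumerate(self.slots):
            if s and s[0] == block_id:               # cache hit
                self.slots[i] = (block_id, min(s[1] + 1, 5))
                return
        while True:                                  # cache miss: clock sweep
            s = self.slots[self.hand]
            if s is None or s[1] == 0:
                self.slots[self.hand] = (block_id, 1)
                self.hand = (self.hand + 1) % len(self.slots)
                return
            # decrement usage count and keep sweeping
            self.slots[self.hand] = (s[0], s[1] - 1)
            self.hand = (self.hand + 1) % len(self.slots)

pool = ClockSweepPool(8)
for b in range(4):                 # the table under test gets cached
    pool.read_block(("target", b))
for b in range(100):               # pg_prewarm-style flood with other data
    pool.read_block(("other", b))
cached = {s[0] for s in pool.slots if s}
# the flood has evicted every "target" block, simulating a cold cache
# for the next test query without a cluster restart
```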