On 19.11.2017 13:23, Tomas Vondra wrote:
Hi,

On 11/19/2017 03:10 AM, David Fetter wrote:
On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
Hi,

...

Is 'recognizer' an established term I should know? Is it the same
as 'validator', or is it something new/different?

I borrowed it from http://langsec.org/

I'm not entirely sure what you mean by a validator, but a recognizer
is something that gives a quick and sure read as to whether the input
is well-formed. In general, it's along the lines of a tokenizer, a
parser, and something that does very light post-parse analysis for
correctness of form.

For the case that started the thread, a recognizer would check
something along the lines of

     CHECK('[0,1]' @> ALL(input_array))

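A concrete way to express such a check in PostgreSQL today is a domain or table CHECK constraint. This is only a hedged sketch (the domain, table, and column names are made up for illustration), and note that the literal above would need an explicit range cast such as '[0,1]'::numrange before @> applies:

```sql
-- Hypothetical sketch: a domain that rejects fractions outside [0,1]
-- as soon as a value is assigned, rather than deep into execution.
CREATE DOMAIN percentile_fraction AS float8
    CHECK (VALUE >= 0 AND VALUE <= 1);

-- Arrays of fractions can use the range-containment test from above;
-- numrange(0, 1, '[]') is the closed interval [0,1].
CREATE TABLE percentile_requests (
    fractions numeric[]
        CHECK (numrange(0, 1, '[]') @> ALL (fractions))
);
```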

OK, thanks. From what I understand, a recognizer is more about
determining whether a string is valid within a given formal language
(essentially, whether it's a well-formed program). That may not be the
right term for checks on parameter values.

OTOH we already have "validators" in a number of places - functions
checking various parameters, e.g. reloptions for FDWs, etc.

But I guess the naming can be solved later ...

Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
     Allow optional CHECK constraints in CREATE AGGREGATE for direct
     arguments.


How will any of the approaches deal with something like

     select percentile_cont((select array_agg(v) from p))
            within group (order by a) from t;

In this case the values are still unknown after parse analysis, so I
guess it does not really address that.

It doesn't.  Does it make sense to do a one-shot execution for cases
like that?  It might well be worth doing the aggregate once in
advance as a throw-away if the query execution time is already going
to take a while.  Of course, you can break that one by making p a JOIN
to yet another thing...

FWIW, while I won't stand in the way of improving this, I wonder if it's
really worth the additional complexity. If you get errors like this
with a static list of values, you fix the list and you're done. If the
list is dynamic (computed in the query itself), you'll still get the
error much later, during query execution.

So if you're getting enough failures like this for the "delayed error
reporting" to be an issue, perhaps there's something wrong in your
stack and you should address that instead?

I'd like to think that getting something to fail quickly and cheaply
when it can will give our end users a better experience.  Here,
"cheaply" refers to their computing resources and time.

The trouble is, this increases execution time for everyone, including
people who carefully construct the parameter values. That seems rather
undesirable.


Clearly, not having this happen in this case bothered Johannes
enough to wade in here.


No. He was surprised the error was reported after a significant amount
of time, but he does not seem to claim that failing faster would be
valuable to him. That is your assumption, and I have my doubts about it.

I did not know about the complexity needed to pre-check the parameters;
I thought maybe it could be done easily. If it's too hard to change,
I wouldn't want that improvement.

Best Regards
Johannes
