Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

Pavel Stehule Fri, 04 Feb 2011 11:51:23 -0800

2011/2/4 Robert Haas <robertmh...@gmail.com>:
> On Fri, Jan 28, 2011 at 2:19 PM, Pavel Stehule <pavel.steh...@gmail.com> 
> wrote:
>> 2011/1/28 Robert Haas <robertmh...@gmail.com>:
>>> On Tue, Jan 25, 2011 at 11:29 AM, Pavel Stehule <pavel.steh...@gmail.com> 
>>> wrote:
>>>> variant a) my first patch - detoast on first usage with avoiding to
>>>> useless detoast checking
>>>> variant b) my first patch - detoast on first usage without avoiding to
>>>> useless detoast checking
>>>>
>>>> time for 1 - about 300 ms, a is bout 1.5% faster than b
>>>> time for 2 - about 30000 ms, a is about 3% faster than b
>>>
>>> This makes your approach sound pretty good, but it sounds like we
>>> might need to find a better way to structure the code.
>>>
>>
>> do you have a any idea?
>
> At first blush, the should_be_detoasted flag looks completely
> unnecessary to me.  I don't see why we have to set a flag in one part
> of the code to tell some other part of the code whether or not it
> should detoast.  Why is that necessary or a good idea?  Why can't we
> just make the decision at the point where we're detoasting?


a logic about detoasting isn't exactly simple, and you see on tests,
so "bypass" save a 3%. I can't to say, if it's much or less. Just take
a 3%. Isn't nothing simpler than remove these flags.

>
> Also, I believe I'm agreeing with Tom when I say that
> exec_eval_datum() doesn't look like the right place to do this.  Right
> now that function is a very simple switch, and I've found that it's
> usually a bad idea to stick complex code into the middle of such
> things.  It looks like function only has three callers, so maybe we
> should look at each caller individually and see whether it needs this
> logic or not.  For example, I'm guessing make_tuple_from_row()
> doesn't, because it's only calling exec_eval_datum() so that it can
> pass the results to heap_form_tuple(), which already contains some
> logic to detoast in certain cases.  It's not clear why we'd want
> different logic here than we do anywhere else.  The other callers are
> exec_assign_value(), where this looks like it could be important to
>

first we have to decide when we will do a  detoasting - there are two variants

a) when we write a toasted data to variable
b) when we use a toasted data

@a needs to modify: copy parameters and exec_assign_value,
@b needs to modify plpgsql_param_fetch or exec_eval_datum. Really
important is plpgsql_param_fetch, because it is only point, where we
know, so some Datum will be used, but wasn't detoasted yet. Problem is
"wrong" memory context. And plpgsql_param_fetch is callback, that is
emitted from different SPI functions so is hard to move a some lines
to function, that will run under good context.

There isn't necessary to detoast Datum, when value is only copied.

The disadvantage of @a can be so we remove 95% of pathologic cases,
and we create a 5% new. @b is terrible because the most important
point (main work) is done under parser memory context. I don't
understand you what is complex on (this code can be moved to
function). But @b needs only one place for detosting, because it's
nice centralised, that you and Tom dislike.

oldctx = MemoryContextSwitchTo(fcecxt);
if (!var->typbyval && var->typlen == -1)
  x = detoast_datum(var);
  if (x != var)
  {
    pfree(var)
    var = x;
  }
MemoryContextSwitchTo(oldctx);

You can put this code to  plpgsql_param_fetch or exec_eval_datum or
you have to copy this code two or three times.

I have not a idea how to design it well to by you and Tom happy :(

When I thinking about it, then I am sure, so best solution is really
global cache of detoasted values. It cannot be well solved on local
area. On local area I can do correct decision only when I know, so I
am inside a cycle and I work with some Datum repeatedly.  Detoasting
on assign can has impacts on FOR stmt, iteration over cursors ...

FOR a,b,c IN SELECT * FROM foo
LOOP
  if a THEN CONTINUE; END IF;
  process b,c -- b and c are toastable values.
END LOOP;

> avoid repeatedly detoasting the array, and plpgsql_param_fetch(),
> which I'm not sure about, but perhaps you have an idea or can
> benchmark it.

It's hard to benchmark it. I can create a use case, where @a win or @b
win. I believe so in almost all cases @a or @b will be better than
current state. And I afraid, so there can be situation where implicit
detoast can be bad.

Regards

Pavel Stehule

p.s. as one idea is a flag for function, that can to specify, if
inside function can be detoasting lazy or aggressive

.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql

Reply via email to