Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Tom Lane
Laurenz Albe  writes:
> On Tue, 2023-01-10 at 15:53 +, PG Doc comments form wrote:
>>> PLAIN prevents either compression or out-of-line storage; furthermore it
>>> disables use of single-byte headers for varlena types. This is the only
>>> possible strategy for columns of non-TOAST-able data types.

>> However, it does allow "single byte" headers. How to verify this?
>> CREATE EXTENSION pageinspect;
>> CREATE TABLE test(a VARCHAR(1) STORAGE PLAIN);
>> INSERT INTO test VALUES (repeat('A',10));
>> 
>> Now peek into the page with pageinspect functions
>> 
>> SELECT left(encode(t_data, 'hex'), 40) FROM
>> heap_page_items(get_raw_page('test', 0));
>> 
>> This returned value of "1741414141414141414141".

> I think that the documentation is wrong.  The attached patch removes the
> offending half-sentence.

The documentation is correct, what is broken is the code.  I'm not
sure when we broke it, but what I see in tracing through the INSERT
is that we are forming the tuple using a tupdesc with the wrong
value of attstorage.  It looks like the tupdesc belongs to the
virtual slot representing the output of the INSERT statement,
which is not identical to the target relation's tupdesc.

(The virtual slot's tupdesc is probably reverse-engineered from
just the data types of the columns, so it'll have whatever is the
default attstorage for the data type.  It's blind luck that this
attstorage value isn't used for anything more consequential,
like TOAST decisions.)

regards, tom lane




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Andres Freund
Hi,

On 2023-01-15 16:40:27 -0500, Tom Lane wrote:
> The documentation is correct, what is broken is the code.  I'm not
> sure when we broke it

Looks to be an old issue, predating the slot type stuff. It reproduces at
least as far back as 10.

I've not thought through this fully. But after a first look, this might be
hard to fix without incuring a lot of overhead / complexity. We check whether
projection is needed between nodes with tlist_matches_tupdesc() - targetlists
don't know about storage. And we decide whether we need to project in
nodeModifyTuple solely based on

/* Extract non-junk columns of the subplan's result tlist. */
foreach(l, subplan->targetlist)
{
TargetEntry *tle = (TargetEntry *) lfirst(l);

if (!tle->resjunk)
insertTargetList = lappend(insertTargetList, tle);
else
need_projection = true;
}

Greetings,

Andres Freund




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-15 16:40:27 -0500, Tom Lane wrote:
>> The documentation is correct, what is broken is the code.  I'm not
>> sure when we broke it

> I've not thought through this fully. But after a first look, this might be
> hard to fix without incuring a lot of overhead / complexity.

It appeared to me that it was failing at this step in
ExecGetInsertNewTuple:

if (relinfo->ri_newTupleSlot->tts_ops != planSlot->tts_ops)
{
ExecCopySlot(relinfo->ri_newTupleSlot, planSlot);
return relinfo->ri_newTupleSlot;
}

ri_newTupleSlot has the tupdesc we want, planSlot is a virtual slot
that has the bogus tupdesc, and for some reason heap_form_tuple is
getting called with planSlot's tupdesc not ri_newTupleSlot's.  I'm
not quite sure if this is just a thinko somewhere or there's a
deficiency in the design of the slot APIs.

The UPDATE path seems to work fine, btw.

regards, tom lane




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Andres Freund
Hi,

On 2023-01-15 18:08:21 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-01-15 16:40:27 -0500, Tom Lane wrote:
> >> The documentation is correct, what is broken is the code.  I'm not
> >> sure when we broke it
>
> > I've not thought through this fully. But after a first look, this might be
> > hard to fix without incuring a lot of overhead / complexity.
>
> It appeared to me that it was failing at this step in
> ExecGetInsertNewTuple:
>
> if (relinfo->ri_newTupleSlot->tts_ops != planSlot->tts_ops)
> {
> ExecCopySlot(relinfo->ri_newTupleSlot, planSlot);
> return relinfo->ri_newTupleSlot;
> }
>
> ri_newTupleSlot has the tupdesc we want, planSlot is a virtual slot
> that has the bogus tupdesc, and for some reason heap_form_tuple is
> getting called with planSlot's tupdesc not ri_newTupleSlot's.

The way we copy a slot into a heap slot is to materialize the source slot and
copy the heap tuple into target slot. Which is also what happened before the
slot type abstraction (hence the problem also existing before that was
introduced).


> I'm not quite sure if this is just a thinko somewhere or there's a
> deficiency in the design of the slot APIs.

I think it's fairly fundamental that copying between two slots assumes a
compatible tupdescs.

I think the problem is more in the determination whether we need to project,
or not (i.e. ExecInitInsertProjection()). But we can't really make a good
decision, because we just determine the types of "incoming" tuples based on
targetlists, which don't contain information about the storage type.

Greetings,

Andres Freund




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-15 18:08:21 -0500, Tom Lane wrote:
>> ri_newTupleSlot has the tupdesc we want, planSlot is a virtual slot
>> that has the bogus tupdesc, and for some reason heap_form_tuple is
>> getting called with planSlot's tupdesc not ri_newTupleSlot's.

> The way we copy a slot into a heap slot is to materialize the source slot and
> copy the heap tuple into target slot. Which is also what happened before the
> slot type abstraction (hence the problem also existing before that was
> introduced).

Hmm.  For the case of virtual->physical slot, that doesn't sound
terribly efficient.

> I think it's fairly fundamental that copying between two slots assumes a
> compatible tupdescs.

We could possibly make some effort to inject the desired attstorage
properties into the planSlot's tupdesc.  Not sure where would be a
good place.

regards, tom lane




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Andres Freund
Hi,

On 2023-01-15 18:41:22 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-01-15 18:08:21 -0500, Tom Lane wrote:
> >> ri_newTupleSlot has the tupdesc we want, planSlot is a virtual slot
> >> that has the bogus tupdesc, and for some reason heap_form_tuple is
> >> getting called with planSlot's tupdesc not ri_newTupleSlot's.
>
> > The way we copy a slot into a heap slot is to materialize the source slot 
> > and
> > copy the heap tuple into target slot. Which is also what happened before the
> > slot type abstraction (hence the problem also existing before that was
> > introduced).
>
> Hmm.  For the case of virtual->physical slot, that doesn't sound
> terribly efficient.

It's ok, I think. For virtual->heap we form the tuple in the context of the
destination heap slot. I don't think we could avoid creating a HeapTuple. I
guess we could try to avoid needing to deform the heap tuple again in the
target slot, but I'm not sure that's worth the complexity (we'd need to
readjust by-reference datums to point into the heap tuple). It might be worth
adding a version of ExecCopySlot() that explicitly does that, I think it could
be useful for some executor nodes that know that columns will be accessed
immediately after.


> > I think it's fairly fundamental that copying between two slots assumes a
> > compatible tupdescs.
>
> We could possibly make some effort to inject the desired attstorage
> properties into the planSlot's tupdesc.  Not sure where would be a
> good place.

I'm not sure that'd get us very far. Consider the case of
INSERT INTO table_using_plain SELECT * FROM table_using_extended;

In that case we just deal with heap tuples coming in, without a need to
project, without a need to copy from one slot to another.


I don't see how we can fix this mess entirely without tracking the storage
type a lot more widely. Most importantly in targetlists, as we use the
targetlists to compute the tupledescs of executor nodes, which then influence
where we build projections.


Given that altering a column to PLAIN doesn't rewrite the table, we already
have to be prepared to receive short or compressed varlenas, even after
setting STORAGE to PLAIN.

I think we should consider just reformulating the "furthermore it disables use
of single-byte headers for varlena types" portion to say that short varlenas
are disabled for non-toastable datatypes. I don't see much point in investing
a lot of complexity making this a hard restriction. Afaict the only point in
changing to PLAIN is to disallow external storage and compression, which it
achieves eved when using short varlenas.

The compression bit is a bit worse, I guess. We probably have the same problem
with EXTERNAL, which supposedly doesn't allow compression - but I don't think
we have code ensuring that we decompress in-line datums. It'll end up
happening if there's other columns that get newly compressed or stored
externally, but not guaranteed.


Greetings,

Andres Freund




Re: The documentation for storage type 'plain' actually allows single byte header

2023-01-15 Thread Andres Freund
Hi,

On 2023-01-15 16:49:01 -0800, Andres Freund wrote:
> I don't see how we can fix this mess entirely without tracking the storage
> type a lot more widely. Most importantly in targetlists, as we use the
> targetlists to compute the tupledescs of executor nodes, which then influence
> where we build projections.
> 
> 
> Given that altering a column to PLAIN doesn't rewrite the table, we already
> have to be prepared to receive short or compressed varlenas, even after
> setting STORAGE to PLAIN.
> 
> I think we should consider just reformulating the "furthermore it disables use
> of single-byte headers for varlena types" portion to say that short varlenas
> are disabled for non-toastable datatypes. I don't see much point in investing
> a lot of complexity making this a hard restriction. Afaict the only point in
> changing to PLAIN is to disallow external storage and compression, which it
> achieves eved when using short varlenas.
> 
> The compression bit is a bit worse, I guess. We probably have the same problem
> with EXTERNAL, which supposedly doesn't allow compression - but I don't think
> we have code ensuring that we decompress in-line datums. It'll end up
> happening if there's other columns that get newly compressed or stored
> externally, but not guaranteed.

One way we could deal with it would be to force the tuple to be processed by
heap_toast_insert_or_update() when there's a difference between typstorage and
attstorage. I think to make that cheap enough to determine, we'd have to cache
that information in the relcache. I haven't thought it through, but I suspect
it'd be problematic to add a pg_type lookup to RelationBuildTupleDesc(),
leading to building that information on demand later.

Greetings,

Andres Freund