On Wed, 30 Jul 2025 at 14:49, Peter Smith <smithpb2...@gmail.com> wrote:
> On Fri, Jul 25, 2025 at 1:58 PM Japin Li <japi...@hotmail.com> wrote:
>>
>> On Wed, 23 Jul 2025 at 14:07, Peter Smith <smithpb2...@gmail.com> wrote:
>> > On Tue, Jul 22, 2025 at 8:12 PM Japin Li <japi...@hotmail.com> wrote:
>> > ...
> ...
>> >>
>> >> Or is it by design that users are unable to read the internal relations?
>> >>
>> >
>> > IIUC, those VCI internal relations (implemented as materialized views)
>> > are entirely managed by VCI logic. Users are not required to be aware
>> > of them, and they definitely are not meant to tamper with them.
>> >
>>
>> Thanks for your explanation!
>>
>> > The REFRESH that you attempted should have caused a more graceful error, 
>> > like:
>> > ERROR:  extension "vci" prohibits this operation on view
>> > "vci_0000016482_00000_d"
>> > So, thanks for reporting that the ERROR failed. Investigating...
>>
>> I'm considering storing this metadata in heap tables, as Citus Columnar [1]
>> and TimescaleDB [2] also utilize them for some metadata. Is this a sound
>> approach? I'm wondering if this is a suitable strategy for VCI?
>>
>> [1] 
>> https://github.com/citusdata/citus/blob/main/src/backend/columnar/columnar_metadata.c#L174
>> [2] https://github.com/timescale/timescaledb/blob/main/src/chunk.c#L151
>>
>
> Hi Japin,
>
> TL;DR;
> ------
> What metadata relations did you have in mind to change, how do you
> want to change them, and what is the main motivation?
>

I've noticed a strange user experience with vci_xxx_{d,m}. Although they are
defined as materialized views, they cannot be queried, leading to an error
that suggests using REFRESH MATERIALIZED VIEW. However, that command is also
rejected.  This is my motivation.

>
> Metadata
> --------
> What relations did you mean when you referred to "metadata"? There are
> a number of VCI internal relations serving different purposes -- WOS,
> NULL vectors, ROS Columnar Data, etc. That relation
> ("vci_0000016482_00000_d") is not exactly what I would call metadata;
> that relation is for VCI ROS data of column 0 (the 'd' suffix means
> "data"), so it is real data that VCI maintains in the ROS and keeps in
> synch with the original data of the table being indexed.
>

My apologies, "metadata" I said might not be the most accurate term here.
I mean the internal relations.

> The relations that VCI considers as "metadata" have an 'm' suffix. e.g.
>
> +/**
> + * Meta Relation
> + */
> +#define VCI_RELTYPE_META     ('m')
> +
>
> All "metadata" files can be seen below with VCI_RELTYPE_META suffix.
> Are these the files you had in mind to change?
>
> + /* TID */
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID,
> VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_tid_data_oid, 0, oid);
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_TID,
> VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_tid_meta_oid, 0, oid);
> +
> + /* NUll */
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL,
> VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_null_data_oid, 0, oid);
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_NULL,
> VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_null_meta_oid, 0, oid);
> +
> + /* Delete Vector */
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE,
> VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_delete_data_oid, 0, oid);
> + oid = vci_create_relation(GenRelName(indexRel, VCI_COLUMN_ID_DELETE,
> VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + vci_SetMainRelVar(vmr_info, vcimrv_delete_meta_oid, 0, oid);
>
> ...
>
> + vci_SetMainRelVar(vmr_info, vcimrv_num_columns, 0, tupdesc->natts);
> + for (i = 0; i < tupdesc->natts; i++)
> + {
> + Oid column_store_oid;
> + Oid column_meta_oid;
> + vcis_m_column_t *columnPointer;
> +
> + column_store_oid = vci_create_relation(GenRelName(indexRel, i,
> VCI_RELTYPE_DATA), indexRel, indexInfo, VCI_RELTYPE_ROS);
> + column_meta_oid = vci_create_relation(GenRelName(indexRel, i,
> VCI_RELTYPE_META), indexRel, indexInfo, VCI_RELTYPE_ROS);
>
>
> Heap Tables
> -----------
> When you say "storing ... in heap tables", I assumed you meant that
> some of the metadata files could be stored as normal row-based
> PostgreSQL files -- e.g. like PG catalogs, right?

Yeah, you got me.

>
> AFAIK the current VCI internal relations are implemented with a column
> of byte arrays that get mapped to VCI data structures in memory --
> e.g. the Extents and Offsets and Bit Vectors etc (as described in the
> README) are all stored in binary form. This arrangement also helps for
> manipulating Extents for compression/defragmentation, and for Delete
> Vector garbage collection and CRID renumbering.

Sorry, I haven't thought about those things yet.

>
> Can you please elaborate what is your goal of changing some of these
> relations to be normal PostgreSQL row-based tables? e.g. What problems
> do you have in mind that using the heap tables will solve?
>
> If your idea was mainly to assist debugging, then for this purpose I
> believe that the original VCI implementation (in the commercial
> product) had a separate "vci inspection" tool capable of dumping the
> content of the VCI internal relations in a human-readable format. But,
> we did not bring this tool across to OSS. Users are not supposed to be
> aware of these VCI internal relations, and they cannot change them, so
> even if they could see the contents of those heap tables it might
> appear meaningless.
>

I agree that the internal relations aren't for end users. However, it's
confusing that users can't query the materialized views.

> Current Focus
> --------------
> We know that the VCI patches still need more work before they might be
> acceptable to the OSS community. Currently, we are focusing on
> simplifying the existing VCI patches and fixing bugs. We also need to
> make the PostgreSQL core interface less intrusive, and add lots more
> test cases.
> 
> Changing any VCI internal relations to be heap tables may involve lots
> of code rewrite, so even if it can be shown advantageous to do it, I'd
> prefer to put this kind of work on a TODO list for later -- e.g. it
> might be premature to redesign the current logic before all the other
> necessary stability work is done first.

-- 
Regards,
Japin Li


Reply via email to