Hi,

On 2023-05-05 16:44:39 +0800, 吴昊 wrote:
> When I wrote an extension to implement a new storage by table access method. 
> I found some issues
> that the existing code has strong assumptions for heap tables now. Here are 3 
> issues that I currently have:
> 
> 
> 1. Index access method has a callback to handle reloptions, but table access 
> method hasn't. We can't
> add storage-specific parameters by `WITH` clause when creating a table.

Makes sense to add that.


> 2. Existing code strongly assumes that the data file of a table structures by 
> a serial physical files named
> in a hard coded rule: <relfilenode>[.<segno>]. It may only fit for heap like 
> tables. A new storage may have its
> owner structure on how the data files are organized. The problem happens when 
> dropping a table.

I agree that it's not great, but I don't think it's particularly easy to fix
(because things like transactional DDL require fairly tight integration). Most
of the time it should be possible can also work around the limitations though.


> 3. The existing code also assumes that the data file consists of a series of 
> fix-sized block. It may not
> be desired by other storage. Is there any suggestions on this situation?

That's a requirement of using the buffer manager, but if you don't want to
rely on that, you can use a different pattern. There's some limitations
(format of TIDs, most prominently), but you should be able to deal with that.

I don't think it would make sense to support other block sizes in the buffer
manager.


> The rest of this mail is to talk about the first issue. It looks reasonable 
> to add a similar callback in
> struct TableAmRoutine, and parse reloptions by the callback. This patch is in 
> the attachment file.

Why did you add relkind to the callbacks? The callbacks are specific to a
certain relkind, so I don't think that makes sense.

I don't think we really need GetTableAmRoutineByAmId() that raises nice
errors etc - as the AM has already been converted to an oid, we shouldn't need
to recheck?



> +bytea *
> +table_reloptions_am(Oid accessMethodId, Datum reloptions, char relkind, bool 
> validate)
>  {
> ...
> 
> +             /* built-in table access method put here to fetch TAM fast */
> +             case HEAP_TABLE_AM_OID:
> +                     tam = GetHeapamTableAmRoutine();
> +                     break;
>               default:
> -                     /* other relkinds are not supported */
> -                     return NULL;
> +                     tam = GetTableAmRoutineByAmId(accessMethodId);
> +                     break;

Why do we need this fastpath? This shouldn't be something called at a
meaningful frequency?


> }
> +     return table_reloptions(tam->amoptions, reloptions, relkind, validate);
> }

I'd just pass the tam, instead of an individual function.

> @@ -866,6 +866,11 @@ typedef struct TableAmRoutine
>                                                                               
>    struct SampleScanState *scanstate,
>                                                                               
>    TupleTableSlot *slot);
>  
> +     /*
> +      * This callback is used to parse reloptions for relation/matview/toast.
> +      */
> +     bytea       *(*amoptions)(Datum reloptions, char relkind, bool 
> validate);
> +
>  } TableAmRoutine;

Did you mean table instead of relation in the comment?



> Another thing about reloption is that the current API passes a parameter 
> `validate` to tell the parse
> functioin to check and whether to raise an error. It doesn't have enough 
> context when these reloptioins
> are used:
> 1. CREATE TABLE ... WITH(...)
> 2. ALTER TABLE ... SET ...
> 3. ALTER TABLE ... RESET ...
> The reason why the context matters is that some reloptions are disallowed to 
> change after creating
> the table, while some reloptions are allowed.

What kind of reloption are you thinking of here?

Greetings,

Andres Freund


Reply via email to