Hi,

I’d like to use PG as an analytics engine on multiple separately created
read-only datasets. The datasets, in my case, are independently created by
bringing up a local PG instance, populating a table, and shutting down the
PG instance. I do have control over both the creation and the query
processes.

After some research, I realize this is not a standard use-case, so I’m open
to development work in the internals as required. I’d appreciate some
advice/guidance on possible approaches.

   1. For dealing with independently created tables, I’m planning to have
   them organized in tablespaces during creation, let’s say one per table. In
   the PG instance where querying is performed, I can use symlinks to the
   relevant tablespaces from the data directory, and modify system catalogs
   appropriately for PG to be able to work with those tables.
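
To make the symlink step concrete, here is a rough Python sketch of what I have in mind. It relies on my understanding that PG locates a tablespace through a symlink named after the tablespace OID under pg_tblspc in the data directory. The helper name link_tablespace, the OID, and the paths are hypothetical. I also assume the server is stopped while this runs, and that a matching pg_tablespace row is created separately.

```python
import os
from pathlib import Path


def link_tablespace(pgdata: str, spc_oid: int, location: str) -> Path:
    """Create <pgdata>/pg_tblspc/<spc_oid> as a symlink to `location`.

    Hypothetical helper: it only handles the filesystem side and does not
    touch any system catalog.
    """
    target = Path(location).resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"tablespace directory not found: {target}")

    link = Path(pgdata) / "pg_tblspc" / str(spc_oid)
    link.parent.mkdir(exist_ok=True)

    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return link  # already linked to the right place
        raise FileExistsError(f"{link} already points elsewhere")
    if link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")

    os.symlink(target, link, target_is_directory=True)
    return link
```

Usage would be something like link_tablespace("/path/to/pgdata", 16385, "/datasets/ds1"), with the OID chosen to match whatever I put into the catalogs.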

Is this a viable approach?

Are there better alternatives?

Any pitfalls I should expect?

   2. From my research so far, it seems PG wouldn’t work out of the box if
   those tablespaces are read-only (even assuming background services such
   as autovacuum are all turned off).

Some documentation and slides mention that even plain SELECT queries can
cause PG to write metadata into the table’s pages (hint bits, or something
like that).

Is this correct? Or is there a way to prevent all writes to the tables
without modifying PG code?

For my education, where can I find some info on those “hints”? (Would also
appreciate pointers to code.)
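
My current, possibly wrong, understanding is that these are per-tuple flag bits in the tuple header's t_infomask field. They seem to be set lazily when a reader first determines whether the inserting or deleting transaction committed, which would explain why a SELECT can dirty a page. The Python sketch below only decodes those flags. The constants are what I believe src/include/access/htup_details.h defines, and describe_hint_bits is my own hypothetical helper, not a PG function.

```python
from typing import List

# Flag values as I read them in src/include/access/htup_details.h
# (bits of the tuple header's t_infomask field).
HEAP_XMIN_COMMITTED = 0x0100  # inserting transaction known committed
HEAP_XMIN_INVALID = 0x0200    # inserting transaction known aborted/invalid
HEAP_XMAX_COMMITTED = 0x0400  # deleting transaction known committed
HEAP_XMAX_INVALID = 0x0800    # deleting transaction known aborted/invalid

_HINT_BITS = [
    ("HEAP_XMIN_COMMITTED", HEAP_XMIN_COMMITTED),
    ("HEAP_XMIN_INVALID", HEAP_XMIN_INVALID),
    ("HEAP_XMAX_COMMITTED", HEAP_XMAX_COMMITTED),
    ("HEAP_XMAX_INVALID", HEAP_XMAX_INVALID),
]


def describe_hint_bits(infomask: int) -> List[str]:
    """Return the names of the hint bits set in a t_infomask value."""
    return [name for name, bit in _HINT_BITS if infomask & bit]
```

If this picture is right, one thing I could try is to make sure every tuple already has its hint bits set before the dataset is frozen, but I don't know whether that alone is enough.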

If it’s not possible to work with read-only tables without code changes,
what development approaches would you suggest?

Would appreciate any advice/guidance/recommendations.

Thanks in advance!
