Re: Contributing some code

Antonio Belloni Thu, 28 Dec 2017 07:07:46 -0800

Hi Craig,

Ok. I see your point.


I'll study the storage internals a little bit more and try to figure out a
better approach using the directions you have made available.

Regards,
Antonio Belloni

On Thu, Dec 28, 2017 at 12:51 AM, Craig Ringer <[email protected]>
wrote:

> On 28 December 2017 at 01:40, Antonio Belloni <[email protected]>
> wrote:
>
>> Hi,
>>
>> This is my first post on the list. My name is Antonio. I am a CS grad
>> student and my field of study is about databases and information retrieval.
>> To get some practical knowledge, I've been studying Postgresql codebase for
>> a while.
>>
>> Now I would like to contribute with some code and I've chosen the
>> following topic of the TODO list :
>>
>> Allow reporting of which objects are in which tablespaces
>>
>> This item is difficult because a tablespace can contain objects from
>> multiple databases. There is a server-side function that returns the
>> databases which use a specific tablespace, so this requires a tool that
>> will call that function and connect to each database to find the objects in
>> each database for that tablespace.
>> The topic suggests to use the pg_tablespace_databases to discover which
>> database is using a specific tablespace and then connect to each database
>> and find the objects in the tablespaces.
>> I checked the code of pg_tablespace_databases, defined in
>> src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
>> : It just reads the tablespaces directories and return the name of the
>> directories that represents databases OIDs.
>> Although the function works as expected, I  can see some issues not
>> addressed in the code :
>> - It does not check for permissions. Any user can execute it;- It does
>> not check if the platform supports symlinks, which can cause an error
>> because the function is trying to follow the links defined in
>> base/pg_tblspc.
>> I could use the same approach and write a function that goes down one
>> more level in the directory structure and find the objects' OIDs inside
>> each database directory, but I don't know if this is the better way to do
>> that.
>>
>>
>
> There's a bit of a trap hidden here. The names of relation extents look
> like oids, possibly with an extent number for relations bigger than 1GB.
> But they aren't.  They're relfilenode numbers.
>
> PostgreSQL maps relation oids to relfilenodes. By default on a new system,
> relations will often have the same relfilenode as oid. That's a pity IMO;
> it'd be way less confusing if we allocated relfilenodes from a wholly
> different counter, because as it is, it gives people the false impression
> they can expect the filename relfilenode to be the relation oid.
>
> In fact, what happens (per my probably imperfect understanding) is that
> PostgreSQL checks pg_class (via the relcache) for the oid of the table. It
> then uses RelationIsMapped to see if it's a normal relation with the
> filenode number in pg_class or not. If it's a normal (non-mapped) relation,
> it uses the Relation's rd_node to find the relation's physical address
> tablespace, dboid, and relfilenode. If it's a mapped relation, it instead
> consults the relmapper to find the relation's storage; see
> src/backend/utils/cache/relmapper.c .
>
> See also src/backend/storage/smgr/README,
>
> This means you can't determine relation oids from ondisk state without
> scanning pg_class. And pg_class is per-database, not a shared relation, so
> you must look at each db in turn, since Pg doesn't support cross-DB
> queries. Logical decoding handles this with the RelidByRelfilenode
> function, but there are issues there around making sure you have the right
> snapshot etc.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: Contributing some code

Reply via email to