Re: Contributing with code
Sorry, I did not want to start a flame war over the TODO list with my first message.

In all the other open source projects to which I have contributed code, the TODO list is a starting point for newcomers. There is no explicit message in the PostgreSQL TODO list saying that the projects there are hard, stuck or undesirable, so it is very confusing to a newbie who just wants to help and learn something in the process.

Now I don't know whether I should continue working on the issue from my first message and post my ideas on the list, or find other ways to contribute, for example by fixing bugs from the bug list.

Regards,
Antonio Belloni

abell...@rioservice.com
+55 21 3083-1939
+55 21 99327-0200
RIO SERVICE | Tecnologia em Movimento
Av. Pastor Martin Luther King Jr. 126 - Grupo 465
Centro Empresarial Shopping Nova América
http://www.rioservice.com
http://www.busvision.com.br

2018-01-03 0:47 GMT-02:00 Stephen Frost:

> Noah, all,
>
> * Noah Misch (n...@leadboat.com) wrote:
> > On Tue, Jan 02, 2018 at 05:52:37PM -0500, Peter Eisentraut wrote:
> > > On 12/31/17 22:43, Craig Ringer wrote:
> > > > I'd rather rename it the "stuck, hard and abandoned projects list" ;)
> > >
> > > That might actually be useful.
> >
> > +1. When I do refer to a TODO entry, it's usually because the entry bears a
> > list of threads illustrating the difficulty of a problem. I like the project
> > having such lists, but "TODO" is a bad heading for them.
>
> Renaming the list is certainly an idea that I could get behind, though I
> agree with Chris that we could keep it a bit more positive. I also
> agree that the TODO list tends towards projects that are stuck and hard,
> which is why I actually think it wouldn't be that hard to go through and
> mark the really hard things as really hard or even create an independent
> page for them as I suggested elsewhere on this thread, because (at least
> from my perception of it- which could be wrong) the overall list
> doesn't actually change that much (see above wrt "stuck, hard and
> abandoned" comment). If we could see our way forward to really making
> it clear that these things are stuck, hard or abandoned then maybe we
> can make room for new projects to go on the list that are of reasonable
> size for newcomers to the project.
>
> Thanks!
>
> Stephen
Contributing with code
Hi,

This is my first post on the list. My name is Antonio. I am a CS grad student, and my field of study is databases and information retrieval. To get some practical knowledge, I've been studying the PostgreSQL codebase for a while.

Now I would like to contribute some code, and I've chosen the following item from the TODO list:

Allow reporting of which objects are in which tablespaces

This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.

The item suggests using pg_tablespace_databases to discover which databases use a specific tablespace, then connecting to each of those databases to find the objects stored in that tablespace.

I checked the code of pg_tablespace_databases, defined in src/backend/utils/adt/misc.c, and saw that it uses a much simpler approach: it just reads the tablespace directory and returns the names of the subdirectories that represent database OIDs.

Although the function works as expected, I can see some issues not addressed in the code:

- It does not check for permissions. Any user can execute it.
- It does not check whether the platform supports symlinks, which can cause an error because the function tries to follow the links defined in base/pg_tblspc.

I could use the same approach and write a function that goes down one more level in the directory structure and finds the objects' OIDs inside each database directory, but I don't know if this is the best way to do it.

Could someone please give me feedback and help me with this item?

Regards,
Antonio Belloni
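For illustration, the two-step tool that the TODO item describes (ask pg_tablespace_databases which databases use the tablespace, then look inside each one) could be sketched roughly as follows. This is client-side Python, not server code, and it rests on assumptions: run_query is a hypothetical callable taking (database name, SQL text, parameters) and returning rows, standing in for whatever client library is used; objects_by_database, the %s placeholder style and the "postgres" maintenance database are illustrative choices as well. The catalog query treats reltablespace = 0 as "the database's default tablespace" and lists only relation kinds that have storage.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: run_query(dbname, sql, params) -> list of row tuples.
RunQuery = Callable[[str, str, Sequence], List[Tuple]]

# Step 1: databases that have objects in the tablespace and accept connections.
DATABASES_SQL = """
SELECT d.datname
FROM pg_database d
WHERE d.datallowconn
  AND d.oid IN (SELECT pg_tablespace_databases(%s))
ORDER BY d.datname
"""

# Step 2 (run once per database): relations stored in the tablespace.
# reltablespace = 0 means "this database's default tablespace".
OBJECTS_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_database d ON d.datname = current_database()
WHERE c.relkind IN ('r', 'i', 'm', 't', 'S')
  AND CASE WHEN c.reltablespace = 0 THEN d.dattablespace
           ELSE c.reltablespace END = %s
ORDER BY n.nspname, c.relname
"""


def objects_by_database(
    run_query: RunQuery,
    tablespace_oid: int,
    maintenance_db: str = "postgres",  # assumed name of a database to start from
) -> Dict[str, List[str]]:
    """Map each database name to the schema-qualified relations it keeps
    in the given tablespace."""
    report: Dict[str, List[str]] = {}
    for (dbname,) in run_query(maintenance_db, DATABASES_SQL, (tablespace_oid,)):
        rows = run_query(dbname, OBJECTS_SQL, (tablespace_oid,))
        report[dbname] = [f"{schema}.{rel}" for schema, rel in rows]
    return report
```

Because each database is queried over its own connection, the report is not taken under a single snapshot: objects can be created, dropped or moved between tablespaces while it is being assembled.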
Re: Contributing some code
Hi Craig,

OK, I see your point. I'll study the storage internals a little more and try to work out a better approach using the pointers you provided.

Regards,
Antonio Belloni

On Thu, Dec 28, 2017 at 12:51 AM, Craig Ringer wrote:

> On 28 December 2017 at 01:40, Antonio Belloni wrote:
>
>> Hi,
>>
>> This is my first post on the list. My name is Antonio. I am a CS grad
>> student and my field of study is about databases and information retrieval.
>> To get some practical knowledge, I've been studying Postgresql codebase for
>> a while.
>>
>> Now I would like to contribute with some code and I've chosen the
>> following topic of the TODO list :
>>
>> Allow reporting of which objects are in which tablespaces
>>
>> This item is difficult because a tablespace can contain objects from
>> multiple databases. There is a server-side function that returns the
>> databases which use a specific tablespace, so this requires a tool that
>> will call that function and connect to each database to find the objects in
>> each database for that tablespace.
>> The topic suggests to use the pg_tablespace_databases to discover which
>> database is using a specific tablespace and then connect to each database
>> and find the objects in the tablespaces.
>> I checked the code of pg_tablespace_databases, defined in
>> src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
>> : It just reads the tablespaces directories and return the name of the
>> directories that represents databases OIDs.
>> Although the function works as expected, I can see some issues not
>> addressed in the code :
>> - It does not check for permissions. Any user can execute it;
>> - It does not check if the platform supports symlinks, which can cause an
>> error because the function is trying to follow the links defined in
>> base/pg_tblspc.
>> I could use the same approach and write a function that goes down one
>> more level in the directory structure and find the objects' OIDs inside
>> each database directory, but I don't know if this is the better way to do
>> that.
>>
>
> There's a bit of a trap hidden here. The names of relation extents look
> like oids, possibly with an extent number for relations bigger than 1GB.
> But they aren't. They're relfilenode numbers.
>
> PostgreSQL maps relation oids to relfilenodes. By default on a new system,
> relations will often have the same relfilenode as oid. That's a pity IMO;
> it'd be way less confusing if we allocated relfilenodes from a wholly
> different counter, because as it is, it gives people the false impression
> they can expect the filename relfilenode to be the relation oid.
>
> In fact, what happens (per my probably imperfect understanding) is that
> PostgreSQL checks pg_class (via the relcache) for the oid of the table. It
> then uses RelationIsMapped to see if it's a normal relation with the
> filenode number in pg_class or not. If it's a normal (non-mapped) relation,
> it uses the Relation's rd_node to find the relation's physical address
> tablespace, dboid, and relfilenode. If it's a mapped relation, it instead
> consults the relmapper to find the relation's storage; see
> src/backend/utils/cache/relmapper.c .
>
> See also src/backend/storage/smgr/README.
>
> This means you can't determine relation oids from ondisk state without
> scanning pg_class. And pg_class is per-database, not a shared relation, so
> you must look at each db in turn, since Pg doesn't support cross-DB
> queries. Logical decoding handles this with the RelidByRelfilenode
> function, but there are issues there around making sure you have the right
> snapshot etc.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
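To make the trap concrete, here is a small Python sketch of what a directory listing can and cannot tell you. It assumes the usual on-disk naming, where the number in a file name is the relfilenode, an optional _fsm, _vm or _init suffix names the fork, and an optional .N suffix is the extent (segment) number. RelFile and parse_relation_filename are hypothetical names invented for this example, and temporary-relation files (names starting with "t") are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class RelFile(NamedTuple):
    relfilenode: int  # NOT necessarily the relation's oid
    fork: str         # "main", "fsm", "vm" or "init"
    segment: int      # 0 for the first extent, 1 for the ".1" file, ...


# <relfilenode>[_fsm|_vm|_init][.<segment>]
_REL_FILE = re.compile(r"(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?")


def parse_relation_filename(name: str) -> Optional[RelFile]:
    """Split a file name found in a database directory into its parts.

    Returns None for files that do not follow the relation-file pattern,
    such as PG_VERSION or pg_filenode.map.
    """
    match = _REL_FILE.fullmatch(name)
    if match is None:
        return None
    relfilenode, fork, segment = match.groups()
    return RelFile(int(relfilenode), fork or "main", int(segment or 0))
```

The parser stops at the relfilenode on purpose. Turning that number into a relation still requires a catalog lookup inside the owning database, for example with the SQL functions pg_filenode_relation(tablespace, filenode) or, in the other direction, pg_relation_filenode(relation).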