Re: Contributing with code

2018-01-05 Thread Antonio Belloni
Sorry, but I did not want to start a flaming war against the TODO list with
my first message. In all the other open source projects I have contributed
code, the TODO list is always a start point to newcomers. There's no
explicit message in the Postgresql TODO list saying that the projects there
are hard, stuck or undesirable. So it's very confusing to a newbie who just
want to help and try to learn something in the process.

And now I don't know If I should continue to work on the issue on my first
message and post my ideas on the list, or if I should find other ways to
contribute, for example, fixing bugs from the bug list.

Regards,
Antonio Belloni


Atenciosamente,
Antonio Belloni
abell...@rioservice.com
+55 21 3083-1939
+55 21 99327-0200
RIO SERVICE | Tecnologia em Movimento
Av. Pastor Martin Luther King Jr. 126 - Grupo 465
Centro Empresarial Shopping Nova América
http://www.rioservice.com
http://www.busvision.com.br


2018-01-03 0:47 GMT-02:00 Stephen Frost :

> Noah, all,
>
> * Noah Misch (n...@leadboat.com) wrote:
> > On Tue, Jan 02, 2018 at 05:52:37PM -0500, Peter Eisentraut wrote:
> > > On 12/31/17 22:43, Craig Ringer wrote:
> > > > I'd rather rename it the "stuck, hard and abandoned projects list"
> ;)
> > >
> > > That might actually be useful.
> >
> > +1.  When I do refer to a TODO entry, it's usually because the entry
> bears a
> > list of threads illustrating the difficulty of a problem.  I like the
> project
> > having such lists, but "TODO" is a bad heading for them.
>
> Renaming the list is certainly an idea that I could get behind, though I
> agree with Chris that we could keep it a bit more positive.  I also
> agree that the TODO list tends towards projects that are stuck and hard,
> which is why I actually think it wouldn't be that hard to go through and
> mark the really hard things as really hard or even create an independent
> page for them as I suggested elsewhere on this thread, because (at least
> from my perception of it- which could be wrong) the overall list
> doesn't actually change that much (see above wrt "stuck, hard and
> abandoned" comment).  If we could see our way forward to really making
> it clear that these things are stuck, hard or abandoned then maybe we
> can make room for new projects to go on the list that are of reasonable
> size for newcomers to the project.
>
> Thanks!
>
> Stephen
>


Contributing with code

2017-12-27 Thread Antonio Belloni
Hi,

This is my first post on the list. My name is Antonio. I am a CS grad
student and my field of study is about databases and information retrieval.
To get some practical knowledge, I've been studying Postgresql codebase for
a while.

Now I would like to contribute with some code and I've chosen the following
topic of the TODO list :

Allow reporting of which objects are in which tablespaces

This item is difficult because a tablespace can contain objects from
multiple databases. There is a server-side function that returns the
databases which use a specific tablespace, so this requires a tool that
will call that function and connect to each database to find the objects in
each database for that tablespace.
The topic suggests to use the pg_tablespace_databases to discover which
database is using a specific tablespace and then connect to each database
and find the objects in the tablespaces.
I checked the code of pg_tablespace_databases, defined in
src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
: It just reads the tablespaces directories and return the name of the
directories that represents databases OIDs.
Although the function works as expected, I  can see some issues not
addressed in the code :
- It does not check for permissions. Any user can execute it;- It does not
check if the platform supports symlinks, which can cause an error because
the function is trying to follow the links defined in base/pg_tblspc.
I could use the same approach and write a function that goes down one more
level in the directory structure and find the objects' OIDs inside each
database directory, but I don't know if this is the better way to do that.
Please, could someone give me feedback and help me with this topic ?
Regards,Antonio Belloni


Contributing some code

2017-12-27 Thread Antonio Belloni
Hi,

This is my first post on the list. My name is Antonio. I am a CS grad
student and my field of study is about databases and information retrieval.
To get some practical knowledge, I've been studying Postgresql codebase for
a while.

Now I would like to contribute with some code and I've chosen the following
topic of the TODO list :

Allow reporting of which objects are in which tablespaces

This item is difficult because a tablespace can contain objects from
multiple databases. There is a server-side function that returns the
databases which use a specific tablespace, so this requires a tool that
will call that function and connect to each database to find the objects in
each database for that tablespace.
The topic suggests to use the pg_tablespace_databases to discover which
database is using a specific tablespace and then connect to each database
and find the objects in the tablespaces.
I checked the code of pg_tablespace_databases, defined in
src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
: It just reads the tablespaces directories and return the name of the
directories that represents databases OIDs.
Although the function works as expected, I  can see some issues not
addressed in the code :
- It does not check for permissions. Any user can execute it;- It does not
check if the platform supports symlinks, which can cause an error because
the function is trying to follow the links defined in base/pg_tblspc.
I could use the same approach and write a function that goes down one more
level in the directory structure and find the objects' OIDs inside each
database directory, but I don't know if this is the better way to do that.
Please, could someone give me feedback and help me with this topic ?
Regards,Antonio Belloni


Re: Contributing some code

2017-12-28 Thread Antonio Belloni
Hi Craig,

Ok. I see your point.

I'll study the storage internals a little bit more and try to figure out a
better approach using the directions you have made available.

Regards,
Antonio Belloni

On Thu, Dec 28, 2017 at 12:51 AM, Craig Ringer 
wrote:

> On 28 December 2017 at 01:40, Antonio Belloni 
> wrote:
>
>> Hi,
>>
>> This is my first post on the list. My name is Antonio. I am a CS grad
>> student and my field of study is about databases and information retrieval.
>> To get some practical knowledge, I've been studying Postgresql codebase for
>> a while.
>>
>> Now I would like to contribute with some code and I've chosen the
>> following topic of the TODO list :
>>
>> Allow reporting of which objects are in which tablespaces
>>
>> This item is difficult because a tablespace can contain objects from
>> multiple databases. There is a server-side function that returns the
>> databases which use a specific tablespace, so this requires a tool that
>> will call that function and connect to each database to find the objects in
>> each database for that tablespace.
>> The topic suggests to use the pg_tablespace_databases to discover which
>> database is using a specific tablespace and then connect to each database
>> and find the objects in the tablespaces.
>> I checked the code of pg_tablespace_databases, defined in
>> src/backend/utils/adt/misc.c, and see that it uses a much simpler approach
>> : It just reads the tablespaces directories and return the name of the
>> directories that represents databases OIDs.
>> Although the function works as expected, I  can see some issues not
>> addressed in the code :
>> - It does not check for permissions. Any user can execute it;- It does
>> not check if the platform supports symlinks, which can cause an error
>> because the function is trying to follow the links defined in
>> base/pg_tblspc.
>> I could use the same approach and write a function that goes down one
>> more level in the directory structure and find the objects' OIDs inside
>> each database directory, but I don't know if this is the better way to do
>> that.
>>
>>
>
> There's a bit of a trap hidden here. The names of relation extents look
> like oids, possibly with an extent number for relations bigger than 1GB.
> But they aren't.  They're relfilenode numbers.
>
> PostgreSQL maps relation oids to relfilenodes. By default on a new system,
> relations will often have the same relfilenode as oid. That's a pity IMO;
> it'd be way less confusing if we allocated relfilenodes from a wholly
> different counter, because as it is, it gives people the false impression
> they can expect the filename relfilenode to be the relation oid.
>
> In fact, what happens (per my probably imperfect understanding) is that
> PostgreSQL checks pg_class (via the relcache) for the oid of the table. It
> then uses RelationIsMapped to see if it's a normal relation with the
> filenode number in pg_class or not. If it's a normal (non-mapped) relation,
> it uses the Relation's rd_node to find the relation's physical address
> tablespace, dboid, and relfilenode. If it's a mapped relation, it instead
> consults the relmapper to find the relation's storage; see
> src/backend/utils/cache/relmapper.c .
>
> See also src/backend/storage/smgr/README,
>
> This means you can't determine relation oids from ondisk state without
> scanning pg_class. And pg_class is per-database, not a shared relation, so
> you must look at each db in turn, since Pg doesn't support cross-DB
> queries. Logical decoding handles this with the RelidByRelfilenode
> function, but there are issues there around making sure you have the right
> snapshot etc.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>