On 2017/03/29 12:36, Craig Ringer wrote:
> On 29 March 2017 at 10:53, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
>> Hi,
>>
>> On 2017/03/28 15:40, Kang Yuzhe wrote:
>>> Thanks Tsunakawa for such an informative reply.
>>>
>>> Almost all of the docs related to the internals of PG are of introductory
>>> concepts only.
>>> There is an even more useful PG internals site entitled "The Internals of
>>> PostgreSQL" in http://www.interdb.jp/pg/ translation of the Japanese PG
>>> Internals.
>>>
>>> The query processing framework that is described in the manual as you
>>> mentioned is of informative and introductory nature.
>>> In theory, the query processing framework described in the manual is
>>> understandable.
>>>
>>> Unfortunately, it is another story to understand how query processing
>>> framework in PG codebase really works.
>>> It has become a difficult task for me to walk through the PG source code
>>> for example how SELECT/INSERT/TRUNCATE in the different modules under
>>> "src/.." really work.
>>>
>>> I wish there were Hands-On with PostgreSQL Internals like
>>> https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
>>> for more complex PG features.
>>>
>>> For example, MERGE SQL standard is not supported yet by PG. I wish there
>>> were Hands-On with PostgreSQL Internals for MERGE/UPSERT. How it is
>>> implemented in parser/executor/storage etc. modules with detailed
>>> explanation for each code and debugging and other important concepts
>>> related to system programming.
>>
>> I am not sure if I can show you that one place where you could learn all
>> of that, but many people who started with PostgreSQL development at some
>> point started by exploring the source code itself (either for learning or
>> to write a feature patch), articles on PostgreSQL wiki, and many related
>> presentations accessible using the Internet.
>> I liked the following among
>> many others:
>
> Personally I have to agree that the learning curve is very steep. Some
> of the docs and presentations help, but there's a LOT to understand.
I agree too. :)

> When you're getting started you're lost in a world of language you
> don't know, and trying to understand one piece often gets you lost in
> other pieces. In no particular order:
>
> * Memory contexts and palloc
> * Managing transactions and how that interacts with memory contexts
> and the default memory context
> * Snapshots, snapshot push/pop, etc
> * LWLocks, memory barriers, spinlocks, latches
> * Heavyweight locks (and the different APIs to them)
> * GUCs, their scopes, the rules around their callbacks, etc
> * dynahash
> * catalogs and oids and access methods
> * The heap AM like heap_open
> * relcache, catcache, syscache
> * genam and the systable_ calls and their limitations with indexes
> * The SPI
> * When to use each of the above 4!
> * Heap tuples and minimal tuples
> * VARLENA
> * GETSTRUCT, when you can/can't use it, other attribute fetching methods
> * TOAST and detoasting datums.
> * forming and deforming tuples
> * LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
> * cache invalidations, when they can happen, and how to do anything
> safely around them.
> * TIDs, cmin and cmax, xmin and xmax
> * postmaster, vacuum, bgwriter, checkpointer, startup process,
> walsender, walreceiver, all our auxiliary procs and what they do
> * relmapper, relfilenodes vs relation oids, filenode extents
> * ondisk structure, page headers, pages
> * shmem management, buffers and buffer pins
> * bgworkers
> * PG_TRY() and PG_CATCH() and their limitations
> * elog and ereport and errcontexts, exception unwinding/longjmp and
> how it interacts with memory contexts, lwlocks, etc
> * The nest of macros around datum manipulation and functions, PL
> handlers. How to find the macros for the data types you want to work
> with.
> * Everything to do with the C API for arrays (is horrible)
> * The details of the parse/rewrite/plan phases with rewrite calling
> back into parse, paths, the mess with inheritance_planner, reading and
> understanding plantrees
> * The permissions and grants model and how to interact with it
> * PGPROC, PGXACT, other main shmem structures
> * Resource owners (which I still don't fully "get")
> * Checkpoints, pg_control and ShmemVariableCache, crash recovery
> * How globals are used in Pg and how they interact with fork()ing from
> postmaster
> * SSI (haven't gone there yet myself)
> * ....

That is indeed a big list of things to know and (have to) worry about. If
we indeed come up with a PG-hackers-handbook someday, things in your list
could be organized such that it's clear to someone wanting to contribute
code which of those things they need to *absolutely* worry about and which
they don't.

> Personally I recall finding the magic of resource owner and memory
> context changing under me when I started/stopped xacts in a bgworker,
> along with the need to manage snapshots and SPI state to be distinctly
> confusing.
>
> There are various READMEs, blog posts, presentation slides/videos, etc
> that explain bits and pieces. But not much exists to tie it together
> into a comprehensible whole with simple, minimal explanations for each
> part so someone who's new to it all can begin to get a handle on it,
> find resources to learn more about subsystems they need to care about,
> etc.
>
> Lots of it boils down to "read the code". But so much code! You don't
> know if what you're reading is really relevant or if it's even
> correct, or if it makes assumptions that differ from your situation.
> There are lots of coding rules that aren't necessarily obvious unless
> you read the right place, e.g. that you don't need to and shouldn't
> LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
> or xacts for you (but will often silently work anyway!). etc.
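
To make the LWLockRelease()-before-elog(ERROR) rule quoted above a bit more
concrete, here is a toy model in plain C. It is not PostgreSQL code and is
only a sketch of the general idea as I understand it: elog(ERROR) does not
return to its caller but longjmp()s to an error handler, and abort
processing then releases whatever LWLocks are still held (LWLockReleaseAll()
in the real backend). Every toy_* name below is hypothetical and invented for
illustration only.

```c
/* Toy model only: NOT PostgreSQL source. All toy_* names are hypothetical. */
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define TOY_MAX_LOCKS 8

/* Stands in for the backend's private list of currently held LWLocks. */
static bool toy_lock_held[TOY_MAX_LOCKS];

/* Stands in for the top-level error handler that an ERROR jumps to. */
static jmp_buf toy_error_handler;

/* Bookkeeping so a caller can observe what happened. */
int toy_errors_recovered = 0;
int toy_locks_at_error = 0;

static void toy_lock_acquire(int id) { toy_lock_held[id] = true; }
static void toy_lock_release(int id) { toy_lock_held[id] = false; }

/* Count the locks that are still marked as held. */
static int toy_locks_held(void)
{
    int n = 0;
    for (int i = 0; i < TOY_MAX_LOCKS; i++)
        if (toy_lock_held[i])
            n++;
    return n;
}

/* Central cleanup: similar in spirit to releasing all LWLocks on abort. */
static void toy_release_all(void)
{
    for (int i = 0; i < TOY_MAX_LOCKS; i++)
        toy_lock_held[i] = false;
}

/* Stands in for elog(ERROR): control never returns to the caller. */
static void toy_raise_error(void)
{
    longjmp(toy_error_handler, 1);
}

/* "Extension" code: note there is no manual release before the error. */
static void toy_update_shared_state(bool fail)
{
    toy_lock_acquire(3);
    if (fail)
        toy_raise_error();      /* cleanup happens in the handler instead */
    toy_lock_release(3);        /* normal path releases as usual */
}

/* Runs the operation; returns how many locks remain held afterwards. */
int toy_run(bool fail)
{
    if (setjmp(toy_error_handler) != 0)
    {
        /* Error path: record state, then do the centralized cleanup. */
        toy_locks_at_error = toy_locks_held();
        toy_errors_recovered++;
        toy_release_all();
        return toy_locks_held();
    }
    toy_update_shared_state(fail);
    return toy_locks_held();
}
```

Calling toy_run(true) leaves no locks held even though
toy_update_shared_state() never released lock 3 itself; the handler did it.
Spreading manual release calls in front of every error site would duplicate
that cleanup and is easy to get wrong, which is roughly why the rule exists.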
>
> I've long intended to start a blog series on postgresql innards
> concepts, partly with the intent of turning it into such an overview.
> I find that people are better at shouting you down when you're wrong
> than they are at writing new material or reviewing proposed docs, so
> it's often a good way to fact-check things ;) . Plus it's a good way
> to learn. Time is always short though.

Agreed on all counts. Look forward to the blog. :)

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers