Thanks, Amit, for the further confirmation of Craig's intention. I am looking forward to seeing your "PG internal machinery under microscope" blog. May health, persistence and courage be with YOU.
Regards,
Zeray

On Wed, Mar 29, 2017 at 10:36 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2017/03/29 12:36, Craig Ringer wrote:
> > On 29 March 2017 at 10:53, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> >> Hi,
> >>
> >> On 2017/03/28 15:40, Kang Yuzhe wrote:
> >>> Thanks Tsunakawa for such an informative reply.
> >>>
> >>> Almost all of the docs related to the internals of PG are of introductory
> >>> concepts only.
> >>> There is even more useful PG internals site entitled "The Internals of
> >>> PostgreSQL" in http://www.interdb.jp/pg/ translation of the Japanese PG
> >>> Internals.
> >>>
> >>> The query processing framework that is described in the manual as you
> >>> mentioned is of informative and introductory nature.
> >>> In theory, the query processing framework described in the manual is
> >>> understandable.
> >>>
> >>> Unfortunately, it is another story to understand how query processing
> >>> framework in PG codebase really works.
> >>> It has become a difficult task for me to walk through the PG source code
> >>> for example how SELECT/INSERT/TRUNCATE in the different modules under
> >>> "src/..". really works.
> >>>
> >>> I wish there were Hands-On with PostgreSQL Internals like
> >>> https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
> >>> for more complex PG features.
> >>>
> >>> For example, MERGE SQL standard is not supported yet by PG. I wish there
> >>> were Hands-On with PostgreSQL Internals for MERGE/UPSERT. How it is
> >>> implemented in parser/executor/storage etc. modules with detailed
> >>> explanation for each code and debugging and other important concepts
> >>> related to system programming.
> >>
> >> I am not sure if I can show you that one place where you could learn all
> >> of that, but many people who started with PostgreSQL development at some
> >> point started by exploring the source code itself (either for learning or
> >> to write a feature patch), articles on PostgreSQL wiki, and many related
> >> presentations accessible using the Internet. I liked the following among
> >> many others:
> >
> > Personally I have to agree that the learning curve is very steep. Some
> > of the docs and presentations help, but there's a LOT to understand.
>
> I agree too. :)
>
> > When you're getting started you're lost in a world of language you
> > don't know, and trying to understand one piece often gets you lost in
> > other pieces. In no particular order:
> >
> > * Memory contexts and palloc
> > * Managing transactions and how that interacts with memory contexts
> >   and the default memory context
> > * Snapshots, snapshot push/pop, etc
> > * LWLocks, memory barriers, spinlocks, latches
> > * Heavyweight locks (and the different APIs to them)
> > * GUCs, their scopes, the rules around their callbacks, etc
> > * dynahash
> > * catalogs and oids and access methods
> > * The heap AM like heap_open
> > * relcache, catcache, syscache
> > * genam and the systable_ calls and their limitations with indexes
> > * The SPI
> > * When to use each of the above 4!
> > * Heap tuples and minimal tuples
> > * VARLENA
> > * GETSTRUCT, when you can/can't use it, other attribute fetching methods
> > * TOAST and detoasting datums.
> > * forming and deforming tuples
> > * LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
> > * cache invalidations, when they can happen, and how to do anything
> >   safely around them.
> > * TIDs, cmin and cmax, xmin and xmax
> > * postmaster, vacuum, bgwriter, checkpointer, startup process,
> >   walsender, walreceiver, all our auxiliary procs and what they do
> > * relmapper, relfilenodes vs relation oids, filenode extents
> > * ondisk structure, page headers, pages
> > * shmem management, buffers and buffer pins
> > * bgworkers
> > * PG_TRY() and PG_CATCH() and their limitations
> > * elog and ereport and errcontexts, exception unwinding/longjmp and
> >   how it interacts with memory contexts, lwlocks, etc
> > * The nest of macros around datum manipulation and functions, PL
> >   handlers. How to find the macros for the data types you want to work
> >   with.
> > * Everything to do with the C API for arrays (is horrible)
> > * The details of the parse/rewrite/plan phases with rewrite calling
> >   back into parse, paths, the mess with inheritance_planner, reading and
> >   understanding plantrees
> > * The permissions and grants model and how to interact with it
> > * PGPROC, PGXACT, other main shmem structures
> > * Resource owners (which I still don't fully "get")
> > * Checkpoints, pg_control and ShmemVariableCache, crash recovery
> > * How globals are used in Pg and how they interact with fork()ing from
> >   postmaster
> > * SSI (haven't gone there yet myself)
> > * ....
>
> That is indeed a big list of things to know and (have to) worry about. If
> we indeed come up with a PG-hackers-handbook someday, things in your list
> could be organized such that it's clear to someone wanting to contribute
> code which of those things they need to *absolutely* worry about and which
> they don't.
>
> > Personally I recall finding the magic of resource owner and memory
> > context changing under me when I started/stopped xacts in a bgworker,
> > along with the need to manage snapshots and SPI state to be distinctly
> > confusing.
> >
> > There are various READMEs, blog posts, presentation slides/videos, etc
> > that explain bits and pieces.
> > But not much exists to tie it together
> > into a comprehensible whole with simple, minimal explanations for each
> > part so someone who's new to it all can begin to get a handle on it,
> > find resources to learn more about subsystems they need to care about,
> > etc.
> >
> > Lots of it boils down to "read the code". But so much code! You don't
> > know if what you're reading is really relevant or if it's even
> > correct, or if it makes assumptions that differ from your situation.
> > There are lots of coding rules that aren't necessarily obvious unless
> > you read the right place, e.g. that you don't need to and shouldn't
> > LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
> > or xacts for you (but will often silently work anyway!). etc.
> >
> > I've long intended to start a blog series on postgresql innards
> > concepts, partly with the intent of turning it into such an overview.
> > I find that people are better at shouting you down when you're wrong
> > than they are at writing new material or reviewing proposed docs, so
> > it's often a good way to fact-check things ;) . Plus it's a good way
> > to learn. Time is always short though.
>
> Agreed on all counts. Look forward to the blog. :)
>
> Thanks,
> Amit