Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

Kang Yuzhe Wed, 29 Mar 2017 01:30:08 -0700

Thanks, Amit, for further confirming Craig's intention.

I am looking forward to seeing your "PG internal machinery under
microscope" blog. May health, persistence and courage be with YOU.


Regards,
Zeray

On Wed, Mar 29, 2017 at 10:36 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2017/03/29 12:36, Craig Ringer wrote:
> > On 29 March 2017 at 10:53, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> >> Hi,
> >>
> >> On 2017/03/28 15:40, Kang Yuzhe wrote:
> >>> Thanks Tsunakawa for such an informative reply.
> >>>
> >>> Almost all of the docs related to PG internals cover introductory
> >>> concepts only.
> >>> There is an even more useful PG internals site, "The Internals of
> >>> PostgreSQL" at http://www.interdb.jp/pg/, a translation of the
> >>> Japanese PG Internals.
> >>>
> >>> The query processing framework described in the manual, as you
> >>> mentioned, is informative and introductory in nature.
> >>> In theory, the query processing framework described in the manual is
> >>> understandable.
> >>>
> >>> Unfortunately, it is another story to understand how the query
> >>> processing framework really works in the PG codebase.
> >>> It has become a difficult task for me to walk through the PG source
> >>> code, for example to see how SELECT/INSERT/TRUNCATE really work in the
> >>> different modules under "src/..".
> >>>
> >>> I wish there were something like "Hands-On with PostgreSQL Internals"
> >>> (https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/)
> >>> for more complex PG features.
> >>>
> >>> For example, the SQL-standard MERGE is not yet supported by PG. I wish
> >>> there were a Hands-On with PostgreSQL Internals for MERGE/UPSERT: how
> >>> it is implemented in the parser/executor/storage etc. modules, with a
> >>> detailed explanation of each piece of code, debugging, and other
> >>> important concepts related to systems programming.
> >>
> >> I am not sure I can point you to one place where you could learn all
> >> of that, but many people who started with PostgreSQL development
> >> began by exploring the source code itself (either for learning or to
> >> write a feature patch), articles on the PostgreSQL wiki, and many
> >> related presentations available on the Internet. I liked the
> >> following among many others:
> >
> > Personally I have to agree that the learning curve is very steep. Some
> > of the docs and presentations help, but there's a LOT to understand.
>
> I agree too. :)
>
> > When you're getting started you're lost in a world of language you
> > don't know, and trying to understand one piece often gets you lost in
> > other pieces. In no particular order:
> >
> > * Memory contexts and palloc
> > * Managing transactions and how that interacts with memory contexts
> > and the default memory context
> > * Snapshots, snapshot push/pop, etc
> > * LWLocks, memory barriers, spinlocks, latches
> > * Heavyweight locks (and the different APIs to them)
> > * GUCs, their scopes, the rules around their callbacks, etc
> > * dynahash
> > * catalogs and oids and access methods
> > * The heap AM like heap_open
> > * relcache, catcache, syscache
> > * genam and the systable_ calls and their limitations with indexes
> > * The SPI
> > * When to use each of the above 4!
> > * Heap tuples and minimal tuples
> > * VARLENA
> > * GETSTRUCT, when you can/can't use it, other attribute fetching methods
> > * TOAST and detoasting datums.
> > * forming and deforming tuples
> > * LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
> > * cache invalidations, when they can happen, and how to do anything
> > safely around them.
> > * TIDs, cmin and cmax, xmin and xmax
> > * postmaster, vacuum, bgwriter, checkpointer, startup process,
> > walsender, walreceiver, all our auxiliary procs and what they do
> > * relmapper, relfilenodes vs relation oids, filenode extents
> > * ondisk structure, page headers, pages
> > * shmem management, buffers and buffer pins
> > * bgworkers
> > * PG_TRY() and PG_CATCH() and their limitations
> > * elog and ereport and errcontexts, exception unwinding/longjmp and
> > how it interacts with memory contexts, lwlocks, etc
> > * The nest of macros around datum manipulation and functions, PL
> > handlers. How to find the macros for the data types you want to work
> > with.
> > * Everything to do with the C API for arrays (is horrible)
> > * The details of the parse/rewrite/plan phases with rewrite calling
> > back into parse, paths, the mess with inheritance_planner, reading and
> > understanding plantrees
> > * The permissions and grants model and how to interact with it
> > * PGPROC, PGXACT, other main shmem structures
> > * Resource owners (which I still don't fully "get")
> > * Checkpoints, pg_control and ShmemVariableCache, crash recovery
> > * How globals are used in Pg and how they interact with fork()ing from
> > postmaster
> > * SSI (haven't gone there yet myself)
> > * ....
>
> That is indeed a big list of things to know and (have to) worry about.  If
> we do come up with a PG hackers' handbook someday, things in your list
> could be organized so that it's clear to someone wanting to contribute
> code which of those things they *absolutely* need to worry about and which
> they don't.
>
> > Personally, I recall finding it distinctly confusing that the resource
> > owner and memory context changed under me, as if by magic, when I
> > started/stopped xacts in a bgworker, along with the need to manage
> > snapshots and SPI state.
> >
> > There are various READMEs, blog posts, presentation slides/videos, etc.
> > that explain bits and pieces. But not much exists to tie it together
> > into a comprehensible whole, with simple, minimal explanations for each
> > part, so that someone who's new to it all can begin to get a handle on
> > it, find resources to learn more about the subsystems they need to care
> > about, etc.
> >
> > Lots of it boils down to "read the code". But so much code! You don't
> > know if what you're reading is really relevant or if it's even
> > correct, or if it makes assumptions that differ from your situation.
> > There are lots of coding rules that aren't necessarily obvious unless
> > you read the right place, e.g. that you don't need to (and shouldn't)
> > call LWLockRelease() before elog(ERROR), or that SPI doesn't manage
> > snapshots or xacts for you (but will often silently work anyway!), etc.
> >
> > I've long intended to start a blog series on concepts in PostgreSQL's
> > innards, partly with the intent of turning it into such an overview.
> > I find that people are better at shouting you down when you're wrong
> > than they are at writing new material or reviewing proposed docs, so
> > it's often a good way to fact-check things ;). Plus it's a good way
> > to learn. Time is always short, though.
>
> Agreed on all counts.  Looking forward to the blog. :)
>
> Thanks,
> Amit
>
>
>
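On the coding rule mentioned in the thread, that there is no need to call LWLockRelease() before elog(ERROR): in PostgreSQL, elog(ERROR) does not return. It longjmps to the innermost recovery point (PG_exception_stack), and the transaction-abort path calls LWLockReleaseAll(), so any LWLocks still held are released there. The plain-C program below is only a toy model of that control flow, not PostgreSQL code: toy_lock_acquire(), toy_lock_release(), toy_lock_release_all(), toy_elog_error(), do_work() and run_with_recovery() are hypothetical names, and a "lock" is just an integer in an array. It also follows the PG_TRY() rule documented in elog.h: a local variable modified inside the try section and read in the catch section must be declared volatile.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model only: NOT PostgreSQL's implementation. All names are hypothetical. */

#define MAX_HELD_LOCKS 8

static int held_locks[MAX_HELD_LOCKS]; /* ids of locks this "backend" holds */
static int num_held_locks = 0;

static jmp_buf *error_handler = NULL;  /* innermost recovery point, cf. PG_exception_stack */

/* Stand-in for LWLockAcquire(): only records that the lock is held. */
static void toy_lock_acquire(int lock_id)
{
    assert(num_held_locks < MAX_HELD_LOCKS);
    held_locks[num_held_locks++] = lock_id;
}

/* Stand-in for LWLockRelease(): forgets the most recent matching lock. */
static void toy_lock_release(int lock_id)
{
    for (int i = num_held_locks - 1; i >= 0; i--) {
        if (held_locks[i] == lock_id) {
            held_locks[i] = held_locks[--num_held_locks];
            return;
        }
    }
}

/* Stand-in for LWLockReleaseAll(), which the abort path calls. */
static void toy_lock_release_all(void)
{
    num_held_locks = 0;
}

/* Stand-in for elog(ERROR): reports, then longjmps; it never returns. */
static void toy_elog_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    if (error_handler == NULL)
        abort();                       /* no recovery point installed */
    longjmp(*error_handler, 1);
}

/* Backend-style code: raises an error while holding a lock without
 * releasing it first; cleanup is left to error recovery. */
static void do_work(int fail)
{
    toy_lock_acquire(42);
    if (fail)
        toy_elog_error("something went wrong");
    toy_lock_release(42);              /* normal path releases explicitly */
}

/* Installs a recovery point, runs do_work(), and on error releases every
 * lock still held. Returns 1 if an error was caught, 0 otherwise. */
static int run_with_recovery(int fail)
{
    jmp_buf local;
    jmp_buf *saved = error_handler;    /* not modified after setjmp, so safe */
    volatile int stage = 0;            /* modified after setjmp, read after longjmp */
    int caught = 0;

    error_handler = &local;
    if (setjmp(local) == 0) {
        stage = 1;
        do_work(fail);
    } else {
        fprintf(stderr, "recovering at stage %d with %d lock(s) held\n",
                stage, num_held_locks);
        toy_lock_release_all();
        caught = 1;
    }
    error_handler = saved;             /* pop the recovery point */
    return caught;
}
```

For instance, after run_with_recovery(1) returns 1, num_held_locks is back to 0 even though do_work() never released lock 42 on its error path. The real backend does considerably more during abort (memory contexts, resource owners, and so on), which this sketch ignores.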
