Thank you, Paul! This certainly helps.

On Thu, Apr 23, 2020 at 12:26 PM Paul Jungwirth <p...@illuminatedcomputing.com> wrote:
> On 4/23/20 8:44 AM, Preethi S wrote:
> > I am fairly new to postgres and I am trying to understand how the data
> > is processed during the insert from buffer to the disk. Can someone help
> > me with that? Also, I would like to see source code workflow. Can
> > someone help me with finding the source code for the data
> > insertion/modification workflow.
>
> I'm also a Postgres hacker newbie, but I've spent some time adding
> SQL:2011 FOR PORTION OF support to UPDATE/DELETE, so I've gone through
> that learning process. (I should say "going through". :-)
>
> I'd say be prepared to spend a *lot* of time reading the code.
> Personally I use `grep -r` a lot and just read and read. For specifics
> you can use a debugger or insert
> `ereport(NOTICE, (errmsg("something %s", foo)))` and run queries (or the
> test suite). Also many subfolders have an extensive README that will
> guide you. Some of the READMEs may take an hour or more to get through
> and understand, but reading them is worth it.
>
> It helped me a lot to spend several years writing occasional Postgres C
> extensions before really doing anything in the core codebase. There are
> lots of basics you learn that way. There are a bunch of articles and
> presentations out there about that you might find helpful.
>
> Postgres processes queries in several steps:
>
> - parse
> - analyze
> - rewrite
> - plan
> - optimize
> - execute
>
> The parse step is a bison grammar (look for gram.y). Basically it fills
> in structs cutting up what the user typed.
>
> The analyze step starts to make sense of the parse results. Look at
> parser/analyze.c. It maps input strings to database objects---for
> example looking up table/column names (and making sure they really
> exist). Here you're sort of just copying things from the parse structs
> to different structs. You're building up Node trees that later steps can
> use. I think the analyze step is often considered to be still part of
> the parse phase.
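To make the analyze idea concrete, here is a rough sketch in plain C. It is not PostgreSQL code: `RawTableRef`, `ResolvedTableRef`, `toy_catalog`, and `toy_transform_table_ref` are all hypothetical names invented for illustration, and the ids are made up. It only shows the shape of the step: take a struct holding what the user typed, look the name up, and fill in a different struct that later steps can use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for parser output: just the text the user typed. */
typedef struct RawTableRef {
    const char *relname;
} RawTableRef;

/* Hypothetical stand-in for analyzed output: the name resolved to an id. */
typedef struct ResolvedTableRef {
    unsigned int relid;   /* 0 means "no such table" in this toy */
} ResolvedTableRef;

/* Toy catalog; a real server consults its system catalogs instead. */
typedef struct ToyCatalogEntry {
    const char *relname;
    unsigned int relid;
} ToyCatalogEntry;

static const ToyCatalogEntry toy_catalog[] = {
    {"accounts", 16384},  /* made-up ids */
    {"orders", 16390},
};

/*
 * Resolve the typed name against the toy catalog.
 * Returns 1 and fills *out on success; returns 0 (and sets relid to 0)
 * when the table does not exist.
 */
int toy_transform_table_ref(const RawTableRef *raw, ResolvedTableRef *out)
{
    size_t n = sizeof(toy_catalog) / sizeof(toy_catalog[0]);

    assert(raw != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(raw->relname, toy_catalog[i].relname) == 0) {
            out->relid = toy_catalog[i].relid;
            return 1;
        }
    }
    out->relid = 0;
    return 0;
}
```

Calling `toy_transform_table_ref` with a `RawTableRef` naming `orders` fills in the made-up id, while an unknown name returns 0, which is the point where a real analyzer would report an error.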
>
> It seems like each SQL "clause" has its own transformFoo function, so
> probably you'll want to add your own (transformMyAwesomeFeatureClause)
> and then call it from its "parent" (e.g. transformUpdateStmt).
>
> If you add new Node types you'll need to edit nodes/*funcs.c and also
> probably teach some switch statements how to handle them. If you are
> filling in a struct but then later in the pipeline find that what you
> wrote isn't there anymore, you probably forgot to implement a copy
> function.
>
> The rewrite/plan/optimize steps aren't things you need to worry about
> too much if you're interested in DML, but you can read more about them
> in the source code. Especially rewrite is pretty niche (views and RULEs).
>
> The execute step is the most challenging I think. It has its own Node
> trees and also keeps an execution state. Probably you'll need to look at
> src/backend/executor/nodeModifyTable.c among others. You'll also need to
> learn about TupleTableSlots. (If anyone here has a good learning
> resource for TTS I would also be glad to read it.)
>
> I'm afraid this description is comically dumbed down, but hopefully it
> can be something like a map. I'd probably just take an UPDATE statement
> and try to trace it through the pipeline, and maybe experiment with
> small changes along the way. You can add things to src/test/regress as
> you go.
>
> And the mailing list is a very friendly place to ask questions.
>
> Yours,
>
> --
> Paul ~{:-)
> p...@illuminatedcomputing.com
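The "forgot to implement a copy function" symptom is easy to reproduce outside Postgres. Below is a minimal sketch in plain C, assuming a made-up struct: `ToyUpdateStmt`, `portion_name`, and the `toy_*` helpers are hypothetical and are not the real node or copy machinery. The buggy copy predates the new field, so the value set on the original silently disappears from the copy that later stages see.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical statement node; portion_name plays the newly added field. */
typedef struct ToyUpdateStmt {
    char *relname;
    char *portion_name;
} ToyUpdateStmt;

/* Duplicate a string; NULL in, NULL out. */
static char *toy_strdup(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Buggy copy: written before portion_name existed, so it is never copied. */
ToyUpdateStmt *toy_copy_buggy(const ToyUpdateStmt *from)
{
    ToyUpdateStmt *to = calloc(1, sizeof(*to));  /* zeroed: fields start NULL */

    assert(from != NULL);
    if (to == NULL)
        return NULL;
    to->relname = toy_strdup(from->relname);
    return to;
}

/* Fixed copy: every field is carried over, including the new one. */
ToyUpdateStmt *toy_copy_fixed(const ToyUpdateStmt *from)
{
    ToyUpdateStmt *to = toy_copy_buggy(from);

    if (to == NULL)
        return NULL;
    to->portion_name = toy_strdup(from->portion_name);
    return to;
}

/* Release a copied node and the strings it owns. */
void toy_free(ToyUpdateStmt *stmt)
{
    if (stmt == NULL)
        return;
    free(stmt->relname);
    free(stmt->portion_name);
    free(stmt);
}
```

With the buggy version, a statement whose `portion_name` was set comes back with that field NULL; with the fixed version, the value survives the copy. That matches the symptom described above: the struct was filled in, yet the value is gone further down the pipeline.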