NumericShort vs NumericLong format
Hi, - How can I determine which format will be used for a numeric type? - What the precision and scale values should be for pgsql to use the long format? Is there a threshold?
Get access to the whole query in CustomScan path callback
Hi, The goal is to have access to all the tables that are being scanned or will be scanned as a part of the query. Basically, the callback looks like this: typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); Now, the problem is when there is a nested query, the function will be called once for the parent query and once for the subquery. However, I need access to the whole query in this function. There seems to be no CustomScan callback before this that has the whole query passed to it. Is there any way I can get access to the complete query (or all the relations in the query) by using the parameters passed to this function? Or any other workaround? Thank you and happy holidays!
Get relid for a relation
Hi, In CustomScan cost estimator, where PlannerInfo and RelOptInfo are passed, I want to get access to the relation stats (for example pg_stat_all_tables) by calling pg_stat_fetch_stat_tabentry(). However, I don't have access to relid to pass to this function. For a sample relation, when I hardcode the relid (for example 16385), it works. However, RelOptInfo->relid is always 1 (for whatever relation the query is scanning). Why this happens and how to get access to the correct relid (16385) as in pg_stat_all_tables? Thank you!
Re: Get access to the whole query in CustomScan path callback
I cannot find any information related to other relations in the query other than the one which is being scanned in the root pointer. Is there any function call which can be used to get access to it? On Wed, Dec 21, 2022 at 9:46 AM Tom Lane wrote: > Amin writes: > > The goal is to have access to all the tables that are being scanned or > will > > be scanned as a part of the query. Basically, the callback looks like > this: > > > typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root, > > RelOptInfo *rel, > > Index rti, > > RangeTblEntry *rte); > > > Now, the problem is when there is a nested query, the function will be > > called once for the parent query and once for the subquery. However, I > need > > access to the whole query in this function. There seems to be no > CustomScan > > callback before this that has the whole query passed to it. Is there any > > way I can get access to the complete query (or all the relations in the > > query) by using the parameters passed to this function? Or any other > > workaround? > > Everything the planner knows is accessible via the "root" pointer. > > I very strongly question the idea that a custom scan provider should > be doing what you say you want to do, but the info is there. > > regards, tom lane >
How to find the number of cached pages for a relation?
Hi, Before scanning a relation, in the planner stage, I want to make a call to retrieve information about how many pages will be a hit for a specific relation. The module pg_buffercache seems to be doing a similar thing. Also, pg_statio_all_tables seems to be having that information, but it is updated after execution. However, I want the information before execution. Also not sure how pg_statio_all_tables is created and how I can access it in the code. Thank you!
Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query
Hi there, In my implementation of CustomScan, when I have a query that scans multiple tables (select c from t1,t2,t3), the planner always picks one table to be scanned by CustomScan and offloads the rest to SeqScan. I tried assigning a cost of 0 to the CustomScan path, but still not working. BeginCustomScan gets executed, ExecCustomScan is skipped, and then EndCustomScan is executed for all the tables that are offloaded to Seq Scan. EXPLAIN shows that always only one table is picked to be executed by CustomScan. Any idea what I might be doing wrong? Like a value in a struct I might be setting incorrectly? Thanks!
Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query
To simplify: Can CustomScan scan multiple relations in the same query or it will always be assigned to one or zero relations? On Fri, Apr 14, 2023 at 4:33 PM Amin wrote: > Hi there, > > In my implementation of CustomScan, when I have a query that scans > multiple tables (select c from t1,t2,t3), the planner always picks one > table to be scanned by CustomScan and offloads the rest to SeqScan. I tried > assigning a cost of 0 to the CustomScan path, but still not working. > BeginCustomScan gets executed, ExecCustomScan is skipped, and then > EndCustomScan is executed for all the tables that are offloaded to Seq > Scan. EXPLAIN shows that always only one table is picked to be executed by > CustomScan. Any idea what I might be doing wrong? Like a value in a struct > I might be setting incorrectly? > > Thanks! >
Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query
Hi Tom, I made sure EXPLAIN returns CustomScan for all scans in the query. But still, ExecCustomScan is only called once while the rest of the functions are called for each scan separately. Is this expected behavior? How to work around this? Thank you! On Mon, Apr 17, 2023 at 3:45 PM Tom Lane wrote: > Amin writes: > > To simplify: Can CustomScan scan multiple relations in the same query or > it > > will always be assigned to one or zero relations? > > There's barely any code in the core planner that is specific to custom > scans. Almost certainly this misbehavior is the fault of your > custom-path-creation code. Maybe you're labeling the paths with the > wrong parent relation, or forgetting to submit them to add_path, > or assigning them costs that are high enough to get them rejected? > > regards, tom lane >
Traversing targetlist to find accessed columns
Hi, Having a query, I am trying to find out all the columns that need to be accessed (their varattno and vartype). I have access to a targetlist representing a tree like this. So, I am looking for a function that recursively traverses the tree and gives me the VARs. So, for SELECT a,b,b+c from tab; I am interested in [a,b]. Is such a function currently implemented in postgresql? How can I use it? :targetlist ( >{TARGETENTRY >:expr > {VAR > :varno 1 > :varattno 1 > :vartype 23 > :vartypmod -1 > :varcollid 0 > :varlevelsup 0 > :varnosyn 1 > :varattnosyn 1 > :location 7 > } >:resno 1 >:resname l_orderkey >:ressortgroupref 0 >:resorigtbl 24805 >:resorigcol 1 >:resjunk false >} >{TARGETENTRY >:expr > {VAR > :varno 1 > :varattno 2 > :vartype 23 > :vartypmod -1 > :varcollid 0 > :varlevelsup 0 > :varnosyn 1 > :varattnosyn 2 > :location 18 > } >:resno 2 >:resname l_partkey >:ressortgroupref 0 >:resorigtbl 24805 >:resorigcol 2 >:resjunk false >} >{TARGETENTRY >:expr > {OPEXPR > :opno 551 > :opfuncid 177 > :opresulttype 23 > :opretset false > :opcollid 0 > :inputcollid 0 > :args ( > {OPEXPR > :opno 551 > :opfuncid 177 > :opresulttype 23 > :opretset false > :opcollid 0 > :inputcollid 0 > :args ( > {VAR > :varno 1 > :varattno 1 > :vartype 23 > :vartypmod -1 > :varcollid 0 > :varlevelsup 0 > :varnosyn 1 > :varattnosyn 1 > :location 28 > } > {VAR > :varno 1 > :varattno 2 > :vartype 23 > :vartypmod -1 > :varcollid 0 > :varlevelsup 0 > :varnosyn 1 > :varattnosyn 2 > :location 39 > } > ) > :location 38 > } > {VAR > :varno 1 > :varattno 3 > :vartype 23 > :vartypmod -1 > :varcollid 0 > :varlevelsup 0 > :varnosyn 1 > :varattnosyn 3 > :location 49 > } >) >:location 48 >} > :resno 3 > :resname ?column? > :ressortgroupref 0 > :resorigtbl 0 > :resorigcol 0 > :resjunk false > } > ) >
Creating HeapTuple from char and date values
Hi All, I am trying to create HeapTuple data structure. First, I create a tuple descriptor: TupleDesc *td=CreateTemplateTupleDesc(colCount); Then, for each variable, I do: TupleDescInitEntry(*td,v->varattno,NULL,v->vartype,v->vartypmod,0); Then, I assign values: if int32: values[v->varattno-1]=Int8GetDatum(myValue); Similarly for float. Finally, I create the HeapTuple: HeapTuple tuple=heap_form_tuple(td,values,isnull); Everything works fine with int and float. But I don't know how to handle chars. Let's say we have a character(10) column. One problem is v->vartypmod will be set to 14. Shouldn't it be 10? Second, how should I assign values? Is values[v->varattno-1]=CStringGetDatum(myValue); correct? Should I set the last parameter to TupleDescInitEntry? Why am I getting "invalid memory alloc request size" or segfault with different configurations?
Getting relations accessed by a query using the raw query string
Hi, Having a query string, I am trying to use the postgres parser to find which relations the query accesses. This is what I currently have: const char *query_string="select * from dummytable;"; List *parsetree_list=pg_parse_query(query_string); ListCell *parsetree_item; foreach(parsetree_item,parsetree_list){ RawStmt *parsetree=lfirst_node(RawStmt,parsetree_item); Query *query=parse_analyze(parsetree,query_string,NULL,0,NULL); } However, when I inspect the variable "query", it is not populated correctly. For example, commandType is set to CMD_DELETE while I have passed a SELECT query. - What am I doing wrong? - Once I get the query correctly, how can I get the list of relations it gets access to? - Or any other ways to get the list of relations from raw query string through postgres calls? Thank you!
Re: How to find the number of cached pages for a relation?
Thank you Andres. If I want to do "a" ( Do one probe of the buffer mapping table for each block of the relation. Thus O(#relation blocks)) what function calls can I use, assuming I only have access to the relation id? How can I access and scan the buffer mapping table? On Fri, Jan 13, 2023 at 6:27 PM Andres Freund wrote: > Hi, > > On 2023-01-13 17:28:31 -0800, Amin wrote: > > Before scanning a relation, in the planner stage, I want to make a call > to > > retrieve information about how many pages will be a hit for a specific > > relation. The module pg_buffercache seems to be doing a similar thing. > > Also, pg_statio_all_tables seems to be having that information, but it is > > updated after execution. However, I want the information before > execution. > > Also not sure how pg_statio_all_tables is created and how I can access it > > in the code. > > There's no cheap way to do that. Currently the only ways are to: > > a) Do one probe of the buffer mapping table for each block of the >relation. Thus O(#relation blocks). > > b) Scan all of buffer headers, check which are for the relation. Thus >O(#NBuffers) > > Neither of which are a good idea during planning. > > > It might be a bit more realistic to get very rough estimates: > > You could compute the table's historic cache hit ratio from pgstats (i.e. > use > the data backing pg_statio_all_tables). Of course that's not going to be > specific to your query (for index scans etc), and might have changed more > recently. It'd also be completely wrong after a restart. > > If we had information about *recent* cache hit patterns for the relation, > it'd > be a lot better, but we don't have the infrastructure for that, and > introducing it would increase the size of the stats entries noticably. > > Another way could be to probe the buffer mapping table for a small subset > of > the locks and infer the likelihood of other blocks being in shared buffers > that way. > > A third way could be to track the cache hit for relations in backend local > memory, likely in the relache entry. The big disadvantage would be that > query > plans would differ between connections and that connections would need to > "warm up" to have good plans. But it'd handle restarts nicely. > > Greetings, > > Andres Freund >
Scan buffercache for a table
Hi, I am looking for function calls to scan the buffer cache for a table and find the cached pages. I want to find out which pages are cached and which of them are dirty. Having the relation id, how can I do that? I have gone through bufmgr.c and relcache.c, but could not find a way to get relation-specific pages from the buffer cache. Thank you!
Re: Scan buffercache for a table
Thank you Justin. I started a new thread because the context is a little bit different. I am no longer interested in statistics anymore. I want to find exact individual pages of a table which are cached and are/aren't dirty. pg_buffercache implements the loop, but it goes over all the buffers. However, I want to scan a specific table cache pages. On Mon, Jan 30, 2023 at 6:43 PM Justin Pryzby wrote: > On Mon, Jan 30, 2023 at 06:01:08PM -0800, Amin wrote: > > Hi, > > > > I am looking for function calls to scan the buffer cache for a table and > > find the cached pages. I want to find out which pages are cached and > which > > of them are dirty. Having the relation id, how can I do that? I have gone > > through bufmgr.c and relcache.c, but could not find a way to get > > relation-specific pages from the buffer cache. > > This looks like a re-post of the question you asked on Jan 13: > caf-ka8_axsmpqw1scotnaqx8nfhgmjc6l87qzao3jezlibu...@mail.gmail.com > It'd be better not to start a new thread (or if you do that, it'd be > better to mention the old one and include its participants). > > On Fri, Jan 13, 2023 at 05:28:31PM -0800, Amin wrote: > > Hi, > > > > Before scanning a relation, in the planner stage, I want to make a > > call to > > retrieve information about how many pages will be a hit for a specific > > relation. The module pg_buffercache seems to be doing a similar thing. > > The planner is a *model* which (among other things) tries to guess how > many pages will be read/hit. It's not expected to be anywhere near > accurate. > > pg_buffercache only checks for pages within postgres' own buffer cache. > It doesn't look for pages which are in the OS page cache, which require > a system call to access (but don't require device I/O). > > Read about pgfincore for introinspection of the OS page cache. > > > Also, pg_statio_all_tables seems to be having that information, but it > > is updated after execution. However, I want the information before > > execution. Also not sure how pg_statio_all_tables is created and how > > I can access it in the code. > > But the view isn't omnicient. When you execute a plan, you don't know > how it's going to end. If you did, you wouldn't need to run it - you > could just print the answer. > > Note that planning and execution are separate and independant. It's > possible to plan a query without ever running it, or to plan it once and > run it multiple times. The view reflects I/O requested by postgres; the > I/O normally comes primarily from execution. > > You can look at how the view is defined: > \sv pg_statio_all_tables > > And then you can look at how the functions that it calls are implemented > (\df+). Same for pg_buffercache. It seems like you'll want to learn > how to navigate the source code to find how things are connected. > > -- > Justin >
Share variable between psql backends in CustomScan
Hi, I am looking for a way to define a global variable in CustomScan plugin that is shared between different psql backends. Is it possible without using shared memory? Does postgresql implement any function that facilitates this? Thank you, Amin