NumericShort vs NumericLong format

2023-03-06 Thread Amin
Hi,

- How can I determine which format will be used for a numeric type?
- What the precision and scale values should be for pgsql to use the long
format? Is there a threshold?


Get access to the whole query in CustomScan path callback

2022-12-21 Thread Amin
Hi,

The goal is to have access to all the tables that are being scanned or will
be scanned as a part of the query. Basically, the callback looks like this:

typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
RelOptInfo *rel,
Index rti,
RangeTblEntry *rte);

Now, the problem is when there is a nested query, the function will be
called once for the parent query and once for the subquery. However, I need
access to the whole query in this function. There seems to be no CustomScan
callback before this that has the whole query passed to it. Is there any
way I can get access to the complete query (or all the relations in the
query) by using the parameters passed to this function? Or any other
workaround?

Thank you and happy holidays!


Get relid for a relation

2023-01-13 Thread Amin
Hi,

In CustomScan cost estimator, where PlannerInfo and RelOptInfo are passed,
I want to get access to the relation stats (for example pg_stat_all_tables)
by calling pg_stat_fetch_stat_tabentry(). However, I don't have access to
relid to pass to this function. For a sample relation, when I hardcode the
relid (for example 16385), it works. However, RelOptInfo->relid is always 1
(for whatever relation the query is scanning). Why this happens and how to
get access to the correct relid (16385) as in pg_stat_all_tables?

Thank you!


Re: Get access to the whole query in CustomScan path callback

2023-01-13 Thread Amin
I cannot find any information related to other relations in the query other
than the one which is being scanned in the root pointer. Is there any
function call which can be used to get access to it?

On Wed, Dec 21, 2022 at 9:46 AM Tom Lane  wrote:

> Amin  writes:
> > The goal is to have access to all the tables that are being scanned or
> will
> > be scanned as a part of the query. Basically, the callback looks like
> this:
>
> > typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
> > RelOptInfo *rel,
> > Index rti,
> > RangeTblEntry *rte);
>
> > Now, the problem is when there is a nested query, the function will be
> > called once for the parent query and once for the subquery. However, I
> need
> > access to the whole query in this function. There seems to be no
> CustomScan
> > callback before this that has the whole query passed to it. Is there any
> > way I can get access to the complete query (or all the relations in the
> > query) by using the parameters passed to this function? Or any other
> > workaround?
>
> Everything the planner knows is accessible via the "root" pointer.
>
> I very strongly question the idea that a custom scan provider should
> be doing what you say you want to do, but the info is there.
>
> regards, tom lane
>


How to find the number of cached pages for a relation?

2023-01-13 Thread Amin
Hi,

Before scanning a relation, in the planner stage, I want to make a call to
retrieve information about how many pages will be a hit for a specific
relation. The module pg_buffercache seems to be doing a similar thing.
Also, pg_statio_all_tables seems to be having that information, but it is
updated after execution. However, I want the information before execution.
Also not sure how pg_statio_all_tables is created and how I can access it
in the code.

Thank you!


Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-14 Thread Amin
Hi there,

In my implementation of CustomScan, when I have a query that scans multiple
tables (select c from t1,t2,t3), the planner always picks one table to be
scanned by CustomScan and offloads the rest to SeqScan. I tried assigning a
cost of 0 to the CustomScan path, but still not working. BeginCustomScan
gets executed, ExecCustomScan is skipped, and then EndCustomScan is
executed for all the tables that are offloaded to Seq Scan. EXPLAIN shows
that always only one table is picked to be executed by CustomScan. Any idea
what I might be doing wrong? Like a value in a struct I might be setting
incorrectly?

Thanks!


Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Amin
To simplify: Can CustomScan scan multiple relations in the same query or it
will always be assigned to one or zero relations?

On Fri, Apr 14, 2023 at 4:33 PM Amin  wrote:

> Hi there,
>
> In my implementation of CustomScan, when I have a query that scans
> multiple tables (select c from t1,t2,t3), the planner always picks one
> table to be scanned by CustomScan and offloads the rest to SeqScan. I tried
> assigning a cost of 0 to the CustomScan path, but still not working.
> BeginCustomScan gets executed, ExecCustomScan is skipped, and then
> EndCustomScan is executed for all the tables that are offloaded to Seq
> Scan. EXPLAIN shows that always only one table is picked to be executed by
> CustomScan. Any idea what I might be doing wrong? Like a value in a struct
> I might be setting incorrectly?
>
> Thanks!
>


Re: Scans are offloaded to SeqScan instead of CustomScan when there are multiple relations in the same query

2023-04-17 Thread Amin
Hi Tom,

I made sure EXPLAIN returns CustomScan for all scans in the query. But
still, ExecCustomScan is only called once while the rest of the functions
are called for each scan separately. Is this expected behavior? How to work
around this?

Thank you!

On Mon, Apr 17, 2023 at 3:45 PM Tom Lane  wrote:

> Amin  writes:
> > To simplify: Can CustomScan scan multiple relations in the same query or
> it
> > will always be assigned to one or zero relations?
>
> There's barely any code in the core planner that is specific to custom
> scans.  Almost certainly this misbehavior is the fault of your
> custom-path-creation code.  Maybe you're labeling the paths with the
> wrong parent relation, or forgetting to submit them to add_path,
> or assigning them costs that are high enough to get them rejected?
>
> regards, tom lane
>


Traversing targetlist to find accessed columns

2022-12-03 Thread Amin
Hi,

Having a query, I am trying to find out all the columns that need to be
accessed (their varattno and vartype). I have access to a targetlist
representing a tree like this. So, I am looking for a function that
recursively traverses the tree and gives me the VARs. So, for SELECT
a,b,b+c from tab; I am interested in [a,b]. Is such a function currently
implemented in postgresql? How can I use it?

:targetlist (
>{TARGETENTRY
>:expr
>   {VAR
>   :varno 1
>   :varattno 1
>   :vartype 23
>   :vartypmod -1
>   :varcollid 0
>   :varlevelsup 0
>   :varnosyn 1
>   :varattnosyn 1
>   :location 7
>   }
>:resno 1
>:resname l_orderkey
>:ressortgroupref 0
>:resorigtbl 24805
>:resorigcol 1
>:resjunk false
>}
>{TARGETENTRY
>:expr
>   {VAR
>   :varno 1
>   :varattno 2
>   :vartype 23
>   :vartypmod -1
>   :varcollid 0
>   :varlevelsup 0
>   :varnosyn 1
>   :varattnosyn 2
>   :location 18
>   }
>:resno 2
>:resname l_partkey
>:ressortgroupref 0
>:resorigtbl 24805
>:resorigcol 2
>:resjunk false
>}
>{TARGETENTRY
>:expr
>   {OPEXPR
>   :opno 551
>   :opfuncid 177
>   :opresulttype 23
>   :opretset false
>   :opcollid 0
>   :inputcollid 0
>   :args (
>  {OPEXPR
>  :opno 551
>  :opfuncid 177
>  :opresulttype 23
>  :opretset false
>  :opcollid 0
>  :inputcollid 0
>  :args (
> {VAR
> :varno 1
> :varattno 1
> :vartype 23
> :vartypmod -1
> :varcollid 0
> :varlevelsup 0
> :varnosyn 1
> :varattnosyn 1
> :location 28
> }
> {VAR
> :varno 1
> :varattno 2
> :vartype 23
> :vartypmod -1
> :varcollid 0
> :varlevelsup 0
> :varnosyn 1
> :varattnosyn 2
>  :location 39
>  }
>   )
>   :location 38
>   }
>   {VAR
>   :varno 1
>   :varattno 3
>   :vartype 23
>   :vartypmod -1
>   :varcollid 0
>   :varlevelsup 0
>   :varnosyn 1
>   :varattnosyn 3
>   :location 49
>   }
>)
>:location 48
>}
> :resno 3
> :resname ?column?
> :ressortgroupref 0
> :resorigtbl 0
> :resorigcol 0
> :resjunk false
> }
>  )
>


Creating HeapTuple from char and date values

2022-12-07 Thread Amin
Hi All,

I am trying to create HeapTuple data structure.
First, I create a tuple descriptor:
 TupleDesc *td=CreateTemplateTupleDesc(colCount);
Then, for each variable, I do:
  TupleDescInitEntry(*td,v->varattno,NULL,v->vartype,v->vartypmod,0);
Then, I assign values:
  if int32: values[v->varattno-1]=Int8GetDatum(myValue);
Similarly for float.
Finally, I create the HeapTuple:
  HeapTuple tuple=heap_form_tuple(td,values,isnull);

Everything works fine with int and float. But I don't know how to handle
chars.
Let's say we have a character(10) column. One problem is v->vartypmod will
be set to 14. Shouldn't it be 10?
Second, how should I assign values? Is
values[v->varattno-1]=CStringGetDatum(myValue); correct? Should I set the
last parameter to TupleDescInitEntry? Why am I getting "invalid memory
alloc request size" or segfault with different configurations?


Getting relations accessed by a query using the raw query string

2023-01-24 Thread Amin
Hi,

Having a query string, I am trying to use the postgres parser to find which
relations the query accesses. This is what I currently have:

const char *query_string="select * from dummytable;";
List *parsetree_list=pg_parse_query(query_string);
ListCell *parsetree_item;

foreach(parsetree_item,parsetree_list){
  RawStmt *parsetree=lfirst_node(RawStmt,parsetree_item);
  Query *query=parse_analyze(parsetree,query_string,NULL,0,NULL);
}

However, when I inspect the variable "query", it is not populated
correctly. For example, commandType is set to CMD_DELETE while I have
passed a SELECT query.
- What am I doing wrong?
- Once I get the query correctly, how can I get the list of relations it
gets access to?
- Or any other ways to get the list of relations from raw query string
through postgres calls?

Thank you!


Re: How to find the number of cached pages for a relation?

2023-01-27 Thread Amin
Thank you Andres.

If I want to do "a" ( Do one probe of the buffer mapping table for each
block of the relation. Thus O(#relation blocks)) what function calls can I
use, assuming I only have access to the relation id? How can I access and
scan the buffer mapping table?

On Fri, Jan 13, 2023 at 6:27 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-13 17:28:31 -0800, Amin wrote:
> > Before scanning a relation, in the planner stage, I want to make a call
> to
> > retrieve information about how many pages will be a hit for a specific
> > relation. The module pg_buffercache seems to be doing a similar thing.
> > Also, pg_statio_all_tables seems to be having that information, but it is
> > updated after execution. However, I want the information before
> execution.
> > Also not sure how pg_statio_all_tables is created and how I can access it
> > in the code.
>
> There's no cheap way to do that. Currently the only ways are to:
>
> a) Do one probe of the buffer mapping table for each block of the
>relation. Thus O(#relation blocks).
>
> b) Scan all of buffer headers, check which are for the relation. Thus
>O(#NBuffers)
>
> Neither of which are a good idea during planning.
>
>
> It might be a bit more realistic to get very rough estimates:
>
> You could compute the table's historic cache hit ratio from pgstats (i.e.
> use
> the data backing pg_statio_all_tables). Of course that's not going to be
> specific to your query (for index scans etc), and might have changed more
> recently. It'd also be completely wrong after a restart.
>
> If we had information about *recent* cache hit patterns for the relation,
> it'd
> be a lot better, but we don't have the infrastructure for that, and
> introducing it would increase the size of the stats entries noticably.
>
> Another way could be to probe the buffer mapping table for a small subset
> of
> the locks and infer the likelihood of other blocks being in shared buffers
> that way.
>
> A third way could be to track the cache hit for relations in backend local
> memory, likely in the relache entry. The big disadvantage would be that
> query
> plans would differ between connections and that connections would need to
> "warm up" to have good plans. But it'd handle restarts nicely.
>
> Greetings,
>
> Andres Freund
>


Scan buffercache for a table

2023-01-30 Thread Amin
Hi,

I am looking for function calls to scan the buffer cache for a table and
find the cached pages. I want to find out which pages are cached and which
of them are dirty. Having the relation id, how can I do that? I have gone
through bufmgr.c and relcache.c, but could not find a way to get
relation-specific pages from the buffer cache.

Thank you!


Re: Scan buffercache for a table

2023-01-30 Thread Amin
Thank you Justin. I started a new thread because the context is a little
bit different. I am no longer interested in statistics anymore. I want to
find exact individual pages of a table which are cached and are/aren't
dirty. pg_buffercache implements the loop, but it goes over all the
buffers. However, I want to scan a specific table cache pages.

On Mon, Jan 30, 2023 at 6:43 PM Justin Pryzby  wrote:

> On Mon, Jan 30, 2023 at 06:01:08PM -0800, Amin wrote:
> > Hi,
> >
> > I am looking for function calls to scan the buffer cache for a table and
> > find the cached pages. I want to find out which pages are cached and
> which
> > of them are dirty. Having the relation id, how can I do that? I have gone
> > through bufmgr.c and relcache.c, but could not find a way to get
> > relation-specific pages from the buffer cache.
>
> This looks like a re-post of the question you asked on Jan 13:
> caf-ka8_axsmpqw1scotnaqx8nfhgmjc6l87qzao3jezlibu...@mail.gmail.com
> It'd be better not to start a new thread (or if you do that, it'd be
> better to mention the old one and include its participants).
>
> On Fri, Jan 13, 2023 at 05:28:31PM -0800, Amin wrote:
> > Hi,
> >
> > Before scanning a relation, in the planner stage, I want to make a
> > call to
> > retrieve information about how many pages will be a hit for a specific
> > relation. The module pg_buffercache seems to be doing a similar thing.
>
> The planner is a *model* which (among other things) tries to guess how
> many pages will be read/hit.  It's not expected to be anywhere near
> accurate.
>
> pg_buffercache only checks for pages within postgres' own buffer cache.
> It doesn't look for pages which are in the OS page cache, which require
> a system call to access (but don't require device I/O).
>
> Read about pgfincore for introinspection of the OS page cache.
>
> > Also, pg_statio_all_tables seems to be having that information, but it
> > is updated after execution. However, I want the information before
> > execution.  Also not sure how pg_statio_all_tables is created and how
> > I can access it in the code.
>
> But the view isn't omnicient.  When you execute a plan, you don't know
> how it's going to end.  If you did, you wouldn't need to run it - you
> could just print the answer.
>
> Note that planning and execution are separate and independant.  It's
> possible to plan a query without ever running it, or to plan it once and
> run it multiple times.  The view reflects I/O requested by postgres; the
> I/O normally comes primarily from execution.
>
> You can look at how the view is defined:
> \sv pg_statio_all_tables
>
> And then you can look at how the functions that it calls are implemented
> (\df+).  Same for pg_buffercache.  It seems like you'll want to learn
> how to navigate the source code to find how things are connected.
>
> --
> Justin
>


Share variable between psql backends in CustomScan

2023-02-17 Thread Amin
Hi,

I am looking for a way to define a global variable in CustomScan plugin
that is shared between different psql backends. Is it possible without
using shared memory? Does postgresql implement any function that
facilitates this?

Thank you,
Amin