How to see/calculate size of index in memory?

2018-09-19 Thread Sam R.

Hi!

I have the following question, if someone could help.

Question 1: How to see/calculate the size of an index in memory?
(B-tree, hash index.)

I can see the size of an index e.g. with pg_relation_size (FROM pg_class, after
REINDEX). Does that tell the size of the index on disk?

I would be interested in how big a part of the index is in memory. (The whole index?)

PG10/PG11.
Best Regards, Sam



To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!

Related to my other email (size of index in memory), some other questions:

Q: To keep _index(es)_ in memory, is a large enough effective_cache_size enough?
Q: Does the size of shared_buffers not matter for keeping an index in memory?
Or have I missed something; does it matter (for keeping indexes in memory)?

Background info: I have plans to use hash indexes: a very large amount of data in 
the db tables, but the (e.g. hash) indexes could be kept in memory.

I am using PostgreSQL 10. I could start to use PostgreSQL 11, after it has been 
released.

Best Regards, Sam


Re: How to see/calculate size of index in memory?

2018-09-19 Thread Sergei Kornilov
Hello
You can use the pg_buffercache contrib module: 
https://www.postgresql.org/docs/current/static/pgbuffercache.html

pg_relation_size - yes, it is the full size on disk, regardless of the buffer cache
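For example, a sketch of the kind of query pg_buffercache makes possible
(assuming the extension is installed; 'my_index' is a placeholder name):

```sql
-- Count how many 8 kB buffers of a given index are currently cached
-- in shared_buffers, and convert that to a human-readable size.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       count(*) AS buffers_cached,
       pg_size_pretty(count(*) * 8192) AS mem_used
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'my_index'
GROUP BY c.relname;
```

Note that this shows only what is in shared_buffers; pages held in the OS
page cache are not visible from inside PostgreSQL.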

regards, Sergei



Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sergei Kornilov
Hi

effective_cache_size is not a cache. It is just an approximate value for the query 
planner: how much data can be found in RAM (both in shared_buffers and the OS page 
cache)

> Q: Size of shared_buffers does not matter regarding keeping index in memory?
shared_buffers is a cache for both table and index pages. All data in tables 
and indexes is split into pages of 8 kB each (usually 8 kB; it can be 
redefined during source compilation).
The shared buffers cache is fully automatic: actively used pages are kept in 
memory, less used pages may be evicted. You cannot pin any table or index to 
shared buffers.
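This automatic behaviour can be observed directly; as a sketch (again assuming
the pg_buffercache extension is installed):

```sql
-- Distribution of usage counts across shared_buffers. Pages with a
-- higher usagecount have been accessed more recently/frequently and
-- are less likely to be evicted by the clock-sweep algorithm.
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```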

regards, Sergei



Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!
Is it possible to force PostgreSQL to keep an index in memory? The data in the db 
table columns does not need to be kept in memory, only the index. (Hash index.)

That would sound optimal in our scenario. I think Oracle has the capability to keep 
an index in memory (in-memory db functionality). But does PostgreSQL have such 
functionality? (I keep searching.)
I have read:
Tuning Your PostgreSQL Server - PostgreSQL wiki
(effective_cache_size, shared_buffers)
I have seen responses to:
PostgreSQL Index Caching

Should I actually set shared_buffers to tens of gigabytes as well, if I want to 
keep one very big index in memory?

I am also reading a PG book.

Best Regards, Sam


 


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Sergei wrote:
> You can not pin any table or index to shared buffers.
Thanks, this answers my other question!
In our case, this might be an important feature. 
(Index in memory, other data / columns not.)
> shared_buffers is cache for both tables and indexes pages.
OK. So, we should also set shared_buffers big.

BR Sam
 


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread David Rowley
On 19 September 2018 at 21:18, Sam R.  wrote:
> Ok. So, we should set also shared_buffers big.

It might not be quite as beneficial as you might think. If your
database is larger than RAM, a smaller shared_buffers setting often
yields better performance.  The reason is that with a very large
shared_buffers, the same buffers can end up cached in both the kernel
page cache and shared buffers. With a smaller shared_buffers setting,
the chances of that double buffering are reduced and the chances of
finding a page cached somewhere increase.

However, if your database is quite small and you can afford to fit all
your data in shared buffers, with enough free RAM for everything else,
then you might benefit from a large shared_buffers, but it's important
to also consider that some operations, such as DROP TABLE, can become
slow if shared_buffers is very large.

You might get more specific recommendations if you mention how much
RAM the server has and how big the data is now and will be in the
future.
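As a starting point for that comparison, the current on-disk sizes can be
checked from SQL (the index name here is a placeholder):

```sql
-- Total on-disk size of the current database, and of one index.
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;
SELECT pg_size_pretty(pg_relation_size('my_index')) AS index_size;
```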

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Kaixi Luo
Does a large shared_buffers impact checkpoint performance negatively? I was
under the impression that everything inside shared_buffers must be written
during a checkpoint.


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread David Rowley
On 19 September 2018 at 22:12, Kaixi Luo  wrote:
> Does a large shared_buffers impact checkpoint performance negatively? I was
> under the impression that everything inside shared_buffers must be written
> during a checkpoint.

Only the dirty buffers get written.

Also, having too small a shared_buffers can mean that buffers must be
written more often than they would otherwise need to be. If a buffer
must be evicted from shared buffers to make way for a new buffer, then
the chances of having to evict a dirty buffer increase with smaller
shared buffers. Obviously, a dirty buffer needs to be written out
before the new buffer can be loaded in. In the worst case, a backend
performing a query has to do this itself.  pg_stat_bgwriter is
your friend.
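A sketch of the kind of check this suggests (column names as in PostgreSQL 10):

```sql
-- If buffers_backend is high relative to buffers_checkpoint and
-- buffers_clean, backends are frequently having to write out dirty
-- buffers themselves, which can indicate shared_buffers is too small
-- for the workload.
SELECT buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_alloc
FROM pg_stat_bgwriter;
```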

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Hi!
Thanks for all of the comments!
David wrote:
> if you mention how much RAM the server has and how big the data is now
Let's say, for example:
RAM: 64 GB
Data: 500 GB - 1.5 TB, for example. 
(RAM: Less would of course be better, e.g. 32 GB, but we could maybe go for an 
even slightly bigger value than 64 GB, if needed.)
BR Sam


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
The size of the index of one huge table has been e.g. 16-20 GB (after REINDEX). 

Such an index is quite big.
BR Samuli


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Jeff Janes
On Wed, Sep 19, 2018 at 5:19 AM Sam R.  wrote:

> Hi!
>
> Is is possible to force PostgreSQL to keep an index in memory?
>

It might be possible to put the indexes in a separate tablespace, then do
something at the file-system level to force the OS cache to keep pages
for that filesystem in memory.
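A minimal sketch of the tablespace half of that idea (the path and names are
hypothetical; the file-system-level pinning, e.g. with an OS tool such as
vmtouch, is a separate step outside PostgreSQL):

```sql
-- Move an index into its own tablespace so its files live on a
-- dedicated filesystem that the OS can be told to keep cached.
CREATE TABLESPACE index_space LOCATION '/mnt/pinned_fs/pg_indexes';
ALTER INDEX my_hash_index SET TABLESPACE index_space;
```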



> The data in db table columns is not needed to be kept in memory, only the
> index. (hash index.)
>

This sounds like speculation.  Do you have hard evidence that this is
actually the case?


>
> It would sound optimal in our scenario.
> I think Oracle has capability to keep index in memory (in-memory db
> functionality). But does PostgreSQL have such a functionality? (I keep
> searching.)
>

There are a lot of Oracle capabilities which encourage people to
micromanage the server in ways that are almost never actually productive.

> Should I actually set shared_buffers to tens of gigabytes also, if I want
> to keep one very big index in memory?
>

If your entire database fits in RAM, then it could be useful to set
shared_buffers high enough to fit the entire database.

If fitting the entire database in RAM is hopeless, 10s of gigabytes is
probably too much, unless you have 100s of GB of RAM. PostgreSQL doesn't do
direct IO, but rather uses the OS file cache extensively.  This leads to
double-buffering, where a page is read from disk and stored in the OS file
cache, then handed over to PostgreSQL where it is also stored in
shared_buffers. That means that 1/2 of RAM is often the worst value for
shared_buffers.  You would want it to be either something like 1/20 to 1/10
of RAM, or something like 9/10 or 19/20 of RAM, so that you concentrate
pages into one of the caches or the other.  The low fraction of RAM is the
more generally useful option.  The high fraction of RAM is useful when you
have very high write loads, particularly intensive index updating--and in
that case you probably need someone to intensively monitor and baby-sit the
database.

Cheers,

Jeff


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Sam R.
Thanks for the comments!
Sam wrote:

>> The data in db table columns is not needed to be kept in memory, only the 
>> index. (hash index.)


Jeff Janes wrote:
> This sounds like speculation.  Do you have hard evidence that this is 
> actually the case?
In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.

In the generation phase, it is a very large random number. Our application may not 
even be the one that generates the random ID.

We use a hash index over the ID.

At the moment, in "pure theory", we will read randomly through the hash index. 
So, no one will be able to know what part of the data (from the table) 
should be kept in memory. 
Side note: Of course there may be (even many) use cases where the same data is 
read again and again. Still: I am thinking now from a very theoretical point of 
view (which we may still apply in practice).

In general: 
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / 
wish of keeping only the index in memory may be faulty.  (This is one reason 
for these discussions.)

BR Sam


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread Wei Shan
I believe you can use pg_prewarm to pin an index or table to the cache.

https://www.postgresql.org/docs/current/static/pgprewarm.html
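A sketch of how it is called (assuming the extension is installed; the index
name is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the whole index into shared_buffers; returns the number of
-- blocks read.
SELECT pg_prewarm('my_index');
```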


-- 
Regards,
Ang Wei Shan


Re: To keep indexes in memory, is large enough effective_cache_size enough?

2018-09-19 Thread David Rowley
On 20 September 2018 at 15:19, Wei Shan  wrote:
> I believe you can use pg_prewarm to pin index or table to cache.
>
> https://www.postgresql.org/docs/current/static/pgprewarm.html

I think the key sentence in the document you linked to is:

"Prewarmed data also enjoys no special protection from cache
evictions, so it is possible that other system activity may evict the
newly prewarmed blocks shortly after they are read"

So this is not pinning. It's merely loading buffers into shared
buffers in the hope that they might be around long enough for you to
make the most of that effort.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services