[BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Scott Carey

The following bug has been logged online:

Bug reference:  4575
Logged by:  Scott Carey
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.3.5, 8.3.4
Operating system:   Linux (CentOS 5.2  2.6.18-92.1.10.el5)
Description:All page cache in shared_buffers pinned (duplicated by
OS, always)
Details: 

I have determined that nearly every cached page within shared_buffers is
being pinned in memory, preventing the OS from dropping any such pages from
its page cache.

Effectively, the memory used by shared_buffers for cached file pages is 2*
the page size, these won't page out, they are pinned.

---
To Reproduce:
--
Stop Postgres.  drop the OS page caches as follows:

# sync;echo 3 > /proc/sys/vm/drop_caches

Using 'free -m' or 'top' note the number of bytes that remain in the disk
cache after forcing all to clear.
This number is the baseline un-evictable page cache size.

Configure postgres shared_buffers for ~30% of total RAM.  The purpose of
this is to make the effect obvious.

Start Postgres.  Run the same test as above.  The value will be only
slightly larger than before.  Run 'top' and note the size of the "SHR"
column for the postgres process with the largest value of "SHR".  This is
the shared memory used by postgres currently.  

Execute a query that fills the page cache.  This is any insert  (create
table as select * from other_table works), or any select that does not cause
a sequential scan.  It does not appear that sequential scans ever end up
putting pages in shared_buffers (nor does Vacuum, but I knew about vacuum's
ring buffer before this).  I used a query on a large table (5GB) with an
index, known to do an index scan, and ran repeated variants on the where
clause to hit most of it.

Run top, and note the largest value of the "SHR" column on all postgres
processes.  Now execute the os cache eviction.  Check the remaining cached
memory.
Note that it is now larger than the baseline by essentially the exact size
of the postgres shared memory.

Note, that you can wait for a checkpoint, sync, and be certain that there
are no dirty pages from the OS point of view, and the other baselines rule
out executable pages.  Yet all the pages that correspond to those in
shared_buffers seem to be pinned in memory.

Running such tests over and over, with different data and different values
for shared_buffers, it is consistent and easily reproducible on CentOS 5.2
linux as described above.


-
Context:
-

This is a rather large performance and scale issue.  For some cases, a large
shared_buffers helps because indexes and random-access data tends to be
cached in this area well (seq scans don't kick things out of there).  If
there is memory pressure on the system, it can't kick out the pinned os
pages, effectively making this cache cost twice the space.
Additionally, for heavy write situations a large shared_buffers is required
along with changes to the background writer and checkpoints to make it
perform at peak capability (typically, it helps to be around 5 seconds * the
MB/sec that the I/O is capable of -- 6GB for me).

As far as my previous understanding of how postgres works, pages cached in
shared_buffers should be able to differ from those the OS caches.  So, if
one heavily accessed index is always in shared_buffers, it will not be in
the OS cache.  When pages are written it will pass through the OS cache, but
since no reads make it to the OS, these are evicted relatively quickly from
the OS page cache and highly conserved in shared_buffers.

This bug is preventing this behavior.

There is one other behavior I have seen:  kswapd CPU use is much higher when
there are a large number of pinned pages in the system.  kswapd will use
significant CPU during disk reads (10% of a 3Ghz CPU for every 250MB/sec
read) if there is ~8GB of pinned data.  The more data pinned the higher the
CPU used here.  Without postgres running, pinning memory, the kswapd time
doing the same activity, is significantly lower.


This bug was first noticed due to how the linux kernel behaves when it wants
to evict items from page cache but they are pinned.  With shared_buffers at
25% of RAM, the system was unable to allocate more than 50% of the remaining
75% to processes.  The result when approaching that limit was higher and
higher system CPU% use, until all 8 cores spin at 100% when the remaining
page cache is roughly equal in size to shared_buffers.  It takes a low
'swapppiness' value to expose this before a swap storm occurs.

Although I have not tried it, it would not surprise me if configuring
shared_buffers to 55% of RAM with low swappiness, then filling
shared_buffers with data from reads, may make the system fall down. Default
swappiness would likely swap storm instead but may have similarly bad
behavior, considering that close to half of the system memory should still
be freely available.

Given the various articles or blog posts out there where some have
configured shared

[BUGS] Canot instal

2008-12-11 Thread Alex Rezende
Hello,

I canot instal a postgre version 8.2.5...I have a 2 clicks on the doc. and
appears a box error:

 This installation package can not be open Verify that the package exists
and that you can access it, or contact the application vendor to verify that
this is a valid Windows install package.

whats the problem?

Reggards,

Alex Rezende


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Pavan Deolasee
On Thu, Dec 11, 2008 at 5:37 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
>
>
> Run top, and note the largest value of the "SHR" column on all postgres
> processes.  Now execute the os cache eviction.  Check the remaining cached
> memory.
> Note that it is now larger than the baseline by essentially the exact size
> of the postgres shared memory.
>

Isn't the shared memory on Linux non-swappable, unlike Solaris where
you have an option to make is swappable ? As and when shared memory
pages are accessed, they are allocated and can not be swapped out. I
don't know if these pages are counted as part of the OS cache, but
assuming they are, I don't see any problem with the above observation.

May be you can try to write a C program which creates, attaches and
accesses every page of the shared memory and check if you see the same
behavior.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Pavan Deolasee
On Thu, Dec 11, 2008 at 2:05 PM, Pavan Deolasee
<[EMAIL PROTECTED]> wrote:
>
> Isn't the shared memory on Linux non-swappable, unlike Solaris where
> you have an option to make is swappable ?

Or may be my linux kernel knowledge is stale. I see a SHM_LOCK/UNLOCK
operations in shmctl(2) which can be used to control swapping of shm.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Tom Lane
"Scott Carey" <[EMAIL PROTECTED]> writes:
> I have determined that nearly every cached page within shared_buffers is
> being pinned in memory, preventing the OS from dropping any such pages from
> its page cache.

Shouldn't you be complaining to kernel folk rather than here?
What do you think we could do about it?

(I'm not convinced that you're seeing anything except a
platform-specific vagary in how "top" counts things, anyway.)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Scott Carey
Ok, I'll double check to validate that there is true duplication.

If shared memory cannot be swapped, and shows up in the page cache list (which 
would be odd, but ok, since shared mem is not page cache it is application 
memory), then only those pages (the actual shared_buffers) would not be 
swappable, not a 'clone' of those buffers in the os page cache.

I certainly don't care if the shared memory isn't pageable, I do care if a 
clone of that memory is also not pageable.



From: Pavan Deolasee [pavan.deola...@gmail.com]
Sent: Thursday, December 11, 2008 12:35 AM
To: Scott Carey
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned 
(duplicated by OS, always)

On Thu, Dec 11, 2008 at 5:37 AM, Scott Carey  wrote:
>
>
> Run top, and note the largest value of the "SHR" column on all postgres
> processes.  Now execute the os cache eviction.  Check the remaining cached
> memory.
> Note that it is now larger than the baseline by essentially the exact size
> of the postgres shared memory.
>

Isn't the shared memory on Linux non-swappable, unlike Solaris where
you have an option to make is swappable ? As and when shared memory
pages are accessed, they are allocated and can not be swapped out. I
don't know if these pages are counted as part of the OS cache, but
assuming they are, I don't see any problem with the above observation.

May be you can try to write a C program which creates, attaches and
accesses every page of the shared memory and check if you see the same
behavior.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Scott Carey
I am 99.9% certian its not a fluke of "top" (or 'free').  Or a fluke with the 
drop_caches linux vm signal.  Otherwise, the system would not spin at 100% 
System cpu getting no work done if an attempt to allocate memory above that 
threshold, the original symptom that led me down this path of investigation.

So on a 32GB machine, setting shared_buffers to 8GB, filling them up via index 
scans, waiting for a checkpoint and syncing to be sure the dirty pages are 
negligible, then trying to allocate (and use) more than 16GB of RAM will make 
the sytem unresponsive.  Set shared_buffers to 4GB and repeat, and it takes 
24GB of allocations to cause the problem.  The system essentially behaves like 
the shared_buffer space is 2x its size, except that the more useless half of it 
cannot be paged out.

When I first saw this behavior, i did not connect the 8GB that the OS could not 
free to the 8GB that postgres was caching, and blamed linux, spending much 
effort with linux vm tunables.  But now that they are connected, It would seem 
more likely to be on this side of things.  After all, how can linux even know 
what pages postgres still has in cache?  AFAIK, you aren't memory mapping 
pages, just read() ing them?  Wouldn't the OS only know that there is memory 
allocated to shared space in a process, and not know those contents are mapped 
pages?

Alternatively, the same behavior could occur if only the first pages put in the 
buffer, that replace 'empty space,' are being pinned.  I would need some sort 
of tool that can tell me what blocks are in the OS page cache and which are in 
postgres to distinguish between those given the symptoms.

If there are any ideas on how I could truly distinguish where this bug actually 
is, or further characterize it in ways that would be useful to that end, that 
would be great.

I should have time to set shared_buffers above 50% and see if it kills things 
later today -- this sort of test would not require any special cache flushing 
particular to linux to see.

Thanks,

Scott


From: Tom Lane [...@sss.pgh.pa.us]
Sent: Thursday, December 11, 2008 5:57 AM
To: Scott Carey
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned 
(duplicated by OS, always)

"Scott Carey"  writes:
> I have determined that nearly every cached page within shared_buffers is
> being pinned in memory, preventing the OS from dropping any such pages from
> its page cache.

Shouldn't you be complaining to kernel folk rather than here?
What do you think we could do about it?

(I'm not convinced that you're seeing anything except a
platform-specific vagary in how "top" counts things, anyway.)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] ECPG Preprocessor throws Syntax Error [Devel Repository]

2008-12-11 Thread Ahmed Shinwari
Hi,

Yesterday I pull source from PG-Devel repository (Head) and was playing
around with ECPG. I got following two issues while parsing my script files
through ECPG preprocessor which I think are potential bugs;


1- I get this error message from preprocessor,

*ERROR: cursor "m_name_cursor" already defined *

Below is the code snippet, error message is thrown on the last line of this
snippet;

...
...
EXEC SQL DECLARE m_name_cursor CURSOR FOR SELECT name,id FROM m_table3;
EXEC SQL OPEN m_name_cursor;
EXEC SQL CLOSE m_name_cursor;

/* Declaring cursor with the same name */
EXEC SQL DECLARE m_name_cursor CURSOR FOR SELECT name,id FROM m_table4;


2- Preprocessor throws this error;

*ERROR: syntax error at or near "-"*

Below is the code snippet, the error is thrown on the last line of the
snippet;

...
...
const char *stmt1 = "SELECT abs(?)";
EXEC SQL PREPARE mystmt1 FROM :stmt1;
EXEC SQL EXECUTE mystmt1 INTO :result USING -.55566;



Regards,
Ahmed


Re: [BUGS] BUG #4575: All page cache in shared_buffers pinned (duplicated by OS, always)

2008-12-11 Thread Tom Lane
Scott Carey  writes:
> I am 99.9% certian its not a fluke of "top" (or 'free').  Or a fluke with the 
> drop_caches linux vm signal.  Otherwise, the system would not spin at 100% 
> System cpu getting no work done if an attempt to allocate memory above that 
> threshold, the original symptom that led me down this path of investigation.

That's certainly a kernel bug.

> If there are any ideas on how I could truly distinguish where this bug 
> actually is, or further characterize it in ways that would be useful to that 
> end, that would be great.

It's in the kernel.  Userland doesn't have any way to "pin" OS disk cache
pages in memory, which AFAICT is what you are claiming happens.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs