Problem in Page Cache Replacement

2012-11-20 Thread metin d
I have two PostgreSQL databases named data-1 and data-2 that sit on the same 
machine. Both databases keep 40 GB of data, and the total memory available on 
the machine is 68GB.

I started data-1 and data-2, and ran several queries to go over all their data. 
Then, I shut down data-1 and kept issuing queries against data-2. For some 
reason, the OS still holds on to large parts of data-1's pages in its page 
cache, and reserves about 35 GB of RAM to data-2's files. As a result, my 
queries on data-2 keep hitting disk.

I'm checking page cache usage with fincore. When I run a table scan query 
against data-2, I see that data-2's pages get evicted and put back into the 
cache in a round-robin manner. Nothing happens to data-1's pages, although they 
haven't been touched for days.

Does anybody know why data-1's pages aren't evicted from the page cache? I'm 
open to any suggestions you think might be related to the problem.

This is an EC2 m2.4xlarge instance on Amazon with 68 GB of RAM and no swap 
space. The kernel version is:

$ uname -r
3.2.28-45.62.amzn1.x86_64

Edit: It also seems that I am using a single NUMA node, in case you think that
could be a problem.

$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 70007 MB
node 0 free: 360 MB
node distances:
node   0
  0:  10
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Problem in Page Cache Replacement

2012-11-21 Thread metin d


>  Curious. Added linux-mm list to CC to catch more attention. If you run
> echo 1 >/proc/sys/vm/drop_caches does it evict data-1 pages from memory?


I'm guessing it'd evict the entries, but am wondering if we could run any more 
diagnostics before trying this.

We regularly use a setup where we have two databases; one gets used frequently 
and the other one about once a month. It seems like the memory manager keeps 
unused pages in memory at the expense of the frequently used database's 
performance.

My understanding was that under memory pressure from heavily accessed pages, 
unused pages would eventually get evicted. Is there anything else we can try on 
this host to understand why this is happening?

Thank you,

Metin


----- Original Message -----
From: Jan Kara 
To: metin d 
Cc: "linux-kernel@vger.kernel.org" ; 
linux...@kvack.org
Sent: Tuesday, November 20, 2012 8:25 PM
Subject: Re: Problem in Page Cache Replacement

On Tue 20-11-12 09:42:42, metin d wrote:
> I have two PostgreSQL databases named data-1 and data-2 that sit on the
> same machine. Both databases keep 40 GB of data, and the total memory
> available on the machine is 68GB.
> 
> I started data-1 and data-2, and ran several queries to go over all their
> data. Then, I shut down data-1 and kept issuing queries against data-2.
> For some reason, the OS still holds on to large parts of data-1's pages
> in its page cache, and reserves about 35 GB of RAM to data-2's files. As
> a result, my queries on data-2 keep hitting disk.
> 
> I'm checking page cache usage with fincore. When I run a table scan query
> against data-2, I see that data-2's pages get evicted and put back into
> the cache in a round-robin manner. Nothing happens to data-1's pages,
> although they haven't been touched for days.
> 
> Does anybody know why data-1's pages aren't evicted from the page cache?
> I'm open to any suggestions you think might be related to the problem.
  Curious. Added linux-mm list to CC to catch more attention. If you run
echo 1 >/proc/sys/vm/drop_caches
  does it evict data-1 pages from memory?

> This is an EC2 m2.4xlarge instance on Amazon with 68 GB of RAM and no
> swap space. The kernel version is:
> 
> $ uname -r
> 3.2.28-45.62.amzn1.x86_64
> 
> Edit: It also seems that I am using a single NUMA node, in case you think
> that could be a problem.
> 
> $ numactl --hardware
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 70007 MB
> node 0 free: 360 MB
> node distances:
> node   0
>   0:  10

                                Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: Problem in Page Cache Replacement

2012-11-21 Thread metin d


Hi Fengguang,

I ran the tests and attached the results. I guess the line below shows the 
cached pages of data-1.

0x0008006c       6584051    25718  __RU_lA___P    referenced,uptodate,lru,active,private
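
The numbers in that line can be cross-checked with a short Python sketch. It is not part of page-types. It assumes 4 KiB pages, and it assumes the full 64-bit mask behind the line is 0x000000080000006c (the low byte 0x6c plus bit 35, which appears to be the bit page-types uses for "private"). The bit table is a partial reading of the kernel's pagemap documentation and the page-types source, and the function names are made up for illustration.

```python
# Partial bit table: bits 0-21 follow the kernel's pagemap documentation,
# bits 32-35 are the extra kernel-internal bits defined by page-types.
KPF_BITS = {
    0: "locked", 1: "error", 2: "referenced", 3: "uptodate", 4: "dirty",
    5: "lru", 6: "active", 7: "slab", 8: "writeback", 9: "reclaim",
    10: "buddy", 11: "mmap", 12: "anonymous", 13: "swapcache",
    14: "swapbacked", 15: "compound_head", 16: "compound_tail",
    17: "huge", 18: "unevictable", 19: "hwpoison", 20: "nopage", 21: "ksm",
    32: "reserved", 33: "mlocked", 34: "mappedtodisk", 35: "private",
}


def decode_flags(mask):
    """Return the long flag names set in a 64-bit page-flags mask."""
    names = []
    for bit in range(64):
        if mask & (1 << bit):
            # Bits missing from the table are reported by number.
            names.append(KPF_BITS.get(bit, "bit%d" % bit))
    return names


def pages_to_mb(page_count, page_size=4096):
    """Convert a page count to whole megabytes (assumes 4 KiB pages)."""
    return page_count * page_size // (1024 * 1024)


if __name__ == "__main__":
    print(",".join(decode_flags(0x000000080000006C)))
    print(pages_to_mb(6584051))
```

Under those assumptions the mask decodes to referenced,uptodate,lru,active,private, and 6584051 pages come to 25718 MB, which agrees with the MB column above.
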
Metin



From: Jaegeuk Hanse 
To: Fengguang Wu  
Cc: metin d ; Jan Kara ; 
"linux-kernel@vger.kernel.org" ; 
"linux...@kvack.org"  
Sent: Wednesday, November 21, 2012 11:42 AM
Subject: Re: Problem in Page Cache Replacement

On 11/21/2012 05:02 PM, Fengguang Wu wrote:
> On Wed, Nov 21, 2012 at 04:34:40PM +0800, Jaegeuk Hanse wrote:
>> Cc Fengguang Wu.
>>
>> On 11/21/2012 04:13 PM, metin d wrote:
>>>>    Curious. Added linux-mm list to CC to catch more attention. If you run
>>>> echo 1 >/proc/sys/vm/drop_caches does it evict data-1 pages from memory?
>>> I'm guessing it'd evict the entries, but am wondering if we could run any 
>>> more diagnostics before trying this.
>>>
>>> We regularly use a setup where we have two databases; one gets used 
>>> frequently and the other one about once a month. It seems like the memory 
>>> manager keeps unused pages in memory at the expense of the frequently used 
>>> database's performance.
>>> My understanding was that under memory pressure from heavily
>>> accessed pages, unused pages would eventually get evicted. Is there
>>> anything else we can try on this host to understand why this is
>>> happening?
> We may debug it this way.
>
> 1) run 'fadvise data-2 0 0 dontneed' to drop data-2 cached pages
>     (please double check via /proc/vmstat whether it does the expected work)
>
> 2) run 'page-types -r' with root, to view the page status for the
>     remaining pages of data-1
>
> The fadvise tool comes from Andrew Morton's ext3-tools. (source code attached)
> Please compile them with options "-Dlinux -I. -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE"
>
> page-types can be found in the kernel source tree tools/vm/page-types.c
>
> Sorry that sounds a bit twisted.. I do have a patch to directly dump
> page cache status of a user specified file, however it's not
> upstreamed yet.
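
For readers without the ext3-tools source at hand, step 1 can be approximated in Python. This is a minimal sketch, not the fadvise tool itself: it assumes the tool's "0 0 dontneed" arguments mean offset 0, length 0 (the whole file) with POSIX_FADV_DONTNEED, and the function names are hypothetical. `os.posix_fadvise` is only available on Unix-like systems.

```python
import os


def drop_file_cache(path):
    """Ask the kernel to drop the cached pages of one file.

    Offset 0 with length 0 covers the whole file. Dirty pages are not
    dropped by this advice, so the file should be flushed to disk first.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def drop_tree_cache(root):
    """Apply drop_file_cache to every regular file below a directory."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                drop_file_cache(full)
                count += 1
    return count


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters
```

Reading /proc/vmstat before and after the call and comparing the parsed counters is one way to do the double check mentioned in step 1.
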

Hi Fengguang,

Thanks for your detailed steps. I think Metin can give them a try.

         flags    page-count       MB  symbolic-flags    long-symbolic-flags
0x                    607699     2373  ___
0x0001                343227     1340  ___r___           reserved

But I have some questions about the output of page-types:

Does 2373 MB here mean the total memory in use, including page cache? I don't 
think so.
Which kinds of pages are marked reserved?
Which long-symbolic-flags line corresponds to the page cache?

Regards,
Jaegeuk

>
> Thanks,
> Fengguang

Re: Problem in Page Cache Replacement

2012-11-22 Thread metin d
Hi,

Yes, data-2 is bigger than half of memory. I'm willing to try those patches.

This is the kernel version on this machine:

$ uname -r
3.2.28-45.62.amzn1.x86_64



----- Original Message -----
From: Johannes Weiner 
To: Jan Kara 
Cc: metin d ; "linux-kernel@vger.kernel.org" 
; linux...@kvack.org
Sent: Wednesday, November 21, 2012 11:34 PM
Subject: Re: Problem in Page Cache Replacement

Hi,

On Tue, Nov 20, 2012 at 07:25:00PM +0100, Jan Kara wrote:
> On Tue 20-11-12 09:42:42, metin d wrote:
> > I have two PostgreSQL databases named data-1 and data-2 that sit on the
> > same machine. Both databases keep 40 GB of data, and the total memory
> > available on the machine is 68GB.
> > 
> > I started data-1 and data-2, and ran several queries to go over all their
> > data. Then, I shut down data-1 and kept issuing queries against data-2.
> > For some reason, the OS still holds on to large parts of data-1's pages
> > in its page cache, and reserves about 35 GB of RAM to data-2's files. As
> > a result, my queries on data-2 keep hitting disk.
> > 
> > I'm checking page cache usage with fincore. When I run a table scan query
> > against data-2, I see that data-2's pages get evicted and put back into
> > the cache in a round-robin manner. Nothing happens to data-1's pages,
> > although they haven't been touched for days.
> > 
> > Does anybody know why data-1's pages aren't evicted from the page cache?
> > I'm open to any suggestions you think might be related to the problem.

This might be because we do not deactivate pages as long as there is
cache on the inactive list.  I'm guessing that the inter-reference
distance of data-2 is bigger than half of memory, so it's never
getting activated and data-1 is never challenged.
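
One way to look at this from user space is to compare the sizes of the file LRU lists reported in /proc/meminfo. The Python sketch below is only a diagnostic aid under the assumption that the running kernel exposes the "Active(file)" and "Inactive(file)" fields; the function names are hypothetical.

```python
def parse_meminfo(text):
    """Return {field: value} for lines such as 'Inactive(file):  123 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def file_lru_split(fields):
    """Return (active_kb, inactive_kb, inactive_is_smaller) for file pages."""
    active = fields["Active(file)"]
    inactive = fields["Inactive(file)"]
    return active, inactive, inactive < active
```

To use it, read the text of /proc/meminfo, pass it to `parse_meminfo`, and pass the result to `file_lru_split`. A large "Active(file)" value that does not shrink while data-2 is being scanned would be consistent with the explanation above.
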

I have a series of patches that detects a thrashing inactive list and
handles working set changes up to the size of memory.  Would you be
willing to test them?  They are currently based on 3.4, let me know
what version works best for you.


Re: Problem in Page Cache Replacement

2012-11-22 Thread metin d
Hi Johannes,

Yes, the problem was as you predicted. I tried making data-2's pages "active" by 
manually reading them twice, and data-1's pages finally got evicted from the 
page cache.
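
The message does not say how the pages were read twice. One possible way, sketched in Python, is to read each chunk of a file two times in a row so that the second access arrives while the pages are still cached. The function name and chunk size are assumptions, and whether the kernel then promotes the pages depends on the kernel version and its reference-tracking heuristics.

```python
import os


def read_each_chunk_twice(path, chunk_size=1 << 20):
    """Read every chunk of a file twice in a row; return the bytes covered."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            first = os.pread(fd, chunk_size, offset)
            if not first:
                break  # end of file
            # Second reference to the same pages, issued right away.
            os.pread(fd, len(first), offset)
            offset += len(first)
            total += len(first)
    finally:
        os.close(fd)
    return total
```

Reading a whole large file once and then a second time from the start may not have the same effect if the pages from the first pass are evicted before the second pass reaches them, which is why this sketch repeats each chunk immediately.
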

We have large files in PostgreSQL and Hadoop that we scan sequentially, and we 
try to fit our working set into total memory. So I hope your patches will land 
in an upcoming Linux kernel release.

Thanks,
Metin


----- Original Message -----
From: Johannes Weiner 
To: Jaegeuk Hanse 
Cc: Jan Kara ; metin d ; 
"linux-kernel@vger.kernel.org" ; 
linux...@kvack.org
Sent: Thursday, November 22, 2012 3:09 AM
Subject: Re: Problem in Page Cache Replacement

On Thu, Nov 22, 2012 at 08:48:07AM +0800, Jaegeuk Hanse wrote:
> On 11/22/2012 05:34 AM, Johannes Weiner wrote:
> >Hi,
> >
> >On Tue, Nov 20, 2012 at 07:25:00PM +0100, Jan Kara wrote:
> >>On Tue 20-11-12 09:42:42, metin d wrote:
> >>>I have two PostgreSQL databases named data-1 and data-2 that sit on the
> >>>same machine. Both databases keep 40 GB of data, and the total memory
> >>>available on the machine is 68GB.
> >>>
> >>>I started data-1 and data-2, and ran several queries to go over all their
> >>>data. Then, I shut down data-1 and kept issuing queries against data-2.
> >>>For some reason, the OS still holds on to large parts of data-1's pages
> >>>in its page cache, and reserves about 35 GB of RAM to data-2's files. As
> >>>a result, my queries on data-2 keep hitting disk.
> >>>
> >>>I'm checking page cache usage with fincore. When I run a table scan query
> >>>against data-2, I see that data-2's pages get evicted and put back into
> >>>the cache in a round-robin manner. Nothing happens to data-1's pages,
> >>>although they haven't been touched for days.
> >>>
> >>>Does anybody know why data-1's pages aren't evicted from the page cache?
> >>>I'm open to any suggestions you think might be related to the problem.
> >This might be because we do not deactivate pages as long as there is
> >cache on the inactive list.  I'm guessing that the inter-reference
> >distance of data-2 is bigger than half of memory, so it's never
> >getting activated and data-1 is never challenged.
> 
> Hi Johannes,
> 
> What's the meaning of "inter-reference distance"?

It's the number of memory accesses between two accesses to the same
page:

  A B C D A B C E ...
    |_______|
    |       |

> and why compare it with half of memory, what's the trick?

If B gets accessed twice, it gets activated.  If it gets evicted in
between, the second access will be a fresh page fault and B will not
be recognized as frequently used.

Our cutoff for scanning the active list is cache size / 2 right now
(inactive_file_is_low), leaving 50% of memory to the inactive list.
If the inter-reference distance for pages on the inactive list is
bigger than that, they get evicted before their second access.
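
The effect can be reproduced with a toy model. The Python sketch below is not the kernel's algorithm. It is a deliberately simplified two-list cache in which a page starts on the inactive list, moves to the active list on a second access while still cached, and active pages are deactivated only when the inactive list is smaller than the active list. The class name, the helper names and the sizes (68 cache pages, 40 pages per database) are made up for illustration.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy active/inactive page cache; oldest entries come first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def access(self, page):
        """Touch a page and return True on a cache hit."""
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:
            # Second reference while still cached: activate the page.
            del self.inactive[page]
            self.active[page] = None
            return True
        # First reference (or re-fault after eviction): start inactive.
        self.inactive[page] = None
        while len(self.active) + len(self.inactive) > self.capacity:
            self._reclaim_one()
        return False

    def _reclaim_one(self):
        # Deactivate only when the inactive list is the smaller one.
        if len(self.inactive) < len(self.active):
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = None
        # Evict the oldest inactive page.
        self.inactive.popitem(last=False)


def run_scenario(capacity=68, db_pages=40, passes=3):
    """Read data-1 twice, then scan data-2 repeatedly; return hits per pass."""
    cache = TwoListCache(capacity)
    data1 = [("data-1", i) for i in range(db_pages)]
    data2 = [("data-2", i) for i in range(db_pages)]
    for _ in range(2):
        for page in data1:
            cache.access(page)
    hits_per_pass = [sum(cache.access(p) for p in data2) for _ in range(passes)]
    return cache, hits_per_pass


def touch_twice(cache, pages):
    """Access every page twice in a row so it reaches the active list."""
    for page in pages:
        cache.access(page)
        cache.access(page)
```

With these numbers, `run_scenario()` reports zero hits for every pass over data-2 while 34 data-1 pages stay on the active list. After `touch_twice()` is applied to the data-2 pages, a further pass over data-2 hits on all 40 pages and no data-1 page is left on the active list.
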


Re: Problem in Page Cache Replacement

2012-11-23 Thread metin d
----- Original Message -----

From: Jaegeuk Hanse 
To: metin d 
Cc: Jan Kara ; "linux-kernel@vger.kernel.org" 
; linux...@kvack.org
Sent: Friday, November 23, 2012 3:58 AM
Subject: Re: Problem in Page Cache Replacement

On 11/21/2012 02:25 AM, Jan Kara wrote:
> On Tue 20-11-12 09:42:42, metin d wrote:
>> I have two PostgreSQL databases named data-1 and data-2 that sit on the
>> same machine. Both databases keep 40 GB of data, and the total memory
>> available on the machine is 68GB.
>>
>> I started data-1 and data-2, and ran several queries to go over all their
>> data. Then, I shut down data-1 and kept issuing queries against data-2.
>> For some reason, the OS still holds on to large parts of data-1's pages
>> in its page cache, and reserves about 35 GB of RAM to data-2's files. As
>> a result, my queries on data-2 keep hitting disk.
>>
>> I'm checking page cache usage with fincore. When I run a table scan query
>> against data-2, I see that data-2's pages get evicted and put back into
>> the cache in a round-robin manner. Nothing happens to data-1's pages,
>> although they haven't been touched for days.

> Hi metin d,

> fincore is a tool or ...? How could I get it?

> Regards,
> Jaegeuk


Hi Jaegeuk,

Yes, it is a tool. You can get it from here:
http://code.google.com/p/linux-ftools/


Regards,
Metin


Re: Problem in Page Cache Replacement

2012-11-23 Thread metin d
----- Original Message -----

From: Jaegeuk Hanse 
To: metin d 
Cc: Jan Kara ; "linux-kernel@vger.kernel.org" 
; "linux...@kvack.org" 
Sent: Friday, November 23, 2012 10:17 AM
Subject: Re: Problem in Page Cache Replacement

On 11/23/2012 04:08 PM, metin d wrote:
> ----- Original Message -----
>
> From: Jaegeuk Hanse 
> To: metin d 
> Cc: Jan Kara ; "linux-kernel@vger.kernel.org" 
> ; linux...@kvack.org
> Sent: Friday, November 23, 2012 3:58 AM
> Subject: Re: Problem in Page Cache Replacement
>
> On 11/21/2012 02:25 AM, Jan Kara wrote:
>> On Tue 20-11-12 09:42:42, metin d wrote:
>>> I have two PostgreSQL databases named data-1 and data-2 that sit on the
>>> same machine. Both databases keep 40 GB of data, and the total memory
>>> available on the machine is 68GB.
>>>
>>> I started data-1 and data-2, and ran several queries to go over all their
>>> data. Then, I shut down data-1 and kept issuing queries against data-2.
>>> For some reason, the OS still holds on to large parts of data-1's pages
>>> in its page cache, and reserves about 35 GB of RAM to data-2's files. As
>>> a result, my queries on data-2 keep hitting disk.
>>>
>>> I'm checking page cache usage with fincore. When I run a table scan query
>>> against data-2, I see that data-2's pages get evicted and put back into
>>> the cache in a round-robin manner. Nothing happens to data-1's pages,
>>> although they haven't been touched for days.
>> Hi metin d,
>> fincore is a tool or ...? How could I get it?
>> Regards,
>> Jaegeuk
>
> Hi Jaegeuk,
>
> Yes, it is a tool. You can get it from here:
> http://code.google.com/p/linux-ftools/


> Hi Metin,

> Could you give me a link to download it? I can't get it from the link 
> you gave me. Thanks in advance. :-)

> Regards,
> Jaegeuk

Hi Jaegeuk,

You may need to install Mercurial on your system. I'm able to download the 
source code with this command:

hg clone https://code.google.com/p/linux-ftools/


Regards,
Metin
