dule_idle' component of the scheduler. We have developed
a 'token passing' benchmark which attempts to address these issues
(called reflex at the above site). However, I would really like
to get a pointer to a community-accepted workload/benchmark for
these low-thread-count cases.
--
duling decisions as contention on the runqueue
locks increases. However, at this point one could argue that
we have moved away from a 'realistic' low task count system load.
> lmbench's lat_ctx for example, and other tools in lmbench trigger various
> scheduler workloads as well.
multi-queue patch I developed, the
scheduler always attempts to make the same global scheduling decisions
as the current scheduler.
--
Mike Kravetz [EMAIL PROTECTED]
IBM Linux Technology Center
ons, load balancing algorithms take considerable effort
to get working in a reasonably well-performing manner.
>
> Could you make a port of your thing on recent kernels?
There is a 2.4.2 patch on the web page. I'll put out a 2.4.3 patch
as soon as I get some time.
--
1.661
1024 FRC 196.425 6.166
2048 FRC FRC 23.291
4096 FRC FRC 47.117
*FRC = failed to reach confidence level
--
On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 03:53:11PM -0800, Mike Kravetz wrote:
> > Here are some very preliminary numbers from sched_test_yield
> > (which was previously posted to this (lse-tech) list by Bill
> > Hartner).
y secondary
to reducing lock contention within the scheduler. A co-worker down
the hall just ran pgbench (a postgresql db) benchmark and saw
contention on the runqueue lock at 57%. Now, I know nothing about this
benchmark, but it will be interesting to see what happens after
applying my patch.
On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> > was less than the number of processors. I'll give the tests a try
> > with a smaller number of threads. I'm also open to suggestions
of
running tasks is less than the number of processors.
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
o tasks that last ran on the
current CPU. In our multi-queue scheduler, tasks on a remote queue
must have high enough priority (to overcome this boost) before being
moved to the local queue.
--
On Thu, Jan 18, 2001 at 05:34:35PM -0800, Mike Kravetz wrote:
> On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote:
> > On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote:
> > > was less than the number of processors. I'll give the tests a try
>
On Fri, Jan 19, 2001 at 12:49:21PM -0800, Mike Kravetz showed his lack
of internet slang understanding and wrote:
>
> It was my intention to post IIRC numbers for small thread counts today.
> However, the benchmark (not the system) seems to hang on occasion. This
> occurs on both th
actthreads has to be zero.
Not as currently coded. If two threads try to decrement actthreads
at the same time, there is no guarantee that it will be decremented
twice. That is why you need to put some type of synchronization in
place.
--
esults in
the not too distant future. Until then, we'll be looking into
optimizations to help out the multi-queue scheduler at low
thread counts.
--
stem. Is
that an accurate statement?
If the above is accurate, then I am wondering what would be a
good scheduler benchmark for these low task count situations.
I could undo the optimizations in sys_sched_yield() (for testing
purposes only!), and run the existing benchmarks. Can anyone
suggest
http://sourceforge.net/projects/lse
Thanks,
--
Ragnar,
Are you sure that was line 115? Could it have been line 515?
Also, do you have any Oops data?
Thanks,
--
On Wed
problems.
I'm curious, is this behavior by design OR are we just getting
lucky?
Thanks,
--
George,
I can't answer your question. However, have you noticed that this
lock ordering has changed in the test11 kernel? The new sequence is:
read_lock_irq(&tasklist_lock);
spin_lock(&runqueue_lock);
Perhaps the person who made this change could provide their reasoning.
An
Scheduling Scalability page is at:
http://lse.sourceforge.net/scheduling/
If you are interested in this work, please join the lse-tech
mailing list at:
http://sourceforge.net/projects/lse
--
? OR Is the reasoning that in
these cases there is so much 'scheduling' activity that we
should force the reschedule?
--
ask
wakeups that could potentially be run in parallel (on
separate CPUS with no other serialization in the way)
then you 'might' see some benefit. Those are some big IFs.
I know little about the networking stack or this workload.
Just wanted to explain how this scheduling work 'co
try out some of our scheduler patches
located at:
http://lse.sourceforge.net/scheduling/
I would be interested in your observations.
--
t case of lock
contention. This was done at the expense of the normal case.
I'm currently working on this situation and expect to have a new
patch out in the not too distant future.
I expect the numbers will get better.
--
On Mon, Apr 04, 2005 at 10:50:09AM -0700, Dave Hansen wrote:
diff -puN mm/Kconfig~A6-mm-Kconfig mm/Kconfig
--- memhotplug/mm/Kconfig~A6-mm-Kconfig 2005-04-04 09:04:48.0 -0700
+++ memhotplug-dave/mm/Kconfig 2005-04-04 10:15:23.0 -0700
@@ -0,0 +1,25 @@
> +choice
> + prompt "Memor
FLAT for others.
--
Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]>
diff -Naupr linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig
linux-2.6.12-rc2-mm1.work/arch/ppc64/Kconfig
--- linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig 2005-04-05 18:44:57.0
+++ linux-2.6.12-rc2-mm1.work/arch
antics correct, but we also need to be aware of performance in
the non-realtime case.
--
threshold value as opposed to 1.
My guess is that the threshold value was changed from 0 to
1 in the 2.4 kernel for better performance with some workload.
Anyone remember what that workload was/is?
--
On Thu, Aug 04, 2005 at 03:19:52PM -0700, Christoph Lameter wrote:
> This code already exist in the memory hotplug code base and Ray already
> had a working implementation for page migration. The migration code will
> also be necessary in order to relocate pages with ECC single bit failures
> th
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote:
> * Mike Kravetz <[EMAIL PROTECTED]> wrote:
> >
> > My observations/debugging/conclusions are based on an earlier version
> > of the code. It appears the same code/issue still exists in the most
> > v
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote:
> Index: linux-rt-rebase.q/kernel/sched.c
> ===
> --- linux-rt-rebase.q.orig/kernel/sched.c
> +++ linux-rt-rebase.q/kernel/sched.c
> @@ -1819,6 +1819,13 @@ out_set_cpu:
>
Hi Ingo,
After applying the fix to try_to_wake_up() I was still seeing some large
latencies for realtime tasks. Some debug code pointed out two additional
causes of these latencies. I have put fixes into my 'old' kernel and the
scheduler related latencies have gone away. I'm pretty confident th
On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote:
> After applying the fix to try_to_wake_up() I was still seeing some large
> latencies for realtime tasks.
I've been looking for places in the code where reschedule IPIs should
be sent in the case of 'overload' to
On Mon, Oct 08, 2007 at 11:04:12PM -0400, Steven Rostedt wrote:
> On Mon, Oct 08, 2007 at 11:45:23AM -0700, Mike Kravetz wrote:
> > Are these accurate statements? I'll start working on a reliable delivery
> > mechanism for RealTime scheduling. But, I just want to make sur
On Mon, Oct 08, 2007 at 10:46:21PM -0400, Steven Rostedt wrote:
> Mike,
>
> Can you attach your Signed-off-by to this patch, please.
>
>
> On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote:
> > Hi Ingo,
> >
> > After applying the fix to try_to_wak
On Tue, Oct 09, 2007 at 01:59:37PM -0400, Steven Rostedt wrote:
> This has been compile tested (and no more ;-)
>
> The idea here is when we find a situation that we just scheduled in an
> RT task and we either pushed a lesser RT task away or more than one RT
> task was scheduled on this CPU befo
On Tue, Oct 09, 2007 at 04:50:47PM -0400, Steven Rostedt wrote:
> > I did something like this a while ago for another scheduling project.
> > A couple 'possible' optimizations to think about are:
> > 1) Only scan the remote runqueues once and keep a local copy of the
> >remote priorities for su
On Wed, Oct 10, 2007 at 10:49:35AM -0400, Gregory Haskins wrote:
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3e75c62..b7f7a96 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1869,7 +1869,8 @@ out_activate:
>* extra locking in this particular case, because
>
On Wed, Oct 10, 2007 at 07:50:52AM -0400, Steven Rostedt wrote:
> On Tue, Oct 09, 2007 at 11:49:53AM -0700, Mike Kravetz wrote:
> > The more I try understand the IPI handling the more confused I get. :(
> > At fist I was concerned about an IPI happening in the middle of the
> &g
5 and OpenPower 720.
--
Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]>
diff -Naupr linux-2.6.11.4/arch/ppc64/mm/numa.c
linux-2.6.11.4.work/arch/ppc64/mm/numa.c
--- linux-2.6.11.4/arch/ppc64/mm/numa.c 2005-03-16 00:09:31.0 +
+++ linux-2.6.11.4.work/arch/ppc64/mm/numa.c 2005-
On Thu, Feb 17, 2005 at 04:03:53PM -0800, Dave Hansen wrote:
> The attached patch
Just tried to compile this and noticed that there is no definition
of valid_section_nr(), referenced in sparse_init.
--
Mike
prox equal to the number of CPUs yet
scheduler performance has gone downhill.
--
ue waiting for a CPU? If there are other
tasks on the runqueue, isn't it possible that another task has a
higher goodness value than the task being awakened. In such a case,
isn't it possible that the awakened task could sit on the runqueue
(waiting for a CPU) while tasks with
that higher-priority process?
No. reschedule_idle() never directly performs a 'task to task' context
switch itself. Instead, it simply marks a currently running task to
indicate that a reschedule is needed on that task's CPU. No task context
switch will occur until schedule() is
I haven't had any problem fully utilizing 8 CPUs on 2.4.* kernels. This
may seem obvious, but do you have more than 4 CPUs worth of work for the
system to do? What is the runqueue length during this benchmark?
--
On Thu, Mar 10, 2005 at 02:36:13AM -0800, Andrew Morton wrote:
>
> This patch causes the non-numa G5 to oops very early in boot in
> smp_call_function().
>
OK - Let me take a look.
--
Mike
On Fri, Mar 11, 2005 at 07:51:38PM +1100, Paul Mackerras wrote:
>
> Anyway, the ultimate reason seems to be that the numa.c code is
> assuming that an address value and a size value occupy the same number
> of cells. On the G5 we have #address-cells = 2 but #size-cells = 1.
> Previously this didn
is on a machine known to break with
the previous version (such as G5).
--
Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]>
diff -Naupr linux-2.6.11/arch/ppc64/mm/numa.c
linux-2.6.11.work/arch/ppc64/mm/numa.c
--- linux-2.6.11/arch/ppc64/mm/numa.c 2005-03-02 07:38:38.0 +
+++ l
On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote:
>
> I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a
> workload with Samba, and I wanted to get some feedback on results so
> far.
Do you have any kernel profile or lock contention data?
--
On Wed, Mar 23, 2005 at 11:11:10PM +1100, Michael Ellerman wrote:
>
> Can you test this on your 720 or whatever it was? And if anyone else
> has an interesting NUMA machine they can test it on I'd love to hear
> about it!
>
I've tested this with various config options on my 720. Appears to
work
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote:
> After a lot of debugging in spufs, I found that a crash that we encountered
> on Cell actually was caused by a change in the memory management.
>
> The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43,
> and this
either case.
The other thing that needs to be changed is the locking in
dissolve_free_huge_page(). I believe the lock only needs to be held if
we are removing the huge page from the pool. It is a performance
issue, not a correctness issue.
--
Mike Kravetz
>
>
> Gerald Schae
a revalidation needs to be done while holding the lock.
That question made me think about huge page reservations. I don't think
the offline code takes this into account. But, you would not want your
huge page count to drop below the reserved huge page count
(resv_huge_pages).
So, shouldn't this be another condition to check before allowing the huge
page to be dissolved?
--
Mike Kravetz
prevented removal of
* the reserve map region for a page. The huge page itself was free'ed
* and removed from the page cache. This routine will adjust the subpool
* usage count, and the global reserve count if needed. By incrementing
* these counts, the reserve map entry which could n
On 09/23/2016 07:56 PM, zhong jiang wrote:
> On 2016/9/24 1:19, Mike Kravetz wrote:
>> On 09/22/2016 06:53 PM, zhong jiang wrote:
>>> At present, we need to call hugetlb_fix_reserve_count when
>>> hugetlb_unreserve_pages fails,
>>> and PagePrivate will decide
a V2 patch with the non-hugepage/non-thp build error fixed.
I'd really like to fix this without adding another #ifdef to the routine.
--
Mike Kravetz
>
> url:
> https://github.com/0day-ci/linux/commits/Mike-Kravetz/sparc64-mm-Fix-more-TSB-sizing-issues/20160831-054025
> bas
PAGE_SIZE pages should be used to size the huge
page TSB. A new compile time constant REAL_HPAGE_PER_HPAGE is used
to multiply hugetlb_pte_count before sizing the TSB.
Changes from V1
- Fixed build issue if hugetlb or THP not configured
Signed-off-by: Mike Kravetz
---
arch/sparc/include/asm/page_6
spin_unlock(&hugetlb_lock);
>
Adding Naoya as he was the original author of this code.
From a quick look it appears that the huge page will be migrated (allocated
on another node). If my understanding is correct, then max_huge_pages
should not be adjusted here.
--
Mike Kravetz
hugetlbfs users pre-allocate pages for their use, and
this 'might' be something useful for contiguous allocations as well.
I wonder if going down the path of a separate device/filesystem/etc for
contiguous allocations might be a better option. It would keep the
implementation somewhat separate. However, I would then be afraid that
we end up with another 'separate/special vm' as in the case of hugetlbfs
today.
--
Mike Kravetz
On 10/16/2017 11:07 AM, Michal Hocko wrote:
> On Mon 16-10-17 10:43:38, Mike Kravetz wrote:
>> Just to be clear, the posix standard talks about a typed memory object.
>> The suggested implementation has one create a connection to the memory
>> object to receive a fd, then use
On 10/16/2017 02:03 PM, Laura Abbott wrote:
> On 10/16/2017 01:32 PM, Mike Kravetz wrote:
>> On 10/16/2017 11:07 AM, Michal Hocko wrote:
>>> On Mon 16-10-17 10:43:38, Mike Kravetz wrote:
>>>> Just to be clear, the posix standard talks about a typed memory object.
>
initialization sequence well enough to know if it would be
possible for driver code to make CMA reservations. But, it looks doubtful.
--
Mike Kravetz
On 10/31/2017 11:40 AM, Marc-André Lureau wrote:
> The functions are called through shmem_fcntl() only.
And no danger in removing the EXPORTs as the routines only work
with shmem file structs.
>
> Signed-off-by: Marc-André Lureau
Reviewed-by: Mike Kravetz
--
Mike Kravetz
> --
FS defined without CONFIG_TMPFS
is unlikely, but I think possible. Based on the above #ifdef/#else, I
think hugetlbfs seals will not work if CONFIG_TMPFS is not defined.
--
Mike Kravetz
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 37260c5e12fa..b7811979611f 100644
> --- a/m
to be accessed by code in mm/shmem.c
for file sealing operations. Move inode information definition from .c
file to header for needed access.
--
Mike Kravetz
>
> Signed-off-by: Marc-André Lureau
> ---
> fs/hugetlbfs/inode.c| 10 --
> include/linux/hugetlb.h | 10
GROW: added similar check as shmem_setattr() & shmem_fallocate()
>
> Except write() operation that doesn't exist with hugetlbfs, that
> should make sealing as close as it can be to shmem support.
>
> Signed-off-by: Marc-André Lureau
Looks fine to me,
Reviewed-by: Mike Krave
urn &HUGETLBFS_I(file_inode(file))->seals;
> +#endif
> +
> + return NULL;
> +}
> +
As mentioned in patch 2, I think this code will need to be restructured
so that hugetlbfs file sealing will work even if CONFIG_TMPFS is not
defined. The above routine is behind #ifdef C
On 10/23/2017 03:10 PM, Dave Hansen wrote:
> On 10/03/2017 04:56 PM, Mike Kravetz wrote:
>> mmap(MAP_CONTIG) would have the following semantics:
>> - The entire mapping (length size) would be backed by physically contiguous
>> pages.
>> - If 'length' p
ly be added here as well. Although, in practice one does tend
to use a single huge page size. If you change the default huge page
size, then those entries will be in /proc/meminfo.
--
Mike Kravetz
h. The 'trick' is coming up with a name or
description that is not confusing. Unfortunately, we have to leave the
existing entries. So, this new entry will be greater than or equal to
HugePages_Total. :( I guess Hugetlb is as good of a name as any?
--
Mike Kravetz
outstanding issue is sorting out the config option dependencies. Although,
IMO this is not a strict requirement for this series. I have addressed this
issue in a follow on series:
http://lkml.kernel.org/r/20171109014109.21077-1-mike.krav...@oracle.com
--
Mike Kravetz
On 11/07/2017 04:27 AM, Marc-André
>> think hugetlbfs seals will not work if CONFIG_TMPFS is not defined.
>
> Good point, memfd_create() will not exists either.
>
> I think this is a separate concern, and preexisting from this patch series
> though.
Ah yes. I should have addressed this when adding hugetlb
header for needed access.
>
> Ok, Does the patch get your Reviewed-by tag with that change?
>
> thanks
>
Yes, you can add
Reviewed-by: Mike Kravetz
with an updated commit message.
--
Mike Kravetz
ate. So, we do not really
need to worry about those special (a)io cases for hugetlbfs.
--
Mike Kravetz
> you need to make sure there are no page references
> left around. For instance, on shmem any process might trigger the
> kernel to GUP mapped shmem pages for asynchronous IO
On 11/03/2017 10:41 AM, David Herrmann wrote:
> Hi
>
> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz wrote:
>> On 11/03/2017 10:03 AM, David Herrmann wrote:
>>> Hi
>>>
>>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau
>>> wrote:
>>
h.
>>
>> Ah yes. I should have addressed this when adding hugetlbfs memfd_create
>> support.
>>
>> Of course, one 'simple' way to address this would be to make CONFIG_HUGETLBFS
>> depend on CONFIG_TMPFS. Not sure what people think about this?
>
On 11/03/2017 10:56 AM, Mike Kravetz wrote:
> On 11/03/2017 10:41 AM, David Herrmann wrote:
>> Hi
>>
>> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz wrote:
>>> On 11/03/2017 10:03 AM, David Herrmann wrote:
>>>> Hi
>>>>
>>>
d_str = MEMFD_HUGE_STR;
else
memfd_str = MEMFD_STR;
then prepend output strings with memfd_str.
This is just a suggestion and optional.
--
Mike Kravetz
>
> Signed-off-by: Marc-André Lureau
> ---
> tools/testing/selftests/memfd/memfd_test.c | 150
> +++
total number of huge
pages to zero, the poisoned page will be counted as 'surplus'.
I was thinking about keeping at least a bad page count (if not
a list) to avoid user confusion. It may be overkill as I have
not given too much thought to this issue. Anyone else have
thoughts here?
ge in unrecoverable
memory error")
Cc: Naoya Horiguchi
Cc: Michal Hocko
Cc: Aneesh Kumar
Cc: Anshuman Khandual
Cc: Andrew Morton
Cc:
Signed-off-by: Mike Kravetz
---
fs/hugetlbfs/inode.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlb
empt to mirror private anon mapping will fail.
>
> Suggested-by: Mike Kravetz
> Signed-off-by: Anshuman Khandual
The tests themselves look fine. However, they are pretty simple and
could very easily be combined into one 'mremap_mirror.c' file. I
would prefer that they
tation. In addition,
even though applications shouldn't care where new mappings are placed, it
would not surprise me that such a change will be noticeable to some.
--
Mike Kravetz
When performing a fallocate hole punch, set up a hugetlb_falloc struct
and make i_private point to it. i_private will point to this struct for
the duration of the operation. At the end of the operation, wake up
anyone who faulted on the hole and is on the waitq.
Signed-off-by: Mike Kravetz
hugetlb_fault_mutex_table could be
used for races with small hole punch operations. However, we need something
that will work for large holes as well.
Mike Kravetz (3):
mm/hugetlb: Define hugetlb_falloc structure for hole punch race
mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch
mm/hugetlb: page
A hugetlb_falloc structure is pointed to by i_private during fallocate
hole punch operations. Page faults check this structure and if they are
in the hole, wait for the operation to finish.
Signed-off-by: Mike Kravetz
---
include/linux/hugetlb.h | 10 ++
1 file changed, 10 insertions
At page fault time, check i_private which indicates a fallocate hole punch
is in progress. If the fault falls within the hole, wait for the hole
punch operation to complete before proceeding with the fault.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 37
removing. The unmap within remove_inode_hugepages occurs
with the hugetlb_fault_mutex held so that no other faults can occur
until the page is removed.
The (unmodified) routine hugetlb_vmdelete_list was moved ahead of
remove_inode_hugepages to satisfy the new reference.
Signed-off-by: Mike Kravetz
At page fault time, check i_private which indicates a fallocate hole punch
is in progress. If the fault falls within the hole, wait for the hole
punch operation to complete before proceeding with the fault.
Signed-off-by: Mike Kravetz
---
mm/hugetlb.c | 39
patch 4/4 to unmap single pages in remove_inode_hugepages
Mike Kravetz (4):
mm/hugetlb: Define hugetlb_falloc structure for hole punch race
mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch
mm/hugetlb: page faults check for fallocate hole punch in progress and
wait
mm/hu
On 10/20/2015 05:11 PM, Dave Hansen wrote:
> On 10/20/2015 04:52 PM, Mike Kravetz wrote:
>> if (hole_end > hole_start) {
>> struct address_space *mapping = inode->i_mapping;
>> +DECLARE_WAIT_QUEUE_HEAD_O
Code was added to remove_inode_hugepages that will unmap a page if
it is mapped. i_mmap_lock_write() must be taken during the call
to hugetlb_vmdelete_list(). This is to prevent mappings (vmas) from
being added or deleted while the list of vmas is being examined.
Signed-off-by: Mike Kravetz
On 10/19/2015 04:16 PM, Andrew Morton wrote:
> On Fri, 16 Oct 2015 15:08:29 -0700 Mike Kravetz
> wrote:
>
>> When performing a fallocate hole punch, set up a hugetlb_falloc struct
>> and make i_private point to it. i_private will point to this struct for
>> the dur
On 10/19/2015 04:18 PM, Andrew Morton wrote:
> On Fri, 16 Oct 2015 15:08:27 -0700 Mike Kravetz
> wrote:
>
>> The hugetlbfs fallocate hole punch code can race with page faults. The
>> result is that after a hole punch operation, pages may remain within the
>> hole
On 10/19/2015 07:22 PM, Hugh Dickins wrote:
> On Mon, 19 Oct 2015, Mike Kravetz wrote:
>> On 10/19/2015 04:16 PM, Andrew Morton wrote:
>>> On Fri, 16 Oct 2015 15:08:29 -0700 Mike Kravetz
>>> wrote:
>>
>>>>mutex_lock(&inode->i_mut
On 08/27/2018 12:46 AM, Michal Hocko wrote:
> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
>> On 08/24/2018 01:41 AM, Michal Hocko wrote:
>>> On Thu 23-08-18 13:59:16, Mike Kravetz wrote:
>>>
>>> Acked-by: Michal Hocko
>>>
>>> One nit be
On 08/27/2018 06:46 AM, Jerome Glisse wrote:
> On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote:
>> On Fri 24-08-18 11:08:24, Mike Kravetz wrote:
>>> Here is an updated patch which does as you suggest above.
>> [...]
>>> @@ -1409,6 +1419,32 @@ static
On 08/29/2018 02:11 PM, Jerome Glisse wrote:
> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
>> [...]
>>>> What would be the best
huge_pmd_unshare for hugetlbfs huge pages. If it is a
shared mapping it will be 'unshared' which removes the page table
entry and drops reference on PMD page. After this, flush caches and
TLB.
Signed-off-by: Mike Kravetz
---
I am not 100% sure on the required flushing, so suggestions would be
a