Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
dule_idle' component of the scheduler. We have developed a 'token passing' benchmark which attempts to address these issues (called reflex at the above site). However, I would really like to get a pointer to a community acceptable workload/benchmark for these low thread cases. -- M

Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
duling decisions as contention on the runqueue locks increase. However, at this point one could argue that we have moved away from a 'realistic' low task count system load. > lmbench's lat_ctx for example, and other tools in lmbench trigger various > scheduler workloads as wel

Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
multi-queue patch I developed, the scheduler always attempts to make the same global scheduling decisions as the current scheduler. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-ke

Re: a quest for a better scheduler

2001-04-04 Thread Mike Kravetz
ons, load balancing algorithms take considerable effort to get working in a reasonably well-performing manner. > > Could you make a port of your thing on recent kernels? There is a 2.4.2 patch on the web page. I'll put out a 2.4.3 patch as soon as I get some time. -- Mike Krave

multi-queue scheduler update

2001-01-18 Thread Mike Kravetz
1.661 1024 FRC 196.425 6.166 2048 FRC FRC 23.291 4096 FRC FRC 47.117 *FRC = failed to reach confidence level -- Mike Kravetz [EMAIL PROTECTED] IBM Linux

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-18 Thread Mike Kravetz
On Fri, Jan 19, 2001 at 01:26:16AM +0100, Andrea Arcangeli wrote: > On Thu, Jan 18, 2001 at 03:53:11PM -0800, Mike Kravetz wrote: > > Here are some very preliminary numbers from sched_test_yield > > (which was previously posted to this (lse-tech) list by Bill > > Hartner).

Re: multi-queue scheduler update

2001-01-18 Thread Mike Kravetz
y secondary to reducing lock contention within the scheduler. A co-worker down the hall just ran pgbench (a postgresql db) benchmark and saw contention on the runqueue lock at 57%. Now, I know nothing about this benchmark, but it will be interesting to see what happens after applying my patch.

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-18 Thread Mike Kravetz
On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote: > On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote: > > was less than the number of processors. I'll give the tests a try > > with a smaller number of threads. I'm also open to suggestions

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
of running tasks is less than the number of processors. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
o tasks that last ran on the current CPU. In our multi-queue scheduler, tasks on a remote queue must have high enough priority (to overcome this boost) before being moved to the local queue. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center 15450

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
On Thu, Jan 18, 2001 at 05:34:35PM -0800, Mike Kravetz wrote: > On Fri, Jan 19, 2001 at 02:30:41AM +0100, Andrea Arcangeli wrote: > > On Thu, Jan 18, 2001 at 04:52:25PM -0800, Mike Kravetz wrote: > > > was less than the number of processors. I'll give the tests a try >

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
On Fri, Jan 19, 2001 at 12:49:21PM -0800, Mike Kravetz showed his lack of internet slang understanding and wrote: > > It was my intention to post IIRC numbers for small thread counts today. > However, the benchmark (not the system) seems to hang on occasion. This > occurs on both th

Re: [Lse-tech] Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
actthreads has to be zero. Not as currently coded. If two threads try to decrement actthreads at the same time, there is no guarantee that it will be decremented twice. That is why you need to put some type of synchronization in place. -- Mike Kravetz [EMAIL PROT
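The lost-update problem described in this reply can be sketched in C11. This is a minimal illustration, not the benchmark's actual code: only the name actthreads comes from the message. The helpers set_active_threads(), thread_done(), and active_threads() are hypothetical, and C11 atomics are just one possible form of synchronization (a mutex around the decrement would also work).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the benchmark's active-thread counter. */
static atomic_int actthreads;

/* Record how many threads are about to start. */
void set_active_threads(int n)
{
    assert(n >= 0);
    atomic_store(&actthreads, n);
}

/*
 * Called by each thread as it finishes. atomic_fetch_sub() performs the
 * read-modify-write as one indivisible step, so two threads finishing at
 * the same time cannot lose a decrement. Returns true for the last thread.
 */
bool thread_done(void)
{
    /* atomic_fetch_sub() returns the value held before the subtraction. */
    return atomic_fetch_sub(&actthreads, 1) == 1;
}

/* Read the current count. */
int active_threads(void)
{
    return atomic_load(&actthreads);
}
```

With a plain `actthreads--`, the load, subtract, and store are separate steps, which is why two concurrent decrements can collapse into one.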

Re: multi-queue scheduler update

2001-01-19 Thread Mike Kravetz
esults in the not too distant future. Until then, we'll be looking into optimizations to help out the multi-queue scheduler at low thread counts. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "u

more on scheduler benchmarks

2001-01-22 Thread Mike Kravetz
stem. Is that an accurate statement? If the above is accurate, then I am wondering what would be a good scheduler benchmark for these low task count situations. I could undo the optimizations in sys_sched_yield() (for testing purposes only!), and run the existing benchmarks. Can anyone suggest

Scheduler Scalability CFP

2000-11-16 Thread Mike Kravetz
://sourceforge.net/projects/lse Thanks, -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center 15450 SW Koll Parkway Beaverton, OR 97006-6063 (503)578-3494 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the

Re: test12pre6: BUG in schedule (sched.c, 115)

2000-12-06 Thread Mike Kravetz
Ragnar, Are you sure that was line 115? Could it have been line 515? Also, do you have any Oops data? Thanks, -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center 15450 SW Koll Parkway Beaverton, OR 97006-6063 (503)578-3494 On Wed

test9: running tasks not in run-queue

2000-11-08 Thread Mike Kravetz
problems. I'm curious, is this behavior by design OR are we just getting lucky? Thanks, -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center 15450 SW Koll Parkway Beaverton, OR 97006-6063 (503)578-3494 - To unsubscribe from this lis

Re: Lock ordering, inquiring minds want to know.

2000-12-07 Thread Mike Kravetz
George, I can't answer your question. However, have you noticed that this lock ordering has changed in the test11 kernel? The new sequence is: read_lock_irq(&tasklist_lock); spin_lock(&runqueue_lock); Perhaps the person who made this change could provide their reasoning. An

Scheduling Scalability Update

2000-12-15 Thread Mike Kravetz
Scheduling Scalability page is at: http://lse.sourceforge.net/scheduling/ If you are interested in this work, please join the lse-tech mailing list at: http://sourceforge.net/projects/lse -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To

sys_sched_yield fast path

2001-03-09 Thread Mike Kravetz
? OR Is the reasoning that in these cases there is so much 'scheduling' activity that we should force the reschedule? -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kern

Re: How to optimize routing performance

2001-03-15 Thread Mike Kravetz
ask wakeups that could potentially be run in parallel (on separate CPUS with no other serialization in the way) then you 'might' see some benefit. Those are some big IFs. I know little about the networking stack or this workload. Just wanted to explain how this scheduling work 'co

Re: linux scheduler limitations?

2001-03-29 Thread Mike Kravetz
try out some of our scheduler patches located at: http://lse.sourceforge.net/scheduling/ I would be interested in your observations. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux

Re: [Lse-tech] multi-queue scheduler update

2001-01-23 Thread Mike Kravetz
t case of lock contention. This was done at the expense of the normal case. I'm currently working on this situation and expect to have a new patch out in the not too distant future. I expect the numbers will get better. -- Mike Kravetz [EMAIL PROTECTED] I

Re: [PATCH 1/4] create mm/Kconfig for arch-independent memory options

2005-04-04 Thread Mike Kravetz
On Mon, Apr 04, 2005 at 10:50:09AM -0700, Dave Hansen wrote: diff -puN mm/Kconfig~A6-mm-Kconfig mm/Kconfig --- memhotplug/mm/Kconfig~A6-mm-Kconfig 2005-04-04 09:04:48.0 -0700 +++ memhotplug-dave/mm/Kconfig 2005-04-04 10:15:23.0 -0700 @@ -0,0 +1,25 @@ > +choice > + prompt "Memor

[PATCH] ppc64 Kconfig memory models

2005-04-05 Thread Mike Kravetz
FLAT for others. -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig linux-2.6.12-rc2-mm1.work/arch/ppc64/Kconfig --- linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig 2005-04-05 18:44:57.0 + +++ linux-2.6.12-rc2-mm1.work/arch

reschedule_idle changes in ac kernels

2001-06-04 Thread Mike Kravetz
antics correct, but we also need to be aware of performance in the non-realtime case. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [E

Re: reschedule_idle changes in ac kernels

2001-06-05 Thread Mike Kravetz
threshold value as opposed to 1. My guess is that the threshold value was changed from 0 to 1 in the 2.4 kernel for better performance with some workload. Anyone remember what that workload was/is? -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology C

Re: NUMA policy interface

2005-08-04 Thread Mike Kravetz
On Thu, Aug 04, 2005 at 03:19:52PM -0700, Christoph Lameter wrote: > This code already exist in the memory hotplug code base and Ray already > had a working implementation for page migration. The migration code will > also be necessary in order to relocate pages with ECC single bit failures > th

Re: -rt scheduling: wakeup bug?

2007-10-02 Thread Mike Kravetz
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote: > * Mike Kravetz <[EMAIL PROTECTED]> wrote: > > > > My observations/debugging/conclusions are based on an earlier version > > of the code. It appears the same code/issue still exists in the most > > v

Re: -rt scheduling: wakeup bug?

2007-10-03 Thread Mike Kravetz
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote: > Index: linux-rt-rebase.q/kernel/sched.c > === > --- linux-rt-rebase.q.orig/kernel/sched.c > +++ linux-rt-rebase.q/kernel/sched.c > @@ -1819,6 +1819,13 @@ out_set_cpu: >

-rt more realtime scheduling issues

2007-10-05 Thread Mike Kravetz
Hi Ingo, After applying the fix to try_to_wake_up() I was still seeing some large latencies for realtime tasks. Some debug code pointed out two additional causes of these latencies. I have put fixes into my 'old' kernel and the scheduler related latencies have gone away. I'm pretty confident th

Re: -rt more realtime scheduling issues

2007-10-08 Thread Mike Kravetz
On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote: > After applying the fix to try_to_wake_up() I was still seeing some large > latencies for realtime tasks. I've been looking for places in the code where reschedule IPIs should be sent in the case of 'overload' to

Re: -rt more realtime scheduling issues

2007-10-09 Thread Mike Kravetz
On Mon, Oct 08, 2007 at 11:04:12PM -0400, Steven Rostedt wrote: > On Mon, Oct 08, 2007 at 11:45:23AM -0700, Mike Kravetz wrote: > > Are these accurate statements? I'll start working on a reliable delivery > > mechanism for RealTime scheduling. But, I just want to make sur

Re: [PATCH RT] fix rt-task scheduling issue

2007-10-09 Thread Mike Kravetz
On Mon, Oct 08, 2007 at 10:46:21PM -0400, Steven Rostedt wrote: > Mike, > > Can you attach your Signed-off-by to this patch, please. > > > On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote: > > Hi Ingo, > > > > After applying the fix to try_to_wak

Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.

2007-10-09 Thread mike kravetz
On Tue, Oct 09, 2007 at 01:59:37PM -0400, Steven Rostedt wrote: > This has been compile tested (and no more ;-) > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU befo

Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.

2007-10-09 Thread mike kravetz
On Tue, Oct 09, 2007 at 04:50:47PM -0400, Steven Rostedt wrote: > > I did something like this a while ago for another scheduling project. > > A couple 'possible' optimizations to think about are: > > 1) Only scan the remote runqueues once and keep a local copy of the > >remote priorities for su

Re: [PATCH] RT: Fix special-case exception for preempting the local CPU

2007-10-10 Thread mike kravetz
On Wed, Oct 10, 2007 at 10:49:35AM -0400, Gregory Haskins wrote: > diff --git a/kernel/sched.c b/kernel/sched.c > index 3e75c62..b7f7a96 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -1869,7 +1869,8 @@ out_activate: >* extra locking in this particular case, because >

Re: -rt more realtime scheduling issues

2007-10-10 Thread Mike Kravetz
On Wed, Oct 10, 2007 at 07:50:52AM -0400, Steven Rostedt wrote: > On Tue, Oct 09, 2007 at 11:49:53AM -0700, Mike Kravetz wrote: > > The more I try to understand the IPI handling the more confused I get. :( > > At first I was concerned about an IPI happening in the middle of the >

[PATCH] PPC64 NUMA memory fixup (another try)

2005-03-16 Thread Mike Kravetz
5 and OpenPower 720. -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.11.4/arch/ppc64/mm/numa.c linux-2.6.11.4.work/arch/ppc64/mm/numa.c --- linux-2.6.11.4/arch/ppc64/mm/numa.c 2005-03-16 00:09:31.0 + +++ linux-2.6.11.4.work/arch/ppc64/mm/numa.c2005-

Re: [RFC][PATCH] Sparse Memory Handling (hot-add foundation)

2005-02-17 Thread Mike Kravetz
On Thu, Feb 17, 2005 at 04:03:53PM -0800, Dave Hansen wrote: > The attached patch Just tried to compile this and noticed that there is no definition of valid_section_nr(), referenced in sparse_init. -- Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of

Re: Threads FAQ entry incomplete

2001-06-20 Thread Mike Kravetz
prox equal to the number of CPUs yet scheduler performance has gone downhill. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTE

Re: wake_up vs. wake_up_sync

2001-06-27 Thread Mike Kravetz
ue waiting for a CPU? If there are other tasks on the runqueue, isn't it possible that another task has a higher goodness value than the task being awakened? In such a case, isn't it possible that the awakened task could sit on the runqueue (waiting for a CPU) while tasks with

Re: wake_up vs. wake_up_sync

2001-06-27 Thread Mike Kravetz
that higher-priority process? No. reschedule_idle() never directly performs a 'task to task' context switch itself. Instead, it simply marks a currently running task to indicate that a reschedule is needed on that task's CPU. No task context switch will occur until schedule() is
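The distinction drawn in this reply, between requesting a reschedule and actually switching tasks, can be modeled in a few lines of C. This is a deliberately simplified sketch and not the kernel's code: struct task, struct cpu, wake_request(), and schedule_point() are hypothetical names, and the goodness comparison is only an assumed stand-in for the real preemption decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down task and CPU descriptors. */
struct task {
    int  goodness;      /* higher means more deserving of the CPU */
    bool need_resched;  /* set when a reschedule has been requested */
};

struct cpu {
    struct task *curr;  /* task currently running on this CPU */
};

/*
 * Model of the wakeup side: only mark the running task.
 * The running task is not switched out here.
 */
void wake_request(struct cpu *c, const struct task *woken)
{
    assert(c != NULL && c->curr != NULL && woken != NULL);
    if (woken->goodness > c->curr->goodness)
        c->curr->need_resched = true;
}

/*
 * Model of the scheduling point: the only place where the
 * running task actually changes.
 */
void schedule_point(struct cpu *c, struct task *next)
{
    assert(c != NULL && c->curr != NULL && next != NULL);
    if (c->curr->need_resched) {
        c->curr->need_resched = false;
        c->curr = next;
    }
}
```

In this model, a wakeup of a more deserving task leaves `curr` unchanged until schedule_point() runs, which mirrors the delay the message describes.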

Re: Strange thread behaviour on 8-way x86 machine

2001-07-03 Thread Mike Kravetz
I haven't had any problem fully utilizing 8 CPUs on 2.4.* kernels. This may seem obvious, but do you have more than 4 CPUs worth of work for the system to do? What is the runqueue length during this benchmark? -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Te

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-10 Thread mike kravetz
On Thu, Mar 10, 2005 at 02:36:13AM -0800, Andrew Morton wrote: > > This patch causes the non-numa G5 to oops very early in boot in > smp_call_function(). > OK - Let me take a look. -- Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EM

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-11 Thread mike kravetz
On Fri, Mar 11, 2005 at 07:51:38PM +1100, Paul Mackerras wrote: > > Anyway, the ultimate reason seems to be that the numa.c code is > assuming that an address value and a size value occupy the same number > of cells. On the G5 we have #address-cells = 2 but #size-cells = 1. > Previously this didn

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-11 Thread mike kravetz
is on a machine known to break with the previous version (such as G5). -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.11/arch/ppc64/mm/numa.c linux-2.6.11.work/arch/ppc64/mm/numa.c --- linux-2.6.11/arch/ppc64/mm/numa.c 2005-03-02 07:38:38.0 + +++ l

Re: Linux 2.4 Scalability, Samba, and Netbench

2001-05-09 Thread Mike Kravetz
On Wed, May 09, 2001 at 11:29:22AM -0500, Andrew M. Theurer wrote: > > I am evaluating Linux 2.4 SMP scalability, using Netbench(r) as a > workload with Samba, and I wanted to get some feedback on results so > far. Do you have any kernel profile or lock contention data? --

Re: [PATCH] ppc64: Add mem=X option, updated NUMA support

2005-03-23 Thread Mike kravetz
On Wed, Mar 23, 2005 at 11:11:10PM +1100, Michael Ellerman wrote: > > Can you test this on your 720 or whatever it was? And if anyone else > has an interesting NUMA machine they can test it on I'd love to hear > about it! > I've tested this with various config options on my 720. Appears to work

Re: Bug: early_pfn_in_nid() called when not early

2006-12-13 Thread Mike Kravetz
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote: > After a lot of debugging in spufs, I found that a crash that we encountered > on Cell actually was caused by a change in the memory management. > > The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43, > and this

Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size

2016-09-20 Thread Mike Kravetz
either case. The other thing that needs to be changed is the locking in dissolve_free_huge_page(). I believe the lock only needs to be held if we are removing the huge page from the pool. It is not a correctness issue but a performance issue. -- Mike Kravetz > > > Gerald Schae

Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size

2016-09-22 Thread Mike Kravetz
a revalidation needs to be done while holding the lock. That question made me think about huge page reservations. I don't think the offline code takes this into account. But, you would not want your huge page count to drop below the reserved huge page count (resv_huge_pages). So, shouldn't this be another condition to check before allowing the huge page to be dissolved? -- Mike Kravetz
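The extra condition proposed here can be written as a small predicate. This is a sketch under stated assumptions rather than the kernel's implementation: only resv_huge_pages is named in the message, while the function name and the free_huge_pages parameter are hypothetical, and the sketch assumes reservations are always backed by pages in the free pool.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: may one free huge page be dissolved without
 * letting the free pool drop below the number of reserved huge pages?
 */
bool may_dissolve_free_huge_page(unsigned long free_huge_pages,
                                 unsigned long resv_huge_pages)
{
    /* Assumption: every reservation is backed by a free page. */
    assert(resv_huge_pages <= free_huge_pages);

    /* Safe only if at least one free page exists beyond the reserve. */
    return free_huge_pages > resv_huge_pages;
}
```

Such a check would need to be evaluated while holding the same lock that protects the counters, which ties back to the revalidation point made at the start of the snippet.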

Re: [RFC] remove unnecessary condition in remove_inode_hugepages

2016-09-23 Thread Mike Kravetz
prevented removal of * the reserve map region for a page. The huge page itself was free'ed * and removed from the page cache. This routine will adjust the subpool * usage count, and the global reserve count if needed. By incrementing * these counts, the reserve map entry which could n

Re: [RFC] remove unnecessary condition in remove_inode_hugepages

2016-09-24 Thread Mike Kravetz
On 09/23/2016 07:56 PM, zhong jiang wrote: > On 2016/9/24 1:19, Mike Kravetz wrote: >> On 09/22/2016 06:53 PM, zhong jiang wrote: >>> At present, we need to call hugetlb_fix_reserve_count when >>> hugetlb_unrserve_pages fails, >>> and PagePrivate will decide

Re: [PATCH] sparc64 mm: Fix more TSB sizing issues

2016-08-30 Thread Mike Kravetz
a V2 patch with the non-hugepage/non-thp build error fixed. I'd really like to fix this without adding another #ifdef to the routine. -- Mike Kravetz > > url: > https://github.com/0day-ci/linux/commits/Mike-Kravetz/sparc64-mm-Fix-more-TSB-sizing-issues/20160831-054025 > bas

[PATCH v2] sparc64 mm: Fix more TSB sizing issues

2016-08-31 Thread Mike Kravetz
PAGE_SIZE pages should be used to size the huge page TSB. A new compile time constant REAL_HPAGE_PER_HPAGE is used to multiply hugetlb_pte_count before sizing the TSB. Changes from V1 - Fixed build issue if hugetlb or THP not configured Signed-off-by: Mike Kravetz --- arch/sparc/include/asm/page_6

Re: [PATCH] mm: fix the incorrect hugepages count

2016-08-08 Thread Mike Kravetz
spin_unlock(&hugetlb_lock); > Adding Naoya as he was the original author of this code. From a quick look, it appears that the huge page will be migrated (allocated on another node). If my understanding is correct, then max_huge_pages should not be adjusted here. -- Mike Kravetz

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
hugetlbfs users pre-allocate pages for their use, and this 'might' be something useful for contiguous allocations as well. I wonder if going down the path of a separate device/filesystem/etc for contiguous allocations might be a better option. It would keep the implementation somewhat separate. However, I would then be afraid that we end up with another 'separate/special vm' as in the case of hugetlbfs today. -- Mike Kravetz

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
On 10/16/2017 11:07 AM, Michal Hocko wrote: > On Mon 16-10-17 10:43:38, Mike Kravetz wrote: >> Just to be clear, the posix standard talks about a typed memory object. >> The suggested implementation has one create a connection to the memory >> object to receive a fd, then use

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
On 10/16/2017 02:03 PM, Laura Abbott wrote: > On 10/16/2017 01:32 PM, Mike Kravetz wrote: >> On 10/16/2017 11:07 AM, Michal Hocko wrote: >>> On Mon 16-10-17 10:43:38, Mike Kravetz wrote: >>>> Just to be clear, the posix standard talks about a typed memory object. >

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-17 Thread Mike Kravetz
initialization sequence well enough to know if it would be possible for driver code to make CMA reservations. But, it looks doubtful. -- Mike Kravetz

Re: [PATCH 1/6] shmem: unexport shmem_add_seals()/shmem_get_seals()

2017-11-01 Thread Mike Kravetz
On 10/31/2017 11:40 AM, Marc-André Lureau wrote: > The functions are called through shmem_fcntl() only. And no danger in removing the EXPORTs as the routines only work with shmem file structs. > > Signed-off-by: Marc-André Lureau Reviewed-by: Mike Kravetz -- Mike Kravetz > --

Re: [PATCH 2/6] shmem: rename functions that are memfd-related

2017-11-01 Thread Mike Kravetz
FS defined without CONFIG_TMPFS is unlikely, but I think possible. Based on the above #ifdef/#else, I think hugetlbfs seals will not work if CONFIG_TMPFS is not defined. -- Mike Kravetz > diff --git a/mm/shmem.c b/mm/shmem.c > index 37260c5e12fa..b7811979611f 100644 > --- a/m

Re: [PATCH 3/6] hugetlb: expose hugetlbfs_inode_info in header

2017-11-01 Thread Mike Kravetz
to be accessed by code in mm/shmem.c for file sealing operations. Move inode information definition from .c file to header for needed access. -- Mike Kravetz > > Signed-off-by: Marc-André Lureau > --- > fs/hugetlbfs/inode.c| 10 -- > include/linux/hugetlb.h | 10

Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-01 Thread Mike Kravetz
GROW: added similar check as shmem_setattr() & shmem_fallocate() > > Except write() operation that doesn't exist with hugetlbfs, that > should make sealing as close as it can be to shmem support. > > Signed-off-by: Marc-André Lureau Looks fine to me, Reviewed-by: Mike Krave

Re: [PATCH 5/6] shmem: add sealing support to hugetlb-backed memfd

2017-11-01 Thread Mike Kravetz
urn &HUGETLBFS_I(file_inode(file))->seals; > +#endif > + > + return NULL; > +} > + As mentioned in patch 2, I think this code will need to be restructured so that hugetlbfs file sealing will work even if CONFIG_TMPFS is not defined. The above routine is behind #ifdef C

Re: [RFC] mmap(MAP_CONTIG)

2017-10-24 Thread Mike Kravetz
On 10/23/2017 03:10 PM, Dave Hansen wrote: > On 10/03/2017 04:56 PM, Mike Kravetz wrote: >> mmap(MAP_CONTIG) would have the following semantics: >> - The entire mapping (length size) would be backed by physically contiguous >> pages. >> - If 'length' p

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Mike Kravetz
ly be added here as well. Although, in practice one does tend to use a single huge page size. If you change the default huge page size, then those entries will be in /proc/meminfo. -- Mike Kravetz

Re: [PATCH] mm: show stats for non-default hugepage sizes in /proc/meminfo

2017-11-13 Thread Mike Kravetz
h. The 'trick' is coming up with a name or description that is not confusing. Unfortunately, we have to leave the existing entries. So, this new entry will be greater than or equal to HugePages_Total. :( I guess Hugetlb is as good of a name as any? -- Mike Kravetz

Re: [PATCH v3 0/9] memfd: add sealing to hugetlb-backed memory

2017-11-14 Thread Mike Kravetz
outstanding issue is sorting out the config option dependencies. Although, IMO this is not a strict requirement for this series. I have addressed this issue in a follow on series: http://lkml.kernel.org/r/20171109014109.21077-1-mike.krav...@oracle.com -- Mike Kravetz On 11/07/2017 04:27 AM, Marc-André

Re: [PATCH 2/6] shmem: rename functions that are memfd-related

2017-11-03 Thread Mike Kravetz
>> think hugetlbfs seals will not work if CONFIG_TMPFS is not defined. > > Good point, memfd_create() will not exist either. > > I think this is a separate concern, and preexisting from this patch series > though. Ah yes. I should have addressed this when adding hugetlb

Re: [PATCH 3/6] hugetlb: expose hugetlbfs_inode_info in header

2017-11-03 Thread Mike Kravetz
header for needed access. > > Ok, Does the patch get your Reviewed-by tag with that change? > > thanks > Yes, you can add Reviewed-by: Mike Kravetz with an updated commit message. -- Mike Kravetz

Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-03 Thread Mike Kravetz
ate. So, we do not really need to worry about those special (a)io cases for hugetlbfs. -- Mike Kravetz > you need to make sure there are no page references > left around. For instance, on shmem any process might trigger the > kernel to GUP mapped shmem pages for asynchronous IO

Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-03 Thread Mike Kravetz
On 11/03/2017 10:41 AM, David Herrmann wrote: > Hi > > On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz wrote: >> On 11/03/2017 10:03 AM, David Herrmann wrote: >>> Hi >>> >>> On Tue, Oct 31, 2017 at 7:40 PM, Marc-André Lureau >>> wrote: >>

Re: [PATCH 2/6] shmem: rename functions that are memfd-related

2017-11-03 Thread Mike Kravetz
h. >> >> Ah yes. I should have addressed this when adding hugetlbfs memfd_create >> support. >> >> Of course, one 'simple' way to address this would be to make CONFIG_HUGETLBFS >> depend on CONFIG_TMPFS. Not sure what people think about this? >

Re: [PATCH 4/6] hugetlbfs: implement memfd sealing

2017-11-03 Thread Mike Kravetz
On 11/03/2017 10:56 AM, Mike Kravetz wrote: > On 11/03/2017 10:41 AM, David Herrmann wrote: >> Hi >> >> On Fri, Nov 3, 2017 at 6:12 PM, Mike Kravetz wrote: >>> On 11/03/2017 10:03 AM, David Herrmann wrote: >>>> Hi >>>> >>>

Re: [PATCH 6/6] memfd-tests: test hugetlbfs sealing

2017-11-03 Thread Mike Kravetz
d_str = MEMFD_HUGE_STR; else memfd_str = MEMFD_STR; then prepend output strings with memfd_str. This is just a suggestion and optional. -- Mike Kravetz > > Signed-off-by: Marc-André Lureau > --- > tools/testing/selftests/memfd/memfd_test.c | 150 > +++

[PATCH 0/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-19 Thread Mike Kravetz
total number of huge pages to zero, the poisoned page will be counted as 'surplus'. I was thinking about keeping at least a bad page count (if not a list) to avoid user confusion. It may be overkill as I have not given too much thought to this issue. Anyone else have thoughts here? Mi

[PATCH 1/1] mm:hugetlbfs: Fix hwpoison reserve accounting

2017-10-19 Thread Mike Kravetz
ge in unrecoverable memory error") Cc: Naoya Horiguchi Cc: Michal Hocko Cc: Aneesh Kumar Cc: Anshuman Khandual Cc: Andrew Morton Cc: Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlb

Re: [PATCH V3] selftests/vm: Add tests validating mremap mirror functionality

2017-10-19 Thread Mike Kravetz
empt to mirror private anon mapping will fail. > > Suggested-by: Mike Kravetz > Signed-off-by: Anshuman Khandual The tests themselves look fine. However, they are pretty simple and could very easily be combined into one 'mremap_mirror.c' file. I would prefer that they

Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

2017-10-27 Thread Mike Kravetz
tation. In addition, even though applications shouldn't care where new mappings are placed it would not surprise me that such a change will be noticeable to some. -- Mike Kravetz

[PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

2015-10-16 Thread Mike Kravetz
When performing a fallocate hole punch, set up a hugetlb_falloc struct and make i_private point to it. i_private will point to this struct for the duration of the operation. At the end of the operation, wake up anyone who faulted on the hole and is on the waitq. Signed-off-by: Mike Kravetz

[PATCH 0/3] hugetlbfs fallocate hole punch race with page faults

2015-10-16 Thread Mike Kravetz
hugetlb_fault_mutex_table could be used for races with small hole punch operations. However, we need something that will work for large holes as well. Mike Kravetz (3): mm/hugetlb: Define hugetlb_falloc structure for hole punch race mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch mm/hugetlb: page

[PATCH 1/3] mm/hugetlb: Define hugetlb_falloc structure for hole punch race

2015-10-16 Thread Mike Kravetz
A hugetlb_falloc structure is pointed to by i_private during fallocate hole punch operations. Page faults check this structure and if they are in the hole, wait for the operation to finish. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 10 ++ 1 file changed, 10 insertions

[PATCH 3/3] mm/hugetlb: page faults check for fallocate hole punch in progress and wait

2015-10-16 Thread Mike Kravetz
At page fault time, check i_private which indicates a fallocate hole punch is in progress. If the fault falls within the hole, wait for the hole punch operation to complete before proceeding with the fault. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 37
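The fault-time check described in this patch series can be modeled outside the kernel as a range test. This is a minimal sketch, not the patch's code: struct falloc_hole and its start/end fields are assumed for illustration (the snippets do not show the real hugetlb_falloc layout), and the `in_progress` pointer merely plays the role that i_private has in the description. The actual waiting and wakeup on the wait queue are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor of an in-progress hole punch. */
struct falloc_hole {
    unsigned long start;  /* first page index inside the hole */
    unsigned long end;    /* one past the last page index inside the hole */
};

/*
 * Model of the check done at page fault time. A NULL pointer means no
 * hole punch is running. Returns true when the faulting index lies
 * inside the hole, so the fault should wait for the operation to finish.
 */
bool fault_must_wait(const struct falloc_hole *in_progress,
                     unsigned long index)
{
    if (in_progress == NULL)
        return false;

    assert(in_progress->start <= in_progress->end);
    return index >= in_progress->start && index < in_progress->end;
}
```

Faults outside the hole proceed immediately in this model, so only accesses that could repopulate the range being punched are delayed.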

[PATCH v2 2/4] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

2015-10-20 Thread Mike Kravetz
When performing a fallocate hole punch, set up a hugetlb_falloc struct and make i_private point to it. i_private will point to this struct for the duration of the operation. At the end of the operation, wake up anyone who faulted on the hole and is on the waitq. Signed-off-by: Mike Kravetz

[PATCH v2 1/4] mm/hugetlb: Define hugetlb_falloc structure for hole punch race

2015-10-20 Thread Mike Kravetz
A hugetlb_falloc structure is pointed to by i_private during fallocate hole punch operations. Page faults check this structure and if they are in the hole, wait for the operation to finish. Signed-off-by: Mike Kravetz --- include/linux/hugetlb.h | 10 ++ 1 file changed, 10 insertions

[PATCH v2 4/4] mm/hugetlb: Unmap pages to remove if page fault raced with hole punch

2015-10-20 Thread Mike Kravetz
removing. The unmap within remove_inode_hugepages occurs with the hugetlb_fault_mutex held so that no other faults can occur until the page is removed. The (unmodified) routine hugetlb_vmdelete_list was moved ahead of remove_inode_hugepages to satisfy the new reference. Signed-off-by: Mike Kravetz

[PATCH v2 3/4] mm/hugetlb: page faults check for fallocate hole punch in progress and wait

2015-10-20 Thread Mike Kravetz
At page fault time, check i_private which indicates a fallocate hole punch is in progress. If the fault falls within the hole, wait for the hole punch operation to complete before proceeding with the fault. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 39

[PATCH v2 0/4] hugetlbfs fallocate hole punch race with page faults

2015-10-20 Thread Mike Kravetz
patch 4/4 to unmap single pages in remove_inode_hugepages Mike Kravetz (4): mm/hugetlb: Define hugetlb_falloc structure for hole punch race mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch mm/hugetlb: page faults check for fallocate hole punch in progress and wait mm/hu

Re: [PATCH v2 2/4] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

2015-10-20 Thread Mike Kravetz
On 10/20/2015 05:11 PM, Dave Hansen wrote: > On 10/20/2015 04:52 PM, Mike Kravetz wrote: >> if (hole_end > hole_start) { >> struct address_space *mapping = inode->i_mapping; >> +DECLARE_WAIT_QUEUE_HEAD_O

[PATCH] mm/hugetlb: i_mmap_lock_write before unmapping in remove_inode_hugepages

2015-10-21 Thread Mike Kravetz
Code was added to remove_inode_hugepages that will unmap a page if it is mapped. i_mmap_lock_write() must be taken during the call to hugetlb_vmdelete_list(). This is to prevent mappings(vmas) from being added or deleted while the list of vmas is being examined. Signed-off-by: Mike Kravetz

Re: [PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

2015-10-19 Thread Mike Kravetz
On 10/19/2015 04:16 PM, Andrew Morton wrote: > On Fri, 16 Oct 2015 15:08:29 -0700 Mike Kravetz > wrote: > >> When performing a fallocate hole punch, set up a hugetlb_falloc struct >> and make i_private point to it. i_private will point to this struct for >> the dur

Re: [PATCH 0/3] hugetlbfs fallocate hole punch race with page faults

2015-10-19 Thread Mike Kravetz
On 10/19/2015 04:18 PM, Andrew Morton wrote: > On Fri, 16 Oct 2015 15:08:27 -0700 Mike Kravetz > wrote: > >> The hugetlbfs fallocate hole punch code can race with page faults. The >> result is that after a hole punch operation, pages may remain within the >> hole

Re: [PATCH 2/3] mm/hugetlb: Setup hugetlb_falloc during fallocate hole punch

2015-10-19 Thread Mike Kravetz
On 10/19/2015 07:22 PM, Hugh Dickins wrote: > On Mon, 19 Oct 2015, Mike Kravetz wrote: >> On 10/19/2015 04:16 PM, Andrew Morton wrote: >>> On Fri, 16 Oct 2015 15:08:29 -0700 Mike Kravetz >>> wrote: >> >>>>mutex_lock(&inode->i_mut

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

2018-08-27 Thread Mike Kravetz
On 08/27/2018 12:46 AM, Michal Hocko wrote: > On Fri 24-08-18 11:08:24, Mike Kravetz wrote: >> On 08/24/2018 01:41 AM, Michal Hocko wrote: >>> On Thu 23-08-18 13:59:16, Mike Kravetz wrote: >>> >>> Acked-by: Michal Hocko >>> >>> One nit be

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

2018-08-29 Thread Mike Kravetz
On 08/27/2018 06:46 AM, Jerome Glisse wrote: > On Mon, Aug 27, 2018 at 09:46:45AM +0200, Michal Hocko wrote: >> On Fri 24-08-18 11:08:24, Mike Kravetz wrote: >>> Here is an updated patch which does as you suggest above. >> [...] >>> @@ -1409,6 +1419,32 @@ static

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

2018-08-29 Thread Mike Kravetz
On 08/29/2018 02:11 PM, Jerome Glisse wrote: > On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote: >> On Wed 29-08-18 14:14:25, Jerome Glisse wrote: >>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote: >> [...] >>>> What would be the best

[PATCH] mm: migration: fix migration of huge PMD shared pages

2018-08-12 Thread Mike Kravetz
huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. Signed-off-by: Mike Kravetz --- I am not 100% sure on the required flushing, so suggestions would be a
