On Tue, 2009-07-07 at 15:06 -0300, Marcelo Tosatti wrote:
> KVM guests with CONFIG_LOCKDEP=y trigger the following warning:
>
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> Pid: 4624, comm: qemu-system-x86 Not tainted 2.6.31-rc2-03981-g3abaf21
> #32
> Call Trace:
On Tue, 2009-07-07 at 15:37 -0300, Marcelo Tosatti wrote:
> >>>
> >>> Is there any way around this other than completely shutting down lockdep?
> >>>
> >>
> >> When we created this, the promise was that kvm would only do this on a
> >> fresh mm with only a few vmas; has that changed?
> >
> > The
On Tue, 2009-07-07 at 12:25 -0700, Linus Torvalds wrote:
>
> On Tue, 7 Jul 2009, Peter Zijlstra wrote:
> >
> > Another issue, at about >=256 vmas we'll overflow the preempt count. So
> > disabling lockdep will only 'fix' this for a short while, until
On Tue, 2011-06-14 at 22:26 -0300, Glauber Costa wrote:
> On 06/14/2011 07:42 AM, Peter Zijlstra wrote:
> > On Mon, 2011-06-13 at 19:31 -0400, Glauber Costa wrote:
> >> @@ -1981,12 +1987,29 @@ static void update_rq_clock_task(struct rq
> >> *rq, s64 delta)
> >
On Wed, 2011-06-29 at 10:52 +0300, Avi Kivity wrote:
> On 06/13/2011 04:34 PM, Avi Kivity wrote:
> > This patchset exposes an emulated version 1 architectural performance
> > monitoring unit to KVM guests. The PMU is emulated using perf_events,
> > so the host kernel can multiplex host-wide, host-
On Tue, 2011-06-28 at 18:10 +0200, Joerg Roedel wrote:
> On Fri, Jun 17, 2011 at 03:37:29PM +0200, Joerg Roedel wrote:
> > this is the second version of the patch-set to support the AMD
> > guest-/host only bits in the performance counter MSRs. Due to lack of
> > time I haven't looked into emulating
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> + return __touch_steal_time(is_idle, UINT_MAX, NULL);
That wants to be ULLONG_MAX, because max_steal is a u64; with UINT_MAX the
comparison:
+ if (steal > max_steal)
isn't true per se and the compiler cannot optimize t
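A minimal illustration of the point (stand-alone C, names are illustrative and
not from the patch): with ULLONG_MAX the clamp below can never fire and the
compiler may drop it, while a UINT_MAX limit stays live, because a u64 steal
value can exceed 2^32 - 1.

#include <stdint.h>
#include <limits.h>

/* Illustrative only: clamp a u64 steal value against a limit. */
static uint64_t clamp_steal(uint64_t steal, uint64_t max_steal)
{
	/*
	 * With max_steal == ULLONG_MAX this test is provably false and can
	 * be optimized away; with UINT_MAX it remains a real branch.
	 */
	if (steal > max_steal)
		steal = max_steal;
	return steal;
}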
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> + if (static_branch((&paravirt_steal_rq_enabled))) {
> + int is_idle;
> + u64 st;
> +
> + is_idle = ((rq->curr != rq->idle) ||
> +
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> @@ -2063,12 +2092,7 @@ static int irqtime_account_si_update(void)
>
> #define sched_clock_irqtime(0)
>
> -static void update_rq_clock_task(struct rq *rq, s64 delta)
> -{
> - rq->clock_task += delta;
> -}
> -
> -#endif /* CONF
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> This patch accounts steal time in kernel/sched.
> I kept it from last proposal, because I still see advantages
> in it: Doing it here will give us easier access from scheduler
> variables such as the cpu rq. The next patch shows an exam
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> + if (static_branch(&paravirt_steal_enabled)) {
How is that going to compile on !CONFIG_PARAVIRT or !x86 in general?
Only x86-PARAVIRT will provide that variable.
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> +static inline u64 steal_ticks(u64 steal)
> +{
> + if (unlikely(steal > NSEC_PER_SEC))
> + return steal / TICK_NSEC;
That won't compile on a number of 32-bit architectures; use div_u64 or
something similar (a sketch follows below).
> +
> +
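A sketch of the portable spelling being asked for (assuming it replaces the
quoted steal_ticks(); div_u64() and __iter_div_u64_rem() are the kernel's
32-bit-safe 64-bit division helpers):

#include <linux/kernel.h>
#include <linux/math64.h>
#include <linux/jiffies.h>

static inline u64 steal_ticks(u64 steal)
{
	/* A bare u64 '/' would need libgcc's __udivdi3 on 32-bit. */
	if (unlikely(steal > NSEC_PER_SEC))
		return div_u64(steal, TICK_NSEC);

	/* For small values an iterative divide is cheaper than a real one. */
	return __iter_div_u64_rem(steal, TICK_NSEC, &steal);
}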
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> + version: guest has to check version before and after grabbing
> + time information and check that they are both equal and even.
> + An odd version indicates an in-progress update.
That's generall
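For illustration, the read side of such a version-based protocol usually looks
like the sketch below (hypothetical structure and field names; the barriers
stand in for whatever the real interface mandates):

#include <linux/types.h>
#include <asm/barrier.h>

struct steal_time_info {
	u32 version;	/* odd while the hypervisor updates the record */
	u64 steal;
};

static u64 read_steal(const struct steal_time_info *st)
{
	u32 v;
	u64 steal;

	do {
		v = st->version;
		rmb();		/* read version before the payload */
		steal = st->steal;
		rmb();		/* read payload before re-checking */
	} while ((v & 1) || v != st->version);

	return steal;
}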
On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> +static noinline bool touch_steal_time(int is_idle)
That noinline is very unlucky there,
> +{
> + u64 steal, st = 0;
> +
> + if (static_branch(&paravirt_steal_enabled)) {
> +
> + steal = paravirt_steal_clock(smp_proce
On Thu, 2011-06-30 at 23:50 -0300, Glauber Costa wrote:
> I was under the impression that the proper use of jump labels required
> each label to be tied to a single location. If we make it inline, the
> same key would point to multiple locations, and we would have trouble
> altering all of the lo
On Thu, 2011-06-30 at 23:53 -0300, Glauber Costa wrote:
> On 06/30/2011 06:54 PM, Peter Zijlstra wrote:
> > On Wed, 2011-06-29 at 11:29 -0400, Glauber Costa wrote:
> >> + if (static_branch(&paravirt_steal_enabled)) {
> >
> > How is that going to compile on !CONF
On Fri, 2011-07-01 at 17:22 -0400, Glauber Costa wrote:
> @@ -1971,8 +1974,14 @@ static inline u64 steal_ticks(u64 steal)
>
> static void update_rq_clock_task(struct rq *rq, s64 delta)
> {
> - s64 irq_delta;
> -
> +/*
> + * In theory, the compiler should just see 0 here, and optimize out t
On Fri, 2011-07-01 at 17:22 -0400, Glauber Costa wrote:
> @@ -3929,6 +3945,23 @@ void account_process_tick(struct task_struct *p, int
> user_tick)
> return;
> }
>
> +#ifdef CONFIG_PARAVIRT
> + if (static_branch(&paravirt_steal_enabled)) {
> + u64 steal, st
d,
> prev_steal_time_rq. This is because otherwise, information about time
> accounted in account_process_tick() would never reach us in update_rq_clock().
>
> Signed-off-by: Glauber Costa
> CC: Rik van Riel
> CC: Jeremy Fitzhardinge
> CC: Peter Zijlstra
> CC: Avi Kiv
off-by: Glauber Costa
> CC: Rik van Riel
> CC: Jeremy Fitzhardinge
Acked-by: Peter Zijlstra
Venki, can you have a look at that irqtime_account_process_tick()? I
think adding the steal time up front like this is fine, because it
suffers from the same 'problem' as both irqtime thin
d,
> prev_steal_time_rq. This is because otherwise, information about time
> accounted in account_process_tick() would never reach us in
> update_rq_clock().
>
> Signed-off-by: Glauber Costa
> CC: Rik van Riel
> CC: Jeremy Fitzhardinge
Acked-by: Peter Zijlstra
On Tue, 2011-07-12 at 10:20 +0300, Avi Kivity wrote:
> Maybe we need a generic "run this function in this task's context"
> mechanism instead. Like an IPI, but targeting tasks instead of cpus.
>
Something like kernel/events/core.c:task_function_call()?
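For reference, that mechanism boils down to an smp_call_function_single()
aimed at the CPU the task is running on, with the callback bailing out unless
the task is still current there; a simplified sketch of its core (details of
kernel/events/core.c omitted):

#include <linux/sched.h>
#include <linux/errno.h>

struct remote_function_call {
	struct task_struct	*p;
	int			(*func)(void *info);
	void			*info;
	int			ret;
};

/* Runs in IPI context on the target CPU. */
static void remote_function(void *data)
{
	struct remote_function_call *tfc = data;

	tfc->ret = -EAGAIN;
	if (tfc->p != current)	/* task moved away or was preempted */
		return;

	tfc->ret = tfc->func(tfc->info);
}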
On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
> Similar, but with stronger guarantees: when the function is called,
> current == p, and the task was either sleeping or in userspace.
If the task is sleeping, current can never be p.
On Tue, 2011-07-12 at 12:27 +0300, Avi Kivity wrote:
> On 07/12/2011 12:18 PM, Peter Zijlstra wrote:
> > >
> > > The guarantee is that the task was sleeping just before the function is
> > > called. Of course it's woken up to run the function.
> > >
On Tue, 2011-07-12 at 12:16 +0300, Avi Kivity wrote:
> On 07/12/2011 12:14 PM, Peter Zijlstra wrote:
> > On Tue, 2011-07-12 at 12:08 +0300, Avi Kivity wrote:
> > > Similar, but with stronger guarantees: when the function is called,
> > > current == p, and the tas
On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
> Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
> are doing uncore_init() on a virtualized CPU, which breaks boot.
I think you're the first.. I don't normally use kvm if I can at all
avoid it.
But I think it's a 'simple'
On Thu, 2012-08-16 at 14:06 +0300, Avi Kivity wrote:
> Another option is to deal with them on the host side. That has the
> benefit of working with non-Linux guests too.
Right, it's an insane number of MSRs, but it could be done if
someone takes the time to enumerate them all.
If KVM then
On Fri, 2012-08-17 at 09:40 +0800, Yan, Zheng wrote:
>
> Peter, do I need to submit a patch that disables uncore on virtualized CPUs?
>
I think Avi prefers the method where KVM 'fakes' the MSRs and we have to
detect if the MSRs actually work or not.
If you're willing to have a go at that, please do so
On Sun, 2012-08-19 at 12:55 +0300, Avi Kivity wrote:
> > I think Avi prefers the method where KVM 'fakes' the MSRs and we have to
> > detect if the MSRs actually work or not.
>
> s/we have/we don't have/.
So for the 'normal' PMU we actually do check to see if the MSRs are
being faked and bail if
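That check is a write/read-back probe; a rough sketch of the idea (simplified
from the core PMU's check_hw_exists(), 'msr' being whichever counter MSR is
under test):

#include <linux/types.h>
#include <asm/msr.h>

/* Detect MSRs that are silently faked, i.e. writes get dropped. */
static bool pmu_msr_is_real(unsigned int msr)
{
	u64 val_new, val_tmp = 0xabcdUL;

	if (wrmsrl_safe(msr, val_tmp))
		return false;			/* write faulted */
	if (rdmsrl_safe(msr, &val_new))
		return false;			/* read faulted */

	return val_new == val_tmp;		/* a fake MSR won't echo it back */
}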
On Tue, 2012-08-21 at 11:34 +0300, Avi Kivity wrote:
> On 08/21/2012 10:11 AM, Peter Zijlstra wrote:
> > On Sun, 2012-08-19 at 12:55 +0300, Avi Kivity wrote:
> >> > I think Avi prefers the method where KVM 'fakes' the MSRs and we have to
> >> > detect if
On Thu, 2011-09-01 at 17:55 -0700, Jeremy Fitzhardinge wrote:
> From: Jeremy Fitzhardinge
>
> We need to make sure interrupts are disabled while we're relying on the
> contents of the per-cpu lock_waiting values, otherwise an interrupt
> handler could come in, try to take some other lock, block,
On Thu, 2011-09-01 at 17:55 -0700, Jeremy Fitzhardinge wrote:
> + /* Make sure an interrupt handler can't upset things in a
> + partially setup state. */
> local_irq_save(flags);
>
> + /*
> +* We don't really care if we're overwriting some other
> +* (
On Thu, 2011-09-01 at 17:55 -0700, Jeremy Fitzhardinge wrote:
> From: Srivatsa Vaddagiri
>
> We must release the lock before checking to see if the lock is in
> slowpath or else there's a potential race where the lock enters the
> slow path after the unlocker has checked the slowpath flag, but be
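The ordering at stake, as a pseudocode sketch (the helpers here are
hypothetical stand-ins, not the names used in the patch):

static inline void pv_ticket_unlock(arch_spinlock_t *lock)
{
	/* 1: release the lock; this must be a full memory barrier */
	__ticket_release_head(lock);

	/*
	 * 2: only now sample the slowpath flag; checking it before the
	 * release races with a waiter setting it right after our check,
	 * leaving that waiter blocked with no one to kick it.
	 */
	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_kick(lock);
}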
On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote:
>
> > I know that it's generally considered bad form, but there's at least one
> > spinlock that's only taken from NMI context and thus hasn't got any
> > deadlock potential.
>
> Which one?
arch/x86/kernel/traps.c:nmi_reason_lock
It
On Fri, 2011-09-02 at 14:50 -0700, Jeremy Fitzhardinge wrote:
> On 09/02/2011 01:47 PM, Peter Zijlstra wrote:
> > On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote:
> >>> I know that it's generally considered bad form, but there's at least one
> >>&g
support them if the need arises.
Signed-off-by: Gleb Natapov
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-g...@redhat.com
---
arch/x86/include/asm/perf_event.h | 12
arch/x86/kernel/cpu/perf_event.h | 12
arch/x86/kernel/cpu
On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote:
> +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> +{
> + int i, nr_msrs;
> + struct perf_guest_switch_msr *msrs;
> +
> + msrs = perf_guest_get_msrs(&nr_msrs);
> +
> + if (!msrs)
> + return;
> +
On Wed, 2011-10-05 at 15:48 +0200, Avi Kivity wrote:
> On 10/05/2011 02:01 PM, Gleb Natapov wrote:
> > This patch series consists of Joerg series named "perf support for amd
> > guest/host-only bits v2" [1] rebased to 3.1.0-rc7 and in addition,
> > support for intel cpus for the same functionality.
On Wed, 2011-10-05 at 17:29 +0200, Gleb Natapov wrote:
> On Wed, Oct 05, 2011 at 04:19:39PM +0200, Peter Zijlstra wrote:
> > On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote:
> > > +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
> > > +{
&
On Wed, 2011-10-12 at 17:51 -0700, Jeremy Fitzhardinge wrote:
>
> This is all unnecessary complication if you're not using PV ticket
> locks, it also uses the jump-label machinery to use the standard
> "add"-based unlock in the non-PV case.
>
> if (TICKET_SLOWPATH_FLAG &&
>
On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> @@ -1580,6 +1580,8 @@ __init int intel_pmu_init(void)
> x86_pmu.num_counters= eax.split.num_counters;
> x86_pmu.cntval_bits = eax.split.bit_width;
> x86_pmu.cntval_mask = (1ULL << ea
On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> + case 0xa: { /* Architectural Performance Monitoring */
> + struct x86_pmu_capability cap;
> +
> + perf_get_x86_pmu_capability(&cap);
> +
> + /*
> +* Only support guest architec
On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> @@ -35,6 +35,7 @@ config KVM
> select KVM_MMIO
> select TASKSTATS
> select TASK_DELAY_ACCT
> + select PERF_EVENTS
Do you really want to make that an unconditional part of KVM? I know we
can't currently build x8
On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> +static void kvm_perf_overflow_intr(struct perf_event *perf_event,
> + struct perf_sample_data *data, struct pt_regs *regs)
> +{
> + struct kvm_pmc *pmc = perf_event->overflow_handler_context;
> + struct kvm_pmu *pmu
On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> +static u64 read_pmc(struct kvm_pmc *pmc)
> +{
> + u64 counter, enabled, running;
> +
> + counter = pmc->counter;
> +
> + if (pmc->perf_event)
> + counter += perf_event_read_value(pmc->perf_event,
> +
On Mon, 2011-11-07 at 16:46 +0200, Avi Kivity wrote:
> On 11/07/2011 04:34 PM, Peter Zijlstra wrote:
> > On Thu, 2011-11-03 at 14:33 +0200, Gleb Natapov wrote:
> > > +static void kvm_perf_overflow_intr(struct perf_event *perf_event,
> > > + struct per
On Mon, 2011-11-07 at 17:41 +0200, Gleb Natapov wrote:
> > > + entry->eax = min(cap.version, 2)
> > > + | (cap.num_counters_gp << 8)
> > > + | (cap.bit_width_gp << 16)
> > > + | (cap.events_mask_len << 24);
> Do you
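For reference, the EAX value being packed above follows the architectural
PMU's CPUID leaf 0xA layout; a small decode helper (illustrative, plain C)
makes the field boundaries explicit:

#include <stdint.h>

/* CPUID.0AH:EAX field layout for the architectural PMU. */
struct arch_pmu_eax {
	uint8_t version;		/* bits  0..7  */
	uint8_t num_counters_gp;	/* bits  8..15 */
	uint8_t bit_width_gp;		/* bits 16..23 */
	uint8_t events_mask_len;	/* bits 24..31 */
};

static struct arch_pmu_eax decode_pmu_eax(uint32_t eax)
{
	return (struct arch_pmu_eax){
		.version		= eax & 0xff,
		.num_counters_gp	= (eax >> 8) & 0xff,
		.bit_width_gp		= (eax >> 16) & 0xff,
		.events_mask_len	= (eax >> 24) & 0xff,
	};
}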
On Mon, 2011-11-07 at 17:53 +0200, Gleb Natapov wrote:
> I removed branch-miss-retired here because for a perf user it exists. Perf
> approximates it with another event, but the perf user shouldn't know that. A
> guest does not always run with exactly the same cpu model number as the host,
> so if we will not drop t
On Mon, 2011-11-07 at 17:25 +0200, Avi Kivity wrote:
> On 11/07/2011 05:19 PM, Gleb Natapov wrote:
> > >
> > > note, this needs a fairly huge PMI skew to happen.
> > >
> > No, it need not. It is enough to get the exit reason as hlt instead of nmi
> > for a vcpu to go to a blocking state instead of reen
On Mon, 2011-11-07 at 18:22 +0200, Gleb Natapov wrote:
> > Right, so what model number do you expose?
> Depends on what management wants. You can specify -cpu Nehalem or -cpu
> Conroe or even override model manually by doing -cpu host,model=15.
Oh cute ;-)
On Mon, 2011-11-07 at 17:25 +0200, Gleb Natapov wrote:
> > Since the below programming doesn't use perf_event_attr::pinned, yes.
> >
> Yes, that is on todo :). Actually I do want to place all guest perf
> counters into the same event group and make it pinned. But currently perf
> event groups are
On Tue, 2011-11-08 at 11:22 +0100, Ingo Molnar wrote:
>
> We do even more than that, the perf ABI is fully backwards *and*
> forwards compatible: you can run older perf on newer ABIs and newer
> perf on older ABIs.
The ABI yes, the tool no; the tool very much relies on some newer ABI
parts. Su
On Tue, 2011-11-08 at 13:15 +0100, Ingo Molnar wrote:
>
> The one notable thing that isn't being tested in a natural way is the
> 'group of events' abstraction - which, ironically, has been added on
> the perfmon guys' insistence. No app beyond the PAPI self-test makes
> actual use of it though,
On Tue, 2011-11-08 at 14:49 +0200, Gleb Natapov wrote:
> > It might make sense to introduce cpuid10_ebx or so, also I think the
> cpuid10_ebx will have only one field though (event_mask).
>
> > At the very least add a full ebx iteration to disable unsupported events
> > in the intel-v1 case.
> I d
On Tue, 2011-11-08 at 15:54 +0200, Gleb Natapov wrote:
> Isn't it better to introduce a mapping between ebx bits and architectural
> events and do a for_each_set_bit loop?
Probably, but I only thought of that halfway through ;-)
> But I wouldn't want to introduce
> patch as below as part of this ser
On Tue, 2011-11-08 at 16:18 +0200, Gleb Natapov wrote:
> On Tue, Nov 08, 2011 at 03:12:27PM +0100, Peter Zijlstra wrote:
> > On Tue, 2011-11-08 at 15:54 +0200, Gleb Natapov wrote:
> > > Isn't it better to introduce mapping between ebx bits and architectural
> > >
On Tue, 2011-11-08 at 13:59 +0100, Ingo Molnar wrote:
>
> > Also the self monitor stuff, perf-tool doesn't use that for obvious
> > reasons.
>
> Indeed, and that's PAPI's strong point.
>
> We could try to utilize it via some clever LD_PRELOAD trickery?
Wouldn't be really meaningful, a perf-tes
On Wed, 2011-11-09 at 10:33 -0200, Arnaldo Carvalho de Melo wrote:
>
> Ingo, would that G+ page be useful for that?
>
*groan*
Can we please keep things sane?
On Thu, 2011-11-10 at 14:57 +0200, Gleb Natapov wrote:
> +
> + /* disable events that are reported as not present by cpuid */
> + for_each_set_bit(bit, x86_pmu.events_mask,
> + min(x86_pmu.events_mask_len, x86_pmu.max_events))
> + intel_perfmon_event_map[i
On Thu, 2011-11-10 at 14:57 +0200, Gleb Natapov wrote:
> This patchset exposes an emulated version 2 architectural performance
> monitoring unit to KVM guests. The PMU is emulated using perf_events,
> so the host kernel can multiplex host-wide, host-user, and the
> guest on available resources.
>
On Mon, 2011-11-21 at 20:48 +0530, Bharata B Rao wrote:
> I looked at Peter's recent work in this area.
> (https://lkml.org/lkml/2011/11/17/204)
>
> It introduces two interfaces:
>
> 1. ms_tbind() to bind a thread to a memsched(*) group
> 2. ms_mbind() to bind a memory region to memsched group
>
On Mon, 2011-11-21 at 21:30 +0530, Bharata B Rao wrote:
>
> In the original post of this mail thread, I proposed a way to export
> guest RAM ranges (Guest Physical Address-GPA) and their corresponding
> host virtual mappings (Host Virtual Address-HVA) from QEMU (via QEMU monitor).
> The idea
On Mon, 2011-11-21 at 20:03 +0200, Avi Kivity wrote:
>
> Does ms_mbind() require that the vmas in its area be completely
> contained in the region, or does it split vmas on demand? I suggest the
> latter to avoid exposing implementation details.
as implemented (which is still rather incomplete)
On Fri, 2011-01-14 at 03:03 -0500, Rik van Riel wrote:
> From: Mike Galbraith
>
> Currently only implemented for fair class tasks.
>
> Add a yield_to_task() method to the fair scheduling class, allowing the
> caller of yield_to() to accelerate another thread in its thread group,
> task group.
>
On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
>
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it
> is woken up because of an event like interrupt.
>
> KVM_HC_KICK_CPU allows the callin
On Wed, 2011-01-19 at 22:53 +0530, Srivatsa Vaddagiri wrote:
> On Wed, Jan 19, 2011 at 10:42:39PM +0530, Srivatsa Vaddagiri wrote:
> > Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
> >
> > KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or
> > it
> > is
On Thu, 2011-01-20 at 17:29 +0530, Srivatsa Vaddagiri wrote:
>
> If we had a yield-to [1] sort of interface _and_ information on which vcpu
> owns a lock, then lock-spinners can yield-to the owning vcpu,
and then I'd nak it for being stupid ;-)
really, yield*() is retarded, never even consider
On Thu, 2011-01-20 at 16:33 -0500, Rik van Riel wrote:
> The clear_buddies function does not seem to play well with the concept
> of hierarchical runqueues. In the following tree, task groups are
> represented by 'G', tasks by 'T', next by 'n' and last by 'l'.
>
> (nl)
> /\
>G(nl
On Thu, 2011-01-20 at 16:33 -0500, Rik van Riel wrote:
> Use the buddy mechanism to implement yield_task_fair. This
> allows us to skip onto the next highest priority se at every
> level in the CFS tree, unless doing so would introduce gross
> unfairness in CPU time distribution.
>
> We order the
On Thu, 2011-01-20 at 16:34 -0500, Rik van Riel wrote:
> From: Mike Galbraith
>
> Currently only implemented for fair class tasks.
>
> Add a yield_to_task() method to the fair scheduling class, allowing the
> caller of yield_to() to accelerate another thread in its thread group,
> task group.
>
t scheduler
> would wrongly think that all cpus have the same ability to run processes,
> lowering the overall throughput.
>
> Signed-off-by: Glauber Costa
> CC: Rik van Riel
> CC: Jeremy Fitzhardinge
> CC: Peter Zijlstra
> CC: Avi Kivity
> ---
> include/linux/sched.
On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
> > I would really much rather see you change update_rq_clock_task() and
> > subtract your ns resolution steal time from our wall-time,
> > update_rq_clock_task() already updates the cpu_power relative to the
> > remaining time available.
>
>
On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
>
> > I thought kvm had a ns resolution steal-time clock?
> Yes, the one I introduced earlier in this series is nsec. However, user
> and system will be accounted in usec at most, so there is no point in
> using nsec here.
Well, the schedule
On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
> I fail to see how clock_task influences cpu power.
> If we also have to touch clock_task for better accounting of other
> stuff, it is a separate story.
> But for cpu_power, I really fail. Please enlighten me.
static void update_rq_clo
On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
> On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
> > On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
> >
> > > I fail to see how clock_task influences cpu power.
> > > If we also ha
On Tue, 2011-01-25 at 19:27 -0200, Glauber Costa wrote:
> On Tue, 2011-01-25 at 22:07 +0100, Peter Zijlstra wrote:
> > On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
> > > On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
> > > > On Tue, 2011-01-25
On Wed, 2011-01-26 at 13:43 -0200, Glauber Costa wrote:
> yes, but once this delta is subtracted from rq->clock_task, this value is not
> used to dictate power, unless I am mistaken.
>
> power is adjusted according to scale_rt_power(), which does it using the
> values of rq->rt_avg, rq->age_stamp
On Wed, 2011-01-26 at 17:46 +0100, Peter Zijlstra wrote:
> it uses a per-cpu virt_steal_time() clock which is
> expected to return steal-time in ns.
This clock should return u64 and wrap on u64 and be provided when
CONFIG_SCHED_PARAVIRT.
On Fri, 2011-01-28 at 14:52 -0500, Glauber Costa wrote:
> + u64 to = (get_kernel_ns() - vcpu->arch.this_time_out);
> + /*
> + * using nanoseconds introduces noise, which accumulates easily
> + * leading to big steal time values. We want,
On Fri, 2011-01-28 at 14:52 -0500, Glauber Costa wrote:
> + /*
> +* using nanoseconds introduces noise, which accumulates easily
> +* leading to big steal time values. We want, however, to keep the
> +* interface nanosecond-based for future-proofness. The hypervisor ma
On Fri, 2011-01-28 at 14:52 -0500, Glauber Costa wrote:
> +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> +static DEFINE_PER_CPU(u64, cpu_steal_time);
> +
> +#ifndef CONFIG_64BIT
> +static DEFINE_PER_CPU(seqcount_t, steal_time_seq);
> +
> +static inline void steal_time_write_begin(void)
> +{
> + __t
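What the #ifndef CONFIG_64BIT branch is building toward, sketched along the
lines of the existing irq_time_write_begin()/_end() idiom (same per-cpu names
as the quoted patch): on 32-bit a per-cpu u64 cannot be updated atomically, so
writers bracket the update with a seqcount and readers retry.

#include <linux/percpu.h>
#include <linux/seqlock.h>

#ifndef CONFIG_64BIT
static DEFINE_PER_CPU(seqcount_t, steal_time_seq);

static inline void steal_time_write_begin(void)
{
	__this_cpu_inc(steal_time_seq.sequence);
	smp_wmb();		/* odd count: update in progress */
}

static inline void steal_time_write_end(void)
{
	smp_wmb();		/* publish the new value first */
	__this_cpu_inc(steal_time_seq.sequence);
}
#else
static inline void steal_time_write_begin(void) { }
static inline void steal_time_write_end(void) { }
#endif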
On Mon, 2011-01-31 at 12:25 +0100, Peter Zijlstra wrote:
> On Fri, 2011-01-28 at 14:52 -0500, Glauber Costa wrote:
>
> > +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> > +static DEFINE_PER_CPU(u64, cpu_steal_time);
> > +
> > +#ifndef CONFIG_64BIT
> > +static DEFIN
On Wed, 2011-01-26 at 17:21 -0500, Rik van Riel wrote:
> +static struct sched_entity *__pick_second_entity(struct cfs_rq *cfs_rq)
> +{
> + struct rb_node *left = cfs_rq->rb_leftmost;
> + struct rb_node *second;
> +
> + if (!left)
> + return NULL;
> +
> + second = rb_nex
On Wed, 2011-01-26 at 17:21 -0500, Rik van Riel wrote:
> +bool __sched yield_to(struct task_struct *p, bool preempt)
> +{
> + struct task_struct *curr = current;
> + struct rq *rq, *p_rq;
> + unsigned long flags;
> + bool yielded = 0;
> +
> + local_irq_save(flags);
> +
On Wed, 2011-01-26 at 17:23 -0500, Rik van Riel wrote:
> Export the symbols required for a race-free kvm_vcpu_on_spin.
Avi, you asked for an example of why I hated KVM as a module :-)
> Signed-off-by: Rik van Riel
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3b159c5..adc8f47 100644
>
On Mon, 2011-01-31 at 15:26 +0200, Avi Kivity wrote:
> On 01/31/2011 01:51 PM, Peter Zijlstra wrote:
> > On Wed, 2011-01-26 at 17:23 -0500, Rik van Riel wrote:
> > > Export the symbols required for a race-free kvm_vcpu_on_spin.
> >
> > Avi, you asked for an example
On Mon, 2011-01-31 at 16:40 -0500, Rik van Riel wrote:
>
> v8:
> - some more changes and cleanups suggested by Peter
Did you, by accident, send out the -v7 patches again? I don't think I've
spotted a difference..
On Tue, 2011-02-01 at 09:50 -0500, Rik van Riel wrote:
> +/**
> + * yield_to - yield the current processor to another thread in
> + * your thread group, or accelerate that thread toward the
> + * processor it's on.
> + *
> + * It's the caller's job to ensure that the target task struct
> + * can't
On Tue, 2011-02-01 at 09:51 -0500, Rik van Riel wrote:
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -375,13 +375,6 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = sched_rt_handler,
> },
> - {
> -
On Tue, 2011-02-01 at 13:53 -0200, Glauber Costa wrote:
>
> And since the granularity of the cpu accounting is too coarse, we end up
> with much more steal time than we should, because things that are less
> than 1 unit of cputime are often rounded up to 1 unit of cputime.
See, that! is the pr
On Tue, 2011-02-01 at 13:59 -0200, Glauber Costa wrote:
>
> Because that part is kvm-specific, and this is scheduler general.
> It seemed cleaner to me to do it this way. But I can do it differently,
> certainly.
Well, any steal time clock will be hypervisor specific, but if we agree
that anythi
On Tue, 2011-02-01 at 15:00 -0200, Glauber Costa wrote:
>
> > What you can do is: steal_ticks = steal_time_clock() / TICK_NSEC, or
> > simply keep a steal time delta and every time it overflows
> > cputime_one_jiffy insert a steal-time tick.
>
> What do you think about keeping accounting in msec/
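A sketch of the second option quoted above (illustrative; account_steal_time()
and cputime_one_jiffy are the existing cputime interfaces, and the accumulator
would be per-cpu in practice): keep the raw nanosecond delta and only fold
whole ticks into the accounting, carrying the remainder forward.

#include <linux/kernel_stat.h>
#include <linux/jiffies.h>

static u64 steal_ns_pending;

static void account_steal_delta(u64 steal_ns_delta)
{
	steal_ns_pending += steal_ns_delta;

	/* Insert one steal tick each time a full tick's worth piles up. */
	while (steal_ns_pending >= TICK_NSEC) {
		steal_ns_pending -= TICK_NSEC;
		account_steal_time(cputime_one_jiffy);
	}
}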
On Tue, 2011-02-01 at 14:22 -0200, Glauber Costa wrote:
>
>
> Which tick accounting? In your other e-mail, you pointed out that this only
> runs in touch_steal_time, which is fine, will change.
That tick ;-), all the account_foo muck is per tick.
> But all the rest
> here, that is behind the hype
On Tue, 2011-02-01 at 17:55 -0200, Glauber Costa wrote:
>
> update_rq_clock_task still has to keep track of the last steal
> time value we saw, in the same way it does for irq.
Right, the CONFIG_SCHED_PARAVIRT patch I sent earlier adds a
prev_steal_time member to struct rq for this purp
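In sketch form that bookkeeping mirrors the existing irq-time handling
(illustrative fragment of the sched code with the irq part elided;
paravirt_steal_clock() is the per-cpu ns steal clock discussed earlier in the
thread):

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
	s64 steal = paravirt_steal_clock(cpu_of(rq)) - rq->prev_steal_time_rq;

	if (steal > delta)		/* never account more than elapsed */
		steal = delta;

	rq->prev_steal_time_rq += steal;
	rq->clock_task += delta - steal;
}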
On Tue, 2011-02-01 at 09:51 -0500, Rik van Riel wrote:
> -static void yield_task_fair(struct rq *rq)
> -{
> - struct task_struct *curr = rq->curr;
> - struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> - struct sched_entity *rightmost, *se = &curr->se;
> -
> - /*
> -* Are
On Tue, 2011-02-01 at 09:50 -0500, Rik van Riel wrote:
> +bool __sched yield_to(struct task_struct *p, bool preempt)
> +{
> + struct task_struct *curr = current;
> + struct rq *rq, *p_rq;
> + unsigned long flags;
> + bool yielded = 0;
> +
> + local_irq_save(flags);
> +
On Fri, 2011-02-11 at 13:19 -0500, Glauber Costa wrote:
> static void update_rq_clock_task(struct rq *rq, s64 delta)
> {
> + s64 irq_delta = 0, steal = 0;
>
> +#ifdef CONFIG_IRQ_TIME_ACCOUNTING
> irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
>
> /*
> @@ -1926,20
On Fri, 2011-02-11 at 13:19 -0500, Glauber Costa wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d747f94..5dbf509 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -302,6 +302,7 @@ long io_schedule_timeout(long timeout);
> extern void cpu_init (vo
On Tue, 2011-02-15 at 16:35 +0200, Avi Kivity wrote:
> On 02/11/2011 08:19 PM, Glauber Costa wrote:
> > This patch accounts steal time in kernel/sched.
> > I kept it from last proposal, because I still see advantages
> > in it: Doing it here will give us easier access from scheduler
> > variab
On Tue, 2011-02-15 at 17:17 +0200, Avi Kivity wrote:
>
> Ah, so we're all set. Do you know if any user tools process this
> information?
I suppose there are, I bet Jeremy knows, Xen after all supports this
stuff ;-)