from:"Wolf"

"Shutdown" not shutting down...

2000-10-18 Thread Wolf


I was wondering if someone could explain something to me.  I've got a
dual-boot Red Hat 6.2/Win95B box, AMD K6-500, NVidia / SGS
Thomson|Riva128 video card w/8MB memory, BTC-1817 sound card, 64MB RAM,
and a Phoebe 56K modem.  On the Win95 side, when I shutdown the system
it goes through the shutdown procedure, and at the end it turns off the
box.

In Linux, "shutdown -h now" results in Linux going through the process,
but at the end I get the following:

Call trace: [] [] [] []
[] []
[] [] [] [] []
[]
Code: <1>Unable to handle kernel paging request at virtual address
893d
current->tss.cr3 = 03799000, %cr3 = 03799000
*pde = 
Oops: 
CPU: 0
EIP: 0010: []
EFLAGS: 00010046
eax:893d  ebx:  ecx:893e  edx:0002
esi:0026  edi:c0346000  ebp:c400  esp:c0345d54
ds:0018  ex:0018  ss:0018
Process halt (pid 913, process nr:14, stackpage = c0345000)
Stack: fee1dead 6789 0082 893e
   c480 c010a4c5 c0345da4 c01e3907 c01e39dc
   fdf8 c0345da4 c010a9c2 c01e39dc c0345da4
   fdf8 c0344000 bfff8026 c010a119 c0345da4
   fdf8 0001 0001  bfff8026
Call trace: [] [] [] []
[] [] []
   [] [] [] [] []
[] [] []
   [] [] [] [] []
Code: 0f b6 0c 03 89 4c 24 14 51 68 ff 38 1e c0 e8 4e a3 00 00 83


(the above was all that I could see on the screen. if there was anything
prior to the first Call trace, I'm afraid my tired old eyes missed it. 
this was also all manually copied, and manually entered, but I believe
I've gotten it correctly)

Anyone have any idea what I'm doing wrong here?

Thx,
Wolf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-11-26 Thread Michael Wolf


On 10/22/2012 10:33 AM, Rik van Riel wrote:

On 10/16/2012 10:23 PM, Michael Wolf wrote:

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.


How do s390 and Power systems deal with reporting that kind
of information?

IMHO it would be good to see what those do, so we do not end
up re-inventing the wheel, and confusing admins with yet another
way of reporting the information...

Sorry for the delay in the response.  I'm assuming you are asking about 
s390 and Power lpars.
In the case of lpar on POWER systems they simply report steal time and 
do not alter it in any way.
They do however report how much processor is assigned to the partition 
and that information is

in /proc/ppc64/lparcfg.

Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/5] Alter steal time reporting in KVM

2012-11-26 Thread Michael Wolf

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  The consignment limit passed to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

---

Michael Wolf (5):
  Alter the amount of steal time reported by the guest.
  Expand the steal time msr to also contain the consigned time.
  Add the code to send the consigned time from the host to the guest
  Add a timer to allow the separation of consigned from steal time.
  Add an ioctl to communicate the consign limit to the host.


 arch/x86/include/asm/kvm_host.h   |   11 +++
 arch/x86/include/asm/kvm_para.h   |3 +-
 arch/x86/include/asm/paravirt.h   |4 +--
 arch/x86/include/asm/paravirt_types.h |2 +
 arch/x86/kernel/kvm.c |8 ++---
 arch/x86/kernel/paravirt.c|4 +--
 arch/x86/kvm/x86.c|   50 -
 fs/proc/stat.c|9 +-
 include/linux/kernel_stat.h   |2 +
 include/linux/kvm_host.h  |2 +
 include/uapi/linux/kvm.h  |2 +
 kernel/sched/core.c   |   10 ++-
 kernel/sched/cputime.c|   21 +-
 kernel/sched/sched.h  |2 +
 virt/kvm/kvm_main.c   |7 +
 15 files changed, 120 insertions(+), 17 deletions(-)

-- 
Signature

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/5] Alter the amount of steal time reported by the guest.

2012-11-26 Thread Michael Wolf

Modify the amount of stealtime that the kernel reports via the /proc interface.
Steal time will now be broken down into steal_time and consigned_time.
Consigned_time will represent the amount of time that is expected to be lost
due to overcommitment of the physical cpu or by using cpu capping.  The amount
consigned_time will be passed in using an ioctl.  The time will be expressed in
the number of nanoseconds to be lost in during the fixed period.  The fixed 
period
is currently 1/10th of a second.

Signed-off-by: Michael Wolf 
---
 fs/proc/stat.c  |9 +++--
 include/linux/kernel_stat.h |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index e296572..cb7fe80 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -82,7 +82,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
-   u64 guest, guest_nice;
+   u64 guest, guest_nice, consign;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -90,10 +90,11 @@ static int show_stat(struct seq_file *p, void *v)
 
user = nice = system = idle = iowait =
irq = softirq = steal = 0;
-   guest = guest_nice = 0;
+   guest = guest_nice = consign = 0;
getboottime(&boottime);
jif = boottime.tv_sec;
 
+
for_each_possible_cpu(i) {
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -105,6 +106,7 @@ static int show_stat(struct seq_file *p, void *v)
steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign += kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
sum += kstat_cpu_irqs_sum(i);
sum += arch_irq_stat_cpu(i);
 
@@ -128,6 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
 
for_each_online_cpu(i) {
@@ -142,6 +145,7 @@ static int show_stat(struct seq_file *p, void *v)
steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign = kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
@@ -153,6 +157,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 1865b1f..e5978b0 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -28,6 +28,7 @@ enum cpu_usage_stat {
CPUTIME_STEAL,
CPUTIME_GUEST,
CPUTIME_GUEST_NICE,
+   CPUTIME_CONSIGN,
NR_STATS,
 };
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/5] Add the code to send the consigned time from the host to the guest

2012-11-26 Thread Michael Wolf

Add the code to send the consigned time from the host to the guest.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/include/asm/kvm_para.h |3 ++-
 arch/x86/include/asm/paravirt.h |4 ++--
 arch/x86/kernel/kvm.c   |3 ++-
 arch/x86/kernel/paravirt.c  |4 ++--
 arch/x86/kvm/x86.c  |2 ++
 include/linux/kernel_stat.h |1 +
 kernel/sched/cputime.c  |   21 +++--
 kernel/sched/sched.h|2 ++
 9 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..434d378 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -426,6 +426,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
+   u64 accum_consigned;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index eb3e9d8..1763369 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -42,9 +42,10 @@
 
 struct kvm_steal_time {
__u64 steal;
+   __u64 consigned;
__u32 version;
__u32 flags;
-   __u32 pad[12];
+   __u32 pad[10];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a5f9f30..d39e8d0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
+   PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ac357b3..4439a5c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -372,7 +372,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static void kvm_steal_clock(int cpu, u64 *steal)
+static void kvm_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
struct kvm_steal_time *src;
int version;
@@ -382,6 +382,7 @@ static void kvm_steal_clock(int cpu, u64 *steal)
version = src->version;
rmb();
*steal = src->steal;
+   *consigned = src->consigned;
rmb();
} while ((version & 1) || (version != src->version));
 }
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..3797683 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -207,9 +207,9 @@ static void native_flush_tlb_single(unsigned long addr)
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-static u64 native_steal_clock(int cpu)
+static void native_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   return 0;
+   *steal = *consigned = 0;
 }
 
 /* These are in entry.S */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1eefebe..683b531 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1565,8 +1565,10 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
return;
 
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
+   vcpu->arch.st.steal.consigned += vcpu->arch.st.accum_consigned;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
+   vcpu->arch.st.accum_consigned = 0;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index e5978b0..91afaa3 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -126,6 +126,7 @@ extern unsigned long long task_delta_exec(struct 
task_struct *);
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, 
cputime_t);
 extern void account_steal_time(cputime_t);
+extern void account_consigned_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
 extern void account_process_tick(struct task_struct *, int user);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 593b647..53bd0be 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -244,6 +244,18 @@ void account_system_time(struct task_struct *p, int 
hardirq_offset,
 }
 
 /*
+ * This accounts for the time that is split out of steal time.
+ *

[PATCH 4/5] Add a timer to allow the separation of consigned from steal time.

2012-11-26 Thread Michael Wolf

Add a timer to the host.  This will define the period.  During a period
the first n ticks will go into the consigned bucket.  Any other ticks that
occur within the period will be placed in the stealtime bucket.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |   10 +
 arch/x86/include/asm/paravirt.h |2 +-
 arch/x86/kvm/x86.c  |   42 ++-
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 434d378..4794c95 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -41,6 +41,8 @@
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
 
+#define KVM_STEAL_TIMER_DELAY 1UL
+
 #define CR0_RESERVED_BITS   \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
@@ -353,6 +355,14 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;
 
/*
+* timer used to determine if the time should be counted as
+* steal time or consigned time.
+*/
+   struct hrtimer steal_timer;
+   u64 current_consigned;
+   u64 consigned_limit;
+
+   /*
 * Paging state of the vcpu
 *
 * If the vcpu runs in guest mode with two level paging this still saves
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d39e8d0..6db79f9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,7 +196,7 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
+static inline void paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 683b531..c91f4c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1546,13 +1546,32 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 {
u64 delta;
+   u64 steal_delta;
+   u64 consigned_delta;
 
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
-   vcpu->arch.st.accum_steal = delta;
+
+   /* split the delta into steal and consigned */
+   if (vcpu->arch.current_consigned < vcpu->arch.consigned_limit) {
+   vcpu->arch.current_consigned += delta;
+   if (vcpu->arch.current_consigned > vcpu->arch.consigned_limit) {
+   steal_delta = vcpu->arch.current_consigned
+   -  vcpu->arch.consigned_limit;
+   consigned_delta = delta - steal_delta;
+   } else {
+   consigned_delta = delta;
+   steal_delta = 0;
+   }
+   } else {
+   consigned_delta = 0;
+   steal_delta = delta;
+   }
+   vcpu->arch.st.accum_steal = steal_delta;
+   vcpu->arch.st.accum_consigned = consigned_delta;
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -6203,11 +6222,25 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
 
 struct static_key kvm_no_apic_vcpu __read_mostly;
 
+enum hrtimer_restart steal_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   ktime_t now;
+
+   vcpu = container_of(data, struct kvm_vcpu, arch.steal_timer);
+   vcpu->arch.current_consigned = 0;
+   now = ktime_get();
+   hrtimer_forward(&vcpu->arch.steal_timer, now,
+   ktime_set(0, KVM_STEAL_TIMER_DELAY));
+   return HRTIMER_RESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
struct page *page;
struct kvm *kvm;
int r;
+   ktime_t ktime;
 
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
@@ -6251,6 +6284,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
+   /* Initialize and start a timer to capture steal and consigned time */
+   hrtimer_init(&vcpu->arch.steal_timer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   vcpu->arch.steal_timer.function = &steal_timer_fn;
+   ktime = ktime_set(0, KVM_STEAL_TIMER_DELAY);
+   hrtimer_start(&vcpu->arch.steal_timer, ktime, HRTIMER_MODE_REL);
 
return 0;
 fail_free_mce_banks:
@@ -6269,6 +6308,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
int idx;
 
+

[PATCH 5/5] Add an ioctl to communicate the consign limit to the host.

2012-11-26 Thread Michael Wolf

Add an ioctl to communicate the consign limit to the host.

Signed-off-by: Michael Wolf 
---
 arch/x86/kvm/x86.c   |6 ++
 include/linux/kvm_host.h |2 ++
 include/uapi/linux/kvm.h |2 ++
 virt/kvm/kvm_main.c  |7 +++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c91f4c9..5d57469 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5938,6 +5938,12 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu, long 
entitlement)
+{
+   vcpu->arch.consigned_limit = entitlement;
+   return 0;
+}
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
struct i387_fxsave_struct *fxsave =
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0e2212f..de13648 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -590,6 +590,8 @@ void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu,
+   long entitlement);
 
 void kvm_free_physmem(struct kvm *kvm);
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0a6d6ba..86f24bb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -921,6 +921,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL_IO(KVMIO,   0xad)
+/* Set the consignment limit which will be used to separete steal time */
+#define KVM_SET_ENTITLEMENT  _IOW(KVMIO, 0xae, unsigned long)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..c712fe5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2062,6 +2062,13 @@ out_free2:
r = 0;
break;
}
+   case KVM_SET_ENTITLEMENT: {
+   r = kvm_arch_vcpu_ioctl_set_entitlement(vcpu, arg);
+   if (r)
+   goto out;
+   r = 0;
+   break;
+   }
default:
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/5] Expand the steal time msr to also contain the consigned time.

2012-11-26 Thread Michael Wolf

Add a consigned field.  This field will hold the time lost due to capping or 
overcommit.
The rest of the time will still show up in the steal-time field.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/paravirt.h   |4 ++--
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/kvm.c |7 ++-
 kernel/sched/core.c   |   10 +-
 kernel/sched/cputime.c|2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..a5f9f30 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
 {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 142236e..5d4fc8b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,7 +95,7 @@ struct pv_lazy_ops {
 
 struct pv_time_ops {
unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
+   void (*steal_clock)(int cpu, unsigned long long *steal);
unsigned long (*get_tsc_khz)(void);
 };
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..ac357b3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -372,9 +372,8 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static u64 kvm_steal_clock(int cpu)
+static void kvm_steal_clock(int cpu, u64 *steal)
 {
-   u64 steal;
struct kvm_steal_time *src;
int version;
 
@@ -382,11 +381,9 @@ static u64 kvm_steal_clock(int cpu)
do {
version = src->version;
rmb();
-   steal = src->steal;
+   *steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
-
-   return steal;
 }
 
 void kvm_disable_steal_time(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c2e077c..b21d92d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -748,6 +748,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
  */
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || 
defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
s64 steal = 0, irq_delta = 0;
+   u64 consigned = 0;
 #endif
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
@@ -776,8 +777,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
u64 st;
+   u64 cs;
 
-   steal = paravirt_steal_clock(cpu_of(rq));
+   paravirt_steal_clock(cpu_of(rq), &steal, &consigned);
+   /*
+* since we are not assigning the steal time to cpustats
+* here, just combine the steal and consigned times to
+* do the rest of the calculations.
+*/
+   steal += consigned;
steal -= rq->prev_steal_time_rq;
 
if (unlikely(steal > delta))
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8d859da..593b647 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -275,7 +275,7 @@ static __always_inline bool steal_account_process_tick(void)
if (static_key_false(¶virt_steal_enabled)) {
u64 steal, st = 0;
 
-   steal = paravirt_steal_clock(smp_processor_id());
+   paravirt_steal_clock(smp_processor_id(), &steal);
steal -= this_rq()->prev_steal_time;
 
st = steal_ticks(steal);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/5] Alter the amount of steal time reported by the guest.

2012-11-26 Thread Michael Wolf

Modify the amount of stealtime that the kernel reports via the /proc interface.
Steal time will now be broken down into steal_time and consigned_time.
Consigned_time will represent the amount of time that is expected to be lost
due to overcommitment of the physical cpu or by using cpu capping.  The amount
consigned_time will be passed in using an ioctl.  The time will be expressed in
the number of nanoseconds to be lost in during the fixed period.  The fixed 
period
is currently 1/10th of a second.

Signed-off-by: Michael Wolf 
---
 fs/proc/stat.c  |9 +++--
 include/linux/kernel_stat.h |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index e296572..cb7fe80 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -82,7 +82,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
-   u64 guest, guest_nice;
+   u64 guest, guest_nice, consign;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -90,10 +90,11 @@ static int show_stat(struct seq_file *p, void *v)
 
user = nice = system = idle = iowait =
irq = softirq = steal = 0;
-   guest = guest_nice = 0;
+   guest = guest_nice = consign = 0;
getboottime(&boottime);
jif = boottime.tv_sec;
 
+
for_each_possible_cpu(i) {
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -105,6 +106,7 @@ static int show_stat(struct seq_file *p, void *v)
steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign += kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
sum += kstat_cpu_irqs_sum(i);
sum += arch_irq_stat_cpu(i);
 
@@ -128,6 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
 
for_each_online_cpu(i) {
@@ -142,6 +145,7 @@ static int show_stat(struct seq_file *p, void *v)
steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign = kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
@@ -153,6 +157,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 1865b1f..e5978b0 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -28,6 +28,7 @@ enum cpu_usage_stat {
CPUTIME_STEAL,
CPUTIME_GUEST,
CPUTIME_GUEST_NICE,
+   CPUTIME_CONSIGN,
NR_STATS,
 };
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/5] Alter stealtime reporting in KVM

2012-11-26 Thread Michael Wolf

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  The consignment limit passed to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

---

Michael Wolf (5):
  Alter the amount of steal time reported by the guest.
  Expand the steal time msr to also contain the consigned time.
  Add the code to send the consigned time from the host to the guest
  Add a timer to allow the separation of consigned from steal time.
  Add an ioctl to communicate the consign limit to the host.


 CREDITS|5 
 Documentation/arm64/memory.txt |   12 
 Documentation/cgroups/memory.txt   |4 
 .../devicetree/bindings/net/mdio-gpio.txt  |9 
 Documentation/filesystems/proc.txt |   16 
 Documentation/hwmon/fam15h_power   |2 
 Documentation/kernel-parameters.txt|   20 
 Documentation/networking/netdev-features.txt   |2 
 Documentation/scheduler/numa-problem.txt   |   20 
 MAINTAINERS|   87 +
 Makefile   |2 
 arch/alpha/kernel/osf_sys.c|6 
 arch/arm/boot/Makefile |   10 
 arch/arm/boot/dts/tegra30.dtsi |4 
 arch/arm/include/asm/io.h  |4 
 arch/arm/include/asm/sched_clock.h |2 
 arch/arm/include/asm/vfpmacros.h   |   12 
 arch/arm/include/uapi/asm/hwcap.h  |3 
 arch/arm/kernel/sched_clock.c  |   18 
 arch/arm/mach-at91/at91rm9200_devices.c|2 
 arch/arm/mach-at91/at91sam9260_devices.c   |2 
 arch/arm/mach-at91/at91sam9261_devices.c   |2 
 arch/arm/mach-at91/at91sam9263_devices.c   |2 
 arch/arm/mach-at91/at91sam9g45_devices.c   |   12 
 arch/arm/mach-davinci/dm644x.c |3 
 arch/arm/mach-highbank/system.c|3 
 arch/arm/mach-imx/clk-gate2.c  |2 
 arch/arm/mach-imx/ehci-imx25.c |2 
 arch/arm/mach-imx/ehci-imx35.c |2 
 arch/arm/mach-omap2/board-igep0020.c   |5 
 arch/arm/mach-omap2/clockdomains44xx_data.c|2 
 arch/arm/mach-omap2/devices.c  |   79 +
 arch/arm/mach-omap2/omap_hwmod.c   |   63 +
 arch/arm/mach-omap2/omap_hwmod_44xx_data.c |   36 
 arch/arm/mach-omap2/twl-common.c   |3 
 arch/arm/mach-omap2/vc.c   |2 
 arch/arm/mach-pxa/hx4700.c |8 
 arch/arm/mach-pxa/spitz_pm.c   |8 
 arch/arm/mm/alignment.c|2 
 arch/arm/plat-omap/include/plat/omap_hwmod.h   |6 
 arch/arm/tools/Makefile|2 
 arch/arm/vfp/vfpmodule.c   |9 
 arch/arm/xen/enlighten.c   |   11 
 arch/arm/xen/hypercall.S   |   14 
 arch/arm64/Kconfig |1 
 arch/arm64/include/asm/elf.h   |5 
 arch/arm64/include/asm/fpsimd.h|5 
 arch/arm64/include/asm/io.h|   10 
 arch/arm64/include/asm/pgtable-hwdef.h |6 
 arch/arm64/include/asm/pgtable.h   |   40 -
 arch/arm64/include/asm/processor.h |2 
 arch/arm64/include/asm/unistd.h|1 
 arch/arm64/kernel/perf_event.c |   10 
 arch/arm64/kernel/process.c|   18 
 arch/arm64/kernel/smp.c|3 
 arch/arm64/mm/init.c   |2 
 arch/frv/Kconfig   |1 
 arch/frv/boot/Makefile |   10 
 arch/frv/include/asm/unistd.h  |1 
 arch/frv/kernel/entry.S|   28 
 arch/frv/kernel/process.c  |5 
 arch/frv/mb93090-mb00/pci-dma-nommu.c  |1 
 arch/h8300/include/asm/cache.h |3 
 arch/ia64/mm/init.c|1 
 arch/m68k/include/asm/signal.h |6 
 arch/mips/cavium-octeon/executive/cvmx-l2c.c   |  900 
 arch/unicore32/include/asm/byteorder.h |   24 
 arch

[PATCH 2/5] Expand the steal time msr to also contain the consigned time.

2012-11-26 Thread Michael Wolf

Add a consigned field.  This field will hold the time lost due to capping or 
overcommit.
The rest of the time will still show up in the steal-time field.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/paravirt.h   |4 ++--
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/kvm.c |7 ++-
 kernel/sched/core.c   |   10 +-
 kernel/sched/cputime.c|2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..a5f9f30 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
 {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 142236e..5d4fc8b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,7 +95,7 @@ struct pv_lazy_ops {
 
 struct pv_time_ops {
unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
+   void (*steal_clock)(int cpu, unsigned long long *steal);
unsigned long (*get_tsc_khz)(void);
 };
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..ac357b3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -372,9 +372,8 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static u64 kvm_steal_clock(int cpu)
+static void kvm_steal_clock(int cpu, u64 *steal)
 {
-   u64 steal;
struct kvm_steal_time *src;
int version;
 
@@ -382,11 +381,9 @@ static u64 kvm_steal_clock(int cpu)
do {
version = src->version;
rmb();
-   steal = src->steal;
+   *steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
-
-   return steal;
 }
 
 void kvm_disable_steal_time(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c2e077c..b21d92d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -748,6 +748,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
  */
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || 
defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
s64 steal = 0, irq_delta = 0;
+   u64 consigned = 0;
 #endif
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
@@ -776,8 +777,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
u64 st;
+   u64 cs;
 
-   steal = paravirt_steal_clock(cpu_of(rq));
+   paravirt_steal_clock(cpu_of(rq), &steal, &consigned);
+   /*
+* since we are not assigning the steal time to cpustats
+* here, just combine the steal and consigned times to
+* do the rest of the calculations.
+*/
+   steal += consigned;
steal -= rq->prev_steal_time_rq;
 
if (unlikely(steal > delta))
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 8d859da..593b647 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -275,7 +275,7 @@ static __always_inline bool steal_account_process_tick(void)
if (static_key_false(¶virt_steal_enabled)) {
u64 steal, st = 0;
 
-   steal = paravirt_steal_clock(smp_processor_id());
+   paravirt_steal_clock(smp_processor_id(), &steal);
steal -= this_rq()->prev_steal_time;
 
st = steal_ticks(steal);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/5] Add the code to send the consigned time from the host to the guest

2012-11-26 Thread Michael Wolf

Add the code to send the consigned time from the host to the guest.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/include/asm/kvm_para.h |3 ++-
 arch/x86/include/asm/paravirt.h |4 ++--
 arch/x86/kernel/kvm.c   |3 ++-
 arch/x86/kernel/paravirt.c  |4 ++--
 arch/x86/kvm/x86.c  |2 ++
 include/linux/kernel_stat.h |1 +
 kernel/sched/cputime.c  |   21 +++--
 kernel/sched/sched.h|2 ++
 9 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..434d378 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -426,6 +426,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
+   u64 accum_consigned;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index eb3e9d8..1763369 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -42,9 +42,10 @@
 
 struct kvm_steal_time {
__u64 steal;
+   __u64 consigned;
__u32 version;
__u32 flags;
-   __u32 pad[12];
+   __u32 pad[10];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a5f9f30..d39e8d0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
+   PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ac357b3..4439a5c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -372,7 +372,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static void kvm_steal_clock(int cpu, u64 *steal)
+static void kvm_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
struct kvm_steal_time *src;
int version;
@@ -382,6 +382,7 @@ static void kvm_steal_clock(int cpu, u64 *steal)
version = src->version;
rmb();
*steal = src->steal;
+   *consigned = src->consigned;
rmb();
} while ((version & 1) || (version != src->version));
 }
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..3797683 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -207,9 +207,9 @@ static void native_flush_tlb_single(unsigned long addr)
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-static u64 native_steal_clock(int cpu)
+static void native_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   return 0;
+   *steal = *consigned = 0;
 }
 
 /* These are in entry.S */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1eefebe..683b531 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1565,8 +1565,10 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
return;
 
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
+   vcpu->arch.st.steal.consigned += vcpu->arch.st.accum_consigned;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
+   vcpu->arch.st.accum_consigned = 0;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index e5978b0..91afaa3 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -126,6 +126,7 @@ extern unsigned long long task_delta_exec(struct 
task_struct *);
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, 
cputime_t);
 extern void account_steal_time(cputime_t);
+extern void account_consigned_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
 extern void account_process_tick(struct task_struct *, int user);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 593b647..53bd0be 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -244,6 +244,18 @@ void account_system_time(struct task_struct *p, int 
hardirq_offset,
 }
 
 /*
+ * This accounts for the time that is split out of steal time.
+ *

[PATCH 4/5] Add a timer to allow the separation of consigned from steal time.

2012-11-26 Thread Michael Wolf

Add a timer to the host.  This will define the period.  During a period
the first n ticks will go into the consigned bucket.  Any other ticks that
occur within the period will be placed in the stealtime bucket.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |   10 +
 arch/x86/include/asm/paravirt.h |2 +-
 arch/x86/kvm/x86.c  |   42 ++-
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 434d378..4794c95 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -41,6 +41,8 @@
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
 
+#define KVM_STEAL_TIMER_DELAY 1UL
+
 #define CR0_RESERVED_BITS   \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
@@ -353,6 +355,14 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;
 
/*
+* timer used to determine if the time should be counted as
+* steal time or consigned time.
+*/
+   struct hrtimer steal_timer;
+   u64 current_consigned;
+   u64 consigned_limit;
+
+   /*
 * Paging state of the vcpu
 *
 * If the vcpu runs in guest mode with two level paging this still saves
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d39e8d0..6db79f9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,7 +196,7 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
+static inline void paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 683b531..c91f4c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1546,13 +1546,32 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 {
u64 delta;
+   u64 steal_delta;
+   u64 consigned_delta;
 
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
-   vcpu->arch.st.accum_steal = delta;
+
+   /* split the delta into steal and consigned */
+   if (vcpu->arch.current_consigned < vcpu->arch.consigned_limit) {
+   vcpu->arch.current_consigned += delta;
+   if (vcpu->arch.current_consigned > vcpu->arch.consigned_limit) {
+   steal_delta = vcpu->arch.current_consigned
+   -  vcpu->arch.consigned_limit;
+   consigned_delta = delta - steal_delta;
+   } else {
+   consigned_delta = delta;
+   steal_delta = 0;
+   }
+   } else {
+   consigned_delta = 0;
+   steal_delta = delta;
+   }
+   vcpu->arch.st.accum_steal = steal_delta;
+   vcpu->arch.st.accum_consigned = consigned_delta;
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -6203,11 +6222,25 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
 
 struct static_key kvm_no_apic_vcpu __read_mostly;
 
+enum hrtimer_restart steal_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   ktime_t now;
+
+   vcpu = container_of(data, struct kvm_vcpu, arch.steal_timer);
+   vcpu->arch.current_consigned = 0;
+   now = ktime_get();
+   hrtimer_forward(&vcpu->arch.steal_timer, now,
+   ktime_set(0, KVM_STEAL_TIMER_DELAY));
+   return HRTIMER_RESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
struct page *page;
struct kvm *kvm;
int r;
+   ktime_t ktime;
 
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
@@ -6251,6 +6284,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
+   /* Initialize and start a timer to capture steal and consigned time */
+   hrtimer_init(&vcpu->arch.steal_timer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   vcpu->arch.steal_timer.function = &steal_timer_fn;
+   ktime = ktime_set(0, KVM_STEAL_TIMER_DELAY);
+   hrtimer_start(&vcpu->arch.steal_timer, ktime, HRTIMER_MODE_REL);
 
return 0;
 fail_free_mce_banks:
@@ -6269,6 +6308,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
int idx;
 
+

Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-27 Thread Michael Wolf


On 11/27/2012 02:48 AM, Glauber Costa wrote:

Hi,

On 11/27/2012 12:36 AM, Michael Wolf wrote:

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  The consignment limit passed to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

If you submit this again, please include a version number in your series.
Will do.  The patchset was sent twice yesterday by mistake.  Got an 
error the first time and didn't

think the patches went out.  This has been corrected.


It would also be helpful to include a small changelog about what changed
between last version and this version, so we could focus on that.
yes, will do that.  When I took the RFC off the patches I was looking at 
it as a new patchset which was

a mistake.  I will make sure to add a changelog when I submit again.


As for the rest, I answered your previous two submissions saying I don't
agree with the concept. If you hadn't changed anything, resending it
won't change my mind.

I could of course, be mistaken or misguided. But I had also not seen any
wave of support in favor of this previously, so basically I have no new
data to make me believe I should see it any differently.

Let's try this again:

* Rik asked you in your last submission how does ppc handle this. You
said, and I quote: "In the case of lpar on POWER systems they simply
report steal time and do not alter it in any way.
They do however report how much processor is assigned to the partition
and that information is in /proc/ppc64/lparcfg."
Yes, but we still get questions from users asking what is steal time? 
why am I seeing this?


Now, that is a *way* more sensible thing to do. Much more. "Confusing
users" is something extremely subjective. This is specially true about
concepts that are know for quite some time, like steal time. If you out
of a sudden change the meaning of this, it is sure to confuse a lot more
users than it would clarify.
Something like this could certainly be done.  But when I was submitting 
the patch set as
an RFC then qemu was passing a cpu percentage that would be used by the 
guest kernel
to adjust the steal time. This percentage was being stored on the guest 
as a sysctl value.

Avi stated he didn't like that kind of coupling, and that the value
could get out of sync.  Anthony stated "The guest shouldn't need to know 
it's entitlement. Or at least, it's up to a management tool to report 
that in a way that's meaningful for the guest."


So perhaps I misunderstood what they were suggesting, but I took it to 
mean that they did not
want the guest to know what the entitlement was.  That the host should 
take care of it and just
report the already adjusted data to the guest.  So in this version of 
the code the host would use a set
period for a timer and be passed essentially a number of ticks of 
expected steal time.  The host
would then use the timer to break out the steal time into consigned and 
steal buckets which would be

reported to the guest.

Both the consigned and the steal would be reported via /proc/stat. So 
anyone needing to see total
time away could add the two fields together.  The user, however, when 
using tools like top or vmstat

would see the usage based on what the guest is entitled to.

Do you have suggestions for how I can build consensus around one of the 
two approaches?









---

Michael Wolf (5):
   Alter the amount of steal time reported by the guest.
   Expand the steal time msr to also contain the consigned time.
   Add the code to send the consigned time from the host to the guest
   Add a timer to allow the separation of consigned from steal time.
   Add an ioctl to communicate the consign limit to the host.


  arch/x86/include/asm/kvm_host.h   |   11 +++
  arch/x86/include/asm/kvm_para.h   |3 +-
  arch/x86/include/asm/paravirt.h   |4 +--
  arch/x86/include/asm/paravirt_types.h |2 +
  arch/x86/kernel/kvm.c |8 ++---
  arch/x86/kernel/paravirt.c|4 +--
  arch/x86/kvm/x86.c|   50 -
  fs/proc/stat.c|9 +-
  include/linux/kernel_stat.h   |2 +
  include/linux/kvm_host.h  |2 +
  include/uapi/linux/kvm.h  |2 +
  kernel/sched/core.c   |   10 ++-
  kernel/sched/cputime.c|   21 +-
  kernel/sched/sched.h  |2 +
  virt/kvm/kvm_main.c

Re: [PATCH 2/5] Expand the steal time msr to also contain the consigned time.

2012-11-28 Thread Michael Wolf


On 11/27/2012 03:03 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Nov 26, 2012 at 02:36:45PM -0600, Michael Wolf wrote:

Add a consigned field.  This field will hold the time lost due to capping or 
overcommit.
The rest of the time will still show up in the steal-time field.

Signed-off-by: Michael Wolf 
---
  arch/x86/include/asm/paravirt.h   |4 ++--
  arch/x86/include/asm/paravirt_types.h |2 +-
  arch/x86/kernel/kvm.c |7 ++-
  kernel/sched/core.c   |   10 +-
  kernel/sched/cputime.c|2 +-
  5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..a5f9f30 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
  extern struct static_key paravirt_steal_enabled;
  extern struct static_key paravirt_steal_rq_enabled;
  
-static inline u64 paravirt_steal_clock(int cpu)

+static inline u64 paravirt_steal_clock(int cpu, u64 *steal)

So its u64 here.

  {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
  }
  
  static inline unsigned long long paravirt_read_pmc(int counter)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 142236e..5d4fc8b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,7 +95,7 @@ struct pv_lazy_ops {
  
  struct pv_time_ops {

unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
+   void (*steal_clock)(int cpu, unsigned long long *steal);

But not u64 here? Any particular reason?

It should be void everywhere, thanks for catching that I will change the 
code.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-28 Thread Michael Wolf


On 11/27/2012 05:24 PM, Marcelo Tosatti wrote:

On Mon, Nov 26, 2012 at 02:36:24PM -0600, Michael Wolf wrote:

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.

The definition of stolen time is 'time during which the virtual CPU is
runnable to not running'. Overcommit is the main scenario which steal
time helps to detect.

Can you describe the 'capped' case?
In the capped case, the time that the guest spends waiting due to it 
having used its full allottment of time shows up as steal time.  The way 
my patchset currently stands is that you would set up the
bandwidth control and you would have to pass it a  matching value from 
qemu.  In the future, it would
be possible to have something parse the bandwidth setting and 
automatically adjust the setting in the

host used for steal time reporting.



  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  The consignment limit passed to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

---

Michael Wolf (5):
   Alter the amount of steal time reported by the guest.
   Expand the steal time msr to also contain the consigned time.
   Add the code to send the consigned time from the host to the guest
   Add a timer to allow the separation of consigned from steal time.
   Add an ioctl to communicate the consign limit to the host.


  arch/x86/include/asm/kvm_host.h   |   11 +++
  arch/x86/include/asm/kvm_para.h   |3 +-
  arch/x86/include/asm/paravirt.h   |4 +--
  arch/x86/include/asm/paravirt_types.h |2 +
  arch/x86/kernel/kvm.c |8 ++---
  arch/x86/kernel/paravirt.c|4 +--
  arch/x86/kvm/x86.c|   50 -
  fs/proc/stat.c|9 +-
  include/linux/kernel_stat.h   |2 +
  include/linux/kvm_host.h  |2 +
  include/uapi/linux/kvm.h  |2 +
  kernel/sched/core.c   |   10 ++-
  kernel/sched/cputime.c|   21 +-
  kernel/sched/sched.h  |2 +
  virt/kvm/kvm_main.c   |7 +
  15 files changed, 120 insertions(+), 17 deletions(-)

--
Signature

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-28 Thread Michael Wolf


On 11/28/2012 02:45 AM, Glauber Costa wrote:

On 11/27/2012 07:10 PM, Michael Wolf wrote:

On 11/27/2012 02:48 AM, Glauber Costa wrote:

Hi,

On 11/27/2012 12:36 AM, Michael Wolf wrote:

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will
separate
the consigned time from the steal time.  The consignment limit passed
to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

If you submit this again, please include a version number in your series.

Will do.  The patchset was sent twice yesterday by mistake.  Got an
error the first time and didn't
think the patches went out.  This has been corrected.

It would also be helpful to include a small changelog about what changed
between last version and this version, so we could focus on that.

yes, will do that.  When I took the RFC off the patches I was looking at
it as a new patchset which was
a mistake.  I will make sure to add a changelog when I submit again.

As for the rest, I answered your previous two submissions saying I don't
agree with the concept. If you hadn't changed anything, resending it
won't change my mind.

I could of course, be mistaken or misguided. But I had also not seen any
wave of support in favor of this previously, so basically I have no new
data to make me believe I should see it any differently.

Let's try this again:

* Rik asked you in your last submission how does ppc handle this. You
said, and I quote: "In the case of lpar on POWER systems they simply
report steal time and do not alter it in any way.
They do however report how much processor is assigned to the partition
and that information is in /proc/ppc64/lparcfg."

Yes, but we still get questions from users asking what is steal time?
why am I seeing this?

Now, that is a *way* more sensible thing to do. Much more. "Confusing
users" is something extremely subjective. This is specially true about
concepts that are know for quite some time, like steal time. If you out
of a sudden change the meaning of this, it is sure to confuse a lot more
users than it would clarify.

Something like this could certainly be done.  But when I was submitting
the patch set as
an RFC then qemu was passing a cpu percentage that would be used by the
guest kernel
to adjust the steal time. This percentage was being stored on the guest
as a sysctl value.
Avi stated he didn't like that kind of coupling, and that the value
could get out of sync.  Anthony stated "The guest shouldn't need to know
it's entitlement. Or at least, it's up to a management tool to report
that in a way that's meaningful for the guest."

So perhaps I misunderstood what they were suggesting, but I took it to
mean that they did not
want the guest to know what the entitlement was.  That the host should
take care of it and just
report the already adjusted data to the guest.  So in this version of
the code the host would use a set
period for a timer and be passed essentially a number of ticks of
expected steal time.  The host
would then use the timer to break out the steal time into consigned and
steal buckets which would be
reported to the guest.

Both the consigned and the steal would be reported via /proc/stat. So
anyone needing to see total
time away could add the two fields together.  The user, however, when
using tools like top or vmstat
would see the usage based on what the guest is entitled to.

Do you have suggestions for how I can build consensus around one of the
two approaches?


Before I answer this, can you please detail which mechanism are you
using to enforce the entitlement? Is it the cgroup cpu controller, or
something else?
It is setup using cpu overcommit.  But the request was for something 
that would work in both

the overcommit environment as well as when hard capping is being used.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-29 Thread Michael Wolf


On 11/28/2012 02:55 PM, Glauber Costa wrote:

On 11/28/2012 10:43 PM, Michael Wolf wrote:

On 11/27/2012 05:24 PM, Marcelo Tosatti wrote:

On Mon, Nov 26, 2012 at 02:36:24PM -0600, Michael Wolf wrote:

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.

The definition of stolen time is 'time during which the virtual CPU is
runnable to not running'. Overcommit is the main scenario which steal
time helps to detect.

Can you describe the 'capped' case?

In the capped case, the time that the guest spends waiting due to it
having used its full allottment of time shows up as steal time.  The way
my patchset currently stands is that you would set up the
bandwidth control and you would have to pass it a  matching value from
qemu.  In the future, it would
be possible to have something parse the bandwidth setting and
automatically adjust the setting in the
host used for steal time reporting.

Ok, so correct me if I am wrong, but I believe you would be using
something like the bandwidth capper in the cpu cgroup to set those
entitlements, right?

Yes, in the context above I'm referring to the cfs bandwidth control.


Some time has passed since I last looked into it, but IIRC, after you
get are out of your quota, you should be out of the runqueue. In the
lovely world of KVM, we approximate steal time as runqueue time:

arch/x86/kvm/x86.c:
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
vcpu->arch.st.accum_steal = delta;

include/linux/sched.h:
unsigned long long run_delay; /* time spent waiting on a runqueue */

So if you are out of the runqueue, you won't get steal time accounted,
and then I truly fail to understand what you are doing.
So I looked at something like this in the past.  To make sure things 
haven't changed
I set up a cgroup on my test server running a kernel built from the 
latest tip tree.


[root]# cat cpu.cfs_quota_us
5
[root]# cat cpu.cfs_period_us
10
[root]# cat cpuset.cpus
1
[root]# cat cpuset.mems
0

Next I put the PID from the cpu thread into tasks.  When I start a 
script that will hog the cpu I see the

following in top on the guest
Cpu(s):  1.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 48.3%hi, 0.0%si, 
49.8%st


So the steal time here is in line with the bandwidth control settings.


In case I am wrong, and run_delay also includes the time you can't run
because you are out of capacity, then maybe what we should do, is to
just subtract it from run_delay in kvm/x86.c before we pass it on. In
summary:
About a year ago I was playing with this patch.  It is out of date now 
but will give you

an idea of what I was looking at.

 kernel/sched_fair.c  |4 ++--
 kernel/sched_stats.h |7 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5c9e679..a837e4e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -707,7 +707,7 @@ account_entity_dequeue(struct cfs_rq *cfs_rq,
struct sched_entity *se)

 #ifdef CONFIG_FAIR_GROUP_SCHED
 /* we need this in update_cfs_load and load-balance functions below */
-static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
+inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
 # ifdef CONFIG_SMP
 static void update_cfs_rq_load_contribution(struct cfs_rq *cfs_rq,
 int global_update)
@@ -1420,7 +1420,7 @@ static inline int cfs_rq_throttled(struct
cfs_rq *cfs_rq)
 }

 /* check whether cfs_rq, or any parent, is throttled */
-static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
+inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 {
 return cfs_rq->throttle_count;
 }
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 87f9e36..e30ff26 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -213,14 +213,19 @@ static inline void sched_info_queued(struct
task_struct *t)
  * sched_info_queued() to mark that it has now again started waiting on
  * the runqueue.
  */
+extern inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
 static inline void sched_info_depart(struct task_struct *t)
 {
+struct task_group *tg = task_group(t);
+struct cfs_rq *cfs_rq;
 unsigned long long delta = task_rq(t)->clock -
 t->sched_info.last_arrival;

+cfs_rq = tg->cfs_rq[smp_processor_id()];
 rq_sched_info_depart(task_rq(t), delta);

-if (t->state == TASK_RUNNING)
+
+if (t->state == TASK_RUNNING && !throttled_hierarchy(cfs_rq))
 sched_info_queued(t);
 }


So then the steal time did not show on the guest.  You have no value 
that needs to be passed

around.  What I did not like about this approach was
* only works for cfs bandwidth control.  If another type of hard limit 
was added to the kernel

   the code wou

[PATCH 0/4] Alter steal-time reporting in the guest

2013-02-05 Thread Michael Wolf

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  Tthe steal time will only be altered
if hard limits (cfs bandwidth control) is used.  The period and the quota used
to separate the consigned time (expected steal) from the steal time are taken
from the cfs bandwidth control settings. Any other steal time accruing during
that period will show as the traditional steal time.

Changes from V2:
* Dropped the ioctl that allowed qemu to send the entitlement value to
  the guest.
* Added code to get the entitlement period and quota from cfs bandwidth.

Changes from V1:
* Removed the steal time allowed percentage from the guest
* Moved the separation of consigned (expected steal) and steal time to the
  host.
* No longer include a sysctl interface.

---

Michael Wolf (4):
  Alter the amount of steal time reported by the guest.
  Expand the steal time msr to also contain the consigned time.
  Add the code to send the consigned time from the host to the guest
  Add a timer to allow the separation of consigned from steal time.


 arch/x86/include/asm/kvm_host.h   |   10 +
 arch/x86/include/asm/paravirt.h   |4 +-
 arch/x86/include/asm/paravirt_types.h |2 +
 arch/x86/include/uapi/asm/kvm_para.h  |3 +-
 arch/x86/kernel/kvm.c |8 ++--
 arch/x86/kernel/paravirt.c|4 +-
 arch/x86/kvm/x86.c|   64 -
 fs/proc/stat.c|9 -
 include/linux/kernel_stat.h   |2 +
 kernel/sched/core.c   |   30 +++
 kernel/sched/cputime.c|   21 ++-
 kernel/sched/sched.h  |2 +
 12 files changed, 142 insertions(+), 17 deletions(-)

-- 
Signature

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/4] Alter the amount of steal time reported by the guest.

2013-02-05 Thread Michael Wolf

Modify the amount of stealtime that the kernel reports via the /proc interface.
Steal time will now be broken down into steal_time and consigned_time.
Consigned_time will represent the amount of time that is expected to be lost
due to overcommitment of the physical cpu or by using cpu hard capping.

Signed-off-by: Michael Wolf 
---
 fs/proc/stat.c  |9 +++--
 include/linux/kernel_stat.h |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index e296572..cb7fe80 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -82,7 +82,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
-   u64 guest, guest_nice;
+   u64 guest, guest_nice, consign;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -90,10 +90,11 @@ static int show_stat(struct seq_file *p, void *v)
 
user = nice = system = idle = iowait =
irq = softirq = steal = 0;
-   guest = guest_nice = 0;
+   guest = guest_nice = consign = 0;
getboottime(&boottime);
jif = boottime.tv_sec;
 
+
for_each_possible_cpu(i) {
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -105,6 +106,7 @@ static int show_stat(struct seq_file *p, void *v)
steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign += kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
sum += kstat_cpu_irqs_sum(i);
sum += arch_irq_stat_cpu(i);
 
@@ -128,6 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
 
for_each_online_cpu(i) {
@@ -142,6 +145,7 @@ static int show_stat(struct seq_file *p, void *v)
steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign = kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
@@ -153,6 +157,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 66b7078..e352052 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -28,6 +28,7 @@ enum cpu_usage_stat {
CPUTIME_STEAL,
CPUTIME_GUEST,
CPUTIME_GUEST_NICE,
+   CPUTIME_CONSIGN,
NR_STATS,
 };
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/4] Expand the steal time msr to also contain the consigned time.

2013-02-05 Thread Michael Wolf

Expand the steal time msr to also contain the consigned time.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/paravirt.h   |4 ++--
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/kvm.c |7 ++-
 kernel/sched/core.c   |   10 +-
 kernel/sched/cputime.c|2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5edd174..9b753ea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu)
+static inline void paravirt_steal_clock(int cpu, u64 *steal)
 {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 142236e..5d4fc8b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,7 +95,7 @@ struct pv_lazy_ops {
 
 struct pv_time_ops {
unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
+   void (*steal_clock)(int cpu, unsigned long long *steal);
unsigned long (*get_tsc_khz)(void);
 };
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index fe75a28..89e5468 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -386,9 +386,8 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static u64 kvm_steal_clock(int cpu)
+static void kvm_steal_clock(int cpu, u64 *steal)
 {
-   u64 steal;
struct kvm_steal_time *src;
int version;
 
@@ -396,11 +395,9 @@ static u64 kvm_steal_clock(int cpu)
do {
version = src->version;
rmb();
-   steal = src->steal;
+   *steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
-
-   return steal;
 }
 
 void kvm_disable_steal_time(void)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26058d0..efc2652 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -757,6 +757,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
  */
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || 
defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
s64 steal = 0, irq_delta = 0;
+   u64 consigned = 0;
 #endif
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
@@ -785,8 +786,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
if (static_key_false((¶virt_steal_rq_enabled))) {
u64 st;
+   u64 cs;
 
-   steal = paravirt_steal_clock(cpu_of(rq));
+   paravirt_steal_clock(cpu_of(rq), &steal, &consigned);
+   /*
+* since we are not assigning the steal time to cpustats
+* here, just combine the steal and consigned times to
+* do the rest of the calculations.
+*/
+   steal += consigned;
steal -= rq->prev_steal_time_rq;
 
if (unlikely(steal > delta))
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 825a956..0b4f1ec 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -275,7 +275,7 @@ static __always_inline bool steal_account_process_tick(void)
if (static_key_false(¶virt_steal_enabled)) {
u64 steal, st = 0;
 
-   steal = paravirt_steal_clock(smp_processor_id());
+   paravirt_steal_clock(smp_processor_id(), &steal);
steal -= this_rq()->prev_steal_time;
 
st = steal_ticks(steal);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/4] Add the code to send the consigned time from the host to the guest

2013-02-05 Thread Michael Wolf

Change the paravirt calls that retrieve the steal-time information
from the host.  Add to it getting the consigned value as well as
the steal time.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h  |1 +
 arch/x86/include/asm/paravirt.h  |4 ++--
 arch/x86/include/uapi/asm/kvm_para.h |3 ++-
 arch/x86/kernel/kvm.c|3 ++-
 arch/x86/kernel/paravirt.c   |4 ++--
 arch/x86/kvm/x86.c   |2 ++
 include/linux/kernel_stat.h  |1 +
 kernel/sched/cputime.c   |   21 +++--
 kernel/sched/sched.h |2 ++
 9 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc87b65..fe5a37b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -428,6 +428,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
+   u64 accum_consigned;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 9b753ea..77f05e7 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline void paravirt_steal_clock(int cpu, u64 *steal)
+static inline void paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
+   PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..55d617f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -42,9 +42,10 @@
 
 struct kvm_steal_time {
__u64 steal;
+   __u64 consigned;
__u32 version;
__u32 flags;
-   __u32 pad[12];
+   __u32 pad[10];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 89e5468..fb52f8a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -386,7 +386,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static void kvm_steal_clock(int cpu, u64 *steal)
+static void kvm_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
struct kvm_steal_time *src;
int version;
@@ -396,6 +396,7 @@ static void kvm_steal_clock(int cpu, u64 *steal)
version = src->version;
rmb();
*steal = src->steal;
+   *consigned = src->consigned;
rmb();
} while ((version & 1) || (version != src->version));
 }
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 17fff18..3797683 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -207,9 +207,9 @@ static void native_flush_tlb_single(unsigned long addr)
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-static u64 native_steal_clock(int cpu)
+static void native_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   return 0;
+   *steal = *consigned = 0;
 }
 
 /* These are in entry.S */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c243b81..51b63d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1867,8 +1867,10 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
return;
 
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
+   vcpu->arch.st.steal.consigned += vcpu->arch.st.accum_consigned;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
+   vcpu->arch.st.accum_consigned = 0;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index e352052..f58ed0f 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -126,6 +126,7 @@ extern unsigned long long task_delta_exec(struct 
task_struct *);
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, 
cputime_t);
 extern void account_steal_time(cputime_t);
+extern void account_consigned_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0b4f1ec..2a2d4be 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -244,6 +244,18 @@ void account_

[PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-02-05 Thread Michael Wolf

Add a helper routine to scheduler/core.c to allow the kvm module
to retrieve the cpu hardlimit settings.  The values will be used
to set up a timer that is used to separate the consigned from the
steal time.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |9 ++
 arch/x86/kvm/x86.c  |   62 ++-
 kernel/sched/core.c |   20 +
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fe5a37b..9518613 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -355,6 +355,15 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;
 
/*
+* timer used to determine if the time should be counted as
+* steal time or consigned time.
+*/
+   struct hrtimer steal_timer;
+   u64 current_consigned;
+   s64 consigned_quota;
+   s64 consigned_period;
+
+   /*
 * Paging state of the vcpu
 *
 * If the vcpu runs in guest mode with two level paging this still saves
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 51b63d1..79d144d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1848,13 +1848,32 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 {
u64 delta;
+   u64 steal_delta;
+   u64 consigned_delta;
 
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
-   vcpu->arch.st.accum_steal = delta;
+
+   /* split the delta into steal and consigned */
+   if (vcpu->arch.current_consigned < vcpu->arch.consigned_quota) {
+   vcpu->arch.current_consigned += delta;
+   if (vcpu->arch.current_consigned > vcpu->arch.consigned_quota) {
+   steal_delta = vcpu->arch.current_consigned
+   -  vcpu->arch.consigned_quota;
+   consigned_delta = delta - steal_delta;
+   } else {
+   consigned_delta = delta;
+   steal_delta = 0;
+   }
+   } else {
+   consigned_delta = 0;
+   steal_delta = delta;
+   }
+   vcpu->arch.st.accum_steal = steal_delta;
+   vcpu->arch.st.accum_consigned = consigned_delta;
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -2629,8 +2648,35 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
!(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY);
 }
 
+extern int sched_use_hard_capping(int cpuid, int num_vcpus, s64 *quota,
+   s64 *period);
+enum hrtimer_restart steal_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm *kvm;
+   int num_vcpus;
+   ktime_t now;
+
+   vcpu = container_of(data, struct kvm_vcpu, arch.steal_timer);
+   kvm = vcpu->kvm;
+   num_vcpus = atomic_read(&kvm->online_vcpus);
+   sched_use_hard_capping(vcpu->cpu, num_vcpus,
+   &vcpu->arch.consigned_quota,
+   &vcpu->arch.consigned_period);
+   vcpu->arch.current_consigned = 0;
+   now = ktime_get();
+   hrtimer_forward(&vcpu->arch.steal_timer, now,
+   ktime_set(0, vcpu->arch.consigned_period));
+
+   return HRTIMER_RESTART;
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+   struct kvm *kvm;
+   int num_vcpus;
+   ktime_t ktime;
+
/* Address WBINVD may be executed by guest */
if (need_emulate_wbinvd(vcpu)) {
if (kvm_x86_ops->has_wbinvd_exit())
@@ -2670,6 +2716,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;
}
+   /* Initialize and start a timer to capture steal and consigned time */
+   kvm = vcpu->kvm;
+   num_vcpus = atomic_read(&kvm->online_vcpus);
+   num_vcpus = (num_vcpus == 0) ? 1 : num_vcpus;
+   sched_use_hard_capping(vcpu->cpu, num_vcpus,
+   &vcpu->arch.consigned_quota,
+   &vcpu->arch.consigned_period);
+   hrtimer_init(&vcpu->arch.steal_timer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   vcpu->arch.steal_timer.function = &steal_timer_fn;
+   ktime = ktime_set(0, vcpu->arch.consigned_period);
+   hrtimer_start(&vcpu->arch.steal_timer, ktime, HRTIMER_MODE_REL);
 
accumulate_steal_time(vcpu);
kvm_make_request(KV

Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-02-06 Thread Michael Wolf


On 02/06/2013 08:36 AM, Glauber Costa wrote:

On 02/06/2013 01:49 AM, Michael Wolf wrote:

Add a helper routine to scheduler/core.c to allow the kvm module
to retrieve the cpu hardlimit settings.  The values will be used
to set up a timer that is used to separate the consigned from the
steal time.

Sorry: What is the business of a timer in here?
Whenever we read steal time, we know how much time has passed and with
that information we can know the entitlement for the period. This breaks
if we suspend, but we know that we suspended, so this is not a problem.
I may be missing something, but how do we know how much time has 
passed?  That is why
I had the timer in there.  I will go look again at the code but I 
thought the data was collected
as ticks and passed at random times.  The ticks are also accumulating so 
we are looking at the

difference in the count between reads.



Everything bigger the entitlement is steal time.
I agree provided I know the amount of total time that the steal time was 
accumulated.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/4] Expand the steal time msr to also contain the consigned time.

2013-02-07 Thread Michael Wolf


On 02/06/2013 03:14 PM, Rik van Riel wrote:

On 02/05/2013 04:49 PM, Michael Wolf wrote:

Expand the steal time msr to also contain the consigned time.

Signed-off-by: Michael Wolf 
---
  arch/x86/include/asm/paravirt.h   |4 ++--
  arch/x86/include/asm/paravirt_types.h |2 +-
  arch/x86/kernel/kvm.c |7 ++-
  kernel/sched/core.c   |   10 +-
  kernel/sched/cputime.c|2 +-
  5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h 
b/arch/x86/include/asm/paravirt.h

index 5edd174..9b753ea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
  extern struct static_key paravirt_steal_enabled;
  extern struct static_key paravirt_steal_rq_enabled;

-static inline u64 paravirt_steal_clock(int cpu)
+static inline void paravirt_steal_clock(int cpu, u64 *steal)
  {
-return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
  }


This may be a stupid question, but what happens if a KVM
guest with this change, runs on a kernel that still has
the old steal time interface?

What happens if the host has the new steal time interface,
but the guest uses the old interface?

Will both cases continue to work as expected with your
patch series?

If so, could you document (in the source code) why things
continue to work?


I will test the scenarios you suggest and will report back the results.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3/4] Add the code to send the consigned time from the host to the guest

2013-02-07 Thread Michael Wolf


On 02/06/2013 03:18 PM, Rik van Riel wrote:

On 02/05/2013 04:49 PM, Michael Wolf wrote:

Change the paravirt calls that retrieve the steal-time information
from the host.  Add to it getting the consigned value as well as
the steal time.

Signed-off-by: Michael Wolf 


diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h

index 06fdbd9..55d617f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -42,9 +42,10 @@

  struct kvm_steal_time {
  __u64 steal;
+__u64 consigned;
  __u32 version;
  __u32 flags;
-__u32 pad[12];
+__u32 pad[10];
  };


The function kvm_register_steal_time passes the address of such
a structure to the host kernel, which then does something with
it.

Could running a guest with the above patch, on top of a host
with the old code, result in the values for "version" and
"flags" being written into "consigned"?

yes, good point.


Could that result in confusing the guest kernel to no end,
and generally breaking things?


Ok I will move the consigned field to be after the flags.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-02-07 Thread Michael Wolf


On 02/07/2013 02:46 AM, Glauber Costa wrote:

On 02/06/2013 10:07 PM, Michael Wolf wrote:

On 02/06/2013 08:36 AM, Glauber Costa wrote:

On 02/06/2013 01:49 AM, Michael Wolf wrote:

Add a helper routine to scheduler/core.c to allow the kvm module
to retrieve the cpu hardlimit settings.  The values will be used
to set up a timer that is used to separate the consigned from the
steal time.

Sorry: What is the business of a timer in here?
Whenever we read steal time, we know how much time has passed and with
that information we can know the entitlement for the period. This breaks
if we suspend, but we know that we suspended, so this is not a problem.

I may be missing something, but how do we know how much time has
passed?  That is why
I had the timer in there.  I will go look again at the code but I
thought the data was collected
as ticks and passed at random times.  The ticks are also accumulating so
we are looking at the
difference in the count between reads.

They can be collected at random times, but you can of course record the
time in which it happened.


ok.  Let me add a previous_read field and take out the timer.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-17 Thread Michael Wolf

On Wed, 2012-10-17 at 21:14 +0400, Glauber Costa wrote:
> On 10/17/2012 06:23 AM, Michael Wolf wrote:
> > In the case of where you have a system that is running in a
> > capped or overcommitted environment the user may see steal time
> > being reported in accounting tools such as top or vmstat.  This can
> > cause confusion for the end user.  To ease the confusion this patch set
> > adds the idea of consigned (expected steal) time.  The host will separate
> > the consigned time from the steal time.  The consignment limit passed to the
> > host will be the amount of steal time expected within a fixed period of
> > time.  Any other steal time accruing during that period will show as the
> > traditional steal time.
> > 
> > TODO:
> > * Change native_clock to take params and not return a value
> > * Change update_rq_clock_task
> > 
> > Changes from V1:
> > * Removed the steal time allowed percentage from the guest
> > * Moved the separation of consigned (expected steal) and steal time to the
> >   host.
> > * No longer include a sysctl interface.
> > 
> 
> You are showing this in the guest somewhere, but tools like top will
> still not show it. So for quite a while, it achieves nothing.
> 
> Of course this is a barrier that any new statistic has to go through. So
> while annoying, this is per-se ultimately not a blocker.
> 
> What I still fail to see, is how this is useful information to be shown
> in the guest. Honestly, if I'm in a guest VM or container, any time
> during which I am not running is time I lost. It doesn't matter if this
> was expected or not. This still seems to me as a host-side problem, to
> be solved entirely by tooling.
> 

What tools like top and vmstat will show is altered.  When I put time in
the consign bucket it does not show up in steal.  So now as long as the
system is performing as expected the user will see 100% and 0% steal.  I
added the consign field to /proc/stat so that all time accrued in the
period is accounted for and also for debugging purposes.  The user wont
care about consign and will not see it.  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-12-07 Thread Michael Wolf


On 12/05/2012 06:46 AM, Glauber Costa wrote:

I am deeply sorry.

I was busy first time I read this, so I postponed answering and ended up
forgetting.

Sorry

include/linux/sched.h:
unsigned long long run_delay; /* time spent waiting on a runqueue */

So if you are out of the runqueue, you won't get steal time accounted,
and then I truly fail to understand what you are doing.

So I looked at something like this in the past.  To make sure things
haven't changed
I set up a cgroup on my test server running a kernel built from the
latest tip tree.

[root]# cat cpu.cfs_quota_us
5
[root]# cat cpu.cfs_period_us
10
[root]# cat cpuset.cpus
1
[root]# cat cpuset.mems
0

Next I put the PID from the cpu thread into tasks.  When I start a
script that will hog the cpu I see the
following in top on the guest
Cpu(s):  1.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 48.3%hi, 0.0%si,
49.8%st

So the steal time here is in line with the bandwidth control settings.

Ok. So I was wrong in my hunch that it would be outside the runqueue,
therefore work automatically. Still, the host kernel has all the
information in cgroups.


So then the steal time did not show on the guest.  You have no value
that needs to be passed
around.  What I did not like about this approach was
* only works for cfs bandwidth control.  If another type of hard limit
was added to the kernel
the code would potentially need to change.

This is true for almost everything we have in the kernel!
It is *very* unlikely for other bandwidth control mechanism to ever
appear. If it ever does, it's *their* burden to make sure it works for
steal time (provided it is merged). Code in tree gets precedence.


Ok,  I will work on a patch that uses the cgroup information for 
bandwidth control

to separate out the time.




* This approach doesn't help if the limits are set by overcommitting the
cpus.  It is my understanding
that this is a common approach.


I can't say anything about commonality, but common or not, it is a
*crazy* approach.

When you simply overcommit, you have no way to differentiate between
intended steal time and non-intended steal time. Moreover, when you
overcommit, your cpu usage will vary over time. If two guests use the
cpu to their full power, you will have 50 % each. But if one of them
slows down, the other gets more. What is your entitlement value? How do
you define this?

And then after you define it, you end up using more than this, what is
your cpu usage? 130 %?


yes exactly you would ideally show a boosted amount of cpu.  However to 
do that
you would need to either create a new tool or modify the current 
accounting tools

such as top.

My understanding is that you are not capping in this case as much as you 
are

guaranteeing a minimum level of performance.




The only sane way to do it, is to communicate this value to the kernel
somehow. The bandwidth controller is the interface we have for that. So
everybody that wants to *intentionally* overcommit needs to communicate
this to the controller. IOW: Any sane configuration should be explicit
about your capping.


 Add an ioctl to communicate the consign limit to the host.

This definitely should go away.

More specifically, *whatever* way we use to cap the processor, the host
system will have all the information at all times.

I'm not understanding that comment.  If you are capping by simply
controlling the amount of
overcommit on the host then wouldn't you still need some value to
indicate the desired amount.

No, that is just crazy, and I don't like it a single bit.

So in the light of it: Whatever capping mechanism we have, we need to be
explicit about the expected entitlement. At this point, the kernel
already knows what it is, and needs no extra ioctls or anything like that.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 4/4] Add a timer to allow the separation of consigned from steal time.

2013-03-05 Thread Michael Wolf

Sorry for the delay in the response.  I did not see your question.

On Mon, 2013-02-18 at 20:57 -0300, Marcelo Tosatti wrote:
> On Tue, Feb 05, 2013 at 03:49:41PM -0600, Michael Wolf wrote:
> > Add a helper routine to scheduler/core.c to allow the kvm module
> > to retrieve the cpu hardlimit settings.  The values will be used
> > to set up a timer that is used to separate the consigned from the
> > steal time.
> 
> 1) Can you please describe, in english, the mechanics of subtracting cpu
> hardlimit values from steal time reported via run_delay supposed to
> work?
> 
> "The period and the quota used to separate the consigned time 
> (expected steal) from the steal time are taken
> from the cfs bandwidth control settings. Any other steal time
> accruing during that period will show as the traditional steal time."
> 
> There is no "expected steal time" over a fixed period of real time.
There is expected steal time in the sense that the administrator of the
system sets up guests on the host so that there will be cpu
overcommitment.  The end user who is using the guest does not know this,
they only know they have been guaranteed a certain level of performance.
So if steal time shows up the end user typically thinks they are not
getting their guaranteed performance. So this patchset is meant to allow
top to show 100% utilization and ONLY show steal time if it is over the
level of steal time that the host administrator setup.  So take a simple
example of a host with 1 cpu and two guest on it.  If each guest is
fully utilized a user will see 50% utilization and 50% steal in either
of the guests.  In this case the amount of steal time that the host 
administrator would expect to see is 50%.  As long as the steal in the
guest does not exceed 50% the guest is running as expected.  If for some
reason the steal increases to 60%, now something is wrong and the steal
time needs to be reported and the end user will make inquiries?

> 
> 2) From the description of patch 1: "In the case of where you have
> a system that is running in a capped or overcommitted environment 
> the user may see steal time being reported in accounting tools 
> such as top or vmstat." 
> 
> This is outdated, right? Because overcommitted environment is exactly
> what steal time should report.

I hope I'm not missing your point here.  But again this comes down to
the point of view.  The end user is guaranteed a capability/level of
performance that may not be a whole cpu.  So only show steal time if the
amount of steal time exceeds what the host admin expected when the guest
was set up.
> 
> 
> Thanks

thanks
Mike Wolf

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-05 Thread Michael Wolf

Sorry for the delay in the response.  I did not see the email
right away.

On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> > 2013/2/5 Michael Wolf :
> > > In the case of where you have a system that is running in a
> > > capped or overcommitted environment the user may see steal time
> > > being reported in accounting tools such as top or vmstat.  This can
> > > cause confusion for the end user.
> > 
> > Sorry, I'm no expert in this area. But I don't really understand what
> > is confusing for the end user here.
> 
> I suppose that what is wanted is to subtract stolen time due to 'known
> reasons' from steal time reporting. 'Known reasons' being, for example,
> hard caps. So a vcpu executing instructions with no halt, but limited to
> 80% of available bandwidth, would not have 20% of stolen time reported.

Yes exactly and the end user many times did not set up the guest and is
not aware of the capping.  The end user is only aware of the performance
level that they were told they would get with the guest.  

> 
> But yes, a description of the scenario that is being dealt with, with
> details, is important.

I will add more detail to the description next time I submit the
patches.  How about something like,"In a cloud environment the user of a
kvm guest is not aware of the underlying hardware or how many other
guests are running on it.  The end user is only aware of a level of
performance that they should see."   or does that just muddy the picture
more??
> 
> > > To ease the confusion this patch set
> > > adds the idea of consigned (expected steal) time.  The host will separate
> > > the consigned time from the steal time.  Tthe steal time will only be 
> > > altered
> > > if hard limits (cfs bandwidth control) is used.  The period and the quota 
> > > used
> > > to separate the consigned time (expected steal) from the steal time are 
> > > taken
> > > from the cfs bandwidth control settings. Any other steal time accruing 
> > > during
> > > that period will show as the traditional steal time.
> > 
> > I'm also a bit confused here. steal time will then only account the
> > cpu time lost due to quotas from cfs bandwidth control? Also what do
> > you exactly mean by "expected steal time"? Is it steal time due to
> > overcommitting minus scheduler quotas?
> > 
> > Thanks.
> 

Thanks
Mike Wolf

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-06 Thread Michael Wolf

On Tue, 2013-03-05 at 22:41 -0300, Marcelo Tosatti wrote:
> On Tue, Mar 05, 2013 at 02:22:08PM -0600, Michael Wolf wrote:
> > Sorry for the delay in the response.  I did not see the email
> > right away.
> > 
> > On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> > > On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> > > > 2013/2/5 Michael Wolf :
> > > > > In the case of where you have a system that is running in a
> > > > > capped or overcommitted environment the user may see steal time
> > > > > being reported in accounting tools such as top or vmstat.  This can
> > > > > cause confusion for the end user.
> > > > 
> > > > Sorry, I'm no expert in this area. But I don't really understand what
> > > > is confusing for the end user here.
> > > 
> > > I suppose that what is wanted is to subtract stolen time due to 'known
> > > reasons' from steal time reporting. 'Known reasons' being, for example,
> > > hard caps. So a vcpu executing instructions with no halt, but limited to
> > > 80% of available bandwidth, would not have 20% of stolen time reported.
> > 
> > Yes exactly and the end user many times did not set up the guest and is
> > not aware of the capping.  The end user is only aware of the performance
> > level that they were told they would get with the guest.  
> > > But yes, a description of the scenario that is being dealt with, with
> > > details, is important.
> > 
> > I will add more detail to the description next time I submit the
> > patches.  How about something like,"In a cloud environment the user of a
> > kvm guest is not aware of the underlying hardware or how many other
> > guests are running on it.  The end user is only aware of a level of
> > performance that they should see."   or does that just muddy the picture
> > more??
> 
> So the feature aims for is to report stolen time relative to hard
> capping. That is: stolen time should be counted as time stolen from
> the guest _beyond_ hard capping. Yes?
Yes, that is the goal.
> 
> Probably don't need to report new data to the guest for that.
Not sure I understand what you are saying here. Do you mean that I don't
need to report the expected steal from the guest?  If I don't do that
then I'm not reporting all of the time and changing /proc/stat in a
bigger way than adding another catagory.  Also I thought I would need to
provide the consigned time and the steal time for debugging purposes.
Maybe I'm missing your point.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-06 Thread Michael Wolf

On Wed, 2013-03-06 at 12:13 +0400, Glauber Costa wrote:
> On 03/06/2013 05:41 AM, Marcelo Tosatti wrote:
> > On Tue, Mar 05, 2013 at 02:22:08PM -0600, Michael Wolf wrote:
> >> Sorry for the delay in the response.  I did not see the email
> >> right away.
> >>
> >> On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> >>> On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> >>>> 2013/2/5 Michael Wolf :
> >>>>> In the case of where you have a system that is running in a
> >>>>> capped or overcommitted environment the user may see steal time
> >>>>> being reported in accounting tools such as top or vmstat.  This can
> >>>>> cause confusion for the end user.
> >>>>
> >>>> Sorry, I'm no expert in this area. But I don't really understand what
> >>>> is confusing for the end user here.
> >>>
> >>> I suppose that what is wanted is to subtract stolen time due to 'known
> >>> reasons' from steal time reporting. 'Known reasons' being, for example,
> >>> hard caps. So a vcpu executing instructions with no halt, but limited to
> >>> 80% of available bandwidth, would not have 20% of stolen time reported.
> >>
> >> Yes exactly and the end user many times did not set up the guest and is
> >> not aware of the capping.  The end user is only aware of the performance
> >> level that they were told they would get with the guest.  
> >>> But yes, a description of the scenario that is being dealt with, with
> >>> details, is important.
> >>
> >> I will add more detail to the description next time I submit the
> >> patches.  How about something like,"In a cloud environment the user of a
> >> kvm guest is not aware of the underlying hardware or how many other
> >> guests are running on it.  The end user is only aware of a level of
> >> performance that they should see."   or does that just muddy the picture
> >> more??
> > 
> > So the feature aims for is to report stolen time relative to hard
> > capping. That is: stolen time should be counted as time stolen from
> > the guest _beyond_ hard capping. Yes?
> > 
> > Probably don't need to report new data to the guest for that.
> > 
> If we take into account that 1 second always have one second, I believe
> that you can just subtract the consigned time from the steal time the
> host passes to the guest.
> 
> During each second, the numbers won't sum up to 100. The delta to 100 is
> the consigned time, if anyone cares.
> 
> Adopting this would simplify this a lot. All you need to do, really, is
> to get your calculation right from the bandwidth given by the cpu
> controller. Subtract it in the host, and voila.

I looked at doing that once but was told that I was changing the
interface in an unacceptable way, because now I was not reporting all of
the elapsed time.  I agree it would make things simpler.
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-06 Thread Michael Wolf

On Wed, 2013-03-06 at 14:34 +0100, Frederic Weisbecker wrote:
> 2013/3/5 Michael Wolf :
> > Sorry for the delay in the response.  I did not see the email
> > right away.
> >
> > On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> >> On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> >> > 2013/2/5 Michael Wolf :
> >> > > In the case of where you have a system that is running in a
> >> > > capped or overcommitted environment the user may see steal time
> >> > > being reported in accounting tools such as top or vmstat.  This can
> >> > > cause confusion for the end user.
> >> >
> >> > Sorry, I'm no expert in this area. But I don't really understand what
> >> > is confusing for the end user here.
> >>
> >> I suppose that what is wanted is to subtract stolen time due to 'known
> >> reasons' from steal time reporting. 'Known reasons' being, for example,
> >> hard caps. So a vcpu executing instructions with no halt, but limited to
> >> 80% of available bandwidth, would not have 20% of stolen time reported.
> >
> > Yes exactly and the end user many times did not set up the guest and is
> > not aware of the capping.  The end user is only aware of the performance
> > level that they were told they would get with the guest.
> >
> >>
> >> But yes, a description of the scenario that is being dealt with, with
> >> details, is important.
> >
> > I will add more detail to the description next time I submit the
> > patches.  How about something like,"In a cloud environment the user of a
> > kvm guest is not aware of the underlying hardware or how many other
> > guests are running on it.  The end user is only aware of a level of
> > performance that they should see."   or does that just muddy the picture
> > more??
> 
> That alone is probably not enough. But yeah, make sure you clearly
> state the difference between expected (caps, sched bandwidth...) and
> unexpected (overcommitting, competing load...) stolen time. Then add a
> practical example as you made above that explains why it matters to
> make that distinction and why you want to report it.
> 

Ok, I understand what you are requesting.  I will make sure to add it to
the description the next time I submit the patches.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-07 Thread Michael Wolf

On Thu, 2013-03-07 at 14:11 +1100, Paul Mackerras wrote:
> On Wed, Mar 06, 2013 at 09:52:16PM -0300, Marcelo Tosatti wrote:
> > On Wed, Mar 06, 2013 at 10:29:12AM -0600, Michael Wolf wrote:
> > > I looked at doing that once but was told that I was changing the
> > > interface in an unacceptable way, because now I was not reporting all of
> > > the elapsed time.  I agree it would make things simpler.
> > 
> > Pointer to that claim, please?
> 
> Back in about 2004 or 2005 or so I was looking at changing how user
> and system times were calculated (in the context of trying to find a
> better way to report resources used by a thread in an SMT processor).
> I found that utilities such as top expected the deltas in the
> /proc/stat numbers to add up to elapsed time, and would report strange
> and inconsistent results if that wasn't the case.  Unfortunately at
> this distance I don't recall the exact details.  I don't know whether
> the expectation that the deltas in the /proc/stat numbers over a
> period of time add up to the elapsed real time is documented anywhere,
> but I wouldn't be at all surprised if some programs depend on it, so
> it's better to maintain that property.

I will have to look at this again.  When looking at the cpu data where
steal time is reported there isn't a problem today.  I will have to run
it and see if there is anything incorrect with the time being reported
for the individual processes.

My real concern here was that in changing the /proc/stat interface am I
going to mess private tools that look at that information.  When I've
looked at vmstat and top they report the cpu information fine, but I may
end up creating problems for home grown scripts and tools.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-07 Thread Michael Wolf

On Wed, 2013-03-06 at 23:30 -0300, Marcelo Tosatti wrote:
> On Wed, Mar 06, 2013 at 10:27:13AM -0600, Michael Wolf wrote:
> > On Tue, 2013-03-05 at 22:41 -0300, Marcelo Tosatti wrote:
> > > On Tue, Mar 05, 2013 at 02:22:08PM -0600, Michael Wolf wrote:
> > > > Sorry for the delay in the response.  I did not see the email
> > > > right away.
> > > > 
> > > > On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> > > > > On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> > > > > > 2013/2/5 Michael Wolf :
> > > > > > > In the case of where you have a system that is running in a
> > > > > > > capped or overcommitted environment the user may see steal time
> > > > > > > being reported in accounting tools such as top or vmstat.  This 
> > > > > > > can
> > > > > > > cause confusion for the end user.
> > > > > > 
> > > > > > Sorry, I'm no expert in this area. But I don't really understand 
> > > > > > what
> > > > > > is confusing for the end user here.
> > > > > 
> > > > > I suppose that what is wanted is to subtract stolen time due to 'known
> > > > > reasons' from steal time reporting. 'Known reasons' being, for 
> > > > > example,
> > > > > hard caps. So a vcpu executing instructions with no halt, but limited 
> > > > > to
> > > > > 80% of available bandwidth, would not have 20% of stolen time 
> > > > > reported.
> > > > 
> > > > Yes exactly and the end user many times did not set up the guest and is
> > > > not aware of the capping.  The end user is only aware of the performance
> > > > level that they were told they would get with the guest.  
> > > > > But yes, a description of the scenario that is being dealt with, with
> > > > > details, is important.
> > > > 
> > > > I will add more detail to the description next time I submit the
> > > > patches.  How about something like,"In a cloud environment the user of a
> > > > kvm guest is not aware of the underlying hardware or how many other
> > > > guests are running on it.  The end user is only aware of a level of
> > > > performance that they should see."   or does that just muddy the picture
> > > > more??
> > > 
> > > So the feature aims for is to report stolen time relative to hard
> > > capping. That is: stolen time should be counted as time stolen from
> > > the guest _beyond_ hard capping. Yes?
> > Yes, that is the goal.
> > > 
> > > Probably don't need to report new data to the guest for that.
> > Not sure I understand what you are saying here. Do you mean that I don't
> > need to report the expected steal from the guest?  If I don't do that
> > then I'm not reporting all of the time and changing /proc/stat in a
> > bigger way than adding another catagory.  Also I thought I would need to
> > provide the consigned time and the steal time for debugging purposes.
> > Maybe I'm missing your point.
> 
> OK so the usefulness of steal time comes from the ability to measure 
> CPU cycles that the guest is being deprived of, relative to some unit
> (implicitly the CPU frequency presented to the VM). That way, it becomes
> easier to properly allocate resources.
> 
> From top man page:
> st : time stolen from this vm by the hypervisor
> 
> Not only its a problem for the lender, it is also confusing for the user
> (who has to subtract from the reported value himself), the hardcapping 
> from reported steal time.
> 
> 
> The problem with the algorithm in the patchset is the following
> (practical example):
> 
> - Hard capping set to 80% of available CPU.
> - vcpu does not exceed its threshold, say workload with 40%
> CPU utilization.
> - Under this scenario it is possible for vcpu to be deprived
> of cycles (because out of the 40% that workload uses, only 30% of
> actual CPU time are being provided).
> - The algorithm in this patchset will not report any stolen time
> because it assumes 20% of stolen time reported via 'run_delay'
> is fixed at all times (which is false), therefore any valid 
> stolen time below 20% will not be reported.
> 
> Makes sense?

I understand the scenerio.  I will have to go back and look at the 
CFS bandwidth code and run some tests. The question I have to look at is
how is everything reported in your

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-07 Thread Michael Wolf

On Wed, 2013-03-06 at 23:30 -0300, Marcelo Tosatti wrote:
> On Wed, Mar 06, 2013 at 10:27:13AM -0600, Michael Wolf wrote:
> > On Tue, 2013-03-05 at 22:41 -0300, Marcelo Tosatti wrote:
> > > On Tue, Mar 05, 2013 at 02:22:08PM -0600, Michael Wolf wrote:
> > > > Sorry for the delay in the response.  I did not see the email
> > > > right away.
> > > > 
> > > > On Mon, 2013-02-18 at 22:11 -0300, Marcelo Tosatti wrote:
> > > > > On Mon, Feb 18, 2013 at 05:43:47PM +0100, Frederic Weisbecker wrote:
> > > > > > 2013/2/5 Michael Wolf :
> > > > > > > In the case of where you have a system that is running in a
> > > > > > > capped or overcommitted environment the user may see steal time
> > > > > > > being reported in accounting tools such as top or vmstat.  This 
> > > > > > > can
> > > > > > > cause confusion for the end user.
> > > > > > 
> > > > > > Sorry, I'm no expert in this area. But I don't really understand 
> > > > > > what
> > > > > > is confusing for the end user here.
> > > > > 
> > > > > I suppose that what is wanted is to subtract stolen time due to 'known
> > > > > reasons' from steal time reporting. 'Known reasons' being, for 
> > > > > example,
> > > > > hard caps. So a vcpu executing instructions with no halt, but limited 
> > > > > to
> > > > > 80% of available bandwidth, would not have 20% of stolen time 
> > > > > reported.
> > > > 
> > > > Yes exactly and the end user many times did not set up the guest and is
> > > > not aware of the capping.  The end user is only aware of the performance
> > > > level that they were told they would get with the guest.  
> > > > > But yes, a description of the scenario that is being dealt with, with
> > > > > details, is important.
> > > > 
> > > > I will add more detail to the description next time I submit the
> > > > patches.  How about something like,"In a cloud environment the user of a
> > > > kvm guest is not aware of the underlying hardware or how many other
> > > > guests are running on it.  The end user is only aware of a level of
> > > > performance that they should see."   or does that just muddy the picture
> > > > more??
> > > 
> > > So the feature aims for is to report stolen time relative to hard
> > > capping. That is: stolen time should be counted as time stolen from
> > > the guest _beyond_ hard capping. Yes?
> > Yes, that is the goal.
> > > 
> > > Probably don't need to report new data to the guest for that.
> > Not sure I understand what you are saying here. Do you mean that I don't
> > need to report the expected steal from the guest?  If I don't do that
> > then I'm not reporting all of the time and changing /proc/stat in a
> > bigger way than adding another catagory.  Also I thought I would need to
> > provide the consigned time and the steal time for debugging purposes.
> > Maybe I'm missing your point.
> 
> OK so the usefulness of steal time comes from the ability to measure 
> CPU cycles that the guest is being deprived of, relative to some unit
> (implicitly the CPU frequency presented to the VM). That way, it becomes
> easier to properly allocate resources.
> 
> From top man page:
> st : time stolen from this vm by the hypervisor
> 
> Not only its a problem for the lender, it is also confusing for the user
> (who has to subtract from the reported value himself), the hardcapping 
> from reported steal time.
> 
> 
> The problem with the algorithm in the patchset is the following
> (practical example):
> 
> - Hard capping set to 80% of available CPU.
> - vcpu does not exceed its threshold, say workload with 40%
> CPU utilization.
> - Under this scenario it is possible for vcpu to be deprived
> of cycles (because out of the 40% that workload uses, only 30% of
> actual CPU time are being provided).
> - The algorithm in this patchset will not report any stolen time
> because it assumes 20% of stolen time reported via 'run_delay'
> is fixed at all times (which is false), therefore any valid 
> stolen time below 20% will not be reported.
> 
> Makes sense?
> 
> Not sure what the concrete way to report stolen time relative to hard
> capping is (probably easier inside the scheduler, where run_delay is
> calculated).
>

Re: [PATCH 0/4] Alter steal-time reporting in the guest

2013-03-07 Thread Michael Wolf

On Thu, 2013-03-07 at 18:25 -0300, Marcelo Tosatti wrote:
> On Thu, Mar 07, 2013 at 03:15:09PM -0600, Michael Wolf wrote:
> > > 
> > > Makes sense?
> > > 
> > > Not sure what the concrete way to report stolen time relative to hard
> > > capping is (probably easier inside the scheduler, where run_delay is
> > > calculated).
> > > 
> > > Reporting the hard capping to the guest is a good idea (which saves the
> > > user from having to measure it themselves), but better done separately
> > > via new field.
> > 
> > didnt respond to this in the previous response.  I'm not sure I'm
> > following you here.  I thought this is what I was doing by having a
> > consigned (expected steal) field add to the /proc/stat output.  Are you
> > looking for something else or a better naming convention?
> 
> Expected steal is not a good measure to use (because as mentioned in the
> previous email there is no expected steal over a fixed period of time).
> 
> It is fine to report 'maximum percentage of underlying physical CPU'
> (what percentage of the physical CPU time guest VM is allowed to make
> use of).
> 
> And then steal time is relative to maximum percentage of underlying
> physical CPU time allowed.
> 

So last August I had sent out an RFC set of patches to do this.  That
patchset was meant to handle the general overcommit case as well as the
capping case by having qemu pass a percentage to the host that would
then be passed onto the guest and used to adjust the steal time.
Here is the link to the discussion
http://lkml.indiana.edu/hypermail/linux/kernel/1208.3/01458.html

As you will see there Avi didn't like the idea of a percentage down in
the guest, among other reasons he was concerned about migration.  Also
in the email thread you will see that Anthony Liguori was opposed to the
idea of just changing the steal time, he wanted it split out.

What Glauber has suggested and I am working on implementing is taking
out the timer and adding a last read field in the host.  So in the host
I can determine the total time that has passed and compute a percentage
and apply that percentage to the steal time while the info is still on
the host.  Then pass the steal and consigned time to the guest.

Does that address your concerns?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-23 Thread Michael Wolf

This is an RFC regarding the reporting of stealtime.  In the case of
where you have a system that is running with partial processors such as
KVM the user may see steal time being reported in accounting tools such
as top or vmstat.  This can cause confusion for the end user.  To
ease the confusion this patch set adds a sysctl interface to set the
cpu entitlement.  This is the percentage of cpu that the guest system is
 expected to receive.  As long as the steal time is within its expected
range it will show up as 0 in /proc/stat.  The user will then see in the
accounting tools that they are getting a full utilization of the cpu
resources assigned to them.

This patchset is changing the contents/output of /proc/stat and could affect 
user tools.  However the default setting is that the cpu is entitled to 100% 
so the code will act as before.  Also another field could be added to the 
/proc/stat output and show the unaltered steal time. Since this additional 
field could cause more confusion than it would clear up I have left it out 
for now.

Michael Wolf (3):
  Add a sysctl interface to control the cpu entitlement setting.
  Add a hypercall to retrieve the cpu entitlement value from the host.
  Modify the amount of stealtime that the kernel reports via the /proc

---
Documentation/sysctl/kernel.txt |   14 +++
 arch/x86/kvm/x86.c  |8 
 fs/proc/stat.c  |   80 ++-
 include/linux/kernel_stat.h |2 +
 include/linux/kvm.h |3 ++
 include/linux/kvm_host.h|3 ++
 include/linux/kvm_para.h|1 +
 kernel/sysctl.c |   10 +
 virt/kvm/kvm_main.c |7 
 9 files changed, 126 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC 1/3] Add a sysctl interface to control and report the cpu entitlement setting.

2012-08-23 Thread Michael Wolf

This setting will later be used to compute an expected steal time.

Signed-off-by: Michael Wolf 
---
 Documentation/sysctl/kernel.txt |   14 ++
 fs/proc/stat.c  |1 +
 kernel/sysctl.c |   10 ++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 6d78841..0f617dc 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -28,6 +28,7 @@ show up in /proc/sys/kernel:
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
+- cpu_entitlement
 - ctrl-alt-del
 - dmesg_restrict
 - domainname
@@ -226,6 +227,19 @@ the filename.
 
 ==
 
+cpu_entitlement:
+
+The cpu_entitlement is the percentage of cpu utilization that
+the system expects to receive.  By default this is set to 100,
+in a guest system this could be set to a value between 0 and 100.
+This value is used to adjust the amount of steal time that
+process accounting code in the guest will display. The end effect
+will be is that steal time will only be reported if the
+percentage of steal time is greater than 100 - cpu_entitlement
+value.
+
+==
+
 ctrl-alt-del:
 
 When the value in this file is 0, ctrl-alt-del is trapped and
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 64c3b31..14e26c8 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 
+int cpu_entitlement = 100;
 #ifndef arch_irq_stat_cpu
 #define arch_irq_stat_cpu(cpu) 0
 #endif
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87174ef..85efbc2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -109,6 +109,7 @@ extern int percpu_pagelist_fraction;
 extern int compat_log;
 extern int latencytop_enabled;
 extern int sysctl_nr_open_min, sysctl_nr_open_max;
+extern int cpu_entitlement;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
 #endif
@@ -673,6 +674,15 @@ static struct ctl_table kern_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec,
},
+   {
+   .procname   = "cpu_entitlement",
+   .data   = &cpu_entitlement,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &zero,
+   .extra2 = &one_hundred,
+   },
 #if defined CONFIG_PRINTK
{
.procname   = "printk",

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC 3/3] Modify the amount of stealtime that the kernel reports via the /proc interface.

2012-08-23 Thread Michael Wolf

Stealtime will be adjusted based on the cpu entitlement setting.  The user
will supply the cpu_entitlement which is the percentage of cpu the guest can
expect to receive.  The expected steal time is based on the expected steal
percentage which is 100 - cpu_entitlement.  If steal_time is less than the
expected steal time that is reported steal_time is changed to 0 no other fields
are changed.  If the steal_time is greater than the expected_steal then the
difference is reported.  By default the cpu_entitlement will be 100% and the
steal time will be reported without any modification.

Signed-off-by: Michael Wolf 
---
 fs/proc/stat.c  |   70 ++-
 include/linux/kernel_stat.h |2 +
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index cf5..efbaa03 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -73,6 +73,68 @@ static u64 get_iowait_time(int cpu)
 
 #endif
 
+/*
+ * This function will alter the steal time value that is written out
+ * to /proc/stat.  The cpu_entitlement is set by the user/admin and is
+ * meant to reflect the percentage of the processor that is expected to
+ * be used.  So as long as the amount of steal time is less than the
+ * expected steal time (based on cpu_entitlement) then report steal time
+ * as zero.
+ */
+static void kstat_adjust_steal_time(int currcpu)
+{
+   int j;
+   u64 cpustat_delta[NR_STATS];
+   u64 total_elapsed_time;
+   int expected_steal_pct;
+   u64 expected_steal;
+   u64 *currstat, *prevstat;
+
+   /*
+* if cpu_entitlement = 100% then the expected steal time is 0
+* so we don't need to do any adjustments to the fields.
+*/
+   if (cpu_entitlement == 100) {
+   kcpustat_cpu(currcpu).cpustat[CPUTIME_ADJ_STEAL] =
+   kcpustat_cpu(currcpu).cpustat[CPUTIME_STEAL];
+   return;
+   }
+   /*
+* For the user it is more intuitive to think in terms of
+* cpu entitlement.  To do the calculations it is easier to
+* think in terms of allowed steal time.  So convert the percentage
+* from cpu_entitlement to expected_steal_percent.
+*/
+   expected_steal_pct = 100 - cpu_entitlement;
+
+   total_elapsed_time = 0;
+   /* determine the total time elapsed between calls */
+   currstat = kcpustat_cpu(currcpu).cpustat;
+   prevstat = kcpustat_cpu(currcpu).prev_cpustat;
+   for (j = CPUTIME_USER; j < CPUTIME_GUEST; j++) {
+   cpustat_delta[j] = currstat[j] - prevstat[j];
+   prevstat[j] = currstat[j];
+   total_elapsed_time = total_elapsed_time + cpustat_delta[j];
+   }
+
+   /*
+* calculate the amount of expected steal time.  Add 5 as a
+* rounding factor.
+*/
+
+   expected_steal = (total_elapsed_time * expected_steal_pct + 5) / 100;
+   if (cpustat_delta[CPUTIME_STEAL] < expected_steal)
+   cpustat_delta[CPUTIME_STEAL] = 0;
+   else
+   cpustat_delta[CPUTIME_STEAL] -= expected_steal;
+
+   /* Adjust the steal time accordingly */
+   currstat[CPUTIME_ADJ_STEAL] = prevstat[CPUTIME_ADJ_STEAL]
+   + cpustat_delta[CPUTIME_STEAL];
+   prevstat[CPUTIME_ADJ_STEAL] = currstat[CPUTIME_ADJ_STEAL];
+}
+
+
 static int show_stat(struct seq_file *p, void *v)
 {
int i, j;
@@ -90,7 +152,11 @@ static int show_stat(struct seq_file *p, void *v)
getboottime(&boottime);
jif = boottime.tv_sec;
 
+
for_each_possible_cpu(i) {
+   /* adjust the steal time based on the processor entitlement */
+   kstat_adjust_steal_time(i);
+
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
system += kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
@@ -98,7 +164,7 @@ static int show_stat(struct seq_file *p, void *v)
iowait += get_iowait_time(i);
irq += kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
softirq += kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ];
-   steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
+   steal += kcpustat_cpu(i).cpustat[CPUTIME_ADJ_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
sum += kstat_cpu_irqs_sum(i);
@@ -135,7 +201,7 @@ static int show_stat(struct seq_file *p, void *v)
iowait = get_iowait_time(i);
irq = kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
softirq = kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ];
-   steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
+   steal = kcpustat_cpu(i).cpustat[CPUTIME_ADJ_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];

[PATCH RFC 2/3] Add a hypercall to retrieve the cpu entitlement value from the host.

2012-08-23 Thread Michael Wolf

If the hypercall is not implemented on the host a default value of
100 will be used. This value will be stored in 
/proc/sys/kernel/cpu_entitlements.

Signed-off-by: Michael Wolf 
---
 arch/x86/kvm/x86.c   |8 
 fs/proc/stat.c   |9 +
 include/linux/kvm.h  |3 +++
 include/linux/kvm_host.h |3 +++
 include/linux/kvm_para.h |1 +
 virt/kvm/kvm_main.c  |7 +++
 6 files changed, 31 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 42bce48..734bc3d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5052,6 +5052,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_VAPIC_POLL_IRQ:
ret = 0;
break;
+   case KVM_HC_CPU_ENTITLEMENT:
+   ret = vcpu->kvm->vcpu_entitlement;
+   break;
default:
ret = -KVM_ENOSYS;
break;
@@ -5897,6 +5900,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm *kvm, long entitlement)
+{
+   kvm->vcpu_entitlement = (int) entitlement;
+   return 0;
+}
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
struct i387_fxsave_struct *fxsave =
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 14e26c8..cf5 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int cpu_entitlement = 100;
 #ifndef arch_irq_stat_cpu
@@ -213,6 +214,14 @@ static const struct file_operations proc_stat_operations = 
{
 
 static int __init proc_stat_init(void)
 {
+   long ret;
+
+   if (kvm_para_available()) {
+   ret = kvm_hypercall0(KVM_HC_CPU_ENTITLEMENT);
+   if (ret > 0)
+   cpu_entitlement = (int) ret;
+   }
+
proc_create("stat", 0, NULL, &proc_stat_operations);
return 0;
 }
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..fccd08e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -904,6 +904,9 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL_IO(KVMIO,   0xad)
+/* Set the cpu entitlement this will be used to adjust stealtime reporting */
+#define KVM_SET_ENTITLEMENT  _IOW(KVMIO, 0xae, unsigned long)
+
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..71e3d73 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -276,6 +276,7 @@ struct kvm {
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
atomic_t online_vcpus;
int last_boosted_vcpu;
+   int vcpu_entitlement;
struct list_head vm_list;
struct mutex lock;
struct kvm_io_bus *buses[KVM_NR_BUSES];
@@ -538,6 +539,8 @@ void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm *kvm,
+   long entitlement);
 
 void kvm_free_physmem(struct kvm *kvm);
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..95f3387 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_CPU_ENTITLEMENT  5
 
 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2468523..a0a4939 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2093,6 +2093,13 @@ static long kvm_vm_ioctl(struct file *filp,
break;
}
 #endif
+   case KVM_SET_ENTITLEMENT: {
+   r = kvm_arch_vcpu_ioctl_set_entitlement(kvm, arg);
+   if (r)
+   goto out;
+   r = 0;
+   break;
+   }
default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
if (r == -ENOTTY)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-24 Thread Michael Wolf

On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
> On 08/24/2012 03:14 AM, Michael Wolf wrote:
> > This is an RFC regarding the reporting of stealtime.  In the case of
> > where you have a system that is running with partial processors such as
> > KVM the user may see steal time being reported in accounting tools such
> > as top or vmstat.  This can cause confusion for the end user.  To
> > ease the confusion this patch set adds a sysctl interface to set the
> > cpu entitlement.  This is the percentage of cpu that the guest system is
> >  expected to receive.  As long as the steal time is within its expected
> > range it will show up as 0 in /proc/stat.  The user will then see in the
> > accounting tools that they are getting a full utilization of the cpu
> > resources assigned to them.
> > 
> 
> And how is such a knob not confusing?
> 
> Steal time is pretty well defined in meaning and is shown in top for
> ages. I really don't see the point for this.

Currently you can see the steal time but you have no way of knowing if
the cpu utilization you are seeing on the guest is the expected amount.
I decided on making it a knob because a guest could be migrated to
another system and it's entitlement could change because of hardware or 
load differences.  It could simply be a /proc file and report the
current entitlement if needed.   As things are currently implemented I 
don't see how someone knows if the guest is running as expected or
whether there is a problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf

On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
> On 08/24/2012 11:11 AM, Michael Wolf wrote:
> > On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
> >> On 08/24/2012 03:14 AM, Michael Wolf wrote:
> >>> This is an RFC regarding the reporting of stealtime.  In the case of
> >>> where you have a system that is running with partial processors such as
> >>> KVM the user may see steal time being reported in accounting tools such
> >>> as top or vmstat.  This can cause confusion for the end user.  To
> >>> ease the confusion this patch set adds a sysctl interface to set the
> >>> cpu entitlement.  This is the percentage of cpu that the guest system is
> >>>  expected to receive.  As long as the steal time is within its expected
> >>> range it will show up as 0 in /proc/stat.  The user will then see in the
> >>> accounting tools that they are getting a full utilization of the cpu
> >>> resources assigned to them.
> >>>
> >>
> >> And how is such a knob not confusing?
> >>
> >> Steal time is pretty well defined in meaning and is shown in top for
> >> ages. I really don't see the point for this.
> > 
> > Currently you can see the steal time but you have no way of knowing if
> > the cpu utilization you are seeing on the guest is the expected amount.
> > I decided on making it a knob because a guest could be migrated to
> > another system and it's entitlement could change because of hardware or 
> > load differences.  It could simply be a /proc file and report the
> > current entitlement if needed.   As things are currently implemented I 
> > don't see how someone knows if the guest is running as expected or
> > whether there is a problem.
> > 
> 
> Turning off steal time display won't get even close to displaying the
> information you want. What you probably want is a guest-visible way to
> say how many miliseconds you are expected to run each second. Right?

It is not clear to me how knowing how many milliseconds you are
expecting to run will help the user.  Currently the users will run top
to see how well the guest is running.  If they see _any_ steal time some
users think they are not getting the full use of their processor
entitlement.

Maybe I'm missing what you are proposing, but even if you knew the
milliseconds that you were expecting for your processor you would have
to adjust the top output in your head so to speak.  You would see the
utilization and then say 'ok that matches the number of milliseconds I
expected to run..."   If we take away the steal time (as long as it is
equal to or less than the expected amount of steal time) then the user
running top will see the 100% utilization.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf

On Mon, 2012-08-27 at 11:50 -0700, Glauber Costa wrote:
> On 08/27/2012 08:50 AM, Michael Wolf wrote:
> > On Sat, 2012-08-25 at 19:36 -0400, Glauber Costa wrote:
> >> On 08/24/2012 11:11 AM, Michael Wolf wrote:
> >>> On Fri, 2012-08-24 at 08:53 +0400, Glauber Costa wrote:
> >>>> On 08/24/2012 03:14 AM, Michael Wolf wrote:
> >>>>> This is an RFC regarding the reporting of stealtime.  In the case of
> >>>>> where you have a system that is running with partial processors such as
> >>>>> KVM the user may see steal time being reported in accounting tools such
> >>>>> as top or vmstat.  This can cause confusion for the end user.  To
> >>>>> ease the confusion this patch set adds a sysctl interface to set the
> >>>>> cpu entitlement.  This is the percentage of cpu that the guest system is
> >>>>>  expected to receive.  As long as the steal time is within its expected
> >>>>> range it will show up as 0 in /proc/stat.  The user will then see in the
> >>>>> accounting tools that they are getting a full utilization of the cpu
> >>>>> resources assigned to them.
> >>>>>
> >>>>
> >>>> And how is such a knob not confusing?
> >>>>
> >>>> Steal time is pretty well defined in meaning and is shown in top for
> >>>> ages. I really don't see the point for this.
> >>>
> >>> Currently you can see the steal time but you have no way of knowing if
> >>> the cpu utilization you are seeing on the guest is the expected amount.
> >>> I decided on making it a knob because a guest could be migrated to
> >>> another system and it's entitlement could change because of hardware or 
> >>> load differences.  It could simply be a /proc file and report the
> >>> current entitlement if needed.   As things are currently implemented I 
> >>> don't see how someone knows if the guest is running as expected or
> >>> whether there is a problem.
> >>>
> >>
> >> Turning off steal time display won't get even close to displaying the
> >> information you want. What you probably want is a guest-visible way to
> >> say how many miliseconds you are expected to run each second. Right?
> > 
> > It is not clear to me how knowing how many milliseconds you are
> > expecting to run will help the user.  Currently the users will run top
> > to see how well the guest is running.  If they see _any_ steal time some
> > users think they are not getting the full use of their processor
> > entitlement.
> >
> 
> And your plan is just to selectively lie about it, but disabling it with
> a knob?

It is about making it very obvious to the end user whether they are
receiving their cpu entitlement.  If there is more steal time than
expected that will still show up.  I have experimented, and it seems to
work, to put the raw stealtime at the end of each cpu line
in /proc/stat.  That way the raw data is there as well.   

Do you have another suggestion to communicate to the user whether they
are receiving their full entitlement?  At the very least shouldn't the
entitlement reside in a /proc file somewhere so that the user could look
up the value and "do the math"?

> 
> > Maybe I'm missing what you are proposing, but even if you knew the
> > milliseconds that you were expecting for your processor you would have
> > to adjust the top output in your head so to speak.  You would see the
> > utilization and then say 'ok that matches the number of milliseconds I
> > expected to run..."   If we take away the steal time (as long as it is
> > equal to or less than the expected amount of steal time) then the user
> > running top will see the 100% utilization.
> > 
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf

On Mon, 2012-08-27 at 11:55 -0700, Avi Kivity wrote:
> On 08/23/2012 04:14 PM, Michael Wolf wrote:
> > This is an RFC regarding the reporting of stealtime.  In the case of
> > where you have a system that is running with partial processors such as
> > KVM the user may see steal time being reported in accounting tools such
> > as top or vmstat.  This can cause confusion for the end user.  To
> > ease the confusion this patch set adds a sysctl interface to set the
> > cpu entitlement.  This is the percentage of cpu that the guest system is
> >  expected to receive.  As long as the steal time is within its expected
> > range it will show up as 0 in /proc/stat.  The user will then see in the
> > accounting tools that they are getting a full utilization of the cpu
> > resources assigned to them.
> >
> > This patchset is changing the contents/output of /proc/stat and could 
> > affect 
> > user tools.  However the default setting is that the cpu is entitled to 
> > 100% 
> > so the code will act as before.  Also another field could be added to the 
> > /proc/stat output and show the unaltered steal time. Since this additional 
> > field could cause more confusion than it would clear up I have left it out 
> > for now.
> > 
> 
> How would a guest know what its entitlement is?
> 
> 

Currently the Admin/management tool setting up the guests will put it on
the qemu commandline.  From this it is passed via an ioctl to the host.
The guest will get the value from the host via a hypercall.

In the future the host could try and do some of it automatically in some
cases. 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf

On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
> On 08/27/2012 01:23 PM, Michael Wolf wrote:
> > > 
> > > How would a guest know what its entitlement is?
> > > 
> > > 
> >
> > Currently the Admin/management tool setting up the guests will put it on
> > the qemu commandline.  From this it is passed via an ioctl to the host.
> > The guest will get the value from the host via a hypercall.
> >
> > In the future the host could try and do some of it automatically in some
> > cases. 
> 
> Seems to me it's a meaningless value for the guest.  Suppose it is
> migrated to a host that is more powerful, and as a result its relative
> entitlement is reduced.  The value needs to be adjusted.

This is why I chose to manage the value from the sysctl interface rather
than just have it stored as a value in /proc.  Whatever tool was used to
migrate the vm could hopefully adjust the sysctl value on the guest.
> 
> This is best taken care of from the host side.

Not sure what you are getting at here.  If you are running in a cloud
environment, you purchase a VM with the understanding that you are
getting certain resources.  As this type of user I don't believe you
have any access to the host to see this type of information.  So the
user still wouldnt have a way to confirm that they are receiving what
they should be in the way of processor resources.

Would you please elaborate a little more on this?
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RFC 0/3] Add guest cpu_entitlement reporting

2012-08-27 Thread Michael Wolf

On Mon, 2012-08-27 at 14:41 -0700, Glauber Costa wrote:
> On 08/27/2012 02:27 PM, Michael Wolf wrote:
> > On Mon, 2012-08-27 at 13:31 -0700, Avi Kivity wrote:
> >> On 08/27/2012 01:23 PM, Michael Wolf wrote:
> >>>>
> >>>> How would a guest know what its entitlement is?
> >>>>
> >>>>
> >>>
> >>> Currently the Admin/management tool setting up the guests will put it on
> >>> the qemu commandline.  From this it is passed via an ioctl to the host.
> >>> The guest will get the value from the host via a hypercall.
> >>>
> >>> In the future the host could try and do some of it automatically in some
> >>> cases. 
> >>
> >> Seems to me it's a meaningless value for the guest.  Suppose it is
> >> migrated to a host that is more powerful, and as a result its relative
> >> entitlement is reduced.  The value needs to be adjusted.
> > 
> > This is why I chose to manage the value from the sysctl interface rather
> > than just have it stored as a value in /proc.  Whatever tool was used to
> > migrate the vm could hopefully adjust the sysctl value on the guest.
> >>
> >> This is best taken care of from the host side.
> > 
> > Not sure what you are getting at here.  If you are running in a cloud
> > environment, you purchase a VM with the understanding that you are
> > getting certain resources.  As this type of user I don't believe you
> > have any access to the host to see this type of information.  So the
> > user still wouldnt have a way to confirm that they are receiving what
> > they should be in the way of processor resources.
> > 
> > Would you please elaborate a little more on this?
> 
> What do you mean they have no access to the host?
> They have access to all sorts of tools that display information from the
> host. Speaking of a view-only resource, those are strictly equivalent.
> 
> 
> 

ok.  I will go look at those resources. 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-16 Thread Michael Wolf

In the case of where you have a system that is running in a
capped or overcommitted environment the user may see steal time
being reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.  To ease the confusion this patch set
adds the idea of consigned (expected steal) time.  The host will separate
the consigned time from the steal time.  The consignment limit passed to the
host will be the amount of steal time expected within a fixed period of
time.  Any other steal time accruing during that period will show as the
traditional steal time.

TODO:
* Change native_clock to take params and not return a value
* Change update_rq_clock_task

Changes from V1:
* Removed the steal time allowed percentage from the guest
* Moved the separation of consigned (expected steal) and steal time to the
  host.
* No longer include a sysctl interface.

---

Michael Wolf (5):
  Alter the amount of steal time reported by the guest.
  Expand the steal time msr to also contain the consigned time.
  Add the code to send the consigned time from the host to the guest
  Add a timer to allow the separation of consigned from steal time.
  Add an ioctl to communicate the consign limit to the host.


 arch/x86/include/asm/kvm_host.h |   11 +
 arch/x86/include/asm/kvm_para.h |3 ++
 arch/x86/include/asm/paravirt.h |4 ++-
 arch/x86/kernel/kvm.c   |8 ++
 arch/x86/kvm/x86.c  |   50 ++-
 fs/proc/stat.c  |9 +--
 include/linux/kernel_stat.h |2 ++
 include/linux/kvm.h |2 ++
 include/linux/kvm_host.h|2 ++
 kernel/sched/cputime.c  |   21 +++-
 kernel/sched/sched.h|2 ++
 virt/kvm/kvm_main.c |7 +
 12 files changed, 108 insertions(+), 13 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC V2 1/5] Alter the amount of steal time reported by the guest.

2012-10-16 Thread Michael Wolf

Modify the amount of stealtime that the kernel reports via the /proc interface.
Steal time will now be broken down into steal_time and consigned_time.
Consigned_time will represent the amount of time that is expected to be lost
due to overcommitment of the physical cpu or by using cpu capping.  The amount
consigned_time will be passed in using an ioctl.  The time will be expressed in
the number of nanoseconds to be lost in during the fixed period.  The fixed 
period
is currently 1/10th of a second.

Signed-off-by: Michael Wolf 
---
 fs/proc/stat.c  |9 +++--
 include/linux/kernel_stat.h |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index e296572..cb7fe80 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -82,7 +82,7 @@ static int show_stat(struct seq_file *p, void *v)
int i, j;
unsigned long jif;
u64 user, nice, system, idle, iowait, irq, softirq, steal;
-   u64 guest, guest_nice;
+   u64 guest, guest_nice, consign;
u64 sum = 0;
u64 sum_softirq = 0;
unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -90,10 +90,11 @@ static int show_stat(struct seq_file *p, void *v)
 
user = nice = system = idle = iowait =
irq = softirq = steal = 0;
-   guest = guest_nice = 0;
+   guest = guest_nice = consign = 0;
getboottime(&boottime);
jif = boottime.tv_sec;
 
+
for_each_possible_cpu(i) {
user += kcpustat_cpu(i).cpustat[CPUTIME_USER];
nice += kcpustat_cpu(i).cpustat[CPUTIME_NICE];
@@ -105,6 +106,7 @@ static int show_stat(struct seq_file *p, void *v)
steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign += kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
sum += kstat_cpu_irqs_sum(i);
sum += arch_irq_stat_cpu(i);
 
@@ -128,6 +130,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
 
for_each_online_cpu(i) {
@@ -142,6 +145,7 @@ static int show_stat(struct seq_file *p, void *v)
steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
+   consign = kcpustat_cpu(i).cpustat[CPUTIME_CONSIGN];
seq_printf(p, "cpu%d", i);
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(user));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(nice));
@@ -153,6 +157,7 @@ static int show_stat(struct seq_file *p, void *v)
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(steal));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest));
seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(guest_nice));
+   seq_put_decimal_ull(p, ' ', cputime64_to_clock_t(consign));
seq_putc(p, '\n');
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 36d12f0..c0b0095 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -27,6 +27,7 @@ enum cpu_usage_stat {
CPUTIME_STEAL,
CPUTIME_GUEST,
CPUTIME_GUEST_NICE,
+   CPUTIME_CONSIGN,
NR_STATS,
 };
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC V2 2/5] Expand the steal time msr to also contain the consigned time.

2012-10-16 Thread Michael Wolf

Add a consigned field.  This field will hold the time lost due to capping or 
overcommit.
The rest of the time will still show up in the steal-time field.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/paravirt.h |4 ++--
 arch/x86/kernel/kvm.c   |7 ++-
 kernel/sched/cputime.c  |2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a0facf3..a5f9f30 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
 {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index c1d61ee..91b3b2a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -368,9 +368,8 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static u64 kvm_steal_clock(int cpu)
+static u64 kvm_steal_clock(int cpu, u64 *steal)
 {
-   u64 steal;
struct kvm_steal_time *src;
int version;
 
@@ -378,11 +377,9 @@ static u64 kvm_steal_clock(int cpu)
do {
version = src->version;
rmb();
-   steal = src->steal;
+   *steal = src->steal;
rmb();
} while ((version & 1) || (version != src->version));
-
-   return steal;
 }
 
 void kvm_disable_steal_time(void)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 81b763b..dd3fd46 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -275,7 +275,7 @@ static __always_inline bool steal_account_process_tick(void)
if (static_key_false(¶virt_steal_enabled)) {
u64 steal, st = 0;
 
-   steal = paravirt_steal_clock(smp_processor_id());
+   paravirt_steal_clock(smp_processor_id(), &steal);
steal -= this_rq()->prev_steal_time;
 
st = steal_ticks(steal);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH RFC V2 3/5] Add the code to send the consigned time from the host to the guest

2012-10-16 Thread Michael Wolf

Add the code to send the consigned time from the host to the guest

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/include/asm/kvm_para.h |3 ++-
 arch/x86/include/asm/paravirt.h |4 ++--
 arch/x86/kernel/kvm.c   |3 ++-
 arch/x86/kvm/x86.c  |2 ++
 include/linux/kernel_stat.h |1 +
 kernel/sched/cputime.c  |   21 +++--
 kernel/sched/sched.h|2 ++
 8 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1eaa6b0..bd4e412 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -409,6 +409,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u64 last_steal;
u64 accum_steal;
+   u64 accum_consigned;
struct gfn_to_hva_cache stime;
struct kvm_steal_time steal;
} st;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2f7712e..debe72e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -42,9 +42,10 @@
 
 struct kvm_steal_time {
__u64 steal;
+   __u64 consigned;
__u32 version;
__u32 flags;
-   __u32 pad[12];
+   __u32 pad[10];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a5f9f30..d39e8d0 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,9 +196,9 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal)
+static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
-   PVOP_VCALL2(pv_time_ops.steal_clock, cpu, steal);
+   PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
 
 static inline unsigned long long paravirt_read_pmc(int counter)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 91b3b2a..4e5582a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -368,7 +368,7 @@ static struct notifier_block kvm_pv_reboot_nb = {
.notifier_call = kvm_pv_reboot_notify,
 };
 
-static u64 kvm_steal_clock(int cpu, u64 *steal)
+static u64 kvm_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
struct kvm_steal_time *src;
int version;
@@ -378,6 +378,7 @@ static u64 kvm_steal_clock(int cpu, u64 *steal)
version = src->version;
rmb();
*steal = src->steal;
+   *consigned = src->consigned;
rmb();
} while ((version & 1) || (version != src->version));
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f09552..801cfa8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1554,8 +1554,10 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
return;
 
vcpu->arch.st.steal.steal += vcpu->arch.st.accum_steal;
+   vcpu->arch.st.steal.consigned += vcpu->arch.st.accum_consigned;
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;
+   vcpu->arch.st.accum_consigned = 0;
 
kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index c0b0095..253fdce 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -125,6 +125,7 @@ extern unsigned long long task_delta_exec(struct 
task_struct *);
 extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
 extern void account_system_time(struct task_struct *, int, cputime_t, 
cputime_t);
 extern void account_steal_time(cputime_t);
+extern void account_consigned_time(cputime_t);
 extern void account_idle_time(cputime_t);
 
 extern void account_process_tick(struct task_struct *, int user);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index dd3fd46..bf2025a 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -244,6 +244,18 @@ void account_system_time(struct task_struct *p, int 
hardirq_offset,
 }
 
 /*
+ * This accounts for the time that is split out of steal time.
+ * Consigned time represents the amount of time that the cpu was
+ * expected to be somewhere else.
+ */
+void account_consigned_time(cputime_t cputime)
+{
+   u64 *cpustat =  kcpustat_this_cpu->cpustat;
+
+   cpustat[CPUTIME_CONSIGN] += (__force u64) cputime;
+}
+
+/*
  * Account for involuntary wait time.
  * @cputime: the cpu time spent in involuntary wait
  */
@@ -274,15 +286,20 @@ static __always_inline bool 
steal_account_process_tick(void)
 #ifdef CONFIG_PARAVIRT
if (static_key_false(¶virt_steal_enabled)) {

[PATCH RFC V2 4/5] Add a timer to allow the separation of consigned from steal time.

2012-10-16 Thread Michael Wolf

Add a timer to the host.  This will define the period.  During a period
the first n ticks will go into the consigned bucket.  Any other ticks that
occur within the period will be placed in the stealtime bucket.

Signed-off-by: Michael Wolf 
---
 arch/x86/include/asm/kvm_host.h |   10 +
 arch/x86/include/asm/paravirt.h |2 +-
 arch/x86/kvm/x86.c  |   42 ++-
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd4e412..d700850 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -41,6 +41,8 @@
 #define KVM_PIO_PAGE_OFFSET 1
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 2
 
+#define KVM_STEAL_TIMER_DELAY 1UL
+
 #define CR0_RESERVED_BITS   \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
  | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM \
@@ -339,6 +341,14 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;
 
/*
+* timer used to determine if the time should be counted as
+* steal time or consigned time.
+*/
+   struct hrtimer steal_timer;
+   u64 current_consigned;
+   u64 consigned_limit;
+
+   /*
 * Paging state of the vcpu
 *
 * If the vcpu runs in guest mode with two level paging this still saves
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d39e8d0..6db79f9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -196,7 +196,7 @@ struct static_key;
 extern struct static_key paravirt_steal_enabled;
 extern struct static_key paravirt_steal_rq_enabled;
 
-static inline u64 paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
+static inline void paravirt_steal_clock(int cpu, u64 *steal, u64 *consigned)
 {
PVOP_VCALL3(pv_time_ops.steal_clock, cpu, steal, consigned);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 801cfa8..469e748 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1535,13 +1535,32 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 static void accumulate_steal_time(struct kvm_vcpu *vcpu)
 {
u64 delta;
+   u64 steal_delta;
+   u64 consigned_delta;
 
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;
 
delta = current->sched_info.run_delay - vcpu->arch.st.last_steal;
vcpu->arch.st.last_steal = current->sched_info.run_delay;
-   vcpu->arch.st.accum_steal = delta;
+
+   /* split the delta into steal and consigned */
+   if (vcpu->arch.current_consigned < vcpu->arch.consigned_limit) {
+   vcpu->arch.current_consigned += delta;
+   if (vcpu->arch.current_consigned > vcpu->arch.consigned_limit) {
+   steal_delta = vcpu->arch.current_consigned
+   -  vcpu->arch.consigned_limit;
+   consigned_delta = delta - steal_delta;
+   } else {
+   consigned_delta = delta;
+   steal_delta = 0;
+   }
+   } else {
+   consigned_delta = 0;
+   steal_delta = delta;
+   }
+   vcpu->arch.st.accum_steal = steal_delta;
+   vcpu->arch.st.accum_consigned = consigned_delta;
 }
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
@@ -6187,11 +6206,25 @@ bool kvm_vcpu_compatible(struct kvm_vcpu *vcpu)
return irqchip_in_kernel(vcpu->kvm) == (vcpu->arch.apic != NULL);
 }
 
+enum hrtimer_restart steal_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   ktime_t now;
+
+   vcpu = container_of(data, struct kvm_vcpu, arch.steal_timer);
+   vcpu->arch.current_consigned = 0;
+   now = ktime_get();
+   hrtimer_forward(&vcpu->arch.steal_timer, now,
+   ktime_set(0, KVM_STEAL_TIMER_DELAY));
+   return HRTIMER_RESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
struct page *page;
struct kvm *kvm;
int r;
+   ktime_t ktime;
 
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
@@ -6234,6 +6267,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
+   /* Initialize and start a timer to capture steal and consigned time */
+   hrtimer_init(&vcpu->arch.steal_timer, CLOCK_MONOTONIC,
+   HRTIMER_MODE_REL);
+   vcpu->arch.steal_timer.function = &steal_timer_fn;
+   ktime = ktime_set(0, KVM_STEAL_TIMER_DELAY);
+   hrtimer_start(&vcpu->arch.steal_timer, ktime, HRTIMER_MODE_REL);
 
return 0;
 fail_free_mce_banks:
@@ -6252,6 +6291,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)

[PATCH RFC V2 5/5] Add an ioctl to communicate the consign limit to the host.

2012-10-16 Thread Michael Wolf

Add an ioctl to communicate the consign limit to the host.

Signed-off-by: Michael Wolf 
---
 arch/x86/kvm/x86.c   |6 ++
 include/linux/kvm.h  |2 ++
 include/linux/kvm_host.h |2 ++
 virt/kvm/kvm_main.c  |7 +++
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 469e748..5a4b8db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5928,6 +5928,12 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return 0;
 }
 
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu, long 
entitlement)
+{
+   vcpu->arch.consigned_limit = entitlement;
+   return 0;
+}
+
 int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 {
struct i387_fxsave_struct *fxsave =
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..6b211fb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -904,6 +904,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 /* VM is being stopped by host */
 #define KVM_KVMCLOCK_CTRL_IO(KVMIO,   0xad)
+/* Set the consignment limit which will be used to separete steal time */
+#define KVM_SET_ENTITLEMENT  _IOW(KVMIO, 0xae, unsigned long)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8a59e0a..9a0a791 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -538,6 +538,8 @@ void kvm_arch_hardware_unsetup(void);
 void kvm_arch_check_processor_compat(void *rtn);
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+int kvm_arch_vcpu_ioctl_set_entitlement(struct kvm_vcpu *vcpu,
+   long entitlement);
 
 void kvm_free_physmem(struct kvm *kvm);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d617f69..ccf0aec 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1936,6 +1936,13 @@ out_free2:
r = 0;
break;
}
+   case KVM_SET_ENTITLEMENT: {
+   r = kvm_arch_vcpu_ioctl_set_entitlement(vcpu, arg);
+   if (r)
+   goto out;
+   r = 0;
+   break;
+   }
default:
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

OT: ROCK Linux 1.3.11 and 2.4.0-test kernels

2000-10-24 Thread Clifford Wolf


Hi,

sorry for posting something offtopic - but this might be for interrest to
everyone who would like to try a Linux Kernel 2.4.0 - based system.

A new development release of the ROCK Linux Distribution came out 
yesterday. ROCK Linux 1.3.11 is based on the Linux Kernel 2.4.0-test9,
GNU Libc 2.1.3, GCC 2.95.2 and Binutils 2.10.0.26.

That means that every singe package has been built with this software
components, on a system running this software components. We had some
minor problems with the new kernel header files, but no real troubles.

Here is the full announcement for those of you who are interested:


ANNOUNCE: ROCK Linux 1.3.11 [ 2000-10-23 ]

The ROCK Linux Developers are pleased to announce version 1.3.11 of the
ROCK Linux Distribution - a Linux Distribution for Admins, Hackers, Geeks
and skilled unix users.

ROCK Linux 1.3.11 is a development release and not intended for production
environments. It's based on the Linux Kernel 2.4.0-test9, GNU Libc 2.1.3,
GCC 2.95.2 and Binutils 2.10.0.26. The 1.3.11 release supports Intel
IA-32, Alpha AXP and PowerPC Systems. ROCK Linux uses DevFS, XFree 4.0.1,
GNOME 1.2, Message Passing for Beowulf Clusters and other bleeding edge
technology.

ROCK Linux is based on a simple build system using shell scripts which
allow you to re-build the entire distribution using full optimization for
your CPU or some special adaptions for your needs.  It also makes porting
ROCK Linux to new architectures a simple task.

More information on ROCK Linux can be found on the Homepage at

  [ http://www.rocklinux.org/ ]

or the "ROCK Linux Guide" at

  [ http://www.rocklinux.org/projects/doc/GUIDE/ ]

A complete CHANGELOG is available at

  [ http://www.rocklinux.org/changelog/ ]

The ROCK Linux Sources are available for download on the ROCK Linux
Homepage. The package sources used by ROCK Linux will show up on the ROCK
Linux mirrors withing the next couple of days.

A lot of fun and happy hacking,

 - Clifford Wolf (Project Leader)

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
Clifford Wolf  www.clifford.at IRC: http://opirc.nu
The ROCK Projects Workgroup .. www.rock-projects.com  Tel: +43-699-10063494
The ROCK Linux Workgroup . www.rocklinux.org  Fax: +43-2235-42788-4
The NTx Consulting Group . www.ntx.atemail: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Re: Scheduler behaviour

2007-12-07 Thread Holger Wolf


Arjan van de Ven wrote:

On Wed, 05 Dec 2007 21:15:30 +0100
Holger Wolf <[EMAIL PROTECTED]> wrote:

  

We discovered performance degradation with dbench when using kernel
2.6.23 compared to kernel 2.6.22.

In our case we booted a Linux in a IBM System z9 LPAR with 256MB of
ram with 4 CPU's. This system uses a striped LV with 16 disks on a
Storage Server connected via 8 4GBit links.
A dbench was started on that system performing I/O operations on the
striped LV. dbench runs were performed with 1 to 62 processes.
Measurements with a 2.6.22 kernel were compared to measurements with
a 2.6.23 kernel. We saw a throughput degradation from 7.2 to 23.4



this is good news!
dbench rewards unfair behavior... so higher dbench usually means a
worse kernel ;)


  

tests with 2.6.22 including CFS  show the same results.
This means the pressure on page cache is much higher when all processes 
run in parallel.
We see this behavior as well with iozone when writing on many disks with 
many threads and just 256 MB memory.


This means the scheduler schedules as it should - fair.

regards Holger

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] powerpc systbl.h broken

2008-01-02 Thread Clifford Wolf

Hi,

In current 2.6.23 (I have checked 2.6.23.12 and 2.6.23.9) the end of
include/asm-powerpc/systbl.h reads:

--snip--
SYSCALL_SPU(getcpu)
COMPAT_SYS(epoll_pwait)
COMPAT_SYS_SPU(utimensat)
COMPAT_SYS(fallocate)
COMPAT_SYS_SPU(signalfd)
COMPAT_SYS_SPU(timerfd)
SYSCALL_SPU(eventfd)
COMPAT_SYS_SPU(sync_file_range2)
--snap--

This obviously does not match the definitions in asm-powerpc/unistd.h:

--snip--
#define __NR_getcpu 302
#define __NR_epoll_pwait303
#define __NR_utimensat  304
#define __NR_signalfd   305
#define __NR_timerfd306
#define __NR_eventfd307
#define __NR_sync_file_range2   308
#define __NR_fallocate  309
--snap--

which breaks the system calls 305 to 309 inclusive.

I've attached the simple patch which fixes the problem.

yours,
 - clifford

-- 
"Any sufficiently advanced technology is indistinguishable from magic."
 - Arthur C. Clarke
--- linux-2.6.23.12/include/asm-powerpc/systbl.h.orig   2008-01-02 
15:09:04.0 +0100
+++ linux-2.6.23.12/include/asm-powerpc/systbl.h2008-01-02 
15:09:29.0 +0100
@@ -308,8 +308,8 @@
 SYSCALL_SPU(getcpu)
 COMPAT_SYS(epoll_pwait)
 COMPAT_SYS_SPU(utimensat)
-COMPAT_SYS(fallocate)
 COMPAT_SYS_SPU(signalfd)
 COMPAT_SYS_SPU(timerfd)
 SYSCALL_SPU(eventfd)
 COMPAT_SYS_SPU(sync_file_range2)
+COMPAT_SYS(fallocate)

Scheduler behaviour

2007-12-05 Thread Holger Wolf

 
  1   10.96   3.84% 

  41.081.05   2.97% 

  81.081.07   1.10% 

 121.09 1.1  -1.30% 

 16 1.11.12  -1.90% 

 201.121.15  -2.61% 

 261.181.17   0.86% 

 321.241.36  -9.50% 

 40 1.31.52 -16.97% 

 461.29 1.6 -23.92% 

 501.371.64 -20.12% 

 541.281.66 -29.60% 

 62 1.5    1.69 -12.84% 


regards

Holger Wolf

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] rlim in proc//status

2008-01-15 Thread Clifford Wolf

Hi,

because I needed it already twice in two different projects this week: the
following patch adds rlim (ulimits) output to /proc//status.

Please let me know if there is another (already existing) way of accessing
this information easy (i.e. connecting with gdb to the process in question
and 'injecting' a getrlimit() call does not count.. ;-).

yours,
 - clifford

Signed-off-by: Clifford Wolf <[EMAIL PROTECTED]>

--- linux/fs/proc/array.c   (revision 757)
+++ linux/fs/proc/array.c   (working copy)
@@ -239,6 +239,55 @@
}
 }
 
+static char *rlim_names[RLIM_NLIMITS] = {
+   [RLIMIT_CPU]= "CPU",
+   [RLIMIT_FSIZE]  = "FSize",
+   [RLIMIT_DATA]   = "Data",
+   [RLIMIT_STACK]  = "Stack",
+   [RLIMIT_CORE]   = "Core",
+   [RLIMIT_RSS]= "RSS",
+   [RLIMIT_NPROC]  = "NProc",
+   [RLIMIT_NOFILE] = "NoFile",
+   [RLIMIT_MEMLOCK]= "MemLock",
+   [RLIMIT_AS] = "AddrSpace",
+   [RLIMIT_LOCKS]  = "Locks",
+   [RLIMIT_SIGPENDING] = "SigPending",
+   [RLIMIT_MSGQUEUE]   = "MsgQueue",
+   [RLIMIT_NICE]   = "Nice",
+   [RLIMIT_RTPRIO] = "RTPrio"
+};
+
+static inline char *task_rlim(struct task_struct *p, char *buffer)
+{
+   unsigned long flags;
+   struct rlimit rlim[RLIM_NLIMITS];
+   int i;
+   
+   rcu_read_lock();
+   if (lock_task_sighand(p, &flags)) {
+   for (i=0; isignal->rlim[i];
+   }
+   rcu_read_unlock();
+
+   for (i=0; ihttp://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: rlim in proc//status

2008-01-15 Thread Clifford Wolf

Hi,

On Tue, Jan 15, 2008 at 07:47:22PM +0900, KOSAKI Motohiro wrote:
> sound good for me.
> a few question please.
> 
> > +   for (i=0; i > +   if (rlim_names[i])
> > +   buffer += sprintf(buffer, "Rlim%s:\t", rlim_names[i]);
> > +   else
> > +   buffer += sprintf(buffer, "Rlim%d:\t", i);
> 
> this else is really necessary?

no. not with the current sources. maybe something like the following would
be better:

#if RLIM_NLIMITS != 15
#  error New RLIM_NLIMITS add mising entries to rlim_names[]
#endif

> > +   if (rlim[i].rlim_cur != ~0)
> > +   buffer += sprintf(buffer, "%lu\t", rlim[i].rlim_cur);
> > +   else
> > +   buffer += sprintf(buffer, "-\t");
> > +   if (rlim[i].rlim_max != ~0)
> > +   buffer += sprintf(buffer, "%lu\n", rlim[i].rlim_max);
> > +   else
> > +   buffer += sprintf(buffer, "-\n");
> 
> Why do you don't use RLIM_INFINITY?

because I'm blind and didn't see it... ;-)

maybe it would also be better to output 'inf' instead of '-' in this case?

yours,
 - clifford

-- 
The number of the beast - vi vi vi.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] rlim in proc//status (2nd rev.)

2008-01-15 Thread Clifford Wolf

Hi,

On Tue, Jan 15, 2008 at 02:36:59PM -0600, [EMAIL PROTECTED] wrote:
> > +   rcu_read_lock();
> > +   if (lock_task_sighand(p, &flags)) {
> > +   for (i=0; i > +   rlim[i] = p->signal->rlim[i];
> 
> I'm confused - where do you unlock_task_sighand()?

oh fsck! thanks for that pointer..

Here is a new version of the patch which solves this issue and the issues
adressed earlier in this thread by kosaki.

yours,
 - clifford

Signed-off-by: Clifford Wolf <[EMAIL PROTECTED]>

--- linux/fs/proc/array.c   (revision 750)
+++ linux/fs/proc/array.c   (revision 764)
@@ -239,6 +239,58 @@
}
 }
 
+static char *rlim_names[RLIM_NLIMITS] = {
+   [RLIMIT_CPU]= "CPU",
+   [RLIMIT_FSIZE]  = "FSize",
+   [RLIMIT_DATA]   = "Data",
+   [RLIMIT_STACK]  = "Stack",
+   [RLIMIT_CORE]   = "Core",
+   [RLIMIT_RSS]= "RSS",
+   [RLIMIT_NPROC]  = "NProc",
+   [RLIMIT_NOFILE] = "NoFile",
+   [RLIMIT_MEMLOCK]= "MemLock",
+   [RLIMIT_AS] = "AddrSpace",
+   [RLIMIT_LOCKS]  = "Locks",
+   [RLIMIT_SIGPENDING] = "SigPending",
+   [RLIMIT_MSGQUEUE]   = "MsgQueue",
+   [RLIMIT_NICE]   = "Nice",
+   [RLIMIT_RTPRIO] = "RTPrio"
+};
+
+#if RLIM_NLIMITS != 15
+#  error Value of RLIM_NLIMITS changed. \
+ Please update rlim_names in fs/proc/array.c
+#endif
+
+static inline char *task_rlim(struct task_struct *p, char *buffer)
+{
+   unsigned long flags;
+   struct rlimit rlim[RLIM_NLIMITS];
+   int i;
+   
+   rcu_read_lock();
+   if (lock_task_sighand(p, &flags)) {
+   for (i=0; isignal->rlim[i];
+   unlock_task_sighand(p, &flags);
+   }
+   rcu_read_unlock();
+
+   for (i=0; ihttp://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] rlim in proc//status (2nd rev.)

2008-01-16 Thread Clifford Wolf

Hi,

On Wed, Jan 16, 2008 at 04:33:12PM +0900, KOSAKI Motohiro wrote:
> Hi Clifford,
> 
> > +static inline char *task_rlim(struct task_struct *p, char *buffer)
> > +{
> > +   unsigned long flags;
> > +   struct rlimit rlim[RLIM_NLIMITS];
> > +   int i;
> > +   
> > +   rcu_read_lock();
> > +   if (lock_task_sighand(p, &flags)) {
> > +   for (i=0; i > +   rlim[i] = p->signal->rlim[i];
> > +   unlock_task_sighand(p, &flags);
> > +   }
> 
> lock_task_sighand is possible return NULL?
> if so, rlim is uninitialized when NULL.

hmm.. can p->sighand be NULL here? If so, there also is a problem in the
task_sig() function in the same file.

In fact that code is copy&paste from task_sig(). In fact I'm not even sure
if I actually need to lock anything to access p->signal->rlim.. I've
searched the kernel code a bit and it looks like most acesses to the rlimits
are done unlocked, which is no problem imo given the typical change-pattern
of this values..

anyone a clue on this issues?

yours,
 - clifford

-- 
2B OR (NOT 2B) That is the question. The answer is FF.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Problem with SMC Etherpower II + kernel newer 2.4.2

2001-07-02 Thread Juergen Wolf


Hi everybody,

currently I experience some strange problems with every kernels newer
than 2.4.2 and my SMC Etherpower II network card. While running such a
kernel, the network hangs and I get lots of errors like these listed
below:

Jul  2 13:06:59 localhost kernel: eth0: Too much work at interrupt,
IntrStatus=0x008d0004.
Jul  2 13:07:06 localhost last message repeated 5 times
Jul  2 13:07:20 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed
out
Jul  2 13:07:20 localhost kernel: eth0: Transmit timeout using MII
device, Tx status 4003.
Jul  2 13:07:22 localhost kernel: eth0: Too much work at interrupt,
IntrStatus=0x008d0004.


The /proc/pci lists the following system components:

PCI devices found:
  Bus  0, device   0, function  0:
Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev
3).
  Master Capable.  Latency=32.  
  Prefetchable 32 bit memory at 0xd800 [0xdbff].
  Bus  0, device   1, function  0:
PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
(rev 0).
  Master Capable.  No bursts.  Min Gnt=12.
  Bus  0, device   7, function  0:
ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South]
(rev 34).
  Bus  0, device   7, function  1:
IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 16).
  Master Capable.  Latency=32.  
  I/O at 0xc000 [0xc00f].
  Bus  0, device   7, function  2:
USB Controller: VIA Technologies, Inc. UHCI USB (rev 16).
  IRQ 9.
  Master Capable.  Latency=32.  
  I/O at 0xc400 [0xc41f].
  Bus  0, device   7, function  3:
USB Controller: VIA Technologies, Inc. UHCI USB (#2) (rev 16).
  IRQ 9.
  Master Capable.  Latency=32.  
  I/O at 0xc800 [0xc81f].
  Bus  0, device   7, function  4:
Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
(rev 48).
  Bus  0, device   7, function  5:
Multimedia audio controller: VIA Technologies, Inc. AC97 Audio
Controller (rev 32).
  IRQ 11.
  I/O at 0xcc00 [0xccff].
  I/O at 0xd000 [0xd003].
  I/O at 0xd400 [0xd403].
  Bus  0, device   9, function  0:
Multimedia audio controller: Xilinx, Inc. RME Digi96/8 (rev 4).
  IRQ 10.
  Non-prefetchable 32 bit memory at 0xde00 [0xdeff].
  Bus  0, device  10, function  0:
Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 2).
  IRQ 5.
  Master Capable.  Latency=32.  Min Gnt=12.Max Lat=128.
  I/O at 0xdc00 [0xdc3f].
  Bus  0, device  11, function  0:
Ethernet controller: Standard Microsystems Corp [SMC] 83C170QF (rev
9).
  IRQ 11.
  Master Capable.  Latency=32.  Min Gnt=8.Max Lat=28.
  I/O at 0xe000 [0xe0ff].
  Non-prefetchable 32 bit memory at 0xe000 [0xefff].
  Bus  1, device   0, function  0:
VGA compatible controller: nVidia Corporation NV11 (rev 161).
  IRQ 10.
  Master Capable.  Latency=32.  Min Gnt=5.Max Lat=1.
  Non-prefetchable 32 bit memory at 0xdc00 [0xdcff].
  Prefetchable 32 bit memory at 0xd000 [0xd7ff].


Does anybody else got these errors or knows about a solution for this ??
The 2.2.x kernels and all kernel versions below (including) 2.4.2 work
fine on the same system and I did not find any entries in the changelogs
for the SMC driver code.

Thx 
Juergen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Problem with SMC Etherpower II + kernel newer 2.4.2

2001-07-04 Thread Juergen Wolf

John Jasen wrote:
> 
> a) bad cable? 
> b) not negotiating speed and duplex with the switch correctly? 
> c) bad card? 
>

Hi,

 The same errors show up with the network cable plugged or unplugged on
all computers with the SMC card around here. But all these computers are
equipped with nearly the same hardware (see my first posting of the
/proc/pci file). Also the 2.4.6 kernel does not solve the problem.  

> d) IRQ sharing causing a conflict? 

 I dont think so, at least I dont get a IRQ conflict message and there
is no other device shown as using the same interrupt. If I use the 2.4.2
kernel or a version below everything works fine on the same host.

Another strange effect is, that if I wait for quite some time (5-10
Minutes) while trying to start up the eth0 device with  "ifconfig eth0
up"  I see messages like

Jul  4 09:38:58 localhost kernel: eth0: Setting full-duplex based on MII
#3 link partner capability of 41e1.
Jul  4 09:39:00 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed
out
Jul  4 09:39:00 localhost kernel: eth0: Transmit timeout using MII
device, Tx status 000b.
Jul  4 09:39:02 localhost kernel: eth0: Too much work at interrupt,
IntrStatus=0x008d0004.
Jul  4 09:40:55 localhost kernel: eth0: Setting half-duplex based on MII
#3 link partner capability of 0001.

in between hundreds of "too much work at interrupt" messages. This error
also occures regardles of the network cable is plugged or unplugged.

Regards
Juergen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Problem with SMC Etherpower II + kernel newer 2.4.2

2001-07-06 Thread Juergen Wolf

Francois Romieu wrote:
> 
> Is X or something like a nvidia module enabled ?
> 

Hi,

 the nvidia modul is not loaded or enabled but X is running sometimes.
Anyways, it seems to happen if X is not running too.
Luckily I got a very helpfull hint from Hans-Christian Armingeon in
reply to my questions here on the list. The epic100.c from
http://lrcwww.epfl.ch/~boch/sw/epic100.c.txt fixes the problem in all
the affected kernel versions. 

Thanx for your help guys
Juergen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Problem with SMC Etherpower II + kernel newer 2.4.2

2001-07-06 Thread Juergen Wolf


Francois Romieu wrote:
> 
> Could you try 2.4.6 with just this modification: ?
> 

hm, looks like thats really the point. After applying your diff file
everything works fine.

Regards,
Juergen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Problem with SMC Etherpower II + kernel newer 2.4.2

2001-07-06 Thread Juergen Wolf


Jeff Garzik wrote:
>
> Does it work with this line?
> 
> outl(0x12, ioaddr + MIICfg);
> 

yes, works fine too

Regards,
Juergen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

DMA Engine vs. DMA Transfer API

2007-07-21 Thread Clifford Wolf

Hi Shannon, hi lkml,

I'm writing a driver for the built-in DMA controller of the Freescale
MPC8349E PowerPC based microcontroller. Unfortunately the DMA-Engine
API seams to be the totally wrong thing for that since this DMA controller
has many features which are not available thru dmaengine afaics, such as:

 * Scatter-Gather on source and/or destination
 * Fifo modes for 8-, 16- and 32-bit (on source and dest)
 * Snooping mode, Multiple direct PCI read modes, ...

Also this DMA controller has only 4 channels (I do not know how many
channels I/OAT has). So I needed a spooling subsystem so many drivers
can easily share the available dma channels. The api also supports
more than one backend device and automatically finds out which backend
device fits a dma request best. I've named the entire thing dmatransfer
api. A current devel snapshot can be found at:

http://www.clifford.at/priv/dmatransfer-20070711.diff

(I am on vacation right now and don't have access to my test hardware. So
the entire thing is 'frozen' right now until the second august week when
I'm back in the office.)

The smart infrasturcture adds some overhead: In a testcase copying 1GB of
data from memory to memory in 64kB chunks (300MHz DDR SDRAM, 500MHz PowerPC
CPU) this overhead is about 0.25% over controlling the dma controller
directly without my framework.

Afaics there is only one user of DMA Engine framework right now and only
one driver: The network stack.

any chances that we can migrate I/OAT to my more flexible DMA Engine
framework and modify the network stack to use dmatransfer instead of
dmaengine or in any other way integrate the APIs?

yours,
 - clifford

-- 
"In the future, airplanes will be flown by a dog and a pilot. And the
dog's job will be to make sure that if the pilot tries to touch any of
the buttons, the dog bites him." - Scott Adams (author of Dilbert).
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Mem-2-Mem DMA - Generalized API (MPC8349E)

2007-06-23 Thread Clifford Wolf

Hi,

I'm curently working on an embedded project using the Freescale MPC8349E
microcontroller (PPC e300 core + almost everything one needs for
communicating with the rest of the world on one ~100$ chip).

One feature of that microcontroller is a generic memory-2-memory four
channel dma controller. Unlike the PC 8237 DMA controller (which copies
data between the memory and dedicated DMA channels) the memory-2-memory
DMA controller of the MPC8349E can copy between any bus addresses and thus
can be used for copying data from one memory area to another, between
memory and periphery mapped on bus addresses as well as from one peripheral
device on the bus to another. The MPC8349E has four DMA channels which
(unlike the PC 8237 DMA) are not bound to be used exclusively for dma
transfers from or to a dedicated peripheral device.

I doubt that the MPC8349E is the only device out there with such a DMA
controller and so I was looking for a generalized kernel API for working
with such DMA controllers to add MPC8349E support to it (since grepping
the kernel for the register addresses didn't show up any usefull results
I assume that there is no existing implementaion).

Has anyone thought already about an API fur such a generic kernel
framework? I was thinking about a 'copy using DMA if possible' function
which is first sleeping until a DMA channel gets free, then reprograms
this channel and again sleeps until the DMA transfer has been finished.

The MPC8349E has scatter-gather features for source and destination and can
eighter increment the source/destination addresses during the transfer or
run in (8, 16 or 32 bit) fifo mode. So it is a bit more complicated than
a 'memcpy on bus addresses which may sleep'.

Extending the request_dma() - free_dma() API looks like the wrong way to go
on this issue. It is afaics designed for a compleatly diffrent kind of DMA
controller.

Any thoughts? Am I completely missing something?

yours,
 - clifford

-- 
For extra security, this message has been encrypted with double-ROT13.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: coding style

2007-06-16 Thread Clifford Wolf

Hey,

On Fri, Jun 15, 2007 at 11:16:08AM +0200, Jan Engelhardt wrote:
> >> so which one is preferred for the kernel?
> 
> err = very_long_function_name(lots_of_arguments,
>   less,
>   less,
>   less,
>   less,
>   even_more_arguments,
>   more_of_this,
>   more_of_that,
>   more,
>   more,
>   more);
> 
> IMO, preferred:
> 
> err = very_long_function_name(lots_of_arguments, less, less, less, less,
>   even_more_arguments, more_of_this, more_of_that, more, more, more);

Looking at e.g. the fuction declarations in fs/namespace.c shows very well
that there seams to be no 'preferred in the kernel source' for this
question.

I presonally prefer indenting the continuation of a line with TWO
additional tabs so it is good to distinguish from a normally indented
command block. E.g.:

static int function_with_long_name(int and_many_arguments,
int not_fitting_in_a_signle_line_anymore)
{
if (and_many_arguments > not_fitting_in_a_signle_line_anymore &&
not_fitting_in_a_signle_line_anymore > 0)
and_many_arguments += not_fitting_in_a_signle_line_anymore;
else
not_fitting_in_a_signle_line_anymore *=
not_fitting_in_a_signle_line_anymore;
return and_many_arguments ^ not_fitting_in_a_signle_line_anymore;
}

maybe this won't win a design contest but it is a simple and non-ambiguous
coding style and afaics does not conflict with the CodingStyle document.

yours,
 - clifford

-- 
When your hammer is C++, everything begins to look like a thumb.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] spi_mpc83xx.c underclocking hotfix

2007-06-06 Thread Clifford Wolf

Hi kumar (spi_mpc83xx maintainer), Hi list,

the MPC83xx SPI controller clock divider can divide the system clock by not
more then 1024. The spi_mpc83xx driver does not check this and silently
writes garbage to the SPI controller registers when asked to run at lower
frequencies. I've tried to run the SPI on a 266MHz MPC8349E with 100kHz for
debugging a bus problem and suddenly was confronted with a 2nd problem to
debug..  ;-)

The attached patch adds an additional check which avoids writing garbage to
the SPI controller registers and warn the user about it. This might help
others to avoid simmilar problems.

yours,
 - clifford

-- 
Hardware /nm./: the part of the computer that you can kick.
Index: drivers/spi/spi_mpc83xx.c
===
--- drivers/spi/spi_mpc83xx.c   (revision 170)
+++ drivers/spi/spi_mpc83xx.c   (working copy)
@@ -158,6 +158,12 @@
 
if ((mpc83xx_spi->sysclk / spi->max_speed_hz) >= 64) {
u8 pm = mpc83xx_spi->sysclk / (spi->max_speed_hz * 64);
+   if (pm > 0x0f) {
+   printk(KERN_WARNING "MPC83xx SPI: SPICLK can't 
be less then a SYSCLK/1024!\n"
+   "Requested SPICLK is %d Hz. 
Will use %d Hz instead.\n",
+   spi->max_speed_hz, 
mpc83xx_spi->sysclk / 1024);
+   pm = 0x0f;
+   }
regval |= SPMODE_PM(pm) | SPMODE_DIV16;
} else {
u8 pm = mpc83xx_spi->sysclk / (spi->max_speed_hz * 4);

Interrupt-Handling in User Space

2007-05-05 Thread Clifford Wolf

Hi,

afaics there is no mechanism in the official kernel tree to handle
interrupts in user space yet. I'd need that for an embedded project I'm
woking on atm and are so far not sure if I'm going to implement such a
generic interface or just write a simple driver that does the job for my
application. So I'd like to find out what the chances are that such a
feature will get accepted in the official kernel..

Rationale: I'm not a fried of 'device drivers in userland'. But sometimes
one wants to talk to the hardware directly without anything like a 'device
driver' at all (e.g. for hardwaretesting, prototyping and debugging). In
my application I have my hardware block memory mmapped in the application
using /dev/mem to avoid additional copying steps between kernel and
userland. All I need now is an easy way to handle interrupts in my
application.

My plan is to extend drivers/char/mem.c to implement an additional /dev/irq
device file with the following ICOTLs:

IOCTL_IRQ_REGISTER
Integer argument: Interrupt Number
Must be the first ioctl issued after opening the device file. It
registeres an interrupt handler which is associated with the fd
and gets unregisteredwhen the fd is closed.

IOCTL_IRQ_AUTHORITIVE
Argument is ignored
Usually the interrupt handler returns IRQ_NONE. After this ioctl
has been called it returns IRQ_HANDLED instead.

IOCTL_IRQ_FLUSH
Argument is ignored.
Returns the number of interrupts since the last IOCTL_IRQ_FLUSH,
IOCTL_IRQ_WAIT or read() call.

IOCTL_IRQ_WAIT
Argument is ignored.
Returns without blocking if there have been any interrupts since
the last IOCTL_IRQ_FLUSH or IOCTL_IRQ_WAIT call or blocks until
the next interrupt otherwise.
Returns the number of interrupts since the last IOCTL_IRQ_FLUSH,
IOCTL_IRQ_WAIT or read() call.

Reading from the device file has the same effect as calling IOCTL_IRQ_WAIT
when reading in blocking or IOCTL_IRQ_FLUSH when reading in non-blocking
mode. When the read buffer len is 4 or more and there have been any
interrupts the first 4 bytes are filled with the number of interrupts
since the last IOCTL_IRQ_FLUSH, IOCTL_IRQ_WAIT or read() call and 4 is
returned. Using select() or epoll() will also be possible. Reading returns
zero (EOF) when there haven't been any interrupts and the read is
non-blocking.

What do you think? Any arguments against approach? Does anyone else also
feel the need for something like that in the kernel?

yours,
 - clifford

-- 
Once you have something that grows faster than education grows, you're
always going to get a pop culture. -- Alan Kay
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: staging: pi433: Possible bug in rf69.c

2017-11-10 Thread Marcus Wolf

Hi Dan,

I checked it on my local SVN. You are right. I submitted the code with '&'.
Accodring to a check-in message on my SVN, there was a bugreport end of
July and most probably a patch - either from me, you, Joseph Wright,
Colin King or Julia Lawall, changing '&' to '|'. I guess the patch for
some reason wasn't accepted, but fortunatley I introduced the change to
my SVN.

So from my point of view, we need a change from '&' to '|'.

I could prepare such a patch, but I am still unsure, which repo to use.

Shortly befor I fell ill, you proposed me to use Gregs staging for my
further development. But Colin yesterday was working on a repo, called
linux-next.

Can you (or anyone else) please tell me, when (or for which kind of
patches) to use the Gregs staging and wehen (or for which kind of
patches) to use the linux-next? Sorry for not being familiar with that
stuff!

Thanks a lot,

Marcus

Am 10.11.2017 um 20:32 schrieb Dan Carpenter:
> On Fri, Nov 10, 2017 at 06:23:32PM +0100, Marcus Wolf wrote:
>> Hi everybody!
>>
>> Just comparing the master of Gregs statging of pi433 with my local SVN
>> to review all changes, that were done the last monthes.
>>
>> I am not sure, but maybe we imported a bug in rf69.c lines 378 and
>> following:
>>
>> Gregs repo:
>>  case automatic:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) & 
>> ~MASK_LNA_GAIN) & LNA_GAIN_AUTO) );
>> my repo:
>>  case automatic:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) & 
>> ~MASK_LNA_GAIN) | LNA_GAIN_AUTO) );
> 
> I edited the lines for clarity.  The difference is that your repo does
> a bitwise OR "| LNA_GAIN_AUTO" and the kernel.org code does a bitwise
> "& LNA_GAIN_AUTO".
> 
> The kernel repo hasn't changed since you sent us the driver in commit
> 874bcba65f9a ('staging: pi433: New driver').  I agree that & doesn't
> seem to make sense and I'm disapointed that it doesn't cause a Smatch
> warning.
> 
> But LNA_GAIN_AUTO is zero so maybe | BIT(LNA_GAIN_AUTO) was intended
> instead of | LNA_GAIN_AUTO.  I don't know...  No one on this list knows
> the answer probably.  :/
> 
> regards,
> dan caprenter
>

Re: staging: pi433: Possible bug in rf69.c

2017-11-11 Thread Marcus Wolf


Hi Dan,

thanks fot the link. I can't remeber, why and what I wanted to redo. 
Maybe there was a complaint about the format of the patch...


In that patch, we also have the topic with the '>> 3', we were 
discussing a few days ago!


I'd suggest, not to invest the history any more. I'm ok with preparing a 
new patch/new patches, so we can import the fixes.


I also have several improvements for the rf69.c, I'd like to offer.

But I still need to know when to use staging and when to use linux-next.
I don't want to prepare patches for the wrong tree.

Cheers,

Marcus


Am 11.11.2017 um 10:45 schrieb Dan Carpenter:

On Sat, Nov 11, 2017 at 08:55:30AM +0100, Marcus Wolf wrote:

Hi Dan,

I checked it on my local SVN. You are right. I submitted the code with '&'.
Accodring to a check-in message on my SVN, there was a bugreport end of
July and most probably a patch - either from me, you, Joseph Wright,
Colin King or Julia Lawall, changing '&' to '|'. I guess the patch for
some reason wasn't accepted, but fortunatley I introduced the change to
my SVN.


You sent the patch, but then talked about sending a new version so
that's why it wasn't merged.  Greg probably would have merged it as-is
if it hadn't sounded like you were going to redo it.

http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2017-July/108821.html

regards,
dan carpenter

Re: staging: pi433: Possible bug in rf69.c

2017-11-11 Thread Marcus Wolf


Hi Greg,

that's fine.

Is this the right URL:
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git

Is there already an aprox. date, when 4.15rc1 will be out and 
backintegration will be done?


Thx,

Marcus


Am 11.11.2017 um 13:18 schrieb Greg Kroah-Hartman:

On Sat, Nov 11, 2017 at 11:42:09AM +0200, Marcus Wolf wrote:

Hi Dan,

thanks fot the link. I can't remeber, why and what I wanted to redo. Maybe
there was a complaint about the format of the patch...

In that patch, we also have the topic with the '>> 3', we were discussing a
few days ago!

I'd suggest, not to invest the history any more. I'm ok with preparing a new
patch/new patches, so we can import the fixes.

I also have several improvements for the rf69.c, I'd like to offer.

But I still need to know when to use staging and when to use linux-next.
I don't want to prepare patches for the wrong tree.


I recommend waiting for 4.15-rc1 to come out, all of the different trees
will be merged, and then you can just work off of the staging-next tree
and I can take the patches.

thanks,

greg k-h

Re: staging: pi433: Possible bug in rf69.c

2017-11-11 Thread Marcus Wolf


Hi Greg,

ok.

I'll postpone all my work until then. Give me a hook, when I can start :-)

Thanks,

Marcus


Am 11.11.2017 um 13:49 schrieb Greg Kroah-Hartman:

On Sat, Nov 11, 2017 at 01:42:27PM +0200, Marcus Wolf wrote:

Hi Greg,

that's fine.

Is this the right URL:
  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git


Yes.


Is there already an aprox. date, when 4.15rc1 will be out and
backintegration will be done?


Should be 2 weeks from now.

thanks,

greg k-h

Re: staging: pi433: Possible bug in rf69.c

2017-11-11 Thread Marcus Wolf


Hi Joe,

thank you for your suggestion.

The enums are necessary for the (old fashioned) ioctl interface, too.
So the user space uses these enums in order to configure the driver.
If we want to completely remove rf69_enum.h, we need to find a solution
for that, too.

From the optics/readability, I like your idea with the Macro for the cases.

On the other hand, I have already prepared a patch, that uses setbit, 
resetbit

and readmodifywrite inline fuctions instead of the macros WRITE_REG, ...
That was an idea of Walter Harms in order to increase readability and
reduce macros, because Walter prefers inline functions to macros.

As discussed with Greg, I will provide the patch, as soon as 4.15rc1 is out.
Maybe we should move the discussion to then, so you can have a look to that?

Cheers,

Marcus


Am 11.11.2017 um 18:02 schrieb Joe Perches:

On Fri, 2017-11-10 at 18:23 +0100, Marcus Wolf wrote:

Hi everybody!

Just comparing the master of Gregs statging of pi433 with my local SVN
to review all changes, that were done the last monthes.

I am not sure, but maybe we imported a bug in rf69.c lines 378 and
following:

Gregs repo:
case automatic:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_AUTO) );
case max:return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX) );
case maxMinus6:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX_MINUS_6) );
case maxMinus12: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX_MINUS_12) );
case maxMinus24: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX_MINUS_24) );
case maxMinus36: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX_MINUS_36) );
case maxMinus48: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) & LNA_GAIN_MAX_MINUS_48) );

my repo:
case automatic:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_AUTO) );
case max:return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX) );
case maxMinus6:  return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX_MINUS_6) );
case maxMinus12: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX_MINUS_12) );
case maxMinus24: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX_MINUS_24) );
case maxMinus36: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX_MINUS_36) );
case maxMinus48: return WRITE_REG(REG_LNA, ( (READ_REG(REG_LNA) &
~MASK_LNA_GAIN) | LNA_GAIN_MAX_MINUS_48) );

Up to my opinion, my (old) version is better then Gregs (new) version.
If you agree, I'll prepare a patch, to revert the modification.


There seems to be a lot of enum/#define duplication in this driver.

For instance:

drivers/staging/pi433/rf69_registers.h

#define  LNA_GAIN_AUTO  0x00 /* default */
#define  LNA_GAIN_MAX   0x01
#define  LNA_GA
IN_MAX_MINUS_6  0x02
#define  LNA_GAIN_MAX_MINUS_12
0x03
#define  LNA_GAIN_MAX_MINUS_24  
0x04
#define  LNA_GAIN_MAX_MINUS_36  0x05
#d
efine  LNA_GAIN_MAX_MINUS_480x06

vs

drivers/staging/pi433/rf69_enum.h

enum lnaGain
{
 automatic,
 max,
 maxMinus6,
 maxMinus12,
 maxM
inus24,
 maxMinus36,
 maxMinus48,
 undefined
};

My suggestion would be to remove drivers/staging/pi433/rf69_enum.h
where possible and convert all these switch/case entries into
macros like

#define GAIN_CASE(type) \
case type: return WRITE_REG(REG_LNA,\
(READ_REG(REG_LNA) & ~MASK_LNA_GAIN) | 
(type));

so for example this switch becomes

switch (lnaGain) {
GAIN_CASE(LNA_GAIN_AUTO);
...
}

Re: pi433: initialization of tx config in pi433_open()

2018-06-22 Thread Marcus Wolf

Hi Hugo,

thank you for all your work on Pi433 driver.

For a better understanding some info about Pi433 and the ideas behind it.
Pi433 was developed by me in order to have a simple to mount
CE-compliant 433MHz shield for the Raspberry Pi. I wanted to put it on
sale on the one side and develop a further product for home automation,
using the Pi433.

After three years of development and hard trying to find sales partner,
I almost gave up, since after three years still earn on those topics by
far do not cover the spendings. If nothing changes, I'll have to close
my company at the end of this year.

At a distinct point, the work on trying to sell exceeded the technical
work, so no time was left for the driver development. And now I started
over in freelancing to earn money. That's why all of you nearly hear
nothing from me - very sad about that!

Back to technics:
There was already the idea of equipping the driver with default values.
I see no benefit with setting the default values from the data sheet,
since they do not lead to a good, maybe even not working setup of the
rf69. Idea was to choose settings, that allow to use pipeoperators on
the command sehll for transmitting and receiving telegrams. There was a
longer discussion about one year ago with Marcin Ciupak about that topic.
Most important point from my side was, that the defaults should be in a
way, that CE recommandations are meat. You can find a lot of
configurations, making Pi433 work in a way, that it isn't CE-compliant
any more.
The 4711 sound like just beeing a place holder.
Since - like I told before - I inteded to use Pi433 mainly for OOK/ASK,
my idea was to use an good OOK/ASK setup as default. I could provide you
with a source code, I used the Pi433 with - but I think attachments are
unwanted here.

As far as I can remember, there were some more details to modify on the
driver, to use it with pipes on command line. But I had a rough
implementation. At least send was working... To long ago to remeber :-(


Since it might happen, that Pi433 will go off the market in a few
monthes (tears in my eyes), I think it's a good idea to modify the
driver to become a generic hope-RF driver.
I already did investigations on different hope-RF chips and asked
hope-RF for a little support (e.g. partnership), but they were of
opinion, that they don't need such a driver. It would be easy to cover
up to five/six chips of them - even their brand new LoRaWan-chip. RFM-12
will be a little bit harder, since it works a little bit different.
There were already preparations to support several chips - e. g. by
capsulating the HW layer. But even hard discussions one year ago didn't
help - according to kernel guide lines, it wasn't allowed to keep this
capsulations. So the strict capsulation was broken and some of the
basics for supporting more chips are lost now.
Also on this topic I had several discussions with Marcin Ciupak, how to
realise the support of the next chips.
Maybe you can search the mailing list? If not, I can try to find the
discussions in my inbox.


I would love to help you with these extending topics, but since I am
almost out of money, at the moment there is no margin for complimentary
developments any more :-/

But if you like, I can discuss some stuff with you from time to time.

Thank you so much for your interest in Pi433 and my driver!!

If you need hradware for testing, let me know.

Marcus



Am 22.06.2018 um 04:47 schrieb Hugo Lefeuvre:
> Hi Marcus,
> 
> I'm currently working on the following TODO:
> 
>  966 /* setup instance data*/
>  967 instance->device = device;
>  968 instance->tx_cfg.bit_rate = 4711;
>  969 // TODO: fill instance->tx_cfg;
> 
> If a user calls write() right after open()-ing an instance, the driver
> might try to setup the device with uninitialized garbage. In fact
> nothing really bad will happen because the rf69 interface abstraction
> will filter out wrong values, but this might be a confusing behavior
> for the user.
> 
> What do you think about initializing instance->tx_cfg with the default
> values of the rf69 datasheet[0] ?
> 
> Also, is there a specific reason why you chose 4711 as a default value
> for the bit rate ? I couldn't find it anywhere in the datasheet nor on
> the internet.
> 
> Thanks !
> 
> Regards,
>  Hugo
> 
> [0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf
>

Re: [PATCH v3] staging: pi433: replace simple switch statements

2018-06-25 Thread marcus . wolf

Reviewed-by: Marcus Wolf 

Thank you Valentin, very nice patch :-)

Valentin Vidic schrieb am 24.06.2018 18:31:

Use const array to map switch cases to resulting values.

Signed-off-by: Valentin Vidic 
---
v2: use correct type for const arrays
v3: add missing static keyword for af_map

 drivers/staging/pi433/rf69.c | 233 ++-
 1 file changed, 93 insertions(+), 140 deletions(-)

diff --git a/drivers/staging/pi433/rf69.c b/drivers/staging/pi433/rf69.c
index 90280e9b006d..341d6901a801 100644
--- a/drivers/staging/pi433/rf69.c
+++ b/drivers/staging/pi433/rf69.c
@@ -111,27 +111,22 @@ static inline int rf69_read_mod_write(struct spi_device
*spi, u8 reg,
 
 int rf69_set_mode(struct spi_device *spi, enum mode mode)
 {
-   switch (mode) {
-   case transmit:
-   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
-  OPMODE_MODE_TRANSMIT);
-   case receive:
-   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
-  OPMODE_MODE_RECEIVE);
-   case synthesizer:
-   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
-  OPMODE_MODE_SYNTHESIZER);
-   case standby:
-   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
-  OPMODE_MODE_STANDBY);
-   case mode_sleep:
-   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
-  OPMODE_MODE_SLEEP);
-   default:
+   static const u8 mode_map[] = {
+   [transmit] = OPMODE_MODE_TRANSMIT,
+   [receive] = OPMODE_MODE_RECEIVE,
+   [synthesizer] = OPMODE_MODE_SYNTHESIZER,
+   [standby] = OPMODE_MODE_STANDBY,
+   [mode_sleep] = OPMODE_MODE_SLEEP,
+   };
+
+   if (unlikely(mode >= ARRAY_SIZE(mode_map))) {
dev_dbg(&spi->dev, "set: illegal input param");
return -EINVAL;
}
 
+   return rf69_read_mod_write(spi, REG_OPMODE, MASK_OPMODE_MODE,
+  mode_map[mode]);
+
// we are using packet mode, so this check is not really needed
// but waiting for mode ready is necessary when going from sleep 
because the
FIFO may not be immediately available from previous mode
//while (_mode == RF69_MODE_SLEEP && (READ_REG(REG_IRQFLAGS1) &
RF_IRQFLAGS1_MODEREADY) == 0x00); // Wait for ModeReady
@@ -145,19 +140,19 @@ int rf69_set_data_mode(struct spi_device *spi, u8
data_mode)
 
 int rf69_set_modulation(struct spi_device *spi, enum modulation modulation)
 {
-   switch (modulation) {
-   case OOK:
-   return rf69_read_mod_write(spi, REG_DATAMODUL,
-  MASK_DATAMODUL_MODULATION_TYPE,
-  DATAMODUL_MODULATION_TYPE_OOK);
-   case FSK:
-   return rf69_read_mod_write(spi, REG_DATAMODUL,
-  MASK_DATAMODUL_MODULATION_TYPE,
-  DATAMODUL_MODULATION_TYPE_FSK);
-   default:
+   static const u8 modulation_map[] = {
+   [OOK] = DATAMODUL_MODULATION_TYPE_OOK,
+   [FSK] = DATAMODUL_MODULATION_TYPE_FSK,
+   };
+
+   if (unlikely(modulation >= ARRAY_SIZE(modulation_map))) {
dev_dbg(&spi->dev, "set: illegal input param");
return -EINVAL;
}
+
+   return rf69_read_mod_write(spi, REG_DATAMODUL,
+  MASK_DATAMODUL_MODULATION_TYPE,
+  modulation_map[modulation]);
 }
 
 static enum modulation rf69_get_modulation(struct spi_device *spi)
@@ -373,43 +368,30 @@ int rf69_set_output_power_level(struct spi_device *spi, u8
power_level)
 
 int rf69_set_pa_ramp(struct spi_device *spi, enum pa_ramp pa_ramp)
 {
-   switch (pa_ramp) {
-   case ramp3400:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_3400);
-   case ramp2000:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_2000);
-   case ramp1000:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_1000);
-   case ramp500:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_500);
-   case ramp250:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_250);
-   case ramp125:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_125);
-   case ramp100:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_100);
-   case ramp62:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_62);
-   case ramp50:
-   return rf69_write_reg(spi, REG_PARAMP, PARAMP_50);
-   case ramp40:
-   return rf69_write_reg(spi, REG_PARAMP, PAR

Re: [PATCH v3] staging: pi433: replace simple switch statements

2018-06-25 Thread marcus . wolf

Hi Dan,

I'd like to mention once more, that the idea of the abstraction was to 
support multiple modules of Hope-RF.
If the decision of the "team" of developer of this driver is, that it 
should be reduced to a Pi433 or RFM69CW driver only, I fully agree, 
that the abstraction layer isn't necessary (tough improving readability).

But if the "team" wants to extend the driver - here I explicitly want to 
Name Marcin Ciupak and Hogo Lefeuvre, both were discussing this with me - 
I highly recommend to keep the abstraction layer.

And once again, I have to announce, that - if noone appears, who wants to 
help me with selling Pi433 - I can't effort to let Pi433 on the market 
longer then end of this year. From this Point of view on long term it 
might be senseless to prepare a Pi433-only driver.

Cheers,
Marcus

Dan Carpenter schrieb am 25.06.2018 14:35:
> I'd still prefer if we just removed this abstraction entirely and used
> OPMODE_MODE_TRANSMIT everywhere instead of bringing "transmit" into it.
> 
> I know that every author thinks their abstraction will definitely be
> useful in the future, but generally kernel style is to remove
> abstractions.
> 
> But I guess this code is an improvement over the original so the patch
> is fine.
> 
> regards,
> dan carpenter
> 
>

Re: [PATCH] staging: pi433: remove unnecessary parentheses

2018-01-10 Thread marcus . wolf

Joe Perches schrieb am 10.01.2018 10:05:
> On Wed, 2018-01-10 at 09:44 +0100, Greg Kroah-Hartman wrote:
> > On Tue, Jan 09, 2018 at 11:42:16AM -0800, Joe Perches wrote:
> > > if (a == b && c == d)
> > > is pretty trivial.
> >
> > But again, don't do that.
> 
>  We disagree. Life goes on.
> 
> cheers, Joe
>
>

For me the line above isn't obvious and easy to read. If I would be in doubt, 
whether it really performs correctly, I would have to ask the c-guide, to be 
absolutely shure.
But to be honest: If I need to find a bug arround taht lines, I wouldn't ask 
the c-guide, but simply add some (). Then it would be 100% clear and no one 
would be in doubt any more.

What's the disadvantage of () to emphasise waht is going on. An other Option 
for me would be, to spend a command line and write that info in form of a 
comment.

Just my opinion and the way, I would go on if I am in doubt and need to find a 
bug.

Cheers,

Marcus

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-03-25 Thread Marcus Wolf

Hi Valentin,

> But I have an idea to remove this kfifo_reset call in pi433_write
> handling partial message writes:
> 
>   kfifo_reset(&device->tx_fifo); // TODO: maybe find a solution, not to 
> discard already stored, valid entries
> 
> The writer could acquire the lock and than use kfifo_avail to check if
> there is enough space to write the whole message.  What do you think?

Unfortunaly I can't find the time to have a closer look on the code this
weekend - still busy with tax stuff :-(

Idea sounds great. I'll try to look at the code and think about it
during Easter hollidays.

Cheers,

Marcus

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-03-25 Thread Marcus Wolf

Hi Valentin,

I am not at home the next two weeks. So I can do a codereading on
Easter, but testing will not take place earlier then mid/end of April :-(

If you are interested, I can provide you an engineering sample of Pi433.

Cheers,

Marcus

Am 25.03.2018 um 15:09 schrieb Valentin Vidic:
> On Sun, Mar 25, 2018 at 03:00:09PM +0200, Marcus Wolf wrote:
>> Unfortunaly I can't find the time to have a closer look on the code this
>> weekend - still busy with tax stuff :-(
>>
>> Idea sounds great. I'll try to look at the code and think about it
>> during Easter hollidays.
> 
> No problem, there is no hurry. But please do test the patch I sent yesterday:
> 
>   [PATCH] staging: pi433: cleanup tx_fifo locking
> 
> As I don't have the hardware this is just compile tested now :)
> 

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-04-08 Thread Marcus Wolf

Hi Valentin,

The hardware of Pi433 is working with every Raspberry Pi (on zero, you
need to solder the GPIO-pins) and with several other fruits like banana
pi. The only thing is, that you need different versions of the driver,
according to the processor, mounted on your fruit.

If you'd like to test more then ther is no hang up or crash, you will
need a second party. You could use a 433MHz socket for testing TX or a
433 thermometer for testing RX. An example code for communication with a
socket is available on the Pi433 productpage (www.pi433.de). The sample
for the thermometer isn't published yet (as always lack of time).

If you want to test more deeply (using different features of the Rf69
chip or even do some long time testing, you have more options, if you
use to Pis with Pi433.

Just let me know, what you'd like to do and what equipment, you need.

Cheers,

Marcus

Am 25.03.2018 um 16:24 schrieb Valentin Vidic:
> On Sun, Mar 25, 2018 at 03:12:52PM +0200, Marcus Wolf wrote:
>> I am not at home the next two weeks. So I can do a codereading on
>> Easter, but testing will not take place earlier then mid/end of April :-(
>>
>> If you are interested, I can provide you an engineering sample of Pi433.
> 
> Sure, let me know which version of Rpi it supports and if I need two
> to test communication.
>

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-04-08 Thread Marcus Wolf

Hi Valentin,

like promissed, I finally had a deeper look to your proposal with
kfifo_avail and your patch from 24th of March.
In principle, I think it is a very nice idea, and we should try
to implement it.
But there is a snag: There is no guarantee, that kfifo_in will
only fail, if the fifo is full. If there will be any another
reason for kfifo_in to fail, with the new implementation there
will be no handling for that.
But I think the chance of such an situation is low to impossible
and the win in simplicity of the code is really great.

Regarding your patch, I did not understand, why you did not remove
the mutex_lock in pi433_write. Wasn't it the goal to remove it?

Below find a proposal of pi433_write function, I wrote on base
of my outdated (!), private repo. It is not compiled and not tested.
Since there is no more handling in case of an error (as well in the
propsal as in your patch), I removed the error handling completely.
I only do a test to detect proplems while writing to the tx_fifo,
but (like in your patch) do nothing for solving, just printing a line.
If this unexpected situation will occur (most probably never),
the tx_fifo will be (and stay) out of sync until driver gets unloaded.
We have to decide, whether we can stay with that. Like written above,
I thinkt the benefits are great, the chance of such kind of error
very low.
What do you think?

It could be discussed, whether it is better to return EMSGSIZE or
EAGAIN on the first check. On the one hand, there is a problem with
the message size, on the other hand (if message isn't generally too
big) after a while, there should be some more space available in
fifo, so EAGAIN may be better choice.

static ssize_t
pi433_write(struct file *filp, const char __user *buf,
size_t count, loff_t *f_pos)
{
struct pi433_instance   *instance;
struct pi433_device *device;
intrequired, copied, retval;

instance = filp->private_data;
device = instance->device;

/* check whether there is enough space available in tx_fifo */
required = sizeof(instance->tx_cfg) + sizeof(size_t) + count;
if ( num_of_bytes_to_store_in_fifo > kfifo_avail(&device->tx_fifo) ) {
dev_dbg(device->dev, "Not enough space in fifo. Need %d, but 
only have"
   , required
   , 
kfifo_avail(&device->tx_fifo) );
return -EMSGSIZE;
}

/* write the following sequence into fifo:
 * - tx_cfg
 * - size of message
 * - message
 */
retval  = kfifo_in(&device->tx_fifo, &instance->tx_cfg, 
sizeof(instance->tx_cfg));
retval += kfifo_in (&device->tx_fifo, &count, sizeof(size_t));
retval += kfifo_from_user(&device->tx_fifo, buf, count, &copied);

if (retval != required ) {
dev_dbg(device->dev, "write to fifo failed, reason unknown, non 
recoverable.");
return -EAGAIN;
}

/* start transfer */
wake_up_interruptible(&device->tx_wait_queue);
dev_dbg(device->dev, "write: generated new msg with %d bytes.", copied);

return 0;
}

Hope this helps :-)

Marcus

Am 25.03.2018 um 15:09 schrieb Valentin Vidic:
> On Sun, Mar 25, 2018 at 03:00:09PM +0200, Marcus Wolf wrote:
>> Unfortunaly I can't find the time to have a closer look on the code this
>> weekend - still busy with tax stuff :-(
>>
>> Idea sounds great. I'll try to look at the code and think about it
>> during Easter hollidays.
> 
> No problem, there is no hurry. But please do test the patch I sent yesterday:
> 
>   [PATCH] staging: pi433: cleanup tx_fifo locking
> 
> As I don't have the hardware this is just compile tested now :)
>

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-03-23 Thread Marcus Wolf

Hi Valentin,

could you please decribe in short words, why you think, that hte lock
isn't obsolete?

I wasn't sure, but close to remove the lock. That's why I putted the
comment.

Thanks,

Marcus

Am 23.03.2018 um 10:47 schrieb Valentin Vidic:
> Removes TODO for tx_fifo_lock as tx_fifo is modified from
> both pi433_tx_thread and pi433_write.
> 
> Fixes checkpatch warning:
> 
>   CHECK: struct mutex definition without comment
> 
> Signed-off-by: Valentin Vidic 
> ---
>  drivers/staging/pi433/pi433_if.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/pi433/pi433_if.c 
> b/drivers/staging/pi433/pi433_if.c
> index d1e0ddbc79ce..f6f106a3ff8e 100644
> --- a/drivers/staging/pi433/pi433_if.c
> +++ b/drivers/staging/pi433/pi433_if.c
> @@ -87,7 +87,7 @@ struct pi433_device {
>  
>   /* tx related values */
>   STRUCT_KFIFO_REC_1(MSG_FIFO_SIZE) tx_fifo;
> - struct mutextx_fifo_lock; // TODO: check, whether necessary 
> or obsolete
> + struct mutextx_fifo_lock; /* serialize access to tx_fifo */
>   struct task_struct  *tx_task_struct;
>   wait_queue_head_t   tx_wait_queue;
>   u8  free_in_fifo;
> @@ -100,7 +100,7 @@ struct pi433_device {
>   u32 rx_bytes_to_drop;
>   u32 rx_bytes_dropped;
>   unsigned intrx_position;
> - struct mutexrx_lock;
> + struct mutexrx_lock; /* serialize read requests */
>   wait_queue_head_t   rx_wait_queue;
>  
>   /* fifo wait queue */
> 

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-03-23 Thread Marcus Wolf

Hi Valentin,

I had no time to work on the code for monthes now and the memorisation
of my thoughts when I was programming that (approx. one year ago) is
quite pale.

As far as I remember, I read something, that the fifo has an integrated
protection, so no external protection is needed. But absolutely unsure.

If I will find some time within the next days, I'll have a look at the
code and try to recall.

But the most important thing already took place: We started thinking
about it :-)

Thanks,

Marcus

Am 23.03.2018 um 12:42 schrieb Valentin Vidic:
> On Fri, Mar 23, 2018 at 11:22:39AM +0100, Marcus Wolf wrote:
>> could you please decribe in short words, why you think, that hte lock
>> isn't obsolete?
>>
>> I wasn't sure, but close to remove the lock. That's why I putted the
>> comment.
> 
> Sure, if pi433_tx_thread runs on one CPU it might be possible
> to call pi433_write concurrently on another CPU and they would
> both modify tx_fifo.  But maybe there is some other protection
> in place that would prevent this?
> 

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: rf69_set_deviation in rf69.c (pi433 driver)

2018-06-19 Thread Marcus Wolf

Hi Hugo,

sorry for the late response and thank you for all that deep
investigation in the pi433 driver!

> According to the datasheet[0], the deviation should always be smaller
> than 300kHz, and the following equation should be respected:
> 
>   (1) FDA + BRF/2 =< 500 kHz
> 
> Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like
> a bug to me.

My focus was always on OOK and ASK. PSK was only used for a few
measurements in the laboratory, I engaged to check CE compliance.
Most probably 500kHz was a value, that's common for PSK and I didn't pay
any attention to the datasheet. So I think, you are right: This is a bug
and could be revised.
Never the less: While using it in the lab, the transmission was fine and
the signals over air have been clean and fitted to the recommondations
of the CE.

> 
> Concerning the TODO, I can see two solutions currently:
> 
> 1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB
>and REG_BITRATE_LSB and returns reconstructed BRF.
>We could use this function in rf69_set_deviation to retrieve the BRF.
> 
> + clean
> + intuitive
> - heavy / slow

Why not: I like clean and intuitive implementations. Since it's used
during configuration, we shouldn't be that squeezed in time, that we
need to hurry.
> 
> 2. Store BRF somewhere in rf69.c, initialize it with the default value
>(4.8 kb/s) and update it when rf69_set_bit_rate is called.
> 
> + easy
> - dirty, doesn't fit well with the design of rf69.c (do not store
>   anything, simply expose API)

Up to my experience, storing reg values is always a bit problematic,
since the value may be outdated. And if you update the value every time
you want to use it to be sure, it's correct, there is no win in having
the copy of the reg value.
So this wouldn't be my favourite.
> 
> I'd really prefer going for the first one, but I wanted to have your opinion
> on this.

Agree.

> Thanks for your work !

Your welcome :-)

Marcus

[PATCH 1/2] staging: fbtft: Fix line over 80 characters

2018-07-16 Thread Matthias Wolf

Fix checkpatch line over 80 characters where it seemed appropriate

Signed-off-by: Matthias Wolf 
Signed-off-by: Felix Siegel 
Signed-off-by: Tim Cofala 
---
 drivers/staging/fbtft/fb_s6d02a1.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/fbtft/fb_s6d02a1.c 
b/drivers/staging/fbtft/fb_s6d02a1.c
index 7529576..55513a3 100644
--- a/drivers/staging/fbtft/fb_s6d02a1.c
+++ b/drivers/staging/fbtft/fb_s6d02a1.c
@@ -21,20 +21,27 @@ static const s16 default_init_sequence[] = {
 
-1, 0xfc, 0x5a, 0x5a,
 
-   -1, 0xfa, 0x02, 0x1f, 0x00, 0x10, 0x22, 0x30, 0x38, 0x3A, 0x3A, 0x3A, 
0x3A, 0x3A, 0x3d, 0x02, 0x01,
+   -1, 0xfa, 0x02, 0x1f, 0x00, 0x10, 0x22, 0x30, 0x38,
+   0x3A, 0x3A, 0x3A, 0x3A, 0x3A, 0x3d, 0x02, 0x01,
 
-   -1, 0xfb, 0x21, 0x00, 0x02, 0x04, 0x07, 0x0a, 0x0b, 0x0c, 0x0c, 0x16, 
0x1e, 0x30, 0x3f, 0x01, 0x02,
+   -1, 0xfb, 0x21, 0x00, 0x02, 0x04, 0x07, 0x0a, 0x0b,
+   0x0c, 0x0c, 0x16, 0x1e, 0x30, 0x3f, 0x01, 0x02,
 
/* power setting sequence */
-   -1, 0xfd, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x01, 0x01, 0x00, 0x1f, 
0x1f,
+   -1, 0xfd, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x01,
+   0x01, 0x00, 0x1f, 0x1f,
 
-   -1, 0xf4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3f, 0x3f, 0x07, 0x00, 0x3C, 
0x36, 0x00, 0x3C, 0x36, 0x00,
+   -1, 0xf4, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3f, 0x3f,
+   0x07, 0x00, 0x3C, 0x36, 0x00, 0x3C, 0x36, 0x00,
 
-   -1, 0xf5, 0x00, 0x70, 0x66, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x6d, 0x66, 0x06,
+   -1, 0xf5, 0x00, 0x70, 0x66, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x6d, 0x66, 0x06,
 
-   -1, 0xf6, 0x02, 0x00, 0x3f, 0x00, 0x00, 0x00, 0x02, 0x00, 0x06, 0x01, 
0x00,
+   -1, 0xf6, 0x02, 0x00, 0x3f, 0x00, 0x00, 0x00, 0x02,
+   0x00, 0x06, 0x01, 0x00,
 
-   -1, 0xf2, 0x00, 0x01, 0x03, 0x08, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x01, 0x00, 0x00, 0x04, 0x08, 0x08,
+   -1, 0xf2, 0x00, 0x01, 0x03, 0x08, 0x08, 0x04, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x04, 0x08, 0x08,
 
-1, 0xf8, 0x11,
 
@@ -54,7 +61,8 @@ static const s16 default_init_sequence[] = {
-1, 0xf3, 0x00, 0x0f,
-2, 50,
 
-   -1, 0xf4, 0x00, 0x04, 0x00, 0x00, 0x00, 0x3f, 0x3f, 0x07, 0x00, 0x3C, 
0x36, 0x00, 0x3C, 0x36, 0x00,
+   -1, 0xf4, 0x00, 0x04, 0x00, 0x00, 0x00, 0x3f, 0x3f,
+   0x07, 0x00, 0x3C, 0x36, 0x00, 0x3C, 0x36, 0x00,
-2, 50,
 
-1, 0xf3, 0x00, 0x1f,
@@ -65,9 +73,11 @@ static const s16 default_init_sequence[] = {
-1, 0xf3, 0x00, 0xff,
-2, 50,
 
-   -1, 0xfd, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x00, 0x01, 0x00, 0x16, 
0x16,
+   -1, 0xfd, 0x00, 0x00, 0x00, 0x17, 0x10, 0x00, 0x00,
+   0x01, 0x00, 0x16, 0x16,
 
-   -1, 0xf4, 0x00, 0x09, 0x00, 0x00, 0x00, 0x3f, 0x3f, 0x07, 0x00, 0x3C, 
0x36, 0x00, 0x3C, 0x36, 0x00,
+   -1, 0xf4, 0x00, 0x09, 0x00, 0x00, 0x00, 0x3f, 0x3f,
+   0x07, 0x00, 0x3C, 0x36, 0x00, 0x3C, 0x36, 0x00,
 
/* initializing sequence */
 
-- 
2.7.4

[PATCH 2/2] staging: fbtft: Fix line continuation

2018-07-16 Thread Matthias Wolf

Fix checkpatch warning: avoid unnecessary line continuation
to allow grepping of whole error message.

Signed-off-by: Matthias Wolf 
Signed-off-by: Felix Siegel 
Signed-off-by: Tim Cofala 
---
 drivers/staging/fbtft/fb_ssd1351.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/fbtft/fb_ssd1351.c 
b/drivers/staging/fbtft/fb_ssd1351.c
index b8ef75f..1b92691 100644
--- a/drivers/staging/fbtft/fb_ssd1351.c
+++ b/drivers/staging/fbtft/fb_ssd1351.c
@@ -126,16 +126,14 @@ static int set_gamma(struct fbtft_par *par, u32 *curves)
for (i = 0; i < 63; i++) {
if (i > 0 && curves[i] < 2) {
dev_err(par->info->device,
-   "Illegal value in Grayscale Lookup Table at 
index %d. " \
-   "Must be greater than 1\n", i);
+   "Illegal value in Grayscale Lookup Table at 
index %d. Must be greater than 1\n", i);
return -EINVAL;
}
acc += curves[i];
tmp[i] = acc;
if (acc > 180) {
dev_err(par->info->device,
-   "Illegal value(s) in Grayscale Lookup Table. " \
-   "At index=%d, the accumulated value has 
exceeded 180\n", i);
+   "Illegal value(s) in Grayscale Lookup Table. At 
index=%d, the accumulated value has exceeded 180\n", i);
return -EINVAL;
}
}
-- 
2.7.4

Der Ausweg für Ihr Fin anz pro ble m ist ein Ausl andsk redi t

2014-03-15 Thread Joanna Wolf

Hallo, sind Sie ansprechbar auf eine wirklich gute Nachricht?

will Ihre Bank Ihnen auch keinen Kr ed it mehr geben? 
Lassen Sie sich das nicht bieten, denn es gibt auch kulante Banken.
Auch bei laufenden Verpflichtungen und bereits von Ihrer eigenen Hausbank 
abgelehntem Antrag können wir Ihnen ein Darlehen vermitteln zu günstigen 
Konditionen  (z.B. ab 3,2% für Konsumentenkredite, ab 1,8% Zinsen p.a. für 
Grundbesitz und ab 3,1% für Firmen).
Durch die extrem langen Laufzeiten von 10 bzw. 20 Jahren belasten Sie die mtl. 
Zahlungen nicht großartig. 

Ihren Kredit bekommen Sie jetzt aus Lateinamerika ohne viel Formularkram, ohne 
Bankauskunft und schnell. Unsere Kooperationspartner sind mehrere Banken, und 
wir erledigen für Sie alle Formalitäten. Also ideal für Jungunternehmer, 
Angestellte und Rentner. 

Auf diese Weise lösen Sie Ihr Geldproblem.


Alles weitere auf unserer Seite:  RapidCredito und fügen Sie an co m

Mit freundlichem Gruß 

Anton Fischer

Kundenberater




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-04-19 Thread Marcus Wolf


Am 12.04.2018 um 19:15 schrieb Valentin Vidic:
> On Sun, Apr 08, 2018 at 04:15:30PM +0200, Marcus Wolf wrote:
>> The hardware of Pi433 is working with every Raspberry Pi (on zero, you
>> need to solder the GPIO-pins) and with several other fruits like banana
>> pi. The only thing is, that you need different versions of the driver,
>> according to the processor, mounted on your fruit.
>>
>> If you'd like to test more then ther is no hang up or crash, you will
>> need a second party. You could use a 433MHz socket for testing TX or a
>> 433 thermometer for testing RX. An example code for communication with a
>> socket is available on the Pi433 productpage (www.pi433.de). The sample
>> for the thermometer isn't published yet (as always lack of time).
>>
>> If you want to test more deeply (using different features of the Rf69
>> chip or even do some long time testing, you have more options, if you
>> use to Pis with Pi433.
>>
>> Just let me know, what you'd like to do and what equipment, you need.
> 
> At the moment I have Rpi2 but not any other 433MHz equipment, so I
> could only do some basic testing unfortunately.  In case I get the
> new Rpi3 some time soon I would be able to do something better.
> 
Hi Valentin,



let me know, what you like to have. For sure with just one station and
no other 433MHz equipment, options for testing are quite limited.



Marcus

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-04-19 Thread Marcus Wolf



Am 12.04.2018 um 20:46 schrieb Valentin Vidic:
> On Sun, Apr 08, 2018 at 05:22:46PM +0200, Marcus Wolf wrote:
>> Regarding your patch, I did not understand, why you did not remove
>> the mutex_lock in pi433_write. Wasn't it the goal to remove it?
> 
> Is it possible for more than one userspace program to open the pi433
> device and send messages?  In that case more than one pi433_write
> could be running and it needs to hold a mutex_lock before calling
> kfifo_in.

You are right. The write function is open for multiple userspace programs.
So mutex_lock schould remain.

>> Below find a proposal of pi433_write function, I wrote on base
>> of my outdated (!), private repo. It is not compiled and not tested.
>> Since there is no more handling in case of an error (as well in the
>> propsal as in your patch), I removed the error handling completely.
>> I only do a test to detect proplems while writing to the tx_fifo,
>> but (like in your patch) do nothing for solving, just printing a line.
>> If this unexpected situation will occur (most probably never),
>> the tx_fifo will be (and stay) out of sync until driver gets unloaded.
>> We have to decide, whether we can stay with that. Like written above,
>> I thinkt the benefits are great, the chance of such kind of error
>> very low.
>> What do you think?
> 
> Yes, if there is only one writer and it checks the available size,
> kfifo_in should not fail.  The only problem might be copy_from_user
> but perhaps that is also quite unlikely.  A workaround for that could
> be to copy the data into a temporary kernel buffer first and than
> start kfifo writes using only kernel memory.

For my feeling, that's a safe solution but most probably oversized.

>> It could be discussed, whether it is better to return EMSGSIZE or
>> EAGAIN on the first check. On the one hand, there is a problem with
>> the message size, on the other hand (if message isn't generally too
>> big) after a while, there should be some more space available in
>> fifo, so EAGAIN may be better choice.
> 
> EAGAIN does seem better unless the message is too big to ever fit
> in the kfifo.
> 
>>  if (retval != required ) {
>>  dev_dbg(device->dev, "write to fifo failed, reason unknown, non 
>> recoverable.");
>>  return -EAGAIN;
>>  }
> 
> Maybe this should be dev_warn or even dev_crit if the driver is not
> usable anymore when this happens?  The error message should than also
> be adjusted to EBADF or something similar.

>From my point of veiew, the warning is (in combination of the
size-check) the by far better solution then the fifo reset.

So all in all, I think, we should go with your proposal.

Unfortunaly still no time to setup my test environment with a current
version of the driver in order to give it a try :-(
Sorry,

Marcus

Re: [PATCH] staging: pi433: add descriptions for mutex locks

2018-04-19 Thread Marcus Wolf

Hi!

So if you like, give me your address, and I'll send you two development
samples of Pi433.

Cheers,

Marcus

Am 19.04.2018 um 11:47 schrieb Valentin Vidic:
> On Thu, Apr 19, 2018 at 11:25:19AM +0200, Marcus Wolf wrote:
>> let me know, what you like to have. For sure with just one station and
>> no other 433MHz equipment, options for testing are quite limited.
> 
> I can get Rpi3 and with two shields test 433MHz communication between
> Rpi2 and Rpi3.
> 

-- 
Smarthome-Wolf UG (haftungsbeschränkt)
Helene-Lange-Weg 23
80637 München
Amtsgericht München, HRB 223529
Umastzsteuer-ID: DE304719911
Geschäftsführer: Marcus Wolf

Re: [PATCH v2] staging: pi433: cleanup tx_fifo locking

2018-04-19 Thread Marcus Wolf

Reviewed-by: Marcus Wolf 

Am 19.04.2018 um 12:25 schrieb Valentin Vidic:
> pi433_write requires locking due to multiple writers.  After acquiring
> the lock check if enough free space is available in the kfifo to write
> the whole message. This check should prevent partial writes to tx_fifo
> so kfifo_reset is not needed anymore.
> 
> pi433_tx_thread is the only reader so it does not require locking
> after kfifo_reset is removed.
> 
> Signed-off-by: Valentin Vidic 
> ---
> v2: print a warning if partial fifo write happens
> 
>  drivers/staging/pi433/pi433_if.c | 23 ++-
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/staging/pi433/pi433_if.c 
> b/drivers/staging/pi433/pi433_if.c
> index d1e0ddbc79ce..2a05eff88469 100644
> --- a/drivers/staging/pi433/pi433_if.c
> +++ b/drivers/staging/pi433/pi433_if.c
> @@ -87,7 +87,7 @@ struct pi433_device {
>  
>   /* tx related values */
>   STRUCT_KFIFO_REC_1(MSG_FIFO_SIZE) tx_fifo;
> - struct mutextx_fifo_lock; // TODO: check, whether necessary 
> or obsolete
> + struct mutextx_fifo_lock; /* serialize userspace writers */
>   struct task_struct  *tx_task_struct;
>   wait_queue_head_t   tx_wait_queue;
>   u8  free_in_fifo;
> @@ -589,19 +589,15 @@ pi433_tx_thread(void *data)
>* - size of message
>* - message
>*/
> - mutex_lock(&device->tx_fifo_lock);
> -
>   retval = kfifo_out(&device->tx_fifo, &tx_cfg, sizeof(tx_cfg));
>   if (retval != sizeof(tx_cfg)) {
>   dev_dbg(device->dev, "reading tx_cfg from fifo failed: 
> got %d byte(s), expected %d", retval, (unsigned int)sizeof(tx_cfg));
> - mutex_unlock(&device->tx_fifo_lock);
>   continue;
>   }
>  
>   retval = kfifo_out(&device->tx_fifo, &size, sizeof(size_t));
>   if (retval != sizeof(size_t)) {
>   dev_dbg(device->dev, "reading msg size from fifo 
> failed: got %d, expected %d", retval, (unsigned int)sizeof(size_t));
> - mutex_unlock(&device->tx_fifo_lock);
>   continue;
>   }
>  
> @@ -634,7 +630,6 @@ pi433_tx_thread(void *data)
>  sizeof(device->buffer) - position);
>   dev_dbg(device->dev,
>   "read %d message byte(s) from fifo queue.", retval);
> - mutex_unlock(&device->tx_fifo_lock);
>  
>   /* if rx is active, we need to interrupt the waiting for
>* incoming telegrams, to be able to send something.
> @@ -818,7 +813,7 @@ pi433_write(struct file *filp, const char __user *buf,
>   struct pi433_instance   *instance;
>   struct pi433_device *device;
>   int retval;
> - unsigned intcopied;
> + unsigned intrequired, available, copied;
>  
>   instance = filp->private_data;
>   device = instance->device;
> @@ -833,6 +828,16 @@ pi433_write(struct file *filp, const char __user *buf,
>* - message
>*/
>   mutex_lock(&device->tx_fifo_lock);
> +
> + required = sizeof(instance->tx_cfg) + sizeof(size_t) + count;
> + available = kfifo_avail(&device->tx_fifo);
> + if (required > available) {
> + dev_dbg(device->dev, "write to fifo failed: %d bytes required 
> but %d available",
> + required, available);
> + mutex_unlock(&device->tx_fifo_lock);
> + return -EAGAIN;
> + }
> +
>   retval = kfifo_in(&device->tx_fifo, &instance->tx_cfg,
> sizeof(instance->tx_cfg));
>   if (retval != sizeof(instance->tx_cfg))
> @@ -855,8 +860,8 @@ pi433_write(struct file *filp, const char __user *buf,
>   return copied;
>  
>  abort:
> - dev_dbg(device->dev, "write to fifo failed: 0x%x", retval);
> - kfifo_reset(&device->tx_fifo); // TODO: maybe find a solution, not to 
> discard already stored, valid entries
> + dev_warn(device->dev,
> +  "write to fifo failed, non recoverable: 0x%x", retval);
>   mutex_unlock(&device->tx_fifo_lock);
>   return -EAGAIN;
>  }
>

Re: Submit of a driver for Pi433 - a radio module for Raspberry Pi

2017-07-15 Thread Marcus Wolf

Hi!

It's me again. Seems like I also need help on sending the email :-/

I checked the whitespace/line wrap problem, but couldn't find any suspicious
lines.

What I did:
* Looked into my outbox - the copy of my mail to you seems to be okay...
* I sent the patch once again (just to me) - result: Seems to be fine, too.

Maybe you can bounce back my mail? Maybe I'll get an idea what went wrong, if I
see the smashed code...

Cheers,

Marcus

> Greg KH  hat am 15. Juli 2017 um 13:24
> geschrieben:
>
>
> On Sat, Jul 15, 2017 at 01:15:43PM +0200, Marcus Wolf wrote:
> > Hi Greg,
> >
> > thanks for your reply :-)
> >
> > Today I moved the documentation and header files to drivers/staging/pi433
> > and
> > fromated it as a single patch.
> >
> > Cheers,
> >
> > Marcus
>
> I still need it in a format I can apply it in (i.e. proper subject,
> changelog text, signed-off-by, etc.) Can you do that?
>
> Also, for staging drivers, I need a TODO file, much like the existing
> drivers/staging/*/TODO files saying what needs to be done with the code
> in order to get it out of staging.
>
> And finally, your patch seemed to have the whitespace corrupted and was
> line-wrapped, making it impossible to apply even if I could have :(
>
> thanks,
>
> greg k-h
>

Re: Submit of a driver for Pi433 - a radio module for Raspberry Pi

2017-07-15 Thread Marcus Wolf

Hi!
 
Ok. Thanks for the info. Didn't observe that.
 I'll give the old patch another try.
 
I will send a new patch as soon, as I know. how to produce the 
poper format (with subjects, signing and so on).

Cheers,

Marcus

 
 
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 268d4e6..fdf060c 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -110,4 +110,6 @@ source "drivers/staging/ccree/Kconfig"
 
 source "drivers/staging/typec/Kconfig"
 
+source "drivers/staging/pi433/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index b93e6f5..998f644 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_KS7010)+= ks7010/
 obj-$(CONFIG_GREYBUS)+= greybus/
 obj-$(CONFIG_BCM2835_VCHIQ)+= vc04_services/
 obj-$(CONFIG_CRYPTO_DEV_CCREE)+= ccree/
+obj-$(CONFIG_PI433)+= pi433/
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
new file mode 100644
index 000..004b502
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
@@ -0,0 +1,53 @@
+// Definitions for Pi433
+/dts-v1/;
+/plugin/;
+
+/ {
+compatible = "bcm,bcm2835", "bcm,bcm2708", "bcm,bcm2709";
+
+fragment@0 {
+target = <&spi0>;
+__overlay__ {
+status = "okay";
+
+spidev@0{
+status = "disabled";
+};
+
+spidev@1{
+status = "disabled";
+};
+};
+};
+
+fragment@1 {
+target = <&gpio>;
+__overlay__ {
+pi433_pins: pi433_pins {
+brcm,pins = <7 25 24>;
+brcm,function = <0 0 0>; // in in in
+};
+};
+};
+
+fragment@2 {
+target = <&spi0>;
+__overlay__ {
+#address-cells = <1>;
+#size-cells = <0>;
+status = "okay";
+
+pi433: pi433@0 {
+compatible = "Smarthome-Wolf,pi433";
+reg = <0>;
+spi-max-frequency = <1000>;
+status = "okay";
+
+pinctrl-0 = <&pi433_pins>;
+DIO0-gpio = <&gpio 24 0>;
+DIO1-gpio = <&gpio 25 0>;
+DIO2-gpio = <&gpio  7 0>;
+};
+};
+};
+};
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433.txt
b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
new file mode 100644
index 000..14197c6
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
@@ -0,0 +1,62 @@
+* Smarthome-Wolf Pi433 - a 433MHz radio module/shield for Raspberry Pi (see
www.pi433.de)
+
+Required properties:
+- compatible: must be "Smarthome-Wolf,pi433"
+- reg: chip select of SPI Interface
+- DIOx-gpio must be dedicated to the GPIO, connected with DIOx of the RFM69
module
+
+
+Example:
+
+With the following lines in gpio-section, the gpio pins, connected with pi433
are
+reserved/declared.
+
+&gpio{
+[...]
+
+pi433_pins: pi433_pins {
+brcm,pins = <7 25 24>;
+brcm,function = <0 0 0>; // in in in
+};
+
+[...]
+}
+
+With the following lines in spi section, the device pi433 is declared.
+It consists of the three gpio pins and an spi interface (here chip select 0)
+
+&spi0{
+[...]
+
+pi433: pi433@0 {
+compatible = "Smarthome-Wolf,pi433";
+reg = <0>; /* CE 0 */
+#address-cells = <1>;
+#size-cells = <0>;
+spi-max-frequency = <1000>;
+
+pinctrl-0 = <&pi433_pins>;
+DIO0-gpio = <&gpio 24 0>;
+DIO1-gpio = <&gpio 25 0>;
+DIO2-gpio = <&gpio  7 0>;
+};
+}
+
+
+
+For Raspbian users only
+===
+Since Raspbian supports device tree overlays, you may use and overlay, instead
+of editing your boards device tree.
+For using the overlay, you need to compile the file pi433-overlay.dts you can
+find aside to this documentation.
+The file needs to be compiled - either manually or by integration in your
kernel
+source tree. For a manual compile, you may use a command line like the
following:
+'linux/scripts/dtc/dtc -@ -I dts -O dtb -o pi433.dtbo pi433-overlay.dts'
+
+For compiling inside of the kernel tree, you need to copy pi433-overlay.dts to
+arch/arm/boot/dts/overlays and you need to add the file to the list of files
+in the Makefile over there. Execute '

Re: Submit of a driver for Pi433 - a radio module for Raspberry Pi

2017-07-15 Thread Wolf Entwicklungen

Am Sa, 15.07.2017, 15:47 schrieb Greg KH:
> On Sat, Jul 15, 2017 at 03:40:30PM +0200, Marcus Wolf wrote:
>> Hi Greg,
>>

Hi Greg,

now I added a TODO file and did a manual patchwork with lines of the old patch
(git format-patch master --stdout -p > pi433_patch) and the newer patch (git
diff master > pi433_patch). Stil don't know how to retreive multiple commits
in one single patch directly from git :-/
I did my best to meet a lot of rules of the link, you send me.If
there is stil something essential wrong or missing, please excuse and let me 
know.

Especially I was in doubt about the verbosity of the change log. If you'd like
to see the list of features of the driver over there (please see my first mail),
let me know and I'll add.

Sorry for posting in HTML once again on last mail. Now I changed the frontend 
of my
mailtool. Hope this works better. If not, excuse and let me know.
I Think about moving to gits mail function for my next patches.

Cheers,

Marcus


From: Marcus Wolf 
Date: Tue,15 Jul 2017 17:52:06 +0100
Subject: [PATCH 1/1] drivers/staging/pi433: New driver

Added a driver for the pi433 radio module
(see https://www.pi433.de/en.html for details).
Signed-off-by: Marcus Wolf 
Tested-by: Marcus Wolf  on Raspbian, running 
Kernel v4.12
---
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 268d4e6..fdf060c 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -110,4 +110,6 @@ source "drivers/staging/ccree/Kconfig"

 source "drivers/staging/typec/Kconfig"

+source "drivers/staging/pi433/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index b93e6f5..998f644 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_KS7010)  += ks7010/
 obj-$(CONFIG_GREYBUS)  += greybus/
 obj-$(CONFIG_BCM2835_VCHIQ)+= vc04_services/
 obj-$(CONFIG_CRYPTO_DEV_CCREE) += ccree/
+obj-$(CONFIG_PI433)+= pi433/
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts 
b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
new file mode 100644
index 000..004b502
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
@@ -0,0 +1,53 @@
+// Definitions for Pi433
+/dts-v1/;
+/plugin/;
+
+/ {
+compatible = "bcm,bcm2835", "bcm,bcm2708", "bcm,bcm2709";
+
+fragment@0 {
+target = <&spi0>;
+__overlay__ {
+status = "okay";
+
+spidev@0{
+status = "disabled";
+};
+
+spidev@1{
+status = "disabled";
+};
+};
+};
+
+   fragment@1 {
+   target = <&gpio>;
+   __overlay__ {
+   pi433_pins: pi433_pins {
+   brcm,pins = <7 25 24>;
+   brcm,function = <0 0 0>; // in in in
+   };
+   };
+   };
+
+   fragment@2 {
+   target = <&spi0>;
+   __overlay__ {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   status = "okay";
+
+   pi433: pi433@0 {
+   compatible = "Smarthome-Wolf,pi433";
+   reg = <0>;
+   spi-max-frequency = <1000>;
+   status = "okay";
+
+   pinctrl-0 = <&pi433_pins>;
+   DIO0-gpio = <&gpio 24 0>;
+   DIO1-gpio = <&gpio 25 0>;
+   DIO2-gpio = <&gpio  7 0>;
+   };
+   };
+   };
+};
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433.txt 
b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
new file mode 100644
index 000..14197c6
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
@@ -0,0 +1,62 @@
+* Smarthome-Wolf Pi433 - a 433MHz radio module/shield for Raspberry Pi (see 
www.pi433.de)
+
+Required properties:
+- compatible: must be "Smarthome-Wolf,pi433"
+- reg: chip select of SPI Interface
+- DIOx-gpio must be dedicated to the GPIO, connected with DIOx of the RFM69 
module
+
+
+Example:
+
+With the following lines in gpio-section, the gpio pins, connected with pi433 
are
+reserved/declared.
+
+&gpio{
+   [...]
+
+   pi433_pins: pi433_pins {
+   brcm,pins = <7 25 24>;
+   brcm,function = <0 0 0>; // in i

Re: [PATCH 1/1] drivers/staging/pi433: New driver (fwd)

2017-07-16 Thread Marcus Wolf

Hi Julia,

thank you very much for your investigation!

I totally agree - on line 894 device->dev must not be used. Any suggestions? Is
there a kind of generic value, I can pass as first argument of dev_dbg() in such
cases? Or switch to printk()?

Regarding second bug:
line 688 modified as follows:
if (bytes_received > 0)
and line 654 modified as follows:
int bytes_received;
The fix will already be included in my next try, to send a well formatted patch
to Greg.

the position of the brackets is wrong throughout the complete source. The source
was developed against an other style guide. It's one of the todos to adjust the
formatting that it fully complies with the kernel coding style guide.

Cheers,

Marcus


> Julia Lawall  hat am 15. Juli 2017 um 22:50 geschrieben:
>
>
> Please check on lines 894 and 688 (not shown) for the issues mentioned
> below.
>
> Note that the ifs and {s in the following code snippet don't always follow
> the kernel coding style, eg on line 901.
>
> julia
>
> -- Forwarded message --
> Date: Sun, 16 Jul 2017 04:35:49 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [PATCH 1/1] drivers/staging/pi433: New driver
>
> Hi Wolf,
>
> [auto build test WARNING on staging/staging-testing]
> [also build test WARNING on v4.12 next-20170714]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Wolf-Entwicklungen/drivers-staging-pi433-New-driver/20170716-021625
> :: branch date: 2 hours ago
> :: commit date: 2 hours ago
>
> >> drivers/staging/pi433/pi433_if.c:894:18-21: ERROR: device is NULL but
> >> dereferenced.
> --
> >> drivers/staging/pi433/pi433_if.c:688:5-19: WARNING: Unsigned expression
> >> compared with zero: bytes_received >= 0
>
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 6b5d85fc273ec7c19addf7770155414da647de7e
> vim +894 drivers/staging/pi433/pi433_if.c
>
> 6b5d85fc Wolf Entwicklungen 2017-07-15 883
> 6b5d85fc Wolf Entwicklungen 2017-07-15 884 static int pi433_open(struct inode
> *inode, struct file *filp)
> 6b5d85fc Wolf Entwicklungen 2017-07-15 885 {
> 6b5d85fc Wolf Entwicklungen 2017-07-15 886 struct pi433_device *device;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 887 struct pi433_instance *instance;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 888
> 6b5d85fc Wolf Entwicklungen 2017-07-15 889 mutex_lock(&minor_lock);
> 6b5d85fc Wolf Entwicklungen 2017-07-15 890 device = idr_find(&pi433_idr,
> iminor(inode));
> 6b5d85fc Wolf Entwicklungen 2017-07-15 891
> 6b5d85fc Wolf Entwicklungen 2017-07-15 892 mutex_unlock(&minor_lock);
> 6b5d85fc Wolf Entwicklungen 2017-07-15 893 if (!device) {
> 6b5d85fc Wolf Entwicklungen 2017-07-15 @894 dev_dbg(device->dev, "device:
> minor %d unknown.\n", iminor(inode));
> 6b5d85fc Wolf Entwicklungen 2017-07-15 895 return -ENODEV;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 896 }
> 6b5d85fc Wolf Entwicklungen 2017-07-15 897
> 6b5d85fc Wolf Entwicklungen 2017-07-15 898 if (!device->rx_buffer) {
> 6b5d85fc Wolf Entwicklungen 2017-07-15 899 device->rx_buffer =
> kmalloc(MAX_MSG_SIZE, GFP_KERNEL);
> 6b5d85fc Wolf Entwicklungen 2017-07-15 900 if (!device->rx_buffer)
> 6b5d85fc Wolf Entwicklungen 2017-07-15 901 {
> 6b5d85fc Wolf Entwicklungen 2017-07-15 902 dev_dbg(device->dev,
> "open/ENOMEM\n");
> 6b5d85fc Wolf Entwicklungen 2017-07-15 903 return -ENOMEM;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 904 }
> 6b5d85fc Wolf Entwicklungen 2017-07-15 905 }
> 6b5d85fc Wolf Entwicklungen 2017-07-15 906
> 6b5d85fc Wolf Entwicklungen 2017-07-15 907 device->users++;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 908 instance =
> kzalloc(sizeof(*instance), GFP_KERNEL);
> 6b5d85fc Wolf Entwicklungen 2017-07-15 909 if (!instance)
> 6b5d85fc Wolf Entwicklungen 2017-07-15 910 {
> 6b5d85fc Wolf Entwicklungen 2017-07-15 911 kfree(device->rx_buffer);
> 6b5d85fc Wolf Entwicklungen 2017-07-15 912 device->rx_buffer = NULL;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 913 return -ENOMEM;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 914 }
> 6b5d85fc Wolf Entwicklungen 2017-07-15 915
> 6b5d85fc Wolf Entwicklungen 2017-07-15 916 /* setup instance data*/
> 6b5d85fc Wolf Entwicklungen 2017-07-15 917 instance->device = device;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 918 instance->tx_cfg.bit_rate = 4711;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 919 // TODO: fill instance->tx_cfg;
> 6b5d85fc Wolf Entwicklungen 2017-07-15 920
> 6b5d85fc Wolf Entwicklungen 2017-07-15 921 /* instance data as

[PATCH 1/1] drivers/staging/pi433: New driver

2017-07-16 Thread Wolf Entwicklungen

From: Marcus Wolf 
Date: Tue,16 Jul 2017 11:52:06 +0100
Subject: [PATCH 1/1] drivers/staging/pi433: New driver

Added a driver for the pi433 radio module
(see https://www.pi433.de/en.html for details).

Signed-off-by: Marcus Wolf 
---
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 268d4e6..fdf060c 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -110,4 +110,6 @@ source "drivers/staging/ccree/Kconfig"

 source "drivers/staging/typec/Kconfig"

+source "drivers/staging/pi433/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index b93e6f5..998f644 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_KS7010)  += ks7010/
 obj-$(CONFIG_GREYBUS)  += greybus/
 obj-$(CONFIG_BCM2835_VCHIQ)+= vc04_services/
 obj-$(CONFIG_CRYPTO_DEV_CCREE) += ccree/
+obj-$(CONFIG_PI433)+= pi433/
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts 
b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
new file mode 100644
index 000..004b502
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433-overlay.dts
@@ -0,0 +1,53 @@
+// Definitions for Pi433
+/dts-v1/;
+/plugin/;
+
+/ {
+compatible = "bcm,bcm2835", "bcm,bcm2708", "bcm,bcm2709";
+
+fragment@0 {
+target = <&spi0>;
+__overlay__ {
+status = "okay";
+
+spidev@0{
+status = "disabled";
+};
+
+spidev@1{
+status = "disabled";
+};
+};
+};
+
+   fragment@1 {
+   target = <&gpio>;
+   __overlay__ {
+   pi433_pins: pi433_pins {
+   brcm,pins = <7 25 24>;
+   brcm,function = <0 0 0>; // in in in
+   };
+   };
+   };
+
+   fragment@2 {
+   target = <&spi0>;
+   __overlay__ {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   status = "okay";
+
+   pi433: pi433@0 {
+   compatible = "Smarthome-Wolf,pi433";
+   reg = <0>;
+   spi-max-frequency = <1000>;
+   status = "okay";
+
+   pinctrl-0 = <&pi433_pins>;
+   DIO0-gpio = <&gpio 24 0>;
+   DIO1-gpio = <&gpio 25 0>;
+   DIO2-gpio = <&gpio  7 0>;
+   };
+   };
+   };
+};
diff --git a/drivers/staging/pi433/Documentation/devicetree/pi433.txt 
b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
new file mode 100644
index 000..14197c6
--- /dev/null
+++ b/drivers/staging/pi433/Documentation/devicetree/pi433.txt
@@ -0,0 +1,62 @@
+* Smarthome-Wolf Pi433 - a 433MHz radio module/shield for Raspberry Pi (see 
www.pi433.de)
+
+Required properties:
+- compatible: must be "Smarthome-Wolf,pi433"
+- reg: chip select of SPI Interface
+- DIOx-gpio must be dedicated to the GPIO, connected with DIOx of the RFM69 
module
+
+
+Example:
+
+With the following lines in gpio-section, the gpio pins, connected with pi433 
are
+reserved/declared.
+
+&gpio{
+   [...]
+
+   pi433_pins: pi433_pins {
+   brcm,pins = <7 25 24>;
+   brcm,function = <0 0 0>; // in in in
+   };
+
+   [...]
+}
+
+With the following lines in spi section, the device pi433 is declared.
+It consists of the three gpio pins and an spi interface (here chip select 0)
+
+&spi0{
+   [...]
+
+   pi433: pi433@0 {
+   compatible = "Smarthome-Wolf,pi433";
+   reg = <0>; /* CE 0 */
+   #address-cells = <1>;
+   #size-cells = <0>;
+   spi-max-frequency = <1000>;
+
+   pinctrl-0 = <&pi433_pins>;
+   DIO0-gpio = <&gpio 24 0>;
+   DIO1-gpio = <&gpio 25 0>;
+   DIO2-gpio = <&gpio  7 0>;
+   };
+}
+
+
+
+For Raspbian users only
+===
+Since Raspbian supports device tree overlays, you may use and overlay, instead
+of editing your boards device tree.
+For using the overlay, you need to compile the file pi433-overlay.dts you can
+find aside to this documentation.
+The file needs to be compiled - either manually or by integration in

1 2 >

1 - 100 of 199 matches

Mail list logo