Easter bonuses

2019-05-17 Thread Radoslav Dobrev
Hello!

Do you need a staff incentive package that is convenient and
attractive both for the employer and for the employees?

In that case, we recommend that you consider the increasingly popular
food vouchers - the employer provides vouchers for a fixed amount, and
the employee can spend it at various grocery store chains or dining
establishments according to their preferences.

I would be glad to present what the vouchers can offer - may I call
you so we can discuss the details?


Radoslav Dobrev
Head of HR Benefit Team
www.lunchkarty.eu


Re: [PATCH v3 1/2] ftpm: firmware TPM running in TEE

2019-05-17 Thread Sasha Levin

On Wed, May 15, 2019 at 11:12:50AM +0300, Jarkko Sakkinen wrote:

On Mon, Apr 15, 2019 at 11:56:35AM -0400, Sasha Levin wrote:

This patch adds support for a software-only implementation of a TPM
running in TEE.

There is extensive documentation of the design here:
https://www.microsoft.com/en-us/research/publication/ftpm-software-implementation-tpm-chip/

There is also reference code for the firmware available here:
https://github.com/Microsoft/ms-tpm-20-ref/tree/master/Samples/ARM32-FirmwareTPM


The commit message should include at least a brief description of what
TEE is.


The whole TEE subsystem is already well documented in our kernel tree
(https://www.kernel.org/doc/Documentation/tee.txt) and beyond. I can add
a reference to the doc here, but I'd rather not add a bunch of TEE
related comments as you suggest later in your review.

The same way a PCI device driver doesn't describe what PCI is in its
code, we shouldn't describe what TEE is here.


+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tpm.h"
+#include "tpm_ftpm_tee.h"
+
+#define DRIVER_NAME "ftpm-tee"
+
+/* TA_FTPM_UUID: BC50D971-D4C9-42C4-82CB-343FB7F37896 */
+static const uuid_t ftpm_ta_uuid =
+   UUID_INIT(0xBC50D971, 0xD4C9, 0x42C4,
+ 0x82, 0xCB, 0x34, 0x3F, 0xB7, 0xF3, 0x78, 0x96);


Just wondering why the prefixes are in a different order in the comment
and in the code.


No prefixes, this is a completely randomly generated UUID.

I'll address the rest of your comments in the next ver.

--
Thanks,
Sasha


RE: [PATCH v3 0/2] ftpm: a firmware based TPM driver

2019-05-17 Thread Thirupathaiah Annapureddy


> -Original Message-
> From: Sumit Garg 
> Sent: Thursday, May 16, 2019 11:57 PM
> To: Thirupathaiah Annapureddy 
> Cc: Sasha Levin ; Jarkko Sakkinen
> ; peterhu...@gmx.de; j...@ziepe.ca;
> cor...@lwn.net; Linux Kernel Mailing List ;
> linux-doc@vger.kernel.org; linux-integr...@vger.kernel.org; Microsoft Linux
> Kernel List ; Bryan Kelly (CSI)
> ; Rob Herring 
> Subject: Re: [PATCH v3 0/2] ftpm: a firmware based TPM driver
> 
> + Rob
> 
> On Fri, 17 May 2019 at 00:54, Thirupathaiah Annapureddy
>  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Sumit Garg 
> > > Sent: Thursday, May 16, 2019 12:06 AM
> > > To: Thirupathaiah Annapureddy 
> > > Cc: Sasha Levin ; Jarkko Sakkinen
> > > ; peterhu...@gmx.de; j...@ziepe.ca;
> > > cor...@lwn.net; Linux Kernel Mailing List  ker...@vger.kernel.org>;
> > > linux-doc@vger.kernel.org; linux-integr...@vger.kernel.org; Microsoft
> Linux
> > > Kernel List ; Bryan Kelly (CSI)
> > > 
> > > Subject: Re: [PATCH v3 0/2] ftpm: a firmware based TPM driver
> > >
> > > On Thu, 16 May 2019 at 06:30, Thirupathaiah Annapureddy
> > >  wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Sumit Garg 
> > > > > Sent: Tuesday, May 14, 2019 7:02 PM
> > > > > To: Sasha Levin 
> > > > > Cc: Jarkko Sakkinen ;
> > > peterhu...@gmx.de;
> > > > > j...@ziepe.ca; cor...@lwn.net; Linux Kernel Mailing List  > > > > ker...@vger.kernel.org>; linux-doc@vger.kernel.org; linux-
> > > > > integr...@vger.kernel.org; Microsoft Linux Kernel List  > > > > ker...@microsoft.com>; Thirupathaiah Annapureddy
> > > ;
> > > > > Bryan Kelly (CSI) 
> > > > > Subject: Re: [PATCH v3 0/2] ftpm: a firmware based TPM driver
> > > > >
> > > > > On Wed, 15 May 2019 at 01:00, Sasha Levin 
> wrote:
> > > > > >
> > > > > > On Wed, May 08, 2019 at 03:44:36PM +0300, Jarkko Sakkinen wrote:
> > > > > > >On Tue, May 07, 2019 at 01:40:20PM -0400, Sasha Levin wrote:
> > > > > > >> On Mon, Apr 15, 2019 at 11:56:34AM -0400, Sasha Levin wrote:
> > > > > > >> > From: "Sasha Levin (Microsoft)" 
> > > > > > >> >
> > > > > > >> > Changes since v2:
> > > > > > >> >
> > > > > > >> > - Drop the devicetree bindings patch (we don't add any new
> > > ones).
> > > > > > >> > - More code cleanups based on Jason Gunthorpe's review.
> > > > > > >> >
> > > > > > >> > Sasha Levin (2):
> > > > > > >> >  ftpm: firmware TPM running in TEE
> > > > > > >> >  ftpm: add documentation for ftpm driver
> > > > > > >>
> > > > > > >> Ping? Does anyone have any objections to this?
> > > > > > >
> > > > > > >Sorry I've been on vacation week before last week and last week
> > > > > > >I was extremely busy because I had been on vacation. This is in
> > > > > > >my TODO list. Will look into it tomorrow in detail.
> > > > > > >
> > > > > > >Apologies for the delay with this!
> > > > > >
> > > > > > Hi Jarkko,
> > > > > >
> > > > > > If there aren't any big objections to this, can we get it merged
> in?
> > > > > > We'll be happy to address any comments that come up.
> > > > >
> > > > > I guess you have missed or ignored this comment [1]. Please address
> it.
> > > > >
> > > > > [1] https://lkml.org/lkml/2019/5/8/11
> > > > >
> > > > > -Sumit
> > > >
> > > > Thanks for reviewing and adding comments.
> > > >
> > > > We tried to use the TEE bus framework you suggested for fTPM enumeration.
> > > > We were not able to pass the TCG Logs collected by the boot loaders.
> > > >
> > > > Currently there are 3 ways to pass TCG Logs based on the code
> > > > in drivers/char/tpm/eventlog:
> > > >
> > > > 1. ACPI Table
> > > > 2. EFI Table
> > > > 3. OF Device node properties
> > > >
> > > > Our ARM system is booting using U-boot and Device Tree.
> > > > So ACPI/EFI table mechanism to pass TCG2 logs won't be applicable.
> > > > We needed to use OF device node properties to pass TCG2 Logs.
> > > > TEE bus enumeration framework does not work for our use case due to
> > > > the above.
> > >
> > > Firstly let me clarify that this framework is intended to communicate
> > > with TEE based services/devices rather than the boot loader. And in this
> > > case, fTPM being a TEE based service, this framework should be used.
> > >
> > It does not work for our use case. We gave enough justification so far.
> > TEE bus enumeration needs to be flexible to support our use case and
> > more future use cases.
> >
> 
> TEE based services are virtual devices which could very well be
> aware of the platform, and the device driver could easily query these
> devices for platform specific properties. In the case of firmware TPM as a
> TEE based service, it could easily store the event logs generated
> during PCR extend operations, which could be fetched a

Re: [PATCH v4 03/18] kunit: test: add string_stream a std::stream like string builder

2019-05-17 Thread Stephen Boyd
Quoting Brendan Higgins (2019-05-14 15:16:56)
> A number of test features need to do pretty complicated string printing
> where it may not be possible to rely on a single preallocated string
> with parameters.
> 
> So provide a library for constructing the string as you go similar to
> C++'s std::string.
> 
> Signed-off-by: Brendan Higgins 
> Reviewed-by: Greg Kroah-Hartman 
> Reviewed-by: Logan Gunthorpe 

Is there any reason why we can't use the seqfile API for this? These
both share a similar goal, formatting strings into a buffer to be read
later. Maybe some new APIs would be needed to extract the buffer
differently, but I hope we could share the code.

If it can't be used, can you please add the reasoning to the commit text
here?



Re: [PATCH v4 04/18] kunit: test: add kunit_stream a std::stream like logger

2019-05-17 Thread Stephen Boyd
Quoting Brendan Higgins (2019-05-14 15:16:57)
> diff --git a/kunit/kunit-stream.c b/kunit/kunit-stream.c
> new file mode 100644
> index 0..1884f1b550888
> --- /dev/null
> +++ b/kunit/kunit-stream.c
> @@ -0,0 +1,152 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * C++ stream style string formatter and printer used in KUnit for outputting
> + * KUnit messages.
> + *
> + * Copyright (C) 2019, Google LLC.
> + * Author: Brendan Higgins 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +static const char *kunit_stream_get_level(struct kunit_stream *this)
> +{
> +   unsigned long flags;
> +   const char *level;
> +
> +   spin_lock_irqsave(&this->lock, flags);
> +   level = this->level;
> +   spin_unlock_irqrestore(&this->lock, flags);
> +
> +   return level;

Please remove this whole function and inline it to the one call-site.

> +}
> +
> +void kunit_stream_set_level(struct kunit_stream *this, const char *level)
> +{
> +   unsigned long flags;
> +
> +   spin_lock_irqsave(&this->lock, flags);
> +   this->level = level;
> +   spin_unlock_irqrestore(&this->lock, flags);

I don't get the locking here. What are we protecting against? Are tests
running in parallel using the same kunit_stream? If so, why is the level
changeable in one call and then adding strings is done in a different
function call? It would make sense to combine the level setting and
string adding so that it's one atomic operation if it's truly a parallel
operation, or remove the locking entirely.

> +}
> +
> +void kunit_stream_add(struct kunit_stream *this, const char *fmt, ...)
> +{
> +   va_list args;
> +   struct string_stream *stream = this->internal_stream;
> +
> +   va_start(args, fmt);
> +
> +   if (string_stream_vadd(stream, fmt, args) < 0)
> +   kunit_err(this->test, "Failed to allocate fragment: %s\n", fmt);
> +
> +   va_end(args);
> +}
> +
> +void kunit_stream_append(struct kunit_stream *this,
> +   struct kunit_stream *other)
> +{
> +   struct string_stream *other_stream = other->internal_stream;
> +   const char *other_content;
> +
> +   other_content = string_stream_get_string(other_stream);
> +
> +   if (!other_content) {
> +   kunit_err(this->test,
> + "Failed to get string from second argument for appending.\n");
> +   return;
> +   }
> +
> +   kunit_stream_add(this, other_content);
> +}
> +
> +void kunit_stream_clear(struct kunit_stream *this)
> +{
> +   string_stream_clear(this->internal_stream);
> +}
> +
> +void kunit_stream_commit(struct kunit_stream *this)

Should this be rather called kunit_stream_flush()?

> +{
> +   struct string_stream *stream = this->internal_stream;
> +   struct string_stream_fragment *fragment;
> +   const char *level;
> +   char *buf;
> +
> +   level = kunit_stream_get_level(this);
> +   if (!level) {
> +   kunit_err(this->test,
> + "Stream was committed without a specified log level.\n");

Drop the full-stop?

> +   level = KERN_ERR;
> +   kunit_stream_set_level(this, level);
> +   }
> +
> +   buf = string_stream_get_string(stream);
> +   if (!buf) {
> +   kunit_err(this->test,

Can you grow a local variable for 'this->test'? It's used many times.

Also, 'this' is not very kernel idiomatic. We usually name variables by
their type instead of 'this' which is a keyword in other languages.
Perhaps it could be named 'kstream'?

> +"Could not allocate buffer, dumping stream:\n");
> +   list_for_each_entry(fragment, &stream->fragments, node) {
> +   kunit_err(this->test, fragment->fragment);
> +   }
> +   kunit_err(this->test, "\n");
> +   goto cleanup;
> +   }
> +
> +   kunit_printk(level, this->test, buf);
> +   kfree(buf);
> +
> +cleanup:
> +   kunit_stream_clear(this);
> +}
> +
> +static int kunit_stream_init(struct kunit_resource *res, void *context)
> +{
> +   struct kunit *test = context;
> +   struct kunit_stream *stream;
> +
> +   stream = kzalloc(sizeof(*stream), GFP_KERNEL);

Of course, here it's called 'stream', so maybe it should be 'kstream'
here too.

> +   if (!stream)
> +   return -ENOMEM;
> +
> +   res->allocation = stream;
> +   stream->test = test;
> +   spin_lock_init(&stream->lock);
> +   stream->internal_stream = new_string_stream();

Can new_string_stream() be renamed to alloc_string_stream()? Sorry, I
just see so many C++-isms in here that it's hard to read from a kernel
developer's perspective.

> +
> +   if (!stream->internal_stream) {

Nitpick: Please join this to the "allocation" event above instead of
keeping it separated.

> +   kfree(stream);
> +   return -ENOMEM;
> +   }
> +
> +   retur

Re: [PATCH v4 17/18] kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()

2019-05-17 Thread Stephen Boyd
Quoting Brendan Higgins (2019-05-14 15:17:10)
> diff --git a/kernel/sysctl-test.c b/kernel/sysctl-test.c
> new file mode 100644
> index 0..fe0f2bae66085
> --- /dev/null
> +++ b/kernel/sysctl-test.c
> @@ -0,0 +1,293 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KUnit test of proc sysctl.
> + */
> +
> +#include 
> +#include 

Is this include used?

> +#include 
> +#include 

Is this include used?

> +
> +
> +static void sysctl_test_dointvec_happy_single_negative(struct kunit *test)
> +{
> +   struct ctl_table table = {
> +   .procname = "foo",
> +   .data   = &test_data.int_0001,
> +   .maxlen = sizeof(int),
> +   .mode   = 0644,
> +   .proc_handler   = proc_dointvec,
> +   .extra1 = &i_zero,
> +   .extra2 = &i_one_hundred,
> +   };
> +   char input[] = "-9";
> +   size_t len = sizeof(input) - 1;
> +   loff_t pos = 0;
> +
> +   table.data = kunit_kzalloc(test, sizeof(int), GFP_USER);
> +   KUNIT_EXPECT_EQ(test, 0, proc_dointvec(&table, 1, input, &len, &pos));
> +   KUNIT_EXPECT_EQ(test, sizeof(input) - 1, len);
> +   KUNIT_EXPECT_EQ(test, sizeof(input) - 1, pos);
> +   KUNIT_EXPECT_EQ(test, -9, *(int *)table.data);

Is the casting necessary? Or can the macro do a type coercion of the
second parameter based on the first type?

> +}
> +
> +static void sysctl_test_dointvec_single_less_int_min(struct kunit *test)
> +{
> +   struct ctl_table table = {
> +   .procname = "foo",
> +   .data   = &test_data.int_0001,
> +   .maxlen = sizeof(int),
> +   .mode   = 0644,
> +   .proc_handler   = proc_dointvec,
> +   .extra1 = &i_zero,
> +   .extra2 = &i_one_hundred,
> +   };
> +   char input[32];
> +   size_t len = sizeof(input) - 1;
> +   loff_t pos = 0;
> +   unsigned long abs_of_less_than_min = (unsigned long)INT_MAX
> +- (INT_MAX + INT_MIN) + 1;
> +
> +   KUNIT_EXPECT_LT(test,
> +   snprintf(input, sizeof(input), "-%lu",
> +abs_of_less_than_min),
> +   sizeof(input));
> +
> +   table.data = kunit_kzalloc(test, sizeof(int), GFP_USER);
> +   KUNIT_EXPECT_EQ(test, -EINVAL,
> +   proc_dointvec(&table, 1, input, &len, &pos));
> +   KUNIT_EXPECT_EQ(test, sizeof(input) - 1, len);
> +   KUNIT_EXPECT_EQ(test, 0, *(int *)table.data);
> +}
> +
> +static void sysctl_test_dointvec_single_greater_int_max(struct kunit *test)
> +{
> +   struct ctl_table table = {
> +   .procname = "foo",
> +   .data   = &test_data.int_0001,
> +   .maxlen = sizeof(int),
> +   .mode   = 0644,
> +   .proc_handler   = proc_dointvec,
> +   .extra1 = &i_zero,
> +   .extra2 = &i_one_hundred,
> +   };
> +   char input[32];
> +   size_t len = sizeof(input) - 1;
> +   loff_t pos = 0;
> +   unsigned long greater_than_max = (unsigned long)INT_MAX + 1;
> +
> +   KUNIT_EXPECT_GT(test, greater_than_max, INT_MAX);
> +   KUNIT_EXPECT_LT(test, snprintf(input, sizeof(input), "%lu",
> +  greater_than_max),
> +   sizeof(input));
> +   table.data = kunit_kzalloc(test, sizeof(int), GFP_USER);
> +   KUNIT_EXPECT_EQ(test, -EINVAL,
> +   proc_dointvec(&table, 1, input, &len, &pos));
> +   KUNIT_EXPECT_EQ(test, sizeof(input) - 1, len);
> +   KUNIT_EXPECT_EQ(test, 0, *(int *)table.data);
> +}
> +
> +static int sysctl_test_init(struct kunit *test)
> +{
> +   return 0;
> +}
> +
> +/*
> + * This is run once after each test case, see the comment on example_test_module
> + * for more information.
> + */
> +static void sysctl_test_exit(struct kunit *test)
> +{
> +}

Can the above two be omitted? If they can be empty sometimes it would be
nice to avoid the extra symbols and code by letting them be assigned to
NULL in the kunit_module.

> +
> +/*
> + * Here we make a list of all the test cases we want to add to the test module
> + * below.
> + */
> +static struct kunit_case sysctl_test_cases[] = {
> +   /*
> +* This is a helper to create a test case object from a test case
> +* function; its exact function is not important to understand how to
> +* use KUnit, just know that this is how you associate test cases with a
> +* test module.
> +*/
> +   KUNIT_CASE(sysctl_test_dointvec_null_tbl_data),
> +   KUNIT_CASE(sysctl_test_dointvec_table_maxlen_unset),
> +   KUNIT_CASE(sysctl_test_dointvec_table_len_is_zero),
> +   KUNIT_CASE(sysctl_test_dointvec_table_read_but_position_set),
> +   KUNIT_CA

Re: [PATCH v4 01/18] kunit: test: add KUnit test runner core

2019-05-17 Thread Stephen Boyd
Quoting Brendan Higgins (2019-05-14 15:16:54)
> diff --git a/include/kunit/test.h b/include/kunit/test.h
> new file mode 100644
> index 0..e682ea0e1f9a5
> --- /dev/null
> +++ b/include/kunit/test.h
> @@ -0,0 +1,162 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Base unit test (KUnit) API.
> + *
> + * Copyright (C) 2019, Google LLC.
> + * Author: Brendan Higgins 
> + */
> +
> +#ifndef _KUNIT_TEST_H
> +#define _KUNIT_TEST_H
> +
> +#include 
> +#include 

Is this include used here?

> +
> +struct kunit;
> +
> +/**
> + * struct kunit_case - represents an individual test case.
> + * @run_case: the function representing the actual test case.
> + * @name: the name of the test case.
> + *
> + * A test case is a function with the signature, ``void (*)(struct kunit *)``
> + * that makes expectations (see KUNIT_EXPECT_TRUE()) about code under test.
> + * Each test case is associated with a &struct kunit_module and will be run
> + * after the module's init function and followed by the module's exit function.
> + *
> + * A test case should be static and should only be created with the
> + * KUNIT_CASE() macro; additionally, every array of test cases should be
> + * terminated with an empty test case.
> + *
> + * Example:
> + *
> + * .. code-block:: c
> + *
> + * void add_test_basic(struct kunit *test)
> + * {
> + * KUNIT_EXPECT_EQ(test, 1, add(1, 0));
> + * KUNIT_EXPECT_EQ(test, 2, add(1, 1));
> + * KUNIT_EXPECT_EQ(test, 0, add(-1, 1));
> + * KUNIT_EXPECT_EQ(test, INT_MAX, add(0, INT_MAX));
> + * KUNIT_EXPECT_EQ(test, -1, add(INT_MAX, INT_MIN));
> + * }
> + *
> + * static struct kunit_case example_test_cases[] = {
> + * KUNIT_CASE(add_test_basic),
> + * {},

Nitpick: Please drop the comma on the sentinel so nobody gets ideas to
add another entry after it.

> + * };
> + *
> + */
> +struct kunit_case {
> +   void (*run_case)(struct kunit *test);
> +   const char name[256];

Maybe 256 can be a #define KUNIT_NAME_MAX_LEN? Or it could just be a
const char pointer to a literal pool? Are unit tests making up names at
runtime?

> +
> +   /* private: internal use only. */
> +   bool success;
> +};
> +
> +/**
> + * KUNIT_CASE - A helper for creating a &struct kunit_case
> + * @test_name: a reference to a test case function.
> + *
> + * Takes a symbol for a function representing a test case and creates a
> + * &struct kunit_case object from it. See the documentation for
> + * &struct kunit_case for an example on how to use it.
> + */
> +#define KUNIT_CASE(test_name) { .run_case = test_name, .name = #test_name }
> +
> +/**
> + * struct kunit_module - describes a related collection of &struct kunit_case s.
> + * @name: the name of the test. Purely informational.
> + * @init: called before every test case.
> + * @exit: called after every test case.
> + * @test_cases: a null terminated array of test cases.
> + *
> + * A kunit_module is a collection of related &struct kunit_case s, such that
> + * @init is called before every test case and @exit is called after every test
> + * case, similar to the notion of a *test fixture* or a *test class* in other
> + * unit testing frameworks like JUnit or Googletest.
> + *
> + * Every &struct kunit_case must be associated with a kunit_module for KUnit to
> + * run it.
> + */
> +struct kunit_module {
> +   const char name[256];
> +   int (*init)(struct kunit *test);
> +   void (*exit)(struct kunit *test);
> +   struct kunit_case *test_cases;

Can this variable be const? Or do we expect test modules to adjust
test_cases after the fact?

> +};
> +
> +/**
> + * struct kunit - represents a running instance of a test.
> + * @priv: for user to store arbitrary data. Commonly used to pass data created
> + * in the init function (see &struct kunit_module).
> + *
> + * Used to store information about the current context under which the test is
> + * running. Most of this data is private and should only be accessed indirectly
> + * via public functions; the one exception is @priv which can be used by the
> + * test writer to store arbitrary data.
> + */
> +struct kunit {
> +   void *priv;
> +
> +   /* private: internal use only. */
> +   const char *name; /* Read only after initialization! */
> +   spinlock_t lock; /* Guards all mutable test state. */
> +   bool success; /* Protected by lock. */
> +};
> +
> +void kunit_init_test(struct kunit *test, const char *name);
> +
> +int kunit_run_tests(struct kunit_module *module);
> +
> +/**
> + * module_test() - used to register a &struct kunit_module with KUnit.
> + * @module: a statically allocated &struct kunit_module.
> + *
> + * Registers @module with the test framework. See &struct kunit_module for more
> + * information.
> + */
> +#define module_test(module) \
> +   static int module_kunit_init##module(void) \
> +   { \
> +  

[PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu slices

2019-05-17 Thread Dave Chiluk
It has been observed, that highly-threaded, non-cpu-bound applications
running under cpu.cfs_quota_us constraints can hit a high percentage of
periods throttled while simultaneously not consuming the allocated
amount of quota.  This use case is typical of user-interactive non-cpu
bound applications, such as those running in kubernetes or mesos when
run on multiple cpu cores.

This has been root caused to threads being allocated per-cpu bandwidth
slices, and then not fully using that slice within the period, at which
point the slice and quota expire.  This expiration of unused slices
results in applications not being able to utilize the quota they are
allocated.

The expiration of per-cpu slices was recently fixed by commit
512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition").
Prior to that it appears that this has been broken since at least commit
51f2176d74ac ("sched/fair: Fix unlocked reads of some
cfs_b->quota/period") which was introduced in v3.16-rc1 in 2014.  That
commit added the following conditional which resulted in slices never
being expired.

if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
/* extend local deadline, drift is bounded above by 2 ticks */
cfs_rq->runtime_expires += TICK_NSEC;

Because this was broken for nearly 5 years, and has recently been fixed
and is now being noticed by many users running kubernetes
(https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
that the mechanisms around expiring runtime should be removed
altogether.

This change allows per-cpu slices to live longer than the period
boundary.  Threads on runqueues that do not use much CPU can thus
continue to use their remaining slice over a longer period of time than
cpu.cfs_period_us. This helps prevent the above condition of hitting
throttling while also not fully utilizing the cpu quota.

This theoretically allows a machine to use slightly more than its
allotted quota in some periods.  This overflow would be bounded by the
remaining per-cpu slice that was left unused in the previous period.
For CPU bound tasks this will change nothing, as they should
theoretically fully utilize all of their quota and slices in each
period. For user-interactive tasks as described above this provides a
much better user/application experience as their cpu utilization will
more closely match the amount they requested when they hit throttling.

This greatly improves performance of high-thread-count, non-cpu bound
applications with low cfs_quota_us allocation on high-core-count
machines. In the case of an artificial testcase, this performance
discrepancy has been observed to be almost 30x performance improvement,
while still maintaining correct cpu quota restrictions albeit over
longer time intervals than cpu.cfs_period_us.  That testcase is available
at https://github.com/indeedeng/fibtest.

Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
Signed-off-by: Dave Chiluk 
---
 Documentation/scheduler/sched-bwc.txt | 29 +++---
 kernel/sched/fair.c   | 71 +++
 kernel/sched/sched.h  |  4 --
 3 files changed, 29 insertions(+), 75 deletions(-)

diff --git a/Documentation/scheduler/sched-bwc.txt b/Documentation/scheduler/sched-bwc.txt
index f6b1873..4ded8ae 100644
--- a/Documentation/scheduler/sched-bwc.txt
+++ b/Documentation/scheduler/sched-bwc.txt
@@ -8,16 +8,33 @@ CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
 specification of the maximum CPU bandwidth available to a group or hierarchy.
 
 The bandwidth allowed for a group is specified using a quota and period. Within
-each given "period" (microseconds), a group is allowed to consume only up to
-"quota" microseconds of CPU time.  When the CPU bandwidth consumption of a
-group exceeds this limit (for that period), the tasks belonging to its
-hierarchy will be throttled and are not allowed to run again until the next
-period.
+each given "period" (microseconds), a task group is allocated up to "quota"
+microseconds of CPU time.  When the CPU bandwidth consumption of a group
+exceeds this limit (for that period), the tasks belonging to its hierarchy will
+be throttled and are not allowed to run again until the next period.
 
 A group's unused runtime is globally tracked, being refreshed with quota units
 above at each period boundary.  As threads consume this bandwidth it is
 transferred to cpu-local "silos" on a demand basis.  The amount transferred
-within each of these updates is tunable and described as the "slice".
+within each of these updates is tunable and described as the "slice".  Slices
+that are allocated to cpu-local silos do not expire at the end of the period,
+but unallocated quota does.  This doesn't affect cpu-bound applications as they
+by definition consume all of their bandwidth in each period.
+
+However for highly-threaded user-interactive/non-cpu boun