[PATCH 1/5] Documentation/kr: Update Korean translation to delete reference to the kernel-mentors mailing list

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  bc0ef4a7e4c3 ("Documentation: Delete reference to the kernel-mentors mailing list")

Signed-off-by: SeongJae Park 
---
 Documentation/translations/ko_KR/howto.rst | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index a8197e0..c8b0612 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -220,13 +220,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된
 가지고 있지 않다면 다음에 무엇을 해야할지에 관한 방향을 배울 수 있을
 것이다.
 
-여러분들이 이미 커널 트리에 반영하길 원하는 코드 묶음을 가지고 있지만
-올바른 포맷으로 포장하는데 도움이 필요하다면 그러한 문제를 돕기 위해
-만들어진 kernel-mentors 프로젝트가 있다. 그곳은 메일링 리스트이며
-다음에서 참조할 수 있다.
-
- https://selenic.com/mailman/listinfo/kernel-mentors
-
 리눅스 커널 코드에 실제 변경을 하기 전에 반드시 그 코드가 어떻게
 동작하는지 이해하고 있어야 한다. 코드를 분석하기 위하여 특정한 툴의
 도움을 빌려서라도 코드를 직접 읽는 것보다 좋은 것은 없다(대부분의
-- 
2.10.0



[PATCH 3/5] Documentation/process/howto.rst/kokr: Update Korean translation to add a missing cross-reference

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  dad051395413 ("Documentation/process/howto.rst: add a missing cross-reference")

Signed-off-by: SeongJae Park 
---
 Documentation/translations/ko_KR/howto.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index e88c426..bda7a4f 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -99,7 +99,7 @@ mtk.manpa...@gmail.com의 메인테이너에게 보낼 것을 권장한다.
 
 다음은 커널 소스 트리에 있는 읽어야 할 파일들의 리스트이다.
 
-  README
+  :ref:`Documentation/admin-guide/README.rst `
 이 파일은 리눅스 커널에 관하여 간단한 배경 설명과 커널을 설정하고
 빌드하기 위해 필요한 것을 설명한다. 커널에 입문하는 사람들은 여기서
 시작해야 한다.
-- 
2.10.0



[PATCH 2/5] Documentation/process/howto/kr: Update Korean translation to remove outdated info about bugzilla mailing lists

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  bcd3cf0855c5 ("Documentation/process/howto: Remove outdated info about bugzilla mailing lists")

Signed-off-by: SeongJae Park 
---
 Documentation/translations/ko_KR/howto.rst | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index c8b0612..e88c426 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -370,15 +370,7 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버
 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다.
 
 이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org
-를 참조하라. 여러분이 앞으로 생겨날 버그 리포트들의 조언자가 되길 원한다면
-bugme-new 메일링 리스트나(새로운 버그 리포트들만이 이곳에서 메일로 전해진다)
-bugme-janitor 메일링 리스트(bugzilla에 모든 변화들이 여기서 메일로 전해진다)
-에 등록하면 된다.
-
-  https://lists.linux-foundation.org/mailman/listinfo/bugme-new
-
-  https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors
-
+를 참조하라.
 
 
 메일링 리스트들
-- 
2.10.0



[PATCH 4/5] docs/kokr: Update Korean translation to tidy up TOCs and refs to license-rules.rst

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  9799445af124 ("docs: tidy up TOCs and refs to license-rules.rst")

Signed-off-by: SeongJae Park 
---
 Documentation/translations/ko_KR/howto.rst | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index bda7a4f..cfd6a42 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -77,10 +77,12 @@ Documentation/process/howto.rst
 
 리눅스 커널 소스 코드는 GPL로 배포(release)되었다. 소스트리의 메인
 디렉토리에 있는 라이센스에 관하여 상세하게 쓰여 있는 COPYING이라는
-파일을 봐라. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면
-리눅스 커널 메일링 리스트에 묻지말고 변호사와 연락하라. 메일링
-리스트들에 있는 사람들은 변호사가 아니기 때문에 법적 문제에 관하여
-그들의 말에 의지해서는 안된다.
+파일을 봐라. 리눅스 커널 라이센싱 규칙과 소스 코드 안의 `SPDX
+`_ 식별자 사용법은
+:ref:`Documentation/process/license-rules.rst ` 에 설명되어
+있다. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면 리눅스 커널 메일링
+리스트에 묻지말고 변호사와 연락하라. 메일링 리스트들에 있는 사람들은 변호사가
+아니기 때문에 법적 문제에 관하여 그들의 말에 의지해서는 안된다.
 
 GPL에 관한 잦은 질문들과 답변들은 다음을 참조하라.
 
-- 
2.10.0



[PATCH 5/5] doc:process:kokr: Update Korean translation to add links where missing

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  f77af637f29d ("doc:process: add links where missing")

Signed-off-by: SeongJae Park 
---
 Documentation/translations/ko_KR/howto.rst | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index cfd6a42..1525243 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -298,9 +298,9 @@ Andrew Morton의 글이 있다.
 4.x.y는 "stable" 팀에 의해 관리되며 거의 매번 격주로
 배포된다.
 
-커널 트리 문서들 내에 Documentation/process/stable-kernel-rules.rst 파일은 어떤
-종류의 변경들이 -stable 트리로 들어왔는지와 배포 프로세스가 어떻게
-진행되는지를 설명한다.
+커널 트리 문서들 내의 :ref:`Documentation/process/stable-kernel-rules.rst `
+파일은 어떤 종류의 변경들이 -stable 트리로 들어왔는지와
+배포 프로세스가 어떻게 진행되는지를 설명한다.
 
 4.x -git 패치들
 ~~~
@@ -355,9 +355,10 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버
 
 https://bugzilla.kernel.org/page.cgi?id=faq.html
 
-메인 커널 소스 디렉토리에 있는 admin-guide/reporting-bugs.rst 파일은 커널 버그라고 생각되는
-것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 추적하기 위해서 커널
-개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 있다.
+메인 커널 소스 디렉토리에 있는 :ref:`admin-guide/reporting-bugs.rst `
+파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를
+추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고
+있다.
 
 
 버그 리포트들의 관리
@@ -417,7 +418,8 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버
 "John 커널해커는 작성했다"를 유지하며 여러분들의 의견을 그 메일의 윗부분에
 작성하지 말고 각 인용한 단락들 사이에 넣어라.
 
-여러분들이 패치들을 메일에 넣는다면 그것들은 Documentation/process/submitting-patches.rst에
+여러분들이 패치들을 메일에 넣는다면 그것들은
+:ref:`Documentation/process/submitting-patches.rst ` 에
 나와있는데로 명백히(plain) 읽을 수 있는 텍스트여야 한다. 커널 개발자들은
 첨부파일이나 압축된 패치들을 원하지 않는다. 그들은 여러분들의 패치의
 각 라인 단위로 코멘트를 하길 원하며 압축하거나 첨부하지 않고 보내는 것이
-- 
2.10.0



Re: [RFC v7 0/5] pstore/block: new support logger for block devices

2019-01-24 Thread liaoweixiong
On 2019-01-24 02:26, Aaro Koskinen wrote:
> Hi,
> 
> On Wed, Jan 23, 2019 at 08:05:11PM +0800, liaoweixiong wrote:
>> Why should we need pstore_block?
>> 1. Most embedded intelligent equipment have no persistent ram, which
>> increases costs. We prefer cheaper solutions, like block devices.
>> In fact, there is already a sample for block device logger in driver
>> MTD (drivers/mtd/mtdoops.c).
> 
> I think you should add a patch for some actual block device using this
> new framework to show that it can work. What HW you think would be
> using things?
> 

I will try to add a patch. Actually, I have implemented it on an Allwinner
platform, but unfortunately that code is unsuitable for submission
upstream.

In addition, there is already a sample for pstore/blk in patch 3 of
version 7, named blkoops. Blkoops is suitable for most block devices, as
all it needs is the path of a partition. We can use it to test most
features except panic.

> A.
> 

-- 
liaoweixiong


Re: [RFC v5 1/4] pstore/blk: new support logger for block devices

2019-01-24 Thread liaoweixiong
On 2019-01-24 02:19, Aaro Koskinen wrote:
> Hi,
> 
> On Sat, Jan 19, 2019 at 04:53:48PM +0800, liaoweixiong wrote:
>> On 2019-01-18 08:12, Kees Cook wrote:
>>>> MTD (drivers/mtd/mtdoops.c).
>>>
>>> Would mtdoops get dropped in favor of pstore/blk, or do they not share
>>> features?
>>
>> We can show them what pstore/blk do. I think they will be interested in it.
>> They should do a little work, including make a function for panic read,
>> then they gain enhanced features, including present logs as a file,
>> record multiple logs.
> 
> mtdoops has been in use over a decade and known working. What benefits
> this new framework would offer? (BTW I don't see MTD as "block device".)
> 

Pstore/blk is NOT meant to replace mtdoops, but to enhance it. Mtdoops
provides the operation interface for panic, and pstore/blk manages the
special partition space.

The combination of pstore/blk and mtdoops brings the following benefits:
1. Not only panic/oops, but all features of pstore, such as console,
ftrace, pmsg, etc.
2. Logs are displayed as files. Pstore/blk can save multiple records and
display all of them as files. Users have no need to parse the partition;
they just read files to get what they want.
3. Users can get more information, such as the trigger time, the trigger
count, etc.
4. ... perhaps other benefits that I can't think of right now

> Why should there be a panic read? That adds complexity. This codes runs
> on panic path, so it should be as simple and fast as possible.
> 

The read operation is only used to recover old data. Pstore/blk does
recovery only once, before a write (when a new crash is triggered) or a
read (when the pstore filesystem is mounted). Most of the time, pstore/blk
reads through the general operation rather than panic read. However, what
if the kernel panics before pstore/blk has recovered? That's why we need
panic read.

In addition, pstore/blk does not recover logs on panic. It only recovers
the trigger counter and the next position. Pstore/blk needs the old count
to keep accumulating it, and the next position to avoid damaging old
records.

> Also compatibility has to be there. E.g. user upgrades an old system
> (with mtdoops and related userspace) to a new kernel. Upgrade fails,
> so the old software must be able to read the panic dumps properly to
> tell the user why.
> 

This is really a problem, because the header of each record is very
different between pstore/blk and mtdoops. But I think it is not fatal.

> A.
> 

-- 
liaoweixiong


Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-01-24 Thread Joel Fernandes
On Wed, Jan 23, 2019 at 09:32:16PM -0500, Joel Fernandes wrote:
> On Wed, Jan 23, 2019 at 02:37:47PM -0800, Daniel Colascione wrote:
> > On Wed, Jan 23, 2019 at 1:29 PM Karim Yaghmour
> >  wrote:
> [...]
> > > Personally I advocated a more aggressive approach with Joel in private:
> > > just put the darn headers straight into the kernel image, it's the
> > > *only* artifact we're sure will follow the Android device whatever
> > > happens to it (like built-in ftrace).
> > 
> > I was thinking along similar lines. Ordinarily, we make loadable
> > kernel modules. What we kind of want here is a non-loadable kernel
> > module --- or a non-loadable section in the kernel image proper. I'm
> > not familiar with early-stage kernel loader operation: I know it's
> > > possible to create discardable sections in the kernel image, but can
> > we create sections that are never slurped into memory in the first
> > place? If not, maybe loading and immediately discarding the header
> > section is good enough.
> 
> I am happy to see if I can shrink it down further. Especially using xz and
> stripping all comments period. I am optimistic this can be brought down
> further to a point where it would make sense to everyone to build it into the
> kernel. Let's see.
> 
> Last time I stripped comments, it went down by ~40%. What I haven't tried is
> doing this *with* xz compression. I am also open to brainstorming what else
> can be stripped.

Removing comments (/* */) with xz compression brings it down to 3.3MB.

thanks,

 - Joel



Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-01-24 Thread Karim Yaghmour



On 1/23/19 11:37 PM, Daniel Colascione wrote:

> While I think there's definitely a place for eBPF as part of the
> Android performance toolkit, I think most users will end up using it
> through rich front-end performance collection and analysis tools (of
> the sort I'm working on) rather than directly as a first-line window
> into the operation of the system.


Sure, I don't disagree.


> Below this level is probably
> something like bpftrace, and below that, raw eBPF and ftrace
> manipulation. It's also worth noting that much of the time, system
> analysis is retrospection, not inspection (e.g., investigating the
> causes of rare and hard-to-reproduce bad behavior), and so iteration
> via interactive specification of eBPF programs isn't a practical path
> forward. It's still useful, even in this scenario, to be able (as part
> of higher-level tools) to attach "canned" eBPF programs to the kernel to
> extract certain generally-useful bits of information, and in this
> capacity, Joel's header module would be useful.


Hmm. Not sure I agree about that. There's no reason I can't use Android 
Studio to "right-click" on a line of code or even a span of code and 
select a "trace this in detail for me" option, where "in detail" could 
mean different things depending on the code that's highlighted. Doing 
I/O calls? Then automatically measure some I/O benchmarks for that 
portion of code. Doing graphics calls? Do the same for the graphics 
stack, etc.



>> Personally I advocated a more aggressive approach with Joel in private:
>> just put the darn headers straight into the kernel image, it's the
>> *only* artifact we're sure will follow the Android device whatever
>> happens to it (like built-in ftrace).


> I was thinking along similar lines. Ordinarily, we make loadable
> kernel modules. What we kind of want here is a non-loadable kernel
> module --- or a non-loadable section in the kernel image proper. I'm
> not familiar with early-stage kernel loader operation: I know it's
> possible to create discardable sections in the kernel image, but can
> we create sections that are never slurped into memory in the first
> place? If not, maybe loading and immediately discarding the header
> section is good enough.


Interesting. Maybe just append it to the image but have it not loaded 
and have a kernel parameter that enables a "/proc/kheaders" driver to 
know where to fetch the appended headers from storage at runtime. There 
would be no RAM loading whatsoever of the headers, just some sort of 
"kheaders=/dev/foobar:offset:size" parameter. If you turn the option on, 
you get a fatter kernel image size to store on permanent storage, but no 
impact on what's loaded at boot time.



> Would such a thing really do better than LZMA? LZMA already has very
> clever techniques for eliminating long-range redundancies in
> compressible text, including redundancies at the sub-byte level. I can
> certainly understand the benefit of stripping comments, since removing
> comments really does decrease the total amount of information the
> compressor has to preserve, but I'm not sure how much the encoding
> scheme you propose below would help, since it reminds me of the
> encoding scheme that LZMA would discover automatically.


I'm no compression algorithm expert. If you say LZMA would do the 
same/better than what I suggested then I have no reason to contest that. 
My goal is to see the headers as part of the kernel image that's 
distributed on devices so that they don't have to be chased around. I'm 
just trying to make it as palatable as possible.



>> Whether such craziness makes sense or is adopted or not isn't mine to
>> chart, but I certainly can't see eBPF reaching the same mass deployment
>> ftrace has within the Android ecosystem until there's a way to use it
>> without having to chase kernel headers independently of kernel images.
>> There are "too many clicks" involved and someone somewhere will drop the
>> ball if it's not glued to the kernel in some way shape or form. Any
>> solution that solves this is one I'd love to hear about.


> I agree. There definitely needs to be a "just collect a damn trace"
> button that works on any device, and for this button to work and
> incorporate eBPF, the system needs to be able to describe itself.


I like that: "the system needs to be able to describe itself". True.

Cheers,

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


Re: [PATCH v8 05/26] clocksource: Add driver for the Ingenic JZ47xx OST

2019-01-24 Thread Stephen Boyd
Quoting Guenter Roeck (2019-01-23 10:01:55)
> On Wed, Jan 23, 2019 at 02:25:53PM -0300, Paul Cercueil wrote:
> > Hi,
> > 
> > On Wed, Jan 23, 2019 at 11:31, Guenter Roeck wrote:
> > >On 1/23/19 4:58 AM, Mathieu Malaterre wrote:
> > >>On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil 
> > >>wrote:
> > >>>
> > >>>From: Maarten ter Huurne 
> > >>>
> > >>>OST is the OS Timer, a 64-bit timer/counter with buffered reading.
> > >>>
> > >>>SoCs before the JZ4770 had (if any) a 32-bit OST; the JZ4770 and
> > >>>JZ4780 have a 64-bit OST.
> > >>>
> > >>>This driver will register both a clocksource and a sched_clock to the
> > >>>system.
> > >>>
> > >>>Signed-off-by: Maarten ter Huurne 
> > >>>Signed-off-by: Paul Cercueil 
> > >>>---
> > >>>
> > >>>Notes:
> > >>>  v5: New patch
> > >>>
> > >>>  v6: - Get rid of SoC IDs; pass pointer to ingenic_ost_soc_info
> > >>>as
> > >>>devicetree match data instead.
> > >>>  - Use device_get_match_data() instead of the of_* variant
> > >>>  - Handle error of dev_get_regmap() properly
> > >>>
> > >>>  v7: Fix section mismatch by using
> > >>>builtin_platform_driver_probe()
> > >>>
> > >>>  v8: builtin_platform_driver_probe() does not work anymore in
> > >>>  4.20-rc6? The probe function won't be called. Work around
> > >>>this
> > >>>  for now by using late_initcall.
> > >>>
> > >
> > >Did anyone notice this ? Either something is wrong with the driver, or
> > >with the kernel core. Hacking around it seems like the worst possible
> > >"solution".
> > 
> > I can confirm it still happens on 5.0-rc3.
> > 
> > Just to explain what I'm doing:
> > 
> > My ingenic-timer driver probes with builtin_platform_driver_probe (this
> > works),
> > and then calls of_platform_populate to probe its children. This driver,
> > ingenic-ost, is one of them, and will fail to probe with
> > builtin_platform_driver_probe.
> > 
> 
> The big question is _why_ it fails to probe.
> 

Are you sharing the device tree node between a 'normal' platform device
driver and something more low level DT that marks the device's backing
DT node as OF_POPULATED early on? That's my only guess why it's not
working.



Re: [PATCH v8 05/26] clocksource: Add driver for the Ingenic JZ47xx OST

2019-01-24 Thread Paul Cercueil




On Thu, Jan 24, 2019 at 16:28, Stephen Boyd wrote:
> Quoting Guenter Roeck (2019-01-23 10:01:55)
> > On Wed, Jan 23, 2019 at 02:25:53PM -0300, Paul Cercueil wrote:
> > > Hi,
> > >
> > > On Wed, Jan 23, 2019 at 11:31, Guenter Roeck wrote:
> > > >On 1/23/19 4:58 AM, Mathieu Malaterre wrote:
> > > >>On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil wrote:
> > > >>>
> > > >>>From: Maarten ter Huurne
> > > >>>
> > > >>>OST is the OS Timer, a 64-bit timer/counter with buffered reading.
> > > >>>
> > > >>>SoCs before the JZ4770 had (if any) a 32-bit OST; the JZ4770 and
> > > >>>JZ4780 have a 64-bit OST.
> > > >>>
> > > >>>This driver will register both a clocksource and a sched_clock to the
> > > >>>system.
> > > >>>
> > > >>>Signed-off-by: Maarten ter Huurne
> > > >>>Signed-off-by: Paul Cercueil
> > > >>>---
> > > >>>
> > > >>>Notes:
> > > >>>  v5: New patch
> > > >>>
> > > >>>  v6: - Get rid of SoC IDs; pass pointer to ingenic_ost_soc_info as
> > > >>>        devicetree match data instead.
> > > >>>      - Use device_get_match_data() instead of the of_* variant
> > > >>>      - Handle error of dev_get_regmap() properly
> > > >>>
> > > >>>  v7: Fix section mismatch by using builtin_platform_driver_probe()
> > > >>>
> > > >>>  v8: builtin_platform_driver_probe() does not work anymore in
> > > >>>      4.20-rc6? The probe function won't be called. Work around this
> > > >>>      for now by using late_initcall.
> > > >>>
> > > >
> > > >Did anyone notice this ? Either something is wrong with the driver, or
> > > >with the kernel core. Hacking around it seems like the worst possible
> > > >"solution".
> > >
> > > I can confirm it still happens on 5.0-rc3.
> > >
> > > Just to explain what I'm doing:
> > >
> > > My ingenic-timer driver probes with builtin_platform_driver_probe (this
> > > works), and then calls of_platform_populate to probe its children. This
> > > driver, ingenic-ost, is one of them, and will fail to probe with
> > > builtin_platform_driver_probe.
> > >
> >
> > The big question is _why_ it fails to probe.
> >
>
> Are you sharing the device tree node between a 'normal' platform device
> driver and something more low level DT that marks the device's backing
> DT node as OF_POPULATED early on? That's my only guess why it's not
> working.


I do, but I clear the OF_POPULATED flag so that it is then probed as a
normal platform device, and it's not on this driver's node but its 
parent.




Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-01-24 Thread Joel Fernandes
On Thu, Jan 24, 2019 at 07:57:26PM +0100, Karim Yaghmour wrote:
> 
> On 1/23/19 11:37 PM, Daniel Colascione wrote:
[..]
> > > Personally I advocated a more aggressive approach with Joel in private:
> > > just put the darn headers straight into the kernel image, it's the
> > > *only* artifact we're sure will follow the Android device whatever
> > > happens to it (like built-in ftrace).
> > 
> > I was thinking along similar lines. Ordinarily, we make loadable
> > kernel modules. What we kind of want here is a non-loadable kernel
> > module --- or a non-loadable section in the kernel image proper. I'm
> > not familiar with early-stage kernel loader operation: I know it's
> > > possible to create discardable sections in the kernel image, but can
> > we create sections that are never slurped into memory in the first
> > place? If not, maybe loading and immediately discarding the header
> > section is good enough.
> 
> Interesting. Maybe just append it to the image but have it not loaded and
> have a kernel parameter that enables a "/proc/kheaders" driver to know where
> to fetch the appended headers from storage at runtime. There would be no
> RAM loading whatsoever of the headers, just some sort of
> "kheaders=/dev/foobar:offset:size" parameter. If you turn the option on, you
> get a fatter kernel image size to store on permanent storage, but no impact
> on what's loaded at boot time.

Embedding anything into the kernel image does impact boot time, though,
because it increases the time spent by the bootloader. A module OTOH would
not have such overhead.

Also, a kernel can be booted in any number of ways other than from mass
storage, so a kheaders= option like that is not a generic Linux-wide
solution. If the option is forgotten, then the running system can't use
the feature. The other issue is that it requires a kernel command line
option / bootloader changes, which adds configuration burden that would
not be needed with a module.

> > Would such a thing really do better than LZMA? LZMA already has very
> > clever techniques for eliminating long-range redundancies in
> > compressible text, including redundancies at the sub-byte level. I can
> > certainly understand the benefit of stripping comments, since removing
> > comments really does decrease the total amount of information the
> > compressor has to preserve, but I'm not sure how much the encoding
> > scheme you propose below would help, since it reminds me of the
> > encoding scheme that LZMA would discover automatically.
> 
> I'm no compression algorithm expert. If you say LZMA would do the
> same/better than what I suggested then I have no reason to contest that. My
> goal is to see the headers as part of the kernel image that's distributed on
> devices so that they don't have to be chased around. I'm just trying to make
> it as palatable as possible.

I believe LZMA is really good at that sort of thing too.

Also at 3.3MB of module size, I think we are really good size-wise. But Dan
is helping look at possibly reducing further if he gets time. Many modules in
my experience are much bigger. amdgpu.ko on my Linux machine is 6.1MB.

I really think making it a module is the best way to make sure this is
bundled with the kernel on the widest number of Android and other Linux
systems, without incurring boot time overhead, or any other command line
configuration burden.

I spoke personally with many other kernel contributors at LPC, and many
folks told me one word - MODULE :D.  Even though I hesitated at first, it
now seems the right solution.

If no one seriously objects, I'll clean this up and post a v2 with the
RFC tag taken off. Thank you!

 - Joel



[PATCH v3 5/5] psi: introduce psi monitor

2019-01-24 Thread Suren Baghdasaryan
Psi monitor aims to provide a low-latency short-term pressure
detection mechanism configurable by users. It allows users to
monitor psi metrics growth and trigger events whenever a metric
rises above a user-defined threshold within a user-defined time window.

Time window and threshold are both expressed in usecs. Multiple psi
resources with different thresholds and window sizes can be monitored
concurrently.

Psi monitors activate when system enters stall state for the monitored
psi metric and deactivate upon exit from the stall state. While system
is in the stall state psi signal growth is monitored at a rate of 10 times
per tracking window. Min window size is 500ms, therefore the min monitoring
interval is 50ms. Max window size is 10s with monitoring interval of 1s.

When activated psi monitor stays active for at least the duration of one
tracking window to avoid repeated activations/deactivations when psi
signal is bouncing.

Notifications to the users are rate-limited to one per tracking window.

Signed-off-by: Suren Baghdasaryan 
Signed-off-by: Johannes Weiner 
---
 Documentation/accounting/psi.txt | 104 ++
 include/linux/psi.h  |  10 +
 include/linux/psi_types.h|  59 
 kernel/cgroup/cgroup.c   | 107 +-
 kernel/sched/psi.c   | 562 +--
 5 files changed, 808 insertions(+), 34 deletions(-)

diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
index b8ca28b60215..6b21c72aa87c 100644
--- a/Documentation/accounting/psi.txt
+++ b/Documentation/accounting/psi.txt
@@ -63,6 +63,107 @@ tracked and exported as well, to allow detection of latency spikes
 which wouldn't necessarily make a dent in the time averages, or to
 average trends over custom time frames.
 
+Monitoring for pressure thresholds
+==================================
+
+Users can register triggers and use poll() to be woken up when resource
+pressure exceeds certain thresholds.
+
+A trigger describes the maximum cumulative stall time over a specific
+time window, e.g. 100ms of total stall time within any 500ms window to
+generate a wakeup event.
+
+To register a trigger user has to open psi interface file under
+/proc/pressure/ representing the resource to be monitored and write the
+desired threshold and time window. The open file descriptor should be
+used to wait for trigger events using select(), poll() or epoll().
+The following format is used:
+
+  
+
+For example writing "some 150000 1000000" into /proc/pressure/memory
+would add 150ms threshold for partial memory stall measured within
+1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
+would add 50ms threshold for full io stall measured within 1sec time window.
+
+Triggers can be set on more than one psi metric and more than one trigger
+for the same psi metric can be specified. However for each trigger a separate
+file descriptor is required to be able to poll it separately from others,
+therefore for each trigger a separate open() syscall should be made even
+when opening the same psi interface file.
+
+Monitors activate only when system enters stall state for the monitored
+psi metric and deactivate upon exit from the stall state. While system is
+in the stall state psi signal growth is monitored at a rate of 10 times per
+tracking window.
+
+The kernel accepts window sizes ranging from 500ms to 10s, therefore min
+monitoring update interval is 50ms and max is 1s.
+
+When activated, psi monitor stays active for at least the duration of one
+tracking window to avoid repeated activations/deactivations when system is
+bouncing in and out of the stall state.
+
+Notifications to the userspace are rate-limited to one per tracking window.
+
+The trigger will de-register when the file descriptor used to define the
+trigger is closed.
+
+Userspace monitor usage example
+===============================
+
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <poll.h>
+#include <string.h>
+#include <unistd.h>
+
+/*
+ * Monitor memory partial stall with 1s tracking window size
+ * and 150ms threshold.
+ */
+int main() {
+   const char trig[] = "some 150000 1000000";
+   struct pollfd fds;
+   int n;
+
+   fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
+   if (fds.fd < 0) {
+   printf("/proc/pressure/memory open error: %s\n",
+   strerror(errno));
+   return 1;
+   }
+   fds.events = POLLPRI;
+
+   if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
+   printf("/proc/pressure/memory write error: %s\n",
+   strerror(errno));
+   return 1;
+   }
+
+   printf("waiting for events...\n");
+   while (1) {
+   n = poll(&fds, 1, -1);
+   if (n < 0) {
+   printf("poll error: %s\n", strerror(errno));
+   return 1;
+   }
+   if (fds.revents & POLLERR) {
+   printf("got POLLERR,

[PATCH v3 0/5] psi: pressure stall monitors v3

2019-01-24 Thread Suren Baghdasaryan
This is a respin of:
  https://lwn.net/ml/linux-kernel/20190110220718.261134-1-sur...@google.com/

Android is adopting psi to detect and remedy memory pressure that
results in stuttering and decreased responsiveness on mobile devices.

Psi gives us the stall information, but because we're dealing with
latencies in the millisecond range, periodically reading the pressure
files to detect stalls in a timely fashion is not feasible. Psi also
doesn't aggregate its averages at a high-enough frequency right now.

This patch series extends the psi interface such that users can
configure sensitive latency thresholds and use poll() and friends to
be notified when these are breached.

As high-frequency aggregation is costly, it implements an aggregation
method that is optimized for fast, short-interval averaging, and makes
the aggregation frequency adaptive, such that high-frequency updates
only happen while monitored stall events are actively occurring.

With these patches applied, Android can monitor for, and ward off,
mounting memory shortages before they cause problems for the user.
For example, using memory stall monitors in the userspace low memory
killer daemon (lmkd), we can detect mounting pressure and kill less
important processes before the device becomes visibly sluggish. In our
memory stress testing, psi memory monitors produce roughly 10x fewer
false positives than vmpressure signals. The ability to specify
multiple triggers for the same psi metric allows other parts of the
Android framework to monitor the memory state of the device and act
accordingly.

The new interface is straightforward. The user opens one of the
pressure files for writing and writes a trigger description into the
file descriptor that defines the stall state - some or full, and the
maximum stall time over a given window of time. E.g.:

/* Signal when stall time exceeds 100ms of a 1s window (values in usecs) */
char trigger[] = "full 100000 1000000";
fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
write(fd, trigger, sizeof(trigger));
while (poll() >= 0) {
...
}
close(fd);

When the monitored stall state is entered, psi adapts its aggregation
frequency according to what the configured time window requires in
order to emit event signals in a timely fashion. Once the stalling
subsides, aggregation reverts back to normal.

The trigger is associated with the open file descriptor. To stop
monitoring, the user only needs to close the file descriptor and the
trigger is discarded.

Patches 1-4 prepare the psi code for polling support. Patch 5
implements the adaptive polling logic, the pressure growth detection
optimized for short intervals, and hooks up write() and poll() on the
pressure files.

The patches were developed in collaboration with Johannes Weiner.

The patches are based on 4.20-rc7.

Johannes Weiner (2):
  fs: kernfs: add poll file operation
  kernel: cgroup: add poll file operation

Suren Baghdasaryan (3):
  psi: introduce state_mask to represent stalled psi states
  psi: rename psi fields in preparation for psi trigger addition
  psi: introduce psi monitor

 Documentation/accounting/psi.txt | 104 ++
 fs/kernfs/file.c |  31 +-
 include/linux/cgroup-defs.h  |   4 +
 include/linux/kernfs.h   |   6 +
 include/linux/psi.h  |  10 +
 include/linux/psi_types.h|  83 -
 kernel/cgroup/cgroup.c   | 119 +-
 kernel/sched/psi.c   | 605 ---
 8 files changed, 891 insertions(+), 71 deletions(-)

Changes in v3:
- Added smp_mb in the slowpath, as per Peter
- Changed psi_group.polling flag to atomic_t, as per Peter
- Added a comment in the hotpath about implicit smp_wmb in the
cmpxchg, as per Johannes
- Minimized line-breaks wherever possible, as per Peter

-- 
2.20.1.321.g9e740568ce-goog



[PATCH v3 1/5] fs: kernfs: add poll file operation

2019-01-24 Thread Suren Baghdasaryan
From: Johannes Weiner 

Kernfs has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes. To allow polling
for custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Signed-off-by: Johannes Weiner 
Signed-off-by: Suren Baghdasaryan 
---
 fs/kernfs/file.c   | 31 ---
 include/linux/kernfs.h |  6 ++
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f8d5021a652e..ae948aaa4c53 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -832,26 +832,35 @@ void kernfs_drain_open_files(struct kernfs_node *kn)
  * to see if it supports poll (Neither 'poll' nor 'select' return
  * an appropriate error code).  When in doubt, set a suitable timeout value.
  */
+__poll_t kernfs_generic_poll(struct kernfs_open_file *of, poll_table *wait)
+{
+   struct kernfs_node *kn = kernfs_dentry_node(of->file->f_path.dentry);
+   struct kernfs_open_node *on = kn->attr.open;
+
+   poll_wait(of->file, &on->poll, wait);
+
+   if (of->event != atomic_read(&on->event))
+   return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
+
+   return DEFAULT_POLLMASK;
+}
+
 static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
 {
struct kernfs_open_file *of = kernfs_of(filp);
struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
-   struct kernfs_open_node *on = kn->attr.open;
+   __poll_t ret;
 
if (!kernfs_get_active(kn))
-   goto trigger;
+   return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
 
-   poll_wait(filp, &on->poll, wait);
+   if (kn->attr.ops->poll)
+   ret = kn->attr.ops->poll(of, wait);
+   else
+   ret = kernfs_generic_poll(of, wait);
 
kernfs_put_active(kn);
-
-   if (of->event != atomic_read(&on->event))
-   goto trigger;
-
-   return DEFAULT_POLLMASK;
-
- trigger:
-   return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
+   return ret;
 }
 
 static void kernfs_notify_workfn(struct work_struct *work)
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 5b36b1287a5a..0cac1207bb00 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -25,6 +25,7 @@ struct seq_file;
 struct vm_area_struct;
 struct super_block;
 struct file_system_type;
+struct poll_table_struct;
 
 struct kernfs_open_node;
 struct kernfs_iattrs;
@@ -261,6 +262,9 @@ struct kernfs_ops {
ssize_t (*write)(struct kernfs_open_file *of, char *buf, size_t bytes,
 loff_t off);
 
+   __poll_t (*poll)(struct kernfs_open_file *of,
+struct poll_table_struct *pt);
+
int (*mmap)(struct kernfs_open_file *of, struct vm_area_struct *vma);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -350,6 +354,8 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, 
const char *name,
 int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 const char *new_name, const void *new_ns);
 int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);
+__poll_t kernfs_generic_poll(struct kernfs_open_file *of,
+struct poll_table_struct *pt);
 void kernfs_notify(struct kernfs_node *kn);
 
 const void *kernfs_super_ns(struct super_block *sb);
-- 
2.20.1.321.g9e740568ce-goog

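The dispatch introduced by patch 1/5 above (use the node's own .poll callback if it has one, otherwise fall back to the generic behaviour) can be modelled in plain userspace C. This is only a sketch of the pattern: every model_* name and MODEL_* constant is made up for illustration, the mask values are arbitrary, and none of the kernfs locking or wait-queue handling is represented.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's poll mask bits; the values are arbitrary */
#define MODEL_POLL_DEFAULT 0x1u
#define MODEL_POLL_EVENT   0x2u

struct model_open_file;

/* Per-node operations; .poll is optional, like the new kernfs_ops hook */
struct model_ops {
	unsigned int (*poll)(struct model_open_file *of);
};

struct model_open_file {
	const struct model_ops *ops;
	int seen_event;	/* event count this fd last observed */
	int node_event;	/* current event count of the node */
};

/* Default behaviour: report an event if the node changed since last look */
static unsigned int model_generic_poll(struct model_open_file *of)
{
	if (of->seen_event != of->node_event)
		return MODEL_POLL_DEFAULT | MODEL_POLL_EVENT;
	return MODEL_POLL_DEFAULT;
}

/* Dispatcher: prefer the node's own .poll, else fall back to the default */
static unsigned int model_fop_poll(struct model_open_file *of)
{
	assert(of != NULL && of->ops != NULL);
	if (of->ops->poll)
		return of->ops->poll(of);
	return model_generic_poll(of);
}

/* Example override: a node that never signals events on its own */
static unsigned int model_quiet_poll(struct model_open_file *of)
{
	(void)of;
	return MODEL_POLL_DEFAULT;
}
```

A file type with per-fd state, such as the pressure files this series targets, would supply its own callback in place of model_quiet_poll and decide per descriptor whether an event is pending.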


[PATCH v3 2/5] kernel: cgroup: add poll file operation

2019-01-24 Thread Suren Baghdasaryan
From: Johannes Weiner 

Cgroup has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes. To allow polling
for custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Signed-off-by: Johannes Weiner 
Signed-off-by: Suren Baghdasaryan 
---
 include/linux/cgroup-defs.h |  4 
 kernel/cgroup/cgroup.c  | 12 
 2 files changed, 16 insertions(+)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8fcbae1b8db0..aad3babef007 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -32,6 +32,7 @@ struct kernfs_node;
 struct kernfs_ops;
 struct kernfs_open_file;
 struct seq_file;
+struct poll_table_struct;
 
 #define MAX_CGROUP_TYPE_NAMELEN 32
 #define MAX_CGROUP_ROOT_NAMELEN 64
@@ -574,6 +575,9 @@ struct cftype {
ssize_t (*write)(struct kernfs_open_file *of,
 char *buf, size_t nbytes, loff_t off);
 
+   __poll_t (*poll)(struct kernfs_open_file *of,
+struct poll_table_struct *pt);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lock_class_key   lockdep_key;
 #endif
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f31bd61c9466..e8cd12c6a553 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3533,6 +3533,16 @@ static ssize_t cgroup_file_write(struct kernfs_open_file 
*of, char *buf,
return ret ?: nbytes;
 }
 
+static __poll_t cgroup_file_poll(struct kernfs_open_file *of, poll_table *pt)
+{
+   struct cftype *cft = of->kn->priv;
+
+   if (cft->poll)
+   return cft->poll(of, pt);
+
+   return kernfs_generic_poll(of, pt);
+}
+
 static void *cgroup_seqfile_start(struct seq_file *seq, loff_t *ppos)
 {
return seq_cft(seq)->seq_start(seq, ppos);
@@ -3571,6 +3581,7 @@ static struct kernfs_ops cgroup_kf_single_ops = {
.open   = cgroup_file_open,
.release= cgroup_file_release,
.write  = cgroup_file_write,
+   .poll   = cgroup_file_poll,
.seq_show   = cgroup_seqfile_show,
 };
 
@@ -3579,6 +3590,7 @@ static struct kernfs_ops cgroup_kf_ops = {
.open   = cgroup_file_open,
.release= cgroup_file_release,
.write  = cgroup_file_write,
+   .poll   = cgroup_file_poll,
.seq_start  = cgroup_seqfile_start,
.seq_next   = cgroup_seqfile_next,
.seq_stop   = cgroup_seqfile_stop,
-- 
2.20.1.321.g9e740568ce-goog



[PATCH v3 3/5] psi: introduce state_mask to represent stalled psi states

2019-01-24 Thread Suren Baghdasaryan
The psi monitoring patches will need to determine the same states as
record_times(). To avoid calculating them twice, maintain a state mask
that can be consulted cheaply. Do this in a separate patch to keep the
churn in the main feature patch at a minimum.

This adds a 4-byte state_mask member to struct psi_group_cpu, which
makes its first cacheline-aligned part 52 bytes long. Add explicit
values to the enumeration element counters that affect the size of
struct psi_group_cpu.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/psi_types.h |  9 ++---
 kernel/sched/psi.c| 29 +++--
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 2cf422db5d18..762c6bb16f3c 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -11,7 +11,7 @@ enum psi_task_count {
NR_IOWAIT,
NR_MEMSTALL,
NR_RUNNING,
-   NR_PSI_TASK_COUNTS,
+   NR_PSI_TASK_COUNTS = 3,
 };
 
 /* Task state bitmasks */
@@ -24,7 +24,7 @@ enum psi_res {
PSI_IO,
PSI_MEM,
PSI_CPU,
-   NR_PSI_RESOURCES,
+   NR_PSI_RESOURCES = 3,
 };
 
 /*
@@ -41,7 +41,7 @@ enum psi_states {
PSI_CPU_SOME,
/* Only per-CPU, to weigh the CPU in the global average: */
PSI_NONIDLE,
-   NR_PSI_STATES,
+   NR_PSI_STATES = 6,
 };
 
 struct psi_group_cpu {
@@ -53,6 +53,9 @@ struct psi_group_cpu {
/* States of the tasks belonging to this group */
unsigned int tasks[NR_PSI_TASK_COUNTS];
 
+   /* Aggregate pressure state derived from the tasks */
+   u32 state_mask;
+
/* Period time sampling buckets for each state of interest (ns) */
u32 times[NR_PSI_STATES];
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index fe24de3fbc93..2262d920295f 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -212,17 +212,17 @@ static bool test_state(unsigned int *tasks, enum 
psi_states state)
 static void get_recent_times(struct psi_group *group, int cpu, u32 *times)
 {
struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
-   unsigned int tasks[NR_PSI_TASK_COUNTS];
u64 now, state_start;
+   enum psi_states s;
unsigned int seq;
-   int s;
+   u32 state_mask;
 
/* Snapshot a coherent view of the CPU state */
do {
seq = read_seqcount_begin(&groupc->seq);
now = cpu_clock(cpu);
memcpy(times, groupc->times, sizeof(groupc->times));
-   memcpy(tasks, groupc->tasks, sizeof(groupc->tasks));
+   state_mask = groupc->state_mask;
state_start = groupc->state_start;
} while (read_seqcount_retry(&groupc->seq, seq));
 
@@ -238,7 +238,7 @@ static void get_recent_times(struct psi_group *group, int 
cpu, u32 *times)
 * (u32) and our reported pressure close to what's
 * actually happening.
 */
-   if (test_state(tasks, s))
+   if (state_mask & (1 << s))
times[s] += now - state_start;
 
delta = times[s] - groupc->times_prev[s];
@@ -406,15 +406,15 @@ static void record_times(struct psi_group_cpu *groupc, 
int cpu,
delta = now - groupc->state_start;
groupc->state_start = now;
 
-   if (test_state(groupc->tasks, PSI_IO_SOME)) {
+   if (groupc->state_mask & (1 << PSI_IO_SOME)) {
groupc->times[PSI_IO_SOME] += delta;
-   if (test_state(groupc->tasks, PSI_IO_FULL))
+   if (groupc->state_mask & (1 << PSI_IO_FULL))
groupc->times[PSI_IO_FULL] += delta;
}
 
-   if (test_state(groupc->tasks, PSI_MEM_SOME)) {
+   if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
groupc->times[PSI_MEM_SOME] += delta;
-   if (test_state(groupc->tasks, PSI_MEM_FULL))
+   if (groupc->state_mask & (1 << PSI_MEM_FULL))
groupc->times[PSI_MEM_FULL] += delta;
else if (memstall_tick) {
u32 sample;
@@ -435,10 +435,10 @@ static void record_times(struct psi_group_cpu *groupc, 
int cpu,
}
}
 
-   if (test_state(groupc->tasks, PSI_CPU_SOME))
+   if (groupc->state_mask & (1 << PSI_CPU_SOME))
groupc->times[PSI_CPU_SOME] += delta;
 
-   if (test_state(groupc->tasks, PSI_NONIDLE))
+   if (groupc->state_mask & (1 << PSI_NONIDLE))
groupc->times[PSI_NONIDLE] += delta;
 }
 
@@ -447,6 +447,8 @@ static void psi_group_change(struct psi_group *group, int 
cpu,
 {
struct psi_group_cpu *groupc;
unsigned int t, m;
+   enum psi_states s;
+   u32 state_mask = 0;
 
groupc = per_cpu_ptr(group->pcpu, cpu);
 
@@ -479,6 +481,13 @@ static void psi_group_change(struct psi_group *group, int 
cpu,
if (set & (1 << t))
  

[PATCH v3 4/5] psi: rename psi fields in preparation for psi trigger addition

2019-01-24 Thread Suren Baghdasaryan
Rename the psi_group structure member fields used for calculating psi
totals and averages to clearly distinguish them from the trigger-related
fields that will be added next.

Signed-off-by: Suren Baghdasaryan 
---
 include/linux/psi_types.h | 15 ---
 kernel/sched/psi.c| 26 ++
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 762c6bb16f3c..47757668bdcb 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -69,20 +69,21 @@ struct psi_group_cpu {
 };
 
 struct psi_group {
-   /* Protects data updated during an aggregation */
-   struct mutex stat_lock;
+   /* Protects data used by the aggregator */
+   struct mutex update_lock;
 
/* Per-cpu task state & time tracking */
struct psi_group_cpu __percpu *pcpu;
 
-   /* Periodic aggregation state */
-   u64 total_prev[NR_PSI_STATES - 1];
-   u64 last_update;
-   u64 next_update;
struct delayed_work clock_work;
 
-   /* Total stall times and sampled pressure averages */
+   /* Total stall times observed */
u64 total[NR_PSI_STATES - 1];
+
+   /* Running pressure averages */
+   u64 avg_total[NR_PSI_STATES - 1];
+   u64 avg_last_update;
+   u64 avg_next_update;
unsigned long avg[NR_PSI_STATES - 1][3];
 };
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 2262d920295f..c366503ba135 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -172,9 +172,9 @@ static void group_init(struct psi_group *group)
 
for_each_possible_cpu(cpu)
seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
-   group->next_update = sched_clock() + psi_period;
+   group->avg_next_update = sched_clock() + psi_period;
INIT_DELAYED_WORK(&group->clock_work, psi_update_work);
-   mutex_init(&group->stat_lock);
+   mutex_init(&group->update_lock);
 }
 
 void __init psi_init(void)
@@ -277,7 +277,7 @@ static bool update_stats(struct psi_group *group)
int cpu;
int s;
 
-   mutex_lock(&group->stat_lock);
+   mutex_lock(&group->update_lock);
 
/*
 * Collect the per-cpu time buckets and average them into a
@@ -318,7 +318,7 @@ static bool update_stats(struct psi_group *group)
 
/* avgX= */
now = sched_clock();
-   expires = group->next_update;
+   expires = group->avg_next_update;
if (now < expires)
goto out;
if (now - expires > psi_period)
@@ -331,14 +331,14 @@ static bool update_stats(struct psi_group *group)
 * But the deltas we sample out of the per-cpu buckets above
 * are based on the actual time elapsing between clock ticks.
 */
-   group->next_update = expires + ((1 + missed_periods) * psi_period);
-   period = now - (group->last_update + (missed_periods * psi_period));
-   group->last_update = now;
+   group->avg_next_update = expires + ((1 + missed_periods) * psi_period);
+   period = now - (group->avg_last_update + (missed_periods * psi_period));
+   group->avg_last_update = now;
 
for (s = 0; s < NR_PSI_STATES - 1; s++) {
u32 sample;
 
-   sample = group->total[s] - group->total_prev[s];
+   sample = group->total[s] - group->avg_total[s];
/*
 * Due to the lockless sampling of the time buckets,
 * recorded time deltas can slip into the next period,
@@ -358,11 +358,11 @@ static bool update_stats(struct psi_group *group)
 */
if (sample > period)
sample = period;
-   group->total_prev[s] += sample;
+   group->avg_total[s] += sample;
calc_avgs(group->avg[s], missed_periods, sample, period);
}
 out:
-   mutex_unlock(&group->stat_lock);
+   mutex_unlock(&group->update_lock);
return nonidle_total;
 }
 
@@ -390,8 +390,10 @@ static void psi_update_work(struct work_struct *work)
u64 now;
 
now = sched_clock();
-   if (group->next_update > now)
-   delay = nsecs_to_jiffies(group->next_update - now) + 1;
+   if (group->avg_next_update > now) {
+   delay = nsecs_to_jiffies(
+   group->avg_next_update - now) + 1;
+   }
schedule_delayed_work(dwork, delay);
}
 }
-- 
2.20.1.321.g9e740568ce-goog



Re: [PATCH v8 00/26] Ingenic TCU patchset v8

2019-01-24 Thread Mathieu Malaterre
Paul,

On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil  wrote:
>
> Hi,
>
> Here's the version 8 and hopefully final version of my patchset, which
> adds support for the Timer/Counter Unit found in JZ47xx SoCs from
> Ingenic.

I can no longer boot my MIPS Creator CI20 with this series (merged
opendingux/for-upstream-timer-v8).

Using screen+ttyUSB, I can see messages stopping at:

...
[  OK  ] Started Cgroup management daemon.
 Starting Regular background program processing daemon...
[  OK  ] Started Regular background program processing daemon.
 Starting System Logging Service...
 Starting Provide limited super user privileges to specific users...
 Starting Restore /etc/resolv.conf if the system cras...s shut down
 Starting WPA supplicant...
 Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.

Nothing really stands out in the error messages. Could you suggest
things to try out to get into a bootable state ?


> The big change is that the timer driver has been simplified. The code to
> dynamically update the system timer or clocksource to a new channel has
> been removed. Now, the system timer and clocksource are provided as
> children nodes in the devicetree, and the TCU channel to use for these
> is deduced from their respective memory resource. The PWM driver will
> also deduce from its memory resources whether a given PWM channel can be
> used, or is reserved for the system timers.
>
> Kind regards,
> - Paul Cercueil
>


Re: [PATCH v8 00/26] Ingenic TCU patchset v8

2019-01-24 Thread Paul Cercueil

Hi Mathieu,

On Thu, Jan 24, 2019 at 18:26, Mathieu Malaterre wrote:
> Paul,
>
> On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil wrote:
> >
> > Hi,
> >
> > Here's the version 8 and hopefully final version of my patchset, which
> > adds support for the Timer/Counter Unit found in JZ47xx SoCs from
> > Ingenic.
>
> I can no longer boot my MIPS Creator CI20 with this series (merged
> opendingux/for-upstream-timer-v8).
>
> Using screen+ttyUSB, I can see messages stopping at:
>
> ...
> [  OK  ] Started Cgroup management daemon.
>  Starting Regular background program processing daemon...
> [  OK  ] Started Regular background program processing daemon.
>  Starting System Logging Service...
>  Starting Provide limited super user privileges to specific users...
>  Starting Restore /etc/resolv.conf if the system cras...s shut down
>  Starting WPA supplicant...
>  Starting D-Bus System Message Bus...
> [  OK  ] Started D-Bus System Message Bus.
>
> Nothing really stands out in the error messages. Could you suggest
> things to try out to get into a bootable state ?

I'm debugging it right now on jz4740; it seems to happen when the
clocksource from the ingenic-timer driver is used. Is that the case for
you? It should not happen if you have CONFIG_INGENIC_OST set.

> > The big change is that the timer driver has been simplified. The code to
> > dynamically update the system timer or clocksource to a new channel has
> > been removed. Now, the system timer and clocksource are provided as
> > children nodes in the devicetree, and the TCU channel to use for these
> > is deduced from their respective memory resource. The PWM driver will
> > also deduce from its memory resources whether a given PWM channel can be
> > used, or is reserved for the system timers.
> >
> > Kind regards,
> > - Paul Cercueil





[PATCH 1/2] sched/Documentation/kokr: Update Korean translation to update wake_up() & co. memory-barrier guarantees

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  7696f9910a9a ("sched/Documentation: Update wake_up() & co. memory-barrier 
guarantees")

Signed-off-by: SeongJae Park 
Reviewed-by: Yunjae Lee 
---
 .../translations/ko_KR/memory-barriers.txt | 43 +-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
b/Documentation/translations/ko_KR/memory-barriers.txt
index 7f01fb1..4a6cf4d 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -2146,33 +2146,40 @@ set_current_state() 는 다음의 것들로 감싸질 수도 있습니다:
event_indicated = 1;
wake_up_process(event_daemon);
 
-wake_up() 류에 의해 쓰기 메모리 배리어가 내포됩니다.  만약 그것들이 뭔가를
-깨운다면요.  이 배리어는 태스크 상태가 지워지기 전에 수행되므로, 이벤트를
-알리기 위한 STORE 와 태스크 상태를 TASK_RUNNING 으로 설정하는 STORE 사이에
-위치하게 됩니다.
+wake_up() 이 무언가를 깨우게 되면, 이 함수는 범용 메모리 배리어를 수행합니다.
+이 함수가 아무것도 깨우지 않는다면 메모리 배리어는 수행될 수도, 수행되지 않을
+수도 있습니다; 이 경우에 메모리 배리어를 수행할 거라 오해해선 안됩니다.  이
+배리어는 태스크 상태가 접근되기 전에 수행되는데, 자세히 말하면 이 이벤트를
+알리기 위한 STORE 와 TASK_RUNNING 으로 상태를 쓰는 STORE 사이에 수행됩니다:
 
-   CPU 1   CPU 2
+   CPU 1 (Sleeper) CPU 2 (Waker)
=== ===
set_current_state();STORE event_indicated
  smp_store_mb();   wake_up();
-   STORE current->state  <쓰기 배리어>
-   <범용 배리어>  STORE current->state
-   LOAD event_indicated
+   STORE current->state  ...
+   <범용 배리어>  <범용 배리어>
+   LOAD event_indicated  if ((LOAD task->state) & TASK_NORMAL)
+   STORE task->state
 
-한번더 말합니다만, 이 쓰기 메모리 배리어는 이 코드가 정말로 뭔가를 깨울 때에만
-실행됩니다.  이걸 설명하기 위해, X 와 Y 는 모두 0 으로 초기화 되어 있다는 가정
-하에 아래의 이벤트 시퀀스를 생각해 봅시다:
+여기서 "task" 는 깨어나지는 쓰레드이고 CPU 1 의 "current" 와 같습니다.
+
+반복하지만, wake_up() 이 무언가를 정말 깨운다면 범용 메모리 배리어가 수행될
+것이 보장되지만, 그렇지 않다면 그런 보장이 없습니다.  이걸 이해하기 위해, X 와
+Y 는 모두 0 으로 초기화 되어 있다는 가정 하에 아래의 이벤트 시퀀스를 생각해
+봅시다:
 
CPU 1   CPU 2
=== ===
-   X = 1;  STORE event_indicated
+   X = 1;  Y = 1;
smp_mb();   wake_up();
-   Y = 1;  wait_event(wq, Y == 1);
-   wake_up();load from Y sees 1, no memory barrier
-   load from X might see 0
+   LOAD Y  LOAD X
+
+정말로 깨우기가 행해졌다면, 두 로드 중 (최소한) 하나는 1 을 보게 됩니다.
+반면에, 실제 깨우기가 행해지지 않았다면, 두 로드 모두 0을 볼 수도 있습니다.
 
-위 예제에서의 경우와 달리 깨우기가 정말로 행해졌다면, CPU 2 의 X 로드는 1 을
-본다고 보장될 수 있을 겁니다.
+wake_up_process() 는 항상 범용 메모리 배리어를 수행합니다.  이 배리어 역시
+태스크 상태가 접근되기 전에 수행됩니다.  특히, 앞의 예제 코드에서 wake_up() 이
+wake_up_process() 로 대체된다면 두 로드 중 하나는 1을 볼 것이 보장됩니다.
 
 사용 가능한 깨우기류 함수들로 다음과 같은 것들이 있습니다:
 
@@ -2192,6 +2199,8 @@ wake_up() 류에 의해 쓰기 메모리 배리어가 내포됩니다.  만약 
wake_up_poll();
wake_up_process();
 
+메모리 순서규칙 관점에서, 이 함수들은 모두 wake_up() 과 같거나 보다 강한 순서
+보장을 제공합니다.
 
 [!] 잠재우는 코드와 깨우는 코드에 내포되는 메모리 배리어들은 깨우기 전에
 이루어진 스토어를 잠재우는 코드가 set_current_state() 를 호출한 후에 행하는
-- 
2.10.0

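The two-CPU example in the patch above (CPU 1 stores X, runs smp_mb() and loads Y; CPU 2 stores Y, calls wake_up() and loads X) has the shape of the classic store-buffering test. A loose userspace analogue using C11 atomics and POSIX threads is sketched below. It rests on one loud assumption: a sequentially consistent fence stands in for both smp_mb() and for a wake_up() that really wakes a task. This is not how the kernel implements either; it only illustrates why at least one of the two loads must then see 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int X, Y;
static int r_cpu1, r_cpu2;

/* "CPU 1": X = 1; smp_mb(); LOAD Y */
static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	r_cpu1 = atomic_load_explicit(&Y, memory_order_relaxed);
	return NULL;
}

/* "CPU 2": Y = 1; wake_up(); LOAD X */
static void *cpu2(void *arg)
{
	(void)arg;
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	/* models the full barrier of a wake_up() that wakes something */
	atomic_thread_fence(memory_order_seq_cst);
	r_cpu2 = atomic_load_explicit(&X, memory_order_relaxed);
	return NULL;
}

/* Run both sides once; returns nonzero if at least one load saw 1 */
static int run_once(void)
{
	pthread_t t1, t2;
	int err;

	atomic_store(&X, 0);
	atomic_store(&Y, 0);
	err = pthread_create(&t1, NULL, cpu1, NULL);
	assert(err == 0);
	err = pthread_create(&t2, NULL, cpu2, NULL);
	assert(err == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return r_cpu1 == 1 || r_cpu2 == 1;
}
```

Removing the fence in cpu2() corresponds to a wake_up() that finds nothing to wake: the guarantee disappears and both loads are then allowed to return 0, which is the case the updated text warns about.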


[PATCH 2/2] locking/memory-barriers/kokr: Update Korean translation to replace smp_cond_acquire() with smp_cond_load_acquire()

2019-01-24 Thread SeongJae Park
Translate this commit to Korean:

  2f359c7ea554 ("locking/memory-barriers: Replace smp_cond_acquire() with 
smp_cond_load_acquire()")

Signed-off-by: SeongJae Park 
Reviewed-by: Yunjae Lee 
---
 Documentation/translations/ko_KR/memory-barriers.txt | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
b/Documentation/translations/ko_KR/memory-barriers.txt
index 4a6cf4d..db0b9d86 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -493,10 +493,8 @@ CPU 에게 기대할 수 있는 최소한의 보장사항 몇가지가 있습니
  이 타입의 오퍼레이션은 단방향의 투과성 배리어처럼 동작합니다.  ACQUIRE
  오퍼레이션 뒤의 모든 메모리 오퍼레이션들이 ACQUIRE 오퍼레이션 후에
  일어난 것으로 시스템의 나머지 컴포넌트들에 보이게 될 것이 보장됩니다.
- LOCK 오퍼레이션과 smp_load_acquire(), smp_cond_acquire() 오퍼레이션도
- ACQUIRE 오퍼레이션에 포함됩니다.  smp_cond_acquire() 오퍼레이션은 컨트롤
- 의존성과 smp_rmb() 를 사용해서 ACQUIRE 의 의미적 요구사항(semantic)을
- 충족시킵니다.
+ LOCK 오퍼레이션과 smp_load_acquire(), smp_cond_load_acquire() 오퍼레이션도
+ ACQUIRE 오퍼레이션에 포함됩니다.
 
  ACQUIRE 오퍼레이션 앞의 메모리 오퍼레이션들은 ACQUIRE 오퍼레이션 완료 후에
  수행된 것처럼 보일 수 있습니다.
-- 
2.10.0



[PATCH 0/2] Korean translation of memory-barriers.txt update

2019-01-24 Thread SeongJae Park
This patchset updates the Korean translation of memory-barriers.txt to follow
the latest changes.  It has been reviewed by one of my Korean colleagues.

SeongJae Park (2):
  sched/Documentation/kokr: Update Korean translation to update
wake_up() & co. memory-barrier guarantees
  locking/memory-barriers/kokr: Update Korean translation to replace
smp_cond_acquire() with smp_cond_load_acquire()

 .../translations/ko_KR/memory-barriers.txt | 49 --
 1 file changed, 28 insertions(+), 21 deletions(-)

-- 
2.10.0



Re: [PATCH v8 05/26] clocksource: Add driver for the Ingenic JZ47xx OST

2019-01-24 Thread Stephen Boyd
Quoting Paul Cercueil (2019-01-24 12:46:28)
> 
> 
> On Thu, Jan 24, 2019 at 16:28, Stephen Boyd wrote:
> > Quoting Guenter Roeck (2019-01-23 10:01:55)
> >>  On Wed, Jan 23, 2019 at 02:25:53PM -0300, Paul Cercueil wrote:
> >>  > Hi,
> >>  >
> >>  > On Wed, Jan 23, 2019 at 11:31, Guenter Roeck wrote:
> >>  > >On 1/23/19 4:58 AM, Mathieu Malaterre wrote:
> >>  > >>On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil wrote:
> >>  > >>>
> >>  > >>>From: Maarten ter Huurne
> >>  > >>>
> >>  > >>>OST is the OS Timer, a 64-bit timer/counter with buffered reading.
> >>  > >>>
> >>  > >>>SoCs before the JZ4770 had (if any) a 32-bit OST; the JZ4770 and
> >>  > >>>JZ4780 have a 64-bit OST.
> >>  > >>>
> >>  > >>>This driver will register both a clocksource and a sched_clock to the
> >>  > >>>system.
> >>  > >>>
> >>  > >>>Signed-off-by: Maarten ter Huurne
> >>  > >>>Signed-off-by: Paul Cercueil
> >>  > >>>---
> >>  > >>>
> >>  > >>>Notes:
> >>  > >>>  v5: New patch
> >>  > >>>
> >>  > >>>  v6: - Get rid of SoC IDs; pass pointer to ingenic_ost_soc_info as
> >>  > >>>        devicetree match data instead.
> >>  > >>>      - Use device_get_match_data() instead of the of_* variant
> >>  > >>>      - Handle error of dev_get_regmap() properly
> >>  > >>>
> >>  > >>>  v7: Fix section mismatch by using builtin_platform_driver_probe()
> >>  > >>>
> >>  > >>>  v8: builtin_platform_driver_probe() does not work anymore in
> >>  > >>>      4.20-rc6? The probe function won't be called. Work around this
> >>  > >>>      for now by using late_initcall.
> >>  > >>>
> >>  > >
> >>  > >Did anyone notice this ? Either something is wrong with the driver, or
> >>  > >with the kernel core. Hacking around it seems like the worst possible
> >>  > >"solution".
> >>  >
> >>  > I can confirm it still happens on 5.0-rc3.
> >>  >
> >>  > Just to explain what I'm doing:
> >>  >
> >>  > My ingenic-timer driver probes with builtin_platform_driver_probe (this
> >>  > works),
> >>  > and then calls of_platform_populate to probe its children. This driver,
> >>  > ingenic-ost, is one of them, and will fail to probe with
> >>  > builtin_platform_driver_probe.
> >>  >
> >>
> >>  The big question is _why_ it fails to probe.
> >>
> >
> > Are you sharing the device tree node between a 'normal' platform device
> > driver and something more low level DT that marks the device's backing
> > DT node as OF_POPULATED early on? That's my only guess why it's not
> > working.
>
> I do, but I clear the OF_POPULATED flag so that it is then probed as a
> normal platform device, and it's not on this driver's node but its parent.
>

Where do you clear the OF_POPULATED flag?



Re: [PATCH v8 05/26] clocksource: Add driver for the Ingenic JZ47xx OST

2019-01-24 Thread Paul Cercueil




On Thu, Jan 24, 2019 at 19:46, Stephen Boyd wrote:
> Quoting Paul Cercueil (2019-01-24 12:46:28)
> >
> > On Thu, Jan 24, 2019 at 16:28, Stephen Boyd wrote:
> > > Quoting Guenter Roeck (2019-01-23 10:01:55)
> > >>  On Wed, Jan 23, 2019 at 02:25:53PM -0300, Paul Cercueil wrote:
> > >>  > Hi,
> > >>  >
> > >>  > On Wed, Jan 23, 2019 at 11:31, Guenter Roeck wrote:
> > >>  > >On 1/23/19 4:58 AM, Mathieu Malaterre wrote:
> > >>  > >>On Wed, Dec 12, 2018 at 11:09 PM Paul Cercueil wrote:
> > >>  > >>>
> > >>  > >>>From: Maarten ter Huurne
> > >>  > >>>
> > >>  > >>>OST is the OS Timer, a 64-bit timer/counter with buffered reading.
> > >>  > >>>
> > >>  > >>>SoCs before the JZ4770 had (if any) a 32-bit OST; the JZ4770 and
> > >>  > >>>JZ4780 have a 64-bit OST.
> > >>  > >>>
> > >>  > >>>This driver will register both a clocksource and a sched_clock to the
> > >>  > >>>system.
> > >>  > >>>
> > >>  > >>>Signed-off-by: Maarten ter Huurne
> > >>  > >>>Signed-off-by: Paul Cercueil
> > >>  > >>>---
> > >>  > >>>
> > >>  > >>>Notes:
> > >>  > >>>  v5: New patch
> > >>  > >>>
> > >>  > >>>  v6: - Get rid of SoC IDs; pass pointer to ingenic_ost_soc_info as
> > >>  > >>>        devicetree match data instead.
> > >>  > >>>      - Use device_get_match_data() instead of the of_* variant
> > >>  > >>>      - Handle error of dev_get_regmap() properly
> > >>  > >>>
> > >>  > >>>  v7: Fix section mismatch by using builtin_platform_driver_probe()
> > >>  > >>>
> > >>  > >>>  v8: builtin_platform_driver_probe() does not work anymore in
> > >>  > >>>      4.20-rc6? The probe function won't be called. Work around this
> > >>  > >>>      for now by using late_initcall.
> > >>  > >>>
> > >>  > >
> > >>  > >Did anyone notice this ? Either something is wrong with the driver, or
> > >>  > >with the kernel core. Hacking around it seems like the worst possible
> > >>  > >"solution".
> > >>  >
> > >>  > I can confirm it still happens on 5.0-rc3.
> > >>  >
> > >>  > Just to explain what I'm doing:
> > >>  >
> > >>  > My ingenic-timer driver probes with builtin_platform_driver_probe (this
> > >>  > works),
> > >>  > and then calls of_platform_populate to probe its children. This driver,
> > >>  > ingenic-ost, is one of them, and will fail to probe with
> > >>  > builtin_platform_driver_probe.
> > >>  >
> > >>
> > >>  The big question is _why_ it fails to probe.
> > >>
> > >
> > > Are you sharing the device tree node between a 'normal' platform device
> > > driver and something more low level DT that marks the device's backing
> > > DT node as OF_POPULATED early on? That's my only guess why it's not
> > > working.
> >
> > I do, but I clear the OF_POPULATED flag so that it is then probed as a
> > normal platform device, and it's not on this driver's node but its parent.
> >
>
> Where do you clear the OF_POPULATED flag?



In the ingenic-timer driver introduced in patch [04/26], inside the 
probe function.