Re: [PATCH v2] net/mlx5e: Use refcount_t for refcount
On Mon, Aug 05, 2019 at 08:06:36PM +, Saeed Mahameed wrote:
> On Mon, 2019-08-05 at 14:55 +0800, Chuhong Yuan wrote:
> > On Mon, Aug 5, 2019 at 2:13 PM Leon Romanovsky wrote:
> > > On Sun, Aug 04, 2019 at 10:44:47PM +0800, Chuhong Yuan wrote:
> > > > On Sun, Aug 4, 2019 at 8:59 PM Leon Romanovsky wrote:
> > > > > On Sat, Aug 03, 2019 at 12:48:28AM +0800, Chuhong Yuan wrote:
> > > > > > refcount_t is better for reference counters since its
> > > > > > implementation can prevent overflows.
> > > > > > So convert atomic_t ref counters to refcount_t.
> > > > >
> > > > > I'm not thrilled to see those automatic conversion patches, especially
> > > > > for flows which can't overflow. There is nothing wrong in using atomic_t
> > > > > type of variable, do you have in mind flow which will cause to overflow?
> > > > >
> > > > > Thanks
> > > >
> > > > I have to say that these patches are not done automatically...
> > > > Only the detection of problems is done by a script.
> > > > All conversions are done manually.
> > >
> > > Even worse, you need to audit usage of atomic_t and replace there
> > > it can overflow.
> > >
> > > > I am not sure whether the flow can cause an overflow.
> > >
> > > It can't.
> > >
> > > > But I think it is hard to ensure that a data path is impossible
> > > > to have problems in any cases including being attacked.
> > >
> > > It is not data path, and I doubt that such conversion will be allowed
> > > in data paths without proving that no performance regression is
> > > introduced.
> > >
> > > > So I think it is better to do this minor revision to prevent
> > > > potential risk, just like we have done in mlx5/core/cq.c.
> > >
> > > mlx5/core/cq.c is a different beast, refcount there means actual users
> > > of CQ which are limited in SW, so in theory, they have potential
> > > to be overflown.
> > >
> > > It is not the case here, there your are adding new port.
> > > There is nothing wrong with atomic_t.
> > >
> >
> > Thanks for your explanation!
> > I will pay attention to this point in similar cases.
> > But it seems that the semantic of refcount is not always as clear as
> > here...
> >
>
> Semantically speaking, there is nothing wrong with moving to refcount_t
> in the case of vxlan ports.. it also seems more accurate and will
> provide the type protection, even if it is not necessary. Please let me
> know what is the verdict here, i can apply this patch to net-next-mlx5.

There is no verdict here; it is up to you. If you like code churn, go for it.

Thanks

>
> Thanks,
> Saeed.
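For readers weighing the two types: the practical difference is what happens at the limits. A plain atomic counter wraps around, while refcount_t saturates instead of wrapping back to zero, so a counting bug leaks the object rather than freeing it while it is still in use. The following is a minimal userspace model using C11 atomics; the names and the UINT_MAX saturation point are assumptions of this sketch, not the kernel's refcount_t implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturation point for this model only. */
#define MODEL_REF_SATURATED UINT_MAX

struct model_ref {
	atomic_uint count;
};

/* Increment, but stick at MODEL_REF_SATURATED instead of wrapping to 0. */
static void model_ref_inc(struct model_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == MODEL_REF_SATURATED)
			return; /* saturated: leak the object rather than free it early */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
}

/* Returns true only when the caller dropped the last reference. */
static bool model_ref_dec_and_test(struct model_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == MODEL_REF_SATURATED || old == 0)
			return false; /* saturated or underflow: never report "free" */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));

	return old == 1;
}
```

If the counter is bounded by something small, such as the number of VXLAN ports discussed here, the saturation path is never reached, so the conversion mostly documents intent rather than changing behavior.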
[PATCH] ALSA: usb-audio: fix a memory leak bug
In snd_usb_get_audioformat_uac3(), a structure for channel maps 'chmap' is
allocated through kzalloc() before the execution goto 'found_clock'.
However, this structure is not deallocated if the memory allocation for
'pd' fails, leading to a memory leak bug.

To fix the above issue, free 'fp->chmap' before returning NULL.

Signed-off-by: Wenwen Wang
---
 sound/usb/stream.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/usb/stream.c b/sound/usb/stream.c
index 7ee9d17..e852c7f 100644
--- a/sound/usb/stream.c
+++ b/sound/usb/stream.c
@@ -1043,6 +1043,7 @@ snd_usb_get_audioformat_uac3(struct snd_usb_audio *chip,
 	pd = kzalloc(sizeof(*pd), GFP_KERNEL);
 	if (!pd) {
+		kfree(fp->chmap);
 		kfree(fp->rate_table);
 		kfree(fp);
 		return NULL;
-- 
2.7.4
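The leak happens because each early return has to remember every earlier allocation. A common kernel idiom that makes such omissions harder is goto-based unwinding, where each failure jumps to a label that frees everything allocated so far, in reverse order. The sketch below is a userspace model of that idiom; the struct, the allocation order and the failure-injecting allocator are hypothetical and are not how the patch above fixes stream.c.

```c
#include <stdlib.h>

/* Hypothetical allocator wrapper: fails the Nth call so error paths can be exercised. */
static int fail_at = -1;   /* -1: never fail */
static int alloc_calls;
static int live_allocs;    /* number of blocks currently allocated */

static void *model_zalloc(size_t size)
{
	if (alloc_calls++ == fail_at)
		return NULL;
	void *p = calloc(1, size);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical stand-ins for the audioformat, its chmap, rate table and pd. */
struct model_fmt {
	int *chmap;
	int *rate_table;
	int *pd;
};

static struct model_fmt *model_get_format(void)
{
	struct model_fmt *fp = model_zalloc(sizeof(*fp));
	if (!fp)
		return NULL;

	fp->chmap = model_zalloc(sizeof(*fp->chmap));
	if (!fp->chmap)
		goto err_fp;
	fp->rate_table = model_zalloc(sizeof(*fp->rate_table));
	if (!fp->rate_table)
		goto err_chmap;
	fp->pd = model_zalloc(sizeof(*fp->pd));
	if (!fp->pd)
		goto err_rates;   /* the path that leaked chmap in the report above */
	return fp;

	/* Unwind in reverse order of allocation. */
err_rates:
	model_free(fp->rate_table);
err_chmap:
	model_free(fp->chmap);
err_fp:
	model_free(fp);
	return NULL;
}
```

Injecting a failure at each allocation in turn and checking that live_allocs returns to zero is a cheap way to exercise every unwind path.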
Unrelated question and threading (was: Bisected: Kernel 4.14 + has 3 times higher write IO latency than Kernel 4.4 with raid1)
Dear Rick,

It looks like your message is unrelated to the thread at hand. Therefore,
please start a new thread: do *not* use the reply feature, but create a
new message in your mail program (MUA). Please read up on mailing list
etiquette on the Web, for example [1].

Kind regards,

Paul

[1]: https://wiki.openstack.org/wiki/MailingListEtiquette
Re: [PATCH 16/16] dt-bindings: net: add bindings for ADIN PHY driver
On Mon, 2019-08-05 at 16:11 +0200, Andrew Lunn wrote:
> [External]
>
> > +  adi,rx-internal-delay:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      RGMII RX Clock Delay used only when PHY operates in RGMII mode (phy-mode
> > +      is "rgmii-id", "rgmii-rxid", "rgmii-txid") see `dt-bindings/net/adin.h`
> > +      default value is 0 (which represents 2 ns)
> > +    enum: [ 0, 1, 2, 6, 7 ]
>
> We want these numbers to be in ns. So the default value would actually
> be 2. The driver needs to convert the number in DT to a value to poke
> into a PHY register. Please rename the property adi,rx-internal-delay-ns.

Ack. Also, good point about ns units and having the PHY driver convert
them to register values.

> > +
> > +  adi,tx-internal-delay:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      RGMII TX Clock Delay used only when PHY operates in RGMII mode (phy-mode
> > +      is "rgmii-id", "rgmii-rxid", "rgmii-txid") see `dt-bindings/net/adin.h`
> > +      default value is 0 (which represents 2 ns)
> > +    enum: [ 0, 1, 2, 6, 7 ]
>
> Same here.
>
> > +
> > +  adi,fifo-depth:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      When operating in RMII mode, this option configures the FIFO depth.
> > +      See `dt-bindings/net/adin.h`.
> > +    enum: [ 0, 1, 2, 3, 4, 5 ]
>
> Units? You should probably rename this adi,fifo-depth-bits and list
> the valid values in bits.

The units are bits; will adapt this.

> > +
> > +  adi,eee-enabled:
> > +    description: |
> > +      Advertise EEE capabilities on power-up/init (default disabled)
> > +    type: boolean
>
> It is not the PHY which decides this. The MAC indicates if it is EEE
> capable to phylib. phylib looks into the PHY registers to determine if
> the PHY supports EEE. phylib will then enable EEE
> advertisement. Please remove this, and ensure EEE is disabled by
> default.

Ack; will remove.

>
> Andrew
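With the delay expressed in time units in DT, the driver has to map the requested delay onto a register code and reject values the PHY cannot produce. A minimal lookup-table sketch follows. It works in picoseconds so that fractional-nanosecond steps could be represented, which is an assumption of this sketch. Only the "code 0 = 2 ns" entry comes from the binding text above; the other table entries are placeholders, not ADIN datasheet values, and all names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from a delay in picoseconds to a register code. */
struct delay_map {
	uint32_t delay_ps;
	uint8_t reg_code;
};

static const struct delay_map example_rgmii_delays[] = {
	{ 2000, 0 },   /* from the binding text: code 0 represents 2 ns */
	{ 2200, 1 },   /* placeholder, not from the datasheet */
	{ 2400, 2 },   /* placeholder, not from the datasheet */
};

/* Translate a DT-provided delay into a register code; reject unsupported delays. */
static int delay_to_reg(const struct delay_map *map, size_t n,
			uint32_t delay_ps, uint8_t *code)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].delay_ps == delay_ps) {
			*code = map[i].reg_code;
			return 0;
		}
	}
	return -EINVAL;
}
```

Keeping the table in the driver means the binding only describes the physical quantity, and a later PHY revision with different register encodings needs no DT change.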
Re: [RFC PATCH v3 00/16] Core scheduling v3
On 2019/8/6 14:56, Aubrey Li wrote:
> On Tue, Aug 6, 2019 at 11:24 AM Aaron Lu wrote:
>> I've been thinking if we should consider core wide tenent fairness?
>>
>> Let's say there are 3 tasks on 2 threads' rq of the same core, 2 tasks
>> (e.g. A1, A2) belong to tenent A and the 3rd B1 belong to another tenent
>> B. Assume A1 and B1 are queued on the same thread and A2 on the other
>> thread, when we decide priority for A1 and B1, shall we also consider
>> A2's vruntime? i.e. shall we consider A1 and A2 as a whole since they
>> belong to the same tenent? I tend to think we should make fairness per
>> core per tenent, instead of per thread(cpu) per task(sched entity). What
>> do you guys think?
>>
>
> I also think a way to make fairness per cookie per core, is this what you
> want to propose?

Yes, that's what I meant.
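To make the A1/A2/B1 example concrete, here is a toy model of "fairness per cookie per core": instead of comparing A1 and B1 directly, compare the total runtime each tenant has accumulated across both sibling threads. The struct, the choice of summed runtime as the tenant-level metric, and the function names are all hypothetical; this illustrates the idea under discussion, not how core scheduling is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record: tenant cookie and runtime on this core. */
struct task_model {
	unsigned long cookie;
	uint64_t runtime_ns;
};

/* Sum the runtime of every task on the core that belongs to @cookie. */
static uint64_t tenant_runtime(const struct task_model *tasks, size_t n,
			       unsigned long cookie)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].cookie == cookie)
			sum += tasks[i].runtime_ns;
	return sum;
}

/* Pick between two candidates by comparing their tenants' core-wide totals. */
static const struct task_model *
pick_core_fair(const struct task_model *tasks, size_t n,
	       const struct task_model *a, const struct task_model *b)
{
	uint64_t ra = tenant_runtime(tasks, n, a->cookie);
	uint64_t rb = tenant_runtime(tasks, n, b->cookie);

	return ra <= rb ? a : b;
}
```

With A1 and A2 at 10 units each and B1 at 15, a per-task comparison favors A1 (10 < 15), whereas the per-tenant comparison favors B1 (A's 20 > B's 15), which is the behavioral difference the proposal is about.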
Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop
On Mon 05-08-19 20:28:40, Yang Shi wrote:
> On Mon, Aug 5, 2019 at 7:32 AM Michal Hocko wrote:
> >
> > On Fri 02-08-19 11:56:28, Yang Shi wrote:
> > > On Fri, Aug 2, 2019 at 2:35 AM Michal Hocko wrote:
> > > >
> > > > On Thu 01-08-19 14:00:51, Yang Shi wrote:
> > > > > On Mon, Jul 29, 2019 at 11:48 AM Michal Hocko wrote:
> > > > > >
> > > > > > On Mon 29-07-19 10:28:43, Yang Shi wrote:
> > > > > > [...]
> > > > > > > I don't worry too much about scale since the scale issue is not unique
> > > > > > > to background reclaim, direct reclaim may run into the same problem.
> > > > > >
> > > > > > Just to clarify. By scaling problem I mean 1:1 kswapd thread to memcg.
> > > > > > You can have thousands of memcgs and I do not think we really do want
> > > > > > to create one kswapd for each. Once we have a kswapd thread pool then we
> > > > > > get into a tricky land where a determinism/fairness would be non trivial
> > > > > > to achieve. Direct reclaim, on the other hand is bound by the workload
> > > > > > itself.
> > > > >
> > > > > Yes, I agree thread pool would introduce more latency than dedicated
> > > > > kswapd thread. But, it looks not that bad in our test. When memory
> > > > > allocation is fast, even though dedicated kswapd thread can't catch
> > > > > up. So, such background reclaim is best effort, not guaranteed.
> > > > >
> > > > > I don't quite get what you mean about fairness. Do you mean they may
> > > > > spend excessive cpu time then cause other processes starvation? I
> > > > > think this could be mitigated by properly organizing and setting
> > > > > groups. But, I agree this is tricky.
> > > >
> > > > No, I meant that the cost of reclaiming a unit of charges (e.g.
> > > > SWAP_CLUSTER_MAX) is not constant and depends on the state of the memory
> > > > on LRUs. Therefore any thread pool mechanism would lead to unfair
> > > > reclaim and non-deterministic behavior.
> > >
> > > Yes, the cost depends on the state of pages, but I still don't quite
> > > understand what does "unfair" refer to in this context. Do you mean
> > > some cgroups may reclaim much more than others?
> >
> > > Or the work may take too long so it can't not serve other cgroups in time?
> >
> > exactly.
>
> Actually, I'm not very concerned by this. In our design each memcg has
> its dedicated work (memcg->wmark_work), so the reclaim work for
> different memcgs could be run in parallel since they are *different*
> work in fact although they run the same function. And, We could queue
> them to a dedicated unbound workqueue which may have maximum 512 or
> scale with nr cpus active works. Although the system may have
> thousands of online memcgs, I'm supposed it should be rare to have all
> of them trigger reclaim at the same time.

I do believe that it might work for your particular use case, but I do
not think this is robust enough for the upstream kernel, I am afraid.

As I've said, I am open to discussing an opt-in per-memcg proactive
reclaim (a kernel thread that belongs to the memcg), but it has to be a
dedicated worker bound by all the cgroup resource restrictions.
-- 
Michal Hocko
SUSE Labs
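The design Yang describes (one work item per memcg, queued on a shared unbound workqueue with a max_active limit) can be modeled in a few lines. This is a hypothetical userspace model, not kernel workqueue code. It mirrors the fact that queue_work() returns false when the work item is already pending, and adds a fixed-size queue that exists only in the model. It shows where Michal's concern comes from: once more memcgs are pending than max_active allows, which memcg waits depends on queueing order and on how long the earlier reclaim items take.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_CAP 16

/* Hypothetical memcg with its own dedicated work item (cf. memcg->wmark_work). */
struct memcg_model {
	int id;
	bool work_pending;
	int reclaim_runs;
};

/* Hypothetical unbound workqueue with a max_active limit. */
struct wq_model {
	struct memcg_model *items[MODEL_QUEUE_CAP];
	size_t len;
	size_t max_active;
};

/* Returns false if already pending (as queue_work() does) or, in this model only, if full. */
static bool model_queue_work(struct wq_model *wq, struct memcg_model *m)
{
	if (m->work_pending || wq->len == MODEL_QUEUE_CAP)
		return false;
	m->work_pending = true;
	wq->items[wq->len++] = m;
	return true;
}

/* Run up to max_active queued items in FIFO order; returns how many ran. */
static size_t model_run_batch(struct wq_model *wq)
{
	size_t n = wq->len < wq->max_active ? wq->len : wq->max_active;

	for (size_t i = 0; i < n; i++) {
		wq->items[i]->work_pending = false;
		wq->items[i]->reclaim_runs++;   /* stand-in for the reclaim work */
	}
	/* Move the items that did not get to run to the front of the queue. */
	for (size_t i = n; i < wq->len; i++)
		wq->items[i - n] = wq->items[i];
	wq->len -= n;
	return n;
}
```

In the model every item costs the same; in the real kernel the cost of each reclaim run varies with LRU state, which is exactly what makes the waiting time of the deferred memcgs hard to predict.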
Re: [PATCH] scsi/megaraid_sas: fix a compilation warning
On Fri, Jul 26, 2019 at 7:55 PM Qian Cai wrote:
>
> The commit de516379e85f ("scsi: megaraid_sas: changes to function
> prototypes") introduced a comilation warning due to it changed the
> function prototype of read_fw_status_reg() to take an instance pointer
> instead, but forgot to remove an unused variable.
>
> drivers/scsi/megaraid/megaraid_sas_fusion.c: In function
> 'megasas_fusion_update_can_queue':
> drivers/scsi/megaraid/megaraid_sas_fusion.c:326:39: warning: variable
> 'reg_set' set but not used [-Wunused-but-set-variable]
>   struct megasas_register_set __iomem *reg_set;
>                                        ^~~
>
> Fixes: de516379e85f ("scsi: megaraid_sas: changes to function prototypes")
> Signed-off-by: Qian Cai

Acked-by: Sumit Saxena

> ---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index a32b3f0fcd15..e8092d59d575 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -323,9 +323,6 @@ inline void megasas_return_cmd_fusion(struct megasas_instance *instance,
>  {
>  	u16 cur_max_fw_cmds = 0;
>  	u16 ldio_threshold = 0;
> -	struct megasas_register_set __iomem *reg_set;
> -
> -	reg_set = instance->reg_set;
>
>  	/* ventura FW does not fill outbound_scratch_pad_2 with queue depth */
>  	if (instance->adapter_type < VENTURA_SERIES)
> --
> 1.8.3.1
>
Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop
On Fri 02-08-19 13:44:38, Michal Hocko wrote:
[...]
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index ba9138a4a1de..53a35c526e43 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2429,8 +2429,12 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > >  				schedule_work(&memcg->high_work);
> > >  				break;
> > >  			}
> > > -			current->memcg_nr_pages_over_high += batch;
> > > -			set_notify_resume(current);
> > > +			if (gfpflags_allow_blocking(gfp_mask)) {
> > > +				reclaim_high(memcg, nr_pages, GFP_KERNEL);
> >
> > ups, this should be s@GFP_KERNEL@gfp_mask@
> >
> > > +			} else {
> > > +				current->memcg_nr_pages_over_high += batch;
> > > +				set_notify_resume(current);
> > > +			}
> > >  			break;
> > >  		}
> > >  	} while ((memcg = parent_mem_cgroup(memcg)));
> > >

Should I send an official patch for this?
-- 
Michal Hocko
SUSE Labs
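The proposed change boils down to one decision: if the allocation context may sleep, reclaim the excess synchronously with the caller's gfp_mask; otherwise keep the existing behavior of recording the overage and reclaiming on return to userspace. The kernel's gfpflags_allow_blocking() tests __GFP_DIRECT_RECLAIM. The model below is a hypothetical userspace sketch; the flag value, struct and function names are illustrative, and returning nr_pages merely stands in for the reclaim_high() call.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real __GFP_DIRECT_RECLAIM value differs. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u

struct task_model {
	unsigned int nr_pages_over_high;  /* cf. current->memcg_nr_pages_over_high */
	bool notify_resume;               /* cf. set_notify_resume(current) */
};

/* Models gfpflags_allow_blocking(): true if direct reclaim is allowed. */
static bool model_allow_blocking(unsigned int gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* Returns pages reclaimed synchronously; otherwise defers to return-to-user. */
static unsigned int model_over_high(struct task_model *t, unsigned int gfp,
				    unsigned int nr_pages, unsigned int batch)
{
	if (model_allow_blocking(gfp))
		return nr_pages;   /* stand-in for reclaim_high(memcg, nr_pages, gfp_mask) */

	t->nr_pages_over_high += batch;
	t->notify_resume = true;
	return 0;
}
```

Passing the caller's gfp_mask rather than GFP_KERNEL, as the correction in the quote says, keeps the synchronous reclaim within whatever restrictions (for example no I/O or no filesystem recursion) the caller's context imposes.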
[PATCH] media: rc: add include guard to rc-map.h
Add a header include guard just in case.

Signed-off-by: Masahiro Yamada
---
 include/media/rc-map.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/media/rc-map.h b/include/media/rc-map.h
index bebd3c4c6338..4e0873f6e853 100644
--- a/include/media/rc-map.h
+++ b/include/media/rc-map.h
@@ -5,6 +5,9 @@
  * Copyright (c) 2010 by Mauro Carvalho Chehab
  */
 
+#ifndef _MEDIA_RC_MAP_H
+#define _MEDIA_RC_MAP_H
+
 #include
 #include
@@ -290,3 +293,5 @@ struct rc_map *rc_map_get(const char *name);
  * Please, do not just append newer Remote Controller names at the end.
  * The names should be ordered in alphabetical order
  */
+
+#endif /* _MEDIA_RC_MAP_H */
-- 
2.17.1
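What the guard buys is that a second #include of the same header in one translation unit becomes a no-op instead of a redefinition error. The snippet below simulates two inclusions of a hypothetical guarded header inline in one file; without the #ifndef/#define pair, the second struct definition would fail to compile.

```c
/* First "inclusion" of a hypothetical guarded header. */
#ifndef EXAMPLE_RC_MAP_H
#define EXAMPLE_RC_MAP_H
struct example_rc_map {
	unsigned int size;
};
#endif /* EXAMPLE_RC_MAP_H */

/*
 * Second "inclusion": the guard macro is already defined, so the body is
 * skipped and the struct is not redefined (which would be a compile error).
 */
#ifndef EXAMPLE_RC_MAP_H
#define EXAMPLE_RC_MAP_H
struct example_rc_map {
	unsigned int size;
};
#endif /* EXAMPLE_RC_MAP_H */
```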
Re: XFS segementation fault with new linux 4.19.63
[adding the linux-xfs list]

On Wed, Jul 31, 2019 at 03:01:33PM +0200, Kinky Nekoboi wrote:
> I am not subscribed, so if you want to contact me do direct email.
>
> kern output:
>
>
> Jul 31 13:51:53 lain kernel: [ 71.660736] XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
> Jul 31 13:51:53 lain kernel: [ 71.681711] [ cut here ]
> Jul 31 13:51:53 lain kernel: [ 71.686416] kernel BUG at fs/xfs/xfs_message.c:102!
> Jul 31 13:51:53 lain kernel: [ 71.691431] invalid opcode: [#1] SMP NOPTI
> Jul 31 13:51:53 lain kernel: [ 71.696047] CPU: 2 PID: 1322 Comm: mount Not tainted 4.19.63-custom #1
> Jul 31 13:51:53 lain kernel: [ 71.702730] Hardware name: ASUS KGPE-D16/KGPE-D16, BIOS 4.10-108-gc19161538c 07/29/2019
> Jul 31 13:51:53 lain kernel: [ 71.711028] RIP: 0010:assfail+0x25/0x36 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.715475] Code: d4 e8 0f 0b c3 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 80 62 fb c0 48 89 fa 31 ff e8 72 f9 ff ff 80 3d 2e cb 08 00 00 74 02 <0f> 0b 48 c7 c7 b0 62 fb c0 e8 74 11 d4 e8 0f 0b c3 48 8b b3 a8 01
> Jul 31 13:51:53 lain kernel: [ 71.734532] RSP: 0018:b3a584117cb8 EFLAGS: 00010202
> Jul 31 13:51:53 lain kernel: [ 71.739849] RAX: RBX: a0259fc22a00 RCX:
> Jul 31 13:51:53 lain kernel: [ 71.747135] RDX: ffc0 RSI: 000a RDI: c0fa971b
> Jul 31 13:51:53 lain kernel: [ 71.754407] RBP: R08: R09:
> Jul 31 13:51:53 lain kernel: [ 71.761633] R10: 000a R11: f000 R12: a0259c157000
> Jul 31 13:51:53 lain kernel: [ 71.768861] R13: 0008 R14: a0259c157000 R15:
> Jul 31 13:51:53 lain kernel: [ 71.776110] FS: 7f169d61f100() GS:a025a7c8() knlGS:
> Jul 31 13:51:53 lain kernel: [ 71.784330] CS: 0010 DS: ES: CR0: 80050033
> Jul 31 13:51:53 lain kernel: [ 71.790198] CR2: 7fa8c52fc441 CR3: 00042453c000 CR4: 000406e0
> Jul 31 13:51:53 lain kernel: [ 71.797446] Call Trace:
> Jul 31 13:51:53 lain kernel: [ 71.800040]  xfs_ag_resv_init+0x1bd/0x1d0 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.804717]  xfs_fs_reserve_ag_blocks+0x3e/0xb0 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.809937]  xfs_mountfs+0x5b3/0x920 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.814212]  xfs_fs_fill_super+0x44d/0x620 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.818997]  ? xfs_test_remount_options+0x60/0x60 [xfs]
> Jul 31 13:51:53 lain kernel: [ 71.824320]  mount_bdev+0x177/0x1b0
> Jul 31 13:51:53 lain kernel: [ 71.827868]  mount_fs+0x3e/0x145
> Jul 31 13:51:53 lain kernel: [ 71.831178]  vfs_kern_mount.part.35+0x54/0x120
> Jul 31 13:51:53 lain kernel: [ 71.835728]  do_mount+0x20e/0xcc0
> Jul 31 13:51:53 lain kernel: [ 71.839098]  ? _copy_from_user+0x37/0x60
> Jul 31 13:51:53 lain kernel: [ 71.843097]  ? memdup_user+0x4b/0x70
> Jul 31 13:51:53 lain kernel: [ 71.846751]  ksys_mount+0xb6/0xd0
> Jul 31 13:51:53 lain kernel: [ 71.850137]  __x64_sys_mount+0x21/0x30
> Jul 31 13:51:53 lain kernel: [ 71.853983]  do_syscall_64+0x55/0xf0
> Jul 31 13:51:53 lain kernel: [ 71.857621]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Jul 31 13:51:53 lain kernel: [ 71.862727] RIP: 0033:0x7f169d2b1fea
> Jul 31 13:51:53 lain kernel: [ 71.866362] Code: 48 8b 0d a9 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48
> Jul 31 13:51:53 lain kernel: [ 71.885417] RSP: 002b:7ffe6c4dd048 EFLAGS: 0246 ORIG_RAX: 00a5
> Jul 31 13:51:53 lain kernel: [ 71.895075] RAX: ffda RBX: 55e2f3093a40 RCX: 7f169d2b1fea
> Jul 31 13:51:53 lain kernel: [ 71.904309] RDX: 55e2f309b220 RSI: 55e2f3093c70 RDI: 55e2f3093c50
> Jul 31 13:51:53 lain kernel: [ 71.913532] RBP: 7f169d6061c4 R08: R09: 7f169d2f3400
> Jul 31 13:51:53 lain kernel: [ 71.922730] R10: R11: 0246 R12:
> Jul 31 13:51:53 lain kernel: [ 71.931950] R13: R14: 55e2f3093c50 R15: 55e2f309b220
> Jul 31 13:51:53 lain kernel: [ 71.941107] Modules linked in: dm_crypt twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common xts algif_skcipher af_alg dm_mod tun devlink cpufreq_userspace cpufreq_powersave cpufreq_conservative binfmt_misc xfs amd64_edac_mod edac_mce_amd kvm_amd ccp rng_core kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_intel ast snd_hda_codec ttm snd_hda_core evdev snd_pcsp snd_hwdep drm_
[PATCH v2] checkpatch: exclude sizeof sub-expressions from MACRO_ARG_REUSE
The arguments of sizeof are not evaluated so arguments are safe to re-use
in that context. Excluding sizeof sub-expressions means macros like
ARRAY_SIZE can pass checkpatch.

Cc: Andy Whitcroft
Cc: Joe Perches
Signed-off-by: Brendan Jackman
---
v2 is the same patch, I just forgot to add CCs to the original.
---
 scripts/checkpatch.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 93a7edfe0f05..907a8e8d80ae 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5191,7 +5191,7 @@ sub process {
 			next if ($arg =~ /\.\.\./);
 			next if ($arg =~ /^type$/i);
 			my $tmp_stmt = $define_stmt;
-			$tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*\)*\b//g;
+			$tmp_stmt =~ s/\b(sizeof|typeof|__typeof__|__builtin\w+|typecheck\s*\(\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*\)*\b//g;
 			$tmp_stmt =~ s/\#+\s*$arg\b//g;
 			$tmp_stmt =~ s/\b$arg\s*\#\#//g;
 			my $use_cnt = () = $tmp_stmt =~ /\b$arg\b/g;
-- 
2.17.1
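The premise of the patch can be checked directly: an expression used as the operand of sizeof is not evaluated, so a function call inside it never runs, and a macro such as ARRAY_SIZE that names its argument twice, both times under sizeof, has no double-evaluation hazard. One caveat from the C standard: if the operand has variable-length array type, it is evaluated. The helper names below are hypothetical; the ARRAY_SIZE definition is the classic form, not the kernel's (which additionally rejects pointers at compile time).

```c
#include <stddef.h>

/* Classic form; the kernel's ARRAY_SIZE also adds a must-be-array check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static int side_effects;

/* Hypothetical helper with a visible side effect. */
static int *next_array(void)
{
	static int storage[4];

	side_effects++;   /* would be observable if the call were evaluated */
	return storage;
}

static size_t element_size(void)
{
	/* The operand of sizeof is not evaluated: next_array() is never called. */
	return sizeof(*next_array());
}
```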
Re: XFS segementation fault with new linux 4.19.63
Additional info: this only occurs if the kernel is compiled with
CONFIG_XFS_DEBUG=y. Running 4.19.64 without XFS debugging works fine.

On 06.08.19 at 09:08, Christoph Hellwig wrote:
> [adding the linux-xfs list]
>
> On Wed, Jul 31, 2019 at 03:01:33PM +0200, Kinky Nekoboi wrote:
>> I am not subscribed, so if you want to contact me do direct email.
>>
>> kern output:
>>
>> [...]
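The reason the crash depends on CONFIG_XFS_DEBUG is that XFS's ASSERT() is only checked in debug builds; in a debug build a failed assertion ends in BUG(), as the log shows, while a non-debug build compiles the check away. That suggests a non-debug kernel may still be running with the reservation invariant violated, just without noticing. A minimal model of the pattern follows, with hypothetical macro and function names and a failure counter in place of BUG().

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_XFS_DEBUG; remove to model a non-debug build. */
#define MODEL_XFS_DEBUG 1

static int assert_failures;

static void model_assfail(const char *expr, const char *file, int line)
{
	fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n", expr, file, line);
	assert_failures++;   /* a real debug build would BUG() here */
}

#ifdef MODEL_XFS_DEBUG
#define MODEL_ASSERT(expr) ((expr) ? (void)0 : model_assfail(#expr, __FILE__, __LINE__))
#else
#define MODEL_ASSERT(expr) ((void)0)   /* non-debug: the check disappears */
#endif

/* The invariant from the log: reservations must fit in free + freelist blocks. */
static void model_check_ag(unsigned long meta_resv, unsigned long rmap_resv,
			   unsigned long freeblks, unsigned long flcount)
{
	MODEL_ASSERT(meta_resv + rmap_resv <= freeblks + flcount);
}
```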
Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig wrote:
>
> On Tue, Jul 30, 2019 at 08:15:44PM +0800, guo...@kernel.org wrote:
> > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > index 80783bb..3f1ff9d 100644
> > --- a/arch/csky/mm/dma-mapping.c
> > +++ b/arch/csky/mm/dma-mapping.c
> > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> >  {
> >  	return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> >  }
> > -postcore_initcall(atomic_pool_init);
>
> Please keep the postcore_initcall next to the function it calls.

OK, I'll change arch_initcall back to postcore_initcall. :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
Re: [PATCH -next] scsi: megaraid_sas: Make a bunch of functions static
On Fri, Jul 26, 2019 at 7:26 PM YueHaibing wrote:
>
> Fix sparse warnings:
>
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3369:1: warning: symbol 'complete_cmd_fusion' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3535:6: warning: symbol 'megasas_sync_irqs' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3554:1: warning: symbol 'megasas_complete_cmd_dpc_fusion' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3573:13: warning: symbol 'megasas_isr_fusion' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3604:1: warning: symbol 'build_mpt_mfi_pass_thru' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3661:40: warning: symbol 'build_mpt_cmd' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3688:1: warning: symbol 'megasas_issue_dcmd_fusion' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:3881:5: warning: symbol 'megasas_wait_for_outstanding_fusion' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:4005:6: warning: symbol 'megasas_refire_mgmt_cmd' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:4525:25: warning: symbol 'megasas_get_peer_instance' was not declared. Should it be static?
> drivers/scsi/megaraid/megaraid_sas_fusion.c:4825:7: warning: symbol 'megasas_fusion_crash_dump' was not declared. Should it be static?
>
> Reported-by: Hulk Robot
> Signed-off-by: YueHaibing

Acked-by: Sumit Saxena

> ---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 120e3c4..10ef99e 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -3511,7 +3511,7 @@ megasas_complete_r1_command(struct megasas_instance *instance,
>   * @instance: Adapter soft state
>   * Completes all commands that is in reply descriptor queue
>   */
> -int
> +static int
>  complete_cmd_fusion(struct megasas_instance *instance, u32 MSIxIndex,
>  		    struct megasas_irq_context *irq_context)
>  {
> @@ -3702,7 +3702,7 @@ static void megasas_enable_irq_poll(struct megasas_instance *instance)
>   * megasas_sync_irqs - Synchronizes all IRQs owned by adapter
>   * @instance: Adapter soft state
>   */
> -void megasas_sync_irqs(unsigned long instance_addr)
> +static void megasas_sync_irqs(unsigned long instance_addr)
>  {
>  	u32 count, i;
>  	struct megasas_instance *instance =
> @@ -3760,7 +3760,7 @@ int megasas_irqpoll(struct irq_poll *irqpoll, int budget)
>   *
>   * Tasklet to complete cmds
>   */
> -void
> +static void
>  megasas_complete_cmd_dpc_fusion(unsigned long instance_addr)
>  {
>  	struct megasas_instance *instance =
> @@ -3780,7 +3780,7 @@ megasas_complete_cmd_dpc_fusion(unsigned long instance_addr)
>  /**
>   * megasas_isr_fusion - isr entry point
>   */
> -irqreturn_t megasas_isr_fusion(int irq, void *devp)
> +static irqreturn_t megasas_isr_fusion(int irq, void *devp)
>  {
>  	struct megasas_irq_context *irq_context = devp;
>  	struct megasas_instance *instance = irq_context->instance;
> @@ -3816,7 +3816,7 @@ irqreturn_t megasas_isr_fusion(int irq, void *devp)
>   * mfi_cmd:megasas_cmd pointer
>   *
>   */
> -void
> +static void
>  build_mpt_mfi_pass_thru(struct megasas_instance *instance,
>  			struct megasas_cmd *mfi_cmd)
>  {
> @@ -3874,7 +3874,7 @@ build_mpt_mfi_pass_thru(struct megasas_instance *instance,
>   * @cmd: mfi cmd to build
>   *
>   */
> -union MEGASAS_REQUEST_DESCRIPTOR_UNION *
> +static union MEGASAS_REQUEST_DESCRIPTOR_UNION *
>  build_mpt_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
>  {
>  	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc = NULL;
> @@ -3900,7 +3900,7 @@ build_mpt_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
>   * @cmd: mfi cmd pointer
>   *
>   */
> -void
> +static void
>  megasas_issue_dcmd_fusion(struct megasas_instance *instance,
>  			  struct megasas_cmd *cmd)
>  {
> @@ -4096,8 +4096,9 @@ static inline void megasas_trigger_snap_dump(struct megasas_instance *instance)
>  }
>
>  /* This function waits for outstanding commands on fusion to complete */
> -int megasas_wait_for_outstanding_fusion(struct megasas_instance *instance,
> -					int reason, int *convert)
> +static int
> +megasas_wait_for_outstanding_fusion(struct megasas_instance *instance,
> +				    int reason, int *convert)
>
Re: [PATCH 15/16] net: phy: adin: add ethtool get_stats support
On Mon, 2019-08-05 at 17:28 +0200, Andrew Lunn wrote:
> [External]
>
> > +struct adin_hw_stat {
> > +	const char *string;
> > +static void adin_get_strings(struct phy_device *phydev, u8 *data)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(adin_hw_stats); i++) {
> > +		memcpy(data + i * ETH_GSTRING_LEN,
> > +		       adin_hw_stats[i].string, ETH_GSTRING_LEN);
>
> You define string as a char *. So it will be only as long as it should
> be. However memcpy always copies ETH_GSTRING_LEN bytes, doing off the
> end of the string and into whatever follows.
>

Hmm, will use strlcpy(). I blindly copied memcpy() from some other driver.

> > +	}
> > +}
> > +
> > +static int adin_read_mmd_stat_regs(struct phy_device *phydev,
> > +				   struct adin_hw_stat *stat,
> > +				   u32 *val)
> > +{
> > +	int ret;
> > +
> > +	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, stat->reg1);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	*val = (ret & 0x);
> > +
> > +	if (stat->reg2 == 0)
> > +		return 0;
> > +
> > +	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, stat->reg2);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	*val <<= 16;
> > +	*val |= (ret & 0x);
>
> Does the hardware have a snapshot feature? Is there a danger that
> between the two reads stat->reg1 rolls over and you end up with too
> big a value?

I'm afraid I don't understand the snapshot feature you are mentioning; I
don't remember seeing it in other chips.

Regarding the danger that stat->reg1 rolls over: I guess that is
possible, but it's a bit hard to guard against. If it ends up in that
scenario, [for many counters] things would be horribly bad, and the chip
or cabling would be unusable. Not sure if this answer is
sufficient/satisfactory.

Thanks

>
> Andrew
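When a 32-bit counter is split across two 16-bit registers and the chip has no snapshot or latch feature, a common software-only guard is to read the high half, then the low half, then the high half again, and retry if the high half changed (meaning the low half wrapped in between). This is a generic technique, not something the ADIN driver or datasheet is known to use. In the driver code quoted above, reg1 is treated as the high word because it is shifted left by 16 before reg2 is ORed in. The read callback, register numbers and retry limit below are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register-read callback: returns a 16-bit value or -errno. */
typedef int (*read_reg_fn)(void *ctx, unsigned int reg);

/* Read hi, lo, hi again; retry if the high half moved (lo wrapped meanwhile). */
static int read_split_counter(void *ctx, read_reg_fn rd, unsigned int reg_hi,
			      unsigned int reg_lo, uint32_t *val)
{
	for (int tries = 0; tries < 3; tries++) {
		int hi = rd(ctx, reg_hi);
		if (hi < 0)
			return hi;
		int lo = rd(ctx, reg_lo);
		if (lo < 0)
			return lo;
		int hi2 = rd(ctx, reg_hi);
		if (hi2 < 0)
			return hi2;
		if (hi == hi2) {
			*val = ((uint32_t)(hi & 0xffff) << 16) | (uint32_t)(lo & 0xffff);
			return 0;
		}
	}
	return -EAGAIN;   /* counter kept moving; the caller decides what to report */
}
```

The cost is one extra MDIO read per counter in the common case, which is usually acceptable for ethtool statistics that are read on demand.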
[PATCH] auxdisplay: charlcd: add include guard to charlcd.h
Add a header include guard just in case.

Signed-off-by: Masahiro Yamada
---
 drivers/auxdisplay/charlcd.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/auxdisplay/charlcd.h b/drivers/auxdisplay/charlcd.h
index 8cf6c18b0adb..00911ad0f3de 100644
--- a/drivers/auxdisplay/charlcd.h
+++ b/drivers/auxdisplay/charlcd.h
@@ -6,6 +6,9 @@
  * Copyright (C) 2016-2017 Glider bvba
  */
 
+#ifndef _CHARLCD_H
+#define _CHARLCD_H
+
 struct charlcd {
 	const struct charlcd_ops *ops;
 	const unsigned char *char_conv;	/* Optional */
@@ -37,3 +40,5 @@ int charlcd_register(struct charlcd *lcd);
 int charlcd_unregister(struct charlcd *lcd);
 
 void charlcd_poke(struct charlcd *lcd);
+
+#endif /* CHARLCD_H */
-- 
2.17.1
[PATCH 2/2] auxdisplay: charlcd: add include guard to charlcd.h
Add a header include guard just in case.

Signed-off-by: Masahiro Yamada
---
 drivers/auxdisplay/charlcd.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/auxdisplay/charlcd.h b/drivers/auxdisplay/charlcd.h
index 8cf6c18b0adb..00911ad0f3de 100644
--- a/drivers/auxdisplay/charlcd.h
+++ b/drivers/auxdisplay/charlcd.h
@@ -6,6 +6,9 @@
  * Copyright (C) 2016-2017 Glider bvba
  */
 
+#ifndef _CHARLCD_H
+#define _CHARLCD_H
+
 struct charlcd {
 	const struct charlcd_ops *ops;
 	const unsigned char *char_conv;	/* Optional */
@@ -37,3 +40,5 @@ int charlcd_register(struct charlcd *lcd);
 int charlcd_unregister(struct charlcd *lcd);
 
 void charlcd_poke(struct charlcd *lcd);
+
+#endif /* CHARLCD_H */
-- 
2.17.1
[PATCH 1/2] auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay
This header is included in drivers/auxdisplay/. Make it a local header. Signed-off-by: Masahiro Yamada --- drivers/auxdisplay/charlcd.c | 2 +- {include/misc => drivers/auxdisplay}/charlcd.h | 0 drivers/auxdisplay/hd44780.c | 3 +-- drivers/auxdisplay/panel.c | 2 +- 4 files changed, 3 insertions(+), 4 deletions(-) rename {include/misc => drivers/auxdisplay}/charlcd.h (100%) diff --git a/drivers/auxdisplay/charlcd.c b/drivers/auxdisplay/charlcd.c index 92745efefb54..bef6b85778b6 100644 --- a/drivers/auxdisplay/charlcd.c +++ b/drivers/auxdisplay/charlcd.c @@ -20,7 +20,7 @@ #include -#include <misc/charlcd.h> +#include "charlcd.h" #define LCD_MINOR 156 diff --git a/include/misc/charlcd.h b/drivers/auxdisplay/charlcd.h similarity index 100% rename from include/misc/charlcd.h rename to drivers/auxdisplay/charlcd.h diff --git a/drivers/auxdisplay/hd44780.c b/drivers/auxdisplay/hd44780.c index ab15b64707ad..bcbe13092327 100644 --- a/drivers/auxdisplay/hd44780.c +++ b/drivers/auxdisplay/hd44780.c @@ -14,8 +14,7 @@ #include #include -#include <misc/charlcd.h> - +#include "charlcd.h" enum hd44780_pin { /* Order does matter due to writing to GPIO array subsets! */ diff --git a/drivers/auxdisplay/panel.c b/drivers/auxdisplay/panel.c index e06de63497cf..f8ff18ba6889 100644 --- a/drivers/auxdisplay/panel.c +++ b/drivers/auxdisplay/panel.c @@ -55,7 +55,7 @@ #include #include -#include <misc/charlcd.h> +#include "charlcd.h" #define KEYPAD_MINOR 185 -- 2.17.1
Build regressions/improvements in v5.3-rc3
Below is the list of build error/warning regressions/improvements in v5.3-rc3[1] compared to v5.2[2]. Summarized: - build errors: +9/-1 - build warnings: +133/-170 JFYI, when comparing v5.3-rc3[1] to v5.3-rc2[3], the summaries are: - build errors: +0/-1 - build warnings: +59/-99 Note that there may be false regressions, as some logs are incomplete. Still, they're build errors/warnings. Happy fixing! ;-) Thanks to the linux-next team for providing the build service. [1] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/e21a712a9685488f5ce80495b37b9fdbe96c230d/ (all 242 configs) [2] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/0ecfebd2b52404ae0c54a878c872bb93363ada36/ (all 242 configs) [3] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/609488bc979f99f805f34e9a32c1e3b71179d10b/ (241 out of 242 configs) *** ERRORS *** 9 error regressions: + /kisskb/src/drivers/misc/lkdtm/bugs.c: error: 'X86_CR4_SMEP' undeclared (first use in this function): => 281:13 + /kisskb/src/drivers/misc/lkdtm/bugs.c: error: implicit declaration of function 'native_read_cr4' [-Werror=implicit-function-declaration]: => 279:8 + /kisskb/src/drivers/misc/lkdtm/bugs.c: error: implicit declaration of function 'native_write_cr4' [-Werror=implicit-function-declaration]: => 288:2 + /kisskb/src/drivers/net/wireless/intel/iwlwifi/fw/dbg.c: error: call to '__compiletime_assert_2446' declared with attribute error: BUILD_BUG_ON failed: err_str[sizeof(err_str) - 2] != '\n': => 2445:3 + /kisskb/src/drivers/net/wireless/intel/iwlwifi/fw/dbg.c: error: call to '__compiletime_assert_2452' declared with attribute error: BUILD_BUG_ON failed: err_str[sizeof(err_str) - 2] != '\n': => 2451:3 + /kisskb/src/drivers/net/wireless/intel/iwlwifi/fw/dbg.c: error: call to '__compiletime_assert_2790' declared with attribute error: BUILD_BUG_ON failed: invalid_ap_str[sizeof(invalid_ap_str) - 2] != '\n': => 2789:5 + /kisskb/src/drivers/net/wireless/intel/iwlwifi/fw/dbg.c: error: call to 
'__compiletime_assert_2801' declared with attribute error: BUILD_BUG_ON failed: invalid_ap_str[sizeof(invalid_ap_str) - 2] != '\n': => 2800:5 + /kisskb/src/mm/hmm.c: error: implicit declaration of function 'pud_pfn' [-Werror=implicit-function-declaration]: => 753:9, 753:3 + /kisskb/src/mm/hmm.c: error: implicit declaration of function 'pud_pfn'; did you mean 'pte_pfn'? [-Werror=implicit-function-declaration]: => 753:9 1 error improvements: - error: arch/sh/kernel/cpu/sh2/clock-sh7619.o: undefined reference to `followparent_recalc': .data+0x70) => *** WARNINGS *** 133 warning regressions: + /kisskb/src/arch/arm64/include/asm/kvm_hyp.h: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 31:3 + /kisskb/src/arch/arm64/include/asm/sysreg.h: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 837:2 + /kisskb/src/arch/arm64/kvm/hyp/../../../../virt/kvm/arm/hyp/vgic-v3-sr.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 363:24, 353:24, 396:3, 386:3, 351:24, 384:3, 361:24, 394:3 + /kisskb/src/arch/arm64/kvm/hyp/debug-sr.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 26:18, 34:18, 29:18, 25:19, 23:19, 33:18, 32:18, 27:18, 21:19, 20:19, 22:19, 24:19, 28:18, 31:18, 30:18 + /kisskb/src/arch/mips/oprofile/op_model_mipsxx.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 299:3, 201:3, 199:3, 177:3, 221:3, 197:3, 219:3, 174:3, 302:3, 217:3, 242:6, 305:3, 180:3 + /kisskb/src/arch/nds32/kernel/signal.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 362:20, 315:7 + /kisskb/src/drivers/crypto/chelsio/chtls/chtls_cm.c: warning: 'wait_for_states.constprop.28' uses dynamic stack allocation [enabled by default]: => 403:1 + /kisskb/src/drivers/crypto/talitos.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 3142:4 + /kisskb/src/drivers/dma-buf/dma-buf.c: warning: format '%zu' expects argument of type 'size_t', but argument 3 has 
type 'unsigned int' [-Wformat=]: => 402:26 + /kisskb/src/drivers/dma/fsldma.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 1165:26 + /kisskb/src/drivers/dma/imx-dma.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 542:6 + /kisskb/src/drivers/dma/tegra210-adma.c: warning: 'tegra_adma_runtime_resume' defined but not used [-Wunused-function]: => 747:12 + /kisskb/src/drivers/dma/tegra210-adma.c: warning: 'tegra_adma_runtime_suspend' defined but not used [-Wunused-function]: => 715:12 + /kisskb/src/drivers/gpu/drm/arm/malidp_hw.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 1311:4, 387:8 + /kisskb/src/drivers/gpu/drm/sun4i/sun4i_tcon.c: warning: this statement may fall through [-Wimplicit-fallthrough=]: => 316:7 + /kisskb/src/drivers/gpu/drm/sun4i/sun6i_mipi_dsi.c: warning: this statement may fall th
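Most of the new warnings above are -Wimplicit-fallthrough hits. A minimal sketch of how an intentional fall-through can be annotated so that GCC (7 or later) stays quiet; the function is hypothetical, and GCC at -Wimplicit-fallthrough=3 would equally accept a marker comment such as /* fall through */ in place of the attribute.

```c
/*
 * Build with e.g.: gcc -Wall -Wextra -Wimplicit-fallthrough=3
 * Without the annotation, case 2 running into case 1 would trigger
 * "this statement may fall through".
 */
static int bonus_points(int level)
{
	int points = 0;

	switch (level) {
	case 2:
		points += 10;
		__attribute__((fallthrough)); /* intentional: level 2 also earns level 1's point */
	case 1:
		points += 1;
		break;
	default:
		points = -1;
		break;
	}
	return points;
}
```

A fall-through that is not intentional is a real bug and should get a break instead of an annotation, which is why the warning is worth fixing case by case.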
Re: [PATCH 4.14 00/53] 4.14.137-stable review
On Mon, Aug 5, 2019 at 3:14 PM Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.14.137 release. > There are 53 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed 07 Aug 2019 12:47:58 PM UTC. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.137-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.14.y > and the diffstat can be found below. > > thanks, > > greg k-h > Merged and regression tested on my test machines; all looks good! Thanks, Jack Wang
Re: [PATCH 15/16] net: phy: adin: add ethtool get_stats support
On Mon, 2019-08-05 at 17:30 +0200, Andrew Lunn wrote: > [External] > > On Mon, Aug 05, 2019 at 07:54:52PM +0300, Alexandru Ardelean wrote: > > This change implements retrieving all the error counters from the PHY. > > The PHY supports several error counters/stats. The `Mean Square Errors` > > status values are only valid when a link is established, and shouldn't be > > incremented. These values characterize the quality of a signal. > > I think you mean accumulated, not incremented? accumulated sounds better; > > The rest of the error counters are self-clearing on read. > > Most of them are reports from the Frame Checker engine that the PHY has. > > > > Not retrieving the `LPI Wake Error Count Register` here, since that is used > > by the PHY framework to check for any EEE errors. And that register is > > self-clearing when read (as per IEEE spec). > > > > Signed-off-by: Alexandru Ardelean > > --- > > drivers/net/phy/adin.c | 108 + > > 1 file changed, 108 insertions(+) > > > > diff --git a/drivers/net/phy/adin.c b/drivers/net/phy/adin.c > > index a1f3456a8504..04896547dac8 100644 > > --- a/drivers/net/phy/adin.c > > +++ b/drivers/net/phy/adin.c > > @@ -103,6 +103,32 @@ static struct clause22_mmd_map clause22_mmd_map[] = { > > { MDIO_MMD_PCS, MDIO_PCS_EEE_WK_ERR,ADIN1300_LPI_WAKE_ERR_CNT_REG }, > > }; > > > > +struct adin_hw_stat { > > + const char *string; > > + u16 reg1; > > + u16 reg2; > > + bool do_not_inc; > > do_not_accumulate? or reverse its meaning, clear_on_read? do_not_accumulate works; there are only 4 regs that need this property set to true > >Andrew
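The distinction being named here: self-clearing counters restart at zero after every read, so a driver has to sum successive readings, while the MSE values are instantaneous link-quality readings that should simply replace the previous value. A minimal sketch, assuming a hypothetical descriptor with a do_not_accumulate flag (the name settled on in the review) and a 64-bit running total per stat; this is not the adin driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stat descriptor carrying the flag discussed in the review. */
struct sketch_stat {
	const char *string;
	bool do_not_accumulate; /* e.g. MSE values: an instantaneous reading */
};

/*
 * Fold one raw reading into the value reported to ethtool. Clear-on-read
 * counters restart at zero after every read, so readings must be summed;
 * instantaneous values simply replace the previous one.
 */
static void sketch_update_stat(const struct sketch_stat *st, uint64_t *total,
			       uint32_t raw)
{
	if (st->do_not_accumulate)
		*total = raw;
	else
		*total += raw;
}
```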
Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
On Tue, Aug 06, 2019 at 03:11:13PM +0800, Guo Ren wrote: > On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig wrote: > > > > On Tue, Jul 30, 2019 at 08:15:44PM +0800, guo...@kernel.org wrote: > > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c > > > index 80783bb..3f1ff9d 100644 > > > --- a/arch/csky/mm/dma-mapping.c > > > +++ b/arch/csky/mm/dma-mapping.c > > > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void) > > > { > > > return dma_atomic_pool_init(GFP_KERNEL, > > > pgprot_noncached(PAGE_KERNEL)); > > > } > > > -postcore_initcall(atomic_pool_init); > > > > Please keep the postcore_initcall next to the function it calls. > Ok. Change arch_initcall back to postcore_initcall. :) Well, if you have a good reason to change it, please keep the init level change, but put it in a separate patch. But most importantly, don't move the place where it is called around.
Re: [PATCH 1/2] auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay
On Tue, Aug 6, 2019 at 9:16 AM Masahiro Yamada wrote: > This header is included in drivers/auxdisplay/. Make it a local header. > > Signed-off-by: Masahiro Yamada Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
[PATCH V2 2/3] csky/dma: Fixup cache_op failed when cross memory ZONEs
From: Guo Ren If the paddr and size are cross between NORMAL_ZONE and HIGHMEM_ZONE memory range, cache_op will panic in do_page_fault with bad_area. Optimize the code to support the range which cross memory ZONEs. Changes for V2: - Revert back to postcore_initcall Signed-off-by: Guo Ren Cc: Christoph Hellwig Cc: Arnd Bergmann --- arch/csky/mm/dma-mapping.c | 71 +- 1 file changed, 26 insertions(+), 45 deletions(-) diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c index 80783bb..65f531d 100644 --- a/arch/csky/mm/dma-mapping.c +++ b/arch/csky/mm/dma-mapping.c @@ -20,69 +20,50 @@ static int __init atomic_pool_init(void) } postcore_initcall(atomic_pool_init); -void arch_dma_prep_coherent(struct page *page, size_t size) -{ - if (PageHighMem(page)) { - unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; - - do { - void *ptr = kmap_atomic(page); - size_t _size = (size < PAGE_SIZE) ? size : PAGE_SIZE; - - memset(ptr, 0, _size); - dma_wbinv_range((unsigned long)ptr, - (unsigned long)ptr + _size); - - kunmap_atomic(ptr); - - page++; - size -= PAGE_SIZE; - count--; - } while (count); - } else { - void *ptr = page_address(page); - - memset(ptr, 0, size); - dma_wbinv_range((unsigned long)ptr, (unsigned long)ptr + size); - } -} - static inline void cache_op(phys_addr_t paddr, size_t size, void (*fn)(unsigned long start, unsigned long end)) { - struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); - unsigned int offset = paddr & ~PAGE_MASK; - size_t left = size; - unsigned long start; + struct page *page= phys_to_page(paddr); + void *start = __va(page_to_phys(page)); + unsigned long offset = offset_in_page(paddr); + size_t left = size; do { size_t len = left; + if (offset + len > PAGE_SIZE) + len = PAGE_SIZE - offset; + if (PageHighMem(page)) { - void *addr; + start = kmap_atomic(page); - if (offset + len > PAGE_SIZE) { - if (offset >= PAGE_SIZE) { - page += offset >> PAGE_SHIFT; - offset &= ~PAGE_MASK; - } - len = PAGE_SIZE - offset; - } + fn((unsigned 
long)start + offset, + (unsigned long)start + offset + len); - addr = kmap_atomic(page); - start = (unsigned long)(addr + offset); - fn(start, start + len); - kunmap_atomic(addr); + kunmap_atomic(start); } else { - start = (unsigned long)phys_to_virt(paddr); - fn(start, start + size); + fn((unsigned long)start + offset, + (unsigned long)start + offset + len); } offset = 0; + page++; + start += PAGE_SIZE; left -= len; } while (left); } +static void dma_wbinv_set_zero_range(unsigned long start, unsigned long end) +{ + memset((void *)start, 0, end - start); + dma_wbinv_range(start, end); +} + +void arch_dma_prep_coherent(struct page *page, size_t size) +{ + cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range); +} + void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr, size_t size, enum dma_data_direction dir) { -- 2.7.4
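The core of the V2 cache_op() loop is splitting a physical range into pieces that never cross a page boundary: the first piece starts at offset_in_page(paddr) and every later piece starts at offset 0, so each page can be mapped (kmap_atomic() for highmem) and handled on its own. A userspace sketch of just that arithmetic, assuming 4 KiB pages; split_by_page() and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Split [paddr, paddr + size) into per-page pieces as the V2 loop does.
 * Stores each piece length in lens[] (at most max entries) and returns the
 * number of pieces produced.
 */
static size_t split_by_page(unsigned long paddr, size_t size,
			    size_t *lens, size_t max)
{
	unsigned long offset = paddr & (SKETCH_PAGE_SIZE - 1); /* offset_in_page() */
	size_t left = size;
	size_t n = 0;

	while (left && n < max) {
		size_t len = left;

		if (offset + len > SKETCH_PAGE_SIZE)
			len = SKETCH_PAGE_SIZE - offset; /* stop at the page end */
		lens[n++] = len;
		offset = 0; /* later pages start at their beginning */
		left -= len;
	}
	return n;
}
```

For example, 5000 bytes starting 0xf00 bytes into a page become pieces of 256, 4096 and 648 bytes.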
Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop
On 8/6/19 10:07 AM, Michal Hocko wrote: On Fri 02-08-19 13:44:38, Michal Hocko wrote: [...] diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ba9138a4a1de..53a35c526e43 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2429,8 +2429,12 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, schedule_work(&memcg->high_work); break; } - current->memcg_nr_pages_over_high += batch; - set_notify_resume(current); + if (gfpflags_allow_blocking(gfp_mask)) { + reclaim_high(memcg, nr_pages, GFP_KERNEL); ups, this should be s@GFP_KERNEL@gfp_mask@ + } else { + current->memcg_nr_pages_over_high += batch; + set_notify_resume(current); + } break; } } while ((memcg = parent_mem_cgroup(memcg))); Should I send an official patch for this? I prefer to keep it as is while we have no better solution.
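The proposed change boils down to one decision: if the caller's gfp mask allows blocking, reclaim the excess right away with that same mask (hence the s@GFP_KERNEL@gfp_mask@ correction); otherwise keep the old behavior of recording the overage and deferring reclaim to the return-to-userspace path. A userspace sketch of that decision only; the flag value, struct and function names are hypothetical stand-ins (in the kernel, gfpflags_allow_blocking() tests __GFP_DIRECT_RECLAIM).

```c
#include <stdbool.h>

/* Made-up bit standing in for __GFP_DIRECT_RECLAIM. */
#define SKETCH_GFP_DIRECT_RECLAIM 0x1u

/* Hypothetical stand-in for the task fields touched in try_charge(). */
struct sketch_task {
	unsigned int nr_pages_over_high;
	bool notify_resume;
};

static bool sketch_allow_blocking(unsigned int gfp)
{
	return gfp & SKETCH_GFP_DIRECT_RECLAIM;
}

/* Returns pages reclaimed synchronously, or 0 if reclaim was deferred. */
static unsigned int sketch_handle_over_high(struct sketch_task *t,
					    unsigned int gfp,
					    unsigned int nr_pages,
					    unsigned int batch)
{
	if (sketch_allow_blocking(gfp)) {
		/* Caller may sleep: stand-in for reclaim_high(memcg, nr_pages, gfp). */
		return nr_pages;
	}
	/* Atomic context: remember the overage, reclaim on return to userspace. */
	t->nr_pages_over_high += batch;
	t->notify_resume = true;
	return 0;
}
```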
Re: [PATCH RFC] modpost: Support I2C Aliases from OF tables
Hi Javier, On Tue, Aug 6, 2019 at 12:25 AM Javier Martinez Canillas wrote: > On 7/31/19 9:44 PM, Wolfram Sang wrote: > > Hi Javier, > >> The other option is to remove i2c_of_match_device() and not make OF matching > >> fall back to i2c_of_match_device_sysfs(). This is what happens in the > >> ACPI > >> case, since i2c_device_match() just calls acpi_driver_match_device() > >> directly > >> and doesn't have a wrapper function that falls back to sysfs matching. > >> > >> In this case an I2C device ID table would be required if the devices have > >> to > >> be instantiated through sysfs. That way the I2C table would be used both > >> for > >> auto-loading and also to match the device when it doesn't have an of_node. > > > > That would probably mean that only a minority of drivers will not add an I2C > > device ID table because it is easy to add and you get the sysfs feature? > > > > I believe so yes. > As Masahiro-san mentioned, this approach will still require adding a new macro > MODULE_DEVICE_TABLE(i2c_of, bar_of_match) so the OF device table is used > twice. > > One to expose the "of:N*T*Cfoo,bar" and another one to expose it as "i2c:bar". > > I expect that many developers would miss adding this macro for new drivers > that > are DT-only and so sysfs instantiation would not work there. So whatever > approach is taken, we should clearly document all this so driver authors are > aware. You could add a new I2C_MODULE_DEVICE_TABLE() that adds both, right? Makes it a little bit easier to check/enforce this. Gr{oetje,eeting}s, Geert
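The two aliases under discussion can both be derived from one DT compatible string: the OF alias embeds the full "vendor,device" string, while the "i2c:" alias uses only the part after the comma, which is the same part i2c_of_match_device_sysfs() compares against the client name as a fallback. A sketch of that derivation, producing only the two alias forms quoted in the thread; make_i2c_of_aliases() is a hypothetical helper, not modpost code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Given a DT compatible "vendor,device", build the OF alias and the
 * "i2c:" alias. A compatible without a comma is used as-is.
 */
static void make_i2c_of_aliases(const char *compatible,
				char *of_alias, size_t of_len,
				char *i2c_alias, size_t i2c_len)
{
	const char *comma = strchr(compatible, ',');
	const char *device = comma ? comma + 1 : compatible;

	snprintf(of_alias, of_len, "of:N*T*C%s", compatible);
	snprintf(i2c_alias, i2c_len, "i2c:%s", device);
}
```

Generating both from the same table is the appeal of a combined macro: a DT-only driver cannot forget the second alias.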
Re: [RFC PATCH] pciehp: use completion to wait irq_thread 'pciehp_ist'
On Thu, Jul 04, 2019 at 03:50:38PM +0800, Xiongfeng Wang wrote: > When I use the following command to power on a slot which has been > powered off already. > echo 1 > /sys/bus/pci/slots/22/power > It prints the following error: > -bash: echo: write error: No such device > But the slot is actually powered on and the device is probed. > > In function 'pciehp_sysfs_enable_slot()', we use 'wait_event()' to wait > until 'ctrl->pending_events' is cleared in 'pciehp_ist()'. But in some > situations, when 'pciehp_ist()' is woken up on a nearby CPU after > 'pciehp_request' is called, 'ctrl->pending_events' is cleared before we > go into sleep state. 'wait_event()' will check the condition before > going into sleep. So we return immediately and '-ENODEV' is returned. > > This patch uses struct completion to wait until irq_thread 'pciehp_ist' > is completed. Thank you, good catch. Unfortunately your patch still allows the following race AFAICS: * pciehp_ist() is running (e.g. due to a hotplug operation) * a request to disable or enable the slot is submitted via sysfs, the completion is reinitialized * pciehp_ist() finishes, signals completion * the sysfs request returns to user space prematurely * pciehp_ist() is run, handles the sysfs request, signals completion again I'd suggest something like the below instead, could you give it a whirl and see if it reliably fixes the issue for you? -- >8 -- Subject: [PATCH] PCI: pciehp: Avoid returning prematurely from sysfs requests A sysfs request to enable or disable a PCIe hotplug slot should not return before it has been carried out. That is sought to be achieved by waiting until the controller's "pending_events" have been cleared. However the IRQ thread pciehp_ist() clears the "pending_events" before it acts on them.
If pciehp_sysfs_enable_slot() / _disable_slot() happen to check the "pending_events" after they have been cleared but while pciehp_ist() is still running, the functions may return prematurely with an incorrect return value. Fix by introducing an "ist_running" flag which must be false before a sysfs request is allowed to return. Fixes: 32a8cef274fe ("PCI: pciehp: Enable/disable exclusively from IRQ thread") Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfe...@huawei.com Reported-by: Xiongfeng Wang Signed-off-by: Lukas Wunner Cc: sta...@vger.kernel.org # v4.19+ --- drivers/pci/hotplug/pciehp.h | 2 ++ drivers/pci/hotplug/pciehp_ctrl.c | 6 -- drivers/pci/hotplug/pciehp_hpc.c | 2 ++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index 8c51a04b8083..e316bde45c7b 100644 --- a/drivers/pci/hotplug/pciehp.h +++ b/drivers/pci/hotplug/pciehp.h @@ -72,6 +72,7 @@ extern int pciehp_poll_time; * @reset_lock: prevents access to the Data Link Layer Link Active bit in the * Link Status register and to the Presence Detect State bit in the Slot * Status register during a slot reset which may cause them to flap + * @ist_running: flag to keep user request waiting while IRQ thread is running * @request_result: result of last user request submitted to the IRQ thread * @requester: wait queue to wake up on completion of user request, * used for synchronous slot enable/disable request via sysfs @@ -101,6 +102,7 @@ struct controller { struct hotplug_slot hotplug_slot; /* hotplug core interface */ struct rw_semaphore reset_lock; + unsigned int ist_running; int request_result; wait_queue_head_t requester; }; diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index 631ced0ab28a..1ce9ce335291 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -368,7 +368,8 @@ int pciehp_sysfs_enable_slot(struct hotplug_slot 
*hotplug_slot) ctrl->request_result = -ENODEV; pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC); wait_event(ctrl->requester, - !atomic_read(&ctrl->pending_events)); + !atomic_read(&ctrl->pending_events) && + !ctrl->ist_running); return ctrl->request_result; case POWERON_STATE: ctrl_info(ctrl, "Slot(%s): Already in powering on state\n", @@ -401,7 +402,8 @@ int pciehp_sysfs_disable_slot(struct hotplug_slot *hotplug_slot) mutex_unlock(&ctrl->state_lock); pciehp_request(ctrl, DISABLE_SLOT); wait_event(ctrl->requester, - !atomic_read(&ctrl->pending_events)); + !atomic_read(&ctrl->pending_events) && + !ctrl->ist_running); return ctrl->request_result; case POWEROFF_STATE: ctrl_info(ctrl, "Slot(%s): Already in powering off state\n", diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pcie
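The fix changes the condition the sysfs waiter re-checks inside wait_event(). A single-threaded sketch that evaluates the old and new conditions on the controller state seen in the race window (events already cleared, IRQ thread still working); the struct and function names are hypothetical, and the patch itself declares ist_running as an unsigned int rather than a bool.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what a sysfs waiter can observe on the controller. */
struct sketch_ctrl {
	unsigned int pending_events; /* bits not yet picked up by the IRQ thread */
	bool ist_running;            /* IRQ thread is still acting on them */
};

/* Original wake-up condition: only checks that pending_events was cleared. */
static bool old_may_return(const struct sketch_ctrl *c)
{
	return c->pending_events == 0;
}

/* Patched condition: events cleared AND the IRQ thread has finished. */
static bool new_may_return(const struct sketch_ctrl *c)
{
	return c->pending_events == 0 && !c->ist_running;
}
```

In the window state the old condition already lets the request return (with the stale -ENODEV in request_result), while the patched one keeps waiting until pciehp_ist() finishes and wakes the waiter again.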
Re: [PATCH 4.4 00/22] 4.4.188-stable review
On Mon, 5 Aug 2019 at 18:34, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.4.188 release. > There are 22 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Wed 07 Aug 2019 12:47:58 PM UTC. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.188-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.4.y > and the diffstat can be found below. > > thanks, > > greg k-h > Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.4.188-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.4.y git commit: 462a4b2bd3bfaa6e11d1e8180bc95324efc96390 git describe: v4.4.187-23-g462a4b2bd3bf Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.187-23-g462a4b2bd3bf No regressions (compared to build v4.4.187) No fixes (compared to build v4.4.187) Ran 20069 total tests in the following environments and test suites. 
Environments -- - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * build * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * network-basic-tests * perf * prep-tmp-disk * spectre-meltdown-checker-test * kvm-unit-tests * v4l2-compliance * install-android-platform-tools-r2600 * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite Summary kernel: 4.4.188-rc1 git repo: https://git.linaro.org/lkft/arm64-stable-rc.git git branch: 4.4.188-rc1-hikey-20190805-520 git commit: c9b6c3a54493f03773243bfd3c3ffbc88982ec27 git describe: 4.4.188-rc1-hikey-20190805-520 Test details: https://qa-reports.linaro.org/lkft/linaro-hikey-stable-rc-4.4-oe/build/4.4.188-rc1-hikey-20190805-520 No regressions (compared to build 4.4.187-rc2-hikey-20190802-517) No fixes (compared to build 4.4.187-rc2-hikey-20190802-517) Ran 1550 total tests in the following environments and test suites. 
Environments -- - hi6220-hikey - arm64 Test Suites --- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH v1 2/3] ASoC: rsnd: Allow reconfiguration of clock rate
Hi Morimoto-san Sorry for the delayed response On 2019/07/22 17:41, Kuninori Morimoto wrote: Hi Jiada The solution looks very over-kill to me, especially [3/3] patch is too much to me. 1st, can we start clock at .hw_params, instead of .prepare ? and stop it at .hw_free ? the reason commit 4d230d1271064 ("ASoC: rsnd: fixup not to call clk_get/set under non-atomic") moved the start of the clock from .init to .prepare is to prevent clk_{get, set_rate} from being called in atomic context; since .hw_params runs in non-atomic context, I think the start of the clock can be moved from .prepare to .hw_params 2nd, can we keep usrcnt setup as-is ? I guess we can just avoid rsnd_ssi_master_clk_start() if ssi->rate was not 0 ? I don't fully understand your 2nd question; in the case of rsnd_ssi_master_clk_stop(), if we skip rsnd_ssi_master_clk_stop() when ssi->rate is 0 by applying the following change static void rsnd_ssi_master_clk_stop(struct rsnd_mod *mod, struct rsnd_dai_stream *io) { ... - if (ssi->usrcnt > 1) + if (ssi->rate == 0) return; ... } then when any IO stream with the same SSI calls .hw_free, the other IO stream's clock will be stopped too. Thanks, Jiada similar for rsnd_ssi_master_clk_stop() static int rsnd_ssi_master_clk_start(struct rsnd_mod *mod, struct rsnd_dai_stream *io) { ... if (ssi->rate) return 0; ... } static void rsnd_ssi_master_clk_stop(struct rsnd_mod *mod, struct rsnd_dai_stream *io) { ... - if (ssi->usrcnt > 1) + if (ssi->rate == 0) return; ... } From: Timo Wischer Currently, after the clock is started in .prepare, reconfiguration of the clock rate is not allowed unless the stream is stopped. But there is a use case in the GStreamer ALSA sink: when the caps change, GStreamer reconfigures and calls snd_pcm_hw_free() before snd_pcm_prepare().
See gstreamer1.0-plugins-base/ gst-libs/gst/audio/gstaudiobasesink.c: gst_audio_base_sink_setcaps(): - gst_audio_ring_buffer_release() - gst_audio_sink_ring_buffer_release() - gst_alsasink_unprepare() - snd_pcm_hw_free() is called before - gst_audio_ring_buffer_acquire() - gst_audio_sink_ring_buffer_acquire() - gst_alsasink_prepare() - set_hwparams() - snd_pcm_hw_params() - snd_pcm_prepare() The issue mentioned in this commit can be reproduced with the following aplay patch: >diff --git a/aplay/aplay.c b/aplay/aplay.c >@@ -2760,6 +2760,8 @@ static void playback_go(int fd, size_t loaded, > header(rtype, name); > set_params(); >+ hwparams.rate = (hwparams.rate == 48000) ? 44100 : 48000; >+ set_params(); > > while (loaded > chunk_bytes && written < count && !in_aborting) > { > if (pcm_write(audiobuf + written, chunk_size) <= 0) $ aplay -Dplughw:0,0,0 -c8 -fS24_LE -r48000 /dev/zero rcar_sound ec50.sound: SSI parent/child should use same rate rcar_sound ec50.sound: ssi[3] : prepare error -22 rcar_sound ec50.sound: ASoC: cpu DAI prepare error: -22 rsnd_link0: ASoC: prepare FE rsnd_link0 failed This patch addresses the issue by stopping the clock in .hw_free, to allow reconfiguration of the clock rate without stopping the stream.
Signed-off-by: Timo Wischer Signed-off-by: Jiada Wang --- sound/soc/sh/rcar/ssi.c | 53 + 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/sound/soc/sh/rcar/ssi.c b/sound/soc/sh/rcar/ssi.c index f6a7466622ea..f43937d2c588 100644 --- a/sound/soc/sh/rcar/ssi.c +++ b/sound/soc/sh/rcar/ssi.c @@ -286,7 +286,7 @@ static int rsnd_ssi_master_clk_start(struct rsnd_mod *mod, if (rsnd_ssi_is_multi_slave(mod, io)) return 0; - if (ssi->usrcnt > 0) { + if (ssi->rate) { if (ssi->rate != rate) { dev_err(dev, "SSI parent/child should use same rate\n"); return -EINVAL; @@ -471,13 +471,9 @@ static int rsnd_ssi_init(struct rsnd_mod *mod, struct rsnd_dai_stream *io, struct rsnd_priv *priv) { - struct rsnd_ssi *ssi = rsnd_mod_to_ssi(mod); - if (!rsnd_ssi_is_run_mods(mod, io)) return 0; - ssi->usrcnt++; - rsnd_mod_power_on(mod); rsnd_ssi_config_init(mod, io); @@ -505,18 +501,8 @@ static int rsnd_ssi_quit(struct rsnd_mod *mod, return -EIO; } - rsnd_ssi_master_clk_stop(mod, io); - rsnd_mod_power_off(mod); - ssi->usrcnt--; - - if (!ssi->usrcnt) { - ssi->cr_own = 0; - ssi->cr_mode = 0; - ssi->wsr = 0; - } -
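Jiada's objection is that gating the stop on ssi->rate alone lets one stream's .hw_free stop a clock that another stream sharing the SSI still uses. One way to keep both properties (reconfiguration after .hw_free, and no premature stop) is to let the rate guard the "same rate" check while a user count decides who really stops the clock. A sketch under that assumption, with a hypothetical struct; it is not the rsnd driver's code, and clock programming is reduced to storing the rate.

```c
#include <errno.h>

/* Hypothetical stand-in for struct rsnd_ssi: only the fields discussed. */
struct sketch_ssi {
	unsigned int rate;   /* 0 = clock not running */
	unsigned int usrcnt; /* streams currently using this SSI's clock */
};

/* First user programs the rate; later users must ask for the same rate. */
static int ssi_clk_start(struct sketch_ssi *ssi, unsigned int rate)
{
	if (ssi->usrcnt > 0 && ssi->rate != rate)
		return -EINVAL; /* "SSI parent/child should use same rate" */
	if (ssi->usrcnt == 0)
		ssi->rate = rate; /* stand-in for clk_set_rate() + enable */
	ssi->usrcnt++;
	return 0;
}

/*
 * Only the last user stops the clock and forgets the rate, so one stream's
 * .hw_free cannot stop a clock another stream is still using.
 */
static void ssi_clk_stop(struct sketch_ssi *ssi)
{
	if (ssi->usrcnt == 0)
		return;
	if (--ssi->usrcnt == 0)
		ssi->rate = 0; /* stand-in for disabling the clock */
}
```

With a single stream, the aplay reproducer's hw_free/hw_params cycle then goes stop (rate back to 0) followed by a start at the new rate, instead of failing the "same rate" check.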
[PATCH v5 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT
From: Adrian Hunter Document how to select PEBS via Intel PT and how to display synthesized PEBS samples. Signed-off-by: Adrian Hunter Signed-off-by: Alexander Shishkin --- tools/perf/Documentation/intel-pt.txt | 15 +++ 1 file changed, 15 insertions(+) diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt index 50c5b60101bd..8dc513b6607b 100644 --- a/tools/perf/Documentation/intel-pt.txt +++ b/tools/perf/Documentation/intel-pt.txt @@ -919,3 +919,18 @@ amended to take the number of elements as a parameter. Note there is currently no advantage to using Intel PT instead of LBR, but that may change in the future if greater use is made of the data. + + +PEBS via Intel PT += + +Some hardware has the feature to redirect PEBS records to the Intel PT trace. +Recording is selected by using the aux-output config term e.g. + + perf record -c 1 -e cycles/aux-output/ppp -e intel_pt/branch=0/ uname + +Note that currently, software only supports redirecting at most one PEBS event. + +To display PEBS events from the Intel PT trace, use the itrace 'o' option e.g. + + perf script --itrace=oe -- 2.20.1
[PATCH v5 1/7] perf: Allow normal events to output AUX data
In some cases, ordinary (non-AUX) events can generate data for AUX events.
For example, PEBS events can come out as records in the Intel PT stream
instead of their usual DS records, if configured to do so.

One requirement for such events is to consistently schedule together, to
ensure that the data from the "AUX output" events isn't lost while their
corresponding AUX event is not scheduled. We use grouping to provide this
guarantee: an "AUX output" event can be added to a group where an AUX event
is the group leader, provided that the former supports writing to the
latter.

Signed-off-by: Alexander Shishkin
---
 include/linux/perf_event.h      | 14 +
 include/uapi/linux/perf_event.h |  3 +-
 kernel/events/core.c            | 93 +
 3 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e8ad3c590a23..b1e9168516e3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -246,6 +246,7 @@ struct perf_event;
 #define PERF_PMU_CAP_ITRACE			0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS		0x40
 #define PERF_PMU_CAP_NO_EXCLUDE			0x80
+#define PERF_PMU_CAP_AUX_SOURCE			0x100

 /**
  * struct pmu - generic performance monitoring unit
@@ -446,6 +447,16 @@ struct pmu {
 	void (*addr_filters_sync)	(struct perf_event *event);
 					/* optional */

+	/*
+	 * Check if event can be used for aux_output purposes for
+	 * events of this PMU.
+	 *
+	 * Runs from perf_event_open(). Should return 0 for "no match"
+	 * or non-zero for "match".
+	 */
+	int (*aux_output_match)		(struct perf_event *event);
+					/* optional */
+
 	/*
 	 * Filter events for PMU-specific reasons.
 	 */
@@ -681,6 +692,9 @@ struct perf_event {
 	struct perf_addr_filter_range	*addr_filter_ranges;
 	unsigned long			addr_filters_gen;

+	/* for aux_output events */
+	struct perf_event		*aux_event;
+
 	void (*destroy)(struct perf_event *);
 	struct rcu_head			rcu_head;

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..bb7b271397a6 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_output     :  1, /* generate AUX records instead of events */
+				__reserved_1   : 32;

 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c1f52a749db2..2f5504b09163 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1887,6 +1887,89 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	ctx->generation++;
 }

+static int
+perf_aux_output_match(struct perf_event *event, struct perf_event *aux_event)
+{
+	if (!has_aux(aux_event))
+		return 0;
+
+	if (!event->pmu->aux_output_match)
+		return 0;
+
+	return event->pmu->aux_output_match(aux_event);
+}
+
+static void put_event(struct perf_event *event);
+static void event_sched_out(struct perf_event *event,
+			    struct perf_cpu_context *cpuctx,
+			    struct perf_event_context *ctx);
+
+static void perf_put_aux_event(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	struct perf_event *iter;
+
+	/*
+	 * If event uses aux_event tear down the link
+	 */
+	if (event->aux_event) {
+		iter = event->aux_event;
+		event->aux_event = NULL;
+		put_event(iter);
+		return;
+	}
+
+	/*
+	 * If the event is an aux_event, tear down all links to
+	 * it from other events.
+	 */
+	for_each_sibling_event(iter, event->group_leader) {
+		if (iter->aux_event != event)
+			continue;
+
+		iter->aux_event = NULL;
+		put_event(event);
+
+		/*
+		 * If it's ACTIVE, schedule it out and put it into ERROR
+		 * state so that we don't try to schedule it again. Note
+		 * that perf_event_enable() will clear the ERROR status.
+		 */
+		event_sched_out(iter, cpuctx, ctx);
+
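The grouping rule in the commit message (an "AUX output" event may join a group only if the leader is an AUX event and the member's PMU accepts that leader) can be modelled in user space. This is a simplified sketch, not kernel code: `model_event`, `model_pmu`, `model_join_group()` and `example_match()` are hypothetical stand-ins. Only the body of `model_aux_output_match()` mirrors `perf_aux_output_match()` from the patch. The part of the patch that links the event at perf_event_open() time is truncated here, so the join step is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_event;

/* Hypothetical stand-in for struct pmu: only the new optional callback. */
struct model_pmu {
	/* Non-zero if @aux_event can receive this PMU's aux_output data. */
	int (*aux_output_match)(const struct model_event *aux_event);
};

/* Hypothetical stand-in for struct perf_event. */
struct model_event {
	const struct model_pmu *pmu;
	bool has_aux;                   /* AUX area event, e.g. Intel PT */
	bool aux_output;                /* attr.aux_output requested */
	struct model_event *aux_event;  /* link set when the join succeeds */
};

/* Same checks as perf_aux_output_match() in the patch. */
static int model_aux_output_match(const struct model_event *event,
				  const struct model_event *aux_event)
{
	if (!aux_event->has_aux)
		return 0;
	if (!event->pmu->aux_output_match)
		return 0;
	return event->pmu->aux_output_match(aux_event);
}

/* Assumed join step: 0 on success, -1 (think -EINVAL) if rejected. */
static int model_join_group(struct model_event *event, struct model_event *leader)
{
	if (!event->aux_output)
		return 0;               /* ordinary group member */
	if (!model_aux_output_match(event, leader))
		return -1;
	event->aux_event = leader;      /* the kernel also takes a reference */
	return 0;
}

/* Example PMU callback (assumption): accept any AUX event as the target. */
static int example_match(const struct model_event *aux_event)
{
	return aux_event->has_aux;
}

static const struct model_pmu pebs_like_pmu = { .aux_output_match = example_match };
static const struct model_pmu plain_pmu = { .aux_output_match = NULL };
```

A PEBS-like event can join a PT-led group, while joining a non-AUX leader, or using a PMU without the callback, is refused.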
[PATCH v5 3/7] perf tools: Add aux_output attribute flag
From: Adrian Hunter

Add aux_output attribute flag to match the kernel's perf_event.h file.

Signed-off-by: Adrian Hunter
Signed-off-by: Alexander Shishkin
---
 tools/include/uapi/linux/perf_event.h | 3 ++-
 tools/perf/util/evsel.c               | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..bb7b271397a6 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_output     :  1, /* generate AUX records instead of events */
+				__reserved_1   : 32;

 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 52459dd5ad0c..9ec8782d3226 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1684,6 +1684,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(namespaces, p_unsigned);
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
+	PRINT_ATTRf(aux_output, p_unsigned);
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);

-- 
2.20.1
[PATCH v5 2/7] perf/x86/intel: Support PEBS output to PT
If PEBS declares ability to output its data to Intel PT stream, use the
aux_output attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.

The output setting is per-CPU, so all PEBS events must write either to
PT or to the DS area. Therefore, in case of conflict, the conflicting
event will fail to schedule, allowing the rotation logic to alternate
between the PEBS->PT and PEBS->DS events.

Signed-off-by: Alexander Shishkin
---
 arch/x86/events/core.c           | 34 +
 arch/x86/events/intel/core.c     | 18 +++
 arch/x86/events/intel/ds.c       | 51 +++-
 arch/x86/events/intel/pt.c       |  5
 arch/x86/events/perf_event.h     | 17 +++
 arch/x86/include/asm/intel_pt.h  |  2 ++
 arch/x86/include/asm/msr-index.h |  4 +++
 7 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index cfe256ca76df..384c4936aedd 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1005,6 +1005,27 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,

 	/* current number of events already accepted */
 	n = cpuc->n_events;
+	if (!cpuc->n_events)
+		cpuc->pebs_output = 0;
+
+	if (!cpuc->is_fake && leader->attr.precise_ip) {
+		/*
+		 * For PEBS->PT, if !aux_event, the group leader (PT) went
+		 * away, the group was broken down and this singleton event
+		 * can't schedule any more.
+		 */
+		if (is_pebs_pt(leader) && !leader->aux_event)
+			return -EINVAL;
+
+		/*
+		 * pebs_output: 0: no PEBS so far, 1: PT, 2: DS
+		 */
+		if (cpuc->pebs_output &&
+		    cpuc->pebs_output != is_pebs_pt(leader) + 1)
+			return -EINVAL;
+
+		cpuc->pebs_output = is_pebs_pt(leader) + 1;
+	}

 	if (is_x86_event(leader)) {
 		if (n >= max_count)
@@ -2241,6 +2262,17 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
 	return 0;
 }

+static int x86_pmu_aux_output_match(struct perf_event *event)
+{
+	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_SOURCE))
+		return 0;
+
+	if (x86_pmu.aux_output_match)
+		return x86_pmu.aux_output_match(event);
+
+	return 0;
+}
+
 static struct pmu pmu = {
 	.pmu_enable		= x86_pmu_enable,
 	.pmu_disable		= x86_pmu_disable,
@@ -2266,6 +2298,8 @@ static struct pmu pmu = {
 	.sched_task		= x86_pmu_sched_task,
 	.task_ctx_size		= sizeof(struct x86_perf_task_context),
 	.check_period		= x86_pmu_check_period,
+
+	.aux_output_match	= x86_pmu_aux_output_match,
 };

 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b5f367..28459f4b795a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -3298,6 +3299,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		}
 	}

+	if (event->attr.aux_output) {
+		if (!event->attr.precise_ip)
+			return -EINVAL;
+
+		event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
+	}
+
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;

@@ -3811,6 +3819,14 @@ static int intel_pmu_check_period(struct perf_event *event, u64 value)
 	return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
 }

+static int intel_pmu_aux_output_match(struct perf_event *event)
+{
+	if (!x86_pmu.intel_cap.pebs_output_pt_available)
+		return 0;
+
+	return is_intel_pt_event(event);
+}
+
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");

 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3935,6 +3951,8 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.sched_task		= intel_pmu_sched_task,

 	.check_period		= intel_pmu_check_period,
+
+	.aux_output_match	= intel_pmu_aux_output_match,
 };

 static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f1269e804e9b..65a3cea04f60 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -902,6 +902,9 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
  */
 static inline bool pebs_needs_sched_cb(struc
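The per-CPU arbitration added to collect_events() can be traced with a small model. This is a sketch under stated assumptions: `model_cpuc` and `model_collect_pebs()` are hypothetical, the `is_fake` check is omitted, and the event count is bumped immediately for simplicity (the kernel accounts events elsewhere). It also assumes is_pebs_pt() returns 1 for PEBS-via-PT events; its definition is not shown in this archive. Under that assumption the stored value `is_pebs_pt(leader) + 1` is 2 for PT and 1 for DS. The in-code comment lists the two values the other way round, but that does not affect the conflict check, because only equality is tested.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct cpu_hw_events. */
struct model_cpuc {
	int n_events;
	int pebs_output;	/* 0: no PEBS yet, otherwise is_pebs_pt + 1 */
};

/*
 * Returns 0 if the event may be collected on this CPU, or -1 (standing in
 * for -EINVAL) if it conflicts with the PEBS output target already in use.
 */
static int model_collect_pebs(struct model_cpuc *cpuc, bool precise_ip,
			      bool pebs_via_pt, bool has_aux_event)
{
	if (!cpuc->n_events)
		cpuc->pebs_output = 0;

	if (precise_ip) {
		/* PEBS->PT event whose PT group leader went away. */
		if (pebs_via_pt && !has_aux_event)
			return -1;

		/* All PEBS events on one CPU must share one output target. */
		if (cpuc->pebs_output &&
		    cpuc->pebs_output != (int)pebs_via_pt + 1)
			return -1;

		cpuc->pebs_output = (int)pebs_via_pt + 1;
	}

	cpuc->n_events++;	/* simplification, see lead-in */
	return 0;
}
```

Once a PEBS->PT event is accepted, a PEBS->DS event on the same CPU is refused until the CPU's event list empties again. That refusal is what lets rotation alternate between the two kinds.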
[PATCH v5 0/7] perf, intel: Add support for PEBS output to Intel PT
Hi Peter,

Sixth attempt at the PEBS-via-PT feature. The previous ones were [1], [2],
[3], [4], [5].

This one addresses the most recent review comments, mainly renaming the
new attribute bit and everything related to 'aux_output'. Tooling also
changed to use the /aux-output/ config term.

The PEBS feature: output to the Intel PT stream instead of the DS area.
It's theoretically useful in virtualized environments, where the DS area
can't be used. It's also good for those who are interested in instruction
trace for context of the PEBS events. As PEBS goes, it can provide LBR
context with all the branch-related information that PT doesn't provide
at the moment.

PEBS records are packetized in the PT stream, so instead of extracting
them in the PMI, we leave it to the perf tool, because real time PT
decoding is not practical.

[1] https://marc.info/?l=linux-kernel&m=155679423430002
[2] https://marc.info/?l=linux-kernel&m=156225605132606
[3] https://marc.info/?l=linux-kernel&m=156458152126310
[4] https://marc.info/?l=linux-kernel&m=156458348626999
[5] https://marc.info/?l=linux-kernel&m=156498939722450

Adrian Hunter (5):
  perf tools: Add aux_output attribute flag
  perf tools: Add itrace option 'o' to synthesize aux-output events
  perf intel-pt: Process options for PEBS event synthesis
  perf tools: Add aux-output config term
  perf intel-pt: Add brief documentation for PEBS via Intel PT

Alexander Shishkin (2):
  perf: Allow normal events to output AUX data
  perf/x86/intel: Support PEBS output to PT

 arch/x86/events/core.c                   | 34 +
 arch/x86/events/intel/core.c             | 18 +
 arch/x86/events/intel/ds.c               | 51 -
 arch/x86/events/intel/pt.c               |  5 ++
 arch/x86/events/perf_event.h             | 17 +
 arch/x86/include/asm/intel_pt.h          |  2 +
 arch/x86/include/asm/msr-index.h         |  4 +
 include/linux/perf_event.h               | 14
 include/uapi/linux/perf_event.h          |  3 +-
 kernel/events/core.c                     | 93
 tools/include/uapi/linux/perf_event.h    |  3 +-
 tools/perf/Documentation/intel-pt.txt    | 15
 tools/perf/Documentation/itrace.txt      |  2 +
 tools/perf/Documentation/perf-record.txt |  2 +
 tools/perf/arch/x86/util/intel-pt.c      | 23 ++
 tools/perf/util/auxtrace.c               |  4 +
 tools/perf/util/auxtrace.h               |  3 +
 tools/perf/util/evsel.c                  |  4 +
 tools/perf/util/evsel.h                  |  2 +
 tools/perf/util/intel-pt.c               | 18 +
 tools/perf/util/parse-events.c           |  8 ++
 tools/perf/util/parse-events.h           |  1 +
 tools/perf/util/parse-events.l           |  1 +
 23 files changed, 324 insertions(+), 3 deletions(-)

-- 
2.20.1
[PATCH v5 4/7] perf tools: Add itrace option 'o' to synthesize aux-output events
From: Adrian Hunter

Add itrace option 'o' to synthesize events recorded in the AUX area due
to the use of perf record's aux-output config term.

Signed-off-by: Adrian Hunter
Signed-off-by: Alexander Shishkin
---
 tools/perf/Documentation/itrace.txt | 2 ++
 tools/perf/util/auxtrace.c          | 4
 tools/perf/util/auxtrace.h          | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index c2182cbabde3..82ff7dad40c2 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -5,6 +5,8 @@
 		x	synthesize transactions events
 		w	synthesize ptwrite events
 		p	synthesize power events
+		o	synthesize other events recorded due to the use
+			of aux-output (refer to perf record)
 		e	synthesize error events
 		d	create a debug log
 		g	synthesize a call chain (use with i or x)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ec0af36697c4..cd763f9e7400 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -964,6 +964,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->transactions = true;
 	synth_opts->ptwrites = true;
 	synth_opts->pwr_events = true;
+	synth_opts->other_events = true;
 	synth_opts->errors = true;
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
@@ -1061,6 +1062,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 		case 'p':
 			synth_opts->pwr_events = true;
 			break;
+		case 'o':
+			synth_opts->other_events = true;
+			break;
 		case 'e':
 			synth_opts->errors = true;
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index e9b4c5edf78b..d2001fc2625b 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,6 +60,8 @@ enum itrace_period_type {
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
+ * @other_events: whether to synthesize other events recorded due to the use of
+ *                aux_output
  * @errors: whether to synthesize decoder error events
  * @dont_decode: whether to skip decoding entirely
  * @log: write a decoding log
@@ -86,6 +88,7 @@ struct itrace_synth_opts {
 	bool			transactions;
 	bool			ptwrites;
 	bool			pwr_events;
+	bool			other_events;
 	bool			errors;
 	bool			dont_decode;
 	bool			log;
-- 
2.20.1
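The new 'o' letter is handled as one more case in the option parser. The reduced model below shows that dispatch. `model_synth_opts` and `model_parse_itrace()` are hypothetical and cover only the letters visible in this hunk; the real itrace_parse_synth_opts() also handles other letters and numeric arguments.

```c
#include <stdbool.h>

/* Hypothetical subset of struct itrace_synth_opts. */
struct model_synth_opts {
	bool pwr_events;
	bool other_events;	/* new: events recorded due to aux-output */
	bool errors;
};

/* Returns 0 on success, -1 on a letter this reduced model does not know. */
static int model_parse_itrace(struct model_synth_opts *o, const char *str)
{
	for (const char *p = str; *p; p++) {
		switch (*p) {
		case 'p':
			o->pwr_events = true;
			break;
		case 'o':
			o->other_events = true;
			break;
		case 'e':
			o->errors = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```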
[PATCH v5 6/7] perf tools: Add aux-output config term
From: Adrian Hunter

Expose the aux_output attribute flag to the user to configure, by adding a
config term 'aux-output'. For events that support it, selection of
'aux-output' causes the generation of AUX records instead of event records.
This requires that an AUX area event is also provided.

Signed-off-by: Adrian Hunter
Signed-off-by: Alexander Shishkin
---
 tools/perf/Documentation/perf-record.txt | 2 ++
 tools/perf/util/evsel.c                  | 3 +++
 tools/perf/util/evsel.h                  | 2 ++
 tools/perf/util/parse-events.c           | 8
 tools/perf/util/parse-events.h           | 1 +
 tools/perf/util/parse-events.l           | 1 +
 6 files changed, 17 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 15e0fa87241b..566050066c77 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -60,6 +60,8 @@ OPTIONS
 	  - 'name' : User defined event name. Single quotes (') may be used to
 		    escape symbols in the name from parsing by shell and tool
 		    like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
+	  - 'aux-output': Generate AUX records instead of events. This requires
+			  that an AUX area event is also provided.

           See the linkperf:perf-list[1] man page for more parameters.

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9ec8782d3226..0530ad796033 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -832,6 +832,9 @@ static void apply_config_terms(struct perf_evsel *evsel,
 			break;
 		case PERF_EVSEL__CONFIG_TERM_PERCORE:
 			break;
+		case PERF_EVSEL__CONFIG_TERM_AUX_SOURCE:
+			attr->aux_output = term->val.aux_output ? 1 : 0;
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cad54e8ba522..fa18358a806b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -51,6 +51,7 @@ enum term_type {
 	PERF_EVSEL__CONFIG_TERM_DRV_CFG,
 	PERF_EVSEL__CONFIG_TERM_BRANCH,
 	PERF_EVSEL__CONFIG_TERM_PERCORE,
+	PERF_EVSEL__CONFIG_TERM_AUX_SOURCE,
 };

 struct perf_evsel_config_term {
@@ -69,6 +70,7 @@ struct perf_evsel_config_term {
 		char	      *branch;
 		unsigned long max_events;
 		bool	      percore;
+		bool	      aux_output;
 	} val;
 	bool weak;
 };
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 371ff3aee769..20ad7eb2c27f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -952,6 +952,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
 	[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE]		= "no-overwrite",
 	[PARSE_EVENTS__TERM_TYPE_DRV_CFG]		= "driver-config",
 	[PARSE_EVENTS__TERM_TYPE_PERCORE]		= "percore",
+	[PARSE_EVENTS__TERM_TYPE_AUX_SOURCE]		= "aux-output",
 };

 static bool config_term_shrinked;
@@ -1072,6 +1073,9 @@ do { \
 			return -EINVAL;
 		}
 		break;
+	case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
+		CHECK_TYPE_VAL(NUM);
+		break;
 	default:
 		err->str = strdup("unknown term");
 		err->idx = term->err_term;
@@ -1122,6 +1126,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_MAX_EVENTS:
 	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
 	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+	case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
 		return config_term_common(attr, term, err);
 	default:
 		if (err) {
@@ -1214,6 +1219,9 @@ do { \
 			ADD_CONFIG_TERM(PERCORE, percore,
 					term->val.num ? true : false);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
+			ADD_CONFIG_TERM(AUX_SOURCE, aux_output, term->val.num ? 1 : 0);
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f7139e1a2fd3..782195ce8238 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -76,6 +76,7 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
 	PARSE_EVENTS__TERM_TYPE_DRV_CFG,
 	PARSE_EVENTS__TERM_TYPE_PERCORE,
+	PARSE_EVENTS__TERM_TYPE_AUX_SOURCE,
 	__PARSE_EVENTS__TERM_TYPE_NR,
 };

diff --git a/tools/perf/ut
Re: [PATCH v4 07/10] regulator: mt6358: Add support for MT6358 regulator
Hi Mark,

On Mon, 2019-08-05 at 14:10 +0100, Mark Brown wrote:
> On Mon, Aug 05, 2019 at 01:21:55PM +0800, Hsin-Hsiung Wang wrote:
>
> > +static const u32 vmch_voltages[] = {
> > +	290, 300, 330,
> > +};
>
> > +static const u32 vemc_voltages[] = {
> > +	290, 300, 330,
> > +};
>
> Several of these tables appear to be identical.
>
I will use the same voltage table in the next patch.

> > +static inline unsigned int mt6358_map_mode(unsigned int mode)
> > +{
> > +	return mode == MT6358_BUCK_MODE_AUTO ?
> > +		REGULATOR_MODE_NORMAL : REGULATOR_MODE_FAST;
> > +}
>
> There is no need for this to be an inline and please write normal
> conditional statements to improve legibility. There's other examples in
> the driver.
>
will fix it in the next patch.

> > +static int mt6358_get_buck_voltage_sel(struct regulator_dev *rdev)
> > +{
> > +	int ret, regval;
> > +	struct mt6358_regulator_info *info = rdev_get_drvdata(rdev);
> > +
> > +	ret = regmap_read(rdev->regmap, info->da_vsel_reg, &regval);
> > +	if (ret != 0) {
> > +		dev_info(&rdev->dev,
> > +			 "Failed to get mt6358 Buck %s vsel reg: %d\n",
> > +			 info->desc.name, ret);
>
> dev_err() for errors here and throughout the driver.
>
will fix it in the next patch.

> > +		return ret;
> > +	}
> > +
> > +	ret = (regval >> info->da_vsel_shift) & info->da_vsel_mask;
> > +
> > +	return ret;
> > +}
>
> This looks like a standard get_voltage_sel_regmap()?
>
MT6358 has buck voltage status registers to show the actual output voltage
and the registers are different from the voltage setting registers.
We want to get the actual voltage output, so we use the da_vsel status
registers here.

> > +err_mode:
> > +	if (ret != 0)
> > +		return ret;
> > +
> > +	return 0;
>
> Or just return ret unconditionally?

will modify it to return ret unconditionally in the next patch.

Thanks a lot.

> ___
> Linux-mediatek mailing list
> linux-media...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
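Mark asks for mt6358_map_mode() to drop `inline` and use a normal conditional. One way to write that is sketched below. The REGULATOR_MODE_* and MT6358_BUCK_MODE_AUTO values are stand-ins defined locally so the snippet compiles on its own; they are not the driver's or the regulator framework's definitions.

```c
/* Local stand-ins so the sketch is self-contained (values are assumptions). */
#define REGULATOR_MODE_FAST	0x1
#define REGULATOR_MODE_NORMAL	0x2
#define MT6358_BUCK_MODE_AUTO	0	/* hypothetical */

/* Plain function with an explicit if, as requested in review. */
static unsigned int mt6358_map_mode(unsigned int mode)
{
	if (mode == MT6358_BUCK_MODE_AUTO)
		return REGULATOR_MODE_NORMAL;

	return REGULATOR_MODE_FAST;
}
```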
[PATCH v5 5/7] perf intel-pt: Process options for PEBS event synthesis
From: Adrian Hunter

Process synth_opts.other_events and attr.aux_output to set up for
synthesizing PEBS via Intel PT events.

Signed-off-by: Adrian Hunter
Signed-off-by: Alexander Shishkin
---
 tools/perf/arch/x86/util/intel-pt.c | 23 +++
 tools/perf/util/intel-pt.c          | 18 ++
 2 files changed, 41 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 609088c01e3a..9a66e1575dd3 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -548,6 +548,26 @@ static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 					evsel->attr.config);
 }

+/*
+ * Currently, there is not enough information to disambiguate different PEBS
+ * events, so only allow one.
+ */
+static bool intel_pt_too_many_aux_output(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	int aux_output_cnt = 0;
+
+	evlist__for_each_entry(evlist, evsel)
+		aux_output_cnt += !!evsel->attr.aux_output;
+
+	if (aux_output_cnt > 1) {
+		pr_err(INTEL_PT_PMU_NAME " supports at most one event with aux-output\n");
+		return true;
+	}
+
+	return false;
+}
+
 static int intel_pt_recording_options(struct auxtrace_record *itr,
 				      struct perf_evlist *evlist,
 				      struct record_opts *opts)
@@ -588,6 +608,9 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 		return -EINVAL;
 	}

+	if (intel_pt_too_many_aux_output(evlist))
+		return -EINVAL;
+
 	if (!opts->full_auxtrace)
 		return 0;

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index df061599fef4..04ce74a66fee 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2894,6 +2894,22 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 	return 0;
 }

+static void intel_pt_setup_pebs_events(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	if (!pt->synth_opts.other_events)
+		return;
+
+	evlist__for_each_entry(pt->session->evlist, evsel) {
+		if (evsel->attr.aux_output && evsel->id) {
+			pt->sample_pebs = true;
+			pt->pebs_evsel = evsel;
+			return;
+		}
+	}
+}
+
 static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
@@ -3263,6 +3279,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;

+	intel_pt_setup_pebs_events(pt);
+
 	err = auxtrace_queues__process_index(&pt->queues, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.20.1
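The two helpers added here reduce to a count over the event list and a first-match search. The user-space sketch below uses a hypothetical `model_evsel` array in place of the evlist. `has_id` stands in for `evsel->id` being non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_evsel. */
struct model_evsel {
	bool aux_output;	/* attr.aux_output */
	bool has_id;		/* evsel->id != NULL */
};

/* Like intel_pt_too_many_aux_output(): more than one is rejected. */
static bool model_too_many_aux_output(const struct model_evsel *ev, size_t n)
{
	int cnt = 0;

	for (size_t i = 0; i < n; i++)
		cnt += !!ev[i].aux_output;
	return cnt > 1;
}

/* Like intel_pt_setup_pebs_events(): first aux-output event with an id. */
static const struct model_evsel *
model_find_pebs_evsel(const struct model_evsel *ev, size_t n, bool other_events)
{
	if (!other_events)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (ev[i].aux_output && ev[i].has_id)
			return &ev[i];
	return NULL;
}
```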
Re: [PATCH] iio: adc: max9611: Fix temperature reading in probe
Hi Jonathan,

On Mon, Aug 05, 2019 at 06:12:44PM +0100, Jonathan Cameron wrote:
> On Mon, 5 Aug 2019 17:55:15 +0200
> Jacopo Mondi wrote:
>
> > The max9611 driver reads the die temperature at probe time to validate
> > the communication channel. Use the actual read value to perform the test
> > instead of the read function return value, which was mistakenly used so
> > far.
> >
> > The temperature reading test was only successful because the 0 return
> > value is in the range of supported temperatures.
> >
> > Fixes: 69780a3bbc0b ("iio: adc: Add Maxim max9611 ADC driver")
> > Signed-off-by: Jacopo Mondi
>
> Applied to the fixes-togreg branch of iio.git and marked for
> stable. That'll be a bit fiddly given other changes around this
> so we may need to do backports.
>

Indeed, I should have mentioned that this patch depends on Joe's
ae8cc91a7d85 ("iio: adc: max9611: Fix misuse of GENMASK macro"), which is
now in linux-next; otherwise it might actually trigger errors due to the
wrong mask value.

I wonder if there's a way to keep track of these dependencies for the
sake of backporting, or if it's an operation that has to be carried out
manually...

Thanks
   j

>
> > ---
> >  drivers/iio/adc/max9611.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iio/adc/max9611.c b/drivers/iio/adc/max9611.c
> > index 917223d5ff5b..e9f6b1da1b94 100644
> > --- a/drivers/iio/adc/max9611.c
> > +++ b/drivers/iio/adc/max9611.c
> > @@ -480,7 +480,7 @@ static int max9611_init(struct max9611_dev *max9611)
> >  	if (ret)
> >  		return ret;
> >
> > -	regval = ret & MAX9611_TEMP_MASK;
> > +	regval &= MAX9611_TEMP_MASK;
> >
> >  	if ((regval > MAX9611_TEMP_MAX_POS &&
> >  	     regval < MAX9611_TEMP_MIN_NEG) ||
> > --
> > 2.22.0
> >
>
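A stand-alone sketch shows why the old check passed. `MODEL_TEMP_MASK` and `model_read()` are hypothetical (the real MAX9611_TEMP_MASK value is not shown in this thread). The point is that masking the return code of a successful read always yields 0, which happens to lie inside the accepted temperature range.

```c
#include <stdint.h>

/* Hypothetical mask for illustration only; not the driver's value. */
#define MODEL_TEMP_MASK 0xff80u

/* Stand-in for a successful register read: returns 0, stores raw value. */
static int model_read(uint16_t raw, uint16_t *regval)
{
	*regval = raw;
	return 0;
}

/* Old code: masks the return code instead of the register contents. */
static uint16_t model_buggy(uint16_t raw)
{
	uint16_t regval;
	int ret = model_read(raw, &regval);

	regval = ret & MODEL_TEMP_MASK;	/* always 0 after a successful read */
	return regval;
}

/* Fixed code, as in the patch: masks the value that was actually read. */
static uint16_t model_fixed(uint16_t raw)
{
	uint16_t regval;

	(void)model_read(raw, &regval);
	regval &= MODEL_TEMP_MASK;
	return regval;
}
```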
Re: [PATCH RFC] modpost: Support I2C Aliases from OF tables
On Tue, Aug 6, 2019 at 12:48 AM Javier Martinez Canillas wrote:
> On 8/1/19 4:17 AM, Masahiro Yamada wrote:
> So I think that we should either:
>
> a) take Kieran's patch or b) remove the i2c_of_match_device_sysfs() fallback
> for OF and require an I2C device table for sysfs instantiation and matching.
>
> > If a driver supports DT and devices are instantiated via DT,
> > in which situation is this useful?
>
> Is useful if you don't have all the I2C devices described in the DT. For example
> a daughterboard with an I2C device is connected to a board through an expansion
> slot or an I2C device connected directly to I2C pins exposed in a machine.
>
> In these cases your I2C devices won't be static so users might want to use the
> sysfs user-space interface to instantiate the I2C devices, i.e:
>
> # echo eeprom 0x50 > /sys/bus/i2c/devices/i2c-3/new_device
>
> as explained in
> https://github.com/torvalds/linux/blob/master/Documentation/i2c/instantiating-devices#L207

Does this actually work with DT names, too? E.g.

    # echo atmel,24c02 > /sys/bus/i2c/devices/i2c-3/new_device

Still leaves us with legacy names for backwards compatibility.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Re: [PATCH v2 3/3] soc/tegra: regulators: Add regulators coupler for Tegra30
On Mon, Aug 05, 2019 at 02:03:29PM +0300, Dmitry Osipenko wrote:
> 05.08.2019 11:33, Peter De Schrijver writes:
> > On Fri, Aug 02, 2019 at 05:39:23PM +0300, Dmitry Osipenko wrote:
> >> 02.08.2019 17:05, Peter De Schrijver writes:
> >>> On Thu, Jul 25, 2019 at 06:18:32PM +0300, Dmitry Osipenko wrote:
> Add regulators coupler for Tegra30 SoCs that performs voltage balancing
> of a coupled regulators and thus provides voltage scaling functionality.
>
> There are 2 coupled regulators on all Tegra30 SoCs: CORE and CPU. The
> coupled regulator voltages shall be in a range of 300mV from each other
> and CORE voltage shall be higher than the CPU by N mV, where N depends
> on the CPU voltage.
>
> Signed-off-by: Dmitry Osipenko
> ---
> drivers/soc/tegra/Kconfig              |   4 +
> drivers/soc/tegra/Makefile             |   1 +
> drivers/soc/tegra/regulators-tegra30.c | 316 +
> 3 files changed, 321 insertions(+)
> create mode 100644 drivers/soc/tegra/regulators-tegra30.c
>
> >>> ...
> >>>
> +
> +static int tegra30_core_cpu_limit(int cpu_uV)
> +{
> +	if (cpu_uV < 80)
> +		return 95;
> +
> +	if (cpu_uV < 90)
> +		return 100;
> +
> +	if (cpu_uV < 100)
> +		return 110;
> +
> +	if (cpu_uV < 110)
> +		return 120;
> +
> +	if (cpu_uV < 125) {
> +		switch (tegra_sku_info.cpu_speedo_id) {
> +		case 0 ... 1:
> >>> Aren't we supposed to add /* fall through */ here now?
> >>
> >> There is no compiler warning if there is nothing in-between of the
> >> case-switches, so annotation isn't really necessary here. Of course it
> >> is possible to add an explicit annotation just to make clear the
> >> fall-through intention.
> >>
> >
> > Ah. Ok. Whatever you want then :)
>
> I'll add the comments if there will be a need to re-spin this series.
>
> +		case 4:
> +		case 7 ... 8:
> +			return 120;
> +
> +		default:
> +			return 130;
> +		}
> +	}
> +
> >>>
> >>> Other than that, this looks ok to me.
> >>
> >> Awesome, thank you very much! Explicit ACK will be appreciated as well.
> >
> > Acked-By: Peter De Schrijver

All of them.

Peter.
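Dmitry's point, that stacked case labels with nothing between them need no annotation, can be checked with a small sketch. Both functions and their return values are hypothetical. Plain labels are used instead of the patch's GNU `case 0 ... 1:` range syntax so that the snippet builds as standard C. Only the second function has a statement before the next label; that is the situation where GCC's -Wimplicit-fallthrough expects an annotation, such as a /* fall through */ comment.

```c
/* Stacked labels, nothing in between: no fall-through annotation needed. */
static int model_limit(int speedo_id)
{
	switch (speedo_id) {
	case 0:
	case 1:
	case 4:
	case 7:
	case 8:
		return 1;	/* hypothetical value */
	default:
		return 2;	/* hypothetical value */
	}
}

/* A statement precedes the next label: here the annotation is expected. */
static int model_accumulate(int x)
{
	int r = 0;

	switch (x) {
	case 1:
		r += 10;
		/* fall through */
	case 2:
		r += 1;
		break;
	default:
		break;
	}
	return r;
}
```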
[PATCH] arm:unwind: fix backtrace error with unwind_table
On ARM, after load_module() succeeds, mod->init_layout.base is freed in
do_free_init(), but its unwind table is not removed from the
unwind_tables list. Later, the memory at mod->init_layout.base may be
allocated for another module's text section, whose unwind table is also
added to unwind_tables. One address can then be matched by more than one
unwind table in the list, so the unwinder may pick the wrong table and
produce an incorrect backtrace.

Signed-off-by: chenzefeng
---
 arch/arm/kernel/module.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index deef17f..a4eb5f4 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -403,14 +403,24 @@ int module_finalize(const Elf32_Ehdr *hdr, const Elf_Shdr *sechdrs,
 	return 0;
 }

-void
-module_arch_cleanup(struct module *mod)
-{
+
 #ifdef CONFIG_ARM_UNWIND
+void module_arch_cleanup(struct module *mod)
+{
 	int i;

 	for (i = 0; i < ARM_SEC_MAX; i++)
-		if (mod->arch.unwind[i])
+		if (mod->arch.unwind[i]) {
 			unwind_table_del(mod->arch.unwind[i]);
-#endif
+			mod->arch.unwind[i] = NULL;
+		}
 }
+
+void module_arch_freeing_init(struct module *mod)
+{
+	if (mod->arch.unwind[ARM_SEC_INIT]) {
+		unwind_table_del(mod->arch.unwind[ARM_SEC_INIT]);
+		mod->arch.unwind[ARM_SEC_INIT] = NULL;
+	}
+}
+#endif
-- 
1.8.5.6
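The failure mode in the commit message can be reproduced with a toy registry in which the first matching table wins. Everything here is hypothetical (`model_table_add()`, `model_lookup_owner()`, the addresses). The toy assumes, as the message implies, that a lookup walks the registered tables in order and stops at the first range that contains the address.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TABLES 8

/* Hypothetical unwind-table entry covering [start, end). */
struct model_table {
	unsigned long start, end;
	int owner;		/* module that registered the table */
	bool used;
};

static struct model_table tables[MODEL_MAX_TABLES];

/* Registers a table in the first free slot, i.e. after existing ones. */
static struct model_table *model_table_add(unsigned long start,
					   unsigned long end, int owner)
{
	for (size_t i = 0; i < MODEL_MAX_TABLES; i++) {
		if (!tables[i].used) {
			tables[i] = (struct model_table){ start, end, owner, true };
			return &tables[i];
		}
	}
	return NULL;
}

static void model_table_del(struct model_table *t)
{
	if (t)
		t->used = false;
}

/* First match wins, so a stale entry can shadow the correct one. */
static int model_lookup_owner(unsigned long addr)
{
	for (size_t i = 0; i < MODEL_MAX_TABLES; i++)
		if (tables[i].used && addr >= tables[i].start && addr < tables[i].end)
			return tables[i].owner;
	return -1;
}
```

In the toy, module 1's init table stays registered after its memory is reused by module 2, so lookups resolve to module 1. Deleting the stale entry first, which is the role of the new module_arch_freeing_init(), makes the lookup find module 2.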
[PATCH net] net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
From: Chen-Yu Tsai

The sun4i-emac uses the "phy" property to find the PHY it's supposed to
use. This property was deprecated in favor of "phy-handle" in commit
8c5b09447625 ("dt-bindings: net: sun4i-emac: Convert the binding to a
schemas").

Add support for this new property name, and fall back to the old one in
case the device tree hasn't been updated.

Signed-off-by: Chen-Yu Tsai
---

The aforementioned commit is in v5.3-rc1. It would be nice to have the
driver fix in the same release. In addition, an update for the device
tree has been queued up for v5.4, which made us realize the driver needs
an update.

---
 drivers/net/ethernet/allwinner/sun4i-emac.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c b/drivers/net/ethernet/allwinner/sun4i-emac.c
index 3434730a7699..0537df06a9b5 100644
--- a/drivers/net/ethernet/allwinner/sun4i-emac.c
+++ b/drivers/net/ethernet/allwinner/sun4i-emac.c
@@ -860,7 +860,9 @@ static int emac_probe(struct platform_device *pdev)
 		goto out_clk_disable_unprepare;
 	}

-	db->phy_node = of_parse_phandle(np, "phy", 0);
+	db->phy_node = of_parse_phandle(np, "phy-handle", 0);
+	if (!db->phy_node)
+		db->phy_node = of_parse_phandle(np, "phy", 0);
 	if (!db->phy_node) {
 		dev_err(&pdev->dev, "no associated PHY\n");
 		ret = -ENODEV;
-- 
2.20.1
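The lookup order added to emac_probe() tries the new property first and then the deprecated one. The sketch below shows that order against a toy property list. `model_prop` and `model_parse_phandle()` are hypothetical stand-ins for the device-tree node and of_parse_phandle().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device-tree property: name -> referenced node label. */
struct model_prop {
	const char *name;
	const char *target;
};

/* Stand-in for of_parse_phandle(): NULL when the property is absent. */
static const char *model_parse_phandle(const struct model_prop *props,
				       size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].target;
	return NULL;
}

/* Prefer "phy-handle"; fall back to the deprecated "phy" for old DTs. */
static const char *model_find_phy(const struct model_prop *props, size_t n)
{
	const char *phy = model_parse_phandle(props, n, "phy-handle");

	if (!phy)
		phy = model_parse_phandle(props, n, "phy");
	return phy;
}
```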
Re: [PATCH RFC] mm/memcontrol: reclaim severe usage over high limit in get_user_pages loop
On Tue 06-08-19 10:19:49, Konstantin Khlebnikov wrote:
> On 8/6/19 10:07 AM, Michal Hocko wrote:
> > On Fri 02-08-19 13:44:38, Michal Hocko wrote:
> > [...]
> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > > index ba9138a4a1de..53a35c526e43 100644
> > > > > --- a/mm/memcontrol.c
> > > > > +++ b/mm/memcontrol.c
> > > > > @@ -2429,8 +2429,12 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > > >  				schedule_work(&memcg->high_work);
> > > > >  				break;
> > > > >  			}
> > > > > -			current->memcg_nr_pages_over_high += batch;
> > > > > -			set_notify_resume(current);
> > > > > +			if (gfpflags_allow_blocking(gfp_mask)) {
> > > > > +				reclaim_high(memcg, nr_pages, GFP_KERNEL);
> > >
> > > ups, this should be s@GFP_KERNEL@gfp_mask@
> > >
> > > > > +			} else {
> > > > > +				current->memcg_nr_pages_over_high += batch;
> > > > > +				set_notify_resume(current);
> > > > > +			}
> > > > >  			break;
> > > > >  		}
> > > > >  	} while ((memcg = parent_mem_cgroup(memcg)));
> > > > >
> >
> > Should I send an official patch for this?
> >
>
> I prefer to keep it as is while we have no better solution.

Fine with me.

-- 
Michal Hocko
SUSE Labs
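The proposed hunk chooses between reclaiming synchronously and deferring the work until return to user space. Below is a hypothetical model of that decision. `model_over_high()`, `model_task` and `MODEL_GFP_DIRECT_RECLAIM` are made up. The one kernel fact it relies on is that gfpflags_allow_blocking() tests the __GFP_DIRECT_RECLAIM bit. Following Michal's own correction, the synchronous path is meant to use the caller's gfp_mask rather than GFP_KERNEL.

```c
#include <stdbool.h>

/* Hypothetical flag bit, not the kernel's __GFP_DIRECT_RECLAIM value. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u

enum model_action { MODEL_RECLAIM_NOW, MODEL_DEFER_TO_RESUME };

/* Stand-in for the per-task fields the hunk touches. */
struct model_task {
	unsigned int nr_pages_over_high;
	bool notify_resume;
};

static enum model_action model_over_high(struct model_task *t,
					 unsigned int gfp_mask,
					 unsigned int batch)
{
	if (gfp_mask & MODEL_GFP_DIRECT_RECLAIM)
		return MODEL_RECLAIM_NOW;	/* reclaim_high(..., gfp_mask) */

	/* Cannot block: account the excess and reclaim on return to user. */
	t->nr_pages_over_high += batch;
	t->notify_resume = true;
	return MODEL_DEFER_TO_RESUME;
}
```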
[PATCH 2/2] habanalabs: improve security in Debug IOCTL
This patch improves the security in the Debug IOCTL. It adds checks that: - The register index value is in the allowed range for all opcodes. - The event types number is in the allowed range in SPMU enable. - The events number is in the allowed range in SPMU disable. Signed-off-by: Omer Shpigelman --- drivers/misc/habanalabs/goya/goya_coresight.c | 72 --- 1 file changed, 63 insertions(+), 9 deletions(-) diff --git a/drivers/misc/habanalabs/goya/goya_coresight.c b/drivers/misc/habanalabs/goya/goya_coresight.c index d7ec7ad84cc6..4f7ffc137ab7 100644 --- a/drivers/misc/habanalabs/goya/goya_coresight.c +++ b/drivers/misc/habanalabs/goya/goya_coresight.c @@ -15,6 +15,12 @@ #define GOYA_PLDM_CORESIGHT_TIMEOUT_USEC (CORESIGHT_TIMEOUT_USEC * 100) +#define SPMU_SECTION_SIZE DMA_CH_0_CS_SPMU_MAX_OFFSET +#define SPMU_EVENT_TYPES_OFFSET0x400 +#define SPMU_MAX_EVENT_TYPES ((SPMU_SECTION_SIZE - \ + SPMU_EVENT_TYPES_OFFSET) / 4) +#define SPMU_MAX_EVENTS(SPMU_SECTION_SIZE / 4) + static u64 debug_stm_regs[GOYA_STM_LAST + 1] = { [GOYA_STM_CPU] = mmCPU_STM_BASE, [GOYA_STM_DMA_CH_0_CS] = mmDMA_CH_0_CS_STM_BASE, @@ -226,9 +232,16 @@ static int goya_config_stm(struct hl_device *hdev, struct hl_debug_params *params) { struct hl_debug_params_stm *input; - u64 base_reg = debug_stm_regs[params->reg_idx] - CFG_BASE; + u64 base_reg; int rc; + if (params->reg_idx >= ARRAY_SIZE(debug_stm_regs)) { + dev_err(hdev->dev, "Invalid register index in STM\n"); + return -EINVAL; + } + + base_reg = debug_stm_regs[params->reg_idx] - CFG_BASE; + WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK); if (params->enable) { @@ -288,10 +301,17 @@ static int goya_config_etf(struct hl_device *hdev, struct hl_debug_params *params) { struct hl_debug_params_etf *input; - u64 base_reg = debug_etf_regs[params->reg_idx] - CFG_BASE; + u64 base_reg; u32 val; int rc; + if (params->reg_idx >= ARRAY_SIZE(debug_etf_regs)) { + dev_err(hdev->dev, "Invalid register index in ETF\n"); + return -EINVAL; + } + + base_reg = 
debug_etf_regs[params->reg_idx] - CFG_BASE; + WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK); val = RREG32(base_reg + 0x304); @@ -445,11 +465,18 @@ static int goya_config_etr(struct hl_device *hdev, static int goya_config_funnel(struct hl_device *hdev, struct hl_debug_params *params) { - WREG32(debug_funnel_regs[params->reg_idx] - CFG_BASE + 0xFB0, - CORESIGHT_UNLOCK); + u64 base_reg; + + if (params->reg_idx >= ARRAY_SIZE(debug_funnel_regs)) { + dev_err(hdev->dev, "Invalid register index in FUNNEL\n"); + return -EINVAL; + } - WREG32(debug_funnel_regs[params->reg_idx] - CFG_BASE, - params->enable ? 0x33F : 0); + base_reg = debug_funnel_regs[params->reg_idx] - CFG_BASE; + + WREG32(base_reg + 0xFB0, CORESIGHT_UNLOCK); + + WREG32(base_reg, params->enable ? 0x33F : 0); return 0; } @@ -458,9 +485,16 @@ static int goya_config_bmon(struct hl_device *hdev, struct hl_debug_params *params) { struct hl_debug_params_bmon *input; - u64 base_reg = debug_bmon_regs[params->reg_idx] - CFG_BASE; + u64 base_reg; u32 pcie_base = 0; + if (params->reg_idx >= ARRAY_SIZE(debug_bmon_regs)) { + dev_err(hdev->dev, "Invalid register index in BMON\n"); + return -EINVAL; + } + + base_reg = debug_bmon_regs[params->reg_idx] - CFG_BASE; + WREG32(base_reg + 0x104, 1); if (params->enable) { @@ -522,7 +556,7 @@ static int goya_config_bmon(struct hl_device *hdev, static int goya_config_spmu(struct hl_device *hdev, struct hl_debug_params *params) { - u64 base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE; + u64 base_reg; struct hl_debug_params_spmu *input = params->input; u64 *output; u32 output_arr_len; @@ -531,6 +565,13 @@ static int goya_config_spmu(struct hl_device *hdev, u32 cycle_cnt_idx; int i; + if (params->reg_idx >= ARRAY_SIZE(debug_spmu_regs)) { + dev_err(hdev->dev, "Invalid register index in SPMU\n"); + return -EINVAL; + } + + base_reg = debug_spmu_regs[params->reg_idx] - CFG_BASE; + if (params->enable) { input = params->input; @@ -543,11 +584,18 @@ static int goya_config_spmu(struct 
hl_device *hdev, return -EINVAL; } + if (input->event_types_num > SPMU_MAX_EVENT_TYPES) { + dev_err(hdev->dev, +
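Each register-index check added above follows the same pattern: validate the user-controlled index against ARRAY_SIZE() of the lookup table before dereferencing it, then compute the base address. Below is a minimal user-space sketch of that pattern. The table contents, the `lookup_base()` helper and the `CFG_BASE` value are hypothetical stand-ins for the driver's tables, and the sketch assumes the index is unsigned, so a single `>=` comparison rejects every out-of-range value.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same idea as the kernel's ARRAY_SIZE(), without its array type check. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for CFG_BASE. */
#define CFG_BASE 0x1000ULL

/* Hypothetical register base table, standing in for debug_stm_regs[] etc. */
static const uint64_t debug_regs[] = { 0x1000, 0x2000, 0x3000 };

/*
 * Validate a user-supplied index before it is used to index the table.
 * Returns 0 and stores the offset in *base_reg, or -EINVAL if out of range.
 */
static int lookup_base(uint32_t reg_idx, uint64_t *base_reg)
{
	if (reg_idx >= ARRAY_SIZE(debug_regs)) {
		fprintf(stderr, "Invalid register index %" PRIu32 "\n", reg_idx);
		return -EINVAL;
	}

	*base_reg = debug_regs[reg_idx] - CFG_BASE;
	return 0;
}
```

The SPMU event-count checks (`event_types_num > SPMU_MAX_EVENT_TYPES`) apply the same rule to a count instead of an index.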
[PATCH 1/2] habanalabs: use default structure for user input in Debug IOCTL
This patch fixes a possible kernel crash when a user passes an input structure that is too small to the Debug IOCTL. The fix allocates a default (zero-filled) input structure and copies the user data into it. If the user-provided input structure is too small, the code uses the default values for the remaining fields. Note that, contrary to the input structure, the user can provide an output structure of varying size or with no size at all. Therefore the user output structure is validated later on, in the Debug logic. Signed-off-by: Omer Shpigelman --- drivers/misc/habanalabs/habanalabs_ioctl.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/misc/habanalabs/habanalabs_ioctl.c b/drivers/misc/habanalabs/habanalabs_ioctl.c index ce0cd93a8421..3ce65459b01c 100644 --- a/drivers/misc/habanalabs/habanalabs_ioctl.c +++ b/drivers/misc/habanalabs/habanalabs_ioctl.c @@ -144,13 +144,16 @@ static int debug_coresight(struct hl_device *hdev, struct hl_debug_args *args) params->op = args->op; if (args->input_ptr && args->input_size) { - input = memdup_user(u64_to_user_ptr(args->input_ptr), - args->input_size); - if (IS_ERR(input)) { - rc = PTR_ERR(input); - input = NULL; - dev_err(hdev->dev, - "error %d when copying input debug data\n", rc); + input = kzalloc(hl_debug_struct_size[args->op], GFP_KERNEL); + if (!input) { + rc = -ENOMEM; + goto out; + } + + if (copy_from_user(input, u64_to_user_ptr(args->input_ptr), + args->input_size)) { + rc = -EFAULT; + dev_err(hdev->dev, "failed to copy input debug data\n"); goto out; } -- 2.17.1
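The "default structure" idea can be shown in user space. The sketch allocates a zero-filled structure of the size the consumer expects and copies in only as many bytes as the caller supplied, so missing trailing fields keep their default (zero) value. `struct dbg_input` and `copy_input()` are hypothetical, `memcpy()` stands in for copy_from_user(), and clamping the copy to the structure size is a defensive assumption of this sketch that the hunk above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical input structure; newer fields are appended at the end. */
struct dbg_input {
	uint64_t start_addr;
	uint32_t sink_mode;
	uint32_t id;     /* a caller built against an older layout may omit this */
};

/*
 * Allocate a zeroed default structure and overlay the caller's bytes.
 * Never copies more than sizeof(struct dbg_input), so an oversized
 * user_size cannot overflow the allocation.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct dbg_input *copy_input(const void *user_buf, size_t user_size)
{
	struct dbg_input *in = calloc(1, sizeof(*in));
	size_t n = user_size < sizeof(*in) ? user_size : sizeof(*in);

	if (!in)
		return NULL;

	memcpy(in, user_buf, n);
	return in;
}
```

A short input leaves `id` at zero; a full-size input is copied unchanged.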
Re: [PATCH] iio: adc: sc27xx: Change to polling mode to read data
Hi Jonathan, On Mon, 5 Aug 2019 at 21:50, Jonathan Cameron wrote: > > On Mon, 29 Jul 2019 10:19:48 +0800 > Baolin Wang wrote: > > > Hi Jonathan, > > > > On Sun, 28 Jul 2019 at 01:27, Jonathan Cameron wrote: > > > > > > On Thu, 25 Jul 2019 14:33:50 +0800 > > > Baolin Wang wrote: > > > > > > > From: Freeman Liu > > > > > > > > On Spreadtrum platform, the headphone will read one ADC channel multiple > > > > times to identify the headphone type, and the headphone identification > > > > is > > > > sensitive of the ADC reading time. And we found it will take longer time > > > > to reading ADC data by using interrupt mode comparing with the polling > > > > mode, thus we should change to polling mode to improve the efficiency > > > > of reading data, which can identify the headphone type successfully. > > > > > > > > Signed-off-by: Freeman Liu > > > > Signed-off-by: Baolin Wang > > > > > > Hi, > > > > > > My concerns with this sort of approach is that we may be sacrificing power > > > efficiency for some usecases to support one demanding one. > > > > > > The maximum sleep time is 1 second (I think) which is probably too long > > > to poll a register for in general. > > > > 1 second is the timeout time, that means something wrong when reading > > the data taking 1 second, and we will poll the register status every > > 500 us. > > From the testing, polling mode takes less time than interrupt mode > > when reading ADC data multiple times, so polling mode did not > > sacrifice power > > efficiency. > > Hmm. I'll go with a probably on that, depends on interrupt response > latency etc so isn't entirely obvious. Faster response doesn't necessarily > mean lower power. > > > > > > Is there some way we can bound that time and perhaps switch between > > > interrupt and polling modes depending on how long we expect to wait? > > > > I do not think the interrupt mode is needed any more, since the ADC > > reading is so fast enough usually. Thanks. 
> The reason for interrupts in such devices is usually precisely the opposite. > > You do it because things are slow enough that you can go to sleep > for a long time before the interrupt occurs. > > So question becomes whether there are circumstances in which we are > running with long timescales and would benefit from using interrupts. From our testing, the ADC conversion time is usually about 100us, so polling every 50us gets the data faster in this case. With interrupt mode, getting the data takes on the order of milliseconds. That causes problems for time-sensitive scenarios such as headphone detection, which is the main reason we cannot use interrupt mode. For non-time-sensitive scenarios, yes, I agree with you that interrupt mode gives better power efficiency. But the ADC driver cannot know which scenario a consumer is in, so changing to polling mode seems the easiest way to solve the problem. We have carried this patch in our downstream kernel for a while and have not seen any other problems. Thanks for your comments. -- Baolin Wang Best Regards
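The polling scheme discussed in this thread is to check a status bit at a fixed interval and give up after a timeout. It can be sketched in portable C. In the kernel, helpers such as regmap_read_poll_timeout() do this for real. In the sketch, `struct fake_adc`, `read_status()` and the 50 us / 1 s values are assumptions for illustration, and time is modelled as a counter so the result is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated hardware: the conversion completes at ready_after_us. */
struct fake_adc {
	uint32_t now_us;          /* simulated clock */
	uint32_t ready_after_us;  /* when the "done" bit becomes set */
	uint16_t sample;
};

/* Hypothetical status read: true once the conversion has completed. */
static bool read_status(const struct fake_adc *adc)
{
	return adc->now_us >= adc->ready_after_us;
}

/*
 * Poll every poll_us until the done bit is set or timeout_us has elapsed.
 * Returns 0 and stores the sample, or -ETIMEDOUT.
 * A real driver would sleep (e.g. usleep_range()) between reads instead
 * of advancing a counter.
 */
static int adc_poll_read(struct fake_adc *adc, uint32_t poll_us,
			 uint32_t timeout_us, uint16_t *val,
			 uint32_t *elapsed_us)
{
	uint32_t waited = 0;

	for (;;) {
		if (read_status(adc)) {
			*val = adc->sample;
			*elapsed_us = waited;
			return 0;
		}
		if (waited >= timeout_us)
			return -ETIMEDOUT;
		adc->now_us += poll_us;   /* stands in for sleeping poll_us */
		waited += poll_us;
	}
}
```

With a simulated 100 us conversion and a 50 us poll interval, the read completes after two poll steps. The power-efficiency trade-off raised above is not modelled.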
[PATCH] watchdog: jz4740: Fix unused variable warning in jz4740_wdt_probe
Fix the following warning (Building: ci20_defconfig mips): drivers/watchdog/jz4740_wdt.c: In function ‘jz4740_wdt_probe’: drivers/watchdog/jz4740_wdt.c:165:6: warning: unused variable ‘ret’ [-Wunused-variable] int ret; ^~~ Fixes: 9ee644c9326c ("watchdog: jz4740_wdt: drop warning after registering device") Signed-off-by: Gustavo A. R. Silva --- drivers/watchdog/jz4740_wdt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/watchdog/jz4740_wdt.c b/drivers/watchdog/jz4740_wdt.c index d4a90916dd38..c6052ae54f32 100644 --- a/drivers/watchdog/jz4740_wdt.c +++ b/drivers/watchdog/jz4740_wdt.c @@ -162,7 +162,6 @@ static int jz4740_wdt_probe(struct platform_device *pdev) struct device *dev = &pdev->dev; struct jz4740_wdt_drvdata *drvdata; struct watchdog_device *jz4740_wdt; - int ret; drvdata = devm_kzalloc(dev, sizeof(struct jz4740_wdt_drvdata), GFP_KERNEL); -- 2.22.0
Re: [PATCH v3 4/4] serial: 8250: Don't check for mctrl_gpio_init() returning -ENOSYS
Hey Greg On Fri, Aug 02, 2019 at 02:26:23PM +0200, Greg Kroah-Hartman wrote: > On Fri, Aug 02, 2019 at 02:15:55PM +0200, Uwe Kleine-König wrote: > > On Fri, Aug 02, 2019 at 10:04:11AM +, Schrempf Frieder wrote: > > > From: Frieder Schrempf > > > > > > Now that the mctrl_gpio code returns NULL instead of ERR_PTR(-ENOSYS) > > > if CONFIG_GPIOLIB is disabled, we can safely remove this check. > > > > > > Signed-off-by: Frieder Schrempf > > > > Acked-by: Uwe Kleine-König > > > > @greg: This patch doesn't depend on patch 2; ditto for patch 3. So only > > taking patches 1, 3 and 4 should be fine. This way Frieder's v4 only > > have to care for patch 2. (Assuming noone objects to 1, 3 and 4 of > > course.) > > Sounds good, I'll do that, thanks. again you somehow managed to mangle my name :-| $ git log -3 8f0df898b27926e443d13770adfd828cc0f50148 | grep Uwe Acked-by: Uwe Kleine-Knig Acked-by: Uwe Kleine-Knig Reviewed-by: Uwe Kleine-Knig in all three instances the ö is missing. Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH v3 2/4] serial: mctrl_gpio: Add a NULL check to mctrl_gpio_to_gpiod()
Hello Frieder, On Mon, Aug 05, 2019 at 09:01:39AM +, Schrempf Frieder wrote: > On 02.08.19 14:12, Uwe Kleine-König wrote: > > On Fri, Aug 02, 2019 at 10:04:10AM +, Schrempf Frieder wrote: > >> From: Frieder Schrempf > >> > >> As it is allowed to use the mctrl_gpio_* functions before > >> initialization (as the 8250 driver does according to 434be0ae7aa7), > > > > Actually I was surprised some time ago that 8250 used serial_mctrl > > without first initializing it and expecting it to work. I didn't look in > > detail, but I wouldn't go so far to call this "allowed". The commit > > itself calls it "workaround" which seems a better match. > > Ok, but if this is considered to be a workaround and as the 8250 driver > does not use mctrl_gpio_to_gpiod(), we should maybe just drop this patch > instead of encouraging others to use mctrl_gpio before initialization. > > I'm really not sure what's best, so depending on what you will propose, > I'll send a new version of this patch with adjusted commit message or not. I wouldn't encourage using mctrl-gpio before it's initialized, so I suggest dropping this patch. Best regards Uwe
[PATCH] ia64:unwind: fix double free for mod->arch.init_unw_table
The function free_module() in kernel/module.c looks as follows: void free_module(struct module *mod) { .. module_arch_cleanup(mod); .. module_arch_freeing_init(mod); .. } Both module_arch_cleanup() and module_arch_freeing_init() free mod->arch.init_unw_table, which causes a double free. Fix this by setting mod->arch.init_unw_table = NULL after removing the unwind table (mod->arch.core_unw_table is cleared the same way). Signed-off-by: chenzefeng --- arch/ia64/kernel/module.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/ia64/kernel/module.c b/arch/ia64/kernel/module.c index 326448f..1a42ba8 100644 --- a/arch/ia64/kernel/module.c +++ b/arch/ia64/kernel/module.c @@ -914,10 +914,14 @@ struct plt_entry { void module_arch_cleanup (struct module *mod) { - if (mod->arch.init_unw_table) + if (mod->arch.init_unw_table) { unw_remove_unwind_table(mod->arch.init_unw_table); - if (mod->arch.core_unw_table) + mod->arch.init_unw_table = NULL; + } + if (mod->arch.core_unw_table) { unw_remove_unwind_table(mod->arch.core_unw_table); + mod->arch.core_unw_table = NULL; + } } void *dereference_module_function_descriptor(struct module *mod, void *ptr) -- 1.8.5.6
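The fix uses the common "free and clear" idiom. Once a cleanup path releases a pointer, it also clears it, so a later cleanup path that tests the pointer becomes a no-op. The minimal user-space model below makes these assumptions: `struct mod_arch`, `remove_table()` and the free counter are hypothetical stand-ins for the ia64 module structures and unw_remove_unwind_table(), and `arch_freeing_init()` models module_arch_freeing_init() as the commit message describes it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the ia64 per-module arch data. */
struct mod_arch {
	void *init_unw_table;
	void *core_unw_table;
};

static int frees;  /* counts releases, to show each table is freed once */

/* Stand-in for unw_remove_unwind_table(). */
static void remove_table(void *table)
{
	free(table);
	frees++;
}

/* Mirrors module_arch_cleanup() after the patch. */
static void arch_cleanup(struct mod_arch *a)
{
	if (a->init_unw_table) {
		remove_table(a->init_unw_table);
		a->init_unw_table = NULL;  /* makes the later free a no-op */
	}
	if (a->core_unw_table) {
		remove_table(a->core_unw_table);
		a->core_unw_table = NULL;
	}
}

/* Models the init-table release in module_arch_freeing_init(). */
static void arch_freeing_init(struct mod_arch *a)
{
	if (a->init_unw_table) {
		remove_table(a->init_unw_table);
		a->init_unw_table = NULL;
	}
}
```

Calling both paths in the order free_module() uses releases each table exactly once.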
[PATCH] i2c: designware: Fix unused variable warning in i2c_dw_init_recovery_info
Fix the following warning: drivers/i2c/busses/i2c-designware-master.c: In function ‘i2c_dw_init_recovery_info’: drivers/i2c/busses/i2c-designware-master.c:658:6: warning: unused variable ‘r’ [-Wunused-variable] int r; ^ Fixes: 33eb09a02e8d ("i2c: designware: make use of devm_gpiod_get_optional") Signed-off-by: Gustavo A. R. Silva --- drivers/i2c/busses/i2c-designware-master.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c index 867787dade43..e8b328242256 100644 --- a/drivers/i2c/busses/i2c-designware-master.c +++ b/drivers/i2c/busses/i2c-designware-master.c @@ -655,7 +655,6 @@ static int i2c_dw_init_recovery_info(struct dw_i2c_dev *dev) struct i2c_bus_recovery_info *rinfo = &dev->rinfo; struct i2c_adapter *adap = &dev->adapter; struct gpio_desc *gpio; - int r; gpio = devm_gpiod_get_optional(dev->dev, "scl", GPIOD_OUT_HIGH); if (IS_ERR_OR_NULL(gpio)) -- 2.22.0
Re: [PATCH v4 06/10] powerpc/fsl_booke/32: implement KASLR infrastructure
On 05/08/2019 at 08:43, Jason Yan wrote: This patch adds support for booting the kernel from places other than KERNELBASE. Since CONFIG_RELOCATABLE is already supported, all we need to do is map or copy the kernel to a proper place and relocate it. Freescale Book-E parts expect lowmem to be mapped by fixed TLB entries (TLB1). The TLB1 entries are not suitable for mapping the kernel directly in a randomized region, so we chose to copy the kernel to a proper place and restart to relocate. The kernel offset is not randomized yet (a fixed 64M offset is used); it will be randomized in the next patch. Signed-off-by: Jason Yan Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook Tested-by: Diana Craciun Reviewed-by: Christophe Leroy --- arch/powerpc/Kconfig | 11 +++ arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/early_32.c| 2 +- arch/powerpc/kernel/fsl_booke_entry_mapping.S | 17 ++-- arch/powerpc/kernel/head_fsl_booke.S | 13 ++- arch/powerpc/kernel/kaslr_booke.c | 84 +++ arch/powerpc/mm/mmu_decl.h| 6 ++ arch/powerpc/mm/nohash/fsl_booke.c| 7 +- 8 files changed, 126 insertions(+), 15 deletions(-) create mode 100644 arch/powerpc/kernel/kaslr_booke.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 77f6ebf97113..755378887912 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -548,6 +548,17 @@ config RELOCATABLE setting can still be useful to bootwrappers that need to know the load address of the kernel (eg. u-boot/mkimage). +config RANDOMIZE_BASE + bool "Randomize the address of the kernel image" + depends on (FSL_BOOKE && FLATMEM && PPC32) + select RELOCATABLE + help + Randomizes the virtual address at which the kernel image is + loaded, as a security feature that deters exploit attempts + relying on knowledge of the location of kernel internals. + + If unsure, say N.
+ config RELOCATABLE_TEST bool "Test relocatable kernel" depends on (PPC64 && RELOCATABLE) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index ea0c69236789..32f6c5b99307 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -106,6 +106,7 @@ extra-$(CONFIG_PPC_8xx) := head_8xx.o extra-y += vmlinux.lds obj-$(CONFIG_RELOCATABLE) += reloc_$(BITS).o +obj-$(CONFIG_RANDOMIZE_BASE) += kaslr_booke.o obj-$(CONFIG_PPC32) += entry_32.o setup_32.o early_32.o obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o diff --git a/arch/powerpc/kernel/early_32.c b/arch/powerpc/kernel/early_32.c index 3482118ffe76..fe8347cdc07d 100644 --- a/arch/powerpc/kernel/early_32.c +++ b/arch/powerpc/kernel/early_32.c @@ -32,5 +32,5 @@ notrace unsigned long __init early_init(unsigned long dt_ptr) apply_feature_fixups(); - return KERNELBASE + offset; + return kimage_vaddr + offset; } diff --git a/arch/powerpc/kernel/fsl_booke_entry_mapping.S b/arch/powerpc/kernel/fsl_booke_entry_mapping.S index de0980945510..de7ee682bb4a 100644 --- a/arch/powerpc/kernel/fsl_booke_entry_mapping.S +++ b/arch/powerpc/kernel/fsl_booke_entry_mapping.S @@ -155,23 +155,22 @@ skpinv: addir6,r6,1 /* Increment */ #if defined(ENTRY_MAPPING_BOOT_SETUP) -/* 6. Setup KERNELBASE mapping in TLB1[0] */ +/* 6. Setup kimage_vaddr mapping in TLB1[0] */ lis r6,0x1000 /* Set MAS0(TLBSEL) = TLB1(1), ESEL = 0 */ mtspr SPRN_MAS0,r6 lis r6,(MAS1_VALID|MAS1_IPROT)@h ori r6,r6,(MAS1_TSIZE(BOOK3E_PAGESZ_64M))@l mtspr SPRN_MAS1,r6 - lis r6,MAS2_VAL(PAGE_OFFSET, BOOK3E_PAGESZ_64M, M_IF_NEEDED)@h - ori r6,r6,MAS2_VAL(PAGE_OFFSET, BOOK3E_PAGESZ_64M, M_IF_NEEDED)@l - mtspr SPRN_MAS2,r6 + lis r6,MAS2_EPN_MASK(BOOK3E_PAGESZ_64M)@h + ori r6,r6,MAS2_EPN_MASK(BOOK3E_PAGESZ_64M)@l + and r6,r6,r20 + ori r6,r6,M_IF_NEEDED@l + mtspr SPRN_MAS2,r6 mtspr SPRN_MAS3,r8 tlbwe -/* 7. 
Jump to KERNELBASE mapping */ - lis r6,(KERNELBASE & ~0xfff)@h - ori r6,r6,(KERNELBASE & ~0xfff)@l - rlwinm r7,r25,0,0x03ff - add r6,r7,r6 +/* 7. Jump to kimage_vaddr mapping */ + mr r6,r20 #elif defined(ENTRY_MAPPING_KEXEC_SETUP) /* diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S index 2083382dd662..aa55832e7506 100644 --- a/arch/powerpc/kernel/head_fsl_booke.S +++ b/arch/powerpc/kernel/head_fsl_booke.S @@ -155,6 +155,8 @@ _ENTRY(_start); */ _ENTRY(__early_start) + LOAD_REG_ADDR_PIC(r20, kimage_vaddr) + lwz r20,0(r20) #define ENTRY_MAPPING_B
[PATCH net v2] net: dsa: Check existence of .port_mdb_add callback before calling it
From: Chen-Yu Tsai With the recent addition of commit 75dad2520fc3 ("net: dsa: b53: Disable all ports on setup"), users of b53 (BCM53125 on Lamobo R1 in my case) are forced to use the dsa subsystem to enable the switch, instead of having it in the default transparent "forward-to-all" mode. The b53 driver does not support mdb bitmap functions. However the dsa layer does not check for the existence of the .port_mdb_add callback before actually using it. This results in a NULL pointer dereference, as shown in the kernel oops below. The other functions seem to be properly guarded. Do the same for .port_mdb_add in dsa_switch_mdb_add_bitmap() as well. b53 is not the only driver that doesn't support mdb bitmap functions. Others include bcm_sf2, dsa_loop, lantiq_gswip, mt7530, mv88e6060, qca8k, realtek-smi, and vitesse-vsc73xx. 8<--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address pgd = (ptrval) [] *pgd= Internal error: Oops: 8005 [#1] SMP ARM Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211 CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1 Hardware name: Allwinner sun7i (A20) Family Workqueue: events switchdev_deferred_process_work PC is at 0x0 LR is at dsa_switch_event+0x570/0x620 pc : [<>]lr : []psr: 80070013 sp : ee871db8 ip : fp : ee98d0a4 r10: 000c r9 : 0008 r8 : ee89f710 r7 : ee98d040 r6 : ee98d088 r5 : c0f04c48 r4 : ee98d04c r3 : r2 : ee89f710 r1 : 0008 r0 : ee98d040 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 6deb406a DAC: 0051 Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval)) Stack: (0xee871db8 to 0xee872000) 1da0: ee871e14 103ace2d 1dc0: ee871e14 0005 c08524a0 1de0: e000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0 1e00: b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 1e20: 0008 103ace2d c087e248 ee29c868 103ace2d 0001 1e40: ee871e98 0006 c0fb2a50 c087e2d0 c08523c4 1e60: c014bdfc 0006 
c0fad2d0 ee871e98 ee89f710 c014c500 1e80: ee89f3c0 c0f04c48 ee9e5000 c087dfb4 ee9e5000 1ea0: ee89f710 ee871ecb 0001 103ace2d c0f04c48 c087e0a8 1ec0: efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 0122 1ee0: 0100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec 1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 c087def8 c0fad2ec c01447dc 1f20: ef315640 ef7a62c0 0008 ee839580 ee839594 ef7a62c0 0008 c0f03d00 1f40: ef7a62d8 ef7a62c0 e000 c0145b84 e000 c0fb2420 c0bfaa8c 1f60: e000 ee84b600 ee84b5c0 ee87 ee839580 c0145b40 ef0e5ea4 1f80: ee84b61c c014a6f8 0001 ee84b5c0 c014a5b0 1fa0: c01010e8 1fc0: 1fe0: 0013 [] (dsa_switch_event) from [] (notifier_call_chain+0x48/0x84) [] (notifier_call_chain) from [] (raw_notifier_call_chain+0x18/0x20) [] (raw_notifier_call_chain) from [] (dsa_port_mdb_add+0x48/0x74) [] (dsa_port_mdb_add) from [] (__switchdev_handle_port_obj_add+0x54/0xd4) [] (__switchdev_handle_port_obj_add) from [] (switchdev_handle_port_obj_add+0x8/0x14) [] (switchdev_handle_port_obj_add) from [] (dsa_slave_switchdev_blocking_event+0x94/0xa4) [] (dsa_slave_switchdev_blocking_event) from [] (notifier_call_chain+0x48/0x84) [] (notifier_call_chain) from [] (blocking_notifier_call_chain+0x50/0x68) [] (blocking_notifier_call_chain) from [] (switchdev_port_obj_notify+0x44/0xa8) [] (switchdev_port_obj_notify) from [] (switchdev_port_obj_add_now+0x90/0x104) [] (switchdev_port_obj_add_now) from [] (switchdev_port_obj_add_deferred+0x14/0x5c) [] (switchdev_port_obj_add_deferred) from [] (switchdev_deferred_process+0x64/0x104) [] (switchdev_deferred_process) from [] (switchdev_deferred_process_work+0xc/0x14) [] (switchdev_deferred_process_work) from [] (process_one_work+0x218/0x50c) [] (process_one_work) from [] (worker_thread+0x44/0x5bc) [] (worker_thread) from [] (kthread+0x148/0x150) [] (kthread) from [] (ret_from_fork+0x14/0x2c) Exception stack(0xee871fb0 to 0xee871ff8) 1fa0: 1fc0: 1fe0: 00
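The fix treats .port_mdb_add as an optional operation and checks the pointer before calling it, as the other DSA operations already do. The sketch below models this with a hypothetical ops structure. This excerpt does not show whether the real dsa_switch_mdb_add_bitmap() returns early silently or reports an error, so returning -EOPNOTSUPP here is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct switch_dev;

/* Hypothetical subset of a driver ops table; callbacks may be NULL. */
struct switch_ops {
	int (*port_mdb_add)(struct switch_dev *ds, int port);
};

struct switch_dev {
	const struct switch_ops *ops;
	int mdb_entries;
};

/* Guarded call: a driver without MDB support no longer jumps to address 0. */
static int switch_mdb_add(struct switch_dev *ds, int port)
{
	if (!ds->ops->port_mdb_add)
		return -EOPNOTSUPP;

	return ds->ops->port_mdb_add(ds, port);
}

/* Example driver callback that just counts entries. */
static int demo_mdb_add(struct switch_dev *ds, int port)
{
	(void)port;
	ds->mdb_entries++;
	return 0;
}
```

Without the guard, calling through a NULL `port_mdb_add` produces exactly the "PC is at 0x0" oops shown above.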
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote: > When a user process exits, the kernel cleans up the mm_struct of the user > process and during cleanup, check_mm() checks the page tables of the user > process for corruption (E.g: unexpected page flags set/cleared). For > corrupted page tables, the error message printed by check_mm() isn't very > clear as it prints the loop index instead of page table type (E.g: Resident > file mapping pages vs Resident shared memory pages). The loop index in > check_mm() is used to index rss_stat[] which represents individual memory > type stats. Hence, instead of printing index, print memory type, thereby > improving error message. > > Without patch: > -- > [ 204.836425] mm/pgtable-generic.c:29: bad p4d > 89eb4e92(80025f941467) > [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 > [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 > [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > With patch: > --- > [ 69.815453] mm/pgtable-generic.c:29: bad p4d > 84653642(80025ca37467) > [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 > type:MM_FILEPAGES val:2 > [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 > type:MM_ANONPAGES val:5 > [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 > > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so > that it matches the other print statement. 
> > Cc: Ingo Molnar > Cc: Vlastimil Babka > Cc: Peter Zijlstra > Cc: Andrew Morton > Cc: Anshuman Khandual > Acked-by: Dave Hansen > Suggested-by: Dave Hansen > Signed-off-by: Sai Praneeth Prakhya Acked-by: Vlastimil Babka I would also add something like this to reduce risk of breaking it in the future: 8< diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index d7016dcb245e..a6f83cbe4603 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -36,6 +36,9 @@ struct vmacache { struct vm_area_struct *vmas[VMACACHE_SIZE]; }; +/* + * When touching this, update also resident_page_types in kernel/fork.c + */ enum { MM_FILEPAGES, /* Resident file mapping pages */ MM_ANONPAGES, /* Resident anonymous pages */
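Vlastimil's concern is that the name table in kernel/fork.c and the MM_* enum can drift apart. Besides the proposed comment, a build-time assertion can catch part of the problem. The sketch below mirrors the MM_* counters printed in the patch output. The table name `resident_page_types` comes from the comment above. The `NAMED_INDEX()` macro, the `report_bad_rss()` helper and the use of C11 `_Static_assert` (the kernel has BUILD_BUG_ON() and static_assert()) are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the rss counter enum in include/linux/mm_types_task.h. */
enum {
	MM_FILEPAGES,   /* Resident file mapping pages */
	MM_ANONPAGES,   /* Resident anonymous pages */
	MM_SWAPENTS,    /* Anonymous swap entries */
	MM_SHMEMPAGES,  /* Resident shared memory pages */
	NR_MM_COUNTERS
};

/* Designated initializers keep each name next to its enum value. */
#define NAMED_INDEX(x) [x] = #x

static const char *const resident_page_types[] = {
	NAMED_INDEX(MM_FILEPAGES),
	NAMED_INDEX(MM_ANONPAGES),
	NAMED_INDEX(MM_SWAPENTS),
	NAMED_INDEX(MM_SHMEMPAGES),
};

/* Fails to compile if the table is shorter than the enum. */
_Static_assert(sizeof(resident_page_types) / sizeof(resident_page_types[0])
	       == NR_MM_COUNTERS, "resident_page_types out of sync");

/* Print one line per non-zero counter, in the style of check_mm(). */
static int report_bad_rss(const long rss[NR_MM_COUNTERS])
{
	int bad = 0;

	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (rss[i]) {
			printf("BUG: Bad rss-counter state type:%s val:%ld\n",
			       resident_page_types[i], rss[i]);
			bad++;
		}
	}
	return bad;
}
```

The assertion catches a new last counter added without a name. A counter inserted in the middle would instead leave a NULL entry, which is why the comment in mm_types_task.h is still useful.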
Re: Bisected: Kernel 4.14 + has 3 times higher write IO latency than Kernel 4.4 with raid1
On Tue, Aug 6, 2019 at 1:46 AM NeilBrown wrote: > > On Mon, Aug 05 2019, Jinpu Wang wrote: > > > Hi Neil, > > > > For the md higher write IO latency problem, I bisected it to these commits: > > > > 4ad23a97 MD: use per-cpu counter for writes_pending > > 210f7cd percpu-refcount: support synchronous switch to atomic mode. > > > > Do you maybe have an idea? How can we fix it? > > Hmmm not sure. Hi Neil, Thanks for reply, detailed result in line. > > My guess is that the set_in_sync() call from md_check_recovery() > is taking a long time, and is being called too often. > > Could you try two experiments please. > Baseline on 5.3-rc3: root@ib2:/home/jwang# cat md_lat_ib2_5.3.0-rc3-1-storage_2019_0806_092003.log write-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.2.10 Starting 1 process write-test: (groupid=0, jobs=1): err= 0: pid=2621: Tue Aug 6 09:20:44 2019 write: io=84KB, bw=2KB/s, iops=4999, runt= 40001msec slat (usec): min=2, max=69992, avg= 5.37, stdev=374.95 clat (usec): min=0, max=147, avg= 2.42, stdev=13.57 lat (usec): min=2, max=70079, avg= 7.84, stdev=376.07 clat percentiles (usec): | 1.00th=[0], 5.00th=[0], 10.00th=[0], 20.00th=[1], | 30.00th=[1], 40.00th=[1], 50.00th=[1], 60.00th=[1], | 70.00th=[1], 80.00th=[1], 90.00th=[1], 95.00th=[1], | 99.00th=[ 96], 99.50th=[ 125], 99.90th=[ 137], 99.95th=[ 139], | 99.99th=[ 141] bw (KB /s): min=18454, max=21608, per=100.00%, avg=20005.15, stdev=352.24 lat (usec) : 2=98.52%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.06% lat (usec) : 100=0.46%, 250=0.94% cpu : usr=4.64%, sys=0.00%, ctx=197118, majf=0, minf=11 IO depths: 1=98.5%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=1.3%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=21/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): WRITE: io=84KB, 
aggrb=1KB/s, minb=1KB/s, maxb=1KB/s, mint=40001msec, maxt=40001msec Disk stats (read/write): md0: ios=60/199436, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ram1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% > 1/ set /sys/block/md0/md/safe_mode_delay > to 20 or more. It defaults to about 0.2. Setting only safe_mode_delay to 20 gives a nice improvement: root@ib2:/home/jwang# cat md_lat_ib2_5.3.0-rc3-1-storage_2019_0806_092144.log write-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.2.10 Starting 1 process write-test: (groupid=0, jobs=1): err= 0: pid=2676: Tue Aug 6 09:22:25 2019 write: io=84KB, bw=2KB/s, iops=4999, runt= 40001msec slat (usec): min=2, max=99490, avg= 2.98, stdev=222.46 clat (usec): min=0, max=103, avg= 0.96, stdev= 4.51 lat (usec): min=2, max=99581, avg= 3.99, stdev=222.71 clat percentiles (usec): | 1.00th=[0], 5.00th=[0], 10.00th=[0], 20.00th=[0], | 30.00th=[1], 40.00th=[1], 50.00th=[1], 60.00th=[1], | 70.00th=[1], 80.00th=[1], 90.00th=[1], 95.00th=[1], | 99.00th=[1], 99.50th=[1], 99.90th=[ 90], 99.95th=[ 91], | 99.99th=[ 95] bw (KB /s): min=2, max=20008, per=100.00%, avg=20001.82, stdev= 3.38 lat (usec) : 2=99.72%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01% lat (usec) : 100=0.25%, 250=0.01% cpu : usr=3.17%, sys=1.48%, ctx=199470, majf=0, minf=11 IO depths: 1=99.7%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=21/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): WRITE: io=84KB, aggrb=1KB/s, minb=1KB/s, maxb=1KB/s, mint=40001msec, maxt=40001msec Disk stats (read/write): md0: ios=60/199461, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% ram1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00% > > 2/ comment out the call the set_in_sync() in md_check_recovery(). Commenting out only the set_in_sync() call gives an even better improvement: root@ib2:/home/jwang# cat md_lat_ib2_5.3.0-rc3-1-storage+_2019_0806_093340.log write-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.2.10 Starting 1 process write-test: (groupid=0, jobs=1): err= 0: pid=2626: Tue Aug 6 09:34
[PATCH] dt-bindings: arm: amlogic: fix x96-max/sei510 section in amlogic.yaml
From: Christian Hewitt Move amediatech,x96-max and seirobotics,sei510 to the S905D2 section and update the S905D2 description to S905D2/X2/Y2. Signed-off-by: Christian Hewitt Signed-off-by: Neil Armstrong --- Documentation/devicetree/bindings/arm/amlogic.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/amlogic.yaml b/Documentation/devicetree/bindings/arm/amlogic.yaml index 325c6fd3566d..4668064dd7e5 100644 --- a/Documentation/devicetree/bindings/arm/amlogic.yaml +++ b/Documentation/devicetree/bindings/arm/amlogic.yaml @@ -91,13 +91,11 @@ properties: - description: Boards with the Amlogic Meson GXL S905X SoC items: - enum: - - amediatech,x96-max - amlogic,p212 - hwacom,amazetv - khadas,vim - libretech,cc - nexbox,a95x - - seirobotics,sei510 - const: amlogic,s905x - const: amlogic,meson-gxl @@ -129,10 +127,12 @@ properties: - const: amlogic,a113d - const: amlogic,meson-axg - - description: Boards with the Amlogic Meson G12A S905D2 SoC + - description: Boards with the Amlogic Meson G12A S905D2/X2/Y2 SoC items: - enum: + - amediatech,x96-max - amlogic,u200 + - seirobotics,sei510 - const: amlogic,g12a - description: Boards with the Amlogic Meson G12B S922X SoC -- 2.22.0
Re: [PATCH v4 07/10] powerpc/fsl_booke/32: randomize the kernel image offset
On 05/08/2019 at 08:43, Jason Yan wrote: Now that we have basic support for relocating the kernel to an appropriate place, we can start randomizing the offset. Entropy is derived from the banner and timer, which change on every build and boot. This is not very safe on its own, so the bootloader may additionally pass entropy via the /chosen/kaslr-seed node in the device tree. We will use the first 512M of low memory to randomize the kernel image. The memory will be split into 64M zones. We will use the lower 8 bits of the entropy to choose the index of the 64M zone, then choose a 16K-aligned offset inside that zone to place the kernel in. KERNELBASE |--> 64M <--| | | +---+++---+ | |||kernel|| | +---+++---+ | | |-> offset<-| kimage_vaddr We also check whether we would overlap with areas such as the dtb, the initrd or the crashkernel area. If we cannot find a suitable area, KASLR is disabled and the kernel boots from its original location. Signed-off-by: Jason Yan Cc: Diana Craciun Cc: Michael Ellerman Cc: Christophe Leroy Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nicholas Piggin Cc: Kees Cook Reviewed-by: Diana Craciun Tested-by: Diana Craciun Reviewed-by: Christophe Leroy One small comment below --- arch/powerpc/kernel/kaslr_booke.c | 322 +- 1 file changed, 320 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/kaslr_booke.c b/arch/powerpc/kernel/kaslr_booke.c index 30f84c0321b2..97250cad71de 100644 --- a/arch/powerpc/kernel/kaslr_booke.c +++ b/arch/powerpc/kernel/kaslr_booke.c @@ -23,6 +23,8 @@ #include #include #include +#include +#include #include #include #include @@ -34,15 +36,329 @@ #include #include #include +#include #include +#include +#include + +#ifdef DEBUG +#define DBG(fmt...) printk(KERN_ERR fmt) +#else +#define DBG(fmt...)
+#endif + +struct regions { + unsigned long pa_start; + unsigned long pa_end; + unsigned long kernel_size; + unsigned long dtb_start; + unsigned long dtb_end; + unsigned long initrd_start; + unsigned long initrd_end; + unsigned long crash_start; + unsigned long crash_end; + int reserved_mem; + int reserved_mem_addr_cells; + int reserved_mem_size_cells; +}; extern int is_second_reloc; +/* Simplified build-specific string for starting entropy. */ +static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@" + LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION; + +static __init void kaslr_get_cmdline(void *fdt) +{ + int node = fdt_path_offset(fdt, "/chosen"); + + early_init_dt_scan_chosen(node, "chosen", 1, boot_command_line); +} + +static unsigned long __init rotate_xor(unsigned long hash, const void *area, + size_t size) +{ + size_t i; + unsigned long *ptr = (unsigned long *)area; As area is a void *, this cast shouldn't be necessary. Or maybe it is necessary because it discards the const ? Christophe + + for (i = 0; i < size / sizeof(hash); i++) { + /* Rotate by odd number of bits and XOR. */ + hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7); + hash ^= ptr[i]; + } + + return hash; +} + +/* Attempt to create a simple but unpredictable starting entropy. 
*/ +static unsigned long __init get_boot_seed(void *fdt) +{ + unsigned long hash = 0; + + hash = rotate_xor(hash, build_str, sizeof(build_str)); + hash = rotate_xor(hash, fdt, fdt_totalsize(fdt)); + + return hash; +} + +static __init u64 get_kaslr_seed(void *fdt) +{ + int node, len; + fdt64_t *prop; + u64 ret; + + node = fdt_path_offset(fdt, "/chosen"); + if (node < 0) + return 0; + + prop = fdt_getprop_w(fdt, node, "kaslr-seed", &len); + if (!prop || len != sizeof(u64)) + return 0; + + ret = fdt64_to_cpu(*prop); + *prop = 0; + return ret; +} + +static __init bool regions_overlap(u32 s1, u32 e1, u32 s2, u32 e2) +{ + return e1 >= s2 && e2 >= s1; +} + +static __init bool overlaps_reserved_region(const void *fdt, u32 start, + u32 end, struct regions *regions) +{ + int subnode, len, i; + u64 base, size; + + /* check for overlap with /memreserve/ entries */ + for (i = 0; i < fdt_num_mem_rsv(fdt); i++) { + if (fdt_get_mem_rsv(fdt, i, &base, &size) < 0) + continue; + if (regions_overlap(start, end, base, base + size)) + return true; + } + + if (
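The entropy mixing and zone selection described in this patch can be tried in user space. On Christophe's question: converting `const void *` to `unsigned long *` needs a cast only because it discards `const`. Declaring the local pointer as `const`-qualified removes the need for any cast. The `rotate_xor()` below follows the patch's rotate-right-by-7-then-XOR loop, but loads each word with `memcpy()`, which avoids the cast as well as unaligned-access and aliasing concerns in portable C. `choose_offset()` is a hypothetical reading of the commit message (eight 64M zones in the first 512M, zone index from the low 8 bits, 16K-aligned offset from the remaining bits), not the code in kaslr_booke.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SZ_16K  0x4000UL
#define SZ_64M  0x4000000UL
#define SZ_512M 0x20000000UL

/* Mix a buffer into hash: rotate right by 7 (odd) bits, then XOR a word. */
static unsigned long rotate_xor(unsigned long hash, const void *area,
				size_t size)
{
	const unsigned char *p = area;  /* const preserved, no cast needed */
	size_t i;

	for (i = 0; i < size / sizeof(hash); i++) {
		unsigned long word;

		memcpy(&word, p + i * sizeof(word), sizeof(word));
		hash = (hash << ((sizeof(hash) * 8) - 7)) | (hash >> 7);
		hash ^= word;
	}
	return hash;
}

/*
 * Hypothetical zone/offset selection: zone index from the low 8 bits of
 * the seed, a 16K-aligned offset inside the zone from the remaining bits.
 * The kernel must fit entirely inside the chosen 64M zone; returns 0
 * (no randomization) if it cannot.
 */
static unsigned long choose_offset(unsigned long seed,
				   unsigned long kernel_size)
{
	unsigned long zones = SZ_512M / SZ_64M;  /* 8 zones */
	unsigned long index, room, in_zone;

	if (kernel_size >= SZ_64M)
		return 0;

	index = (seed & 0xff) % zones;
	room = SZ_64M - kernel_size;
	in_zone = ((seed >> 8) % room) & ~(SZ_16K - 1);

	return index * SZ_64M + in_zone;
}
```

The real patch additionally rejects candidates that overlap the dtb, initrd, crashkernel or reserved-memory regions; that step is omitted here.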
Re: [PATCH 1/2] HID: hiddev: avoid opening a disconnected device
On Tue, 6 Aug 2019, Hillf Danton wrote: > In order to avoid opening a disconnected device, we need to check exist > again after acquiring the existance lock, and bail out if necessary. > > Cc: Andrey Konovalov > Signed-off-by: Hillf Danton Could you please add proper Reported-by: reference to syzbot? (in 2/2 as well). Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH] ia64:unwind: fix double free for mod->arch.init_unw_table
On Tue, Aug 06, 2019 at 03:46:33PM +0800, chenzefeng wrote: > The function free_module in file kernel/module.c as follow: > > void free_module(struct module *mod) { > .. > module_arch_cleanup(mod); > .. > module_arch_freeing_init(mod); > .. > } > > Both module_arch_cleanup and module_arch_freeing_init function > would free the mod->arch.init_unw_table, which cause double free. > > Here, set mod->arch.init_unw_table = NULL after remove the unwind > table to avoid double free. > > Signed-off-by: chenzefeng > --- > arch/ia64/kernel/module.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
Re: [PATCH v4 09/10] powerpc/fsl_booke/kaslr: support nokaslr cmdline parameter
On 05/08/2019 at 08:43, Jason Yan wrote:
> One may want to disable kaslr at boot, so provide a cmdline parameter
> 'nokaslr' to support this.
>
> Signed-off-by: Jason Yan
> Cc: Diana Craciun
> Cc: Michael Ellerman
> Cc: Christophe Leroy
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Cc: Nicholas Piggin
> Cc: Kees Cook
> Reviewed-by: Diana Craciun
> Tested-by: Diana Craciun

Reviewed-by: Christophe Leroy

Tiny comment below.

> ---
>  arch/powerpc/kernel/kaslr_booke.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/powerpc/kernel/kaslr_booke.c b/arch/powerpc/kernel/kaslr_booke.c
> index 4b3f19a663fc..7c3cb41e7122 100644
> --- a/arch/powerpc/kernel/kaslr_booke.c
> +++ b/arch/powerpc/kernel/kaslr_booke.c
> @@ -361,6 +361,18 @@ static unsigned long __init kaslr_choose_location(void *dt_ptr, phys_addr_t size
>  	return kaslr_offset;
>  }
>
> +static inline __init bool kaslr_disabled(void)
> +{
> +	char *str;
> +
> +	str = strstr(boot_command_line, "nokaslr");
> +	if ((str == boot_command_line) ||
> +	    (str > boot_command_line && *(str - 1) == ' '))
> +		return true;

I don't think the additional () are needed for the left part 'str == boot_command_line'.

> +
> +	return false;
> +}
> +
>  /*
>   * To see if we need to relocate the kernel to a random offset
>   * void *dt_ptr - address of the device tree
> @@ -376,6 +388,8 @@ notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
>  	kernel_sz = (unsigned long)_end - KERNELBASE;
>
>  	kaslr_get_cmdline(dt_ptr);
> +	if (kaslr_disabled())
> +		return;
>
>  	offset = kaslr_choose_location(dt_ptr, size, kernel_sz);
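The kaslr_disabled() check above only inspects the character before the match. As a result, a longer option that starts with "nokaslr" would also match. Because strstr() returns only the first occurrence, a real "nokaslr" that follows an option merely containing the string (for example "xnokaslr nokaslr") is also missed. The sketch below shows a stricter whole-word match in userspace C. cmdline_has_token() is a hypothetical helper, not a kernel function, and it assumes space-separated options without quoting.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `token` occurs in `cmdline` as a whole,
 * space-delimited word. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
	size_t len = strlen(token);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, token)) != NULL) {
		bool start_ok = (p == cmdline) || (p[-1] == ' ');
		bool end_ok = (p[len] == '\0') || (p[len] == ' ');

		if (start_ok && end_ok)
			return true;
		p++;	/* partial match such as "xnokaslr": keep scanning */
	}
	return false;
}
```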
[PATCH 1/3] mm/migrate: clean up useless code in migrate_vma_collect_pmd()
Signed-off-by: Pingfan Liu
Cc: "Jérôme Glisse"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Jan Kara
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
To: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/migrate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8992741..c2ec614 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2230,7 +2230,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		if (pte_none(pte)) {
 			mpfn = MIGRATE_PFN_MIGRATE;
 			migrate->cpages++;
-			pfn = 0;
 			goto next;
 		}
 
@@ -2255,7 +2254,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (is_zero_pfn(pfn)) {
 				mpfn = MIGRATE_PFN_MIGRATE;
 				migrate->cpages++;
-				pfn = 0;
 				goto next;
 			}
 			page = vm_normal_page(migrate->vma, addr, pte);
@@ -2265,10 +2263,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 
 		/* FIXME support THP */
 		if (!page || !page->mapping || PageTransCompound(page)) {
-			mpfn = pfn = 0;
+			mpfn = 0;
 			goto next;
 		}
-		pfn = page_to_pfn(page);
 
 		/*
 		 * By getting a reference on the page we pin it and that blocks
-- 
2.7.5
[PATCH 3/3] mm/migrate: remove the duplicated code migrate_vma_collect_hole()
After the previous patch, which treats a hole as an invalid source,
migrate_vma_collect_hole() has the same code as migrate_vma_collect_skip().
Remove the duplicated code.

Signed-off-by: Pingfan Liu
Cc: "Jérôme Glisse"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Jan Kara
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
To: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/migrate.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 832483f..95e038d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2128,22 +2128,6 @@ struct migrate_vma {
 	unsigned long end;
 };
 
-static int migrate_vma_collect_hole(unsigned long start,
-				    unsigned long end,
-				    struct mm_walk *walk)
-{
-	struct migrate_vma *migrate = walk->private;
-	unsigned long addr;
-
-	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
-		migrate->src[migrate->npages] = 0;
-		migrate->dst[migrate->npages] = 0;
-		migrate->npages++;
-	}
-
-	return 0;
-}
-
 static int migrate_vma_collect_skip(unsigned long start,
 				    unsigned long end,
 				    struct mm_walk *walk)
@@ -2173,7 +2157,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 
 again:
 	if (pmd_none(*pmdp))
-		return migrate_vma_collect_hole(start, end, walk);
+		return migrate_vma_collect_skip(start, end, walk);
 
 	if (pmd_trans_huge(*pmdp)) {
 		struct page *page;
@@ -2206,7 +2190,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				return migrate_vma_collect_skip(start, end,
 								walk);
 			if (pmd_none(*pmdp))
-				return migrate_vma_collect_hole(start, end,
+				return migrate_vma_collect_skip(start, end,
 								walk);
 		}
 	}
@@ -2337,7 +2321,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 
 	mm_walk.pmd_entry = migrate_vma_collect_pmd;
 	mm_walk.pte_entry = NULL;
-	mm_walk.pte_hole = migrate_vma_collect_hole;
+	mm_walk.pte_hole = migrate_vma_collect_skip;
 	mm_walk.hugetlb_entry = NULL;
 	mm_walk.test_walk = NULL;
 	mm_walk.vma = migrate->vma;
-- 
2.7.5
[PATCH 2/3] mm/migrate: see hole as invalid source page
MIGRATE_PFN_MIGRATE marks a pfn that is valid and, furthermore, suitable
for migration. A hole has no valid pfn, let alone one that can be migrated.

Even before this patch, holes were already filtered out by the following
code, so it is more reasonable to treat a hole as an invalid source page.

migrate_vma_prepare()
{
	struct page *page = migrate_pfn_to_page(migrate->src[i]);

	if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
	     \_ this condition
}

Signed-off-by: Pingfan Liu
Cc: "Jérôme Glisse"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Jan Kara
Cc: "Kirill A. Shutemov"
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
To: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/migrate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c2ec614..832483f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2136,10 +2136,9 @@ static int migrate_vma_collect_hole(unsigned long start,
 	unsigned long addr;
 
 	for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) {
-		migrate->src[migrate->npages] = MIGRATE_PFN_MIGRATE;
+		migrate->src[migrate->npages] = 0;
 		migrate->dst[migrate->npages] = 0;
 		migrate->npages++;
-		migrate->cpages++;
 	}
 
 	return 0;
@@ -2228,8 +2227,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		pfn = pte_pfn(pte);
 
 		if (pte_none(pte)) {
-			mpfn = MIGRATE_PFN_MIGRATE;
-			migrate->cpages++;
+			mpfn = 0;
 			goto next;
 		}
 
-- 
2.7.5
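The src[] entries discussed above pack a pfn together with flag bits. The sketch below models that encoding so the two hole encodings can be compared directly: the old one (MIGRATE_PFN_MIGRATE alone) and the new one (0). The flag values and shift are believed to match include/linux/migrate.h from that period, but treat them as an assumption. entry_has_page() is a hypothetical stand-in for the NULL check that migrate_pfn_to_page() performs. The sketch only shows what each encoding says. It takes no position on whether a hole should keep the MIGRATE flag, which is exactly what the patch changes.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/migrate.h of that era. */
#define MIGRATE_PFN_VALID   (1UL << 0)
#define MIGRATE_PFN_MIGRATE (1UL << 1)
#define MIGRATE_PFN_SHIFT   6

/* Encode a real pfn the way migrate_pfn() is understood to do it. */
static unsigned long encode_pfn(unsigned long pfn)
{
	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
}

/* Without the VALID bit there is no backing page (NULL in the kernel). */
static bool entry_has_page(unsigned long entry)
{
	return entry & MIGRATE_PFN_VALID;
}

static bool entry_wants_migration(unsigned long entry)
{
	return entry_has_page(entry) && (entry & MIGRATE_PFN_MIGRATE);
}
```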
Re: [PATCH V2 05/10] ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier
On Tue, Aug 6, 2019 at 6:39 AM Viresh Kumar wrote:
>
> On 05-08-19, 11:42, Rafael J. Wysocki wrote:
> > On Tuesday, July 23, 2019 8:14:05 AM CEST Viresh Kumar wrote:
> > > @@ -310,8 +339,11 @@ static int __init acpi_processor_driver_init(void)
> > >  	cpuhp_setup_state_nocalls(CPUHP_ACPI_CPUDRV_DEAD, "acpi/cpu-drv:dead",
> > >  				  NULL, acpi_soft_cpu_dead);
> > >
> > > -	acpi_thermal_cpufreq_init();
> > > -	acpi_processor_ppc_init();
> > > +	if (!cpufreq_register_notifier(&acpi_processor_notifier_block,
> > > +				       CPUFREQ_POLICY_NOTIFIER)) {
> > > +		acpi_processor_cpufreq_init = true;
> >
> > Can't that be set/cleared by acpi_processor_notifier() itself?
>
> This is required to be done only once at initialization, and setting it
> to true again and again on every invocation of the notifier callback
> doesn't look right.
>
> I have updated the patch based on the rest of your suggestions, please see
> if it looks okay now.

Yes, it does, thanks!

[No need to resend, I'll take it from this message.]
Re: [PATCH v3 2/4] serial: mctrl_gpio: Add a NULL check to mctrl_gpio_to_gpiod()
On 06.08.19 09:45, Uwe Kleine-König wrote:
> Hello Frieder,
>
> On Mon, Aug 05, 2019 at 09:01:39AM +, Schrempf Frieder wrote:
>> On 02.08.19 14:12, Uwe Kleine-König wrote:
>>> On Fri, Aug 02, 2019 at 10:04:10AM +, Schrempf Frieder wrote:
>>>> From: Frieder Schrempf
>>>>
>>>> As it is allowed to use the mctrl_gpio_* functions before
>>>> initialization (as the 8250 driver does according to 434be0ae7aa7),
>>>
>>> Actually I was surprised some time ago that 8250 used serial_mctrl
>>> without first initializing it and expecting it to work. I didn't look in
>>> detail, but I wouldn't go so far to call this "allowed". The commit
>>> itself calls it "workaround" which seems a better match.
>>
>> Ok, but if this is considered to be a workaround and as the 8250 driver
>> does not use mctrl_gpio_to_gpiod(), we should maybe just drop this patch
>> instead of encouraging others to use mctrl_gpio before initialization.
>>
>> I'm really not sure what's best, so depending on what you will propose,
>> I'll send a new version of this patch with adjusted commit message or not.
>
> I wouldn't encourage usage of mctrl-gpio before it's initialized. So I
> suggest to drop this patch.

Ok, thanks.
Re: Question about mfd_add_devices and platform_data
Hi Lee,

Can you help me with this question?

Thanks
Lucas

On Mon, Aug 5, 2019 at 2:43 PM Lucas Tanure wrote:
>
> Hi,
>
> I would like to understand the mfd_add_devices() call and its platform_data
> handling. An mfd device can have platform_data, which is duplicated with
> kmemdup() in platform_device_add_data(), called from mfd_add_device(). After
> this kmemdup, the new mfd device receives the cloned memory and the pointer
> given to platform_device_add_data() is freed.
>
> In all the drivers I have read, the platform_data is static, which in my
> view cannot be freed, and kfree says:
>
> "Don't free memory not originally allocated by kmalloc() or you will run
> into trouble."
>
> So, my question is: should my driver kmalloc platform_data first and then
> call mfd_add_devices()? Or is it fine to pass static memory to it?
>
> Example driver:
>
> drivers/mfd/vexpress-sysreg.c:
>
> static struct syscon_platform_data vexpress_sysreg_sys_id_pdata = {
> 	.label = "sys_id",
> };
>
> static struct mfd_cell vexpress_sysreg_cells[] = {
> 	{
> 		.name = "syscon",
> 		.num_resources = 1,
> 		.resources = (struct resource []) {
> 			DEFINE_RES_MEM(SYS_ID, 0x4),
> 		},
> 		.platform_data = &vexpress_sysreg_sys_id_pdata,
> 		.pdata_size = sizeof(vexpress_sysreg_sys_id_pdata),
> 	},
>
> In this case mfd_add_devices will free vexpress_sysreg_sys_id_pdata, but
> it's static.
>
> Thanks
> Lucas
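The question turns on which buffer gets freed. The sketch below assumes platform_device_add_data() behaves as in mainline of that era: it kmemdup()s the caller's buffer, kfree()s only the copy previously attached to the device, and then stores the new copy. That recollection should be checked against drivers/base/platform.c for the kernel in question. Under that assumption the caller's buffer is never freed, so a static pdata would be fine. The model uses malloc/memcpy/free in userspace, and fake_device and fake_add_data() are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct device's platform_data slot. */
struct fake_device {
	void *platform_data;
};

/* Duplicate the caller's buffer, free only the device's previous copy,
 * and store the new copy. The caller's buffer is never freed. */
static int fake_add_data(struct fake_device *dev, const void *data, size_t size)
{
	void *d = NULL;

	if (data) {
		d = malloc(size);
		if (!d)
			return -ENOMEM;
		memcpy(d, data, size);
	}

	free(dev->platform_data);	/* the device's own earlier copy */
	dev->platform_data = d;
	return 0;
}

struct syscon_pdata {
	const char *label;
};

/* Static, like vexpress_sysreg_sys_id_pdata in the quoted driver. */
static struct syscon_pdata sys_id_pdata = { .label = "sys_id" };
```

Calling fake_add_data() twice with the same static buffer is safe in this model, because the second call frees the first heap copy, not the static original.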
Re: [PATCH v2 4/4] hugetlbfs: don't retry when pool page allocations start to fail
On 8/6/19 3:47 AM, Mike Kravetz wrote:
> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages,
> the pages will be interleaved between all nodes of the system. If
> nodes are not equal, it is quite possible for one node to fill up
> before the others. When this happens, the code still attempts to
> allocate pages from the full node. This results in calls to direct
> reclaim and compaction which slow things down considerably.
>
> When allocating pool pages, note the state of the previous allocation
> for each node. If previous allocation failed, do not use the
> aggressive retry algorithm on successive attempts. The allocation
> will still succeed if there is memory available, but it will not try
> as hard to free up memory.
>
> Signed-off-by: Mike Kravetz

Acked-by: Vlastimil Babka

Thanks.
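The per-node policy in the commit message can be reduced to a small piece of state: remember whether the last pool allocation on each node failed, and pick a lighter attempt for that node next time. The sketch below models only that decision. The enum, the struct, and the node limit are hypothetical. The real patch presumably expresses "aggressive" and "light" through allocation flags, which are not reproduced here.

```c
#include <stdbool.h>

#define SKETCH_MAX_NODES 4

enum retry_mode { RETRY_HARD, RETRY_LIGHT };

/* Hypothetical per-node record of whether the last pool allocation failed. */
struct pool_state {
	bool last_failed[SKETCH_MAX_NODES];
};

/* After a failure on a node, only try lightly there, so a full node does not
 * keep triggering direct reclaim and compaction. */
static enum retry_mode pick_retry_mode(const struct pool_state *st, int nid)
{
	return st->last_failed[nid] ? RETRY_LIGHT : RETRY_HARD;
}

/* A later success (memory was available after all) restores normal retries. */
static void note_alloc_result(struct pool_state *st, int nid, bool ok)
{
	st->last_failed[nid] = !ok;
}
```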
Re: [mm] 755d6edc1a: will-it-scale.per_process_ops -4.1% regression
On Tue 06-08-19 15:05:47, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -4.1% regression of will-it-scale.per_process_ops due to
> commit:

I have to confess I cannot make much sense of the numbers because they
seem to be too volatile and the main contributor doesn't stand out for me.
Anyway, regressions on microbenchmarks like this are not all that
surprising when locking is slightly changed and the critical section made
shorter. I have seen that in the past already.

That being said, I would still love to get to the bottom of this bug rather
than play with the lock duration by magic. In other words
http://lkml.kernel.org/r/20190730125751.gs9...@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs
[PATCH v8 00/14] Guest LBR Enabling
Last Branch Recording (LBR) is a performance monitor unit (PMU) feature on
Intel CPUs that captures branch related info. This patch series enables
this feature for KVM guests.

Userspace can expose the LBR feature to each guest individually by enabling
KVM_CAP_X86_GUEST_LBR (patch 3).

About the lbr emulation method:
Whenever the vcpu gets scheduled in, the lbr related msrs are made
interceptible. This makes the guest's first access to an lbr related msr
always vm-exit to kvm, so that kvm can know whether the lbr feature is used
during the vcpu time slice. The kvm lbr msr handler does the following
things:
  - create an lbr perf event (task pinned) for the vcpu thread.
    The perf event mainly serves 2 purposes:
      -- follow the host perf scheduling rules to manage the vcpu's usage of
         lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus stop
         the vcpu's use);
      -- have the host perf do context switching of the lbr state on the
         vcpu thread switching.
  - pass the lbr related msrs through to the guest.
    This lets the guest's subsequent accesses to the lbr related msrs run
    without vm-exit, as long as the vcpu's lbr event owns the lbr feature.

A cpu pinned lbr event on the host could come and take over the lbr feature
via IPI calls. In this case, the pass-through will be cancelled (patch 13),
the guest's subsequent accesses to the lbr msrs will vm-exit to kvm, and
those accesses will be forbidden in the handler.

If the guest doesn't touch any of the lbr related msrs (likely meaning the
guest doesn't need to run lbr in the near future), the vcpu's lbr perf event
will be freed (please see the patch 12 commit message for more details).

* Tests
Conclusion: the profiling results on the guest are similar to those on the
host.

Run: ./perf -b ./test_program

- Test on the host:
Overhead  Command  Source Shared Object  Source Symbol    Target Symbol
  22.35%  ftest    libc-2.23.so          [.] __random     [.] __random
   8.20%  ftest    ftest                 [.] qux          [.] qux
   5.88%  ftest    ftest                 [.] random@plt   [.] __random
   5.88%  ftest    libc-2.23.so          [.] __random     [.] __random_r
   5.79%  ftest    ftest                 [.] main         [.] random@plt
   5.60%  ftest    ftest                 [.] main         [.] foo
   5.24%  ftest    libc-2.23.so          [.] __random     [.] main
   5.20%  ftest    libc-2.23.so          [.] __random_r   [.] __random
   5.00%  ftest    ftest                 [.] foo          [.] qux
   4.91%  ftest    ftest                 [.] main         [.] bar
   4.83%  ftest    ftest                 [.] bar          [.] qux
   4.57%  ftest    ftest                 [.] main         [.] main
   4.38%  ftest    ftest                 [.] foo          [.] main
   4.13%  ftest    ftest                 [.] qux          [.] foo
   3.89%  ftest    ftest                 [.] qux          [.] bar
   3.86%  ftest    ftest                 [.] bar          [.] main

- Test on the guest:
Overhead  Command  Source Shared Object  Source Symbol    Target Symbol
  22.36%  ftest    libc-2.23.so          [.] random       [.] random
   8.55%  ftest    ftest                 [.] qux          [.] qux
   5.79%  ftest    libc-2.23.so          [.] random       [.] random_r
   5.64%  ftest    ftest                 [.] random@plt   [.] random
   5.58%  ftest    ftest                 [.] main         [.] random@plt
   5.55%  ftest    ftest                 [.] main         [.] foo
   5.41%  ftest    libc-2.23.so          [.] random       [.] main
   5.31%  ftest    libc-2.23.so          [.] random_r     [.] random
   5.11%  ftest    ftest                 [.] foo          [.] qux
   4.93%  ftest    ftest                 [.] main         [.] main
   4.59%  ftest    ftest                 [.] qux          [.] bar
   4.49%  ftest    ftest                 [.] bar          [.] main
   4.42%  ftest    ftest                 [.] bar          [.] qux
   4.16%  ftest    ftest                 [.] main         [.] bar
   3.95%  ftest    ftest                 [.] qux          [.] foo
   3.79%  ftest    ftest                 [.] foo          [.] main

(due to the lib version difference, "random" is equivalent to __random above)

v7->v8 Changelog:
  - Patch 3:
    -- document KVM_CAP_X86_GUEST_LBR in api.txt
    -- make the check of KVM_CAP_X86_GUEST_LBR return the size of
       struct x86_perf_lbr_stack, to let userspace do a compatibility check.
[PATCH v8 01/14] perf/x86: fix the variable type of the lbr msrs
The msr variable type can be "unsigned int", which uses less memory than
the longer unsigned long. The lbr nr won't be a negative number, so make it
"unsigned int" as well.

Cc: Peter Zijlstra
Cc: Andi Kleen
Suggested-by: Peter Zijlstra
Signed-off-by: Wei Wang
---
 arch/x86/events/perf_event.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8751008..27e4d32 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -660,8 +660,8 @@ struct x86_pmu {
 	/*
 	 * Intel LBR
 	 */
-	unsigned long	lbr_tos, lbr_from, lbr_to; /* MSR base regs       */
-	int		lbr_nr;			   /* hardware stack size */
+	unsigned int	lbr_tos, lbr_from, lbr_to,
+			lbr_nr;			   /* lbr stack and size */
 	u64		lbr_sel_mask;		   /* LBR_SELECT valid bits */
 	const int	*lbr_sel_map;		   /* lbr_select mappings */
 	bool		lbr_double_abort;	   /* duplicated lbr aborts */
-- 
2.7.4
[PATCH v8 03/14] KVM/x86: KVM_CAP_X86_GUEST_LBR
Introduce KVM_CAP_X86_GUEST_LBR to allow per-VM enabling of the guest lbr
feature.

Signed-off-by: Wei Wang
---
 Documentation/virt/kvm/api.txt  | 26 ++
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 16
 include/uapi/linux/kvm.h        |  1 +
 4 files changed, 45 insertions(+)

diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt
index 2d06776..64632a8 100644
--- a/Documentation/virt/kvm/api.txt
+++ b/Documentation/virt/kvm/api.txt
@@ -5046,6 +5046,32 @@ it hard or impossible to use it correctly.
 The availability of KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are
 fixed. Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
 
+7.19 KVM_CAP_X86_GUEST_LBR
+Architectures: x86
+Parameters: args[0] whether feature should be enabled or not
+            args[1] pointer to the userspace memory to load the lbr stack info
+
+The lbr stack info is described by
+struct x86_perf_lbr_stack {
+	unsigned int	nr;
+	unsigned int	tos;
+	unsigned int	from;
+	unsigned int	to;
+	unsigned int	info;
+};
+
+@nr: number of lbr stack entries
+@tos: index of the top of stack msr
+@from: index of the msr that stores a branch source address
+@to: index of the msr that stores a branch destination address
+@info: index of the msr that stores lbr related flags
+
+Enabling this capability allows guest accesses to the lbr feature. Otherwise,
+#GP will be injected to the guest when it accesses the lbr related msrs.
+
+After the feature is enabled, before exiting to userspace, kvm handlers should
+fill the lbr stack info into the userspace memory pointed to by args[1].
+
 8. Other capabilities.
-- diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7b0a4ee..d29 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -875,6 +875,7 @@ struct kvm_arch { atomic_t vapics_in_nmi_mode; struct mutex apic_map_lock; struct kvm_apic_map *apic_map; + struct x86_perf_lbr_stack lbr_stack; bool apic_access_page_done; @@ -884,6 +885,7 @@ struct kvm_arch { bool hlt_in_guest; bool pause_in_guest; bool cstate_in_guest; + bool lbr_in_guest; unsigned long irq_sources_bitmap; s64 kvmclock_offset; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c6d951c..e1eb1be 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3129,6 +3129,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_EXCEPTION_PAYLOAD: r = 1; break; + case KVM_CAP_X86_GUEST_LBR: + r = sizeof(struct x86_perf_lbr_stack); + break; case KVM_CAP_SYNC_REGS: r = KVM_SYNC_X86_VALID_FIELDS; break; @@ -4670,6 +4673,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, kvm->arch.exception_payload_enabled = cap->args[0]; r = 0; break; + case KVM_CAP_X86_GUEST_LBR: + r = -EINVAL; + if (cap->args[0] && + x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) + break; + + if (copy_to_user((void __user *)cap->args[1], +&kvm->arch.lbr_stack, +sizeof(struct x86_perf_lbr_stack))) + break; + kvm->arch.lbr_in_guest = cap->args[0]; + r = 0; + break; default: r = -EINVAL; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5e3f12d..dd53edc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 #define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_X86_GUEST_LBR 174 #ifdef KVM_CAP_IRQ_ROUTING -- 2.7.4
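The api.txt text above defines two arguments: args[0] is the enable flag, and args[1] is a user pointer that receives the lbr stack info. KVM_CHECK_EXTENSION is documented to return sizeof(struct x86_perf_lbr_stack). The sketch below shows how a VMM might fill those fields and perform the size check. So that it builds without kernel headers, it uses a local mirror of struct kvm_enable_cap (assumed to match include/uapi/linux/kvm.h) together with the struct and capability number proposed in this unmerged patch. fill_lbr_enable_cap() and lbr_cap_compatible() are hypothetical helpers. A real VMM would pass the real struct kvm_enable_cap to ioctl(vm_fd, KVM_ENABLE_CAP, &cap).

```c
#include <stdint.h>
#include <string.h>

/* Struct proposed by this patch (5 x unsigned int). */
struct x86_perf_lbr_stack {
	unsigned int nr, tos, from, to, info;
};

/* Local mirror of struct kvm_enable_cap (layout assumed from uapi). */
struct enable_cap_mirror {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

#define KVM_CAP_X86_GUEST_LBR 174	/* number proposed by this patch */

/* KVM_CHECK_EXTENSION returns the kernel's struct size; compare with ours. */
static int lbr_cap_compatible(int check_extension_result)
{
	return check_extension_result == (int)sizeof(struct x86_perf_lbr_stack);
}

static void fill_lbr_enable_cap(struct enable_cap_mirror *cap, int enable,
				struct x86_perf_lbr_stack *out)
{
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_X86_GUEST_LBR;
	cap->args[0] = enable ? 1 : 0;
	cap->args[1] = (uint64_t)(uintptr_t)out;	/* kernel copies lbr info here */
}
```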
[PATCH v8 02/14] perf/x86: add a function to get the addresses of the lbr stack msrs
The lbr stack msrs are model specific. The perf subsystem has already
assigned the abstracted msr address values based on the cpu model.

So add a function to enable callers outside the perf subsystem to get the
lbr stack addresses. This is useful for hypervisors to emulate the lbr
feature for the guest.

Cc: Paolo Bonzini
Cc: Andi Kleen
Cc: Peter Zijlstra
Signed-off-by: Wei Wang
---
 arch/x86/events/intel/lbr.c       | 23 +++
 arch/x86/include/asm/perf_event.h | 14 ++
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 6f814a2..9b2d05c 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1311,3 +1311,26 @@ void intel_pmu_lbr_init_knl(void)
 	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_LIP)
 		x86_pmu.intel_cap.lbr_format = LBR_FORMAT_EIP_FLAGS;
 }
+
+/**
+ * x86_perf_get_lbr_stack - get the lbr stack related msrs
+ *
+ * @stack: the caller's memory to get the lbr stack
+ *
+ * Returns: 0 indicates that the lbr stack has been successfully obtained.
+ */
+int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+	stack->nr = x86_pmu.lbr_nr;
+	stack->tos = x86_pmu.lbr_tos;
+	stack->from = x86_pmu.lbr_from;
+	stack->to = x86_pmu.lbr_to;
+
+	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
+		stack->info = MSR_LBR_INFO_0;
+	else
+		stack->info = 0;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(x86_perf_get_lbr_stack);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 1392d5e..2606100 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -318,7 +318,16 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };
 
+struct x86_perf_lbr_stack {
+	unsigned int	nr;
+	unsigned int	tos;
+	unsigned int	from;
+	unsigned int	to;
+	unsigned int	info;
+};
+
 extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -329,6 +338,11 @@ static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
 	return NULL;
 }
 
+static inline int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+	return -1;
+}
+
 static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
 	memset(cap, 0, sizeof(*cap));
-- 
2.7.4
[PATCH v8 05/14] KVM/x86/vPMU: tweak kvm_pmu_get_msr
Change kvm_pmu_get_msr to get the msr_data struct, as the host_initiated field from the struct could be used by get_msr. This also makes this API consistent with kvm_pmu_set_msr. Cc: Paolo Bonzini Cc: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 4 ++-- arch/x86/kvm/pmu.h | 4 ++-- arch/x86/kvm/pmu_amd.c | 7 --- arch/x86/kvm/vmx/pmu_intel.c | 19 +++ arch/x86/kvm/x86.c | 4 ++-- 5 files changed, 21 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 26fac6c..1a291ed 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -350,9 +350,9 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr); } -int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { - return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data); + return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr_info); } int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index d9eec9a..f61024e 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -30,7 +30,7 @@ struct kvm_pmu_ops { int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); bool (*lbr_enable)(struct kvm_vcpu *vcpu); - int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data); + int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*refresh)(struct kvm_vcpu *vcpu); void (*init)(struct kvm_vcpu *vcpu); @@ -114,7 +114,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu); int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data); int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx); bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr); -int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data); +int kvm_pmu_get_msr(struct 
kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void kvm_pmu_refresh(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c index c838838..4a64a3f 100644 --- a/arch/x86/kvm/pmu_amd.c +++ b/arch/x86/kvm/pmu_amd.c @@ -208,21 +208,22 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return ret; } -static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; + u32 msr = msr_info->index; /* MSR_PERFCTRn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER); if (pmc) { - *data = pmc_read_counter(pmc); + msr_info->data = pmc_read_counter(pmc); return 0; } /* MSR_EVNTSELn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); if (pmc) { - *data = pmc->eventsel; + msr_info->data = pmc->eventsel; return 0; } diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 6294a86..53bb95e 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -297,35 +297,38 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) return true; } -static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; + u32 msr = msr_info->index; switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: - *data = pmu->fixed_ctr_ctrl; + msr_info->data = pmu->fixed_ctr_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_STATUS: - *data = pmu->global_status; + msr_info->data = pmu->global_status; return 0; case MSR_CORE_PERF_GLOBAL_CTRL: - *data = pmu->global_ctrl; + msr_info->data = pmu->global_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - *data = pmu->global_ovf_ctrl; + msr_info->data = pmu->global_ovf_ctrl; return 0; default: 
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { u64 val = pmc_read_counter(pmc); - *data = val & pmu->counter_bitmask[KVM_PMC_GP]; + msr_info->data = + val & pmu->counter_bitmask[KVM_PMC_GP]; return 0; } else if ((pmc = get_fixed_
Re: [PATCH v1 2/3] ASoC: rsnd: Allow reconfiguration of clock rate
Hi Jiada

> > 2nd, can we keep usrcnt setup as-is?
> > I guess we can just avoid rsnd_ssi_master_clk_start() if ssi->rate was not
> > 0?
>
> I don't fully understand your 2nd question.
> In the case of rsnd_ssi_master_clk_stop(), if we avoid
> rsnd_ssi_master_clk_stop() when ssi->rate is 0 by applying the following
> change
>
> static void rsnd_ssi_master_clk_stop(struct rsnd_mod *mod,
> 				      struct rsnd_dai_stream *io)
> {
> ...
> -	if (ssi->usrcnt > 1)
> +	if (ssi->rate == 0)
> 		return;
> ...
> }
>
> then when any IO stream with the same SSI calls .hw_free,
> the other IO stream's clock will be stopped too.

I think we can find a simpler solution if we come up with a good idea.
For example, how about adding a new counter for hw_params/hw_free?

Anyway, the [3/3] patch is overkill to me.
And please don't swap the usrcnt inc/dec positions in [2/3].
It is for open/close.

Thank you for your help !!

Best regards
---
Kuninori Morimoto
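One possible reading of the "new counter for hw_params/hw_free" suggestion is sketched below. usrcnt keeps counting open/close as it does today, and a separate, hypothetical hwcnt decides when the shared SSI clock starts and stops. The first hw_params starts the clock, the last hw_free stops it, and a second stream asking for a different rate is refused. This is a userspace model with made-up names, not rsnd code. It also ignores the fact that ALSA may call hw_params more than once for the same stream without an intervening hw_free, which a real counter would have to handle.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical SSI state: usrcnt counts open/close as today; hwcnt is the
 * suggested extra counter for hw_params/hw_free. */
struct ssi_sketch {
	int usrcnt;
	int hwcnt;
	unsigned int rate;
	bool clk_on;
};

static int ssi_hw_params(struct ssi_sketch *ssi, unsigned int rate)
{
	/* A shared SSI that is already clocked cannot switch rates. */
	if (ssi->hwcnt > 0 && ssi->rate != rate)
		return -EBUSY;

	if (ssi->hwcnt++ == 0) {
		ssi->rate = rate;
		ssi->clk_on = true;	/* first user starts the clock */
	}
	return 0;
}

static void ssi_hw_free(struct ssi_sketch *ssi)
{
	if (ssi->hwcnt == 0)
		return;

	if (--ssi->hwcnt == 0) {
		ssi->clk_on = false;	/* last user stops the clock */
		ssi->rate = 0;
	}
}
```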
[PATCH v8 04/14] KVM/x86: intel_pmu_lbr_enable
The lbr stack is model specific, for example, SKX has 32 lbr stack entries while HSW has 16 entries, so a HSW guest running on a SKX machine may not get accurate perf results. Currently, we forbid the guest lbr enabling when the guest and host see different lbr stack entries or the host and guest see different lbr stack msr indices. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Cc: Kan Liang Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 8 +++ arch/x86/kvm/pmu.h | 2 + arch/x86/kvm/vmx/pmu_intel.c | 136 +++ arch/x86/kvm/x86.c | 3 +- 4 files changed, 147 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 46875bb..26fac6c 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -331,6 +331,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data) return 0; } +bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu) +{ + if (kvm_x86_ops->pmu_ops->lbr_enable) + return kvm_x86_ops->pmu_ops->lbr_enable(vcpu); + + return false; +} + void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu) { if (lapic_in_kernel(vcpu)) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index 58265f7..d9eec9a 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -29,6 +29,7 @@ struct kvm_pmu_ops { u64 *mask); int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); + bool (*lbr_enable)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*refresh)(struct kvm_vcpu *vcpu); @@ -107,6 +108,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel); void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx); void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx); +bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu); void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu); void kvm_pmu_handle_event(struct kvm_vcpu *vcpu); int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, 
u64 *data); diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 4dea0e0..6294a86 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -12,6 +12,7 @@ #include #include #include +#include #include "x86.h" #include "cpuid.h" #include "lapic.h" @@ -162,6 +163,140 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return ret; } +static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) +{ + struct kvm *kvm = vcpu->kvm; + u8 vcpu_model = guest_cpuid_model(vcpu); + unsigned int vcpu_lbr_from, vcpu_lbr_nr; + + if (x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) + return false; + + if (guest_cpuid_family(vcpu) != boot_cpu_data.x86) + return false; + + /* +* It could be possible that people have vcpus of old model run on +* physcal cpus of newer model, for example a BDW guest on a SKX +* machine (but not possible to be the other way around). +* The BDW guest may not get accurate results on a SKX machine as it +* only reads 16 entries of the lbr stack while there are 32 entries +* of recordings. We currently forbid the lbr enabling when the vcpu +* and physical cpu see different lbr stack entries or the guest lbr +* msr indices are not compatible with the host. 
+*/ + switch (vcpu_model) { + case INTEL_FAM6_CORE2_MEROM: + case INTEL_FAM6_CORE2_MEROM_L: + case INTEL_FAM6_CORE2_PENRYN: + case INTEL_FAM6_CORE2_DUNNINGTON: + /* intel_pmu_lbr_init_core() */ + vcpu_lbr_nr = 4; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_NEHALEM: + case INTEL_FAM6_NEHALEM_EP: + case INTEL_FAM6_NEHALEM_EX: + /* intel_pmu_lbr_init_nhm() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_ATOM_BONNELL: + case INTEL_FAM6_ATOM_BONNELL_MID: + case INTEL_FAM6_ATOM_SALTWELL: + case INTEL_FAM6_ATOM_SALTWELL_MID: + case INTEL_FAM6_ATOM_SALTWELL_TABLET: + /* intel_pmu_lbr_init_atom() */ + vcpu_lbr_nr = 8; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_ATOM_SILVERMONT: + case INTEL_FAM6_ATOM_SILVERMONT_X: + case INTEL_FAM6_ATOM_SILVERMONT_MID: + case INTEL_FAM6_ATOM_AIRMONT: + case INTEL_FAM6_ATOM_AIRMONT_MID: + /* intel_pmu_lbr_init_slm() */ + vcpu_lbr_nr = 8; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_ATOM_GOLDMONT: + c
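The comment in intel_pmu_lbr_enable() says guest lbr is refused when the vcpu model and the host disagree on the number of lbr entries or on the lbr msr indices. The quoted switch is cut off before the comparison itself, so the check below is an assumption drawn from that comment. It looks up the layout implied by the guest model and compares nr and the FROM base with the host's values. The model classes are a hypothetical simplification of the INTEL_FAM6_* cases. The per-class values are copied from the switch, and the MSR numbers are assumed to match arch/x86/include/asm/msr-index.h.

```c
#include <stdbool.h>

/* MSR numbers assumed to match arch/x86/include/asm/msr-index.h. */
#define MSR_LBR_CORE_FROM 0x00000040
#define MSR_LBR_NHM_FROM  0x00000680

/* Hypothetical coarse classes standing in for the INTEL_FAM6_* cases. */
enum lbr_class { LBR_CLASS_CORE2, LBR_CLASS_NEHALEM, LBR_CLASS_ATOM, LBR_CLASS_UNKNOWN };

struct lbr_layout {
	unsigned int nr;	/* number of lbr stack entries */
	unsigned int from;	/* first FROM msr */
};

/* Values as in the switch quoted above. */
static bool guest_lbr_layout(enum lbr_class c, struct lbr_layout *out)
{
	switch (c) {
	case LBR_CLASS_CORE2:
		*out = (struct lbr_layout){ 4, MSR_LBR_CORE_FROM };
		return true;
	case LBR_CLASS_NEHALEM:
		*out = (struct lbr_layout){ 16, MSR_LBR_NHM_FROM };
		return true;
	case LBR_CLASS_ATOM:
		*out = (struct lbr_layout){ 8, MSR_LBR_CORE_FROM };
		return true;
	default:
		return false;
	}
}

/* Allow guest lbr only when entry count and msr base both match the host. */
static bool guest_lbr_compatible(enum lbr_class guest, const struct lbr_layout *host)
{
	struct lbr_layout g;

	if (!guest_lbr_layout(guest, &g))
		return false;
	return g.nr == host->nr && g.from == host->from;
}
```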
[PATCH v8 06/14] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
Bits [0, 5] of MSR_IA32_PERF_CAPABILITIES tell about the format of the addresses stored in the lbr stack. Expose those bits to the guest when the guest lbr feature is enabled. Cc: Paolo Bonzini Cc: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/include/asm/perf_event.h | 2 ++ arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx/pmu_intel.c | 16 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index 2606100..aa77da2 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -95,6 +95,8 @@ #define PEBS_DATACFG_LBRS BIT_ULL(3) #define PEBS_DATACFG_LBR_SHIFT 24 +#define X86_PERF_CAP_MASK_LBR_FMT 0x3f + /* * Intel "Architectural Performance Monitoring" CPUID * detection/enumeration details: diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 22c2720..826b2dc 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -458,7 +458,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function, F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | - F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | + F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) | F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) | diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 53bb95e..f0ad78f 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -151,6 +151,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) case MSR_CORE_PERF_GLOBAL_STATUS: case MSR_CORE_PERF_GLOBAL_CTRL: case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_IA32_PERF_CAPABILITIES: ret = pmu->version > 1; break; default: @@ -316,6 +317,19 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case 
MSR_CORE_PERF_GLOBAL_OVF_CTRL: msr_info->data = pmu->global_ovf_ctrl; return 0; + case MSR_IA32_PERF_CAPABILITIES: { + u64 data; + + if (!boot_cpu_has(X86_FEATURE_PDCM) || + (!msr_info->host_initiated && +!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))) + return 1; + data = native_read_msr(MSR_IA32_PERF_CAPABILITIES); + msr_info->data = 0; + if (vcpu->kvm->arch.lbr_in_guest) + msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT); + return 0; + } default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { u64 val = pmc_read_counter(pmc); @@ -374,6 +388,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 0; } break; + case MSR_IA32_PERF_CAPABILITIES: + return 1; /* RO MSR */ default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { if (msr_info->host_initiated) -- 2.7.4
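The read path added to intel_pmu_get_msr() comes down to a mask. The sketch below is a hypothetical userspace model. It leaves out the PDCM CPUID checks and native_read_msr(), and takes the host MSR value as a parameter. With guest lbr enabled, only bits [5:0] (the LBR format) are reported to the guest. Every other capability bit reads as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the patch's X86_PERF_CAP_MASK_LBR_FMT: bits [5:0]. */
#define X86_PERF_CAP_MASK_LBR_FMT 0x3fULL

/*
 * Hypothetical helper modelling the MSR_IA32_PERF_CAPABILITIES read:
 * expose only the LBR format field, and only when guest lbr is on.
 */
static uint64_t guest_perf_capabilities(uint64_t host_caps, bool lbr_in_guest)
{
	uint64_t data = 0;

	if (lbr_in_guest)
		data |= host_caps & X86_PERF_CAP_MASK_LBR_FMT;
	return data;
}
```

Writes from the guest are rejected separately in intel_pmu_set_msr(), because the MSR is read-only.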
[PATCH v8 12/14] KVM/x86/lbr: lbr emulation
In general, the lbr emulation works as follows. The guest's first access to an lbr related msr (since the vcpu was scheduled in) is trapped to kvm, and the handler does the following:
- creates an lbr perf event so that the vcpu gets the lbr feature from host perf, following the perf scheduling rules;
- passes the lbr related msrs through to the guest, so that further accesses happen directly, without vm-exits, until the end of this vcpu time slice.
The guest's first access is made interceptible so that the kvm side lbr emulation can always tell whether the lbr feature has been used during the vcpu time slice. If the lbr feature isn't used during a time slice, the lbr event created for the vcpu is freed. Some considerations:
- Why not free the vcpu lbr event when the guest clears the lbr enable bit? The guest may frequently clear the lbr enable bit (in the debugctl msr) while using the lbr feature, e.g. in its PMI handler. This would cause the kvm emulation to frequently alloc/free the vcpu lbr event, which is unnecessary. Ideally, we want to free the vcpu lbr event when the guest no longer needs lbr. As a heuristic, we free the vcpu lbr event when the guest doesn't touch any of the lbr msrs during an entire vcpu time slice.
Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Suggested-by: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/pmu.c | 6 ++ arch/x86/kvm/pmu.h | 2 + arch/x86/kvm/vmx/pmu_intel.c| 206 arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/vmx/vmx.h | 2 + arch/x86/kvm/x86.c | 2 + 7 files changed, 222 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 692a0c2..ecd22b5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -469,6 +469,8 @@ struct kvm_pmu { u64 global_ctrl_mask; u64 global_ovf_ctrl_mask; u64 reserved_bits; + /* Indicate if the lbr msrs were accessed in this vcpu time slice */ + bool lbr_used; u8 version; struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 1a291ed..afad092 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -360,6 +360,12 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info); } +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (kvm_x86_ops->pmu_ops->sched_in) + kvm_x86_ops->pmu_ops->sched_in(vcpu, cpu); +} + /* refresh PMU settings. This function generally is called when underlying * settings are changed (such as changes of PMU CPUID by guest VMs), which * should rarely happen. 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index f61024e..f875721 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -32,6 +32,7 @@ struct kvm_pmu_ops { bool (*lbr_enable)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); + void (*sched_in)(struct kvm_vcpu *vcpu, int cpu); void (*refresh)(struct kvm_vcpu *vcpu); void (*init)(struct kvm_vcpu *vcpu); void (*reset)(struct kvm_vcpu *vcpu); @@ -116,6 +117,7 @@ int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx); bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr); int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu); void kvm_pmu_refresh(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); void kvm_pmu_init(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 89730f8..5580f1a 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -17,6 +17,7 @@ #include "cpuid.h" #include "lapic.h" #include "pmu.h" +#include "vmx.h" static struct kvm_event_hw_type_mapping intel_arch_events[] = { /* Index must match CPUID 0x0A.EBX bit vector */ @@ -141,6 +142,19 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, return &counters[idx]; } +/* Return true if it is one of the lbr related msrs. */ +static inline bool is_lbr_msr(struct kvm_vcpu *vcpu, u32 index) +{ + struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack; + int nr = stack->nr; + + return !!(index == MSR_LBR_SELECT || + index == stack->tos || + (index >= stack->from && index < stack->from + nr) || + (index >= stack->to && index < stack->to + nr) || + (index >= stack->info && index < stack->info))
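The time-slice heuristic from the commit message can be modelled as a small state machine. This is a hypothetical sketch: `struct toy_pmu` and its fields stand in for `pmu->lbr_event`, `pmu->lbr_used` and the msr intercept state, and are not the kernel's types. A trapped access creates the event and opens pass-through until the slice ends. On the next sched-in, an event that went unused for a whole slice is freed, and interception is re-armed either way.

```c
#include <stdbool.h>

/* Hypothetical, simplified vcpu PMU state; not the kernel's struct kvm_pmu. */
struct toy_pmu {
	bool lbr_event_exists;	/* stands in for pmu->lbr_event != NULL */
	bool lbr_used;		/* lbr msrs touched in this time slice */
	bool passthrough;	/* lbr msrs currently not intercepted */
};

/* First trapped guest access to an lbr msr in the current time slice. */
static void toy_lbr_msr_trap(struct toy_pmu *pmu)
{
	pmu->lbr_event_exists = true;	/* create the vcpu lbr event if absent */
	pmu->lbr_used = true;
	pmu->passthrough = true;	/* no further exits until slice ends */
}

/* vcpu scheduled in: a new time slice starts. */
static void toy_sched_in(struct toy_pmu *pmu)
{
	/* Untouched for an entire slice: release the lbr event. */
	if (pmu->lbr_event_exists && !pmu->lbr_used)
		pmu->lbr_event_exists = false;

	/* Re-arm interception so the first access in this slice is seen. */
	pmu->lbr_used = false;
	pmu->passthrough = false;
}
```

With this design, a PMI handler that toggles the DEBUGCTL enable bit never causes an alloc/free cycle. Only a fully idle slice releases the event.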
[PATCH v8 14/14] KVM/x86: remove the common handling of the debugctl msr
The debugctl msr is not completely identical on AMD and Intel CPUs; for example, FREEZE_LBRS_ON_PMI is supported by Intel CPUs only. Now this msr is handled separately in svm.c and pmu_intel.c, so remove the common debugctl msr handling code from kvm_get/set_msr_common. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Signed-off-by: Wei Wang --- arch/x86/kvm/x86.c | 13 - 1 file changed, 13 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index efaf0e8..3839ebd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2528,18 +2528,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; } break; - case MSR_IA32_DEBUGCTLMSR: - if (!data) { - /* We support the non-activated case already */ - break; - } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { - /* Values other than LBR and BTF are vendor-specific, - thus reserved and should throw a #GP */ - return 1; - } - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", - __func__, data); - break; case 0x200 ... 0x2ff: return kvm_mtrr_set_msr(vcpu, msr, data); case MSR_IA32_APICBASE: @@ -2800,7 +2788,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) switch (msr_info->index) { case MSR_IA32_PLATFORM_ID: case MSR_IA32_EBL_CR_POWERON: - case MSR_IA32_DEBUGCTLMSR: case MSR_IA32_LASTBRANCHFROMIP: case MSR_IA32_LASTBRANCHTOIP: case MSR_IA32_LASTINTFROMIP: -- 2.7.4
[PATCH v8 11/14] perf/x86: save/restore LBR_SELECT on vcpu switching
The regular host lbr perf event doesn't save/restore the LBR_SELECT msr during a thread context switching, because the LBR_SELECT value is generated from attr.branch_sample_type and already stored in event->hw.branch_reg (please see intel_pmu_setup_hw_filter), which doesn't get lost during thread context switching. The attr.branch_sample_type for the vcpu lbr event is deliberately set to the user call stack mode to enable the perf core to save/restore the lbr related msrs on vcpu switching. So the attr.branch_sample_type essentially doesn't represent what the guest pmu driver will write to LBR_SELECT. Meanwhile, the host lbr driver doesn't configure the lbr msrs, including the LBR_SELECT msr, for the vcpu thread case, as the pmu driver inside the vcpu will do that. So for the vcpu case, add the LBR_SELECT save/restore to ensure what the guest writes to the LBR_SELECT msr doesn't get lost during the vcpu context switching. Cc: Peter Zijlstra Cc: Andi Kleen Cc: Kan Liang Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 7 +++ arch/x86/events/perf_event.h | 1 + 2 files changed, 8 insertions(+) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index a0f3686..236f8352 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -390,6 +390,9 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx) wrmsrl(x86_pmu.lbr_tos, tos); task_ctx->lbr_stack_state = LBR_NONE; + + if (cpuc->vcpu_lbr) + wrmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); } static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx) @@ -416,6 +419,10 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx) if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]); } + + if (cpuc->vcpu_lbr) + rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); + task_ctx->valid_lbrs = i; task_ctx->tos = tos; task_ctx->lbr_stack_state = LBR_VALID; diff --git a/arch/x86/events/perf_event.h 
b/arch/x86/events/perf_event.h index 8b90a25..0b2f660 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -699,6 +699,7 @@ struct x86_perf_task_context { u64 lbr_from[MAX_LBR_ENTRIES]; u64 lbr_to[MAX_LBR_ENTRIES]; u64 lbr_info[MAX_LBR_ENTRIES]; + u64 lbr_sel; int tos; int valid_lbrs; int lbr_callstack_users; -- 2.7.4
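The added save/restore logic is gated on the per-cpu `vcpu_lbr` flag. In this hypothetical sketch, a global variable stands in for the real LBR_SELECT msr, and `struct toy_task_ctx` mirrors only the new `lbr_sel` field. It shows that the guest-written value survives a switch in which another thread overwrites the register. Regular host events skip the extra msr access because their value can be regenerated from `event->hw.branch_reg`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the LBR_SELECT msr. */
static uint64_t fake_lbr_select;

/* Mirrors only the lbr_sel field added to x86_perf_task_context. */
struct toy_task_ctx {
	uint64_t lbr_sel;
};

/* Switch-out: only the vcpu case must preserve the guest's value. */
static void toy_lbr_save(struct toy_task_ctx *ctx, bool vcpu_lbr)
{
	if (vcpu_lbr)
		ctx->lbr_sel = fake_lbr_select;	/* rdmsrl(MSR_LBR_SELECT, ...) */
}

/* Switch-in: put the guest's LBR_SELECT back. */
static void toy_lbr_restore(const struct toy_task_ctx *ctx, bool vcpu_lbr)
{
	if (vcpu_lbr)
		fake_lbr_select = ctx->lbr_sel;	/* wrmsrl(MSR_LBR_SELECT, ...) */
}
```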
[PATCH v8 13/14] KVM/x86/vPMU: check the lbr feature before entering guest
The guest can access the lbr related msrs only when the vcpu's lbr event has been assigned the lbr feature. A cpu pinned lbr event (though no such event usages in the current upstream kernel) could reclaim the lbr feature from the vcpu's lbr event (task pinned) via ipi calls. If the cpu is running in the non-root mode, this will cause the cpu to vm-exit to handle the host ipi and then vm-entry back to the guest. So on vm-entry (where interrupt has been disabled), we double confirm that the vcpu's lbr event is still assigned the lbr feature via checking event->oncpu. The pass-through of the lbr related msrs will be cancelled if the lbr is reclaimed, and the following guest accesses to the lbr related msrs will vm-exit to the related msr emulation handler in kvm, which will prevent the accesses. Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 6 ++ arch/x86/kvm/pmu.h | 3 +++ arch/x86/kvm/vmx/pmu_intel.c | 35 +++ arch/x86/kvm/x86.c | 13 + 4 files changed, 57 insertions(+) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index afad092..ed10a57 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -339,6 +339,12 @@ bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu) return false; } +void kvm_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu) +{ + if (kvm_x86_ops->pmu_ops->enabled_feature_confirm) + kvm_x86_ops->pmu_ops->enabled_feature_confirm(vcpu); +} + void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu) { if (lapic_in_kernel(vcpu)) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index f875721..7467907 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -30,6 +30,7 @@ struct kvm_pmu_ops { int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); bool (*lbr_enable)(struct kvm_vcpu *vcpu); + void (*enabled_feature_confirm)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*sched_in)(struct 
kvm_vcpu *vcpu, int cpu); @@ -126,6 +127,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp); bool is_vmware_backdoor_pmc(u32 pmc_idx); +void kvm_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu); + extern struct kvm_pmu_ops intel_pmu_ops; extern struct kvm_pmu_ops amd_pmu_ops; #endif /* __KVM_X86_PMU_H */ diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 5580f1a..421051aa 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -781,6 +781,40 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) intel_pmu_free_lbr_event(vcpu); } +void intel_pmu_lbr_confirm(struct kvm_vcpu *vcpu) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + + /* +* Either lbr_event being NULL or lbr_used being false indicates that +* the lbr msrs haven't been passed through to the guest, so no need +* to cancel passthrough. +*/ + if (!pmu->lbr_event || !pmu->lbr_used) + return; + + /* +* The lbr feature gets reclaimed via IPI calls, so checking of +* lbr_event->oncpu needs to be in an atomic context. Just confirm +* that irq has been disabled already. +*/ + lockdep_assert_irqs_disabled(); + + /* +* Cancel the pass-through of the lbr msrs if lbr has been reclaimed +* by the host perf. 
+*/ + if (pmu->lbr_event->oncpu != -1) { + pmu->lbr_used = false; + intel_pmu_set_intercept_for_lbr_msrs(vcpu, true); + } +} + +void intel_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu) +{ + intel_pmu_lbr_confirm(vcpu); +} + struct kvm_pmu_ops intel_pmu_ops = { .find_arch_event = intel_find_arch_event, .find_fixed_event = intel_find_fixed_event, @@ -790,6 +824,7 @@ struct kvm_pmu_ops intel_pmu_ops = { .is_valid_msr_idx = intel_is_valid_msr_idx, .is_valid_msr = intel_is_valid_msr, .lbr_enable = intel_pmu_lbr_enable, + .enabled_feature_confirm = intel_pmu_enabled_feature_confirm, .get_msr = intel_pmu_get_msr, .set_msr = intel_pmu_set_msr, .sched_in = intel_pmu_sched_in, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b76f019..efaf0e8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7985,6 +7985,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) smp_mb__after_srcu_read_unlock(); /* +* Higher priority host perf events (e.g. cpu pinned) could reclaim the +* pmu resources (e.g. lbr) that were assigned to the vcpu. This is +* usually done via ipi calls (see perf_install_in_context for +* details). +* +* Before entering the non-root mode (with irq disabled here), double +
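The pre-entry check can be modelled in userspace as below. All types are hypothetical stand-ins, and -1 plays the role of perf's "not scheduled on any CPU" value of `event->oncpu`. The sketch follows the commit message and treats `oncpu == -1` (the event was scheduled out because a higher-priority host event reclaimed the lbr) as the condition for cancelling pass-through. The posted hunk tests `oncpu != -1`, which appears to be the inverse of that description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; oncpu == -1 means "not scheduled on any CPU". */
struct toy_lbr_event {
	int oncpu;
};

struct toy_vcpu_pmu {
	struct toy_lbr_event *lbr_event;
	bool lbr_used;
	bool intercept_lbr_msrs;
};

/* Runs with interrupts disabled, right before vm-entry. */
static void toy_lbr_confirm(struct toy_vcpu_pmu *pmu)
{
	/* Nothing was passed through, so there is nothing to cancel. */
	if (!pmu->lbr_event || !pmu->lbr_used)
		return;

	/* Event scheduled out: the host reclaimed the lbr feature. */
	if (pmu->lbr_event->oncpu == -1) {
		pmu->lbr_used = false;
		pmu->intercept_lbr_msrs = true;	/* later accesses exit to kvm */
	}
}
```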
[PATCH v8 09/14] KVM/x86/vPMU: APIs to create/free lbr perf event for a vcpu thread
VMX transitions are much more frequent than vcpu switches, and saving/restoring tens of lbr msrs (e.g. 32 lbr stack entries) on every vmx transition would add unnecessary overhead. So the vcpu's lbr state only gets saved/restored on vcpu context switches. The main purposes of using the vcpu's lbr perf event are to:
- follow the host perf scheduling rules to manage the vcpu's usage of lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus stop the vcpu's use);
- have the host perf context-switch the lbr state when the vcpu thread is context switched.
Please see the comments in intel_pmu_create_lbr_event for more details. To achieve pure lbr emulation, the perf event is created only to claim the lbr feature, and no perf counter is needed for it. The vcpu_lbr field is added to indicate to the host lbr driver that the lbr is currently assigned to a vcpu. The guest driver inside the vcpu has its own logic to use the lbr, so the host side lbr driver doesn't need to enable and use the lbr feature in this case. Some design choice considerations:
- Why use "is_kernel_event", instead of checking the PF_VCPU flag, to determine that it is a vcpu perf event for lbr emulation? Because PF_VCPU is set right before vm-entry into the guest and cleared after the guest vm-exits to the host, so the flag doesn't remain set while the host code runs.
Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Co-developed-by: Like Xu Signed-off-by: Like Xu Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 38 ++-- arch/x86/events/perf_event.h| 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/pmu_intel.c| 64 + include/linux/perf_event.h | 7 + kernel/events/core.c| 7 - 6 files changed, 108 insertions(+), 10 deletions(-) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 9b2d05c..4f4bd18 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -462,6 +462,14 @@ void intel_pmu_lbr_add(struct perf_event *event) if (!x86_pmu.lbr_nr) return; + /* +* An lbr event without a counter indicates this is for the vcpu lbr +* emulation, so set the vcpu_lbr flag when the vcpu lbr event +* gets scheduled on the lbr here. +*/ + if (is_no_counter_event(event)) + cpuc->vcpu_lbr = 1; + cpuc->br_sel = event->hw.branch_reg.reg; if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) { @@ -509,6 +517,14 @@ void intel_pmu_lbr_del(struct perf_event *event) task_ctx->lbr_callstack_users--; } + /* +* An lbr event without a counter indicates this is for the vcpu lbr +* emulation, so clear the vcpu_lbr flag when the vcpu's lbr event +* gets scheduled out from the lbr. +*/ + if (is_no_counter_event(event)) + cpuc->vcpu_lbr = 0; + if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip > 0) cpuc->lbr_pebs_users--; cpuc->lbr_users--; @@ -521,7 +537,12 @@ void intel_pmu_lbr_enable_all(bool pmi) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - if (cpuc->lbr_users) + /* +* The vcpu lbr emulation doesn't need host to enable lbr at this +* point, because the guest will set the enabling at a proper time +* itself. 
+*/ + if (cpuc->lbr_users && !cpuc->vcpu_lbr) __intel_pmu_lbr_enable(pmi); } @@ -529,7 +550,11 @@ void intel_pmu_lbr_disable_all(void) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - if (cpuc->lbr_users) + /* +* Same as intel_pmu_lbr_enable_all, the guest is responsible for +* clearing the enabling itself. +*/ + if (cpuc->lbr_users && !cpuc->vcpu_lbr) __intel_pmu_lbr_disable(); } @@ -668,8 +693,12 @@ void intel_pmu_lbr_read(void) * * This could be smarter and actually check the event, * but this simple approach seems to work for now. +* +* And no need to read the lbr msrs here if the vcpu lbr event +* is using it, as the guest will read them itself. */ - if (!cpuc->lbr_users || cpuc->lbr_users == cpuc->lbr_pebs_users) + if (!cpuc->lbr_users || cpuc->vcpu_lbr || + cpuc->lbr_users == cpuc->lbr_pebs_users) return; if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32) @@ -802,6 +831,9 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event) if (!x86_pmu.lbr_nr) return -EOPNOTSUPP; + if (event->attr.exclude_host && is_kernel_event(event)) + perf_event_set_no_counter(event); + /* * setup SW LBR filter */ diff --git a/ar
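The effect of `vcpu_lbr` on the host driver can be sketched as follows. This is a pared-down hypothetical model: `struct toy_cpuc` keeps only the counters the hunks touch, and `no_counter` stands in for `is_no_counter_event()`. Adding a counter-less event sets the flag, and while it is set the host leaves lbr enabling to the guest.

```c
#include <stdbool.h>

/* Hypothetical, pared-down cpu_hw_events. */
struct toy_cpuc {
	int lbr_users;
	int vcpu_lbr;
	bool lbr_hw_enabled;	/* what __intel_pmu_lbr_enable() would set */
};

struct toy_event {
	bool no_counter;	/* stands in for is_no_counter_event() */
};

static void toy_lbr_add(struct toy_cpuc *c, const struct toy_event *e)
{
	if (e->no_counter)
		c->vcpu_lbr = 1;	/* lbr now belongs to a vcpu */
	c->lbr_users++;
}

static void toy_lbr_del(struct toy_cpuc *c, const struct toy_event *e)
{
	if (e->no_counter)
		c->vcpu_lbr = 0;
	c->lbr_users--;
}

static void toy_lbr_enable_all(struct toy_cpuc *c)
{
	/* The guest sets the enable bit itself; the host stays out. */
	if (c->lbr_users && !c->vcpu_lbr)
		c->lbr_hw_enabled = true;
}
```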
[PATCH v8 07/14] perf/x86: support to create a perf event without counter allocation
Hypervisors may create an lbr event for a vcpu's lbr emulation, and the emulation doesn't need a counter fundamentally. This makes the emulation follow the x86 SDM's description about lbr, which doesn't include a counter, and also avoids wasting a counter. The perf scheduler is supported to not assign a counter for a perf event which doesn't need a counter. Define a macro, X86_PMC_IDX_NA, to replace "-1", which represents a never assigned counter id. Cc: Andi Kleen Cc: Peter Zijlstra Signed-off-by: Wei Wang https://lkml.kernel.org/r/20180920162407.ga24...@hirez.programming.kicks-ass.net --- arch/x86/events/core.c| 36 +++- arch/x86/events/intel/core.c | 3 +++ arch/x86/include/asm/perf_event.h | 1 + include/linux/perf_event.h| 11 +++ 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 81b005e..ffa27bb 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -73,7 +73,7 @@ u64 x86_perf_event_update(struct perf_event *event) int idx = hwc->idx; u64 delta; - if (idx == INTEL_PMC_IDX_FIXED_BTS) + if ((idx == INTEL_PMC_IDX_FIXED_BTS) || (idx == X86_PMC_IDX_NA)) return 0; /* @@ -595,7 +595,7 @@ static int __x86_pmu_event_init(struct perf_event *event) atomic_inc(&active_events); event->destroy = hw_perf_event_destroy; - event->hw.idx = -1; + event->hw.idx = X86_PMC_IDX_NA; event->hw.last_cpu = -1; event->hw.last_tag = ~0ULL; @@ -763,6 +763,8 @@ static bool perf_sched_restore_state(struct perf_sched *sched) static bool __perf_sched_find_counter(struct perf_sched *sched) { struct event_constraint *c; + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct perf_event *e = cpuc->event_list[sched->state.event]; int idx; if (!sched->state.unassigned) @@ -772,6 +774,14 @@ static bool __perf_sched_find_counter(struct perf_sched *sched) return false; c = sched->constraints[sched->state.event]; + if (c == &emptyconstraint) + return false; + + if (is_no_counter_event(e)) { + idx = 
X86_PMC_IDX_NA; + goto done; + } + /* Prefer fixed purpose counters */ if (c->idxmsk64 & (~0ULL << INTEL_PMC_IDX_FIXED)) { idx = INTEL_PMC_IDX_FIXED; @@ -797,7 +807,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched) done: sched->state.counter = idx; - if (c->overlap) + if ((idx != X86_PMC_IDX_NA) && c->overlap) perf_sched_save_state(sched); return true; @@ -918,7 +928,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign) c = cpuc->event_constraint[i]; /* never assigned */ - if (hwc->idx == -1) + if (hwc->idx == X86_PMC_IDX_NA) break; /* constraint still honored */ @@ -969,7 +979,8 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign) if (!unsched && assign) { for (i = 0; i < n; i++) { e = cpuc->event_list[i]; - if (x86_pmu.commit_scheduling) + if (x86_pmu.commit_scheduling && + (assign[i] != X86_PMC_IDX_NA)) x86_pmu.commit_scheduling(cpuc, i, assign[i]); } } else { @@ -1038,7 +1049,8 @@ static inline void x86_assign_hw_event(struct perf_event *event, hwc->last_cpu = smp_processor_id(); hwc->last_tag = ++cpuc->tags[i]; - if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) { + if ((hwc->idx == INTEL_PMC_IDX_FIXED_BTS) || + (hwc->idx == X86_PMC_IDX_NA)) { hwc->config_base = 0; hwc->event_base = 0; } else if (hwc->idx >= INTEL_PMC_IDX_FIXED) { @@ -1115,7 +1127,7 @@ static void x86_pmu_enable(struct pmu *pmu) * - running on same CPU as last time * - no other event has used the counter since */ - if (hwc->idx == -1 || + if (hwc->idx == X86_PMC_IDX_NA || match_prev_assignment(hwc, cpuc, i)) continue; @@ -1169,7 +1181,7 @@ int x86_perf_event_set_period(struct perf_event *event) s64 period = hwc->sample_period; int ret = 0, idx = hwc->idx; - if (idx == INTEL_PMC_IDX_FIXED_BTS) + if ((idx == INTEL_PMC_IDX_FIXED_BTS) || (idx == X86_PMC_IDX_NA)) return 0; /* @@ -1306,7 +1318,7 @@ static void x86_pmu_start(struct perf_event *event, int flags) if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) return; - if (WARN_ON
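The core idea of the scheduler change is that an event which needs no counter is still "scheduled" but is given the `X86_PMC_IDX_NA` index instead of consuming a counter. The real x86_schedule_events() orders events by constraint weight and can backtrack. The sketch below is a deliberately simplified, hypothetical greedy version that keeps only the part this patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PMC_IDX_NA (-1)	/* plays the role of X86_PMC_IDX_NA */

struct toy_sched_event {
	bool no_counter;	/* e.g. the vcpu lbr event */
	uint64_t idxmsk;	/* counters this event may use */
};

/*
 * Greedy assignment: no-counter events get TOY_PMC_IDX_NA and use no
 * counter; the others take the lowest free counter their mask allows.
 * Returns false if some event cannot be placed.
 */
static bool toy_schedule(const struct toy_sched_event *ev, int n, int *assign)
{
	uint64_t used = 0;

	for (int i = 0; i < n; i++) {
		assign[i] = TOY_PMC_IDX_NA;
		if (ev[i].no_counter)
			continue;
		for (int idx = 0; idx < 64; idx++) {
			uint64_t bit = 1ULL << idx;

			if ((ev[i].idxmsk & bit) && !(used & bit)) {
				used |= bit;
				assign[i] = idx;
				break;
			}
		}
		if (assign[i] == TOY_PMC_IDX_NA)
			return false;
	}
	return true;
}
```

This also shows why the patch adds `idx == X86_PMC_IDX_NA` guards to the update, set_period and commit paths: no hardware counter backs such an event.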
[PATCH v8 08/14] perf/core: set the event->owner before event_init
Kernel and user events can be distinguished by checking event->owner. Some pmu driver implementation may need to know event->owner in event_init. For example, intel_pmu_setup_lbr_filter treats a kernel event with exclude_host set as an lbr event created for guest lbr emulation, which doesn't need a pmu counter. So move the event->owner assignment into perf_event_alloc to have it set before event_init is called. Signed-off-by: Wei Wang --- kernel/events/core.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 0463c11..7663f85 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10288,6 +10288,7 @@ static void account_event(struct perf_event *event) static struct perf_event * perf_event_alloc(struct perf_event_attr *attr, int cpu, struct task_struct *task, +struct task_struct *owner, struct perf_event *group_leader, struct perf_event *parent_event, perf_overflow_handler_t overflow_handler, @@ -10340,6 +10341,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, event->group_leader = group_leader; event->pmu = NULL; event->oncpu= -1; + event->owner= owner; event->parent = parent_event; @@ -10891,7 +10893,7 @@ SYSCALL_DEFINE5(perf_event_open, if (flags & PERF_FLAG_PID_CGROUP) cgroup_fd = pid; - event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, + event = perf_event_alloc(&attr, cpu, task, current, group_leader, NULL, NULL, NULL, cgroup_fd); if (IS_ERR(event)) { err = PTR_ERR(event); @@ -11153,8 +11155,6 @@ SYSCALL_DEFINE5(perf_event_open, perf_event__header_size(event); perf_event__id_header_size(event); - event->owner = current; - perf_install_in_context(ctx, event, event->cpu); perf_unpin_context(ctx); @@ -11231,16 +11231,13 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, * Get the target context (task or percpu): */ - event = perf_event_alloc(attr, cpu, task, NULL, NULL, + event = perf_event_alloc(attr, cpu, task, TASK_TOMBSTONE, NULL, 
NULL, overflow_handler, context, -1); if (IS_ERR(event)) { err = PTR_ERR(event); goto err; } - /* Mark owner so we could distinguish it from user events. */ - event->owner = TASK_TOMBSTONE; - ctx = find_get_context(event->pmu, task, event); if (IS_ERR(ctx)) { err = PTR_ERR(ctx); @@ -11677,6 +11674,7 @@ inherit_event(struct perf_event *parent_event, child_event = perf_event_alloc(&parent_event->attr, parent_event->cpu, + parent_event->owner, child, group_leader, parent_event, NULL, NULL, -1); -- 2.7.4
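The ordering matters because the driver's event_init hook inspects `event->owner`. In this hypothetical userspace model, `TOY_TASK_TOMBSTONE` mimics the kernel's `TASK_TOMBSTONE` sentinel for in-kernel events, and `toy_event_init()` stands in for the check in intel_pmu_setup_lbr_filter(). Because the allocator stores the owner before calling init, a kernel event with exclude_host set is recognised as counter-less.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_task {
	int pid;
};

/* Sentinel owner for in-kernel events, like TASK_TOMBSTONE. */
#define TOY_TASK_TOMBSTONE ((struct toy_task *)(intptr_t)-1)

struct toy_event {
	struct toy_task *owner;
	bool exclude_host;
	bool no_counter;
};

static bool toy_is_kernel_event(const struct toy_event *e)
{
	return e->owner == TOY_TASK_TOMBSTONE;
}

/* Driver hook: relies on owner already being valid. */
static int toy_event_init(struct toy_event *e)
{
	if (e->exclude_host && toy_is_kernel_event(e))
		e->no_counter = true;
	return 0;
}

static int toy_event_alloc(struct toy_event *e, struct toy_task *owner,
			   bool exclude_host)
{
	e->owner = owner;	/* set before event_init(), as in the patch */
	e->exclude_host = exclude_host;
	e->no_counter = false;
	return toy_event_init(e);
}
```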
[PATCH v8 10/14] perf/x86/lbr: don't share lbr for the vcpu usage case
Perf event scheduling lets multiple lbr events share the lbr if they use the same config for LBR_SELECT. For the vcpu case, the vcpu's lbr event created on the host deliberately sets the config to the user callstack mode to have the host support to save/restore the lbr state on vcpu context switching, and the config won't be written to the LBR_SELECT, as the LBR_SELECT is configured by the guest, which might not be the same as the user callstack mode. So don't allow the vcpu's lbr event to share lbr with other host lbr events. Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 4f4bd18..a0f3686 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -45,6 +45,12 @@ static const enum { #define LBR_CALL_STACK_BIT 9 /* enable call stack */ /* + * Set this hardware reserved bit if the lbr perf event is for the vcpu lbr + * emulation. This makes the reg->config different from other regular lbr + * events' config, so that they will not share the lbr feature. + */ +#define LBR_VCPU_BIT 62 +/* * Following bit only exists in Linux; we mask it out before writing it to * the actual MSR. But it helps the constraint perf code to understand * that this is a separate configuration. @@ -62,6 +68,7 @@ static const enum { #define LBR_FAR(1 << LBR_FAR_BIT) #define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT) #define LBR_NO_INFO(1ULL << LBR_NO_INFO_BIT) +#define LBR_VCPU (1ULL << LBR_VCPU_BIT) #define LBR_PLM (LBR_KERNEL | LBR_USER) @@ -818,6 +825,26 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event) (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)) reg->config |= LBR_NO_INFO; + /* +* An lbr perf event without a counter indicates this is for the vcpu +* lbr emulation. 
The vcpu lbr emulation does not allow the lbr +* feature to be shared with other lbr events on the host, because the +* LBR_SELECT msr is configured by the guest itself. The reg->config +* is deliberately configured to be user call stack mode via the +* related attr fields to get the host perf's help to save/restore the +* lbr state on vcpu context switching. It doesn't represent what +* LBR_SELECT will be configured. +* +* Set the reserved LBR_VCPU bit for the vcpu usage case, so that the +* vcpu's lbr perf event will not share the lbr feature with other perf +* events. (see __intel_shared_reg_get_constraints, failing to share +* makes it return the emptyconstraint, which finally makes +* x86_schedule_events fail to schedule the lower priority lbr event on +* the lbr feature). +*/ + if (is_no_counter_event(event)) + reg->config |= LBR_VCPU; + return 0; } -- 2.7.4
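The sharing rule that the reserved bit defeats can be reduced to one comparison. In __intel_shared_reg_get_constraints(), an event may use an already-claimed extra register only if its config matches. In this hypothetical sketch, `toy_can_share()` models that rule with a plain reference count. Setting bit 62 makes the vcpu event's config differ from an otherwise identical host call-stack config, so sharing fails.

```c
#include <stdbool.h>
#include <stdint.h>

#define LBR_CALL_STACK_BIT 9
#define LBR_VCPU_BIT 62
#define LBR_CALL_STACK (1ULL << LBR_CALL_STACK_BIT)
#define LBR_VCPU (1ULL << LBR_VCPU_BIT)

/* Hypothetical model: share only if unclaimed or configs are identical. */
static bool toy_can_share(int ref, uint64_t cur_config, uint64_t new_config)
{
	return ref == 0 || cur_config == new_config;
}

/* The vcpu case ORs in the reserved LBR_VCPU bit. */
static uint64_t toy_lbr_config(uint64_t base, bool vcpu)
{
	return vcpu ? (base | LBR_VCPU) : base;
}
```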
Re: [PATCH V3 2/2] cpufreq: intel_pstate: Implement ->resolve_freq()
On Tue, Aug 6, 2019 at 6:10 AM Viresh Kumar wrote: > > On 02-08-19, 11:28, Rafael J. Wysocki wrote: > > On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote: > > > On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar > > > wrote: > > > > > > > > Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files, > > > > which can be used to force a limit on the min/max P state of the driver. > > > > Though these files eventually control the min/max frequencies that the > > > > CPUs will run at, they don't make a change to policy->min/max values. > > > > > > That's correct. > > > > > > > When the values of these files are changed (in passive mode of the > > > > driver), it leads to calling ->limits() callback of the cpufreq > > > > governors, like schedutil. On a call to it the governors shall > > > > forcefully update the frequency to come within the limits. > > > > > > OK, so the problem is that it is a bug to invoke the governor's ->limits() > > > callback without updating policy->min/max, because that's what > > > "limits" mean to the governors. > > > > > > Fair enough. > > > > AFAICS this can be addressed by adding PM QoS freq limits requests of each > > CPU to > > intel_pstate in the passive mode such that changing min_perf_pct or > > max_perf_pct > > will cause these requests to be updated. > > Right, that sounds like a good plan. > > But that will never make it to the stable kernels as there will be a > long dependency of otherwise unrelated patches to get that done. My > initial thought was to get this patch merged as it is and then later > migrate to QoS, but since this patch doesn't fix ondemand and > conservative, this patch isn't good enough as well. Right. > Maybe we should add the regular notifier based solution first, mark it > for stable kernels, and then add the QoS specific solution ? I'm not sure if -stable kernels really need a fix here. Let's just make sure that the mainline is OK and let's go straight for the final approach.
[PATCH v1] drivers/base/memory.c: Fixup documentation of removable/phys_index/block_size_bytes
Let's rephrase to memory block terminology and add some further clarifications. Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Andrew Morton Cc: Michal Hocko Cc: Oscar Salvador Signed-off-by: David Hildenbrand --- drivers/base/memory.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index cb80f2bdd7de..790b3bcd63a6 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -116,10 +116,8 @@ static unsigned long get_memory_block_size(void) } /* - * use this as the physical section index that this memsection - * uses. + * Show the first physical section index (number) of this memory block. */ - static ssize_t phys_index_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -131,7 +129,10 @@ static ssize_t phys_index_show(struct device *dev, } /* - * Show whether the section of memory is likely to be hot-removable + * Show whether the memory block is likely to be offlineable (or is already + * offline). Once offline, the memory block could be removed. The return + * value does, however, not indicate that there is a way to remove the + * memory block. */ static ssize_t removable_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -455,7 +456,7 @@ static DEVICE_ATTR_RO(phys_device); static DEVICE_ATTR_RO(removable); /* - * Block size attribute stuff + * Show the memory block size (shared by all memory blocks). */ static ssize_t block_size_bytes_show(struct device *dev, struct device_attribute *attr, char *buf) -- 2.21.0
Re: [PATCH v3 4/8] printk: Replace strncmp with str_has_prefix
Hi Chuhong, On Mon, Aug 5, 2019 at 2:24 PM Chuhong Yuan wrote: > strncmp(str, const, len) is error-prone because len > is easy to have typo. > The example is the hard-coded len has counting error > or sizeof(const) forgets - 1. > So we prefer using newly introduced str_has_prefix() > to substitute such strncmp to make code better. > > Signed-off-by: Chuhong Yuan Thanks for your patch! > --- a/kernel/printk/braille.c > +++ b/kernel/printk/braille.c > @@ -11,11 +11,13 @@ > > int _braille_console_setup(char **str, char **brl_options) > { > - if (!strncmp(*str, "brl,", 4)) { > + size_t len; > + > + if ((len = str_has_prefix(*str, "brl,"))) { Please write this as len = str_has_prefix(*str, "brl,"); if (len) { (everywhere) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
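The style Geert asks for separates the assignment from the test. The sketch below is a userspace stand-in: `toy_str_has_prefix()` has the same behaviour as the kernel's str_has_prefix() (it returns the prefix length on a match and 0 otherwise), and `toy_braille_setup()` is a hypothetical caller. It uses the returned length to skip past the prefix, so no hand-counted `4` is needed.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for str_has_prefix(): prefix length on match, else 0. */
static size_t toy_str_has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) == 0 ? len : 0;
}

/* Hypothetical caller written in the requested style. */
static int toy_braille_setup(const char **str)
{
	size_t len;

	len = toy_str_has_prefix(*str, "brl,");
	if (len) {
		*str += len;	/* skip past "brl," */
		return 1;
	}
	return 0;
}
```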
Re: [PATCH v3 1/4] serial: mctrl_gpio: Avoid probe failures in case of missing gpiolib
On Fri, Aug 2, 2019 at 12:04 PM Schrempf Frieder wrote: > From: Frieder Schrempf > > If CONFIG_GPIOLIB is not enabled, mctrl_gpio_init() and > mctrl_gpio_init_noauto() will currently return an error pointer with > -ENOSYS. As the mctrl GPIOs are usually optional, drivers need to > check for this condition to allow continue probing. > > To avoid the need for this check in each driver, we return NULL > instead, as all the mctrl_gpio_*() functions are skipped anyway. > We also adapt mctrl_gpio_to_gpiod() to be in line with this change. > > Reviewed-by: Fabio Estevam > Signed-off-by: Frieder Schrempf Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert
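The convention in this patch has two parts. When the optional feature is compiled out, the init function returns NULL. Every helper then treats NULL as "nothing to do". This can be sketched in userspace C. `ERR_PTR()` and `IS_ERR()` below are simplified re-creations of the kernel idiom. `optional_gpios_*()` and `example_probe()` are hypothetical names that stand in for the mctrl_gpio API and a serial driver's probe function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace re-creation of the kernel's error-pointer idiom. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct mctrl_gpios. */
struct optional_gpios {
    int value;
};

/* Stub for when GPIO support is compiled out: return NULL ("no GPIOs")
 * instead of ERR_PTR(-ENOSYS), so callers can keep probing. */
static struct optional_gpios *optional_gpios_init(void)
{
    return NULL;
}

/* Every helper treats NULL as "nothing to do". */
static void optional_gpios_set(struct optional_gpios *gpios, int value)
{
    if (!gpios)
        return;
    assert(value == 0 || value == 1);
    gpios->value = value;
}

/* Hypothetical driver probe: only real errors abort probing. */
static int example_probe(void)
{
    struct optional_gpios *gpios = optional_gpios_init();

    if (IS_ERR(gpios))
        return -1;
    optional_gpios_set(gpios, 1); /* no-op when gpios is NULL */
    return 0;
}
```

With this shape, a driver needs an `IS_ERR()` check only for real failures. The per-driver `-ENOSYS` special case is no longer needed.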
Re: [PATCH v3 3/4] serial: sh-sci: Don't check for mctrl_gpio_init() returning -ENOSYS
On Fri, Aug 2, 2019 at 12:04 PM Schrempf Frieder wrote: > From: Frieder Schrempf > > Now that the mctrl_gpio code returns NULL instead of ERR_PTR(-ENOSYS) > if CONFIG_GPIOLIB is disabled, we can safely remove this check. > > Signed-off-by: Frieder Schrempf Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert
Re: [PATCH 1/2] vfio-mdev/mtty: Simplify interrupt generation
On Fri, 2 Aug 2019 01:59:04 -0500 Parav Pandit wrote: > While generating interrupt, mdev_state is already available for which > interrupt is generated. > Instead of doing indirect way from state->device->uuid-> to searching > state linearly in linked list on every interrupt generation, > directly use the available state. > > Hence, simplify the code to use mdev_state and remove unused helper > function with that. > > Signed-off-by: Parav Pandit > --- > samples/vfio-mdev/mtty.c | 39 --- > 1 file changed, 8 insertions(+), 31 deletions(-) This is sample code, so no high impact; but it makes sense to set a good example. Reviewed-by: Cornelia Huck
[PATCH] mm/mmap.c: refine data locality of find_vma_prev
When addr is beyond the range covered by the whole rb_tree, pprev will point to the biggest node. find_vma_prev gets it by walking down to the rightmost node of the tree. Only the last node is the one it is looking for, so it is not necessary to assign pprev to the intermediate nodes. Assigning pprev to the last node directly should improve the function's data locality a little. Signed-off-by: Wei Yang --- mm/mmap.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index 7e8c3e8ae75f..284bc7e51f9c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, *pprev = vma->vm_prev; } else { struct rb_node *rb_node = mm->mm_rb.rb_node; - *pprev = NULL; - while (rb_node) { - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); + while (rb_node && rb_node->rb_right) rb_node = rb_node->rb_right; - } + *pprev = rb_node ? NULL +: rb_entry(rb_node, struct vm_area_struct, vm_rb); } return vma; } -- 2.17.1
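The descent described in the commit message can be sketched in userspace C. `struct node` and `struct area` are hypothetical stand-ins for `struct rb_node` and `struct vm_area_struct`. `container_of()` is re-created with `offsetof()`, which is how `rb_entry()` is built on it. The loop stops at the rightmost node and converts only that node. The empty-tree case must map to NULL explicitly, because applying `container_of()` to a NULL pointer does not produce NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rb_node and struct vm_area_struct. */
struct node {
    struct node *left, *right;
};

struct area {
    unsigned long start;
    struct node rb; /* embedded tree node, like vm_rb */
};

/* Userspace container_of(): map an embedded node back to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Return the area with the highest address, or NULL for an empty tree.
 * Only right children are followed, and only the final node is
 * converted; the NULL case is checked first because container_of(NULL)
 * is not NULL. */
static struct area *last_area(struct node *root)
{
    struct node *n = root;

    while (n && n->right)
        n = n->right;
    return n ? container_of(n, struct area, rb) : NULL;
}
```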
Re: pivot_root(".", ".") and the fchdir() dance
Hello Michael, hello Aleksa, On 05.08.19 at 14:29, Michael Kerrisk (man-pages) wrote: > On 8/5/19 12:36 PM, Aleksa Sarai wrote: >> On 2019-08-01, Michael Kerrisk (man-pages) wrote: >>> I'd like to add some documentation about the pivot_root(".", ".") >>> idea, but I have a doubt/question. In the lxc_pivot_root() code we >>> have these steps >>> >>> oldroot = open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC); >>> newroot = open(rootfs, O_DIRECTORY | O_RDONLY | O_CLOEXEC); >>> >>> fchdir(newroot); >>> pivot_root(".", "."); >>> >>> fchdir(oldroot); // >>> >>> mount("", ".", "", MS_SLAVE | MS_REC, NULL); >>> umount2(".", MNT_DETACH); >> >>> fchdir(newroot); // >> >> And this one is required because we are in @oldroot at this point, due >> to the first fchdir(2). If we don't have the first one, then switching >> from "." to "/" in the mount/umount2 calls should fix the issue. > > See my notes above for why I therefore think that the second fchdir() > is also not needed (and therefore why switching from "." to "/" in the > mount()/umount2() calls is unnecessary. > > Do you agree with my analysis? If both the second and third fchdir are not required, then we do not need to bother with file descriptors at all, right? Indeed, my tests show that the following seems to work fine: chdir(rootfs) pivot_root(".", ".") umount2(".", MNT_DETACH) I tested that with my own tool[1] that uses user namespaces and marks everything MS_PRIVATE before, so I do not need the mount(MS_SLAVE) here. And it works the same with both umount2("/") and umount2("."). Did I overlook something that makes the file descriptors required? If not, wouldn't the above snippet make sense as an example in the man page? Greetings Philipp [1]: https://github.com/sosy-lab/benchexec/blob/b90aeb034b867711845a453587b73fbe8e4dca68/benchexec/container.py#L735
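The fd-free sequence Philipp describes can be written out as a C sketch. It assumes the caller already runs in its own mount namespace with propagation set to private, as his tool does. It also assumes the caller has the privileges `pivot_root()` needs. glibc provides no `pivot_root()` wrapper, so the sketch invokes the raw system call through `syscall(SYS_pivot_root, ...)`. `pivot_into()` is a hypothetical name. This illustrates the discussion and is not the lxc implementation or man-page text.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch of the fd-free pivot: assumes a private mount namespace and
 * sufficient privileges. Returns 0 on success, -1 at the first failure. */
static int pivot_into(const char *rootfs)
{
    assert(rootfs != NULL);

    if (chdir(rootfs) == -1) {
        perror("chdir");
        return -1;
    }
    /* new_root and put_old are both ".": the old root gets stacked
     * on top of the new root at the same location. */
    if (syscall(SYS_pivot_root, ".", ".") == -1) {
        perror("pivot_root");
        return -1;
    }
    /* Lazily detach the old root that now sits on top of ".". */
    if (umount2(".", MNT_DETACH) == -1) {
        perror("umount2");
        return -1;
    }
    return 0;
}
```

Calling `pivot_into()` on a nonexistent directory fails at the `chdir()` step and leaves the mount table untouched. That makes the error path easy to exercise without privileges.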
Re: [PATCH v2 1/4] clk: core: introduce clk_hw_set_parent()
On Wed 31 Jul 2019 at 10:40, Neil Armstrong wrote: > Introduce the clk_hw_set_parent() provider call to change parent of > a clock by using the clk_hw pointers. > > This eases the clock reparenting from clock rate notifiers and > implementing DVFS with simpler code avoiding the boilerplates > functions as __clk_lookup(clk_hw_get_name()) then clk_set_parent(). > > Signed-off-by: Neil Armstrong > Acked-by: Martin Blumenstingl Looks ok to me but we will obviously need Stephen's ack to apply it > --- > drivers/clk/clk.c| 6 ++ > include/linux/clk-provider.h | 1 + > 2 files changed, 7 insertions(+) > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index c0990703ce54..c11b1781d24a 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -2487,6 +2487,12 @@ static int clk_core_set_parent_nolock(struct clk_core > *core, > return ret; > } > > +int clk_hw_set_parent(struct clk_hw *hw, struct clk_hw *parent) > +{ > + return clk_core_set_parent_nolock(hw->core, parent->core); > +} > +EXPORT_SYMBOL_GPL(clk_hw_set_parent); > + > /** > * clk_set_parent - switch the parent of a mux clk > * @clk: the mux clk whose input we are switching > diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h > index 2ae7604783dd..dce5521a9bf6 100644 > --- a/include/linux/clk-provider.h > +++ b/include/linux/clk-provider.h > @@ -817,6 +817,7 @@ unsigned int clk_hw_get_num_parents(const struct clk_hw > *hw); > struct clk_hw *clk_hw_get_parent(const struct clk_hw *hw); > struct clk_hw *clk_hw_get_parent_by_index(const struct clk_hw *hw, > unsigned int index); > +int clk_hw_set_parent(struct clk_hw *hw, struct clk_hw *new_parent); > unsigned int __clk_get_enable_count(struct clk *clk); > unsigned long clk_hw_get_rate(const struct clk_hw *hw); > unsigned long __clk_get_flags(struct clk *clk); > -- > 2.22.0
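A heavily simplified userspace model shows how this differs from the `__clk_lookup(clk_hw_get_name())` round trip. The structures and the `core_set_parent()`/`hw_set_parent()` functions are hypothetical. The real `struct clk_core` carries much more state. `clk_core_set_parent_nolock()` also has locking requirements and handles rate recalculation and notifiers. The model keeps only the part the patch relies on: a `clk_hw` leads directly to its core, so reparenting works on pointers without any lookup by name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for clk_core and clk_hw. */
struct clk_core {
    const char *name;
    struct clk_core *parent;
    struct clk_core **possible_parents; /* inputs a mux can select */
    size_t num_parents;
};

struct clk_hw {
    struct clk_core *core;
};

/* Simplified reparenting: reject parents the mux cannot select,
 * otherwise switch. Locking, rates and notifiers are omitted. */
static int core_set_parent(struct clk_core *core, struct clk_core *parent)
{
    for (size_t i = 0; i < core->num_parents; i++) {
        if (core->possible_parents[i] == parent) {
            core->parent = parent;
            return 0;
        }
    }
    return -EINVAL;
}

/* Same shape as the proposed helper: take clk_hw pointers and forward
 * their cores, with no name lookup in between. */
static int hw_set_parent(struct clk_hw *hw, struct clk_hw *parent)
{
    assert(hw != NULL && parent != NULL);
    return core_set_parent(hw->core, parent->core);
}
```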
Re: [PATCH 2/2] vfio/mdev: Removed unused and redundant API for mdev name
On Fri, 2 Aug 2019 01:59:05 -0500 Parav Pandit wrote: > There is no single production driver who is interested in mdev device > name. > Additionally mdev device name is already available using core kernel > API dev_name(). The patch description is a bit confusing: You talk about removing an api to access the device name, but what you are actually removing is the api to access the device's uuid. That uuid is, of course, used to generate the device name, but the two are not the same. Using dev_name() gives you a string containing the uuid, not the uuid. > > Hence removed unused exported symbol. I'm not really against removing this api if no driver has interest in the device's uuid (and I'm currently not seeing why they would need it; we can easily add it back, should the need arise); but this needs a different description. > > Signed-off-by: Parav Pandit > --- > drivers/vfio/mdev/mdev_core.c | 6 -- > include/linux/mdev.h | 1 - > 2 files changed, 7 deletions(-) > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c > index b558d4cfd082..c2b809cbe59f 100644 > --- a/drivers/vfio/mdev/mdev_core.c > +++ b/drivers/vfio/mdev/mdev_core.c > @@ -57,12 +57,6 @@ struct mdev_device *mdev_from_dev(struct device *dev) > } > EXPORT_SYMBOL(mdev_from_dev); > > -const guid_t *mdev_uuid(struct mdev_device *mdev) > -{ > - return &mdev->uuid; > -} > -EXPORT_SYMBOL(mdev_uuid); > - > /* Should be called holding parent_list_lock */ > static struct mdev_parent *__find_parent_device(struct device *dev) > { > diff --git a/include/linux/mdev.h b/include/linux/mdev.h > index 0ce30ca78db0..375a5830c3d8 100644 > --- a/include/linux/mdev.h > +++ b/include/linux/mdev.h > @@ -131,7 +131,6 @@ struct mdev_driver { > > void *mdev_get_drvdata(struct mdev_device *mdev); > void mdev_set_drvdata(struct mdev_device *mdev, void *data); > -const guid_t *mdev_uuid(struct mdev_device *mdev); > > extern struct bus_type mdev_bus_type; >
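Cornelia points out that `dev_name()` yields a string containing the UUID, not the UUID itself. Code makes this easy to see: turning the 36-character text form into the 16 raw bytes takes a separate parsing step. The sketch below is a hypothetical userspace parser that stores the bytes in textual order. Inside the kernel, `guid_parse()` would be the natural tool, and it uses the mixed-endian `guid_t` layout rather than this simple order.

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical parser: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -> 16 bytes
 * in textual order. Returns 0 on success, -1 on malformed input. */
static int parse_uuid(const char *s, unsigned char out[16])
{
    size_t i = 0, j = 0;

    assert(s != NULL && out != NULL);
    if (strlen(s) != 36)
        return -1;
    while (i < 36) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')
                return -1;
            i++;
            continue;
        }
        int hi = hex_val(s[i]), lo = hex_val(s[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[j++] = (unsigned char)(hi << 4 | lo);
        i += 2;
    }
    return 0;
}
```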
Re: kernel BUG at mm/vmscan.c:LINE! (2)
On Sat 03-08-19 05:06:43, Minchan Kim wrote: > On Fri, Aug 02, 2019 at 10:58:05AM -0700, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit:0d8b3265 Add linux-next specific files for 20190729 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=1663c7d060 > > kernel config: https://syzkaller.appspot.com/x/.config?x=ae96f3b8a7e885f7 > > dashboard link: https://syzkaller.appspot.com/bug?extid=8e6326965378936537c3 > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=133c437c60 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1564585460 > > > > The bug was bisected to: > > > > commit 06a833a1167e9cbb43a9a4317ec24585c6ec85cb > > Author: Minchan Kim > > Date: Sat Jul 27 05:12:38 2019 + > > > > mm: introduce MADV_PAGEOUT > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1545f76460 > > final crash:https://syzkaller.appspot.com/x/report.txt?x=1745f76460 > > console output: https://syzkaller.appspot.com/x/log.txt?x=1345f76460 > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+8e632696537893653...@syzkaller.appspotmail.com > > Fixes: 06a833a1167e ("mm: introduce MADV_PAGEOUT") > > > > raw: 01fffc090025 dead0100 dead0122 88809c49f741 > > raw: 0002 0002 88821b6eaac0 > > page dumped because: VM_BUG_ON_PAGE(PageActive(page)) > > page->mem_cgroup:88821b6eaac0 > > [ cut here ] > > kernel BUG at mm/vmscan.c:1156! > > invalid opcode: [#1] PREEMPT SMP KASAN > > CPU: 1 PID: 9846 Comm: syz-executor110 Not tainted 5.3.0-rc2-next-20190729 > > #54 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > RIP: 0010:shrink_page_list+0x2872/0x5430 mm/vmscan.c:1156 > > My old version had PG_active flag clear but it seems to lose it with revising > patchsets. Thanks, Sizbot! 
> > >From 66d64988619ef7e86b0002b2fc20fdf5b84ad49c Mon Sep 17 00:00:00 2001 > From: Minchan Kim > Date: Sat, 3 Aug 2019 04:54:02 +0900 > Subject: [PATCH] mm: Clear PG_active on MADV_PAGEOUT > > shrink_page_list expects every pages as argument should be no active > LRU pages so we need to clear PG_active. Oops, missed that during review. > > Reported-by: syzbot+8e632696537893653...@syzkaller.appspotmail.com > Fixes: 06a833a1167e ("mm: introduce MADV_PAGEOUT") This is not a valid sha1 because it likely comes from linux-next. I guess Andrew will squash it into mm-introduce-madv_pageout.patch. Just for the record Acked-by: Michal Hocko And thanks to syzkaller for exercising the new interface so quickly! > Signed-off-by: Minchan Kim > --- > mm/vmscan.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 47aa2158cfac2..e2a8d3f5bbe48 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2181,6 +2181,7 @@ unsigned long reclaim_pages(struct list_head *page_list) > } > > if (nid == page_to_nid(page)) { > + ClearPageActive(page); > list_move(&page->lru, &node_page_list); > continue; > } > -- > 2.22.0.770.g0f2c4a37fd-goog -- Michal Hocko SUSE Labs
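The invariant behind this crash fits in a few lines of userspace C. The reclaim routine asserts that no page it receives is marked active, much as `shrink_page_list()` does with `VM_BUG_ON_PAGE(PageActive(page))`. The collecting routine therefore clears the flag first, which is what the added `ClearPageActive()` call does. `struct fake_page`, the flag bit and both functions are hypothetical. Real page flags are manipulated through the kernel's page-flag helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the real page-flag layout is different. */
#define FAKE_PG_ACTIVE (1UL << 0)

struct fake_page {
    unsigned long flags;
};

static int page_active(const struct fake_page *p)
{
    return (p->flags & FAKE_PG_ACTIVE) != 0;
}

static void clear_page_active(struct fake_page *p)
{
    p->flags &= ~FAKE_PG_ACTIVE;
}

/* Stand-in for shrink_page_list(): it only accepts inactive pages. */
static size_t shrink_list(struct fake_page **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(!page_active(pages[i]));
    return n; /* pretend every page was reclaimed */
}

/* Stand-in for reclaim_pages(): clear the active flag while collecting,
 * mirroring the ClearPageActive() added before list_move(). */
static size_t reclaim(struct fake_page **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        clear_page_active(pages[i]);
    return shrink_list(pages, n);
}
```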
[PATCH] scsi: fas216: Mark expected switch fall-throughs
Mark switch cases where we are expecting to fall through. Fix the following warnings (Building: rpc_defconfig arm): drivers/scsi/arm/fas216.c: In function ‘fas216_disconnect_intr’: drivers/scsi/arm/fas216.c:913:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (fas216_get_last_msg(info, info->scsi.msgin_fifo) == ABORT) { ^ drivers/scsi/arm/fas216.c:919:2: note: here default:/* huh? */ ^~~ drivers/scsi/arm/fas216.c: In function ‘fas216_kick’: drivers/scsi/arm/fas216.c:1959:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fas216_allocate_tag(info, SCpnt); ^~~~ drivers/scsi/arm/fas216.c:1960:2: note: here case TYPE_OTHER: ^~~~ drivers/scsi/arm/fas216.c: In function ‘fas216_busservice_intr’: drivers/scsi/arm/fas216.c:1413:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fas216_stoptransfer(info); ^ drivers/scsi/arm/fas216.c:1414:2: note: here case STATE(STAT_STATUS, PHASE_SELSTEPS):/* Sel w/ steps -> Status */ ^~~~ drivers/scsi/arm/fas216.c:1424:3: warning: this statement may fall through [-Wimplicit-fallthrough=] fas216_stoptransfer(info); ^ drivers/scsi/arm/fas216.c:1425:2: note: here case STATE(STAT_MESGIN, PHASE_COMMAND): /* Command -> Message In */ ^~~~ drivers/scsi/arm/fas216.c: In function ‘fas216_funcdone_intr’: drivers/scsi/arm/fas216.c:1573:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if ((stat & STAT_BUSMASK) == STAT_MESGIN) { ^ drivers/scsi/arm/fas216.c:1579:2: note: here default: ^~~ drivers/scsi/arm/fas216.c: In function ‘fas216_handlesync’: drivers/scsi/arm/fas216.c:605:20: warning: this statement may fall through [-Wimplicit-fallthrough=] info->scsi.phase = PHASE_MSGOUT_EXPECT; ~^ drivers/scsi/arm/fas216.c:607:2: note: here case async: ^~~~ Signed-off-by: Gustavo A. R. 
Silva --- drivers/scsi/arm/fas216.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/scsi/arm/fas216.c b/drivers/scsi/arm/fas216.c index aea4fd73c862..6c68c2303638 100644 --- a/drivers/scsi/arm/fas216.c +++ b/drivers/scsi/arm/fas216.c @@ -603,6 +603,7 @@ static void fas216_handlesync(FAS216_Info *info, char *msg) msgqueue_flush(&info->scsi.msgs); msgqueue_addmsg(&info->scsi.msgs, 1, MESSAGE_REJECT); info->scsi.phase = PHASE_MSGOUT_EXPECT; + /* fall through */ case async: dev->period = info->ifcfg.asyncperiod / 4; @@ -915,6 +916,7 @@ static void fas216_disconnect_intr(FAS216_Info *info) fas216_done(info, DID_ABORT); break; } + /* else, fall through */ default:/* huh? */ printk(KERN_ERR "scsi%d.%c: unexpected disconnect in phase %s\n", @@ -1411,6 +1413,8 @@ static void fas216_busservice_intr(FAS216_Info *info, unsigned int stat, unsigne case STATE(STAT_STATUS, PHASE_DATAOUT): /* Data Out -> Status */ case STATE(STAT_STATUS, PHASE_DATAIN): /* Data In -> Status */ fas216_stoptransfer(info); + /* fall through */ + case STATE(STAT_STATUS, PHASE_SELSTEPS):/* Sel w/ steps -> Status */ case STATE(STAT_STATUS, PHASE_MSGOUT): /* Message Out -> Status */ case STATE(STAT_STATUS, PHASE_COMMAND): /* Command -> Status */ @@ -1422,6 +1426,8 @@ static void fas216_busservice_intr(FAS216_Info *info, unsigned int stat, unsigne case STATE(STAT_MESGIN, PHASE_DATAOUT): /* Data Out -> Message In */ case STATE(STAT_MESGIN, PHASE_DATAIN): /* Data In -> Message In */ fas216_stoptransfer(info); + /* fall through */ + case STATE(STAT_MESGIN, PHASE_COMMAND): /* Command -> Message In */ case STATE(STAT_MESGIN, PHASE_SELSTEPS):/* Sel w/ steps -> Message In */ case STATE(STAT_MESGIN, PHASE_MSGOUT): /* Message Out -> Message In */ @@ -1575,6 +1581,7 @@ static void fas216_funcdone_intr(FAS216_Info *info, unsigned int stat, unsigned fas216_message(info); break; } + /* else, fall through */ default: fas216_log(info, 0, "internal phase %s for function done?" 
@@ -1957,6 +1964,7 @@ static void fas216_kick(FAS216_Info *info) switch (where_from) { case TYPE_QUEUE: fas216_allocate_tag(info, SCpnt); + /* fall through */ case TYPE_OTHER: fas216_start_command(info, SCpnt); break; -- 2.22.0
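The pattern this patch annotates can be reproduced in a small C function. The first case does some extra work and then deliberately continues into the next case. With GCC's `-Wimplicit-fallthrough` enabled, the `/* fall through */` comment marks the fall-through as intentional and suppresses the warning. Removing the comment brings the warning back. The enum and function below are hypothetical and loosely modelled on the `fas216_kick()` hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request sources, loosely modelled on fas216_kick(). */
enum source { FROM_QUEUE, FROM_OTHER, FROM_NONE };

/* Returns 1 if a command was started; *tag_allocated reports whether
 * the queue-only step ran before falling through. */
static int start_command(enum source where_from, int *tag_allocated)
{
    int started = 0;

    assert(tag_allocated != NULL);
    *tag_allocated = 0;
    switch (where_from) {
    case FROM_QUEUE:
        *tag_allocated = 1; /* extra step only for queued commands */
        /* fall through */
    case FROM_OTHER:
        started = 1;
        break;
    default:
        break;
    }
    return started;
}
```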
Re: [PATCH v9 04/11] x86/entry/64: Adapt assembly for PIE support
On Tue, Aug 06, 2019 at 07:08:51AM +0200, Borislav Petkov wrote: > On Mon, Aug 05, 2019 at 10:50:30AM -0700, Thomas Garnier wrote: > > I saw that %rdx was used for temporary usage and restored before the > > end so I assumed that it was not an option. > > PUSH_AND_CLEAR_REGS saves all regs earlier so I think you should be > able to use others. Like SAVE_AND_SWITCH_TO_KERNEL_CR3/RESTORE_CR3, for > example, uses %r15 and %r14. AFAICT the CONFIG_DEBUG_ENTRY thing he's changing is before we set up pt_regs. Also consider the UNWIND hint that's in there; it states we only have the IRET frame on the stack, not a full regs set.
Re: [PATCH V2] fork: Improve error message for corrupted page tables
On 08/06/2019 01:23 PM, Vlastimil Babka wrote: > > On 8/6/19 5:05 AM, Sai Praneeth Prakhya wrote: >> When a user process exits, the kernel cleans up the mm_struct of the user >> process and during cleanup, check_mm() checks the page tables of the user >> process for corruption (E.g: unexpected page flags set/cleared). For >> corrupted page tables, the error message printed by check_mm() isn't very >> clear as it prints the loop index instead of page table type (E.g: Resident >> file mapping pages vs Resident shared memory pages). The loop index in >> check_mm() is used to index rss_stat[] which represents individual memory >> type stats. Hence, instead of printing index, print memory type, thereby >> improving error message. >> >> Without patch: >> -- >> [ 204.836425] mm/pgtable-generic.c:29: bad p4d >> 89eb4e92(80025f941467) >> [ 204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2 >> [ 204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5 >> [ 204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480 >> >> With patch: >> --- >> [ 69.815453] mm/pgtable-generic.c:29: bad p4d >> 84653642(80025ca37467) >> [ 69.815872] BUG: Bad rss-counter state mm:014a6c03 >> type:MM_FILEPAGES val:2 >> [ 69.815962] BUG: Bad rss-counter state mm:014a6c03 >> type:MM_ANONPAGES val:5 >> [ 69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480 >> >> Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so >> that it matches the other print statement. 
>> >> Cc: Ingo Molnar >> Cc: Vlastimil Babka >> Cc: Peter Zijlstra >> Cc: Andrew Morton >> Cc: Anshuman Khandual >> Acked-by: Dave Hansen >> Suggested-by: Dave Hansen >> Signed-off-by: Sai Praneeth Prakhya > > Acked-by: Vlastimil Babka > > I would also add something like this to reduce risk of breaking it in the > future: > > 8< > diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h > index d7016dcb245e..a6f83cbe4603 100644 > --- a/include/linux/mm_types_task.h > +++ b/include/linux/mm_types_task.h > @@ -36,6 +36,9 @@ struct vmacache { > struct vm_area_struct *vmas[VMACACHE_SIZE]; > }; > > +/* > + * When touching this, update also resident_page_types in kernel/fork.c > + */ > enum { > MM_FILEPAGES, /* Resident file mapping pages */ > MM_ANONPAGES, /* Resident anonymous pages */ > Agreed and with that Reviewed-by: Anshuman Khandual
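Vlastimil's comment asks that the name table not silently break when the enum grows. One way to make it fail loudly at build time is sketched below in userspace C. The enum mirrors the counters in `mm_types_task.h`. The `NAMED_INDEX()` macro, the static assertion and `report_bad_rss()` are hypothetical additions for illustration. Designated initializers tie each string to its enum value, and the assertion catches a new counter that has no name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the per-mm counter enum from mm_types_task.h. */
enum {
    MM_FILEPAGES,  /* Resident file mapping pages */
    MM_ANONPAGES,  /* Resident anonymous pages */
    MM_SWAPENTS,   /* Anonymous swap entries */
    MM_SHMEMPAGES, /* Resident shared memory pages */
    NR_MM_COUNTERS
};

/* Hypothetical helper: designated initializer plus stringified name. */
#define NAMED_INDEX(x) [x] = #x

static const char *const resident_page_types[] = {
    NAMED_INDEX(MM_FILEPAGES),
    NAMED_INDEX(MM_ANONPAGES),
    NAMED_INDEX(MM_SWAPENTS),
    NAMED_INDEX(MM_SHMEMPAGES),
};

/* Build fails if a counter is added without a matching name. */
_Static_assert(sizeof(resident_page_types) / sizeof(resident_page_types[0])
               == NR_MM_COUNTERS, "resident_page_types out of sync");

/* Hypothetical reporter producing the improved message format. */
static void report_bad_rss(void *mm, int type, long val)
{
    assert(type >= 0 && type < NR_MM_COUNTERS);
    fprintf(stderr, "BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
            mm, resident_page_types[type], val);
}
```

A bad counter at index `MM_ANONPAGES` would then be reported as `type:MM_ANONPAGES`, matching the format of the improved message quoted above.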
[PATCH] ata: rb532_cf: Fix unused variable warning in rb532_pata_driver_remove
Fix the following warning (Building: rb532_defconfig mips): drivers/ata/pata_rb532_cf.c: In function ‘rb532_pata_driver_remove’: drivers/ata/pata_rb532_cf.c:161:24: warning: unused variable ‘info’ [-Wunused-variable] struct rb532_cf_info *info = ah->private_data; ^~~~ Fixes: cd56f35e52d9 ("ata: rb532_cf: Convert to use GPIO descriptors") Signed-off-by: Gustavo A. R. Silva --- drivers/ata/pata_rb532_cf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/ata/pata_rb532_cf.c b/drivers/ata/pata_rb532_cf.c index 7c37f2ff09e4..deae466395de 100644 --- a/drivers/ata/pata_rb532_cf.c +++ b/drivers/ata/pata_rb532_cf.c @@ -158,7 +158,6 @@ static int rb532_pata_driver_probe(struct platform_device *pdev) static int rb532_pata_driver_remove(struct platform_device *pdev) { struct ata_host *ah = platform_get_drvdata(pdev); - struct rb532_cf_info *info = ah->private_data; ata_host_detach(ah); -- 2.22.0