linux-next: Tree for Oct 9th
For my birthday I've gone and got myself a linux-next tree: Changes since 20170929: The net-next and drm trees lost their build failures but the rcu tree gained one. I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with an arm64 defconfig, an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, and arm64 allnoconfig signature.asc Description: PGP signature
Re: [PATCH v2 15/16] gpio: tegra: Use banked GPIO infrastructure
On 09/28/2017 04:56 AM, Thierry Reding wrote: > From: Thierry Reding > > Convert the Tegra GPIO driver to use the banked GPIO infrastructure, > which simplifies some parts of the driver. > > Signed-off-by: Thierry Reding > --- > drivers/gpio/Kconfig | 1 + > drivers/gpio/gpio-tegra.c | 203 > ++ > 2 files changed, 98 insertions(+), 106 deletions(-) > > diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig > index db3e446ad9b3..458157d6d491 100644 > --- a/drivers/gpio/Kconfig > +++ b/drivers/gpio/Kconfig > @@ -439,6 +439,7 @@ config GPIO_TEGRA > default ARCH_TEGRA > depends on ARCH_TEGRA || COMPILE_TEST ... > @@ -616,76 +617,66 @@ static int tegra_gpio_probe(struct platform_device > *pdev) > tgi->ic.irq_set_wake= tegra_gpio_irq_set_wake; > #endif > > + irq = &tgi->gc.irq; > + irq->chip = &tgi->ic; > + irq->handler = handle_simple_irq; > + irq->lock_key = &tegra_gpio_lock_class; As per current gpiolib design lockdep lock_class_key is assigned automatically and hidden from gpiolib users. Why do you need to do the same manually? > + irq->default_type = IRQ_TYPE_NONE; > + irq->parent_handler = gpio_irq_chip_banked_chained_handler; > + irq->update_bank = tegra_gpio_update_bank; > + > + irq->parents = devm_kcalloc(&pdev->dev, tgi->gc.num_banks, > + sizeof(unsigned int), GFP_KERNEL); > + if (!irq->parents) > + return -ENOMEM; > + ... > > tegra_gpio_debuginit(tgi); > > -- regards, -grygorii
Re: [PATCH 1/2] crypto: lrw - Fix an error handling path in 'create()'
Am 08.10.2017 11:39, schrieb Christophe JAILLET: > All error handling paths 'goto err_drop_spawn' except this one. > In order to avoid some resources leak, we should do it as well here. > > Fixes: 700cb3f5fe75 ("crypto: lrw - Convert to skcipher") > Signed-off-by: Christophe JAILLET > --- > crypto/lrw.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/crypto/lrw.c b/crypto/lrw.c > index a8bfae4451bf..eb681e9fe574 100644 > --- a/crypto/lrw.c > +++ b/crypto/lrw.c > @@ -610,8 +610,10 @@ static int create(struct crypto_template *tmpl, struct > rtattr **tb) > ecb_name[len - 1] = 0; > > if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, > - "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) this check can be done more easy, the length of ecb_name is len the length of inst->alg.base.cra_name is CRYPTO_MAX_ALG_NAME if CRYPTO_MAX_ALG_NAME-len < "lrw()" < 5 no need to involve snprintf() just my 2 cents re, wh > - return -ENAMETOOLONG; > + "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) { > + err = -ENAMETOOLONG; > + goto err_drop_spawn; > + } > } > > inst->alg.base.cra_flags = alg->base.cra_flags & CRYPTO_ALG_ASYNC;
Re: [PATCH v5 2/2] staging: ion: create one device entry per heap
On 10/05/2017 06:06 AM, Benjamin Gaignard wrote: > 2017-10-04 12:17 GMT+02:00 Mark Brown : >> On Tue, Oct 03, 2017 at 04:08:30PM -0700, Sandeep Patil wrote: >> >>> It is entirely possible and easy in android/ueventd to create those nodes >>> under "/dev/ion/". (assuming the heap 'subsystem' for these new devices >>> will >>> point to 'ion'). > > I think it is the same problem than for webcam under v4l framework. > Each time you plug a webcam you got a v4l node but android/uevent rules > the plug order doesn't have impact. > The same think will happen for ion nodes it may be even easier because > the heap will always being created in the smae order for a given product > configuration. > Relying on the heap being created in the same order seems troublesome. If for some reason it changes in the kernel we might break something in userspace. Anyway, to move this forward I think we need to see a proof of concept of using selinux to protect access to specific heaps. Thanks, Laura >> >> The reason I didn't say /dev/ion/foo initially is that if people want to >> keep the existing /dev/ion around for compatibility reasons then the >> /dev/ion name isn't available which might cause issues. Otherwise just >> dumping everything under a directory (perhaps with a different name) was >> my first thought as well. >> >>> (Also FWIW, the SELinux permissions are also possible with the current ion >>> implementation by adding rules to disallow specific ioctls instead of adding >>> permissions to access device node as this change would do) >> >> AIUI the request is to limit access to specific heaps, and obviously not >> everyone wants to deal with SELinux at all.
Re: [PATCH] scsi: libiscsi: fix shifting of DID_REQUEUE host byte
On 10/09/2017 04:33 AM, Johannes Thumshirn wrote: > The SCSI host byte should be shifted left by 16 in order to have > scsi_decide_disposition() do the right thing (.i.e. requeue the command). > > Signed-off-by: Johannes Thumshirn > Fixes: 661134ad3765 ("[SCSI] libiscsi, bnx2i: make bound ep check common") > Cc: Lee Duncan > Cc: Hannes Reinecke > Cc: Bart Van Assche > Cc: Chris Leech > --- > drivers/scsi/libiscsi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index bd4605a34f54..9cba4913b43c 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1728,7 +1728,7 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct > scsi_cmnd *sc) > > if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx)) { > reason = FAILURE_SESSION_IN_RECOVERY; > - sc->result = DID_REQUEUE; > + sc->result = DID_REQUEUE << 16; > goto fault; > } > > Good catch. I know you're working on fixing all drivers to use the correct macros rather than rolling there own. Acked-by: Lee Duncan -- Lee Duncan SUSE Labs
Re: [kernel-hardening] [PATCH] lib/vsprintf: add default case to 'i' specifier
On Mon, Oct 09, 2017 at 09:16:09AM -0600, Tycho Andersen wrote: > On Mon, Oct 09, 2017 at 01:59:05PM +1100, Tobin C. Harding wrote: > > %pi leaks kernel addresses if incorrectly specified. > > > > Currently the printk specifier %pi (%pI) contains a switch statement > > without a default clause. The %pi specifier requires a subsequent > > character (4, 6, or S) controlling the output. If the specifier is > > incomplete the switch statement will fall through and print the variable > > argument address in hex instead of the value of the argument (as an IP > > address). > > > > If uncaught this leaks kernel addresses into dmesg. We can return an > > error string to make the bug visible and stop addresses leaking. > > > > Add a default clause returning an error string, stops leaking addresses > > and makes the buggy code > > ...? :) > > > Signed-off-by: Tobin C. Harding > > --- > > lib/vsprintf.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/lib/vsprintf.c b/lib/vsprintf.c > > index 86c3385b9eb3..155702f05b14 100644 > > --- a/lib/vsprintf.c > > +++ b/lib/vsprintf.c > > @@ -1775,6 +1775,8 @@ char *pointer(const char *fmt, char *buf, char *end, > > void *ptr, > > default: > > return string(buf, end, "(invalid address)", > > spec); > > }} > > + default: > > Maybe a WARN(1, "invalid pointer format")? That way it'll be easy for > people to figure out where to fix. Thanks for the review. vsprintf.c uses custom error return logic so as not to create a function call cycle, I assume the printf versions of WARN call into vsprintf to do their formatting also. Hence (and without studying the WARN code) I avoided the printf versions of WARN. Open to correction if I am wrong. thanks, Tobin.
[PATCHSET] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
Hello, btrfs has different ways to issue metadata IOs and may end up issuing metadata or otherwise shared IOs from a non-root cgroup, which can lead to priority inversion and ineffective IO isolation. This patchset makes sure that btrfs issues all metadata and shared IOs from the root cgroup by exempting btree_inodes from cgroup writeback and explicitly associating shared IOs with the root cgroup. This patchset containst he following three patches [PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode [PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css() [PATCH 3/3] btrfs: ensure that metadata and flush are issued from the and is also available in the following git branch git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-btrfs-metadata diffstat follows. Thanks. fs/block_dev.c |3 +-- fs/btrfs/check-integrity.c |2 +- fs/btrfs/disk-io.c |4 fs/btrfs/ioctl.c|6 +- fs/btrfs/super.c|1 - fs/buffer.c | 12 fs/ext2/inode.c |3 ++- fs/ext2/super.c |1 - fs/ext4/inode.c |5 - fs/ext4/super.c |2 -- fs/fs-writeback.c |1 + include/linux/backing-dev.h |2 +- include/linux/buffer_head.h | 11 +++ include/linux/fs.h |3 ++- include/linux/writeback.h |6 -- 15 files changed, 48 insertions(+), 14 deletions(-) -- tejun
[PATCH 1/3] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB
Currently, filesystem can indiate cgroup writeback support per superblock; however, depending on the filesystem, especially if inodes are used to carry metadata, it can be useful to indicate cgroup writeback support per inode. This patch replaces the superblock flag SB_I_CGROUPWB with per-inode S_CGROUPWB, so that cgroup writeback can be enabled selectively. * block_dev sets the new flag in bdget() when initializing new inode. * ext2/4 set the new flag in ext?_set_inode_flags() function. * btrfs sets the new flag in btrfs_update_iflags() function. Note that this automatically excludes btree_inode which doesn't use btrfs_update_iflags() during initialization. This is an intended behavior change. Signed-off-by: Tejun Heo Cc: Jan Kara Cc: Jens Axboe Cc: Chris Mason Cc: Josef Bacik Cc: linux-bt...@vger.kernel.org Cc: "Theodore Ts'o" Cc: Andreas Dilger Cc: linux-e...@vger.kernel.org --- fs/block_dev.c | 3 +-- fs/btrfs/ioctl.c| 4 +++- fs/btrfs/super.c| 1 - fs/ext2/inode.c | 3 ++- fs/ext2/super.c | 1 - fs/ext4/inode.c | 5 - fs/ext4/super.c | 2 -- include/linux/backing-dev.h | 2 +- include/linux/fs.h | 3 ++- 9 files changed, 13 insertions(+), 11 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 93d088f..ff9c282 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -800,8 +800,6 @@ static struct dentry *bd_mount(struct file_system_type *fs_type, { struct dentry *dent; dent = mount_pseudo(fs_type, "bdev:", &bdev_sops, NULL, BDEVFS_MAGIC); - if (!IS_ERR(dent)) - dent->d_sb->s_iflags |= SB_I_CGROUPWB; return dent; } @@ -893,6 +891,7 @@ struct block_device *bdget(dev_t dev) inode->i_mode = S_IFBLK; inode->i_rdev = dev; inode->i_bdev = bdev; + inode->i_flags |= S_CGROUPWB; inode->i_data.a_ops = &def_blk_aops; mapping_set_gfp_mask(&inode->i_data, GFP_USER); spin_lock(&bdev_lock); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6c7a49f..117cc63 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -150,9 +150,11 @@ void btrfs_update_iflags(struct inode *inode) new_fl |= S_NOATIME; if (ip->flags & BTRFS_INODE_DIRSYNC) new_fl |= S_DIRSYNC; + new_fl |= S_CGROUPWB; set_mask_bits(&inode->i_flags, - S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC, + S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC | + S_CGROUPWB, new_fl); } diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 35a128a..46a1488 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1136,7 +1136,6 @@ static int btrfs_fill_super(struct super_block *sb, sb->s_flags |= MS_POSIXACL; #endif sb->s_flags |= MS_I_VERSION; - sb->s_iflags |= SB_I_CGROUPWB; err = super_setup_bdi(sb); if (err) { diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 4dca6f3..3c5d822 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -1372,7 +1372,7 @@ void ext2_set_inode_flags(struct inode *inode) unsigned int flags = EXT2_I(inode)->i_flags; inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | - S_DIRSYNC | S_DAX); + S_DIRSYNC | S_DAX | S_CGROUPWB); if (flags & EXT2_SYNC_FL) inode->i_flags |= S_SYNC; if (flags & EXT2_APPEND_FL) @@ -1385,6 +1385,7 @@ void ext2_set_inode_flags(struct inode *inode) inode->i_flags |= S_DIRSYNC; if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode)) inode->i_flags |= S_DAX; + inode->i_flags |= S_CGROUPWB; } struct inode *ext2_iget (struct super_block *sb, unsigned long ino) diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 1458706..e6ba669e 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -922,7 +922,6 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent) sb->s_flags = (sb->s_flags & ~MS_POSIXACL) | ((EXT2_SB(sb)->s_mount_opt & EXT2_MOUNT_POSIX_ACL) ? MS_POSIXACL : 0); - sb->s_iflags |= SB_I_CGROUPWB; if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV && (EXT2_HAS_COMPAT_FEATURE(sb, ~0U) || diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 31db875..344f12b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4591,8 +4591,11 @@ void ext4_set_inode_flags(struct inode *inode) !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) && !ext4_encrypted_inode(inode)) new_fl |= S_DAX; + if (test_opt(inode->i_sb, DATA_FLAGS) != EXT4_MOUNT_JOURNAL_DATA) + new_fl |= S_CGROUPWB; inode_set_flags(inode, new_fl, -
[PATCH 3/3] btrfs: ensure that metadata and flush are issued from the root cgroup
Issuing metdata or otherwise shared IOs from !root cgroup can lead to priority inversion. This patch ensures that those IOs are always issued from the root cgroup. This patch updates btrfs_update_iflags() to not set S_CGROUPWB on btree_inodes. This isn't strictly necessary as those inodes don't call the function during init; however, this serves as documentation and prevents possible future mistakes. If this isn't desirable, please feel free to drop the section. Signed-off-by: Tejun Heo Cc: Chris Mason Cc: Josef Bacik --- fs/btrfs/check-integrity.c | 2 +- fs/btrfs/disk-io.c | 4 fs/btrfs/ioctl.c | 4 +++- 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 7d5a9b5..058dea6 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh) struct btrfsic_dev_state *dev_state; if (!btrfsic_is_initialized) - return submit_bh(op, op_flags, bh); + return submit_bh_blkcg_css(op, op_flags, blkcg_root_css); mutex_lock(&btrfsic_mutex); /* since btrfsic_submit_bh() might also be called before diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dfdab84..fe8bbe1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio, int async = check_async_write(bio_flags); blk_status_t ret; + bio_associate_blkcg(bio, blkcg_root_css); + if (bio_op(bio) != REQ_OP_WRITE) { /* * called for a read, do the setup so that checksum validation @@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device) return; bio_reset(bio); + bio_associate_blkcg(bio, blkcg_root_css); + bio->bi_end_io = btrfs_end_empty_barrier; bio_set_dev(bio, device->bdev); bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 117cc63..8a7db6c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode) new_fl |= S_NOATIME; if (ip->flags & BTRFS_INODE_DIRSYNC) new_fl |= S_DIRSYNC; - new_fl |= S_CGROUPWB; + /* btree_inodes are always in the root cgroup */ + if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID) + new_fl |= S_CGROUPWB; set_mask_bits(&inode->i_flags, S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC | -- 2.9.5
[PATCH 2/3] cgroup, writeback: implement submit_bh_blkcg_css()
Add wbc->blkcg_css so that the blkcg_css association can be specified independently and implement submit_bh_blkcg_css() using it. This will be used to override cgroup membership on specific buffer_heads. Signed-off-by: Tejun Heo Cc: Jan Kara Cc: Jens Axboe --- fs/buffer.c | 12 fs/fs-writeback.c | 1 + include/linux/buffer_head.h | 11 +++ include/linux/writeback.h | 6 -- 4 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 170df85..fac4f9a 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3143,6 +3143,18 @@ int submit_bh(int op, int op_flags, struct buffer_head *bh) } EXPORT_SYMBOL(submit_bh); +#ifdef CONFIG_CGROUP_WRITEBACK +int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh, + struct cgroup_subsys_state *blkcg_css) +{ + struct writeback_control wbc = { .blkcg_css = blkcg_css }; + + /* @wbc is used just to override the bio's blkcg_css */ + return submit_bh_wbc(op, op_flags, bh, 0, &wbc); +} +EXPORT_SYMBOL(submit_bh_blkcg_css); +#endif + /** * ll_rw_block: low-level access to block devices (DEPRECATED) * @op: whether to %READ or %WRITE diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 245c430..cd882e6 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -538,6 +538,7 @@ void wbc_attach_and_unlock_inode(struct writeback_control *wbc, } wbc->wb = inode_to_wb(inode); + wbc->blkcg_css = wbc->wb->blkcg_css; wbc->inode = inode; wbc->wb_id = wbc->wb->memcg_css->id; diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index c8dae55..abb4dd4 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -13,6 +13,7 @@ #include #include #include +#include #ifdef CONFIG_BLOCK @@ -205,6 +206,16 @@ int bh_submit_read(struct buffer_head *bh); loff_t page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, int whence); +#ifdef CONFIG_CGROUP_WRITEBACK +int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh, + struct cgroup_subsys_state *blkcg_css); +#else +static inline int submit_bh_blkcg(int op, int op_flags, struct buffer_head *bh, + struct cgroup_subsys_state *blkcg_css) +{ + return submit_bh(op, op_flags, bh); +} +#endif extern int buffer_heads_over_limit; /* diff --git a/include/linux/writeback.h b/include/linux/writeback.h index d581579..81e5946 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -91,6 +91,8 @@ struct writeback_control { unsigned for_sync:1;/* sync(2) WB_SYNC_ALL writeback */ #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ + struct cgroup_subsys_state *blkcg_css; /* usually wb->blkcg_css but + may be overridden */ struct inode *inode;/* inode being written out */ /* foreign inode detection, see wbc_detach_inode() */ @@ -277,8 +279,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio) * behind a slow cgroup. Ultimately, we want pageout() to kick off * regular writeback instead of writing things out itself. */ - if (wbc->wb) - bio_associate_blkcg(bio, wbc->wb->blkcg_css); + if (wbc->blkcg_css) + bio_associate_blkcg(bio, wbc->blkcg_css); } #else /* CONFIG_CGROUP_WRITEBACK */ -- 2.9.5
[PATCH net-next] once: switch to new jump label API
From: Eric Biggers Switch the DO_ONCE() macro from the deprecated jump label API to the new one. The new one is more readable, and for DO_ONCE() it also makes the generated code more icache-friendly: now the one-time initialization code is placed out-of-line at the jump target, rather than at the inline fallthrough case. Acked-by: Hannes Frederic Sowa Signed-off-by: Eric Biggers --- include/linux/once.h | 6 +++--- lib/once.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/linux/once.h b/include/linux/once.h index 9c98aaa87cbc..724724918e8b 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -5,7 +5,7 @@ #include bool __do_once_start(bool *done, unsigned long *flags); -void __do_once_done(bool *done, struct static_key *once_key, +void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags); /* Call a function exactly once. The idea of DO_ONCE() is to perform @@ -38,8 +38,8 @@ void __do_once_done(bool *done, struct static_key *once_key, ({ \ bool ___ret = false; \ static bool ___done = false; \ - static struct static_key ___once_key = STATIC_KEY_INIT_TRUE; \ - if (static_key_true(&___once_key)) { \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ unsigned long ___flags; \ ___ret = __do_once_start(&___done, &___flags); \ if (unlikely(___ret)) { \ diff --git a/lib/once.c b/lib/once.c index 05c8604627eb..831c5a6b0bb2 100644 --- a/lib/once.c +++ b/lib/once.c @@ -5,7 +5,7 @@ struct once_work { struct work_struct work; - struct static_key *key; + struct static_key_true *key; }; static void once_deferred(struct work_struct *w) @@ -14,11 +14,11 @@ static void once_deferred(struct work_struct *w) work = container_of(w, struct once_work, work); BUG_ON(!static_key_enabled(work->key)); - static_key_slow_dec(work->key); + static_branch_disable(work->key); kfree(work); } -static void once_disable_jump(struct static_key *key) +static void once_disable_jump(struct static_key_true *key) { struct once_work *w; @@ -51,7 +51,7 @@ bool __do_once_start(bool *done, unsigned long *flags) } EXPORT_SYMBOL(__do_once_start); -void __do_once_done(bool *done, struct static_key *once_key, +void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags) __releases(once_lock) { -- 2.14.2.920.gcf0c67979c-goog
Re: Read-only `slaves` with shared subtrees?
On Sun, Oct 8, 2017 at 5:15 PM, Ram Pai wrote: >> >> One thing that I don't plan to use, but might be worth thinking about is >> `slave + RW + STICKY` combination. If `master` mounts something RO, >> and `slave` >> is `RW + STICKY`, should the mount appear RW inside the slave? I don't >> find it particularly useful, >> but still... > > As per the implemented semantics it will become "RW". Should it be "RO" > aswell? Will that open up security holes? It is a mechanism that could be used to potentially increase the scope of privileges, which is a fertile ground for security issues. There is some room for using it to circumvent mechanisms that were unaware of this new feature. I guess for this reason alone, it might be worth limiting to RO only.l >> >> Another thing that popped into my head: Is it worth considering any >> dynamic changes to `slave`'s >> RO status? It complicates everything a lot (it seems to me), since it >> adds a retroactive >> dynamic propagation. I don't currently have any plans to use it, but I >> could imagine scenarios >> in which a slave mount with all it's sub-mounts is remounted from RO >> to RW, in response to >> some external authorization trigger. > > The sematics should be something like this. Check if it makes sense. > > a) anything mounted under stick-mount will be a sticky-mount and will > inherit the mount's access-attribute;i.e RO RW attribute. > b) a mount when made sticky will propagate its sticky attribute > as well as its access-attribute recursively to its children > c) anything mounted under non-sticky mount will not inherit the > mount's access-attribute and will be non-sticky aswell. > d) a mount when made non-sticky will just change itself to non-sticky. > (will NOT propagate its non-sticky attribute and its > access-attribue recursively to its children.) a), b) and c), seem uncontroversial. d) seems OK, but I'm unsure as it is asymmetrical to b). Both recursive and non-recursive D seem to make sense. I'm just unsure if any is more useful than the other. What happens when a sticky RO slave mount is remounted as RW? Does it loose stickiness? Does this change propagate to its children? Another angle, that just appeared to me: If we have a double link A (master) -> (slave) B (master) -> (slave) C If A is RW and B is RO + sticky, does mount propagated to C will also be RO? It seems to me it should. Regards, Dawid
Re: [RESEND x2][PATCH 0/3] dwc2 fixes for edge cases on hikey
On Tue, Oct 3, 2017 at 2:58 AM, Minas Harutyunyan wrote: > > Could you please apply patch from Vardan Mikayelyan "usb: dwc2: Fix > dwc2_hsotg_core_init_disconnected()" submitted at 02/25/2017 > (https://marc.info/?l=linux-usb&m=148801589931039&w=2) instead of your > "usb: dwc2: Improve gadget state disconnection handling" and test again > failing scenario. I may be misunderstanding htings, but I don't believe that patch addresses the same issue I'm trying to fix (I've tested with it, and it doesn't cause any trouble for me, but it also doesn't seem to address the corner-cases I'm hitting). My "Improve gadget state disconnection handling" patch handles the fact that when we switch from B/gadget mode to A/Host mode, we don't always go through a gadget disconnect path. So instead of calling the dwc2_hsotg_disconnect() path initially when switching to gadget mode (to avoid the state complaining that we set it up twice), we should instead be calling dwc2_hsotg_disconnect() when we are setting up the A/host mode. So for example, the follow-on fix to the UDC state won't properly work without my "Improve gadget state disconnection handling" patch, and "cat /sys/class/udc/f72c.usb/state" will always return configured (assuming gadget mode was used once) regardless of the gadget state, since we don't actually call dwc2_hsotg_disconnect when the otg plug is removed. > Other 2 patches from series "[PATCH 0/3] dwc2 fixes for edge cases on > hikey" are Ok. Thanks for the review/feedback! thanks -john
Re: [v11 3/6] mm, oom: cgroup-aware OOM killer
On Thu, 5 Oct 2017, Roman Gushchin wrote: > Traditionally, the OOM killer is operating on a process level. > Under oom conditions, it finds a process with the highest oom score > and kills it. > > This behavior doesn't suit well the system with many running > containers: > > 1) There is no fairness between containers. A small container with > few large processes will be chosen over a large one with huge > number of small processes. > > 2) Containers often do not expect that some random process inside > will be killed. In many cases much safer behavior is to kill > all tasks in the container. Traditionally, this was implemented > in userspace, but doing it in the kernel has some advantages, > especially in a case of a system-wide OOM. > I'd move the second point to the changelog for the next patch since this patch doesn't implement any support for memory.oom_group. > To address these issues, the cgroup-aware OOM killer is introduced. > > Under OOM conditions, it looks for the biggest leaf memory cgroup > and kills the biggest task belonging to it. The following patches > will extend this functionality to consider non-leaf memory cgroups > as well, and also provide an ability to kill all tasks belonging > to the victim cgroup. > > The root cgroup is treated as a leaf memory cgroup, so it's score > is compared with leaf memory cgroups. > Due to memcg statistics implementation a special algorithm > is used for estimating it's oom_score: we define it as maximum > oom_score of the belonging tasks. > This seems to unfairly bias the root mem cgroup depending on process size. It isn't treated fairly as a leaf mem cgroup if they are being compared based on different criteria: the root mem cgroup as (mostly) the largest rss of a single process vs leaf mem cgroups as all anon, unevictable, and unreclaimable slab pages charged to it by all processes. I imagine a configuration where the root mem cgroup has 100 processes attached each with rss of 80MB, compared to a leaf cgroup with 100 processes of 1MB rss each. How does this logic prevent repeatedly oom killing the processes of 1MB rss? In this case, "the root cgroup is treated as a leaf memory cgroup" isn't quite fair, it can simply hide large processes from being selected. Users who configure cgroups in a unified hierarchy for other resource constraints are penalized for this choice even though the mem cgroup with 100 processes of 1MB rss each may not be limited itself. I think for this comparison to be fair, it requires accounting for the root mem cgroup itself or for a different accounting methodology for leaf memory cgroups. > +/* > + * Checks if the given memcg is a valid OOM victim and returns a number, > + * which means the folowing: > + * -1: there are inflight OOM victim tasks, belonging to the memcg > + *0: memcg is not eligible, e.g. all belonging tasks are protected > + * by oom_score_adj set to OOM_SCORE_ADJ_MIN > + * >0: memcg is eligible, and the returned value is an estimation > + * of the memory footprint > + */ > +static long oom_evaluate_memcg(struct mem_cgroup *memcg, > +const nodemask_t *nodemask, > +unsigned long totalpages) > +{ > + struct css_task_iter it; > + struct task_struct *task; > + int eligible = 0; > + > + /* > + * Memcg is OOM eligible if there are OOM killable tasks inside. > + * > + * We treat tasks with oom_score_adj set to OOM_SCORE_ADJ_MIN > + * as unkillable. > + * > + * If there are inflight OOM victim tasks inside the memcg, > + * we return -1. > + */ > + css_task_iter_start(&memcg->css, 0, &it); > + while ((task = css_task_iter_next(&it))) { > + if (!eligible && > + task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) > + eligible = 1; > + > + if (tsk_is_oom_victim(task) && > + !test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags)) { > + eligible = -1; > + break; > + } > + } > + css_task_iter_end(&it); > + > + if (eligible <= 0) > + return eligible; > + > + return memcg_oom_badness(memcg, nodemask, totalpages); > +} > + > +static void select_victim_memcg(struct mem_cgroup *root, struct oom_control > *oc) > +{ > + struct mem_cgroup *iter; > + > + oc->chosen_memcg = NULL; > + oc->chosen_points = 0; > + > + /* > + * The oom_score is calculated for leaf memory cgroups (including > + * the root memcg). > + */ > + rcu_read_lock(); > + for_each_mem_cgroup_tree(iter, root) { > + long score; > + > + if (memcg_has_children(iter) && iter != root_mem_cgroup) > + continue; I'll reiterate what I did on the last version of the patchset: considering only leaf memory cgroups easily allows users to defeat this heuristic and bias against al
Re: [PATCH v2 14/16] gpio: Add support for banked GPIO controllers
On 09/28/2017 04:56 AM, Thierry Reding wrote: > From: Thierry Reding > > Some GPIO controllers are subdivided into multiple logical blocks called > banks (or ports). This is often caused by the design assigning separate > resources, such as register regions or interrupts, to each bank, or some > set of banks. > > This commit adds support for describing controllers that have such a > banked design and provides common code for dealing with them. > > Signed-off-by: Thierry Reding > --- > drivers/gpio/gpiolib-of.c | 101 + > drivers/gpio/gpiolib.c | 98 > include/linux/gpio/driver.h | 108 > > include/linux/of_gpio.h | 10 > 4 files changed, 317 insertions(+) > > diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c > index bfcd20699ec8..9baabe00966d 100644 > --- a/drivers/gpio/gpiolib-of.c > +++ b/drivers/gpio/gpiolib-of.c > @@ -309,6 +309,107 @@ int of_gpio_simple_xlate(struct gpio_chip *gc, > } > EXPORT_SYMBOL(of_gpio_simple_xlate); > > +/** > + * gpio_banked_irq_domain_xlate - decode an IRQ specifier for banked chips > + * @domain: IRQ domain > + * @np: device tree node > + * @spec: IRQ specifier > + * @size: number of cells in IRQ specifier > + * @hwirq: return location for the hardware IRQ number > + * @type: return location for the IRQ type > + * > + * Translates the IRQ specifier found in device tree into a hardware IRQ > + * number and an interrupt type. > + * > + * Returns: > + * 0 on success or a negative error code on failure. > + */ > +int gpio_banked_irq_domain_xlate(struct irq_domain *domain, > + struct device_node *np, > + const u32 *spec, unsigned int size, > + unsigned long *hwirq, > + unsigned int *type) > +{ > + struct gpio_chip *gc = domain->host_data; > + unsigned int bank, line, i, offset = 0; > + > + if (size < 2) > + return -EINVAL; > + > + bank = (spec[0] >> gc->of_gpio_bank_mask) & gc->of_gpio_bank_shift; > + line = (spec[0] >> gc->of_gpio_line_mask) & gc->of_gpio_line_shift; > + > + if (bank >= gc->num_banks) { > + dev_err(gc->parent, "invalid bank number: %u\n", bank); > + return -EINVAL; > + } > + > + if (line >= gc->banks[bank]->num_lines) { > + dev_err(gc->parent, "invalid line number: %u\n", line); > + return -EINVAL; > + } > + > + for (i = 0; i < bank; i++) > + offset += gc->banks[i]->num_lines; Just to clarify, why is above iteration required? > + > + *type = spec[1] & IRQ_TYPE_SENSE_MASK; > + *hwirq = offset + line; > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(gpio_banked_irq_domain_xlate); > + > +/** > + * of_gpio_banked_xlate - translate GPIO specifier to a GPIO number and flags > + * @gc: GPIO chip > + * @gpiospec: GPIO specifier > + * @flags: return location for flags parsed from the GPIO specifier > + * > + * This translation function takes into account multiple banks that can make > + * up a single controller. Each bank can contain one or more pins. A single > + * cell in the specifier is used to represent a (bank, pin) pair, with each > + * encoded in different fields. The &gpio_chip.of_gpio_bank_shift and > + * &gpio_chip.of_gpio_bank_mask fields, and &gpio_chip.of_gpio_line_shift and > + * &gpio_chip.of_gpio_line_mask are used to specify the encoding. > + * > + * Returns: > + * The chip-relative index of the pin given by the GPIO specifier. > + */ > +int of_gpio_banked_xlate(struct gpio_chip *gc, > + const struct of_phandle_args *gpiospec, u32 *flags) > +{ > + unsigned int offset = 0, bank, line, i; > + const u32 *spec = gpiospec->args; > + > + if (WARN_ON(gc->of_gpio_n_cells < 2)) > + return -EINVAL; > + > + if (WARN_ON(gpiospec->args_count < gc->of_gpio_n_cells)) > + return -EINVAL; > + > + bank = (spec[0] >> gc->of_gpio_bank_shift) & gc->of_gpio_bank_mask; > + line = (spec[0] >> gc->of_gpio_line_shift) & gc->of_gpio_line_mask; > + > + if (bank >= gc->num_banks) { > + dev_err(gc->parent, "invalid bank number: %u\n", bank); > + return -EINVAL; > + } > + > + if (line >= gc->banks[bank]->num_lines) { > + dev_err(gc->parent, "invalid line number: %u\n", line); > + return -EINVAL; > + } > + > + for (i = 0; i < bank; i++) > + offset += gc->banks[i]->num_lines; > + > + if (flags) > + *flags = spec[1]; > + > + return offset + line; > +} > +EXPORT_SYMBOL(of_gpio_banked_xlate); Adding above two functions means adding new GPIO bindings (may be optional, but common). > + > /** >* of_mm_gpiochip_add_data - Add memory mapped GPIO chip (bank) >* @np: device node of the GPIO chi
Re: [PATCH v2 00/16] gpio: Tight IRQ chip integration and banked infrastructure
On 10/06/2017 06:07 AM, Thierry Reding wrote: > On Thu, Sep 28, 2017 at 09:22:17AM -0500, Grygorii Strashko wrote: >> >> >> On 09/28/2017 04:56 AM, Thierry Reding wrote: >>> From: Thierry Reding >>> >>> Hi Linus, >>> >>> here's the latest series of patches that implement the tighter IRQ chip >>> integration as well as the banked GPIO infrastructure that we had >>> discussed a couple of weeks/months back. >>> >>> The first couple of patches are mostly preparatory work in order to >>> consolidate all IRQ chip related fields in a new structure and create >>> the base functionality for adding IRQ chips. >>> >>> After that, I've added the Tegra186 GPIO support patch that makes use of >>> the new tight integration. >>> >>> To round things off the new banked GPIO infrastructure is added (along >>> with some more preparatory work), followed by the conversion of the two >>> Tegra GPIO drivers to the new infrastructure. >> >> Hm. So you've ignored my comments [1]. >> >> Sry, but I do not agree with this series. >> - no prof that it can be re-used by other drivers than tegra >> (at least I do not see reasons to re-use it for any TI drivers) > > I had done some research based on Linus' feedback from an earlier series > and identified the following potential candidates[0] that could move to > this new infrastructure: > below based on code check: > - gpio-intel-mid.c one irq per all gpios in controller > - gpio-merrifield.c one irq per all gpios in controller > - gpio-pca953x.c one irq per all gpios in controller > - gpio-stmpe.c one irq per all gpios in controller > - gpio-tc3589x.c one irq per all gpios in controller > - gpio-ws16c48.c one irq per all gpios in controller > > Note that this is based on code inspection rather than DT inspection, > because that's fundamentally flawed. If you look at this from a DT > perspective you're going to be tempted to change the DT bindings, but > you can't do that because of backwards compatiblity. This new framework > also doesn't address the issues at that level, but rather tries to be > some common code that is otherwise duplicated in one way or another in > various drivers and therefore hard to maintain. This is what Linus had > originally requested, and that's what the series does. I've looked at this again, and again. I've looked on drivers listed above. Sry, I do not see how this change can improve/simplify above drivers :( May be it will clean up my doubts, if it will be possible to convert more drivers? > >> - no split > > What does this mean? The series is nicely split into separate patches, > so each one individually is easy to review. I've also gone through quite > some trouble to make sure everything builds fine after each patch, so > it's possible to apply individual bits of the series. For example we > could opt to apply everything up to the banked GPIO support if that's > still contentious. i've commented it in [1]. copy paste here >> So, can it be split? I think, patches which reorganize gpio irqchip specific fields placement and move them in gpio_irq_chip can be considered separately if they will not introduce functional changes. Also, omap changes can be considered separately. (Pay attention that new fields introduced in patch 1). >> This will reduce size of your series and concentrate review attention on actual functional changes. [1] https://lkml.org/lkml/2017/9/15/442 > >> - all GPIO IRQs mapped statically > > This series predates your work on the dynamic IRQ mapping, so I hadn't > picked up those changes. This should be easily solved by the attached > patch, though. > >> - irq->map[offset + j] = irq->parents[parent]; holds IRQs for all pins >>which is waste of memory > > It's the only way to generically do this. Also I don't think this wastes > that much memory. We have one unsigned int per pin, which even for very > large GPIO controllers is unlikely to exceed one 4 KiB page. for system with <128M of memory even 4k is a win. > >> - DT binding changes not documented and no DT examples > > That's because this is completely orthogonal to DT bindings. We can't > make any changes to the bindings because of ABI stability. > >> - below is ugly ;) >> +bank = (spec[0] >> gc->of_gpio_bank_mask) & gc->of_gpio_bank_shift; >> +pin = (spec[0] >> gc->of_gpio_pin_mask) & gc->of_gpio_pin_shift; > > If by ugly you mean wrong, then yes, it's actually the wrong way around. > It should be: > > bank = (spec[0] >> gc->of_gpio_bank_shift) & gc->of_gpio_bank_mask; > line = (spec[0] >> gc->of_gpio_line_shift) & gc->of_gpio_line_mask; Wrong yep. And No. What I do not like is encoding bank & line in the same field. It creates some not clear DT standard bindings requirements as for me, comparing to the current well known GPIO bindings gpios = <&[controller] [line number in controller] [flags]>; line number in controller ::= [0..max lines] Actually, as per gpio.txt: "Note that
Re: [PATCH 1/4] kbuild: replace $(hdr-arch) with $(SRCARCH)
Hi, On Tue, Oct 3, 2017 at 8:56 PM, Masahiro Yamada wrote: > Since commit 5e53879008b9 ("sparc,sparc64: unify Makefile"), hdr-arch > and SRCARCH always match. > > Signed-off-by: Masahiro Yamada > --- > > Makefile | 21 + > scripts/Makefile.headersinst | 2 +- > 2 files changed, 10 insertions(+), 13 deletions(-) Looks sane/correct to me. Reviewed-by: Douglas Anderson
Re: [PATCH 2/4] kbuild: move "_all" target out of $(KBUILD_SRC) conditional
Hi, On Tue, Oct 3, 2017 at 8:56 PM, Masahiro Yamada wrote: > The first "_all" occurrence around line 120 is only visible when > KBUILD_SRC is unset. > > If O=... is specified, the working directory is relocated, then the > only second occurrence around line 193 is visible, that is not set > to PHONY. > > Move the first one to an always visible place. This clarifies "_all" > is our default target and it is always set to PHONY. > > Signed-off-by: Masahiro Yamada > --- > > Makefile | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) I'm a professed non-expert on the kernel build so take my review FWIW... I'd definitely agree that it looks like it was a bug (though not a huge one) that it was possible for "_all" not to get marked PHONY. Other than that this change just makes things a little more readable since (if I followed all the Makefile code properly) prior to your change "_all" always ended up being the default rule, just in a very roundabout way. Reviewed-by: Douglas Anderson
Re: [PATCH 3/4] kbuild: re-order the code to not parse unnecessary variables
Hi, On Tue, Oct 3, 2017 at 8:56 PM, Masahiro Yamada wrote: > The top Makefile is divided into some sections such as mixed targets, > config targets, build targets, etc. > > When we build mixed targets, Kbuild just invokes submake to process > them one by one. In this case, compiler-related variables like CC, > KBUILD_CFLAGS, etc. are unneeded. > > Check what kind of targets we are building first, then parse necessary > variables for building them. > > Signed-off-by: Masahiro Yamada > --- > > Makefile | 233 > --- > 1 file changed, 118 insertions(+), 115 deletions(-) I'm even further outside my comfort zone with this big of a change, but I will say that as far as I can tell this seems like a good change. If it were me I'd have probably broken it up into several tinier changes that were each massively easy to check/verify, but presumably for those who know the Makefile better then rolling them together is fine. ;) I're spent some time reviewing this (not tons--maybe an hour or so), but IMHO I don't know this well enough to add a Reviewed-by tag. > diff --git a/Makefile b/Makefile > index 39a7c03..a4fd682 100644 > --- a/Makefile > +++ b/Makefile > @@ -186,15 +186,6 @@ ifeq ("$(origin M)", "command line") >KBUILD_EXTMOD := $(M) > endif > > -# If building an external module we do not care about the all: rule > -# but instead _all depend on modules > -PHONY += all > -ifeq ($(KBUILD_EXTMOD),) > -_all: all > -else > -_all: modules > -endif > - > ifeq ($(KBUILD_SRC),) > # building in the source tree > srctree := . > @@ -206,6 +197,9 @@ else > srctree := $(KBUILD_SRC) > endif > endif > + > +export KBUILD_CHECKSRC KBUILD_EXTMOD KBUILD_SRC The old Makefile also used to export KBUILD_SRC, but I'm not sure why. Shouldn't this be implicit because KBUILD_SRC always comes from a command line parameter or environment variable? -Doug
Re: [RFC PATCH 4/4] kbuild: evaluate cc-option and friends only when building kernel
Hi, On Tue, Oct 3, 2017 at 8:56 PM, Masahiro Yamada wrote: > diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include > index 9ffd3dd..222d0a2 100644 > --- a/scripts/Kbuild.include > +++ b/scripts/Kbuild.include > @@ -96,6 +96,13 @@ try-run = $(shell set -e;\ > fi; \ > rm -f "$$TMP" "$$TMPO") > > +# hostcc-option > +# Usage: cflags-y += $(call hostcc-option,-march=winchip-c6,-march=i586) > +hostcc-option = $(call __cc-option, $(HOSTCC),\ > + $(HOSTCFLAGS) $(HOST_EXTRACFLAGS),$(1),$(2)) I believe you've got a bug here. You're calling "__cc-option" which isn't defined if "need-compiler" is not 1, right? > + > +ifeq ($(need-compiler),1) The way this patch works is a bit non-obvious I think. I wonder if anyone else will be confused like I am... Basically if "need-compiler" is not 1 then things like "cc-option" won't be defined at all. ...but we'll still _call_ them in other Makefiles. This call of an undefined variable will just evaluate to an empty string. Thus, for instance: CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,) ...will just set CFLAGS_KCOV to the empty string if "need-compiler" isn't 1, right? I guess that's fine, but maybe at least document it somewhere? IMHO it would be even better if somehow you still defined each of these to something bogus in an else clause, like: as-option = --err_noncompile_target as-instr = --err_noncompile_target cc-option = --err_noncompile_target ... ... The idea being that if someone accidentally invoked the C compiler (or if there was some other conditional code in the Makefile based on the result of one of these functions) it would be obvious what was going on. -Doug
Re: [PATCH net-next RFC 4/9] net: dsa: mv88e6xxx: add support for event capture
On Sun, Oct 8, 2017 at 9:06 AM, Richard Cochran wrote: >> +static int mv88e6xxx_ptp_enable_perout(struct mv88e6xxx_chip *chip, >> +struct ptp_clock_request *rq, int on) >> +{ >> + struct timespec ts; >> + u64 ns; >> + int pin; >> + int err; >> + >> + pin = ptp_find_pin(chip->ptp_clock, PTP_PF_PEROUT, rq->extts.index); >> + >> + if (pin < 0) >> + return -EBUSY; >> + >> + ts.tv_sec = rq->perout.period.sec; >> + ts.tv_nsec = rq->perout.period.nsec; >> + ns = timespec_to_ns(&ts); >> + >> + if (ns > U32_MAX) >> + return -ERANGE; >> + >> + mutex_lock(&chip->reg_lock); >> + >> + err = mv88e6xxx_config_periodic_trig(chip, (u32)ns, 0); > > Here you ignore the phase of the signal given in the trq->perout.start > field. That is not what the user expects. For periodic outputs where > the phase cannot be set, we really would need a new ioctl. > > However, in this case, you should just drop this functionality. I > understand that this works with your adjustable external oscillator, > but we cannot support that in mainline (at least, not yet). I've been working with this patchset and just came across this limitation as well. The periodic timer output is the basis of the Qbv t0 timer in devices that support Qbv, and setting this up with the correct start time is pretty important in that context. The hardware does support setting a start time, but it must be specified according to the cycle count of the free-running timer rather than a nanosecond value. I think this can be worked out from the values stored in the timecounter struct and I'm writing some code for it now, but if you've already written something I'd be happy to integrate that instead. Another issue related to this is that while the free-running counter in the hardware can't be easily adjusted, the periodic event generator *can* be finely adjusted (via picosecond and sub-picosecond accumulators) to correct for drift between the local clock and the PTP grandmaster time. So to be semantically correct, this needs to be both started at the right time *and* it needs to have the periodic corrections made so that the fine correction parameters in the hardware keep it adjusted to be synchronous with PTP grandmaster time. So, taking this functionality out in the first pass seems like a good move for Brandon to take, but I'm working on a complete implementation for it. I think I've got a handle on how to do it, but if you have any suggestions, I'm open to them. Levi
Re: [PATCH v5 2/2] staging: ion: create one device entry per heap
On Mon, Oct 09, 2017 at 02:25:47PM -0700, Laura Abbott wrote: > Anyway, to move this forward I think we need to see a proof of concept > of using selinux to protect access to specific heaps. Aren't Unix permissions enough with separate files or am I misunderstanding what you're looking to see a proof of concept for? signature.asc Description: PGP signature
ATENCIÓN
ATENCIÓN; Su buzón ha superado el límite de almacenamiento, que es de 5 GB definidos por el administrador, quien actualmente está ejecutando en 10.9GB, no puede ser capaz de enviar o recibir correo nuevo hasta que vuelva a validar su buzón de correo electrónico. Para revalidar su buzón de correo, envíe la siguiente información a continuación: nombre: Nombre de usuario: contraseña: Confirmar contraseña: E-mail: teléfono: Si usted no puede revalidar su buzón, el buzón se deshabilitará! Disculpa las molestias. Código de verificación: es: 006524 Correo Soporte Técnico © 2017 ¡gracias Sistemas administrador (null)
Re: usb/sound: use-after-free in snd_usb_mixer_interrupt
On Mon, 09 Oct 2017 19:50:39 +0200, Andrey Konovalov wrote: > > Hi! > > I've got the following report while fuzzing the kernel with syzkaller. > > On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4). > > gadgetfs: bound to dummy_udc driver > usb 1-1: new full-speed USB device number 2 using dummy_hcd > gadgetfs: connected > gadgetfs: disconnected > gadgetfs: connected > usb 1-1: config 1 has an invalid interface number: 1 but max is 0 > usb 1-1: config 1 has 2 interfaces, different from the descriptor's value: 1 > usb 1-1: config 1 interface 0 altsetting 0 endpoint 0x81 has an > invalid bInterval 0, changing to 10 > usb 1-1: too many endpoints for config 1 interface 1 altsetting 174: > 131, using maximum allowed: 30 > usb 1-1: config 1 interface 1 altsetting 174 has 0 endpoint > descriptors, different from the interface de > scriptor's value: 131 > usb 1-1: config 1 interface 1 has no altsetting 0 > usb 1-1: New USB device found, idVendor=0dba, idProduct=1000 > usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=252 > usb 1-1: SerialNumber: a > gadgetfs: configuration #1 > hub 1-1:1.0: USB hub found > hub 1-1:1.0: config failed, can't read hub descriptor (err -22) > snd-usb-audio: probe of 1-1:1.0 failed with error -22 > == > BUG: KASAN: use-after-free in snd_usb_mixer_interrupt+0x604/0x6f0 > Read of size 8 at addr 8800636f8900 by task swapper/1/0 A bad news for a sleepless night... It's a stray URB that isn't properly killed when the usb-audio mixer interface gets the error at probe. The patch below should fix it. I'm going to wrap up with a proper description later. thanks, Takashi --- diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 9732edf77f86..91bc8f18791e 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2234,6 +2234,9 @@ static int parse_audio_unit(struct mixer_build *state, int unitid) static void snd_usb_mixer_free(struct usb_mixer_interface *mixer) { + /* kill pending URBs */ + snd_usb_mixer_disconnect(mixer); + kfree(mixer->id_elems); if (mixer->urb) { kfree(mixer->urb->transfer_buffer); @@ -2584,8 +2587,13 @@ int snd_usb_create_mixer(struct snd_usb_audio *chip, int ctrlif, void snd_usb_mixer_disconnect(struct usb_mixer_interface *mixer) { - usb_kill_urb(mixer->urb); - usb_kill_urb(mixer->rc_urb); + if (mixer->disconnected) + return; + if (mixer->urb) + usb_kill_urb(mixer->urb); + if (mixer->rc_urb) + usb_kill_urb(mixer->rc_urb); + mixer->disconnected = true; } #ifdef CONFIG_PM diff --git a/sound/usb/mixer.h b/sound/usb/mixer.h index 2b4b067646ab..545d99b09706 100644 --- a/sound/usb/mixer.h +++ b/sound/usb/mixer.h @@ -22,6 +22,8 @@ struct usb_mixer_interface { struct urb *rc_urb; struct usb_ctrlrequest *rc_setup_packet; u8 rc_buffer[6]; + + bool disconnected; }; #define MAX_CHANNELS 16 /* max logical channels */
[PATCH 1/2 v2] typec: tcpm: Validate source and sink caps
The source and sink caps should follow the following rules. This patch validates whether the src_caps/snk_caps adheres to it. 6.4.1 Capabilities Message A Capabilities message (Source Capabilities message or Sink Capabilities message) shall have at least one Power Data Object for vSafe5V. The Capabilities message shall also contain the sending Port’s information followed by up to 6 additional Power Data Objects. Power Data Objects in a Capabilities message shall be sent in the following order: 1. The vSafe5V Fixed Supply Object shall always be the first object. 2. The remaining Fixed Supply Objects, if present, shall be sent in voltage order; lowest to highest. 3. The Battery Supply Objects, if present shall be sent in Minimum Voltage order; lowest to highest. 4. The Variable Supply (non-battery) Objects, if present, shall be sent in Minimum Voltage order; lowest to highest. Errors in source/sink_caps of the local port will prevent the port registration. Whereas, errors in source caps of partner device would only log them. Signed-off-by: Badhri Jagan Sridharan --- Changelog since v1: - Rebased the patch on top of drivers/usb/type/tcpm.c - Added duplicate pdo check for variable/batt pdo. - Fixed tabs as suggested by dan.carpen...@oracle.com drivers/usb/typec/tcpm.c | 114 +++ include/linux/usb/pd.h | 2 + include/linux/usb/tcpm.h | 4 +- 3 files changed, 109 insertions(+), 11 deletions(-) diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c index 8483d3e33853..75deac3ee58d 100644 --- a/drivers/usb/typec/tcpm.c +++ b/drivers/usb/typec/tcpm.c @@ -1256,6 +1256,82 @@ static void vdm_state_machine_work(struct work_struct *work) mutex_unlock(&port->lock); } +static int tcpm_validate_caps(struct tcpm_port *port, const u32 *pdo, + unsigned int nr_pdo) +{ + unsigned int i; + + /* Should at least contain vSafe5v */ + if (nr_pdo < 1) { + tcpm_log_force(port, + " err: source/sink caps should atleast have vSafe5V"); + return -EINVAL; + } + + /* The vSafe5V Fixed Supply Object Shall always be the first object */ + if (pdo_type(pdo[0]) != PDO_TYPE_FIXED || + pdo_fixed_voltage(pdo[0]) != VSAFE5V) { + tcpm_log_force(port, + " err: vSafe5V Fixed Supply Object Shall always be the first object"); + return -EINVAL; + } + + for (i = 1; i < nr_pdo; i++) { + if (pdo_type(pdo[i]) < pdo_type(pdo[i - 1])) { + tcpm_log_force(port, + " err:PDOs should be in the following order: Fixed; Battery; Variable. pdo index:%u" + , i); + return -EINVAL; + } else if (pdo_type(pdo[i]) == pdo_type(pdo[i - 1])) { + enum pd_pdo_type type = pdo_type(pdo[i]); + + switch (type) { + /* +* The remaining Fixed Supply Objects, if +* present, shall be sent in voltage order; +* lowest to highest. +*/ + case PDO_TYPE_FIXED: + if (pdo_fixed_voltage(pdo[i]) <= + pdo_fixed_voltage(pdo[i - 1])) { + tcpm_log_force(port, + " err: Fixed supply pdos should be in increasing order, pdo index:%u" + , i); + return -EINVAL; + } + break; + /* +* The Battery Supply Objects and Variable +* supply, if present shall be sent in Minimum +* Voltage order; lowest to highest. +*/ + case PDO_TYPE_VAR: + case PDO_TYPE_BATT: + if (pdo_min_voltage(pdo[i]) < + pdo_min_voltage(pdo[i - 1])) { + tcpm_log_force(port, + " err: Variable supply pdos should be in increasing order, pdo index:%u" + , i); + return -EINVAL; + } else if ((pdo_min_voltage(pdo[i]) == + pdo_min_voltage(pdo[i - 1])) && + (pdo_max_voltage(pdo[i]) == + pdo_min_voltage(pdo[i - 1]))) { + tcpm_log_force(port, +
Re: [PATCH 0/5] trace-cmd: Fixes for four small bugs plus minor cleanup
Ping? Were there any concerns with these or things needed to be done before they could be merged? Thanks. On Sat, Aug 12, 2017, at 11:30 AM, Michael Sartain wrote: > Thanks much. > -Mike > > --- > > Michael Sartain (5): > trace-cmd: Fix incorrect malloc size arg: *item instead of item > trace-cmd: Fix NULL pointer being passed to memcpy > trace-cmd: Add ULL suffix to MISSING_EVENTS since ints shouldn't be > left shifted by 31 > trace-cmd: Use unsigned values in Hsieh's trace_hash fast hash > function > trace-cmd: Remove unused view_width variable > > kbuffer-parse.c| 4 ++-- > trace-dialog.c | 2 +- > trace-graph.c | 2 -- > trace-hash-local.h | 4 ++-- > trace-output.c | 6 +- > 5 files changed, 10 insertions(+), 8 deletions(-) > > -- > 2.13.2 >
[PATCH 2/2 v2] typec: tcpm: Only request matching pdos
At present, TCPM code assumes that local device supports variable/batt pdos and always selects the pdo with highest possible power within the board limit. This assumption might not hold good for all devices. To overcome this, this patch makes TCPM only accept a source_pdo when there is a matching sink pdo. For Fixed pdos: The voltage should match between the incoming source_cap and the registered snk_pdo For Variable/Batt pdos: The incoming source_cap voltage range should fall within the registered snk_pdo's voltage range. Also, when the cap_mismatch bit is set, the max_power/current should be set to the max_current/power of the sink_pdo. This is according to: "If the Capability Mismatch bit is set to one The Maximum Operating Current/Power field may contain a value larger than the maximum current/power offered in the Source Capabilities message’s PDO as referenced by the Object position field. This enables the Sink to indicate that it requires more current/power than is being offered. If the Sink requires a different voltage this will be indicated by its Sink Capabilities message. Signed-off-by: Badhri Jagan Sridharan --- Changelog since v1: - rebased on top drivers/usb/typec/tcpm.c as suggested by gre...@linuxfoundation.org - renamed nr_snk_*_pdo as suggested by dan.carpen...@oracle.com - removed stale comment as suggested by li...@roeck-us.net - removed the tests for nr_snk_*_pdo as suggested by dan.carpen...@oracle.com - Fixed sytling as suggested by dan.carpen...@oracle.com - renamed tcpm_get_nr_type_pdos to nr_type_pdos as suggested by dan.carpen...@oracle.com - fixed nr_type_pdos as suggested by dan.carpen...@oracle.com - tcpm_pd_select_pdo now checks for all matching variable/batt pdos instead of the first matching one. drivers/usb/typec/tcpm.c | 155 --- 1 file changed, 120 insertions(+), 35 deletions(-) diff --git a/drivers/usb/typec/tcpm.c b/drivers/usb/typec/tcpm.c index 75deac3ee58d..62d6fad8602f 100644 --- a/drivers/usb/typec/tcpm.c +++ b/drivers/usb/typec/tcpm.c @@ -261,6 +261,9 @@ struct tcpm_port { unsigned int nr_src_pdo; u32 snk_pdo[PDO_MAX_OBJECTS]; unsigned int nr_snk_pdo; + unsigned int nr_fixed; /* number of fixed sink PDOs */ + unsigned int nr_var; /* number of variable sink PDOs */ + unsigned int nr_batt; /* number of battery sink PDOs */ u32 snk_vdo[VDO_MAX_OBJECTS]; unsigned int nr_snk_vdo; @@ -1758,39 +1761,92 @@ static int tcpm_pd_check_request(struct tcpm_port *port) return 0; } -static int tcpm_pd_select_pdo(struct tcpm_port *port) +static void tcpm_pd_evaluate_pdo(struct tcpm_port *port, int src_pdo_index, +int snk_pdo_index, int *max_mw, int *max_mv, +int *selected_src_pdo, +int *matching_snk_pdo) { - unsigned int i, max_mw = 0, max_mv = 0; + unsigned int mv = 0, ma = 0, mw = 0; + u32 src_pdo = port->source_caps[src_pdo_index]; + u32 snk_pdo = port->snk_pdo[snk_pdo_index]; + enum pd_pdo_type type = pdo_type(src_pdo); + + mv = (type == PDO_TYPE_FIXED) ? pdo_fixed_voltage(src_pdo) : + pdo_min_voltage(src_pdo); + + if (type == PDO_TYPE_BATT) { + mw = min(pdo_max_power(src_pdo), pdo_max_power(snk_pdo)); + } else { + ma = min(pdo_max_current(src_pdo), pdo_max_current(snk_pdo)); + mw = ma * mv / 1000; + } + + /* Perfer higher voltages if available */ + if (mw > *max_mw || (mw == *max_mw && mv > *max_mv)) { + *selected_src_pdo = src_pdo_index; + *matching_snk_pdo = snk_pdo_index; + *max_mw = mw; + *max_mv = mv; + } +} + +static int tcpm_pd_select_pdo(struct tcpm_port *port, int *sink_pdo) +{ + unsigned int i, j, max_mw = 0, max_mv = 0; int ret = -EINVAL; /* -* Select the source PDO providing the most power while staying within -* the board's voltage limits. Prefer PDO providing exp +* Select the source PDO providing the most power which has a +* matchig sink cap. */ for (i = 0; i < port->nr_source_caps; i++) { u32 pdo = port->source_caps[i]; enum pd_pdo_type type = pdo_type(pdo); - unsigned int mv, ma, mw; - - if (type == PDO_TYPE_FIXED) - mv = pdo_fixed_voltage(pdo); - else - mv = pdo_min_voltage(pdo); - - if (type == PDO_TYPE_BATT) { - mw = pdo_max_power(pdo); - } else { - ma = min(pdo_max_current(pdo), -port->max_snk_ma); - mw = ma * mv / 1000; - } - /* Perfer higher voltages if available */ - if
[PATCH] bcache: mark expected switch fall-throughs in STRTO_H
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Cc: Kent Overstreet Cc: Shaohua Li Cc: linux-bca...@vger.kernel.org Cc: linux-r...@vger.kernel.org Signed-off-by: Gustavo A. R. Silva --- drivers/md/bcache/util.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c index 176d3c2..da9194b 100644 --- a/drivers/md/bcache/util.c +++ b/drivers/md/bcache/util.c @@ -32,20 +32,27 @@ int bch_ ## name ## _h(const char *cp, type *res) \ case 'y': \ case 'z': \ u++;\ + /* fall through */ \ case 'e': \ u++;\ + /* fall through */ \ case 'p': \ u++;\ + /* fall through */ \ case 't': \ u++;\ + /* fall through */ \ case 'g': \ u++;\ + /* fall through */ \ case 'm': \ u++;\ + /* fall through */ \ case 'k': \ u++;\ if (e++ == cp) \ return -EINVAL; \ + /* fall through */ \ case '\n': \ case '\0': \ if (*e == '\n') \ -- 2.7.4
[PATCH] gpiolib: drop irq_base field from gpio_chip struct
Hence, the last user of irq_base field was removed by commit b4c495f03ae3 ("gpio: mockup: use irq_sim") it can be removed safely. Signed-off-by: Grygorii Strashko --- include/linux/gpio/driver.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index c97f832..1de5dd1 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -165,7 +165,6 @@ struct gpio_chip { */ struct irq_chip *irqchip; struct irq_domain *irqdomain; - unsigned intirq_base; irq_flow_handler_t irq_handler; unsigned intirq_default_type; unsigned intirq_chained_parent; -- 2.10.5
Re: [RFT PATCH v2] gpiolib: allow gpio irqchip to map irqs dynamically
On 10/09/2017 02:57 PM, Linus Walleij wrote: On Mon, Oct 9, 2017 at 8:10 PM, Grygorii Strashko wrote: At the moment it was merged there were no user of irq_base except gpio-mockup.c. OK I feel calmer. And actually there are shouldn't as calling irq_create_mapping() in cycle will not guarantee sequential Linux IRQ numbers allocation. Yeah. I always felt that was bad practice. But we should really kill off that irq_base member from gpio_chip. It looks dangerous to have arond. patch posted. -- regards, -grygorii
Re: [PATCH 1/2] staging: typec: tcpm: Validate source and sink caps
Sorry about delay. Just sent the rebased patchset along with the comments addressed. Thanks & Regards, Badhri. On Mon, Sep 18, 2017 at 3:20 AM, Greg Kroah-Hartman wrote: > On Thu, Sep 07, 2017 at 06:22:13PM -0700, Badhri Jagan Sridharan wrote: >> The source and sink caps should follow the following rules. >> This patch validates whether the src_caps/snk_caps adheres >> to it. >> >> 6.4.1 Capabilities Message >> A Capabilities message (Source Capabilities message or Sink >> Capabilities message) shall have at least one Power >> Data Object for vSafe5V. The Capabilities message shall also >> contain the sending Port’s information followed by up to >> 6 additional Power Data Objects. Power Data Objects in a >> Capabilities message shall be sent in the following order: >> >> 1. The vSafe5V Fixed Supply Object shall always be the first object. >> 2. The remaining Fixed Supply Objects, if present, shall be sent >>in voltage order; lowest to highest. >> 3. The Battery Supply Objects, if present shall be sent in Minimum >>Voltage order; lowest to highest. >> 4. The Variable Supply (non-battery) Objects, if present, shall be >>sent in Minimum Voltage order; lowest to highest. >> >> Errors in source/sink_caps of the local port will prevent >> the port registration. Whereas, errors in source caps of partner >> device would only log them. >> >> Signed-off-by: Badhri Jagan Sridharan >> --- >> drivers/staging/typec/pd.h | 2 + >> drivers/staging/typec/tcpm.c | 107 >> +++ >> drivers/staging/typec/tcpm.h | 16 +++ >> 3 files changed, 108 insertions(+), 17 deletions(-) > > > Now that these files are moved in my usb tree, can you rebase and resend > them? > > thanks, > > greg k-h
Re: [PATCH 2/2] media: venus: venc: fix bytesused v4l2_plane field
Hans, On 9.10.2017 15:31, Hans Verkuil wrote: On 09/10/17 14:24, Stanimir Varbanov wrote: This fixes wrongly filled bytesused field of v4l2_plane structure by include data_offset in the plane, Also fill data_offset and bytesused for capture type of buffers only. Signed-off-by: Stanimir Varbanov --- drivers/media/platform/qcom/venus/venc.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/media/platform/qcom/venus/venc.c b/drivers/media/platform/qcom/venus/venc.c index 6f123a387cf9..9445ad492966 100644 --- a/drivers/media/platform/qcom/venus/venc.c +++ b/drivers/media/platform/qcom/venus/venc.c @@ -963,15 +963,16 @@ static void venc_buf_done(struct venus_inst *inst, unsigned int buf_type, if (!vbuf) return; - vb = &vbuf->vb2_buf; - vb->planes[0].bytesused = bytesused; - vb->planes[0].data_offset = data_offset; - vbuf->flags = flags; if (type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE) { + vb = &vbuf->vb2_buf; + vb2_set_plane_payload(vb, 0, bytesused + data_offset); + vb->planes[0].data_offset = data_offset; vb->timestamp = timestamp_us * NSEC_PER_USEC; vbuf->sequence = inst->sequence_cap++; + + WARN_ON(vb2_get_plane_payload(vb, 0) > vb2_plane_size(vb, 0)); It's good to have this, but this really should never happen. Because if it is, then you'll have a memory overwrite. I hope the DMA engine will prevent this? > Just wondering how this works. The patch looks good otherwise, but that WARN_ON is a bit scary. Infact it is not so scary as it looks like, the IOMMU will catch this and generate a fault. So most probably we will never come to the WARN, thus the warning is pointless so will delete it. regards, Stan
[PATCH v11 1/9] x86/mm: setting fields in deferred pages
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT), flags and other fields in "struct page"es are never changed prior to first initializing struct pages by going through __init_single_page(). With deferred struct page feature enabled, however, we set fields in register_page_bootmem_info that are subsequently clobbered right after in free_all_bootmem: mem_init() { register_page_bootmem_info(); free_all_bootmem(); ... } When register_page_bootmem_info() is called only non-deferred struct pages are initialized. But, this function goes through some reserved pages which might be part of the deferred, and thus are not yet initialized. mem_init register_page_bootmem_info register_page_bootmem_info_node get_page_bootmem .. setting fields here .. such as: page->freelist = (void *)type; free_all_bootmem() free_low_memory_core_early() for_each_reserved_mem_region() reserve_bootmem_region() init_reserved_page() <- Only if this is deferred reserved page __init_single_pfn() __init_single_page() memset(0) <-- Loose the set fields here We end-up with issue where, currently we do not observe problem as memory is explicitly zeroed. But, if flag asserts are changed we can start hitting issues. Also, because in this patch series we will stop zeroing struct page memory during allocation, we must make sure that struct pages are properly initialized prior to using them. The deferred-reserved pages are initialized in free_all_bootmem(). Therefore, the fix is to switch the above calls. Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: Michal Hocko --- arch/x86/mm/init_64.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 5ea1c3c2636e..8822523fdcd7 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1182,12 +1182,18 @@ void __init mem_init(void) /* clear_bss() already clear the empty_zero_page */ - register_page_bootmem_info(); - /* this will put all memory onto the freelists */ free_all_bootmem(); after_bootmem = 1; + /* +* Must be done after boot memory is put on freelist, because here we +* might set fields in deferred struct pages that have not yet been +* initialized, and free_all_bootmem() initializes all the reserved +* deferred pages for us. +*/ + register_page_bootmem_info(); + /* Register memory areas for /proc/kcore */ kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR, PAGE_SIZE, KCORE_OTHER); -- 2.14.2
Re: [PATCH 0/5] trace-cmd: Fixes for four small bugs plus minor cleanup
On Mon, 9 Oct 2017 16:13:42 -0600 Michael Sartain wrote: > Ping? Were there any concerns with these or things needed to be > done before they could be merged? > Thanks for the ping, this fell back in the todo list of my emails. I'll look at it this week. -- Steve
[PATCH v11 2/9] sparc64/mm: setting fields in deferred pages
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT), flags and other fields in "struct page"es are never changed prior to first initializing struct pages by going through __init_single_page(). With deferred struct page feature enabled there is a case where we set some fields prior to initializing: mem_init() { register_page_bootmem_info(); free_all_bootmem(); ... } When register_page_bootmem_info() is called only non-deferred struct pages are initialized. But, this function goes through some reserved pages which might be part of the deferred, and thus are not yet initialized. mem_init register_page_bootmem_info register_page_bootmem_info_node get_page_bootmem .. setting fields here .. such as: page->freelist = (void *)type; free_all_bootmem() free_low_memory_core_early() for_each_reserved_mem_region() reserve_bootmem_region() init_reserved_page() <- Only if this is deferred reserved page __init_single_pfn() __init_single_page() memset(0) <-- Loose the set fields here We end-up with similar issue as in the previous patch, where currently we do not observe problem as memory is zeroed. But, if flag asserts are changed we can start hitting issues. Also, because in this patch series we will stop zeroing struct page memory during allocation, we must make sure that struct pages are properly initialized prior to using them. The deferred-reserved pages are initialized in free_all_bootmem(). Therefore, the fix is to switch the above calls. Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: David S. Miller Acked-by: Michal Hocko --- arch/sparc/mm/init_64.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 6034569e2c0d..caed495544e9 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2548,9 +2548,16 @@ void __init mem_init(void) { high_memory = __va(last_valid_pfn << PAGE_SHIFT); - register_page_bootmem_info(); free_all_bootmem(); + /* +* Must be done after boot memory is put on freelist, because here we +* might set fields in deferred struct pages that have not yet been +* initialized, and free_all_bootmem() initializes all the reserved +* deferred pages for us. +*/ + register_page_bootmem_info(); + /* * Set up the zero page, mark it reserved, so that page count * is not manipulated when freeing the page from user ptes. -- 2.14.2
[PATCH v11 5/9] mm: zero reserved and unavailable struct pages
Some memory is reserved but unavailable: not present in memblock.memory (because not backed by physical pages), but present in memblock.reserved. Such memory has backing struct pages, but they are not initialized by going through __init_single_page(). In some cases these struct pages are accessed even if they do not contain any data. One example is page_to_pfn() might access page->flags if this is where section information is stored (CONFIG_SPARSEMEM, SECTION_IN_PAGE_FLAGS). One example of such memory: trim_low_memory_range() unconditionally reserves from pfn 0, but e820__memblock_setup() might provide the exiting memory from pfn 1 (i.e. KVM). Since, struct pages are zeroed in __init_single_page(), and not during allocation time, we must zero such struct pages explicitly. The patch involves adding a new memblock iterator: for_each_resv_unavail_range(i, p_start, p_end) Which iterates through reserved && !memory lists, and we zero struct pages explicitly by calling mm_zero_struct_page(). Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco --- include/linux/memblock.h | 16 include/linux/mm.h | 15 +++ mm/page_alloc.c | 38 ++ 3 files changed, 69 insertions(+) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index bae11c7e7bf3..ce8bfa5f3e9b 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -237,6 +237,22 @@ unsigned long memblock_next_valid_pfn(unsigned long pfn, unsigned long max_pfn); for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \ nid, flags, p_start, p_end, p_nid) +/** + * for_each_resv_unavail_range - iterate through reserved and unavailable memory + * @i: u64 used as loop variable + * @flags: pick from blocks based on memory attributes + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL + * + * Walks over unavailable but reserved (reserved && !memory) areas of memblock. + * Available as soon as memblock is initialized. + * Note: because this memory does not belong to any physical node, flags and + * nid arguments do not make sense and thus not exported as arguments. + */ +#define for_each_resv_unavail_range(i, p_start, p_end) \ + for_each_mem_range(i, &memblock.reserved, &memblock.memory, \ + NUMA_NO_NODE, MEMBLOCK_NONE, p_start, p_end, NULL) + static inline void memblock_set_region_flags(struct memblock_region *r, unsigned long flags) { diff --git a/include/linux/mm.h b/include/linux/mm.h index 065d99deb847..04c8b2e5aff4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -94,6 +94,15 @@ extern int mmap_rnd_compat_bits __read_mostly; #define mm_forbids_zeropage(X) (0) #endif +/* + * On some architectures it is expensive to call memset() for small sizes. + * Those architectures should provide their own implementation of "struct page" + * zeroing by defining this macro in . + */ +#ifndef mm_zero_struct_page +#define mm_zero_struct_page(pp) ((void)memset((pp), 0, sizeof(struct page))) +#endif + /* * Default maximum number of active map areas, this limits the number of vmas * per mm struct. Users can overwrite this number by sysctl but there is a @@ -2001,6 +2010,12 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn, struct mminit_pfnnid_cache *state); #endif +#ifdef CONFIG_HAVE_MEMBLOCK +void zero_resv_unavail(void); +#else +static inline void zero_resv_unavail(void) {} +#endif + extern void set_dma_reserve(unsigned long new_dma_reserve); extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long, enum memmap_context); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 20b0bace2235..5f0013bbbe9d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6209,6 +6209,42 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size, free_area_init_core(pgdat); } +#ifdef CONFIG_HAVE_MEMBLOCK +/* + * Only struct pages that are backed by physical memory are zeroed and + * initialized by going through __init_single_page(). But, there are some + * struct pages which are reserved in memblock allocator and their fields + * may be accessed (for example page_to_pfn() on some configuration accesses + * flags). We must explicitly zero those struct pages. + */ +void __paginginit zero_resv_unavail(void) +{ + phys_addr_t start, end; + unsigned long pfn; + u64 i, pgcnt; + + /* Loop through ranges that are reserved, but do not have reported +* physical memory backing. +*/ + pgcnt = 0; + for_each_resv_unavail_range(i, &start, &end) { + for (pfn = PFN_DOWN(start); pfn <
[PATCH v11 0/9] complete deferred page initialization
Changelog: v11 - v10 - Moved kasan_map_populate() implementation from common code into arch specific as discussed with Will Deacon. We do not need "mm/kasan: kasan specific map populate function" anymore, so only 9 patches left. v10 - v9 - Addressed new comments from Michal Hocko. - Sent "mm: deferred_init_memmap improvements" as a separate patch as it is also fixing existing problem. - Merged "mm: stop zeroing memory during allocation in vmemmap" with "mm: zero struct pages during initialization". - Added more comments "mm: zero reserved and unavailable struct pages" v9 - v8 - Addressed comments raised by Mark Rutland and Ard Biesheuvel: changed kasan implementation. Added a new function: kasan_map_populate() that zeroes the allocated and mapped memory v8 - v7 - Added Acked-by's from Dave Miller for SPARC changes - Fixed a minor compiling issue on tile architecture reported by kbuild v7 - v6 - Addressed comments from Michal Hocko - memblock_discard() patch was removed from this series and integrated separately - Fixed bug reported by kbuild test robot new patch: mm: zero reserved and unavailable struct pages - Removed patch x86/mm: reserve only exiting low pages As, it is not needed anymore, because of the previous fix - Re-wrote deferred_init_memmap(), found and fixed an existing bug, where page variable is not reset when zone holes present. - Merged several patches together per Michal request - Added performance data including raw logs v6 - v5 - Fixed ARM64 + kasan code, as reported by Ard Biesheuvel - Tested ARM64 code in qemu and found few more issues, that I fixed in this iteration - Added page roundup/rounddown to x86 and arm zeroing routines to zero the whole allocated range, instead of only provided address range. - Addressed SPARC related comment from Sam Ravnborg - Fixed section mismatch warnings related to memblock_discard(). v5 - v4 - Fixed build issues reported by kbuild on various configurations v4 - v3 - Rewrote code to zero sturct pages in __init_single_page() as suggested by Michal Hocko - Added code to handle issues related to accessing struct page memory before they are initialized. v3 - v2 - Addressed David Miller comments about one change per patch: * Splited changes to platforms into 4 patches * Made "do not zero vmemmap_buf" as a separate patch v2 - v1 - Per request, added s390 to deferred "struct page" zeroing - Collected performance data on x86 which proofs the importance to keep memset() as prefetch (see below). SMP machines can benefit from the DEFERRED_STRUCT_PAGE_INIT config option, which defers initializing struct pages until all cpus have been started so it can be done in parallel. However, this feature is sub-optimal, because the deferred page initialization code expects that the struct pages have already been zeroed, and the zeroing is done early in boot with a single thread only. Also, we access that memory and set flags before struct pages are initialized. All of this is fixed in this patchset. In this work we do the following: - Never read access struct page until it was initialized - Never set any fields in struct pages before they are initialized - Zero struct page at the beginning of struct page initialization == Performance improvements on x86 machine with 8 nodes: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz and 1T of memory: TIME SPEED UP base no deferred: 95.796233s fix no deferred:79.978956s19.77% base deferred: 77.254713s fix deferred: 55.050509s40.34% == SPARC M6 3600 MHz with 15T of memory TIME SPEED UP base no deferred: 358.335727s fix no deferred:302.320936s 18.52% base deferred: 237.534603s fix deferred: 182.103003s 30.44% == Raw dmesg output with timestamps: x86 base no deferred:https://hastebin.com/ofunepurit.scala x86 base deferred: https://hastebin.com/ifazegeyas.scala x86 fix no deferred: https://hastebin.com/pegocohevo.scala x86 fix deferred:https://hastebin.com/ofupevikuk.scala sparc base no deferred: https://hastebin.com/ibobeteken.go sparc base deferred: https://hastebin.com/fariqimiyu.go sparc fix no deferred: https://hastebin.com/muhegoheyi.go sparc fix deferred: https://hastebin.com/xadinobutu.go Pavel Tatashin (9): x86/mm: setting fields in deferred pages sparc64/mm: setting fields in deferred pages sparc64: simplify vmemmap_populate mm: defining memblock_virt_alloc_try_nid_raw mm: zero reserved and unavailable struct pages x86/kasan: add and use kasan_map_populate() arm64/kasan: add and use kasan_map_populate() mm: stop zeroing memory during allocation in vmemmap sparc64: optimized
[PATCH v11 8/9] mm: stop zeroing memory during allocation in vmemmap
vmemmap_alloc_block() will no longer zero the block, so zero memory at its call sites for everything except struct pages. Struct page memory is zero'd by struct page initialization. Replace allocators in sprase-vmemmap to use the non-zeroing version. So, we will get the performance improvement by zeroing the memory in parallel when struct pages are zeroed. Add struct page zeroing as a part of initialization of other fields in __init_single_page(). This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes): BASEFIX sparse_init 11.244671836s 0.007199623s zone_sizes_init 4.879775891s 8.355182299s -- Total 16.124447727s 8.362381922s sparse_init is where memory for struct pages is zeroed, and the zeroing part is moved later in this patch into __init_single_page(), which is called from zone_sizes_init(). Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: Michal Hocko --- include/linux/mm.h | 11 +++ mm/page_alloc.c | 1 + mm/sparse-vmemmap.c | 15 +++ mm/sparse.c | 6 +++--- 4 files changed, 22 insertions(+), 11 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 04c8b2e5aff4..fd045a3b243a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2501,6 +2501,17 @@ static inline void *vmemmap_alloc_block_buf(unsigned long size, int node) return __vmemmap_alloc_block_buf(size, node, NULL); } +static inline void *vmemmap_alloc_block_zero(unsigned long size, int node) +{ + void *p = vmemmap_alloc_block(size, node); + + if (!p) + return NULL; + memset(p, 0, size); + + return p; +} + void vmemmap_verify(pte_t *, int, unsigned long, unsigned long); int vmemmap_populate_basepages(unsigned long start, unsigned long end, int node); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5f0013bbbe9d..85e038e1e941 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1170,6 +1170,7 @@ static void free_one_page(struct zone *zone, static void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid) { + mm_zero_struct_page(page); set_page_links(page, zone, nid, pfn); init_page_count(page); page_mapcount_reset(page); diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index d1a39b8051e0..c2f5654e7c9d 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -41,7 +41,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node, unsigned long align, unsigned long goal) { - return memblock_virt_alloc_try_nid(size, align, goal, + return memblock_virt_alloc_try_nid_raw(size, align, goal, BOOTMEM_ALLOC_ACCESSIBLE, node); } @@ -54,9 +54,8 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node) if (slab_is_available()) { struct page *page; - page = alloc_pages_node(node, - GFP_KERNEL | __GFP_ZERO | __GFP_RETRY_MAYFAIL, - get_order(size)); + page = alloc_pages_node(node, GFP_KERNEL | __GFP_RETRY_MAYFAIL, + get_order(size)); if (page) return page_address(page); return NULL; @@ -183,7 +182,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node) { pmd_t *pmd = pmd_offset(pud, addr); if (pmd_none(*pmd)) { - void *p = vmemmap_alloc_block(PAGE_SIZE, node); + void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node); if (!p) return NULL; pmd_populate_kernel(&init_mm, pmd, p); @@ -195,7 +194,7 @@ pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node) { pud_t *pud = pud_offset(p4d, addr); if (pud_none(*pud)) { - void *p = vmemmap_alloc_block(PAGE_SIZE, node); + void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node); if (!p) return NULL; pud_populate(&init_mm, pud, p); @@ -207,7 +206,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node) { p4d_t *p4d = p4d_offset(pgd, addr); if (p4d_none(*p4d)) { - void *p = vmemmap_alloc_block(PAGE_SIZE, node); + void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node); if (!p) return NULL; p4d_populate(&init_mm, p4d, p); @@ -219,7 +218,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node) {
[PATCH v11 6/9] x86/kasan: add and use kasan_map_populate()
During early boot, kasan uses vmemmap_populate() to establish its shadow memory. But, that interface is intended for struct pages use. Because of the current project, vmemmap won't be zeroed during allocation, but kasan expects that memory to be zeroed. We are adding a new kasan_map_populate() function to resolve this difference. Therefore, we must use a new interface to allocate and map kasan shadow memory, that also zeroes memory for us. Signed-off-by: Pavel Tatashin --- arch/x86/mm/kasan_init_64.c | 75 ++--- 1 file changed, 71 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index bc84b73684b7..9778fec8a5dc 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -15,6 +15,73 @@ extern struct range pfn_mapped[E820_MAX_ENTRIES]; +/* Creates mappings for kasan during early boot. The mapped memory is zeroed */ +static int __meminit kasan_map_populate(unsigned long start, unsigned long end, + int node) +{ + unsigned long addr, pfn, next; + unsigned long long size; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + + ret = vmemmap_populate(start, end, node); + /* +* We might have partially populated memory, so check for no entries, +* and zero only those that actually exist. +*/ + for (addr = start; addr < end; addr = next) { + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) { + next = pgd_addr_end(addr, end); + continue; + } + + p4d = p4d_offset(pgd, addr); + if (p4d_none(*p4d)) { + next = p4d_addr_end(addr, end); + continue; + } + + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) { + next = pud_addr_end(addr, end); + continue; + } + if (pud_large(*pud)) { + /* This is PUD size page */ + next = pud_addr_end(addr, end); + size = PUD_SIZE; + pfn = pud_pfn(*pud); + } else { + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) { + next = pmd_addr_end(addr, end); + continue; + } + if (pmd_large(*pmd)) { + /* This is PMD size page */ + next = pmd_addr_end(addr, end); + size = PMD_SIZE; + pfn = pmd_pfn(*pmd); + } else { + pte = pte_offset_kernel(pmd, addr); + next = addr + PAGE_SIZE; + if (pte_none(*pte)) + continue; + /* This is base size page */ + size = PAGE_SIZE; + pfn = pte_pfn(*pte); + } + } + memset(phys_to_virt(PFN_PHYS(pfn)), 0, size); + } + return ret; +} + static int __init map_range(struct range *range) { unsigned long start; @@ -23,7 +90,7 @@ static int __init map_range(struct range *range) start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start)); end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end)); - return vmemmap_populate(start, end, NUMA_NO_NODE); + return kasan_map_populate(start, end, NUMA_NO_NODE); } static void __init clear_pgds(unsigned long start, @@ -136,9 +203,9 @@ void __init kasan_init(void) kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), kasan_mem_to_shadow((void *)__START_KERNEL_map)); - vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext), - (unsigned long)kasan_mem_to_shadow(_end), - NUMA_NO_NODE); + kasan_map_populate((unsigned long)kasan_mem_to_shadow(_stext), + (unsigned long)kasan_mem_to_shadow(_end), + NUMA_NO_NODE); kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), (void *)KASAN_SHADOW_END); -- 2.14.2
[PATCH v11 7/9] arm64/kasan: add and use kasan_map_populate()
During early boot, kasan uses vmemmap_populate() to establish its shadow memory. But, that interface is intended for struct pages use. Because of the current project, vmemmap won't be zeroed during allocation, but kasan expects that memory to be zeroed. We are adding a new kasan_map_populate() function to resolve this difference. Therefore, we must use a new interface to allocate and map kasan shadow memory, that also zeroes memory for us. Signed-off-by: Pavel Tatashin --- arch/arm64/mm/kasan_init.c | 72 ++ 1 file changed, 66 insertions(+), 6 deletions(-) diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index 81f03959a4ab..cb4af2951c90 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c @@ -28,6 +28,66 @@ static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE); +/* Creates mappings for kasan during early boot. The mapped memory is zeroed */ +static int __meminit kasan_map_populate(unsigned long start, unsigned long end, + int node) +{ + unsigned long addr, pfn, next; + unsigned long long size; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + + ret = vmemmap_populate(start, end, node); + /* +* We might have partially populated memory, so check for no entries, +* and zero only those that actually exist. +*/ + for (addr = start; addr < end; addr = next) { + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) { + next = pgd_addr_end(addr, end); + continue; + } + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) { + next = pud_addr_end(addr, end); + continue; + } + if (pud_sect(*pud)) { + /* This is PUD size page */ + next = pud_addr_end(addr, end); + size = PUD_SIZE; + pfn = pud_pfn(*pud); + } else { + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) { + next = pmd_addr_end(addr, end); + continue; + } + if (pmd_sect(*pmd)) { + /* This is PMD size page */ + next = pmd_addr_end(addr, end); + size = PMD_SIZE; + pfn = pmd_pfn(*pmd); + } else { + pte = pte_offset_kernel(pmd, addr); + next = addr + PAGE_SIZE; + if (pte_none(*pte)) + continue; + /* This is base size page */ + size = PAGE_SIZE; + pfn = pte_pfn(*pte); + } + } + memset(phys_to_virt(PFN_PHYS(pfn)), 0, size); + } + return ret; +} + /* * The p*d_populate functions call virt_to_phys implicitly so they can't be used * directly on kernel symbols (bm_p*d). All the early functions are called too @@ -161,11 +221,11 @@ void __init kasan_init(void) clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); - vmemmap_populate(kimg_shadow_start, kimg_shadow_end, -pfn_to_nid(virt_to_pfn(lm_alias(_text; + kasan_map_populate(kimg_shadow_start, kimg_shadow_end, + pfn_to_nid(virt_to_pfn(lm_alias(_text; /* -* vmemmap_populate() has populated the shadow region that covers the +* kasan_map_populate() has populated the shadow region that covers the * kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round * the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent * kasan_populate_zero_shadow() from replacing the page table entries @@ -191,9 +251,9 @@ void __init kasan_init(void) if (start >= end) break; - vmemmap_populate((unsigned long)kasan_mem_to_shadow(start), - (unsigned long)kasan_mem_to_shadow(end), - pfn_to_nid(virt_to_pfn(start))); + kasan_map_populate((unsigned long)kasan_mem_to_shadow(start), + (unsigned long)kasan_mem_to_shadow(end), + pfn_to_nid(virt_to_pfn(start))); } /* -- 2.14.2
[PATCH v11 3/9] sparc64: simplify vmemmap_populate
Remove duplicating code by using common functions vmemmap_pud_populate and vmemmap_pgd_populate. Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: David S. Miller Acked-by: Michal Hocko --- arch/sparc/mm/init_64.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index caed495544e9..6839db3ffe1d 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2652,30 +2652,19 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend, vstart = vstart & PMD_MASK; vend = ALIGN(vend, PMD_SIZE); for (; vstart < vend; vstart += PMD_SIZE) { - pgd_t *pgd = pgd_offset_k(vstart); + pgd_t *pgd = vmemmap_pgd_populate(vstart, node); unsigned long pte; pud_t *pud; pmd_t *pmd; - if (pgd_none(*pgd)) { - pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node); + if (!pgd) + return -ENOMEM; - if (!new) - return -ENOMEM; - pgd_populate(&init_mm, pgd, new); - } - - pud = pud_offset(pgd, vstart); - if (pud_none(*pud)) { - pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node); - - if (!new) - return -ENOMEM; - pud_populate(&init_mm, pud, new); - } + pud = vmemmap_pud_populate(pgd, vstart, node); + if (!pud) + return -ENOMEM; pmd = pmd_offset(pud, vstart); - pte = pmd_val(*pmd); if (!(pte & _PAGE_VALID)) { void *block = vmemmap_alloc_block(PMD_SIZE, node); -- 2.14.2
[PATCH v11 9/9] sparc64: optimized struct page zeroing
Add an optimized mm_zero_struct_page(), so struct page's are zeroed without calling memset(). We do eight to ten regular stores based on the size of struct page. Compiler optimizes out the conditions of switch() statement. SPARC-M6 with 15T of memory, single thread performance: BASEFIX OPTIMIZED_FIX bootmem_init 28.440467985s 2.305674818s 2.305161615s free_area_init_nodes 202.845901673s 225.343084508s 172.556506560s Total 231.286369658s 227.648759326s 174.861668175s BASE: current linux FIX: This patch series without "optimized struct page zeroing" OPTIMIZED_FIX: This patch series including the current patch. bootmem_init() is where memory for struct pages is zeroed during allocation. Note, about two seconds in this function is a fixed time: it does not increase as memory is increased. Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: David S. Miller --- arch/sparc/include/asm/pgtable_64.h | 30 ++ 1 file changed, 30 insertions(+) diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 4fefe3762083..8ed478abc630 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -230,6 +230,36 @@ extern unsigned long _PAGE_ALL_SZ_BITS; extern struct page *mem_map_zero; #define ZERO_PAGE(vaddr) (mem_map_zero) +/* This macro must be updated when the size of struct page grows above 80 + * or reduces below 64. + * The idea that compiler optimizes out switch() statement, and only + * leaves clrx instructions + */ +#definemm_zero_struct_page(pp) do { \ + unsigned long *_pp = (void *)(pp); \ + \ +/* Check that struct page is either 64, 72, or 80 bytes */ \ + BUILD_BUG_ON(sizeof(struct page) & 7); \ + BUILD_BUG_ON(sizeof(struct page) < 64); \ + BUILD_BUG_ON(sizeof(struct page) > 80); \ + \ + switch (sizeof(struct page)) { \ + case 80:\ + _pp[9] = 0; /* fallthrough */ \ + case 72:\ + _pp[8] = 0; /* fallthrough */ \ + default:\ + _pp[7] = 0; \ + _pp[6] = 0; \ + _pp[5] = 0; \ + _pp[4] = 0; \ + _pp[3] = 0; \ + _pp[2] = 0; \ + _pp[1] = 0; \ + _pp[0] = 0; \ + } \ +} while (0) + /* PFNs are real physical page numbers. However, mem_map only begins to record * per-page information starting at pfn_base. This is to handle systems where * the first physical page in the machine is at some huge physical address, -- 2.14.2
[PATCH v11 4/9] mm: defining memblock_virt_alloc_try_nid_raw
* A new variant of memblock_virt_alloc_* allocations: memblock_virt_alloc_try_nid_raw() - Does not zero the allocated memory - Does not panic if request cannot be satisfied * optimize early system hash allocations Clients can call alloc_large_system_hash() with flag: HASH_ZERO to specify that memory that was allocated for system hash needs to be zeroed, otherwise the memory does not need to be zeroed, and client will initialize it. If memory does not need to be zero'd, call the new memblock_virt_alloc_raw() interface, and thus improve the boot performance. * debug for raw alloctor When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no places excpect zeroed memory. Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: Michal Hocko --- include/linux/bootmem.h | 27 ++ mm/memblock.c | 60 +++-- mm/page_alloc.c | 15 ++--- 3 files changed, 87 insertions(+), 15 deletions(-) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index e223d91b6439..ea30b3987282 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -160,6 +160,9 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, #define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0) /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */ +void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align, + phys_addr_t min_addr, + phys_addr_t max_addr, int nid); void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid); @@ -176,6 +179,14 @@ static inline void * __init memblock_virt_alloc( NUMA_NO_NODE); } +static inline void * __init memblock_virt_alloc_raw( + phys_addr_t size, phys_addr_t align) +{ + return memblock_virt_alloc_try_nid_raw(size, align, BOOTMEM_LOW_LIMIT, + BOOTMEM_ALLOC_ACCESSIBLE, + NUMA_NO_NODE); +} + static inline void * __init memblock_virt_alloc_nopanic( phys_addr_t size, phys_addr_t align) { @@ -257,6 +268,14 @@ static inline void * __init memblock_virt_alloc( return __alloc_bootmem(size, align, BOOTMEM_LOW_LIMIT); } +static inline void * __init memblock_virt_alloc_raw( + phys_addr_t size, phys_addr_t align) +{ + if (!align) + align = SMP_CACHE_BYTES; + return __alloc_bootmem_nopanic(size, align, BOOTMEM_LOW_LIMIT); +} + static inline void * __init memblock_virt_alloc_nopanic( phys_addr_t size, phys_addr_t align) { @@ -309,6 +328,14 @@ static inline void * __init memblock_virt_alloc_try_nid(phys_addr_t size, min_addr); } +static inline void * __init memblock_virt_alloc_try_nid_raw( + phys_addr_t size, phys_addr_t align, + phys_addr_t min_addr, phys_addr_t max_addr, int nid) +{ + return ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size, align, + min_addr, max_addr); +} + static inline void * __init memblock_virt_alloc_try_nid_nopanic( phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, phys_addr_t max_addr, int nid) diff --git a/mm/memblock.c b/mm/memblock.c index 91205780e6b1..1f299fb1eb08 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1327,7 +1327,6 @@ static void * __init memblock_virt_alloc_internal( return NULL; done: ptr = phys_to_virt(alloc); - memset(ptr, 0, size); /* * The min_count is set to 0 so that bootmem allocated blocks @@ -1340,6 +1339,45 @@ static void * __init memblock_virt_alloc_internal( return ptr; } +/** + * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing + * memory and without panicking + * @size: size of memory block to be allocated in bytes + * @align: alignment of the region and block's size + * @min_addr: the lower bound of the memory region from where the allocation + * is preferred (phys address) + * @max_addr: the upper bound of the memory region from where the allocation + * is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to + * allocate only from memory limited by memblock.current_limit value + * @nid: nid of the free area to find, %NUMA_NO_NODE for any node + * + * Public function, provides additional debug information (including caller + * info), if enabled. Does not zero allocated mem
Re: [PATCH 2/5] trace-cmd: Fix NULL pointer being passed to memcpy
On Sat, 12 Aug 2017 11:30:44 -0600 Michael Sartain wrote: > Signed-off-by: Michael Sartain > --- > trace-output.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/trace-output.c b/trace-output.c > index bfe6331..84b21b0 100644 > --- a/trace-output.c > +++ b/trace-output.c > @@ -929,7 +929,11 @@ tracecmd_add_option(struct tracecmd_output *handle, > free(option); > return NULL; > } > - memcpy(option->data, data, size); > + > + /* Some IDs (like TRACECMD_OPTION_TRACECLOCK) pass NULL data */ > + if (data) > + memcpy(option->data, data, size); Is this a problem, as when this happens, size should be zero. Does it crash with data=NULL and size=0, or have you seen size not be zero? -- Steve > + > list_add_tail(&option->list, &handle->options); > > return option;
Re: [PATCH 2/5] trace-cmd: Fix NULL pointer being passed to memcpy
On Mon, Oct 09, 2017 at 06:24:32PM -0400, Steven Rostedt wrote: > On Sat, 12 Aug 2017 11:30:44 -0600 > Michael Sartain wrote: > > > Signed-off-by: Michael Sartain > > --- > > trace-output.c | 6 +- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/trace-output.c b/trace-output.c > > index bfe6331..84b21b0 100644 > > --- a/trace-output.c > > +++ b/trace-output.c > > @@ -929,7 +929,11 @@ tracecmd_add_option(struct tracecmd_output *handle, > > free(option); > > return NULL; > > } > > - memcpy(option->data, data, size); > > + > > + /* Some IDs (like TRACECMD_OPTION_TRACECLOCK) pass NULL data */ > > + if (data) > > + memcpy(option->data, data, size); > > Is this a problem, as when this happens, size should be zero. Does it > crash with data=NULL and size=0, or have you seen size not be zero? I got an ASAN warning, but you are correct - the size was 0 and it did not crash.
Re: [PATCH 3/5] trace-cmd: Add ULL suffix to MISSING_EVENTS since ints shouldn't be left shifted by 31
On Sat, 12 Aug 2017 11:30:45 -0600 Michael Sartain wrote: > Signed-off-by: Michael Sartain > --- > kbuffer-parse.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kbuffer-parse.c b/kbuffer-parse.c > index 4e6e95e..dde642c 100644 > --- a/kbuffer-parse.c > +++ b/kbuffer-parse.c > @@ -24,8 +24,8 @@ > > #include "kbuffer.h" > > -#define MISSING_EVENTS (1 << 31) Actually, why not? This could also be just UL, because it's fine to shift 31, that would give us: 0x8000 And that bit is all we care for. -- Steve > -#define MISSING_STORED (1 << 30) > +#define MISSING_EVENTS (1ULL << 31) > +#define MISSING_STORED (1ULL << 30) > > #define COMMIT_MASK ((1 << 27) - 1) >
Re: [PATCH v2 0/5] Switch arm64 over to qrwlock
Hi, On 10/06/2017 08:34 AM, Will Deacon wrote: Hi all, This is version two of the patches I posted yesterday: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html I'd normally leave it longer before posting again, but Peter had a good suggestion to rework the layout of the lock word, so I wanted to post a version that follows that approach. I've updated my branch if you're after the full patch stack: git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock As before, all comments (particularly related to testing and performance) welcome! I've been doing perf comparisons with the rwlock fairness patch I posted last week on a single socket thunderx and the baseline rwlock. For most cases where the ratio of read/writers is similar (uncontended readers/writers, single writer/lots readers, etc) the absolute number of lock acquisitions is very similar (within a few percent one way or the other). In this regard both patches are light years ahead of the current arm64 rwlock. The unfairness of the current rwlock allows significantly higher lock acquisition counts (say 4x at 30Readers:1Writer) at the expense of complete writer starvation (or a ~43k:1 ratio at a 30R:1W per locktorture). This is untenable. The qrwlock does an excellent job of maintaining the ratio of reader/writer acquisitions proportional to the number of readers/writers until the total lockers exceeds the number of cores where the ratios start to far exceed the reader/writer ratios (440:1 acquisitions @ 96R:1W) In comparison the other patch tends to favor the writers more, so at a ratio of 48R/1W, the readers are only grabbing the lock at a ratio of 15:1. This flatter curve continues past the number of cores with the readers having a 48:1 advantage with 96R/1W. That said the total lock acquisition counts remain very similar (with maybe a slight advantage to the non queued patch with 1 writer and 12-30 readers) despite the writer acquiring the lock at a higher frequency. OTOH, if the number of writers is closer to the number of readers (24R:24W) then the writers have about a 1.5X bias over the readers independent of the number of reader/writers. This bias seems to be the common multiplier given a reader/writer ratio with those patches and doesn't account for possible single thread starvation. Of course, I've been running other tests as well and the system seems to be behaving as expected (likely better than the rwlock patches under stress). I will continue to test this on a couple other platforms. In the meantime: Tested-by: Jeremy Linton Cheers, Will --->8 Will Deacon (5): kernel/locking: Use struct qrwlock instead of struct __qrwlock locking/atomic: Add atomic_cond_read_acquire kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock arm64: locking: Move rwlock implementation over to qrwlocks kernel/locking: Prevent slowpath writers getting held up by fastpath arch/arm64/Kconfig | 17 arch/arm64/include/asm/Kbuild | 1 + arch/arm64/include/asm/spinlock.h | 164 +--- arch/arm64/include/asm/spinlock_types.h | 6 +- include/asm-generic/atomic-long.h | 3 + include/asm-generic/qrwlock.h | 20 +--- include/asm-generic/qrwlock_types.h | 15 ++- include/linux/atomic.h | 4 + kernel/locking/qrwlock.c| 83 +++- 9 files changed, 58 insertions(+), 255 deletions(-)
Re: [PATCH 4/5] trace-cmd: Use unsigned values in Hsieh's trace_hash fast hash function
On Sat, 12 Aug 2017 11:30:46 -0600 Michael Sartain wrote: > Signed int values were being used where the original code used uint32_t types: > > http://www.azillionmonkeys.com/qed/hash.html > > Right shifting negative int values has implementation-defined and left > shifting > has undefined behavior. > > On my platform (x86_64) right shifting was doing sign extension and filling > high bits with 1s, which is different than the original algorithm. > Heh, nice catch. Although the hash was never used for anything too important. Mostly just colors of the graph. > Signed-off-by: Michael Sartain > --- > trace-hash-local.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/trace-hash-local.h b/trace-hash-local.h > index b2a1002..b3f9b06 100644 > --- a/trace-hash-local.h > +++ b/trace-hash-local.h > @@ -22,7 +22,7 @@ > > static inline unsigned int trace_hash(int val) > { > - int hash, tmp; > + unsigned int hash, tmp; > > hash = 12546869;/* random prime */ > > @@ -34,7 +34,7 @@ static inline unsigned int trace_hash(int val) >*/ > > hash += (val & 0x); > - tmp = (val >> 16) ^ hash; > + tmp = ((unsigned int)val >> 16) ^ hash; Why not just have val be unsigned int, and not do the typecast? -- Steve > hash = (hash << 16) ^ tmp; > hash += hash >> 11; >
Re: [PATCH 5/5] trace-cmd: Remove unused view_width variable
On Sat, 12 Aug 2017 11:30:47 -0600 Michael Sartain wrote: This is an annoying warning, but I've kept from doing this because it reminds me that I have an idea to actually use that variable someday. But I've just been procrastinating on doing that. ;-) -- Steve > Signed-off-by: Michael Sartain > --- > trace-graph.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/trace-graph.c b/trace-graph.c > index 1db342f..2c49549 100644 > --- a/trace-graph.c > +++ b/trace-graph.c > @@ -1263,7 +1263,6 @@ static void draw_info_box(struct graph_info *ginfo, > const gchar *buffer, > gint width, height; > GdkPixmap *pix; > static GdkGC *pix_bg; > - gint view_width; > gint view_start; > > if (!pix_bg) { > @@ -1284,7 +1283,6 @@ static void draw_info_box(struct graph_info *ginfo, > const gchar *buffer, > height += PLOT_BOARDER * 2; > > view_start = gtk_adjustment_get_value(ginfo->hadj); > - view_width = gtk_adjustment_get_page_size(ginfo->hadj); > if (x > view_start + width) > x -= width; >
Re: [PATCH 2/5] trace-cmd: Fix NULL pointer being passed to memcpy
On Mon, 9 Oct 2017 16:27:10 -0600 Michael Sartain wrote: > > > - memcpy(option->data, data, size); > > > + > > > + /* Some IDs (like TRACECMD_OPTION_TRACECLOCK) pass NULL data */ > > > + if (data) > > > + memcpy(option->data, data, size); > > > > Is this a problem, as when this happens, size should be zero. Does it > > crash with data=NULL and size=0, or have you seen size not be zero? > > I got an ASAN warning, but you are correct - the size was 0 and it did > not crash. OK, but it's almost like dividing zero from zero. Can you send another patch, but this time check if (size) instead of if (data). -- Steve
[PATCH 1/3] Input: factor out and export input_device_id matching code
Factor out and export input_match_device_id() so that modules may use it. It will be needed by joydev to blacklist accelerometers in composite devices. Signed-off-by: Dmitry Torokhov --- drivers/input/input.c | 83 +++ include/linux/input.h | 3 ++ 2 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/input/input.c b/drivers/input/input.c index f157ad5784a1..a8d4f1653588 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -938,58 +938,51 @@ int input_set_keycode(struct input_dev *dev, } EXPORT_SYMBOL(input_set_keycode); +bool input_match_device_id(const struct input_dev *dev, + const struct input_device_id *id) +{ + if (id->flags & INPUT_DEVICE_ID_MATCH_BUS) + if (id->bustype != dev->id.bustype) + return false; + + if (id->flags & INPUT_DEVICE_ID_MATCH_VENDOR) + if (id->vendor != dev->id.vendor) + return false; + + if (id->flags & INPUT_DEVICE_ID_MATCH_PRODUCT) + if (id->product != dev->id.product) + return false; + + if (id->flags & INPUT_DEVICE_ID_MATCH_VERSION) + if (id->version != dev->id.version) + return false; + + if (!bitmap_subset(id->evbit, dev->evbit, EV_MAX) || + !bitmap_subset(id->keybit, dev->keybit, KEY_MAX) || + !bitmap_subset(id->relbit, dev->relbit, REL_MAX) || + !bitmap_subset(id->absbit, dev->absbit, ABS_MAX) || + !bitmap_subset(id->mscbit, dev->mscbit, MSC_MAX) || + !bitmap_subset(id->ledbit, dev->ledbit, LED_MAX) || + !bitmap_subset(id->sndbit, dev->sndbit, SND_MAX) || + !bitmap_subset(id->ffbit, dev->ffbit, FF_MAX) || + !bitmap_subset(id->swbit, dev->swbit, SW_MAX)) { + return false; + } + + return true; +} +EXPORT_SYMBOL(input_match_device_id); + static const struct input_device_id *input_match_device(struct input_handler *handler, struct input_dev *dev) { const struct input_device_id *id; for (id = handler->id_table; id->flags || id->driver_info; id++) { - - if (id->flags & INPUT_DEVICE_ID_MATCH_BUS) - if (id->bustype != dev->id.bustype) - continue; - - if (id->flags & INPUT_DEVICE_ID_MATCH_VENDOR) - if (id->vendor != dev->id.vendor) - continue; - - if (id->flags & INPUT_DEVICE_ID_MATCH_PRODUCT) - if (id->product != dev->id.product) - continue; - - if (id->flags & INPUT_DEVICE_ID_MATCH_VERSION) - if (id->version != dev->id.version) - continue; - - if (!bitmap_subset(id->evbit, dev->evbit, EV_MAX)) - continue; - - if (!bitmap_subset(id->keybit, dev->keybit, KEY_MAX)) - continue; - - if (!bitmap_subset(id->relbit, dev->relbit, REL_MAX)) - continue; - - if (!bitmap_subset(id->absbit, dev->absbit, ABS_MAX)) - continue; - - if (!bitmap_subset(id->mscbit, dev->mscbit, MSC_MAX)) - continue; - - if (!bitmap_subset(id->ledbit, dev->ledbit, LED_MAX)) - continue; - - if (!bitmap_subset(id->sndbit, dev->sndbit, SND_MAX)) - continue; - - if (!bitmap_subset(id->ffbit, dev->ffbit, FF_MAX)) - continue; - - if (!bitmap_subset(id->swbit, dev->swbit, SW_MAX)) - continue; - - if (!handler->match || handler->match(handler, dev)) + if (input_match_device_id(dev, id) && + (!handler->match || handler->match(handler, dev))) { return id; + } } return NULL; diff --git a/include/linux/input.h b/include/linux/input.h index 9b03f34807a7..066093fb8b3c 100644 --- a/include/linux/input.h +++ b/include/linux/input.h @@ -477,6 +477,9 @@ int input_get_keycode(struct input_dev *dev, struct input_keymap_entry *ke); int input_set_keycode(struct input_dev *dev, const struct input_keymap_entry *ke); +bool input_match_device_id(const struct input_dev *dev, + const struct input_device_id *id); + void input_enable_softrepeat(struct input_dev *dev, int delay, int period); extern struct class input_class; -- 2.14.2.920.gcf0c67979c-goog
[PATCH 3/3] Input: joydev - blacklist ds3/ds4/udraw motion sensors
From: Roderick Colenbrander Introduce a device table used for blacklisting devices. We currently blacklist the motion sensor subdevice of THQ Udraw and Sony ds3/ds4. Signed-off-by: Roderick Colenbrander Signed-off-by: Dmitry Torokhov --- drivers/input/joydev.c | 70 +- 1 file changed, 64 insertions(+), 6 deletions(-) diff --git a/drivers/input/joydev.c b/drivers/input/joydev.c index 29d677c714d2..7b29a8944039 100644 --- a/drivers/input/joydev.c +++ b/drivers/input/joydev.c @@ -747,6 +747,68 @@ static void joydev_cleanup(struct joydev *joydev) input_close_device(handle); } +/* + * These codes are copied from from hid-ids.h, unfortunately there is no common + * usb_ids/bt_ids.h header. + */ +#define USB_VENDOR_ID_SONY 0x054c +#define USB_DEVICE_ID_SONY_PS3_CONTROLLER 0x0268 +#define USB_DEVICE_ID_SONY_PS4_CONTROLLER 0x05c4 +#define USB_DEVICE_ID_SONY_PS4_CONTROLLER_20x09cc +#define USB_DEVICE_ID_SONY_PS4_CONTROLLER_DONGLE 0x0ba0 + +#define USB_VENDOR_ID_THQ 0x20d6 +#define USB_DEVICE_ID_THQ_PS3_UDRAW0xcb17 + +#define ACCEL_DEV(vnd, prd)\ + { \ + .flags = INPUT_DEVICE_ID_MATCH_VENDOR | \ + INPUT_DEVICE_ID_MATCH_PRODUCT | \ + INPUT_DEVICE_ID_MATCH_PROPBIT, \ + .vendor = (vnd),\ + .product = (prd), \ + .propbit = { BIT_MASK(INPUT_PROP_ACCELEROMETER) }, \ + } + +static const struct input_device_id joydev_blacklist[] = { + /* Avoid touchpads and touchscreens */ + { + .flags = INPUT_DEVICE_ID_MATCH_EVBIT | + INPUT_DEVICE_ID_MATCH_KEYBIT, + .evbit = { BIT_MASK(EV_KEY) }, + .keybit = { [BIT_WORD(BTN_TOUCH)] = BIT_MASK(BTN_TOUCH) }, + }, + /* Avoid tablets, digitisers and similar devices */ + { + .flags = INPUT_DEVICE_ID_MATCH_EVBIT | + INPUT_DEVICE_ID_MATCH_KEYBIT, + .evbit = { BIT_MASK(EV_KEY) }, + .keybit = { [BIT_WORD(BTN_DIGI)] = BIT_MASK(BTN_DIGI) }, + }, + /* Disable accelerometers on composite devices */ + ACCEL_DEV(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS3_CONTROLLER), + ACCEL_DEV(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER), + ACCEL_DEV(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER_2), + ACCEL_DEV(USB_VENDOR_ID_SONY, USB_DEVICE_ID_SONY_PS4_CONTROLLER_DONGLE), + ACCEL_DEV(USB_VENDOR_ID_THQ, USB_DEVICE_ID_THQ_PS3_UDRAW), + { /* sentinel */ } +}; + +static bool joydev_dev_is_blacklisted(struct input_dev *dev) +{ + const struct input_device_id *id; + + for (id = joydev_blacklist; id->flags; id++) { + if (input_match_device_id(dev, id)) { + dev_dbg(&dev->dev, + "joydev: blacklisting '%s'\n", dev->name); + return true; + } + } + + return false; +} + static bool joydev_dev_is_absolute_mouse(struct input_dev *dev) { DECLARE_BITMAP(jd_scratch, KEY_CNT); @@ -807,12 +869,8 @@ static bool joydev_dev_is_absolute_mouse(struct input_dev *dev) static bool joydev_match(struct input_handler *handler, struct input_dev *dev) { - /* Avoid touchpads and touchscreens */ - if (test_bit(EV_KEY, dev->evbit) && test_bit(BTN_TOUCH, dev->keybit)) - return false; - - /* Avoid tablets, digitisers and similar devices */ - if (test_bit(EV_KEY, dev->evbit) && test_bit(BTN_DIGI, dev->keybit)) + /* Disable blacklisted devices */ + if (joydev_dev_is_blacklisted(dev)) return false; /* Avoid absolute mice */ -- 2.14.2.920.gcf0c67979c-goog
[PATCH 2/3] Input: allow matching device IDs on property bits
Let's allow matching input devices on their property bits, both in-kernel and when generating module aliases. Signed-off-by: Dmitry Torokhov --- drivers/input/input.c | 3 ++- include/linux/input.h | 4 include/linux/mod_devicetable.h | 3 +++ scripts/mod/devicetable-offsets.c | 1 + scripts/mod/file2alias.c | 6 +- 5 files changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/input/input.c b/drivers/input/input.c index a8d4f1653588..58c0b7344fdf 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -965,7 +965,8 @@ bool input_match_device_id(const struct input_dev *dev, !bitmap_subset(id->ledbit, dev->ledbit, LED_MAX) || !bitmap_subset(id->sndbit, dev->sndbit, SND_MAX) || !bitmap_subset(id->ffbit, dev->ffbit, FF_MAX) || - !bitmap_subset(id->swbit, dev->swbit, SW_MAX)) { + !bitmap_subset(id->swbit, dev->swbit, SW_MAX) || + !bitmap_subset(id->propbit, dev->propbit, INPUT_PROP_MAX)) { return false; } diff --git a/include/linux/input.h b/include/linux/input.h index 066093fb8b3c..9e27e6d1d7d2 100644 --- a/include/linux/input.h +++ b/include/linux/input.h @@ -238,6 +238,10 @@ struct input_dev { #error "SW_MAX and INPUT_DEVICE_ID_SW_MAX do not match" #endif +#if INPUT_PROP_MAX != INPUT_DEVICE_ID_PROP_MAX +#error "INPUT_PROP_MAX and INPUT_DEVICE_ID_PROP_MAX do not match" +#endif + #define INPUT_DEVICE_ID_MATCH_DEVICE \ (INPUT_DEVICE_ID_MATCH_BUS | INPUT_DEVICE_ID_MATCH_VENDOR | INPUT_DEVICE_ID_MATCH_PRODUCT) #define INPUT_DEVICE_ID_MATCH_DEVICE_AND_VERSION \ diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index 3f74ef2281e8..72f0b7f19c59 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -293,6 +293,7 @@ struct pcmcia_device_id { #define INPUT_DEVICE_ID_SND_MAX0x07 #define INPUT_DEVICE_ID_FF_MAX 0x7f #define INPUT_DEVICE_ID_SW_MAX 0x0f +#define INPUT_DEVICE_ID_PROP_MAX 0x1f #define INPUT_DEVICE_ID_MATCH_BUS 1 #define INPUT_DEVICE_ID_MATCH_VENDOR 2 @@ -308,6 +309,7 @@ struct pcmcia_device_id { #define INPUT_DEVICE_ID_MATCH_SNDBIT 0x0400 #define INPUT_DEVICE_ID_MATCH_FFBIT0x0800 #define INPUT_DEVICE_ID_MATCH_SWBIT0x1000 +#define INPUT_DEVICE_ID_MATCH_PROPBIT 0x2000 struct input_device_id { @@ -327,6 +329,7 @@ struct input_device_id { kernel_ulong_t sndbit[INPUT_DEVICE_ID_SND_MAX / BITS_PER_LONG + 1]; kernel_ulong_t ffbit[INPUT_DEVICE_ID_FF_MAX / BITS_PER_LONG + 1]; kernel_ulong_t swbit[INPUT_DEVICE_ID_SW_MAX / BITS_PER_LONG + 1]; + kernel_ulong_t propbit[INPUT_DEVICE_ID_PROP_MAX / BITS_PER_LONG + 1]; kernel_ulong_t driver_info; }; diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c index e4d90e50f6fe..812657ab5aa3 100644 --- a/scripts/mod/devicetable-offsets.c +++ b/scripts/mod/devicetable-offsets.c @@ -105,6 +105,7 @@ int main(void) DEVID_FIELD(input_device_id, sndbit); DEVID_FIELD(input_device_id, ffbit); DEVID_FIELD(input_device_id, swbit); + DEVID_FIELD(input_device_id, propbit); DEVID(eisa_device_id); DEVID_FIELD(eisa_device_id, sig); diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c index 29d6699d5a06..bc25898f6df0 100644 --- a/scripts/mod/file2alias.c +++ b/scripts/mod/file2alias.c @@ -761,7 +761,7 @@ static void do_input(char *alias, sprintf(alias + strlen(alias), "%X,*", i); } -/* input:b0v0p0e0-eXkXrXaXmXlXsXfXwX where X is comma-separated %02X. */ +/* input:b0v0p0e0-eXkXrXaXmXlXsXfXwXprX where X is comma-separated %02X. */ static int do_input_entry(const char *filename, void *symval, char *alias) { @@ -779,6 +779,7 @@ static int do_input_entry(const char *filename, void *symval, DEF_FIELD_ADDR(symval, input_device_id, sndbit); DEF_FIELD_ADDR(symval, input_device_id, ffbit); DEF_FIELD_ADDR(symval, input_device_id, swbit); + DEF_FIELD_ADDR(symval, input_device_id, propbit); sprintf(alias, "input:"); @@ -816,6 +817,9 @@ static int do_input_entry(const char *filename, void *symval, sprintf(alias + strlen(alias), "w*"); if (flags & INPUT_DEVICE_ID_MATCH_SWBIT) do_input(alias, *swbit, 0, INPUT_DEVICE_ID_SW_MAX); + sprintf(alias + strlen(alias), "pr*"); + if (flags & INPUT_DEVICE_ID_MATCH_PROPBIT) + do_input(alias, *propbit, 0, INPUT_DEVICE_ID_PROP_MAX); return 1; } ADD_TO_DEVTABLE("input", input_device_id, do_input_entry); -- 2.14.2.920.gcf0c67979c-goog
Re: [PATCH] bitfield: Use __ffs64(x) to fix missing __ffsdi2()
Hi Geert, [auto build test ERROR on linus/master] [also build test ERROR on v4.14-rc4] [cannot apply to next-20170929] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Geert-Uytterhoeven/bitfield-Use-__ffs64-x-to-fix-missing-__ffsdi2/20171010-050009 config: x86_64-allyesdebian (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All error/warnings (new ones prefixed by >>): In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/linux/list.h:4, from include/linux/module.h:9, from net/mac80211/sta_info.c:12: In function 'sta_stats_decode_rate', inlined from 'sta_set_rate_info_rx' at net/mac80211/sta_info.c:2016:2, inlined from 'sta_set_sinfo' at net/mac80211/sta_info.c:2218:7: >> include/linux/compiler.h:576:38: error: call to '__compiletime_assert_1971' >> declared with attribute error: BUILD_BUG_ON failed: ((~0UL) - (1UL << >> (8)) + 1) & (~0UL >> (64 - 1 - (11) + (1ULL << __ffs64~0UL) - (1UL >> << (8)) + 1) & (~0UL >> (64 - 1 - (11))) & ((~0UL) - (1UL << (8)) + >> 1) & (~0UL >> (64 - 1 - (11) + (1ULL << __ffs64~0UL) - (1UL << (8)) >> + 1) & (~0UL >> (64 - 1 - (11))) - 1)) != 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:556:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:576:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:70:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ >> include/linux/build_bug.h:19:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(((n) & ((n) - 1)) != 0) ^~~~ >> include/linux/bitfield.h:62:3: note: in expansion of macro >> '__BUILD_BUG_ON_NOT_POWER_OF_2' __BUILD_BUG_ON_NOT_POWER_OF_2((_mask) + \ ^ >> include/linux/bitfield.h:103:3: note: in expansion of macro >> '__BF_FIELD_CHECK' __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \ ^~~~ >> net/mac80211/sta_info.h:764:32: note: in expansion of macro 'FIELD_GET' #define STA_STATS_GET(_n, _v) FIELD_GET(STA_STATS_FIELD_ ## _n, _v) ^ >> net/mac80211/sta_info.c:1971:14: note: in expansion of macro 'STA_STATS_GET' rinfo->bw = STA_STATS_GET(BW, rate); ^ include/linux/compiler.h:576:38: error: call to '__compiletime_assert_1973' declared with attribute error: BUILD_BUG_ON failed: ((~0UL) - (1UL << (13)) + 1) & (~0UL >> (64 - 1 - (15) + (1ULL << __ffs64~0UL) - (1UL << (13)) + 1) & (~0UL >> (64 - 1 - (15))) & ((~0UL) - (1UL << (13)) + 1) & (~0UL >> (64 - 1 - (15) + (1ULL << __ffs64~0UL) - (1UL << (13)) + 1) & (~0UL >> (64 - 1 - (15))) - 1)) != 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:556:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:576:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:70:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ >> include/linux/build_bug.h:19:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(((n) & ((n) - 1)) != 0) ^~~~ >> include/linux/bitfield.h:62:3: note: in expansion of macro >> '__BUILD_BUG_ON_NOT_POWER_OF_2' __BUILD_BUG_ON_NOT_POWER_OF_2((_mask) + \ ^ >> include/linux/bitfield.h:103:3: note: in expansion of macro >> '__BF_FIELD_CHECK'
Re: [PATCH] mm/page-writeback.c: fix bug caused by disable periodic writeback
On Sat, 7 Oct 2017 06:58:04 +0800 Yafang Shao wrote: > After disable periodic writeback by writing 0 to > dirty_writeback_centisecs, the handler wb_workfn() will not be > entered again until the dirty background limit reaches or > sync syscall is executed or no enough free memory available or > vmscan is triggered. > So the periodic writeback can't be enabled by writing a non-zero > value to dirty_writeback_centisecs > As it can be disabled by sysctl, it should be able to enable by > sysctl as well. > > ... > > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1972,7 +1972,13 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb) > int dirty_writeback_centisecs_handler(struct ctl_table *table, int write, > void __user *buffer, size_t *length, loff_t *ppos) > { > - proc_dointvec(table, write, buffer, length, ppos); > + unsigned int old_interval = dirty_writeback_interval; > + int ret; > + > + ret = proc_dointvec(table, write, buffer, length, ppos); > + if (!ret && !old_interval && dirty_writeback_interval) > + wakeup_flusher_threads(0, WB_REASON_PERIODIC); > + > return 0; We could do with a code comment here, explaining why this code exists. And... I'm not sure it works correctly? For example, if a device doesn't presently have bdi_has_dirty_io() then wakeup_flusher_threads() will skip it and the periodic writeback still won't be started? (why does the dirty_writeback_interval==0 special case exist, btw? Seems to be a strange thing to do). (and what happens if the interval was set to 1 hour and the user rewrites that to 1 second? Does that change take 1 hour to take effect?)
Re: Warning on __wait_on_bit
Hi, Tom On Sat, Sep 30, 2017 at 1:55 AM, Tom Green wrote: > Hi Tejun Cong Shaohua, > > I just observed the same problem in https://lkml.org/lkml/2016/12/9/373. It > seems to be exactly same as decribed in https://lwn.net/Articles/628628/. > Any progress? Thanks. Sorry for the delay, I was on vacation. Could you try the attached patch (against the latest Linus tree)? I did a quick boot test, it runs well. Thanks! diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 134d9f560240..211a773cb39f 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -71,23 +71,28 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr) return nfs_fileid_to_ino_t(fattr->fileid); } -static int nfs_wait_killable(int mode) +static int nfs_wait_killable(int mode, struct wait_queue_entry *wait) { - freezable_schedule_unsafe(); + freezer_do_not_count(); + if (wait) + wait_woken(wait, mode, MAX_SCHEDULE_TIMEOUT); + else + schedule(); + freezer_count_unsafe(); if (signal_pending_state(mode, current)) return -ERESTARTSYS; return 0; } -int nfs_wait_bit_killable(struct wait_bit_key *key, int mode) +int nfs_wait_bit_killable(struct wait_bit_key *key, int mode, struct wait_queue_entry *wait) { - return nfs_wait_killable(mode); + return nfs_wait_killable(mode, wait); } EXPORT_SYMBOL_GPL(nfs_wait_bit_killable); int nfs_wait_atomic_killable(atomic_t *p) { - return nfs_wait_killable(TASK_KILLABLE); + return nfs_wait_killable(TASK_KILLABLE, NULL); } /** diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 5bdf952f414b..a9144d311a50 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -386,7 +386,7 @@ extern void nfs_clear_inode(struct inode *); extern void nfs_evict_inode(struct inode *); void nfs_zap_acl_cache(struct inode *inode); extern bool nfs_check_cache_invalid(struct inode *, unsigned long); -extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode); +extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode, struct wait_queue_entry *wait); extern int nfs_wait_atomic_killable(atomic_t *p); /* super.c */ diff --git a/include/linux/freezer.h b/include/linux/freezer.h index dd03e837ebb7..1d197994b181 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -272,6 +272,7 @@ static inline bool try_to_freeze(void) { return false; } static inline void freezer_do_not_count(void) {} static inline void freezer_count(void) {} +static inline void freezer_count_unsafe(void) {} static inline int freezer_should_skip(struct task_struct *p) { return 0; } static inline void set_freezable(void) {} diff --git a/include/linux/sched.h b/include/linux/sched.h index 26a7df4e558c..1fd387e6c554 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -184,6 +184,9 @@ extern void io_schedule_finish(int token); extern long io_schedule_timeout(long timeout); extern void io_schedule(void); +struct wait_queue_entry; +extern void io_wait_timeout(struct wait_queue_entry *wait, int mode, long timeout); + /** * struct prev_cputime - snapshot of system and user cputime * @utime: time spent in user mode diff --git a/include/linux/wait.h b/include/linux/wait.h index 87c4641023fb..5a320e1c215e 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -987,6 +987,7 @@ void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_en long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout); int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key); int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key); +int autoremove_woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key); #define DEFINE_WAIT_FUNC(name, function) \ struct wait_queue_entry name = { \ diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h index 12b26660d7e9..f7a98ba7100c 100644 --- a/include/linux/wait_bit.h +++ b/include/linux/wait_bit.h @@ -24,7 +24,7 @@ struct wait_bit_queue_entry { #define __WAIT_ATOMIC_T_KEY_INITIALIZER(p) \ { .flags = p, .bit_nr = WAIT_ATOMIC_T_BIT_NR, } -typedef int wait_bit_action_f(struct wait_bit_key *key, int mode); +typedef int wait_bit_action_f(struct wait_bit_key *key, int mode, struct wait_queue_entry *wait); void __wake_up_bit(struct wait_queue_head *wq_head, void *word, int bit); int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode); int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode); @@ -50,10 +50,10 @@ int wake_bit_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync
[PATCH v2] Userfaultfd: Add description for UFFD_FEATURE_SIGBUS
Userfaultfd feature UFFD_FEATURE_SIGBUS was merged recently and should be available in Linux 4.14 release. This patch is for the manpage changes documenting this API. Documents the following commit: commit 2d6d6f5a09a96cc1fec7ed992b825e05f64cb50e Author: Prakash Sangappa Date: Wed Sep 6 16:23:39 2017 -0700 mm: userfaultfd: add feature to request for a signal delivery Signed-off-by: Prakash Sangappa --- v2: Incorporated review feedback changes. --- man2/ioctl_userfaultfd.2 | 9 + man2/userfaultfd.2 | 23 +++ 2 files changed, 32 insertions(+) diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2 index 60fd29b..32f0744 100644 --- a/man2/ioctl_userfaultfd.2 +++ b/man2/ioctl_userfaultfd.2 @@ -196,6 +196,15 @@ with the flag set, .BR memfd_create (2), and so on. +.TP +.B UFFD_FEATURE_SIGBUS +Since Linux 4.14, If this feature bit is set, no page-fault events +.B (UFFD_EVENT_PAGEFAULT) +will be delivered, instead a +.B SIGBUS +signal will be sent to the faulting process. Applications using this +feature will not require the use of a userfaultfd monitor for processing +memory accesses to the regions registered with userfaultfd. .IP The returned .I ioctls diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2 index 1741ee3..3c5b9c0 100644 --- a/man2/userfaultfd.2 +++ b/man2/userfaultfd.2 @@ -172,6 +172,29 @@ or .BR ioctl (2) operations to resolve the page fault. .PP +Starting from Linux 4.14, if application sets +.B UFFD_FEATURE_SIGBUS +feature bit using +.B UFFDIO_API +.BR ioctl (2), +no page fault notification will be forwarded to +the user-space, instead a +.B SIGBUS +signal is delivered to the faulting process. With this feature, +userfaultfd can be used for robustness purpose to simply catch +any access to areas within the registered address range that do not +have pages allocated, without having to listen to userfaultfd events. +No userfaultfd monitor will be required for dealing with such memory +accesses. For example, this feature can be useful for applications that +want to prevent the kernel from automatically allocating pages and filling +holes in sparse files when the hole is accessed thru mapped address. +.PP +The +.B UFFD_FEATURE_SIGBUS +feature is implicitly inherited through fork() if used in combination with +.BR UFFD_FEATURE_FORK . + +.PP Details of the various .BR ioctl (2) operations can be found in -- 2.7.4
Re: [PATCH 0/3] arm64: cpuinfo: make /proc/cpuinfo more human-readable
On 09/27/2017 04:34 AM, Mark Rutland wrote: > On Tue, Sep 26, 2017 at 04:23:21PM -0600, Al Stone wrote: >> As ARMv8 servers get deployed, I keep getting the same set of questions >> from end-users of those systems: what do all the hex numbers mean in >> /proc/cpuinfo and could you make them so I don't have to carry a cheat >> sheet with me all the time? > > I appreciate that /proc/cpuinfo can be opaque to end users, but I do not > believe that this is the right solution to that problem. > > There are a number of issues stemming from the face that /proc/cpuinfo > is ill-defined and overused for a number of cases. Changes to it almost > certainly violate brittle de-facto ABI details people are relying on, > and there's a very long tail on fallout resulting from this. In > addition, many niceties come at an excessive maintenance cost, and are > simply unreliable as an ABI. Instead of dealing with the replies to the specific patches, I'd like to deal with the real gist of the matter first -- details can be worked out later. While I don't disagree -- cpuinfo is enormously fragile -- what _is_ a good solution, then? Right now, things are not terribly useful. Yes, it is true /sys/firmware/dmi has info in it, but again it is all numeric (or in a file named 'raw'). Any one who wishes to read the values will need to put some interpretation on them somehow. My personal preference is that the kernel control that interpretation instead of random admin tools scattered over a raft of different companies. > So, as with all other patches modifying /proc/cpuinfo, I must NAK this > series. Hrm. I suggest that policy needs to be rethought. I understand the reasoning, but it puts my end-users in a very difficult position; they now have to justify modifying management software that is already in operation in order to purchase and integrate ARM servers. And while I really don't want distro-specific code, I'd have to consider it as a possibility. To be completely fair, though, I'm not completely fixated on cpuinfo as the only possible answer. It has been the most requested, however. > For the MPIDR and product info, I think we can expose this in a more > structured way (e.g. under sysfs). IIUC the product info is all derived > from DMI -- do we not expose that already? Agreed; the MPIDR I think could just as easily be in /sys/devices/cpu/* as a hex value. It doesn't have the value to end-users that the other info does, either -- for debugging, definitely, but not necessarily sysadmin. I'll put something together for this. The DMI info is in sysfs but it is not in a fashion normally consumed, hence the questions from end-users, in a way. They're not expecting to have to do the interpretation, and they are expecting the platform to be able to tell them what "Model: 2" means. > For the human-readable names I must NAK such patches. This is an > extremely brittle ABI that cannot be forwards compatible, and comes with > hilarious political problems. This should be managed in userspace alone. > > I thought tools like lscpu and lshw were intended to gather such > information for users. What are these missing today? All of the above human-readable parts -- lscpu reads model information from /proc/cpuinfo, for example, and would require new code to read this info from somewhere else, and then figure out how to interpret it. All I get is a lovely '2' for 'model' :). We could obviously change the userspace tools, but I'm guessing it just moves the political hilarity from the kernel to userspace, correct? It also runs the risk of even greater silliness, like perhaps lscpu gets a product name right, but lshw calls it something different, and so on. The other approach that occurs to me is to blend what x86 does and sysfs; i.e., would it be acceptable to modify cputype.h to include comments (as x86 does) that are then munged by a script somewhat like arch/x86/kernel/cpu/mkcapflags.sh that then produces a header file with proper human-readable content that can then be exposed in sysfs? I don't think we can ever remove the brittleness completely, but perhaps this would at least mitigate it. This would still require userspace modifications, too, but they could at least rely on a uniform source of data -- and perhaps a single place to focus when hilarity ensues :). I'll bodge some patches together and maybe start over as an RFC, if there's a chance this would be accepted. -- ciao, al --- Al Stone Software Engineer Red Hat, Inc. a...@redhat.com ---
[PATCH v3 4/5] uio: uio_pdrv_genirq: Add st,stm32f100 to dt compatible list
The stm32f100 is a general purpose micro controller. Allow users to interact with one as a uio device. Signed-off-by: Chris Packham --- Changes in v3: - (replaces "uio: add default compatible string to uio_pdrv_genirq") - remove superfluous change suggested by checkpatch.pl - As suggested by Rob, add specific dt bindings for the hardware I'm supporting. uio_pdrv_genirq is one potential implementor of a compatible driver. drivers/uio/uio_pdrv_genirq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/uio/uio_pdrv_genirq.c b/drivers/uio/uio_pdrv_genirq.c index f598ecddc8a7..532de98034c2 100644 --- a/drivers/uio/uio_pdrv_genirq.c +++ b/drivers/uio/uio_pdrv_genirq.c @@ -253,11 +253,12 @@ static const struct dev_pm_ops uio_pdrv_genirq_dev_pm_ops = { #ifdef CONFIG_OF static struct of_device_id uio_of_genirq_match[] = { + { .compatible = "st,stm32f100" }, { /* This is filled with module_parm */ }, { /* Sentinel */ }, }; MODULE_DEVICE_TABLE(of, uio_of_genirq_match); -module_param_string(of_id, uio_of_genirq_match[0].compatible, 128, 0); +module_param_string(of_id, uio_of_genirq_match[1].compatible, 128, 0); MODULE_PARM_DESC(of_id, "Openfirmware id of the device to be handled by uio"); #endif -- 2.14.2
[PATCH v3 0/5] using uio_pdrv_genirq without module param
I found myself about to add a driver that was a sub-optimal clone of uio_pdrv_genirq the only difference was that I didn't want to modify the args passed to the kernel by my bootloader. If uio_pdrv_genirq had a default of_match entry I could simply use that. This series attempts to implement this. I wonder if it is worth having a catch-all compat string for this driver "generic-dev" or something. People seem to be ok with spidev not having any such fallback so perhaps there is no-need (other than the 2 devices I want to support with userspace drivers). Changes in v2: - added a better commit message to 1/2 - remove bogus checkpatch fix in 2/2 Changes in v3: - As suggested by Rob document the bindings for the hardware and treat uio_pdrv_genirq.c as one potential implementer of a compatible driver. Following the model used by spidev.c - Patches 1-3 replace patch 1 from v2 of this series - Patches 4-5 replace patch 2 from v2 Chris Packham (5): uio: dt-bindings: document existing binding for uio-pdrv-genirq uio: dt-bindings: Add binding for "st,stm32f100" uio: dt-bindings: Add binding for "marvell,88e2040" uio: uio_pdrv_genirq: Add st,stm32f100 to dt compatible list uio: uio_pdrv_genirq: Add "marvell,88e2040" to dt compatible list .../devicetree/bindings/uio/uio-pdrv-genirq.txt| 28 ++ drivers/uio/uio_pdrv_genirq.c | 4 +++- 2 files changed, 31 insertions(+), 1 deletion(-) create mode 100644 Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt -- 2.14.2
[PATCH v3 3/5] uio: dt-bindings: Add binding for "marvell,88e2040"
The marvell,88e2040 is a N-BaseT Ethernet PHY which requires a userspace driver. The only kernel presence is an interrupt handler which uio_pdrv_genirq can provide. Signed-off-by: Chris Packham --- Changes in v3: - (replaces "uio: dt-bindings: document binding for uio-pdrv-genirq") - split the bindings up into the devices I'm actually wanting to support. The second is a marvell N-BaseT Ethernet PHY Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt index fb4c1b5059f2..7b28bead4016 100644 --- a/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt +++ b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt @@ -1,8 +1,8 @@ *Generic UIO platform driver with interrupts Required properties: -- compatible: Should be a value set with the of_id module parameter or the - built in value "st,stm32f100". +- compatible: Should be a value set with the of_id module parameter or one of + the built in values "st,stm32f100", "marvell,88e2040". Optional properties: - interrupts: Should contain the interrupt to be associated with this device -- 2.14.2
[PATCH v3 2/5] uio: dt-bindings: Add binding for "st,stm32f100"
The stm32f100 is a general purpose micro controller. Document a binding that allows a user-space driver to be implemented for these devices. Signed-off-by: Chris Packham --- Changes in v3: - (replaces "uio: dt-bindings: document binding for uio-pdrv-genirq") - split the bindings up into the devices I'm actually wanting to support. The first is a stm32f100 micro controller. Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt index 5a92b1f8825f..fb4c1b5059f2 100644 --- a/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt +++ b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt @@ -1,7 +1,8 @@ *Generic UIO platform driver with interrupts Required properties: -- compatible: Should be a value set with the of_id module parameter. +- compatible: Should be a value set with the of_id module parameter or the + built in value "st,stm32f100". Optional properties: - interrupts: Should contain the interrupt to be associated with this device -- 2.14.2
[PATCH v3 1/5] uio: dt-bindings: document existing binding for uio-pdrv-genirq
Document the device tree bindings for the uio-prv-genirq driver. Provide some examples on how it can be used. Signed-off-by: Chris Packham --- Changes in v3: - New. Document existing binding first. .../devicetree/bindings/uio/uio-pdrv-genirq.txt| 27 ++ 1 file changed, 27 insertions(+) create mode 100644 Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt diff --git a/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt new file mode 100644 index ..5a92b1f8825f --- /dev/null +++ b/Documentation/devicetree/bindings/uio/uio-pdrv-genirq.txt @@ -0,0 +1,27 @@ +*Generic UIO platform driver with interrupts + +Required properties: +- compatible: Should be a value set with the of_id module parameter. + +Optional properties: +- interrupts: Should contain the interrupt to be associated with this device + (only a single interrupt is supported per device). +- interrupt-parent: Specifies the phandle to the parent interrupt controller. +- reg: Should specify the physical address spaces used by this device. + +Example: + +/* Device with MM IO and interrupt assuming of_id="me,my-device" */ +my-device@1 { + compatible = "me,my-device"; + reg = <0x1 0x40>; + interrupts = <4 IRQ_TYPE_EDGE_BOTH>; + interrupt-parent = <&gic>; +}; + +/* Device with interrupt only assuming of_id="me,my-device" */ +my-int { + compatible = "me,my-device"; + interrupts = <6 IRQ_TYPE_EDGE_BOTH>; + interrupt-parent = <&gic>; +}; -- 2.14.2
[PATCH v3 5/5] uio: uio_pdrv_genirq: Add "marvell,88e2040" to dt compatible list
The marvell,88e2040 is a N-BaseT Ethernet PHY which requires a userspace driver. uio_pdrv_genirq can be used to provide interrupts to such a userspace driver. Signed-off-by: Chris Packham --- Changes in v3: - (replaces "uio: add default compatible string to uio_pdrv_genirq") - As suggested by Rob, add specific dt bindings for the hardware I'm supporting. uio_pdrv_genirq is one potential implementor of a compatible driver. drivers/uio/uio_pdrv_genirq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/uio/uio_pdrv_genirq.c b/drivers/uio/uio_pdrv_genirq.c index 532de98034c2..4bcab82b899e 100644 --- a/drivers/uio/uio_pdrv_genirq.c +++ b/drivers/uio/uio_pdrv_genirq.c @@ -254,11 +254,12 @@ static const struct dev_pm_ops uio_pdrv_genirq_dev_pm_ops = { #ifdef CONFIG_OF static struct of_device_id uio_of_genirq_match[] = { { .compatible = "st,stm32f100" }, + { .compatible = "marvell,88e2040" }, { /* This is filled with module_parm */ }, { /* Sentinel */ }, }; MODULE_DEVICE_TABLE(of, uio_of_genirq_match); -module_param_string(of_id, uio_of_genirq_match[1].compatible, 128, 0); +module_param_string(of_id, uio_of_genirq_match[2].compatible, 128, 0); MODULE_PARM_DESC(of_id, "Openfirmware id of the device to be handled by uio"); #endif -- 2.14.2
[PATCH v6 02/14] platform/x86: dell-wmi: increase severity of some failures
There is a lot of error checking in place for the format of the WMI descriptor buffer, but some of the potentially raised issues should be considered critical failures. If the buffer size or header don't match, this is a good indication that the buffer format changed in a way that the rest of the data should not be relied upon. For the remaining data set vectors, continue to notate a warning in undefined results, but as those are fields that the descriptor intended to refer to other applications, don't fail if they're new values. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-wmi.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c index 1fbef560ca67..2cfaaa8faf0a 100644 --- a/drivers/platform/x86/dell-wmi.c +++ b/drivers/platform/x86/dell-wmi.c @@ -657,17 +657,18 @@ static int dell_wmi_check_descriptor_buffer(struct wmi_device *wdev) dev_err(&wdev->dev, "Dell descriptor buffer has invalid length (%d)\n", obj->buffer.length); - if (obj->buffer.length < 16) { - ret = -EINVAL; - goto out; - } + ret = -EINVAL; + goto out; } buffer = (u32 *)obj->buffer.pointer; - if (buffer[0] != 0x4C4C4544 && buffer[1] != 0x494D5720) - dev_warn(&wdev->dev, "Dell descriptor buffer has invalid signature (%*ph)\n", + if (buffer[0] != 0x4C4C4544 && buffer[1] != 0x494D5720) { + dev_err(&wdev->dev, "Dell descriptor buffer has invalid signature (%*ph)\n", 8, buffer); + ret = -EINVAL; + goto out; + } if (buffer[2] != 0 && buffer[2] != 1) dev_warn(&wdev->dev, "Dell descriptor buffer has unknown version (%d)\n", -- 2.14.1
[PATCH v6 00/14] Introduce support for Dell SMBIOS over WMI
The existing way that the dell-smbios helper module and associated other drivers (dell-laptop, dell-wmi) communicate with the platform really isn't secure. It requires creating a buffer in physical DMA32 memory space and passing that to the platform via SMM. Since the platform got a physical memory pointer, you've just got to trust that the platform has only modified (and accessed) memory within that buffer. Dell Platform designers recognize this security risk and offer a safer way to communicate with the platform over ACPI. This is in turn exposed via a WMI interface to the OS. When communicating over WMI-ACPI the communication doesn't occur with physical memory pointers. When the ASL is invoked, the fixed length ACPI buffer is copied to a small operating region. The ASL will invoke the SMI, and SMM will only have access to this operating region. When the ASL returns the buffer is copied back for the OS to process. This method of communication should also deprecate the usage of the dcdbas kernel module and software dependent upon it's interface. Instead offer a character device interface for communicating with this ASL method to allow userspace to use instead. To faciliate that this patch series introduces a generic way for WMI drivers to be able to create discoverable character devices with a predictable IOCTL interface through the WMI bus when desired. Requiring WMI drivers to explicitly ask for this functionality will act as an effective vendor whitelist to character device creation. Some of this work is the basis for what will be a proper interpreter of MOF in the kernel and controls for what drivers will be able to do with that MOF. NOTE: This patch series is intended to go on top of platform-drivers-x86 linux-next. For convenience the entire series including those is also available here: https://github.com/superm1/linux/tree/wmi-smbios changes between v5 and v6: * Adjust Kconfig for dell-smbios to not depend on anything. - SMM and WMI drivers will both select dell-smbios - Technically the module can run on it's own (it's just not useful) * Add a new tokens sysfs interface * Rework blacklisting into easily expandable structures * Lock modules during ioctl call from WMI bus * drop references to compat ioctl in both WMI and dell-smbios-wmi drivers. (Made sure ioctl worked in both 32 and 64 userspace though) * dell-smbios-wmi - s/buffer_size/req_buf_size/ to make it clearer that userspace doesn't get to set this, it's set internally. - read just buffer length before reading whole structure from userspace - if buffer length is too big, show a warning - I tried to rename argument for unlocked_ioctl, but this caused problems in matching initialization lists, so still casting. - Add comments clarifying everything happening in ioctl changes between v4 and v5: * Remove Andy's S suggested by in sysfs tokens patch * Make some output in dell-wmi-descriptor debug only * Adjust various Kconfig dependencies as recommended by Darren * Drop patch to set dell-smbios to default on ACPI_WMI, it's not needed after the Kconfig dependencies rework * Move WSMT check patch to after WMI driver is introduced. * Make common smbios call return value int as there could be errors now with drivers not being loaded. * Make SMBIOS call methods for all drivers return status * Reorder patches 2 and 4. * Don't export symbols for calling functions on dispatchers * wmi patch: - use sprintf instead of strcpy - remove needless bool for tracking found - adjust logic to look for instance_count - 1, it's zero based not 1 based. - Pass a callback to unlocked_ioctl instead of full file operations object - ioctl: Don't fail on no bound WMI driver - Add missing header for uapi - Make helper macros include data types - add compat ioctl * dell-smbios: - Add filtering functionality for SMBIOS calling interface - Use dev_dbg rather than pr_debug where possible * dell-smbios-wmi: - test for handle on b1 table - correct misc flags comment - drop access checks - use dev_dbg instead of pr_* calls - use filtering functionality - add mutexes around list add/remove - switch from get_first_device to get_first_priv and inline - add mutex locking to prevent unloading mid-call. - update to new ioctl passing - fix userspace uapi to use __u32 instead of u32 - Don't use a header file for internal only use - make sure it works with compat ioctl * dell-laptop: make dell_smbios_send_request local function for boilerplate calls. * ioctl patch - Change API to have a simpler structure to pass back and forth - Rename header file - Export to sysfs properly - Add the size of the length variable into the requested buffersize to sysfs, do math in the driver when copying data around. changes between v3 and v4: * Make Dell WMI notifications driver fail notifications fail when WMI descriptor
Re: [PATCH] bcache: mark expected switch fall-throughs in STRTO_H
On 10/09/2017 02:54 PM, Gustavo A. R. Silva wrote: In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Cc: Kent Overstreet Cc: Shaohua Li Cc: linux-bca...@vger.kernel.org Cc: linux-r...@vger.kernel.org Signed-off-by: Gustavo A. R. Silva Reviewed-by: Michael Lyle Looks good to me. Thanks, Mike
[PATCH v6 01/14] platform/x86: wmi: Add new method wmidev_evaluate_method
Drivers properly using the wmibus can pass their wmi_device pointer rather than the GUID back to the WMI bus to evaluate the proper methods. Any "new" drivers added that use the WMI bus should use this rather than the old wmi_evaluate_method that would take the GUID. Signed-off-by: Mario Limonciello --- drivers/platform/x86/wmi.c | 28 include/linux/wmi.h| 6 ++ 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c index 7a05843aff19..4d73a87c2ddf 100644 --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -200,6 +200,28 @@ static acpi_status wmi_method_enable(struct wmi_block *wblock, int enable) */ acpi_status wmi_evaluate_method(const char *guid_string, u8 instance, u32 method_id, const struct acpi_buffer *in, struct acpi_buffer *out) +{ + struct wmi_block *wblock = NULL; + + if (!find_guid(guid_string, &wblock)) + return AE_ERROR; + return wmidev_evaluate_method(&wblock->dev, instance, method_id, + in, out); +} +EXPORT_SYMBOL_GPL(wmi_evaluate_method); + +/** + * wmidev_evaluate_method - Evaluate a WMI method + * @wdev: A wmi bus device from a driver + * @instance: Instance index + * @method_id: Method ID to call + * &in: Buffer containing input for the method call + * &out: Empty buffer to return the method results + * + * Call an ACPI-WMI method + */ +acpi_status wmidev_evaluate_method(struct wmi_device *wdev, u8 instance, + u32 method_id, const struct acpi_buffer *in, struct acpi_buffer *out) { struct guid_block *block = NULL; struct wmi_block *wblock = NULL; @@ -209,9 +231,7 @@ u32 method_id, const struct acpi_buffer *in, struct acpi_buffer *out) union acpi_object params[3]; char method[5] = "WM"; - if (!find_guid(guid_string, &wblock)) - return AE_ERROR; - + wblock = container_of(wdev, struct wmi_block, dev); block = &wblock->gblock; handle = wblock->acpi_device->handle; @@ -246,7 +266,7 @@ u32 method_id, const struct acpi_buffer *in, struct acpi_buffer *out) return status; } -EXPORT_SYMBOL_GPL(wmi_evaluate_method); +EXPORT_SYMBOL_GPL(wmidev_evaluate_method); static acpi_status __query_block(struct wmi_block *wblock, u8 instance, struct acpi_buffer *out) diff --git a/include/linux/wmi.h b/include/linux/wmi.h index cd0d7734dc49..2cd10c3b89e9 100644 --- a/include/linux/wmi.h +++ b/include/linux/wmi.h @@ -26,6 +26,12 @@ struct wmi_device { bool setable; }; +/* evaluate the ACPI method associated with this device */ +extern acpi_status wmidev_evaluate_method(struct wmi_device *wdev, + u8 instance, u32 method_id, + const struct acpi_buffer *in, + struct acpi_buffer *out); + /* Caller must kfree the result. */ extern union acpi_object *wmidev_block_query(struct wmi_device *wdev, u8 instance); -- 2.14.1
[PATCH v6 14/14] platform/x86: dell-smbios-wmi: introduce userspace interface
It's important for the driver to provide a R/W ioctl to ensure that two competing userspace processes don't race to provide or read each others data. This userspace character device will be used to perform SMBIOS calls from any applications. It provides an ioctl that will allow passing the WMI calling interface buffer between userspace and kernel space. This character device is intended to deprecate the dcdbas kernel module and the interface that it provides to userspace. To use the character device the buffer needed for the machine will also be needed. This information is exported to a sysfs attribute. The API for interacting with this interface is defined in documentation as well as a uapi header provides the format of the structures. Signed-off-by: Mario Limonciello --- Documentation/ABI/testing/dell-smbios-wmi | 41 .../ABI/testing/sysfs-platform-dell-smbios-wmi | 10 ++ MAINTAINERS| 1 + drivers/platform/x86/dell-smbios-wmi.c | 107 ++--- drivers/platform/x86/dell-smbios.h | 11 +-- include/uapi/linux/dell-smbios.h | 42 6 files changed, 191 insertions(+), 21 deletions(-) create mode 100644 Documentation/ABI/testing/dell-smbios-wmi create mode 100644 Documentation/ABI/testing/sysfs-platform-dell-smbios-wmi create mode 100644 include/uapi/linux/dell-smbios.h diff --git a/Documentation/ABI/testing/dell-smbios-wmi b/Documentation/ABI/testing/dell-smbios-wmi new file mode 100644 index ..e067e955fcc9 --- /dev/null +++ b/Documentation/ABI/testing/dell-smbios-wmi @@ -0,0 +1,41 @@ +What: /dev/wmi/dell-smbios +Date: November 2017 +KernelVersion: 4.15 +Contact: "Mario Limonciello" +Description: + Perform SMBIOS calls on supported Dell machines. + through the Dell ACPI-WMI interface. + + IOCTL's and buffer formats are defined in: + + + 1) To perform a call from userspace, you'll need to first + determine the minimum size of the calling interface buffer + for your machine. + Platforms that contain larger buffers can return larger + objects from the system firmware. + Commonly this size is either 4k or 32k. + + To determine the size of the buffer, refer to: + sysfs-platform-dell-smbios-wmi + + 2) After you've determined the minimum size of the calling + interface buffer, you can allocate a structure that represents + the structure documented above. + + 3) In the 'length' object store the size of the buffer you + determined above and allocated. + + 4) In this buffer object, prepare as necessary for the SMBIOS + call you're interested in. Typically SMBIOS buffers have + "class", "select", and "input" defined to values that coincide + with the data you are interested in. + Documenting class/select/input values is outside of the scope + of this documentation. Check with the libsmbios project for + further documentation on these values. + + 6) Run the call by using ioctl() as described in the header. + + 7) The output will be returned in the buffer object. + + 8) Be sure to free up your allocated object. diff --git a/Documentation/ABI/testing/sysfs-platform-dell-smbios-wmi b/Documentation/ABI/testing/sysfs-platform-dell-smbios-wmi new file mode 100644 index ..33f0fb73f785 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-platform-dell-smbios-wmi @@ -0,0 +1,10 @@ +What: /sys/devices/platform//required_buffer_size +Date: November 2017 +KernelVersion: 4.15 +Contact: "Mario Limonciello" +Description: + A read-only description of the size of a calling + interface buffer that can be passed to Dell + firmware. + + Commonly this size is either 4k or 32k. diff --git a/MAINTAINERS b/MAINTAINERS index 2a99ee9fd883..4940f3c7481b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3986,6 +3986,7 @@ M:Mario Limonciello L: platform-driver-...@vger.kernel.org S: Maintained F: drivers/platform/x86/dell-smbios-wmi.c +F: include/uapi/linux/dell-smbios.h DELL LAPTOP DRIVER M: Matthew Garrett diff --git a/drivers/platform/x86/dell-smbios-wmi.c b/drivers/platform/x86/dell-smbios-wmi.c index e8a2eb858677..ec38fbf945d5 100644 --- a/drivers/platform/x86/dell-smbios-wmi.c +++ b/drivers/platform/x86/dell-smbios-wmi.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "dell-smbios.h" #include "dell-wmi-descriptor.h" static DEFINE_MUTEX(call_mutex); @@ -29,19 +30,9 @@ struct misc_bios_flags_structure { #define DELL_WMI_SMBIOS_GUID "A805
[PATCH v6 13/14] platform/x86: wmi: create character devices when requested by drivers
For WMI operations that are only Set or Query read or write sysfs attributes created by WMI vendor drivers make sense. For other WMI operations that are run on Method, there needs to be a way to guarantee to userspace that the results from the method call belong to the data request to the method call. Sysfs attributes don't work well in this scenario because two userspace processes may be competing at reading/writing an attribute and step on each other's data. When a WMI vendor driver declares an ioctl callback in the wmi_driver the WMI bus driver will create a character device that maps to that function. That character device will correspond to this path: /dev/wmi/$driver The WMI bus driver will interpret the IOCTL calls, test them for a valid instance and pass them on to the vendor driver to run. This creates an implicit policy that only driver per character device. If a module matches multiple GUID's, the wmi_devices will need to be all handled by the same wmi_driver if the same character device is used. The WMI vendor drivers will be responsible for managing access to this character device and proper locking on it. When a WMI vendor driver is unloaded the WMI bus driver will clean up the character device. Signed-off-by: Mario Limonciello --- MAINTAINERS| 1 + drivers/platform/x86/wmi.c | 69 ++ include/linux/wmi.h| 3 ++ include/uapi/linux/wmi.h | 19 + 4 files changed, 92 insertions(+) create mode 100644 include/uapi/linux/wmi.h diff --git a/MAINTAINERS b/MAINTAINERS index e7514b616e13..2a99ee9fd883 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -372,6 +372,7 @@ ACPI WMI DRIVER L: platform-driver-...@vger.kernel.org S: Orphan F: drivers/platform/x86/wmi.c +F: include/uapi/linux/wmi.h AD1889 ALSA SOUND DRIVER M: Thibaut Varene diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c index bcb41c1c7f52..a22b0d333c8f 100644 --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -69,6 +70,7 @@ struct wmi_block { struct wmi_device dev; struct list_head list; struct guid_block gblock; + struct miscdevice misc_dev; struct acpi_device *acpi_device; wmi_notify_handler handler; void *handler_data; @@ -765,12 +767,74 @@ static int wmi_dev_match(struct device *dev, struct device_driver *driver) return 0; } +static long wmi_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) +{ + struct wmi_driver *wdriver = NULL; + struct wmi_block *wblock = NULL; + const char *driver_name; + struct list_head *p; + int ret; + + if (_IOC_TYPE(cmd) != WMI_IOC) + return -ENOTTY; + + driver_name = filp->f_path.dentry->d_iname; + + list_for_each(p, &wmi_block_list) { + wblock = list_entry(p, struct wmi_block, list); + wdriver = container_of(wblock->dev.dev.driver, + struct wmi_driver, driver); + if (!wdriver) + continue; + if (strcmp(driver_name, wdriver->driver.name) == 0) + break; + } + + if (!wdriver) + return -ENODEV; + + /* make sure we're not calling a higher instance than exists*/ + if (_IOC_NR(cmd) > wblock->gblock.instance_count - 1) + return -EINVAL; + + if (!try_module_get(wdriver->driver.owner)) + return -EBUSY; + + ret = wdriver->unlocked_ioctl(&wblock->dev, cmd, arg); + module_put(wdriver->driver.owner); + + return ret; +} + +static const struct file_operations wmi_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = wmi_ioctl, +}; + static int wmi_dev_probe(struct device *dev) { struct wmi_block *wblock = dev_to_wblock(dev); struct wmi_driver *wdriver = container_of(dev->driver, struct wmi_driver, driver); int ret = 0; + char *buf; + + /* driver wants a character device made */ + if (wdriver->unlocked_ioctl) { + buf = kmalloc(strlen(wdriver->driver.name) + 4, GFP_KERNEL); + if (!buf) + return -ENOMEM; + sprintf(buf, "wmi/%s", wdriver->driver.name); + wblock->misc_dev.minor = MISC_DYNAMIC_MINOR; + wblock->misc_dev.name = buf; + wblock->misc_dev.fops = &wmi_fops; + ret = misc_register(&wblock->misc_dev); + if (ret) { + dev_warn(dev, "failed to register char dev: %d", ret); + kfree(buf); + return -ENOMEM; + } + } if (ACPI_FAILURE(wmi_method_enable(wblock, 1))) dev_warn(dev, "failed to enable device --
[PATCH v6 09/14] platform/x86: dell-smbios: Introduce dispatcher for SMM calls
This splits up the dell-smbios driver into two drivers: * dell-smbios * dell-smbios-smm dell-smbios can operate with multiple different dispatcher drivers to perform SMBIOS operations. Also modify the interface that dell-laptop and dell-wmi use align to this model more closely. Rather than a single global buffer being allocated for all drivers, each driver will allocate and be responsible for it's own buffer. The pointer will be passed to the calling function and each dispatcher driver will then internally copy it to the proper location to perform it's call. Signed-off-by: Mario Limonciello --- MAINTAINERS| 6 + drivers/platform/x86/Kconfig | 15 +- drivers/platform/x86/Makefile | 1 + drivers/platform/x86/dell-laptop.c | 255 - drivers/platform/x86/dell-smbios-smm.c | 136 ++ drivers/platform/x86/dell-smbios.c | 113 +-- drivers/platform/x86/dell-smbios.h | 13 +- drivers/platform/x86/dell-wmi.c| 11 +- 8 files changed, 329 insertions(+), 221 deletions(-) create mode 100644 drivers/platform/x86/dell-smbios-smm.c diff --git a/MAINTAINERS b/MAINTAINERS index 2e3f2aea0370..8faf08ebcfee 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3974,6 +3974,12 @@ L: platform-driver-...@vger.kernel.org S: Maintained F: drivers/platform/x86/dell-smbios.* +DELL SMBIOS SMM DRIVER +M: Mario Limonciello +L: platform-driver-...@vger.kernel.org +S: Maintained +F: drivers/platform/x86/dell-smbios-smm.c + DELL LAPTOP DRIVER M: Matthew Garrett M: Pali Rohár diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 7722923c968c..53a2de781de6 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -93,12 +93,19 @@ config ASUS_LAPTOP config DELL_SMBIOS tristate - select DCDBAS + +config DELL_SMBIOS_SMM + tristate "Dell SMBIOS calling interface (SMM implementation)" + depends on DCDBAS + default DCDBAS + select DELL_SMBIOS ---help--- - This module provides common functions for kernel modules using - Dell SMBIOS. + This provides an implementation for the Dell SMBIOS calling interface + communicated over SMI/SMM. - If you have a Dell laptop, say Y or M here. + If you have a Dell computer from <=2017 you should say Y or M here. + If you aren't sure and this module doesn't work for your computer + it just won't load. config DELL_LAPTOP tristate "Dell Laptop Extras" diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 8636f5d3424f..e743615241f8 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_MSI_LAPTOP) += msi-laptop.o obj-$(CONFIG_ACPI_CMPC)+= classmate-laptop.o obj-$(CONFIG_COMPAL_LAPTOP)+= compal-laptop.o obj-$(CONFIG_DELL_SMBIOS) += dell-smbios.o +obj-$(CONFIG_DELL_SMBIOS_SMM) += dell-smbios-smm.o obj-$(CONFIG_DELL_LAPTOP) += dell-laptop.o obj-$(CONFIG_DELL_WMI) += dell-wmi.o obj-$(CONFIG_DELL_WMI_DESCRIPTOR) += dell-wmi-descriptor.o diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c index f42159fd2031..2f65f5c15d53 100644 --- a/drivers/platform/x86/dell-laptop.c +++ b/drivers/platform/x86/dell-laptop.c @@ -85,6 +85,7 @@ static struct platform_driver platform_driver = { } }; +static struct calling_interface_buffer *buffer; static struct platform_device *platform_device; static struct backlight_device *dell_backlight_device; static struct rfkill *wifi_rfkill; @@ -283,6 +284,23 @@ static const struct dmi_system_id dell_quirks[] __initconst = { { } }; +int dell_send_request(u16 class, u16 select, u32 arg0, u32 arg1, u32 arg2, + u32 arg3) +{ + int ret; + + buffer->class = class; + buffer->select = select; + buffer->input[0] = arg0; + buffer->input[1] = arg1; + buffer->input[2] = arg2; + buffer->input[3] = arg3; + ret = dell_smbios_call(buffer); + if (ret != 0) + return ret; + return dell_smbios_error(buffer->output[0]); +} + /* * Derived from information in smbios-wireless-ctl: * @@ -405,7 +423,6 @@ static const struct dmi_system_id dell_quirks[] __initconst = { static int dell_rfkill_set(void *data, bool blocked) { - struct calling_interface_buffer *buffer; int disable = blocked ? 1 : 0; unsigned long radio = (unsigned long)data; int hwswitch_bit = (unsigned long)data - 1; @@ -413,20 +430,14 @@ static int dell_rfkill_set(void *data, bool blocked) int status; int ret; - buffer = dell_smbios_get_buffer(); - - dell_smbios_send_request(17, 11); - ret = buffer->output[0]; + ret = dell_send_request(17, 11, 0, 0, 0, 0); + if (
[PATCH v3 2/2] KVM: VMX: Don't expose unrestricted_guest is enabled if ept is disabled
From: Wanpeng Li SDM mentioned: "If either the “unrestricted guest” VM-execution control or the “mode-based execute control for EPT” VM- execution control is 1, the “enable EPT” VM-execution control must also be 1." However, we can still observe unrestricted_guest is Y after inserting the kvm-intel.ko w/ ept=N. It depends on later starts a guest in order that the function vmx_compute_secondary_exec_control() can be executed, then both the module parameter and exec control fields will be amended. This patch fixes it by amending module parameter immediately during vmcs data setup. Reviewed-by: Jim Mattson Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Jim Mattson Signed-off-by: Wanpeng Li --- v1 -> v2: * reduce to just "enable_ept = 0" arch/x86/kvm/vmx.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ab0f16e..dedff59 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6730,16 +6730,13 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_ept() || !cpu_has_vmx_ept_4levels() || !cpu_has_vmx_ept_mt_wb() || - !cpu_has_vmx_invept_global()) { + !cpu_has_vmx_invept_global()) enable_ept = 0; - enable_unrestricted_guest = 0; - enable_ept_ad_bits = 0; - } if (!cpu_has_vmx_ept_ad_bits() || !enable_ept) enable_ept_ad_bits = 0; - if (!cpu_has_vmx_unrestricted_guest()) + if (!cpu_has_vmx_unrestricted_guest() || !enable_ept) enable_unrestricted_guest = 0; if (!cpu_has_vmx_flexpriority()) -- 2.7.4
[PATCH v3 1/2] KVM: X86: Processor States following Reset or INIT
From: Wanpeng Li - XCR0 is reset to 1 by RESET but not INIT - XSS is zeroed by both RESET and INIT - BNDCFGU, BND0-BND3, BNDCFGS are zeroed by both RESET and INIT This patch does this according to SDM. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Jim Mattson Signed-off-by: Wanpeng Li --- v2 -> v3: * fix null pointer deference * fix patch description v1 -> v2: * XCR0 is not zeroed by INIT * XSS, BNDCFGU, BND0-BND3, BNDCFGS are zeroed by both RESET and INIT arch/x86/kvm/vmx.c | 2 ++ arch/x86/kvm/x86.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 244e366..ab0f16e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5581,6 +5581,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0); vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0); + if (kvm_mpx_supported()) + vmcs_write64(GUEST_BNDCFGS, 0); setup_msrs(vmx); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b0d2915..b1c1755 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7804,18 +7804,33 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_async_pf_hash_reset(vcpu); vcpu->arch.apf.halted = false; + if (kvm_mpx_supported()) { + void *mpx_state_buffer; + + mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave, XFEATURE_MASK_BNDREGS); + if (mpx_state_buffer) + memset(mpx_state_buffer, 0, sizeof(struct mpx_bndreg_state)); + mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave, XFEATURE_MASK_BNDCSR); + if (mpx_state_buffer) + memset(mpx_state_buffer, 0, sizeof(u64)); + } + if (!init_event) { kvm_pmu_reset(vcpu); vcpu->arch.smbase = 0x3; vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT; vcpu->arch.msr_misc_features_enables = 0; + + vcpu->arch.xcr0 = XFEATURE_MASK_FP; } memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); vcpu->arch.regs_avail = ~0; vcpu->arch.regs_dirty = ~0; + vcpu->arch.ia32_xss = 0; + kvm_x86_ops->vcpu_reset(vcpu, init_event); } -- 2.7.4
Re: [PATCH] bitfield: Use __ffs64(x) to fix missing __ffsdi2()
On Mon, 9 Oct 2017 10:40:49 +0200, Geert Uytterhoeven wrote: > On most architectures[*], gcc turns __builtin_ffsll() into a call to > __ffsdi2(), which is not provided by any architecture, leading to > failures like: > > rcar-gen3-cpg.c:(.text+0x289): undefined reference to `__ffsdi2' > > To fix this, use __ffs64() instead, which is available on all > architectures. > > [*] Known exceptions are some 64-bit architectures like amd64, arm64, > ia64, powerpc64, and tilegx. > > Reported-by: kbuild test robot > Fixes: 3e9b3112ec74f192 ("add basic register-field manipulation macros") > Signed-off-by: Geert Uytterhoeven > --- > include/linux/bitfield.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h > index 8b9d6fff002db113..0a827677372756fa 100644 > --- a/include/linux/bitfield.h > +++ b/include/linux/bitfield.h > @@ -15,6 +15,7 @@ > #ifndef _LINUX_BITFIELD_H > #define _LINUX_BITFIELD_H > > +#include > #include > > /* > @@ -46,7 +47,7 @@ > * reg |= FIELD_PREP(REG_FIELD_C, c); > */ > > -#define __bf_shf(x) (__builtin_ffsll(x) - 1) > +#define __bf_shf(x) __ffs64(x) Hm. The build bot failure made me think. I think rcar-gen3-cpg.c may be doing something wrong here, could you point me at the patch in question? I don't see any FIELD_* there in Linus's tree. __bf_shf() is supposed to be used with constant masks only, therefore the call must be optimized away completely. > #define __BF_FIELD_CHECK(_mask, _reg, _val, _pfx)\ > ({ \
[PATCH v6 11/14] platform/x86: dell-smbios-wmi: Add new WMI dispatcher driver
The dell-smbios stack only currently uses an SMI interface which grants direct access to physical memory to the firmware SMM methods via a pointer. This dispatcher driver adds a WMI-ACPI interface that is detected by WMI probe and preferred over the SMI interface in dell-smbios. Changing this to operate over WMI-ACPI will use an ACPI OperationRegion for a buffer of data storage when SMM calls are performed. This is a safer approach to use in kernel drivers as the SMM will only have access to that OperationRegion. Signed-off-by: Mario Limonciello --- MAINTAINERS| 6 + drivers/platform/x86/Kconfig | 14 ++ drivers/platform/x86/Makefile | 1 + drivers/platform/x86/dell-smbios-wmi.c | 229 + 4 files changed, 250 insertions(+) create mode 100644 drivers/platform/x86/dell-smbios-wmi.c diff --git a/MAINTAINERS b/MAINTAINERS index 8faf08ebcfee..e7514b616e13 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3980,6 +3980,12 @@ L: platform-driver-...@vger.kernel.org S: Maintained F: drivers/platform/x86/dell-smbios-smm.c +DELL SMBIOS WMI DRIVER +M: Mario Limonciello +L: platform-driver-...@vger.kernel.org +S: Maintained +F: drivers/platform/x86/dell-smbios-wmi.c + DELL LAPTOP DRIVER M: Matthew Garrett M: Pali Rohár diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 53a2de781de6..83aee9068c64 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -94,6 +94,20 @@ config ASUS_LAPTOP config DELL_SMBIOS tristate +config DELL_SMBIOS_WMI + tristate "Dell SMBIOS calling interface (WMI implementation)" + depends on ACPI_WMI + select DELL_WMI_DESCRIPTOR + default ACPI_WMI + select DELL_SMBIOS + ---help--- + This provides an implementation for the Dell SMBIOS calling interface + communicated over ACPI-WMI. + + If you have a Dell computer from >2007 you should say Y or M here. + If you aren't sure and this module doesn't work for your computer + it just won't load. + config DELL_SMBIOS_SMM tristate "Dell SMBIOS calling interface (SMM implementation)" depends on DCDBAS diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index e743615241f8..1c4234861de0 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_MSI_LAPTOP) += msi-laptop.o obj-$(CONFIG_ACPI_CMPC)+= classmate-laptop.o obj-$(CONFIG_COMPAL_LAPTOP)+= compal-laptop.o obj-$(CONFIG_DELL_SMBIOS) += dell-smbios.o +obj-$(CONFIG_DELL_SMBIOS_WMI) += dell-smbios-wmi.o obj-$(CONFIG_DELL_SMBIOS_SMM) += dell-smbios-smm.o obj-$(CONFIG_DELL_LAPTOP) += dell-laptop.o obj-$(CONFIG_DELL_WMI) += dell-wmi.o diff --git a/drivers/platform/x86/dell-smbios-wmi.c b/drivers/platform/x86/dell-smbios-wmi.c new file mode 100644 index ..e8a2eb858677 --- /dev/null +++ b/drivers/platform/x86/dell-smbios-wmi.c @@ -0,0 +1,229 @@ +/* + * WMI methods for use with dell-smbios + * + * Copyright (c) 2017 Dell Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include "dell-smbios.h" +#include "dell-wmi-descriptor.h" +static DEFINE_MUTEX(call_mutex); +static DEFINE_MUTEX(list_mutex); +static int wmi_supported; + +struct misc_bios_flags_structure { + struct dmi_header header; + u16 flags0; +} __packed; +#define FLAG_HAS_ACPI_WMI 0x02 + +#define DELL_WMI_SMBIOS_GUID "A80593CE-A997-11DA-B012-B622A1EF5492" + +struct wmi_extensions { + __u32 argattrib; + __u32 blength; + __u8 data[]; +} __packed; + +struct wmi_smbios_buffer { + struct calling_interface_buffer std; + struct wmi_extensions ext; +} __packed; + +struct wmi_smbios_priv { + struct wmi_smbios_buffer *buf; + struct list_head list; + struct wmi_device *wdev; + struct device *child; + u32 req_buf_size; +}; +static LIST_HEAD(wmi_list); + +static inline struct wmi_smbios_priv *get_first_smbios_priv(void) +{ + return list_first_entry_or_null(&wmi_list, + struct wmi_smbios_priv, + list); +} + +static int run_smbios_call(struct wmi_device *wdev) +{ + struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL}; + struct wmi_smbios_priv *priv; + struct acpi_buffer input; + union acpi_object *obj; + acpi_status status; + + priv = dev_get_drvdata(&wdev->dev); + input.length = priv->req_buf_size; + input.pointer = &priv->buf->std; + + dev_dbg(&wdev->dev, "evaluating: %x/%x [%x,%x,%x,
Re: [PATCH v2] Userfaultfd: Add description for UFFD_FEATURE_SIGBUS
On Mon, Oct 09, 2017 at 03:45:51PM -0700, Prakash Sangappa wrote: > Userfaultfd feature UFFD_FEATURE_SIGBUS was merged recently and should > be available in Linux 4.14 release. This patch is for the manpage > changes documenting this API. > > Documents the following commit: > > commit 2d6d6f5a09a96cc1fec7ed992b825e05f64cb50e > Author: Prakash Sangappa > Date: Wed Sep 6 16:23:39 2017 -0700 > > mm: userfaultfd: add feature to request for a signal delivery > > Signed-off-by: Prakash Sangappa Reviewed-by: Andrea Arcangeli
[PATCH v6 08/14] platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens
Currently userspace tools can access system tokens via the dcdbas kernel module and a SMI call that will cause the platform to execute SMM code. With a goal in mind of deprecating the dcdbas kernel module a different method for accessing these tokens from userspace needs to be created. This is intentionally marked to only be readable as root as it can contain sensitive information about the platform's configuration. While adding this interface I found that some tokens are duplicated. These need to be ignored from sysfs to avoid duplicate files. MAINTAINERS was missing for this driver. Add myself and Pali to maintainers list for it. Signed-off-by: Mario Limonciello --- .../ABI/testing/sysfs-platform-dell-smbios | 20 ++ MAINTAINERS| 7 + drivers/platform/x86/dell-smbios.c | 205 - 3 files changed, 231 insertions(+), 1 deletion(-) create mode 100644 Documentation/ABI/testing/sysfs-platform-dell-smbios diff --git a/Documentation/ABI/testing/sysfs-platform-dell-smbios b/Documentation/ABI/testing/sysfs-platform-dell-smbios new file mode 100644 index ..6700189832f5 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-platform-dell-smbios @@ -0,0 +1,20 @@ +What: /sys/devices/platform//tokens/* +Date: November 2017 +KernelVersion: 4.15 +Contact: "Mario Limonciello" +Description: + A read-only description of Dell platform tokens + available on the machine. + + Each token attribute is available as a pair of + sysfs attributes readable by root. + + For example the token ID "5" would be available + as the following attributes: + + 0005_location + 0005_value + + Tokens will vary from machine to machine, and + only tokens available on that machine will be + displayed. diff --git a/MAINTAINERS b/MAINTAINERS index 659dbeec4191..2e3f2aea0370 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3967,6 +3967,13 @@ M: "Maciej W. Rozycki" S: Maintained F: drivers/net/fddi/defxx.* +DELL SMBIOS DRIVER +M: Pali Rohár +M: Mario Limonciello +L: platform-driver-...@vger.kernel.org +S: Maintained +F: drivers/platform/x86/dell-smbios.* + DELL LAPTOP DRIVER M: Matthew Garrett M: Pali Rohár diff --git a/drivers/platform/x86/dell-smbios.c b/drivers/platform/x86/dell-smbios.c index 7e779278d054..ade248646578 100644 --- a/drivers/platform/x86/dell-smbios.c +++ b/drivers/platform/x86/dell-smbios.c @@ -23,6 +23,7 @@ #include #include #include "../../firmware/dcdbas.h" +#include #include "dell-smbios.h" struct calling_interface_structure { @@ -39,7 +40,11 @@ static DEFINE_MUTEX(buffer_mutex); static int da_command_address; static int da_command_code; static int da_num_tokens; +static struct platform_device *platform_device; static struct calling_interface_token *da_tokens; +static struct device_attribute *token_location_attrs; +static struct device_attribute *token_value_attrs; +static struct attribute **token_attrs; int dell_smbios_error(int value) { @@ -157,6 +162,26 @@ static void __init parse_da_table(const struct dmi_header *dm) da_num_tokens += tokens; } +static void zero_duplicates(struct device *dev) +{ + int i, j; + + for (i = 0; i < da_num_tokens; i++) { + for (j = i+1; j < da_num_tokens; j++) { + if (da_tokens[i].tokenID == 0 || + da_tokens[j].tokenID == 0) + continue; + if (da_tokens[i].tokenID == da_tokens[j].tokenID) { + dev_dbg(dev, "Zeroing dup token ID %x(%x/%x)\n", + da_tokens[j].tokenID, + da_tokens[j].location, + da_tokens[j].value); + da_tokens[j].tokenID = 0; + } + } + } +} + static void __init find_tokens(const struct dmi_header *dm, void *dummy) { switch (dm->type) { @@ -170,6 +195,148 @@ static void __init find_tokens(const struct dmi_header *dm, void *dummy) } } +static int match_attribute(struct device *dev, + struct device_attribute *attr) +{ + int i; + + for (i = 0; i < da_num_tokens * 2; i++) { + if (!token_attrs[i]) + continue; + if (strcmp(token_attrs[i]->name, attr->attr.name) == 0) + return i/2; + } + dev_dbg(dev, "couldn't match: %s\n", attr->attr.name); + return -EINVAL; +} + +static ssize_t location_show(struct device *dev, +struct device_attribute *attr, char *buf) +{ + int i; + + i = match_attribute(dev, attr); + if (
[PATCH v6 12/14] platform/x86: dell-smbios-smm: test for WSMT
WSMT is as an attestation to the OS that the platform won't modify memory outside of pre-defined areas. If a platform has WSMT enabled in BIOS setup, SMM calls through dcdbas will fail. The only way to access platform data in these instances is through the WMI SMBIOS calling interface. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-smbios-smm.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/drivers/platform/x86/dell-smbios-smm.c b/drivers/platform/x86/dell-smbios-smm.c index 223531e43fea..ba315753e847 100644 --- a/drivers/platform/x86/dell-smbios-smm.c +++ b/drivers/platform/x86/dell-smbios-smm.c @@ -25,6 +25,8 @@ static struct calling_interface_buffer *buffer; struct platform_device *platform_device; static DEFINE_MUTEX(smm_mutex); +#define WSMT_EN_TOKEN 0x04EC + static const struct dmi_system_id dell_device_table[] __initconst = { { .ident = "Dell laptop", @@ -76,6 +78,30 @@ int dell_smbios_smm_call(struct calling_interface_buffer *input) return 0; } +static int test_wsmt_enabled(void) +{ + struct calling_interface_token *token; + + /* if token doesn't exist, SMM will work */ + token = dell_smbios_find_token(WSMT_EN_TOKEN); + if (!token) + return 0; + + /* if token exists, try to access over SMM */ + buffer->class = 0; + buffer->select = 0; + memset(buffer, 0, sizeof(struct calling_interface_buffer)); + buffer->input[0] = token->location; + dell_smbios_smm_call(buffer); + + /* if lookup failed, we know WSMT was enabled */ + if (buffer->output[0] != 0) + return 1; + + /* query token status if it didn't fail */ + return (buffer->output[1] == token->value); +} + static int __init dell_smbios_smm_init(void) { int ret; @@ -88,6 +114,13 @@ static int __init dell_smbios_smm_init(void) return -ENOMEM; dell_smbios_get_smm_address(&da_command_address, &da_command_code); + ret = test_wsmt_enabled(); + pr_debug("WSMT enable test: %d\n", ret); + if (ret) { + ret = -ENODEV; + goto fail_wsmt; + } + platform_device = platform_device_alloc("dell-smbios", 1); if (!platform_device) { ret = -ENOMEM; @@ -111,6 +144,7 @@ static int __init dell_smbios_smm_init(void) fail_platform_device_add: platform_device_put(platform_device); +fail_wsmt: fail_platform_device_alloc: free_page((unsigned long)buffer); return ret; -- 2.14.1
[PATCH v6 07/14] platform/x86: dell-smbios: only run if proper oem string is detected
The proper way to indicate that a system is a 'supported' Dell System is by the presence of this string in OEM strings. Allowing the driver to load on non-Dell systems will have undefined results. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-smbios.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/platform/x86/dell-smbios.c b/drivers/platform/x86/dell-smbios.c index e9b1ca07c872..7e779278d054 100644 --- a/drivers/platform/x86/dell-smbios.c +++ b/drivers/platform/x86/dell-smbios.c @@ -172,8 +172,15 @@ static void __init find_tokens(const struct dmi_header *dm, void *dummy) static int __init dell_smbios_init(void) { + const struct dmi_device *valid; int ret; + valid = dmi_find_device(DMI_DEV_TYPE_OEM_STRING, "Dell System", NULL); + if (!valid) { + pr_err("Unable to run on non-Dell system\n"); + return -ENODEV; + } + dmi_walk(find_tokens, NULL); if (!da_tokens) { -- 2.14.1
[PATCH v6 10/14] platform/x86: dell-smbios: add filtering capability for requests
There are some categories of tokens and SMBIOS calls that it makes sense to protect userspace from accessing. These are calls that may write to one time use fields or activate hardware debugging capabilities. They are not intended for general purpose use. This same functionality may be be later extended to also intercept calls that may cause kernel functionality to get out of sync if the same functions are used by other drivers. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-smbios.c | 123 + drivers/platform/x86/dell-smbios.h | 2 + 2 files changed, 125 insertions(+) diff --git a/drivers/platform/x86/dell-smbios.c b/drivers/platform/x86/dell-smbios.c index 386130811a34..3279d3f07be8 100644 --- a/drivers/platform/x86/dell-smbios.c +++ b/drivers/platform/x86/dell-smbios.c @@ -32,6 +32,7 @@ struct calling_interface_structure { struct calling_interface_token tokens[]; } __packed; +static u32 da_supported_commands; static int da_command_address; static int da_command_code; static int da_num_tokens; @@ -48,6 +49,54 @@ struct smbios_device { int (*call_fn)(struct calling_interface_buffer *); }; + +/* calls that should be blacklisted. may contain diagnostics, + * debugging information or are write once functions + */ +struct smbios_call { + int class; + int select; +}; + +static struct smbios_call call_blacklist[] = { + {01, 07}, + {06, 05}, + {11, 03}, + {11, 07}, + {11, 11}, + {19, -1}, +}; + +/* tokens corresponding to diagnostics or write once locations */ +struct token_range { + u16 exact_value; + u16 min; + u16 max; +}; + +static struct token_range token_blacklist[] = { + {0x, 0x0175, 0x0176}, + {0x, 0x0195, 0x0197}, + {0x, 0x01DC, 0x01DD}, + {0x, 0x027D, 0x0284}, + {0x02E3, 0x, 0x}, + {0x02FF, 0x, 0x}, + {0x, 0x0300, 0x0302}, + {0x, 0x0325, 0x0326}, + {0x, 0x0332, 0x0335}, + {0x0350, 0x, 0x}, + {0x0363, 0x, 0x}, + {0x0368, 0x, 0x}, + {0x, 0x03F6, 0x03F7}, + {0x, 0x049E, 0x049F}, + {0x, 0x04A0, 0x04A3}, + {0x, 0x04E6, 0x04E7}, + {0x, 0x4000, 0x7FFF}, + {0x, 0x9000, 0x9001}, + {0x, 0xA000, 0xBFFF}, + {0x, 0xEFF0, 0xEFFF}, +}; + static LIST_HEAD(smbios_device_list); void dell_smbios_get_smm_address(int *address, int *code) @@ -107,6 +156,72 @@ void dell_smbios_unregister_device(struct device *d) } EXPORT_SYMBOL_GPL(dell_smbios_unregister_device); +int dell_smbios_call_filter(struct device *d, + struct calling_interface_buffer *buffer) +{ + int i; + u16 t = 0; + + /* can't make calls over 30 */ + if (buffer->class > 30) { + dev_dbg(d, "buffer->class too big: %d\n", buffer->class); + return -EINVAL; + } + + /* supported calls on the particular system */ + if (!(da_supported_commands & (1 << buffer->class))) { + dev_dbg(d, "invalid command, supported commands: 0x%8x\n", + da_supported_commands); + return -EINVAL; + } + + /* match against call blacklist */ + for (i = 0; i < ARRAY_SIZE(call_blacklist); i++) { + if (buffer->class != call_blacklist[i].class) + continue; + if (buffer->select != call_blacklist[i].select && + call_blacklist[i].select != -1) + continue; + dev_dbg(d, "blacklisted command: %d/%d\n", + buffer->class, buffer->select); + return -EINVAL; + } + + /* if a token call, find token ID */ + if ((buffer->class == 0 && buffer->select < 3) || + (buffer->class == 1 && buffer->select < 3)) { + for (i = 0; i < da_num_tokens; i++) { + /* find the matching token ID */ + if (da_tokens[i].location != buffer->input[0]) + continue; + t = da_tokens[i].tokenID; + break; + } + /* not a token call */ + } else + return 0; + + /* token call; but token didn't exist */ + if (!t) { + dev_dbg(d, "token at location %u doesn't exist\n", + buffer->input[0]); + return -EINVAL; + } + /* match against token blacklist */ + for (i = 0; i < ARRAY_SIZE(token_blacklist); i++) { + if (token_blacklist[i].exact_value && + t == token_blacklist[i].exact_value) + return -EINVAL; + if (!token_blacklist[i].min || !token_blacklist[i].max) + continue; + if (t >= token_blacklist[i].min && t <=
[PATCH v6 06/14] platform/x86: wmi: Don't allow drivers to get each other's GUIDs
The only driver using this was dell-wmi, and it really was a hack. The driver was getting a data attribute from another driver and this type of action should not be encouraged. Rather drivers that need to interact with one another should pass data back and forth via exported functions. Signed-off-by: Mario Limonciello --- drivers/platform/x86/wmi.c | 17 - include/linux/wmi.h| 4 2 files changed, 21 deletions(-) diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c index 4d73a87c2ddf..bcb41c1c7f52 100644 --- a/drivers/platform/x86/wmi.c +++ b/drivers/platform/x86/wmi.c @@ -368,23 +368,6 @@ union acpi_object *wmidev_block_query(struct wmi_device *wdev, u8 instance) } EXPORT_SYMBOL_GPL(wmidev_block_query); -struct wmi_device *wmidev_get_other_guid(struct wmi_device *wdev, -const char *guid_string) -{ - struct wmi_block *this_wb = container_of(wdev, struct wmi_block, dev); - struct wmi_block *other_wb; - - if (!find_guid(guid_string, &other_wb)) - return NULL; - - if (other_wb->acpi_device != this_wb->acpi_device) - return NULL; - - get_device(&other_wb->dev.dev); - return &other_wb->dev; -} -EXPORT_SYMBOL_GPL(wmidev_get_other_guid); - /** * wmi_set_block - Write to a WMI block * @guid_string: 36 char string of the form fa50ff2b-f2e8-45de-83fa-65417f2f49ba diff --git a/include/linux/wmi.h b/include/linux/wmi.h index 2cd10c3b89e9..ddee427e0721 100644 --- a/include/linux/wmi.h +++ b/include/linux/wmi.h @@ -36,10 +36,6 @@ extern acpi_status wmidev_evaluate_method(struct wmi_device *wdev, extern union acpi_object *wmidev_block_query(struct wmi_device *wdev, u8 instance); -/* Gets another device on the same bus. Caller must put_device the result. */ -extern struct wmi_device *wmidev_get_other_guid(struct wmi_device *wdev, - const char *guid_string); - struct wmi_device_id { const char *guid_string; }; -- 2.14.1
[PATCH v6 05/14] platform/x86: dell-wmi-descriptor: split WMI descriptor into it's own driver
All communication on individual GUIDs should occur in separate drivers. Allowing a driver to communicate with the bus to another GUID is just a hack that discourages drivers to adopt the bus model. The information found from the WMI descriptor driver is now exported for use by other drivers. Signed-off-by: Mario Limonciello --- MAINTAINERS| 5 + drivers/platform/x86/Kconfig | 5 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/dell-wmi-descriptor.c | 162 + drivers/platform/x86/dell-wmi-descriptor.h | 18 drivers/platform/x86/dell-wmi.c| 89 ++-- 6 files changed, 198 insertions(+), 82 deletions(-) create mode 100644 drivers/platform/x86/dell-wmi-descriptor.c create mode 100644 drivers/platform/x86/dell-wmi-descriptor.h diff --git a/MAINTAINERS b/MAINTAINERS index 08b96f77f618..659dbeec4191 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4002,6 +4002,11 @@ M: Pali Rohár S: Maintained F: drivers/platform/x86/dell-wmi.c +DELL WMI DESCRIPTOR DRIVER +M: Mario Limonciello +S: Maintained +F: drivers/platform/x86/dell-wmi-descriptor.c + DELTA ST MEDIA DRIVER M: Hugues Fruchet L: linux-me...@vger.kernel.org diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 1f7959ff055c..7722923c968c 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -121,6 +121,7 @@ config DELL_WMI depends on DMI depends on INPUT depends on ACPI_VIDEO || ACPI_VIDEO = n + select DELL_WMI_DESCRIPTOR select DELL_SMBIOS select INPUT_SPARSEKMAP ---help--- @@ -129,6 +130,10 @@ config DELL_WMI To compile this driver as a module, choose M here: the module will be called dell-wmi. +config DELL_WMI_DESCRIPTOR + tristate + depends on ACPI_WMI + config DELL_WMI_AIO tristate "WMI Hotkeys for Dell All-In-One series" depends on ACPI_WMI diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 2b315d0df3b7..8636f5d3424f 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -14,6 +14,7 @@ obj-$(CONFIG_COMPAL_LAPTOP) += compal-laptop.o obj-$(CONFIG_DELL_SMBIOS) += dell-smbios.o obj-$(CONFIG_DELL_LAPTOP) += dell-laptop.o obj-$(CONFIG_DELL_WMI) += dell-wmi.o +obj-$(CONFIG_DELL_WMI_DESCRIPTOR) += dell-wmi-descriptor.o obj-$(CONFIG_DELL_WMI_AIO) += dell-wmi-aio.o obj-$(CONFIG_DELL_WMI_LED) += dell-wmi-led.o obj-$(CONFIG_DELL_SMO8800) += dell-smo8800.o diff --git a/drivers/platform/x86/dell-wmi-descriptor.c b/drivers/platform/x86/dell-wmi-descriptor.c new file mode 100644 index ..72e317cf0365 --- /dev/null +++ b/drivers/platform/x86/dell-wmi-descriptor.c @@ -0,0 +1,162 @@ +/* + * Dell WMI descriptor driver + * + * Copyright (C) 2017 Dell Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published + * by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include "dell-wmi-descriptor.h" + +#define DELL_WMI_DESCRIPTOR_GUID "8D9DDCBC-A997-11DA-B012-B622A1EF5492" + +struct descriptor_priv { + struct list_head list; + u32 interface_version; + u32 size; +}; +static LIST_HEAD(wmi_list); + +bool dell_wmi_get_interface_version(u32 *version) +{ + struct descriptor_priv *priv; + + priv = list_first_entry_or_null(&wmi_list, + struct descriptor_priv, + list); + if (!priv) + return false; + *version = priv->interface_version; + return true; +} +EXPORT_SYMBOL_GPL(dell_wmi_get_interface_version); + +bool dell_wmi_get_size(u32 *size) +{ + struct descriptor_priv *priv; + + priv = list_first_entry_or_null(&wmi_list, + struct descriptor_priv, + list); + if (!priv) + return false; + *size = priv->size; + return true; +} +EXPORT_SYMBOL_GPL(dell_wmi_get_size); + +/* + * Descriptor buffer is 128 byte long and contains: + * + * Name Offset Length Value + * Vendor Signature 0 4"DELL" + * Object Signature 4 4" WMI" + * WMI Interface Version 8 4 + * WMI buffer length12 44096 or 32768 + */ +static int dell_wmi_descriptor_probe(struct wmi_device *wdev) +{ +
[PATCH v6 03/14] platform/x86: dell-wmi: clean up wmi descriptor check
Some cases the wrong type was used for errors and checks can be done more cleanly. Signed-off-by: Mario Limonciello Reviewed-by: Edward O'Callaghan Suggested-by: Andy Shevchenko --- drivers/platform/x86/dell-wmi.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c index 2cfaaa8faf0a..ece2fe341f01 100644 --- a/drivers/platform/x86/dell-wmi.c +++ b/drivers/platform/x86/dell-wmi.c @@ -663,19 +663,19 @@ static int dell_wmi_check_descriptor_buffer(struct wmi_device *wdev) buffer = (u32 *)obj->buffer.pointer; - if (buffer[0] != 0x4C4C4544 && buffer[1] != 0x494D5720) { - dev_err(&wdev->dev, "Dell descriptor buffer has invalid signature (%*ph)\n", - 8, buffer); + if (strncmp(obj->string.pointer, "DELL WMI", 8) != 0) { + dev_err(&wdev->dev, "Dell descriptor buffer has invalid signature (%8ph)\n", + buffer); ret = -EINVAL; goto out; } if (buffer[2] != 0 && buffer[2] != 1) - dev_warn(&wdev->dev, "Dell descriptor buffer has unknown version (%d)\n", + dev_warn(&wdev->dev, "Dell descriptor buffer has unknown version (%u)\n", buffer[2]); if (buffer[3] != 4096) - dev_warn(&wdev->dev, "Dell descriptor buffer has invalid buffer length (%d)\n", + dev_warn(&wdev->dev, "Dell descriptor buffer has invalid buffer length (%u)\n", buffer[3]); priv->interface_version = buffer[2]; -- 2.14.1
[PATCH v6 04/14] platform/x86: dell-wmi: allow 32k return size in the descriptor
Some platforms this year will be adopting 32k WMI buffer, so don't complain when encountering those. Signed-off-by: Mario Limonciello --- drivers/platform/x86/dell-wmi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c index ece2fe341f01..c8c7f4f9326c 100644 --- a/drivers/platform/x86/dell-wmi.c +++ b/drivers/platform/x86/dell-wmi.c @@ -624,7 +624,7 @@ static void dell_wmi_input_destroy(struct wmi_device *wdev) * Vendor Signature 0 4"DELL" * Object Signature 4 4" WMI" * WMI Interface Version 8 4 - * WMI buffer length12 44096 + * WMI buffer length12 44096 or 32768 */ static int dell_wmi_check_descriptor_buffer(struct wmi_device *wdev) { @@ -674,7 +674,7 @@ static int dell_wmi_check_descriptor_buffer(struct wmi_device *wdev) dev_warn(&wdev->dev, "Dell descriptor buffer has unknown version (%u)\n", buffer[2]); - if (buffer[3] != 4096) + if (desc_buffer[3] != 4096 && desc_buffer[3] != 32768) dev_warn(&wdev->dev, "Dell descriptor buffer has invalid buffer length (%u)\n", buffer[3]); -- 2.14.1
Re: [PATCH v3 03/22] dt-bindings: arm: scmi: add ARM MHU specific mailbox client bindings
On Mon, Oct 9, 2017 at 9:46 AM, Jassi Brar wrote: > On Mon, Oct 9, 2017 at 7:22 PM, Rob Herring wrote: >> On Fri, Oct 6, 2017 at 9:26 PM, Jassi Brar wrote: >>> On Fri, Oct 6, 2017 at 9:24 PM, Rob Herring wrote: On Fri, Oct 6, 2017 at 6:01 AM, Jassi Brar wrote: > On Fri, Oct 6, 2017 at 4:50 AM, Rob Herring wrote: >> On Thu, Sep 28, 2017 at 02:11:27PM +0100, Sudeep Holla wrote: >>> >> >>> +- mbox-data : For each phandle listed in mboxes property, an unsigned >>> 32-bit >>> + data as expected by the mailbox controller >> >> Shouldn't that be cells as part of mboxes property? >> > A MHU client can send any number of commands (such u32 values) over a > channel. > This client (SCMI) sends just one command over a channel, but other > clients may/do send two or more. >> >> The above definition doesn't support 2 or more as it is 1-1 with channels. >> > I thought you suggested to make controller driver accept the command > as another cell in client's mboxes property. > Which we can't do. Yes, agreed. But I'm wondering since a client may need more than one, how would that be expressed? Okay, then I guess I don't understand why this is in DT. >>> Yeah the client has to provide code (u32 value) for the commands it >>> sends, and that value is going to be platform specific. For example, >>> on Juno the ITS_AN_SCMI_COMMAND may be defined as BIT(7) while on my >>> platform it may be 0x4567 >>> >>> For MHU based platforms, it becomes easy if the u32 is passed from DT. >>> And that should be ok since that is like a h/w parameter - a value >>> chosen/expected by the remote firmware. >> >> Could it ever be more than 1 cell? >> > SCMI sends sub-commands via SHMEM, so it is always going to be 1cell for > _scmi_. > However many firmwares are unlikely to use just one command over a > channel - say, the protocol is trivial or the linux and remote have no > SHMEM. I'd hope the normal case is not enumerating commands and sub-commands in DT and this is special case of a "generic" protocol with platform specific aspects. It could be solved with a specific compatible for each platform/implementation. We'll probably regret not doing that, but I'm going to pretend that this time SoC vendors won't mess it up. >> I guess being in DT is fine, but I'm still not sure about the naming. >> The current name suggests it is part of the mbox binding. Do we want >> that or should it be SCMI specific? Then "data" is vague. Perhaps >> "scmi-commands"? >> > Sure. I have no problem with whatever we wanna call it. Okay. That should have an "arm" prefix too BTW. Rob
Re: [PATCH] bitfield: Use __ffs64(x) to fix missing __ffsdi2()
Hi Geert, [auto build test ERROR on linus/master] [also build test ERROR on v4.14-rc4] [cannot apply to next-20171009] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Geert-Uytterhoeven/bitfield-Use-__ffs64-x-to-fix-missing-__ffsdi2/20171010-050009 config: x86_64-allyesdebian (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All error/warnings (new ones prefixed by >>): In file included from arch/x86/include/asm/current.h:4:0, from include/linux/sched.h:11, from drivers/net/wireless/intel/iwlwifi/pcie/rx.c:31: In function 'iwl_pcie_rx_mq_hw_init', inlined from 'iwl_pcie_rx_init' at drivers/net/wireless/intel/iwlwifi/pcie/rx.c:976:3: >> include/linux/compiler.h:576:38: error: call to '__compiletime_assert_857' >> declared with attribute error: BUILD_BUG_ON failed: (((0xF00) + (1ULL << >> __ffs64(0xF00))) & (((0xF00) + (1ULL << __ffs64(0xF00))) - 1)) != 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:556:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:576:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:70:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ include/linux/build_bug.h:19:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(((n) & ((n) - 1)) != 0) ^~~~ include/linux/bitfield.h:62:3: note: in expansion of macro '__BUILD_BUG_ON_NOT_POWER_OF_2' __BUILD_BUG_ON_NOT_POWER_OF_2((_mask) + \ ^ include/linux/bitfield.h:89:3: note: in expansion of macro '__BF_FIELD_CHECK' __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \ ^~~~ >> drivers/net/wireless/intel/iwlwifi/iwl-fh.h:487:34: note: in expansion of >> macro 'FIELD_PREP' #define RFH_GEN_CFG_VAL(_n, _v) FIELD_PREP(RFH_GEN_CFG_ ## _n, _v) ^~ >> drivers/net/wireless/intel/iwlwifi/pcie/rx.c:857:11: note: in expansion of >> macro 'RFH_GEN_CFG_VAL' RFH_GEN_CFG_VAL(DEFAULT_RXQ_NUM, 0) | ^~~ >> include/linux/compiler.h:576:38: error: call to '__compiletime_assert_862' >> declared with attribute error: BUILD_BUG_ON failed: 1UL << (4))) + (1ULL >> << __ffs64((1UL << (4) & 1UL << (4))) + (1ULL << __ffs64((1UL << >> (4) - 1)) != 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:556:4: note: in definition of macro '__compiletime_assert' prefix ## suffix();\ ^~ include/linux/compiler.h:576:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~ include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^~ include/linux/build_bug.h:70:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ^~~~ include/linux/build_bug.h:19:2: note: in expansion of macro 'BUILD_BUG_ON' BUILD_BUG_ON(((n) & ((n) - 1)) != 0) ^~~~ include/linux/bitfield.h:62:3: note: in expansion of macro '__BUILD_BUG_ON_NOT_POWER_OF_2' __BUILD_BUG_ON_NOT_POWER_OF_2((_mask) + \ ^ include/linux/bitfield.h:89:3: note: in expansion of macro '__BF_FIELD_CHECK' __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \ ^~~~ >> drivers/net/wireless/intel/iwlwifi/iwl-fh.h:487:34: note: in expansion of >> macro 'FIELD_PREP' #define RFH_GEN_CFG_VAL(_n, _v) FIELD_PREP(RFH_GEN_CF
Re: [PATCH] cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
From: Aleksander Morgado Date: Mon, 9 Oct 2017 14:05:12 +0200 > The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback. > This module allows switching to different USB profiles with the > 'AT+UUSBCONF' command, and provides a ECM network interface when the > 'AT+UUSBCONF=2' profile is selected. > > The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB > configuration includes a ECM network interface. > > Both these modules are controlled via AT commands through one of the > TTYs exposed. Connecting these modules may be done just by activating > the desired PDP context with 'AT+CGACT=1,' and then running DHCP > on the ECM interface. > > Signed-off-by: Aleksander Morgado Applied, thank you.
Re: linux-next: manual merge of the cgroup tree with the net-next tree
On Mon, Oct 09, 2017 at 07:38:36PM +0100, Mark Brown wrote: > Hi Tejun, > > Today's linux-next merge of the cgroup tree got a conflict in: > > kernel/cgroup/cgroup.c > > between commit: > > 324bda9e6c5ad ("bpf: multi program support for cgroup+bpf") > > from the net-next tree and commit: > > 041cd640b2f3c ("cgroup: Implement cgroup2 basic CPU usage accounting") > > from the cgroup tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > diff --cc kernel/cgroup/cgroup.c > index 00f5b358aeac,c3421ee0d230.. > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@@ -4765,8 -4785,9 +4788,11 @@@ static struct cgroup *cgroup_create(str > > return cgrp; > > +out_idr_free: > +cgroup_idr_remove(&root->cgroup_idr, cgrp->id); > + out_stat_exit: > + if (cgroup_on_dfl(parent)) > + cgroup_stat_exit(cgrp); thanks. I did the same merge conflict resolution for our combined tree.
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) wrote: > On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: > >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: > >> Fix BUG() calls to use BUG_ON(conditional) macros. > >> > >> This was found using make coccicheck M=net/core on linux next > >> tag next-2017092 > >> > >> Signed-off-by: Tim Hansen > >> --- > >> net/core/skbuff.c | 15 ++- > >> 1 file changed, 6 insertions(+), 9 deletions(-) > >> > >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 > >> --- a/net/core/skbuff.c > >> +++ b/net/core/skbuff.c > >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, > >> gfp_t gfp_mask) > >>/* Set the tail pointer and length */ > >>skb_put(n, skb->len); > >> > >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)) > >> - BUG(); > >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); > > > >I'm concerned with this change. > >1. Calling non-trivial bit of code inside the macro is a poor coding style > >(imo) > >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and > >implementation > >of BUG and BUG_ON look quite different. > > For these archs, wouldn't it then be more efficient to use BUG_ON rather than > BUG()? why more efficient? any data to prove that? I'm pointing that the change is not equivalent and this code has been around forever (pre-git days), so I see no reason to risk changing it.
Re: [PATCH 02/13] EDAC, altera: kill off ACCESS_ONCE()
On 10/09/2017 01:28 PM, Mark Rutland wrote: For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't currently harmful. However, for some features it is necessary to instrument reads and writes separately, which is not possible with ACCESS_ONCE(). This distinction is critical to correct operation. It's possible to transform the bulk of kernel code using the Coccinelle script below. However, this doesn't handle comments, leaving references to ACCESS_ONCE() instances which have been removed. As a preparatory step, this patch converts the Altera EDAC code and comments to use {READ,WRITE}_ONCE() consistently. virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) Signed-off-by: Mark Rutland Cc: Borislav Petkov Cc: Thor Thayer --- drivers/edac/altera_edac.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/edac/altera_edac.c b/drivers/edac/altera_edac.c index 346c498..11d6419 100644 --- a/drivers/edac/altera_edac.c +++ b/drivers/edac/altera_edac.c @@ -175,11 +175,11 @@ static ssize_t altr_sdr_mc_err_inject_write(struct file *file, /* * To trigger the error, we need to read the data back * (the data was written with errors above). -* The ACCESS_ONCE macros and printk are used to prevent the +* The READ_ONCE macros and printk are used to prevent the * the compiler optimizing these reads out. */ - reg = ACCESS_ONCE(ptemp[0]); - read_reg = ACCESS_ONCE(ptemp[1]); + reg = READ_ONCE(ptemp[0]); + read_reg = READ_ONCE(ptemp[1]); /* Force Read */ rmb(); @@ -618,7 +618,7 @@ static ssize_t altr_edac_device_trig(struct file *file, for (i = 0; i < (priv->trig_alloc_sz / sizeof(*ptemp)); i++) { /* Read data so we're in the correct state */ rmb(); - if (ACCESS_ONCE(ptemp[i])) + if (READ_ONCE(ptemp[i])) result = -1; /* Toggle Error bit (it is latched), leave ECC enabled */ writel(error_mask, (drvdata->base + priv->set_err_ofst)); @@ -635,7 +635,7 @@ static ssize_t altr_edac_device_trig(struct file *file, /* Read out written data. ECC error caused here */ for (i = 0; i < ALTR_TRIGGER_READ_WRD_CNT; i++) - if (ACCESS_ONCE(ptemp[i]) != i) + if (READ_ONCE(ptemp[i]) != i) edac_printk(KERN_ERR, EDAC_DEVICE, "Read doesn't match written data\n"); Tested on Cyclone5 and Arria10. Thanks! Acked-by: Thor Thayer
[GIT] Networking
1) Fix object leak on IPSEC offload failure, from Steffen Klassert. 2) Fix range checks in ipset address range addition operations, from Jozsef Kadlecsik. 3) Fix pernet ops unregistration order in ipset, from Florian Westphal. 4) Add missing netlink attribute policy for nl80211 packet pattern attrs, from Peng Xu. 5) Fix PPP device destruction race, from Guillaume Nault. 6) Write marks get lost when BPF verifier processes R1=R2 register assignments, causing incorrect liveness information and less state pruning. Fix from Alexei Starovoitov. 7) Fix blockhole routes so that they are marked dead and therefore not cached in sockets, otherwise IPSEC stops working. From Steffen Klassert. 8) Fix broadcast handling of UDP socket early demux, from Paolo Abeni. Please pull, thanks a lot! The following changes since commit 7a92616c0bac849e790283723b36c399668a1d9f: Merge tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2017-10-05 15:51:37 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to fdfbad3256918fc5736d68384331d2dbf45ccbd6: cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan (2017-10-09 16:03:32 -0700) Aleksander Morgado (1): cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan Alexei Starovoitov (1): bpf: fix liveness marking Alexey Kodanev (2): vti: fix NULL dereference in xfrm_input() gso: fix payload length when gso_size is zero Artem Savkov (2): xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock netfilter: ebtables: fix race condition in frame_filter_net_init() Arvind Yadav (1): netfilter: nf_tables: Release memory obtained by kasprintf Axel Beckert (1): doc: Fix typo "8023.ad" in bonding documentation Dan Carpenter (1): selftests/net: rxtimestamp: Fix an off by one David S. Miller (4): Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec Merge tag 'mac80211-for-davem-2017-10-09' of git://git.kernel.org/.../jberg/mac80211 Merge branch '10GbE' of git://git.kernel.org/.../jkirsher/net-queue Merge git://git.kernel.org/.../pablo/nf Ding Tianhong (2): Revert commit 1a8b6d76dc5b ("net:add one common config...") net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Eric Dumazet (1): netfilter: x_tables: avoid stack-out-of-bounds read in xt_copy_counters_from_user Florian Westphal (1): netfilter: ipset: pernet ops must be unregistered last Guillaume Nault (1): ppp: fix race in ppp device destruction Gustavo A. R. Silva (1): net: thunderx: mark expected switch fall-throughs in nicvf_main() Ido Schimmel (1): mlxsw: spectrum_router: Avoid expensive lookup during route removal Jason A. Donenfeld (1): netlink: do not set cb_running if dump's start() errs JingPiao Chen (1): netfilter: nf_tables: fix update chain error John Fastabend (1): ixgbe: incorrect XDP ring accounting in ethtool tx_frame param Jon Maloy (2): tipc: correct initialization of skb list tipc: Unclone message at secondary destination lookup Jozsef Kadlecsik (1): netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses Lin Zhang (1): netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook Mark D Rustad (1): ixgbe: Return error when getting PHY address if PHY access is not supported Matteo Croce (1): ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real Pablo Neira Ayuso (1): netfilter: nf_tables: do not dump chain counters if not enabled Paolo Abeni (1): udp: fix bcast packet reception Peng Xu (1): nl80211: Define policy for packet pattern attributes Ross Lagerwall (1): netfilter: ipset: Fix race between dump and swap Sabrina Dubroca (1): ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register Shmulik Ladkani (1): netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Steffen Klassert (4): xfrm: Fix deletion of offloaded SAs on failure. xfrm: Fix negative device refcount on offload failure. ipv6: Fix traffic triggered IPsec connections. ipv4: Fix traffic triggered IPsec connections. Subash Abhinov Kasiviswanathan (1): netfilter: xt_socket: Restore mark from full sockets only Vadim Fedorenko (1): netfilter: ipvs: full-functionality option for ECN encapsulation in tunnel Documentation/networking/bonding.txt | 2 +- arch/Kconfig | 3 --- arch/sparc/Kconfig| 1 - drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c| 22 -- drivers/net/et
Re: [PATCH tip/core/rcu 1/9] rcu: Provide GP ordering in face of migrations and delays
On Mon, Oct 09, 2017 at 10:16:37AM +0200, Peter Zijlstra wrote: > On Sat, Oct 07, 2017 at 11:28:57AM -0700, Paul E. McKenney wrote: > > But if you are saying that it would be good to have wait_for_completion() > > and complete() directly modeled at some point, no argument. In addition, > > I hope that the memory model is applied to other tools that analyze kernel > > code. > > > > I'm not sure I got the point across; so I'll try once more. Without > > > providing this ordering the completion would be fundamentally broken. It > > > _must_ provide this ordering. > > > > OK, I now understand what you are getting at, and I do very much like > > that guarantee. > > Right, so maybe we should update the completion comments a bit to call > out this property, because I'm not sure its there. That would be very good! > Also, with this, I think the smp_store_release() / smp_load_acquire() is > a perfectly fine abstraction of it, I don't think the model needs to be > taught about the completion interface. Agreed, if we pull too much stuff into the model it might well become difficult to maintain in and of itself. So we do need to choose carefully based on model performance and ease of use. For a performance example, note that locking can be easily modeled with xchg_acquire() for lock acquisition and smp_store_release() for lock release. But here is the resulting performance compared to directly modeling locking within the cat code: xchg_acquire() and # Processes smp_store_release() Direct modeling 2 0.061 s 0.007 s 3 3.542 s 0.039 s 4 569.189 s 0.397 s 5 > 113,853.000 s 5.187 s Combinatorial explosion is a real pain... The problem is that the xchg_acquire() / smp_store_release() approach requires the herd7 tool to consider and then reject all the scenarios where the lock critical sections overlap. In contrast, when direct modeling, those overlapping critical-section scenarios are implicitly rejected by the cat-code that models locking. Don't get me wrong, there is still combinatorial explosion, as you can see from the rightmost column of numbers. After all, the herd7 tool still needs to consider all possible lock-acquisition orders. But the explosion is much less explosive. ;-) Given that we expect locking to be very heavily used, it makes a lot of sense to directly model it, which we (mostly Alan, Andrea, and Luc) have done. Direct modeling of RCU gets similar speedups over emulation, plus the fact that accurate emulation of RCU requires some surprisingly unnatural acts. ("Ghost" accesses unaffected by normal memory-ordering primitives, and "ghost-buster memory barriers" that affect both "ghost" and normal accesses. Plus the transformation is complex, so much so that I broke down and wrote a script, which was used to validate the early versions of the RCU model.) In constrast, I have no reason to believe that direct modeling could make wait_for_completion() / complete() go any faster than the equivalent setup using smp_store_release() / smp_load_acquire(), that is, I don't see anything that could be optimized away byt cat-code. So the only reason to directly model wait_for_completion() / complete() would be to get rid of the need to add the extra term to the "exists" clause. And agreed, we should wait until people are actually being significantly inconvenienced by this, which I do not expect any time soon. > > > Why not? In what sort of cases does it go wobbly? > > > > For one, when it conflicts with maintainability. For example, it would > > probably be OK for some of RCU's rcu_node ->lock acquisitions to skip the > > smp_mb__after_unlock_lock() invocations. But those are slowpaths, and the > > small speedup on only one architecture is just not worth the added pain. > > Especially given the nice wrapper functions that you provided. > > > > But of course if this were instead (say) rcu_read_lock() or common-case > > rcu_read_unlock(), I would be willing to undergo much more pain. On the > > other hand, for that exact reason, that common-case code path doesn't > > acquire locks in the first place. ;-) > > Ah, so for models I would go absolutely minimal; it helps understand > what the strict requirements are and where we over-provide etc.. Agreed, we should be careful how much we promise lest we get locked into needlessly heavyweight implementations. > For actual code you're entirely right, there's no point in trying to be > cute with the rcu-node locks. Simplicity rules etc.. Hear, hear! ;-) Thanx, Paul
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote: >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) wrote: >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: >> >> Fix BUG() calls to use BUG_ON(conditional) macros. >> >> >> >> This was found using make coccicheck M=net/core on linux next >> >> tag next-2017092 >> >> >> >> Signed-off-by: Tim Hansen >> >> --- >> >> net/core/skbuff.c | 15 ++- >> >> 1 file changed, 6 insertions(+), 9 deletions(-) >> >> >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 >> >> --- a/net/core/skbuff.c >> >> +++ b/net/core/skbuff.c >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, >> >> gfp_t gfp_mask) >> >> /* Set the tail pointer and length */ >> >> skb_put(n, skb->len); >> >> >> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)) >> >> - BUG(); >> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len)); >> > >> >I'm concerned with this change. >> >1. Calling non-trivial bit of code inside the macro is a poor coding style >> >(imo) >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and >> >implementation >> >of BUG and BUG_ON look quite different. >> >> For these archs, wouldn't it then be more efficient to use BUG_ON rather >> than BUG()? > >why more efficient? any data to prove that? Just guessing. Either way, is there a particular reason for not using BUG_ON() here besides that it's implementation is "quite different"? >I'm pointing that the change is not equivalent and >this code has been around forever (pre-git days), so I see >no reason to risk changing it. Do you know that BUG_ON() is broken on any archs? If not, "this code has been around forever" is really not an excuse to not touch code. If BUG_ON() behavior is broken somewhere, then it needs to get fixed. -- Thanks, Sasha
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
From: Alexei Starovoitov Date: Mon, 9 Oct 2017 16:06:20 -0700 >> For these archs, wouldn't it then be more efficient to use BUG_ON >> rather than BUG()? > > why more efficient? any data to prove that? It can completely eliminate a branch. For example on powerpc if you use BUG() then the code generated is: test condition branch_not_true 1f unconditional_trap 1: Whereas with BUG_ON() it's just: test condition trap_if_true Which is a lot better even when the branches in the first case are well predicted.
Re: [PATCH v4 1/2] pid: Replace pid bitmap implementation with IDR API
On Mon, 9 Oct 2017 17:13:43 -0400 Gargi Sharma wrote: > This patch replaces the current bitmap implemetation for > Process ID allocation. Functions that are no longer required, > for example, free_pidmap(), alloc_pidmap(), etc. are removed. > The rest of the functions are modified to use the IDR API. > The change was made to make the PID allocation less complex by > replacing custom code with calls to generic API. > I can't say I really understand the locking here. > > ... > > @@ -240,17 +230,11 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) >* >*/ > read_lock(&tasklist_lock); > - nr = next_pidmap(pid_ns, 1); > - while (nr > 0) { > - rcu_read_lock(); > - > - task = pid_task(find_vpid(nr), PIDTYPE_PID); > + nr = 2; > + idr_for_each_entry_continue(&pid_ns->idr, pid, nr) { > + task = pid_task(pid, PIDTYPE_PID); > if (task && !__fatal_signal_pending(task)) > send_sig_info(SIGKILL, SEND_SIG_FORCED, task); > - > - rcu_read_unlock(); > - > - nr = next_pidmap(pid_ns, nr); > } > read_unlock(&tasklist_lock); Especially here. I don't think pidmap_lock is held. Is that IDR iteration safe?
Re: linux-next: manual merge of the drivers-x86 tree with the net-next tree
On Mon, Oct 09, 2017 at 08:56:34PM +0100, Mark Brown wrote: > On Mon, Oct 09, 2017 at 10:43:01PM +0300, Mika Westerberg wrote: > > > If possible, I would rather move this chapter to be before "Networking > > over Thunderbolt cable". Reason is that it then follows NVM flashing > > chapter which is typically where you need to force power in the first > > place. > Agreed. > I guess that's something best sorted out either in the relevant trees or > during the merge window? I'm not sure how we would deal with it in the trees. Best to note this during the merge window - whichever goes in second. Test merge will identify the merge conflict, and we can include a note to Linus on the preference. -- Darren Hart VMware Open Source Technology Center
Re: [PATCH v2] net/core: Fix BUG to BUG_ON conditionals.
On Mon, Oct 09, 2017 at 11:15:40PM +, Levin, Alexander (Sasha Levin) wrote: > On Mon, Oct 09, 2017 at 04:06:20PM -0700, Alexei Starovoitov wrote: > >On Mon, Oct 09, 2017 at 08:26:34PM +, Levin, Alexander (Sasha Levin) > >wrote: > >> On Mon, Oct 09, 2017 at 10:15:42AM -0700, Alexei Starovoitov wrote: > >> >On Mon, Oct 09, 2017 at 11:37:59AM -0400, Tim Hansen wrote: > >> >> Fix BUG() calls to use BUG_ON(conditional) macros. > >> >> > >> >> This was found using make coccicheck M=net/core on linux next > >> >> tag next-2017092 > >> >> > >> >> Signed-off-by: Tim Hansen > >> >> --- > >> >> net/core/skbuff.c | 15 ++- > >> >> 1 file changed, 6 insertions(+), 9 deletions(-) > >> >> > >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >> >> index d98c2e3ce2bf..34ce4c1a0f3c 100644 > >> >> --- a/net/core/skbuff.c > >> >> +++ b/net/core/skbuff.c > >> >> @@ -1350,8 +1350,7 @@ struct sk_buff *skb_copy(const struct sk_buff > >> >> *skb, gfp_t gfp_mask) > >> >> /* Set the tail pointer and length */ > >> >> skb_put(n, skb->len); > >> >> > >> >> - if (skb_copy_bits(skb, -headerlen, n->head, headerlen + > >> >> skb->len)) > >> >> - BUG(); > >> >> + BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + > >> >> skb->len)); > >> > > >> >I'm concerned with this change. > >> >1. Calling non-trivial bit of code inside the macro is a poor coding > >> >style (imo) > >> >2. BUG_ON != BUG. Some archs like mips and ppc have HAVE_ARCH_BUG_ON and > >> >implementation > >> >of BUG and BUG_ON look quite different. > >> > >> For these archs, wouldn't it then be more efficient to use BUG_ON rather > >> than BUG()? > > > >why more efficient? any data to prove that? > > Just guessing. > > Either way, is there a particular reason for not using BUG_ON() here > besides that it's implementation is "quite different"? > > >I'm pointing that the change is not equivalent and > >this code has been around forever (pre-git days), so I see > >no reason to risk changing it. > > Do you know that BUG_ON() is broken on any archs? > > If not, "this code has been around forever" is really not an excuse to > not touch code. > > If BUG_ON() behavior is broken somewhere, then it needs to get fixed. no idea whether it's broken. My main objection is #1. imo it's a very poor coding style to put functions with side-effects into macros. Especially debug/bug/warn-like. For example llvm has DEBUG() macro and everything inside will disappear depending on compilation flags. I wouldn't be surprised if somebody for the name of security (to avoid crash on BUG_ON) will replace BUG/BUG_ON with some other implementation or nop and will have real bugs, since skb_copy_bits() is somehow not called or called in different context.
Re: [locking] 892ad5acca [ 12.849409] WARNING: bad unlock balance detected!
On Mon, Oct 9, 2017 at 8:55 AM, Fengguang Wu wrote: > > 0day kernel testing robot got the below dmesg and the first bad commit is This makes no sense. > [ 12.849409] WARNING: bad unlock balance detected! > [ 12.850157] 4.12.0-00420-g892ad5a #1 Not tainted > [ 12.850870] - > [ 12.851574] rhashtable_thra/109 is trying to release lock > (rcu_preempt_state) at: > [ 12.852701] [] rcu_read_unlock_special+0x3de/0x4a7 > [ 12.853707] but there are no more locks to release! The only RCU release is in the inlined rhashtable_lookup_fast(), but that one looks trivially and obviously correct. It literally just does rcu_read_lock(); obj = rhashtable_lookup(ht, key, params); rcu_read_unlock(); > [ 12.864927] rcu_read_unlock_special+0x3de/0x4a7 > [ 12.865660] __rcu_read_unlock+0x62/0x9f > [ 12.866262] thread_lookup_test+0x400/0x50b > [ 12.866911] threadfunc+0x84f/0x8e0 .. so I'm wondering if there's something wrong in the test harness, or some *serious* issue that causes RCU state to do the wrong thing over preemption or with unlucky interrupt timing. Because that thread_lookup_test() thing really is so trivial as to say the problem is not in the caller. Adding Paul to the cc list. Was there some known problem during the 4.13 merge window that could have caused this? The fact that it came in with the locking merge might also just mean that it's a bug in the locking balance detection, presumably interacting badly with something else. There *is* another email (from Sebastian Andrzej Siewior) that shows that same "WARNING: bad unlock balance detected", with the subject line "RCU boosting and lockdep are not playing nice", and that one blames commit cde50a67397c ("locking/rtmutex: Don't initialize lockdep when not required"). Maybe this was already fixed elsewhere, but I'm not seeing any replies to that other email either (it was send back in mid-August). Oh, and I see that the 0day robot also fingered that commit cde50a67397c in _another_ report, so it's just this one that seems to have oddly bisected to the locking merge. Presumably it's all the same thing, just in two different guises. Linus
Re: [PATCH 2/6] ARM: sun8i: r40: add USB host port nodes for R40
于 2017年10月10日 GMT+08:00 上午5:03:40, Maxime Ripard 写到: >On Sun, Oct 08, 2017 at 04:29:02AM +, Icenowy Zheng wrote: >> From: Icenowy Zheng >> >> Allwinner R40 SoC features a USB OTG port and two USB HOST ports. >> >> Add support for the host ports in the DTSI file. >> >> The OTG controller still cannot work with existing compatibles, and >needs >> more investigation. So it's not added yet. >> >> Signed-off-by: Icenowy Zheng >> --- >> arch/arm/boot/dts/sun8i-r40.dtsi | 78 > >> 1 file changed, 78 insertions(+) >> >> diff --git a/arch/arm/boot/dts/sun8i-r40.dtsi >b/arch/arm/boot/dts/sun8i-r40.dtsi >> index d5a6745409ae..f6c917cbbaac 100644 >> --- a/arch/arm/boot/dts/sun8i-r40.dtsi >> +++ b/arch/arm/boot/dts/sun8i-r40.dtsi >> @@ -173,6 +173,84 @@ >> #size-cells = <0>; >> }; >> >> +usbphy: phy@1c13400 { >> +compatible = "allwinner,sun8i-r40-usb-phy"; >> +reg = <0x01c13400 0x14>, >> + <0x01c14800 0x4>, >> + <0x01c19800 0x4>, >> + <0x01c1c800 0x4>; >> +reg-names = "phy_ctrl", >> +"pmu0", >> +"pmu1", >> +"pmu2"; >> +clocks = <&ccu CLK_USB_PHY0>, >> + <&ccu CLK_USB_PHY1>, >> + <&ccu CLK_USB_PHY2>; >> +clock-names = "usb0_phy", >> + "usb1_phy", >> + "usb2_phy"; >> +resets = <&ccu RST_USB_PHY0>, >> + <&ccu RST_USB_PHY1>, >> + <&ccu RST_USB_PHY2>; >> +reset-names = "usb0_reset", >> + "usb1_reset", >> + "usb2_reset"; >> +status = "disabled"; >> +#phy-cells = <1>; >> +}; >> + >> +ehci1: usb@1c19000 { >> +compatible = "allwinner,sun8i-r40-ehci", "generic-ehci"; >> +reg = <0x01c19000 0x100>; > >What is the actual size here? The OHCI/EHCI/PHY-PHY three parts are listed in the user manual as one MMIO zone. The size can be at most 0x400, as the OHCI is at offset 0x400. > >> +interrupts = ; >> +clocks = <&ccu CLK_BUS_OHCI1>, >> + <&ccu CLK_BUS_EHCI1>, >> + <&ccu CLK_USB_OHCI1>; >> +resets = <&ccu RST_BUS_OHCI1>, >> + <&ccu RST_BUS_EHCI1>; > >Why do you need to take the OHCI resources too? AW's strange design -- without OHCI resources taken EHCI won't work. > >Maxime
Re: [PATCH 3/6] ARM: sun8i: r40: add 5V regulator for Banana Pi M2 Ultra
于 2017年10月10日 GMT+08:00 上午5:04:07, Maxime Ripard 写到: >On Sun, Oct 08, 2017 at 04:29:03AM +, Icenowy Zheng wrote: >> On newer revisions of the Banana Pi M2 Ultra boards, the 5V power >output >> (used by HDMI, SATA and USB) is controller via a GPIO. >> >> Add the regulator node for it. >> >> Older revisions just have the 5V power output always on, and the GPIO >is >> reserved on these boards. So it won't affect the older revisions. >> >> Signed-off-by: Icenowy Zheng >> --- >> arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts >b/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts >> index 7b52608cebe6..035599d870b9 100644 >> --- a/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts >> +++ b/arch/arm/boot/dts/sun8i-r40-bananapi-m2-ultra.dts >> @@ -78,6 +78,15 @@ >> }; >> }; >> >> +reg_vcc5v0: vcc5v0 { >> +compatible = "regulator-fixed"; >> +regulator-name = "vcc5v0"; >> +regulator-min-microvolt = <500>; >> +regulator-max-microvolt = <500>; >> +gpio = <&pio 7 23 GPIO_ACTIVE_HIGH>; /* PH23 */ >> +enable-active-high; > >This is redundant with the GPIO flag I remember someone suggested me to add this property (maybe wens). It's said that the driver may not process the GPIO_ACTIVE_HIGH correctly. > >Maxime