Re: Kernel oops with bluetooth usb dongle
* Quel Qun | 2008-02-18 00:01:21 [+]:

Please send me your .config file and process list (ps uax > ps_list) after
the crash. I have a dongle with the same USB ID as yours and I can't
reproduce the crash. So it is either some .config magic or one of your
programs is accessing the dongle.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[BUG] panic after umount (bisected)
After umount of /dev/md/1 on /mnt/sec type xfs
(rw,nosuid,nodev,noatime,nodiratime) I end up with [1] on v2.6.24-rc1. It
worked fine with v2.6.23. Bisecting came to the conclusion that

|fd5d806266935179deda1502101624832eacd01f is first bad commit
|commit fd5d806266935179deda1502101624832eacd01f
|Author: Jens Axboe <[EMAIL PROTECTED]>
|Date:   Tue Oct 16 11:05:02 2007 +0200
|
|    block: convert blkdev_issue_flush() to use empty barriers
|
|    Then we can get rid of ->issue_flush_fn() and all the driver private
|    implementations of that.
|
|    Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>
|
|:040000 040000 75f5c511fd785fa4c75b955f1818c382593a22ec adf460e288740ca9c0b5ab823dfc2ca04980f3d2 M	block
|:040000 040000 a972fe9a2433a00c79fd5ead584c482cd08b4b2c 0f4534094b63fe21c5cf8a6170736abe5c518c00 M	drivers
|:040000 040000 ba2a9f681826db89ba5192a7db7139baecde786a fc68a77a41d25c283f39f712216566e9b36d00fc M	include

is the bad boy here. The last panic (with the first bad commit) is [2]
(there is almost no difference). My .config [3], lspci [4], bisect log [5].
The md1 raid is a

|Personalities : [raid1]
|md1 : active raid1 sda2[1]
|      174843328 blocks [2/1] [_U]

I hope this was useful. Now I'm going to rebuild my raid.

[1] http://download.breakpoint.cc/bug/bug_rc1.jpeg 166 KiB
[2] http://download.breakpoint.cc/bug/bug_last_commit.jpeg 217 KiB
[3] http://download.breakpoint.cc/bug/config.txt 27 KiB
[4] http://download.breakpoint.cc/bug/lspci.txt 2.3 KiB
[5] http://download.breakpoint.cc/bug/git-bisect-log.txt 2.3 KiB

Sebastian
Re: [BUG] panic after umount (bisected)
* Jens Axboe | 2007-10-26 11:32:42 [+0200]:
>On Fri, Oct 26 2007, Jens Axboe wrote:
>> >
>> > I hope this was usefull. Now, I'm going to rebuild my raid now
>> >
>> Thanks a lot, a full report on this issue. Will get this fixed up asap.

No problem, thanks for working on that :)

>
>Does this work?
>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..cf47fcb 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
>         * kmapping pages)
>         */
>        cmd->use_sg = req->nr_phys_segments;
>+       if (!cmd->use_sg)
>+               return 0;
>
>        /*
>         * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
>        if (req->bio) {
>                int ret;
>
>-               BUG_ON(!req->nr_phys_segments);
>+               BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
>                ret = scsi_init_io(cmd);
>                if (unlikely(ret))
>

Nope. I get [1] on manual umount and [2] on system reboot. This is
2.6.24-rc1 with this patch on top.

[1] http://download.breakpoint.cc/bug/bug_rc1_patch_manual.jpeg 163 KiB
[2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB

>--
>Jens Axboe

Sebastian
Re: [BUG] panic after umount (bisected)
* Jens Axboe | 2007-10-26 13:42:30 [+0200]:
>> [2] http://download.breakpoint.cc/bug/bug_rc1_patch_reboot.jpeg 171 KiB
>
>Ah, second BUG() for same issue. Try this one.

This?

>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..57fde7b 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
>         * kmapping pages)
>         */
>        cmd->use_sg = req->nr_phys_segments;
>+       if (!cmd->use_sg)
>+               return 0;
>
>        /*
>         * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
>        if (req->bio) {
>                int ret;
>
>-               BUG_ON(!req->nr_phys_segments);
>+               BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
>                ret = scsi_init_io(cmd);
>                if (unlikely(ret))
>@@ -1236,9 +1238,10 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
>        if (ret != BLKPREP_OK)
>                return ret;
>        /*
>-        * Filesystem requests must transfer data.
>+        * Filesystem requests must transfer data, unless it's an empty
>+        * barrier.
>         */
>-       BUG_ON(!req->nr_phys_segments);
>+       BUG_ON(!req->nr_phys_segments && !bio_empty_barrier(req->bio));
>
>        cmd = scsi_get_cmd_from_req(sdev, req);
>        if (unlikely(!cmd))
>

I'm afraid you did not make it to the next level. I hope you have another
man :). [1] shows the result. I double checked it, and it seems to be the
same BUG() with the second patch.

[1] http://download.breakpoint.cc/bug/bug_patch2.jpeg 134 KiB

>Jens Axboe

Sebastian
Re: [BUG] panic after umount (bisected)
* Jens Axboe | 2007-10-27 13:39:16 [+0200]:
>> [1] http://download.breakpoint.cc/bug/bug_patch2.jpeg 134 KiB
>
>OK, can you see what this produces?
>
>diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>index 61fdaf0..4042269 100644
>--- a/drivers/scsi/scsi_lib.c
>+++ b/drivers/scsi/scsi_lib.c
>@@ -1115,6 +1115,8 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
>         * kmapping pages)
>         */
>        cmd->use_sg = req->nr_phys_segments;
>+       if (!cmd->use_sg)
>+               return 0;
>
>        /*
>         * If sg table allocation fails, requeue request later.
>@@ -1191,7 +1193,7 @@ int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
>        if (req->bio) {
>                int ret;
>
>-               BUG_ON(!req->nr_phys_segments);
>+               BUG_ON(!req->nr_phys_segments && req->bio->bi_size);
>
>                ret = scsi_init_io(cmd);
>                if (unlikely(ret))
>@@ -1236,9 +1238,11 @@ int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
>        if (ret != BLKPREP_OK)
>                return ret;
>        /*
>-        * Filesystem requests must transfer data.
>+        * Filesystem requests must transfer data, unless it's an empty
>+        * barrier.
>         */
>-       BUG_ON(!req->nr_phys_segments);
>+       if (!req->nr_phys_segments && !bio_empty_barrier(req->bio))
>+               blk_dump_rq_flags(req, "scsi");
>
>        cmd = scsi_get_cmd_from_req(sdev, req);
>        if (unlikely(!cmd))

Good, [1] has the dmesg output after umount.

[1] http://download.breakpoint.cc/bug/bug_patch3.jpeg 36 KiB

>--
>Jens Axboe

Sebastian
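The rule the patches in this thread converge on can be modeled in plain C. The sketch below is a user-space approximation only; `struct bio`, `struct request`, the `is_barrier` field, and the function names are simplified stand-ins, not the kernel's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct bio {
        unsigned int bi_size;   /* bytes of data carried by this bio */
        int is_barrier;         /* stand-in for the barrier request flag */
};

struct request {
        unsigned short nr_phys_segments;
        struct bio *bio;
};

/* An "empty barrier" is a barrier request carrying no data: it exists
 * only to flush the drive's write cache, so it legitimately has zero
 * physical segments. */
static int bio_empty_barrier_model(const struct bio *bio)
{
        return bio && bio->is_barrier && bio->bi_size == 0;
}

/* The invariant the thread settles on: a filesystem request with no
 * physical segments is a bug *unless* it is an empty barrier. */
static int fs_request_ok(const struct request *req)
{
        if (req->nr_phys_segments)
                return 1;
        return bio_empty_barrier_model(req->bio);
}
```

The empty-barrier commit that triggered the panic started generating exactly the zero-segment requests that the old unconditional `BUG_ON(!req->nr_phys_segments)` rejected.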
[PATCH] compat_ioctl requires CONFIG_BLOCK
Got with randconfig:

include/linux/loop.h:66: error: expected specifier-qualifier-list before 'request_queue_t'
make[1]: *** [fs/compat_ioctl.o] Error 1

Parts of compat ioctl require CONFIG_BLOCK to be set.

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
Index: b/fs/compat_ioctl.c
===================================================================
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -63,7 +63,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_BLOCK
 #include <linux/loop.h>
+#endif
 #include
 #include
@@ -3491,8 +3493,10 @@ HANDLE_IOCTL(LPSETTIMEOUT, lp_timeout_tr
 IGNORE_IOCTL(VFAT_IOCTL_READDIR_BOTH32)
 IGNORE_IOCTL(VFAT_IOCTL_READDIR_SHORT32)
+#ifdef CONFIG_BLOCK
 /* loop */
 IGNORE_IOCTL(LOOP_CLR_FD)
+#endif
 };
 
 #define IOCTL_HASHSIZE 256
-- 
[PATCH] i2o: defined but not used.
Got with randconfig:

drivers/message/i2o/exec-osm.c:539: warning: 'i2o_exec_lct_notify' defined but not used

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
Index: b/drivers/message/i2o/exec-osm.c
===================================================================
--- a/drivers/message/i2o/exec-osm.c
+++ b/drivers/message/i2o/exec-osm.c
@@ -389,9 +389,7 @@ static void i2o_exec_lct_modified(struct
        if (i2o_device_parse_lct(c) != -EAGAIN)
                change_ind = c->lct->change_ind + 1;
 
-#ifdef CONFIG_I2O_LCT_NOTIFY_ON_CHANGES
        i2o_exec_lct_notify(c, change_ind);
-#endif
 };
 
 /**
@@ -525,6 +523,7 @@ int i2o_exec_lct_get(struct i2o_controll
        return rc;
 }
 
+#ifdef CONFIG_I2O_LCT_NOTIFY_ON_CHANGES
 /**
  * i2o_exec_lct_notify - Send a asynchronus LCT NOTIFY request
  * @c: I2O controller to which the request should be send
@@ -570,6 +569,13 @@ static int i2o_exec_lct_notify(struct i2
        return 0;
 };
 
+#else
+static int i2o_exec_lct_notify(struct i2o_controller *c, u32 change_ind)
+{
+       return 0;
+}
+#endif
+
 /* Exec OSM driver struct */
 struct i2o_driver i2o_exec_driver = {
        .name = OSM_NAME,
-- 
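The pattern the patch applies — move the real function under the config `#ifdef` and provide a no-op stub under `#else`, so callers need no conditionals of their own — can be shown in a small user-space sketch. The feature macro and function names below are hypothetical stand-ins:

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_I2O_LCT_NOTIFY_ON_CHANGES.
 * With it defined you get the real function; without it, the stub.
 * Either way every caller sees a definition, so there is no
 * "defined but not used" warning and no #ifdef at the call sites. */
#define FEATURE_NOTIFY_ON_CHANGES 1

#ifdef FEATURE_NOTIFY_ON_CHANGES
static int lct_notify(unsigned int change_ind)
{
        /* real work would go here; echo the indicator to show it ran */
        return (int)change_ind;
}
#else
/* Stub: compiled when the feature is configured out. */
static int lct_notify(unsigned int change_ind)
{
        (void)change_ind;
        return 0;
}
#endif

static int on_lct_modified(unsigned int change_ind)
{
        /* the caller is now unconditional, mirroring the patched
         * i2o_exec_lct_modified() */
        return lct_notify(change_ind);
}
```

The stub costs nothing after inlining, and the call site stays readable.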
Re: [PATCH] compat_ioctl requires CONFIG_BLOCK
* Arnd Bergmann | 2007-07-21 01:08:57 [+0200]:
>On Saturday 21 July 2007, Sebastian Siewior wrote:
>>
>> Got with randconfig
>> include/linux/loop.h:66: error: expected specifier-qualifier-list before
>> 'request_queue_t'
>> make[1]: *** [fs/compat_ioctl.o] Error 1
>>
>> parts of compat ioctl require CONFIG_BLOCK to be set.
>>
>> Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
>> Index: b/fs/compat_ioctl.c
>> ===================================================================
>> --- a/fs/compat_ioctl.c
>> +++ b/fs/compat_ioctl.c
>> @@ -63,7 +63,9 @@
>>  #include
>>  #include
>>  #include
>> +#ifdef CONFIG_BLOCK
>>  #include
>> +#endif
>
>Adding #ifdef around an #include is considered bad style. Better just
>make loop.h compile without any conditionals. Does the below
>patch work for you?

Yes it does.

>       Arnd <><

Sebastian

>--- a/include/linux/loop.h
>+++ b/include/linux/loop.h
>@@ -63,7 +63,7 @@ struct loop_device {
>        struct task_struct      *lo_thread;
>        wait_queue_head_t       lo_event;
>
>-       request_queue_t         *lo_queue;
>+       struct request_queue    *lo_queue;
>        struct gendisk          *lo_disk;
>        struct list_head        lo_list;
> };
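Arnd's fix works because C only needs a type's full definition to dereference it or take its size; a pointer member needs nothing more than a forward declaration. A minimal sketch (the struct below is a hypothetical, pared-down stand-in for `struct loop_device`):

```c
#include <assert.h>
#include <stddef.h>

/* Incomplete type: declared, never defined in this translation unit.
 * This is what "struct request_queue *" relies on when no block-layer
 * header has been included. */
struct request_queue;

/* Hypothetical stand-in for the loop.h struct: holding a *pointer* to
 * the incomplete type compiles fine, so the header no longer needs the
 * request_queue_t typedef (which only exists under CONFIG_BLOCK). */
struct loop_device_model {
        struct request_queue *lo_queue; /* pointer only: OK */
        int lo_number;
};
```

Had the member been `struct request_queue lo_queue;` (by value), the compiler would need the full definition and the old error would return in another form.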
Re: [PATCH] i2o: defined but not used.
* Andrew Morton | 2007-07-21 03:31:33 [-0700]:
>On Sat, 21 Jul 2007 00:58:01 +0200 Sebastian Siewior <[EMAIL PROTECTED]> wrote:
>
>> Got with randconfig
>
>randconfig apparently generates impossible configs. Please always
>run `make oldconfig' after the randconfig, then do the test build.

Ah okay, I did not know this.

CONFIG_I2O=y
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set

is still in my .config after oldconfig.

Sebastian
[patch 2/2] x86_64: use asm() like the other atomic operations already do.
As Segher pointed out, inline asm is better than the volatile casting
all over the place. From the PowerPC patch description:

        Also use inline functions instead of macros; this actually improves
        code generation (some code becomes a little smaller, probably
        because of improved alias information -- just a few hundred bytes
        total on a default kernel build, nothing shocking).

My config with march=k8 and gcc (GCC) 4.1.2 (Gentoo 4.1.2):

   text    data     bss     dec     hex filename
4002473  385936  474440 4862849  4a3381 atomic_normal/vmlinux
4002587  385936  474440 4862963  4a33f3 atomic_inlineasm/vmlinux
4003911  385936  474440 4864287  4a391f atomic_volatile/vmlinux
4003959  385936  474440 4864335  4a394f atomic_volatile_inline/vmlinux

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>

--- a/include/asm-x86_64/atomic.h
+++ b/include/asm-x86_64/atomic.h
@@ -29,19 +29,34 @@ typedef struct { int counter; } atomic_t
 /**
  * atomic_read - read atomic variable
  * @v: pointer of type atomic_t
- * 
+ *
  * Atomically reads the value of @v.
- */ 
-#define atomic_read(v)          ((v)->counter)
+ */
+static __inline__ int atomic_read(const atomic_t *v)
+{
+       int t;
+
+       __asm__ __volatile__(
+                       "movl %1, %0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic_set - set atomic variable
  * @v: pointer of type atomic_t
  * @i: required value
- * 
+ *
  * Atomically sets the value of @v to @i.
- */ 
-#define atomic_set(v,i)         (((v)->counter) = (i))
+ */
+static __inline__ void atomic_set(atomic_t *v, int i)
+{
+       __asm__ __volatile__(
+                       "movl %1, %0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic_add - add integer to atomic variable
@@ -206,7 +221,7 @@ static __inline__ int atomic_sub_return(
 
 /* An 64bit atomic type */
 
-typedef struct { volatile long counter; } atomic64_t;
+typedef struct { long counter; } atomic64_t;
 
 #define ATOMIC64_INIT(i)       { (i) }
 
@@ -217,7 +232,16 @@ typedef struct { volatile long counter;
  * Atomically reads the value of @v.
  * Doesn't imply a read memory barrier.
  */
-#define atomic64_read(v)                ((v)->counter)
+static __inline__ long atomic64_read(const atomic64_t *v)
+{
+       long t;
+
+       __asm__ __volatile__(
+                       "movq %1, %0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic64_set - set atomic64 variable
@@ -226,7 +250,13 @@ typedef struct { volatile long counter;
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic64_set(v,i)              (((v)->counter) = (i))
+static __inline__ void atomic64_set(atomic64_t *v, long i)
+{
+       __asm__ __volatile__(
+                       "movq %1, %0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic64_add - add integer to atomic64 variable
-- 
[patch 0/2] use asm() for atomic_{read|set}
I converted i386 + x86-64. Compiled, booted and played for a while. The
description of both patches contains the file size of four kernel builds:

- "normal" is 28e8351ac22de25034e048c680014ad824323c65 as is
- "inline asm" is with this patch
- "inline volatile" is *(volatile int *)&(v)->counter as a static inline function
- "volatile" is *(volatile int *)&(v)->counter as a #define macro

I hope I don't encourage anyone to use macros over inline functions.

Sebastian
-- 
[patch 1/2] i386: use asm() like the other atomic operations already do.
As Segher pointed out, inline asm is better than the volatile casting
all over the place. From the PowerPC patch description:

        Also use inline functions instead of macros; this actually improves
        code generation (some code becomes a little smaller, probably
        because of improved alias information -- just a few hundred bytes
        total on a default kernel build, nothing shocking).

My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):

   text    data     bss     dec     hex filename
3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
3435308  249176  176128 3860612  3ae884 atomic_inlineasm/vmlinux
3436201  249176  176128 3861505  3aec01 atomic_inline_volatile/vmlinux
3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>

--- a/include/asm-i386/atomic.h
+++ b/include/asm-i386/atomic.h
@@ -22,19 +22,34 @@ typedef struct { int counter; } atomic_t
 /**
  * atomic_read - read atomic variable
  * @v: pointer of type atomic_t
- * 
+ *
  * Atomically reads the value of @v.
- */ 
-#define atomic_read(v)          ((v)->counter)
+ */
+static __inline__ int atomic_read(const atomic_t *v)
+{
+       int t;
+
+       __asm__ __volatile__(
+                       "movl %1,%0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic_set - set atomic variable
  * @v: pointer of type atomic_t
  * @i: required value
- * 
+ *
  * Atomically sets the value of @v to @i.
- */ 
-#define atomic_set(v,i)         (((v)->counter) = (i))
+ */
+static __inline__ void atomic_set(atomic_t *v, int i)
+{
+       __asm__ __volatile__(
+                       "movl %1,%0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic_add - add integer to atomic variable
-- 
Re: [patch 1/2] i386: use asm() like the other atomic operations already do.
* Andi Kleen | 2007-08-15 02:20:35 [+0200]:
>> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>>    text    data     bss     dec     hex filename
>> 3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
>> 3435308  249176  176128 3860612  3ae884 atomic_inlineasm/vmlinux
>
>What is the difference between atomic_normal and atomic_inlineasm?

atomic_normal is Linus' tree, commit 28e8351. atomic_inlineasm is with
this patch. I wrote this in 0/2 (do you want it here as well?).

>
>> /**
>>  * atomic_read - read atomic variable
>>  * @v: pointer of type atomic_t
>> - * 
>> + *
>
>Please don't change white space in patches

I fixed white space errors. SubmitChecklist:24 says I should not introduce
any new ones, so fixing existing ones sounded like the right thing to do.
No white space fixing in the future?

>
>>  * Atomically reads the value of @v.
>> - */ 
>> -#define atomic_read(v)          ((v)->counter)
>> + */
>> +static __inline__ int atomic_read(const atomic_t *v)
>> +{
>> +    int t;
>> +
>> +    __asm__ __volatile__(
>
>And don't use __*__ in new code

Okay.

>
>-Andi

Sebastian
Re: [patch 1/2] i386: use asm() like the other atomic operations already do.
* Satyam Sharma | 2007-08-15 18:32:10 [+0530]:
>On Wed, 15 Aug 2007, Herbert Xu wrote:
>
>> Andi Kleen <[EMAIL PROTECTED]> wrote:
>> >> My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):
>> >>    text    data     bss     dec     hex filename
>> >> 3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
>> >> 3435308  249176  176128 3860612  3ae884 atomic_inlineasm/vmlinux
>> >
>> > What is the difference between atomic_normal and atomic_inlineasm?
>>
>> The inline asm stops certain optimisations from occurring.
>
>Yup, the "__volatile__" after "__asm__" shouldn't be required. The
>previous definitions allowed the compiler to optimize certain atomic
>ops away, and considering we haven't seen any bugs due to the past
>behaviour anyway, changing it like this sounds unnecessary.
>
>The second behavioral change that comes about here is the force-
>constraining of v->counter to memory in the extended asm's constraint
>sets. But that's required to give "volatility"-like behaviour, which
>I suspect is the goal of this patch.

Exactly.

>Sebastian, could you look at the kernel images in detail and see why,
>how and what in the text expanded by 1158 bytes with these changes?

Wow, easy. "normal" is Linus' git tree. It has no volatile or anything,
so I suspect the compiler "optimizes" away some of the reads and sets
(which remain after my patch is applied). Anyway, I tried to find the
source of this and it is not that easy. If the text gets bigger, all
references to the data section change and I can't simply diff. Then I
searched for two "equal" .o files which differ in size. This was no
better, because gcc reorganized the whole .o file and the functions that
were at the beginning of the first file ended up at different positions
in the second one. For those who have plenty of time on their hands, I
posted the four kernels [1] and my four O= folders [2]. Once I have some
time, I will try to diff the specific functions within the .o files that
use the atomic_{read|set} operations.

>> I'm still unconvinced why we need this because nobody has
>> brought up any examples of kernel code that legitimately
>> need this.
>
>Me too, but I do find these more palatable than the variants using
>"volatile", I admit.

Yes. The inline asm variant is even smaller than the volatile ones,
which suggests better optimization by gcc (and it does exactly what
volatile was used for).

[1] http://download.breakpoint.cc/kernel_builds_vmlinux.tbz2 8.3 MiB
[2] http://download.breakpoint.cc/kernel_builds.tbz2 117 MiB

>
>Satyam

Sebastian
[patch 0/2] use asm() for atomic_{read|set} (shot 2)
Diff to last post:
- no __*__ functions used.
- no white space fixes.

I converted the i386 + x86-64 archs. The description of both patches
contains the file size of four kernel builds:

- "normal" is 28e8351ac22de25034e048c680014ad824323c65 as is
- "inline asm" is with this patch
- "inline volatile" is *(volatile int *)&(v)->counter as a static inline function
- "volatile" is *(volatile int *)&(v)->counter as a #define macro

Sebastian
-- 
[patch 1/2] i386: use asm() like the other atomic operations already do.
As Segher pointed out, inline asm is better than the volatile casting
all over the place. From the PowerPC patch description:

        Also use inline functions instead of macros; this actually improves
        code generation (some code becomes a little smaller, probably
        because of improved alias information -- just a few hundred bytes
        total on a default kernel build, nothing shocking).

My config with march=pentium-m and gcc (GCC) 4.1.2 (Gentoo 4.1.2):

   text    data     bss     dec     hex filename
3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
3435308  249176  176128 3860612  3ae884 atomic_inlineasm/vmlinux
3436201  249176  176128 3861505  3aec01 atomic_inline_volatile/vmlinux
3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Chris Snook <[EMAIL PROTECTED]>

--- a/include/asm-i386/atomic.h
+++ b/include/asm-i386/atomic.h
@@ -25,7 +25,16 @@ typedef struct { int counter; } atomic_t
  *
  * Atomically reads the value of @v.
  */
-#define atomic_read(v)          ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+       int t;
+
+       asm volatile(
+                       "movl %1,%0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic_set - set atomic variable
@@ -34,7 +43,13 @@ typedef struct { int counter; } atomic_t
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v,i)         (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+       asm volatile(
+                       "movl %1,%0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic_add - add integer to atomic variable
-- 
[patch 2/2] x86_64: use asm() like the other atomic operations already do.
As Segher pointed out, inline asm is better than the volatile casting
all over the place. From the PowerPC patch description:

        Also use inline functions instead of macros; this actually improves
        code generation (some code becomes a little smaller, probably
        because of improved alias information -- just a few hundred bytes
        total on a default kernel build, nothing shocking).

My config with march=k8 and gcc (GCC) 4.1.2 (Gentoo 4.1.2):

   text    data     bss     dec     hex filename
4002473  385936  474440 4862849  4a3381 atomic_normal/vmlinux
4002587  385936  474440 4862963  4a33f3 atomic_inlineasm/vmlinux
4003911  385936  474440 4864287  4a391f atomic_volatile/vmlinux
4003959  385936  474440 4864335  4a394f atomic_volatile_inline/vmlinux

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Chris Snook <[EMAIL PROTECTED]>

--- a/include/asm-x86_64/atomic.h
+++ b/include/asm-x86_64/atomic.h
@@ -32,7 +32,16 @@ typedef struct { int counter; } atomic_t
  *
  * Atomically reads the value of @v.
  */
-#define atomic_read(v)          ((v)->counter)
+static inline int atomic_read(const atomic_t *v)
+{
+       int t;
+
+       asm volatile(
+                       "movl %1, %0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic_set - set atomic variable
@@ -41,7 +50,13 @@ typedef struct { int counter; } atomic_t
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic_set(v,i)         (((v)->counter) = (i))
+static inline void atomic_set(atomic_t *v, int i)
+{
+       asm volatile(
+                       "movl %1, %0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic_add - add integer to atomic variable
@@ -206,7 +221,7 @@ static __inline__ int atomic_sub_return(
 
 /* An 64bit atomic type */
 
-typedef struct { volatile long counter; } atomic64_t;
+typedef struct { long counter; } atomic64_t;
 
 #define ATOMIC64_INIT(i)       { (i) }
 
@@ -217,7 +232,16 @@ typedef struct { volatile long counter;
  * Atomically reads the value of @v.
  * Doesn't imply a read memory barrier.
  */
-#define atomic64_read(v)                ((v)->counter)
+static inline long atomic64_read(const atomic64_t *v)
+{
+       long t;
+
+       asm volatile(
+                       "movq %1, %0"
+                       : "=r"(t)
+                       : "m"(v->counter));
+       return t;
+}
 
 /**
  * atomic64_set - set atomic64 variable
@@ -226,7 +250,13 @@ typedef struct { volatile long counter;
  *
  * Atomically sets the value of @v to @i.
  */
-#define atomic64_set(v,i)              (((v)->counter) = (i))
+static inline void atomic64_set(atomic64_t *v, long i)
+{
+       asm volatile(
+                       "movq %1, %0"
+                       : "=m"(v->counter)
+                       : "ir"(i));
+}
 
 /**
  * atomic64_add - add integer to atomic64 variable
-- 
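The effect these patches are after — a read/set the compiler must actually emit, without marking the whole counter field volatile — can be approximated in portable user-space C with the GCC/Clang `__atomic` builtins instead of x86 inline asm. This is a sketch of the semantics only, not the kernel's implementation; the `atomic_model_*` names are hypothetical:

```c
#include <assert.h>

/* Plain int counter, like the patched atomic_t: no volatile qualifier. */
typedef struct { int counter; } atomic_model_t;

/* A relaxed atomic load: the compiler cannot cache the value in a
 * register across calls or elide the access, which is the property the
 * kernel patch obtains from "asm volatile" with an "m" constraint. */
static inline int atomic_model_read(const atomic_model_t *v)
{
        return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

/* A relaxed atomic store, the counterpart of atomic_set(). */
static inline void atomic_model_set(atomic_model_t *v, int i)
{
        __atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
}
```

The point of confining the "must emit" property to the accessors (rather than to the struct field, as `volatile long counter` did) is that all *other* operations on the counter keep full optimization freedom, which is where the size savings in the tables above come from.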
[PATCH] [mmc] fix compile without LED Triggers
drivers/mmc/core/host.c: In function 'mmc_remove_host':
drivers/mmc/core/host.c:146: error: implicit declaration of function 'led_trigger_unregister'
drivers/mmc/core/host.c:146: error: 'struct mmc_host' has no member named 'led'

Signed-off-by: Sebastian Siewior <[EMAIL PROTECTED]>
---
 include/linux/leds.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/leds.h b/include/linux/leds.h
index dc1178f..24c4830 100644
--- a/include/linux/leds.h
+++ b/include/linux/leds.h
@@ -98,6 +98,7 @@ extern void led_trigger_event(struct led_trigger *trigger,
 #define DEFINE_LED_TRIGGER_GLOBAL(x)
 #define led_trigger_register_simple(x, y) do {} while(0)
 #define led_trigger_unregister_simple(x) do {} while(0)
+#define led_trigger_unregister(x) do {} while(0)
 #define led_trigger_event(x, y) do {} while(0)
 
 #endif
-- 
1.5.3.4
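The `do {} while(0)` form of the stub the patch adds is the standard idiom for a macro that must behave like a single statement. A user-space sketch (the config macro and function names below are hypothetical, not the leds.h API):

```c
#include <assert.h>

/* Hypothetical feature switch: leave it undefined to get the stubs,
 * analogous to building without LED trigger support. */
/* #define HAVE_LED_TRIGGERS */

#ifndef HAVE_LED_TRIGGERS
/* do {} while (0) makes the stub a single statement that still wants a
 * trailing semicolon, so it stays legal in an unbraced if/else:
 *
 *      if (cond)
 *              led_trigger_unregister_model(t);
 *      else
 *              something_else();
 *
 * A bare "{}" stub would turn the semicolon into a second statement and
 * break the else branch. */
#define led_trigger_unregister_model(x) do { } while (0)
#endif

/* Caller in the style of mmc_remove_host(): compiles and runs the same
 * whether the feature is configured in or stubbed out. */
static int remove_host_model(int have_led)
{
        if (have_led)
                led_trigger_unregister_model(have_led);
        else
                return -1;
        return 0;
}
```

That is why the missing `led_trigger_unregister` stub caused a hard build break: every other trigger call already had a no-op fallback, and this one did not.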
Re: 2.6.24-rc1-git: crash on shutdown/unmount?
* Jens Axboe | 2007-11-01 11:51:09 [+0100]:
>On Wed, Oct 31 2007, Alistair John Strachan wrote:
>> Hi Jens,
>>
>> I guessed from the oops that you might have an idea what's causing this oops
>> on shutdown/unmount. The git version (describe), a screenshot showing the
>> oops, a config, and dmesg for a booted kernel are available from:
>>
>> http://devzero.co.uk/~alistair/oops-20071031/
>>
>> I went back to -rc1 and it still happens there too. If you need any more
>> information or want me to bisect it, please let me know.
>
>Does this work for you?

Yes, it does. Thanks for working on that. Sorry for the late reply, but I
ran into -ENOINET.

Acked-by: Sebastian Siewior <[EMAIL PROTECTED]>
Re: [RFC PATCH v1 00/25] printk: new implementation
On 2019-03-13 09:19:32 [+0100], John Ogness wrote:
> recursive situation. As you are pointing out, the notification/wake
> component of printk_safe will still be needed. I will leave that (small)
> part in printk_safe.c.

Does this mean we keep the irq_work part, or do we bury it and solve it by
other means?

> John Ogness

Sebastian
Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
On 2021-01-07 11:45:39 [+0100], Peter Zijlstra wrote:
> On Thu, Jan 07, 2021 at 05:18:41PM +0800, Ran Wang wrote:
> > +
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
> > +	    !strncmp(p->comm, "ksoftirqd/", 10))
> > +		schedule_hrtimeout(&to,
> > +			HRTIMER_MODE_REL | HRTIMER_MODE_HARD);
> > +	else
> > +		schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> 
> This is horrific, why did you not self-censor and spare me the mental
> anguish of having to formulate a CoC compliant response?
> 
> It also violates coding style, but given the total lack of any sense,
> that seems like a minor detail.
> 
> Why can't we use HRTIMER_MODE_HARD unconditionally?

I had a similar patch in -RT and dropped it in v5.10-rc7-rt16. It was
added because RT couldn't boot, since creating the boot threads didn't
work before ksoftirqd was up. This was fixed by commit
   26c7295be0c5e ("kthread: Do not preempt current task if it is going
   to call schedule()")
and life was good again.

tglx (also) suggested adding HRTIMER_MODE_HARD unconditionally (it
looked at SYSTEM_STATE back then) and I was only worried about abuse via
userland. This sleep can be triggered by ptrace/strace(), and with brief
testing I can trigger the sleep there, but I don't get it anywhere near
where I would notice it with cyclictest.

Sebastian
Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
On 2021-01-08 08:45:14 [+], Ran Wang wrote:
> Hi Sebastian, Peter

Hi,

> > I had a similar patch in -RT and dropped it in v5.10-rc7-rt16.
> > It was added because RT couldn't boot since creating the boot-threads
> > didn't work before the ksoftirqd was up. This was fixed by commit
> >    26c7295be0c5e ("kthread: Do not preempt current task if it is going
> >    to call schedule()")
> 
> I tried applying above commit to linux-5.6.y-rt, it could resolve my problem
> on LX2160ARDB, THANKS!

So, in other words, all of this could have been avoided by using a
supported or maintained RT series. The v5.4 series has this patch; v5.6
isn't maintained anymore, so it is likely that there is more missing.

Sebastian
Re: [PATCH] [RFC] rt: kernel/sched/core: fix kthread_park() pending too long when CPU un-plugged
On 2021-01-08 10:32:36 [+0100], Peter Zijlstra wrote:
> It's a single task wakeup (the caller), doing that from hardirq context
> really should not be a problem, we do lots of that in RT already.

I'm not worried about that single wakeup, but about an artificial case
where you manage to accumulate multiple single wakeups in a short time
frame.

Sebastian
[PATCH] fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
From: Thomas Gleixner

Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.

Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.

Signed-off-by: Thomas Gleixner
[bigeasy: remove the wrapper and use always spinlock_t]
Signed-off-by: Sebastian Andrzej Siewior
---
 fs/buffer.c                 | 19 +++
 fs/ext4/page-io.c           |  8 +++-
 fs/ntfs/aops.c              |  9 +++--
 include/linux/buffer_head.h |  6 +++---
 4 files changed, 16 insertions(+), 26 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 131d39ec7d316..eab37fbaa439f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -275,8 +275,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
	 * decide that the page is now completely done.
	 */
	first = page_buffers(page);
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+	spin_lock_irqsave(&first->uptodate_lock, flags);
	clear_buffer_async_read(bh);
	unlock_buffer(bh);
	tmp = bh;
@@ -289,8 +288,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
		}
		tmp = tmp->b_this_page;
	} while (tmp != bh);
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&first->uptodate_lock, flags);

	/*
	 * If none of the buffers had errors and they are all
@@ -302,8 +300,7 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
	return;

still_busy:
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&first->uptodate_lock, flags);
	return;
}

@@ -331,8 +328,7 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
	}

	first = page_buffers(page);
-	local_irq_save(flags);
-	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
+	spin_lock_irqsave(&first->uptodate_lock, flags);

	clear_buffer_async_write(bh);
	unlock_buffer(bh);
@@ -344,14 +340,12 @@ void end_buffer_async_write(struct buffer_head *bh, int uptodate)
		}
		tmp = tmp->b_this_page;
	}
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&first->uptodate_lock, flags);
	end_page_writeback(page);
	return;

still_busy:
-	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&first->uptodate_lock, flags);
	return;
}
EXPORT_SYMBOL(end_buffer_async_write);
@@ -3420,6 +3414,7 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
	struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
	if (ret) {
		INIT_LIST_HEAD(&ret->b_assoc_buffers);
+		spin_lock_init(&ret->uptodate_lock);
		preempt_disable();
		__this_cpu_inc(bh_accounting.nr);
		recalc_bh_state();
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 12ceadef32c5a..7745ed23c6ad9 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -87,11 +87,10 @@ static void ext4_finish_bio(struct bio *bio)
		}
		bh = head = page_buffers(page);
		/*
-		 * We check all buffers in the page under BH_Uptodate_Lock
+		 * We check all buffers in the page under uptodate_lock
		 * to avoid races with other end io clearing async_write flags
		 */
-		local_irq_save(flags);
-		bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
+		spin_lock_irqsave(&head->uptodate_lock, flags);
		do {
			if (bh_offset(bh) < bio_start ||
			    bh_offset(bh) + bh->b_size > bio_end) {
@@ -103,8 +102,7 @@ static void ext4_finish_bio(struct bio *bio)
			if (bio->bi_status)
				buffer_io_error(bh);
		} while ((bh = bh->b_this_page) != head);
-		bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
-		local_irq_restore(flags);
+		spin_unlock_irqrestore(&head->uptodate_lock, flags);
		if (!under_io) {
			fscrypt_free_bounce_page(bounce_page);
			end_page_writeback(page);
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
index 7202a1e39d70c..14ca433b3a9e4 100644
--- a/fs/ntfs/aops.c
+++ b/fs/
Re: [patch V2 0/7] fs: Substitute bit-spinlocks for PREEMPT_RT and debugging
On 2019-08-10 01:18:34 [-0700], Christoph Hellwig wrote: > > > Does SLUB work on -rt at all? > > > > It's the only allocator we support with a few tweaks :) > > What do you do about this particular piece of code there? This part remains untouched. This "lock" is acquired within ->list_lock which is a raw_spinlock_t and disables preemption/interrupts on -RT. Sebastian
Re: [PATCH] fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
On 2019-08-20 20:01:14 [+0200], Thomas Gleixner wrote: > On Tue, 20 Aug 2019, Matthew Wilcox wrote: > > On Tue, Aug 20, 2019 at 07:08:18PM +0200, Sebastian Siewior wrote: > > > Bit spinlocks are problematic if PREEMPT_RT is enabled, because they > > > disable preemption, which is undesired for latency reasons and breaks when > > > regular spinlocks are taken within the bit_spinlock locked region because > > > regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT > > > replaces the bit spinlocks with regular spinlocks to avoid this problem. > > > Bit spinlocks are also not covered by lock debugging, e.g. lockdep. > > > > > > Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock. > > > > > > Signed-off-by: Thomas Gleixner > > > [bigeasy: remove the wrapper and use always spinlock_t] > > > > Uhh ... always grow the buffer_head, even for non-PREEMPT_RT? Why? > > Christoph requested that: > > https://lkml.kernel.org/r/20190802075612.ga20...@infradead.org What do we do about this one? > Thanks, > > tglx Sebastian
Re: [patch 09/10] sched/core: Add migrate_disable/enable()
On 2020-09-18 10:22:32 [+0200], pet...@infradead.org wrote: > > > One reason for not allowing migrate_disable() to sleep was: FPU code. > > > > > > Could it be it does something like: > > > > > > preempt_disable(); > > > spin_lock(); > > > > > > spin_unlock(); > > > preempt_enable(); > > > > > > Where we'll never get preempted while migrate_disable()'d and thus never > > > trigger any of the sleep paths? > > > > I try to get rid of something like that. This doesn't work either way > > because the spin_lock() may block which it can't with disabled > > preemption. > > Yeah, that obviously should have been migrate_disable/enable instead of > spin_lock/unlock :/ Ah. Me stupid. fpregs_lock() does preempt_disable(); local_bh_disable(); which is more or less the "official" pattern. As of today local_bh_disable() does migrate_disable() / spin_lock(). Not sure what we end up with for local_bh_disable() in the end. We used to not have a BKL here on RT but ended up in all kinds of locking problems because vanilla treats local_bh_disable() as a BKL and uses it for locking. Today we have a per-CPU spinlock_t in local_bh_disable() to emulate the BKL. But this pattern above isn't working due to the atomic part… Sebastian
Re: [patch 09/10] sched/core: Add migrate_disable/enable()
On 2020-09-17 16:24:38 [+0200], pet...@infradead.org wrote: > And if I'm not mistaken, the above migrate_enable() *does* require being > able to schedule, and our favourite piece of futex: > > raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock); > spin_unlock(q.lock_ptr); > > is broken. Consider that spin_unlock() doing migrate_enable() with a > pending sched_setaffinity(). There are two instances of the above and only in the futex code and we have sort of duct tape for that by manually balancing the migrate counter so that it does not come to this. But yes, not having to do the manual balance is a plus. Sebastian
Re: [patch 09/10] sched/core: Add migrate_disable/enable()
On 2020-09-17 16:49:37 [+0200], pet...@infradead.org wrote: > I'm aware of the duct-tape :-) But I was under the impression that we > didn't want the duct-tape, and that there was lots of issues with the > FPU code, or was that another issue? Of course it would be better not to need the duct tape. Also symmetrical locking is what you want but clearly futex is one of a kind. I'm currently not aware of any issues in the FPU code in regard to this. A few weeks ago, I was looking for this kind of usage and only futex popped up. Sebastian
Re: [patch 09/10] sched/core: Add migrate_disable/enable()
On 2020-09-17 17:54:10 [+0200], pet...@infradead.org wrote: > I'm not sure what the problem with FPU was, I was throwing alternatives > at tglx to see what would stick, in part to (re)discover the design > constraints of this thing. was this recent or distant in the time line? > One reason for not allowing migrate_disable() to sleep was: FPU code. > > Could it be it does something like: > > preempt_disable(); > spin_lock(); > > spin_unlock(); > preempt_enable(); > > Where we'll never get preempted while migrate_disable()'d and thus never > trigger any of the sleep paths? I try to get rid of something like that. This doesn't work either way because the spin_lock() may block which it can't with disabled preemption. Looking at my queue, FPU related is crypto. And here we break the loops mostly due to the construct: kernel_fpu_begin(); while (bytes) crypto_thingy(); skcipher_walk_done() and skcipher_walk_done() could allocate/free/map memory. This is independent. Ah. We used to have migrate_disable() in pagefault_disable(). The x86 FPU code does preempt_disable(); … pagefault_disable(); but that migrate_disable() was moved from pagefault_disable() to kmap_atomic(). We shouldn't have preempt_disable(); || local_irq_disable(); kmap_atomic(); on RT. I've been running around removing those. See a10dcebacdb0c ("fs/ntfs/aops.c: don't disable interrupts during kmap_atomic()") ce1e518190ea7 ("ide: don't disable interrupts during kmap_atomic()") f3a1075e5fc34 ("block: don't disable interrupts during kmap_atomic()") Sebastian
Re: [patch V2 00/24] cpu/hotplug: Convert get_online_cpus() to a percpu_rwsem
On 2017-04-25 17:10:37 [+0100], Mark Rutland wrote:
> Hi,

Hi,

> When we bring the secondary CPU online, we detect an erratum that wasn't
> present on the boot CPU, and try to enable a static branch we use to
> track the erratum. The call to static_branch_enable() blows up as above.

this (cpus_set_cap()) seems only to be used in the CPU-up path.

> I see that we now have static_branch_disable_cpuslocked(), but we don't
> have an equivalent for enable. I'm not sure what we should be doing
> here.

We should introduce static_branch_enable_cpuslocked(). Does this work for
you (after s/static_branch_enable/static_branch_enable_cpuslocked/ in
cpus_set_cap()) ?:

From fe004351e4649da037d5165247e89ab976fe4a26 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior
Date: Tue, 25 Apr 2017 18:43:00 +0200
Subject: [PATCH] jump_label: Provide static_key_[enable|slow_inc]_cpuslocked()

Provide static_key_[enable|slow_inc]_cpuslocked() variants that don't take
cpu_hotplug_lock().

Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/jump_label.h |  7 +++
 kernel/jump_label.c        | 10 ++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index d7b17d1ab875..c80d8b1279b5 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -164,6 +164,7 @@ extern void static_key_slow_dec_cpuslocked(struct static_key *key);
 extern void jump_label_apply_nops(struct module *mod);
 extern int static_key_count(struct static_key *key);
 extern void static_key_enable(struct static_key *key);
+extern void static_key_enable_cpuslocked(struct static_key *key);
 extern void static_key_disable(struct static_key *key);
 extern void static_key_disable_cpuslocked(struct static_key *key);

@@ -252,6 +253,11 @@ static inline void static_key_enable(struct static_key *key)
 	static_key_slow_inc(key);
 }

+static inline void static_key_enable_cpuslocked(struct static_key *key)
+{
+	static_key_enable(key);
+}
+
 static inline void static_key_disable(struct static_key *key)
 {
 	int count = static_key_count(key);

@@ -429,6 +435,7 @@ extern bool wrong_branch_error(void);
  */
 #define static_branch_enable(x)			static_key_enable(&(x)->key)
+#define static_branch_enable_cpuslocked(x)	static_key_enable_cpuslocked(&(x)->key)
 #define static_branch_disable(x)		static_key_disable(&(x)->key)
 #define static_branch_disable_cpuslocked(x)	static_key_disable_cpuslocked(&(x)->key)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index d71124ee3b14..6343f4c7e27f 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -90,6 +90,16 @@ void static_key_enable(struct static_key *key)
 }
 EXPORT_SYMBOL_GPL(static_key_enable);

+void static_key_enable_cpuslocked(struct static_key *key)
+{
+	int count = static_key_count(key);
+
+	WARN_ON_ONCE(count < 0 || count > 1);
+
+	if (!count)
+		static_key_slow_inc_cpuslocked(key);
+}
+
 void static_key_disable(struct static_key *key)
 {
 	int count = static_key_count(key);
--
2.11.0

> Thanks,
> Mark.

Sebastian
Re: [patch V2 00/24] cpu/hotplug: Convert get_online_cpus() to a percpu_rwsem
On 2017-04-26 11:32:36 [+0100], Mark Rutland wrote: > > So we could end up calling static_branch_enable_cpuslocked() > > without actually holding the lock. Should we do a cpu_hotplug_begin/done in > > setup_cpu_feature_capabilities ? I agree it doesn't look that nice. > > Thoughts ? > > I agree that's hideous, but it looks like the only choice given the > hotplug rwsem cahnges. :/ Would it work for you to provide a locked and an unlocked version? > I can spin a v2 with that and the typo fixes. > > Thanks, > Mark. Sebastian
[RFC PATCH] trace/perf: cure locking issue in perf_event_open() error path
As trinity figured out, there is a recursive get_online_cpus() in
perf_event_open()'s error path:

| Call Trace:
|  dump_stack+0x86/0xce
|  __lock_acquire+0x2520/0x2cd0
|  lock_acquire+0x27c/0x2f0
|  get_online_cpus+0x3d/0x80
|  static_key_slow_dec+0x5a/0x70
|  sw_perf_event_destroy+0x8e/0x100
|  _free_event+0x61b/0x800
|  free_event+0x68/0x70
|  SyS_perf_event_open+0x19db/0x1d80

In order to cure this, I am moving free_event() after the put_online_cpus()
block. Besides that one, there is also the error path in perf_event_alloc()
which also invokes event->destroy. Here I delayed the destroy work to
schedule_work().

Signed-off-by: Sebastian Andrzej Siewior
---
I am not quite happy with the schedule_work() part.

 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 52 +-
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a635887f28..d6a874dbbd21 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -718,6 +718,7 @@ struct perf_event {
 #endif

 	struct list_head		sb_list;
+	struct work_struct		destroy_work;
 #endif /* CONFIG_PERF_EVENTS */
 };

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7aed78b516fc..3358889609f8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9320,6 +9320,16 @@ static void account_event(struct perf_event *event)
 	account_pmu_sb_event(event);
 }

+static void perf_alloc_destroy_ev(struct work_struct *work)
+{
+	struct perf_event *event;
+
+	event = container_of(work, struct perf_event, destroy_work);
+	event->destroy(event);
+	module_put(event->pmu->module);
+	kfree(event);
+}
+
 /*
  * Allocate and initialize a event structure
  */
@@ -9334,6 +9344,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct pmu *pmu;
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
+	bool delay_destroy = false;
 	long err = -EINVAL;

 	if ((unsigned)cpu >= nr_cpu_ids) {
@@ -9497,15 +9508,22 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	exclusive_event_destroy(event);

 err_pmu:
-	if (event->destroy)
-		event->destroy(event);
-	module_put(pmu->module);
+	if (event->destroy) {
+		/* delay ->destroy due to nested get_online_cpus() */
+		INIT_WORK(&event->destroy_work, perf_alloc_destroy_ev);
+		delay_destroy = true;
+	} else {
+		module_put(pmu->module);
+	}
 err_ns:
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 	if (event->ns)
 		put_pid_ns(event->ns);
-	kfree(event);
+	if (delay_destroy)
+		schedule_work(&event->destroy_work);
+	else
+		kfree(event);

 	return ERR_PTR(err);
 }
@@ -9798,7 +9816,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
 {
 	struct perf_event *group_leader = NULL, *output_event = NULL;
-	struct perf_event *event, *sibling;
+	struct perf_event *event = NULL, *sibling;
 	struct perf_event_attr attr;
 	struct perf_event_context *ctx, *uninitialized_var(gctx);
 	struct file *event_file = NULL;
@@ -9908,13 +9926,14 @@ SYSCALL_DEFINE5(perf_event_open,
 				 NULL, NULL, cgroup_fd);
 	if (IS_ERR(event)) {
 		err = PTR_ERR(event);
+		event = NULL;
 		goto err_cred;
 	}

 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
-			goto err_alloc;
+			goto err_cred;
 		}
 	}

@@ -9927,7 +9946,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	if (attr.use_clockid) {
 		err = perf_event_set_clock(event, attr.clockid);
 		if (err)
-			goto err_alloc;
+			goto err_cred;
 	}

 	if (pmu->task_ctx_nr == perf_sw_context)
@@ -9962,7 +9981,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	ctx = find_get_context(pmu, task, event);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
-		goto err_alloc;
+		goto err_cred;
 	}

 	if ((pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE) && group_leader) {
@@ -10186,18 +10205,21 @@ SYSCALL_DEFINE5(perf_event_open,
 err_context:
 	perf_unpin_context(ctx);
 	put_ctx(ctx);
-err_alloc:
-	/*
-	 * If event_file is set, the fput() above will have called ->release()
-	 * and that will take care of freeing the event.
-	 */
-	if (!event_file)
-		free_event(event);
 err_cred:
 	if (task)
 		mutex_unlock(&task->signal->cred_gua
Re: [RFC PATCH] trace/perf: cure locking issue in perf_event_open() error path
With the last patch on-top I trigger this now and then:

======================================================
WARNING: possible circular locking dependency detected
4.11.0-rc8-00894-g8bd462ee4aac-dirty #84 Not tainted
------------------------------------------------------
trinity-subchil/4966 is trying to acquire lock:
 (cpu_hotplug_lock.rw_sem){++}, at: [] tp_perf_event_destroy+0xd/0x20

but task is already holding lock:
 (&ctx->mutex){+.+.+.}, at: [] perf_event_exit_task+0x2ae/0x8d0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&ctx->mutex){+.+.+.}:
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       __mutex_lock+0xef/0x1280
       mutex_lock_nested+0x16/0x20
       SyS_perf_event_open+0x11ab/0x1e80
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #1 (&sig->cred_guard_mutex){+.+.+.}:
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       __mutex_lock+0xef/0x1280
       mutex_lock_interruptible_nested+0x16/0x20
       SyS_perf_event_open+0x1bd6/0x1e80
       entry_SYSCALL_64_fastpath+0x23/0xc2

-> #0 (cpu_hotplug_lock.rw_sem){++}:
       check_prevs_add+0x544/0x18f0
       __lock_acquire+0x2534/0x2cd0
       lock_acquire+0x27c/0x2f0
       get_online_cpus+0x3d/0x80
       tp_perf_event_destroy+0xd/0x20
       _free_event+0x61b/0x800
       free_event+0x68/0x70
       perf_event_exit_task+0x816/0x8d0
       do_exit+0xc90/0x2a90
       do_group_exit+0x1aa/0x2b0
       SyS_exit_group+0x18/0x20
       entry_SYSCALL_64_fastpath+0x23/0xc2

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock.rw_sem --> &sig->cred_guard_mutex --> &ctx->mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ctx->mutex);
                               lock(&sig->cred_guard_mutex);
                               lock(&ctx->mutex);
  lock(cpu_hotplug_lock.rw_sem);

 *** DEADLOCK ***

1 lock held by trinity-subchil/4966:
 #0:  (&ctx->mutex){+.+.+.}, at: [] perf_event_exit_task+0x2ae/0x8d0

stack backtrace:
CPU: 3 PID: 4966 Comm: trinity-subchil Not tainted 4.11.0-rc8-00894-g8bd462ee4aac-dirty #84
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0x86/0xce
 print_circular_bug+0x5c3/0x620
 ? lockdep_on+0x50/0x50
 check_prevs_add+0x544/0x18f0
 ? check_irq_usage+0x150/0x150
 ? lock_acquire+0x27c/0x2f0
 __lock_acquire+0x2534/0x2cd0
 ? __lock_acquire+0x2534/0x2cd0
 ? find_held_lock+0x36/0x1c0
 lock_acquire+0x27c/0x2f0
 ? tp_perf_event_destroy+0xd/0x20
 get_online_cpus+0x3d/0x80
 ? tp_perf_event_destroy+0xd/0x20
 tp_perf_event_destroy+0xd/0x20
 _free_event+0x61b/0x800
 free_event+0x68/0x70
 perf_event_exit_task+0x816/0x8d0
 do_exit+0xc90/0x2a90
 ? mm_update_next_owner+0x550/0x550
 ? getname_flags+0xde/0x370
 ? getname_flags+0xde/0x370
 ? rcu_read_lock_sched_held+0x14a/0x180
 ? prepare_bprm_creds+0xf0/0xf0
 ? kmem_cache_free+0x250/0x2c0
 ? getname_flags+0xde/0x370
 ? entry_SYSCALL_64_fastpath+0x5/0xc2
 do_group_exit+0x1aa/0x2b0
 SyS_exit_group+0x18/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc2

and I am not sure what to do here…

Sebastian
Re: [patch RT 0/4] rwsem/rt: Lift the single reader restriction
On 2017-04-01 12:50:58 [+0200], Thomas Gleixner wrote: > R/W semaphores in RT do not allow multiple readers because a writer > blocking on the sempahore would have deal with all the readers in terms of > priority or budget inheritance. While multi reader priority boosting would > be possible (it has been attempted before), multi reader budget inheritance > is impossible. … > It cures the Radeon mess, lowers the contention on mmap_sem for certain > workloads and did not have any negative impact in our initial testing on RT > behaviour. > > I think it's worth to expose it to a wider audience of users for testing, > so we can figure out it whether there are dragons lurking. series applied. > Thanks, > > tglx Sebastian
Re: [PATCH 2/3] rtmutex: deboost priority conditionally when rt-mutex unlock
On 2017-04-13 22:02:53 [+0800], Alex Shi wrote: > The rt_mutex_fastunlock() will deboost 'current' task when it should be. Without judging whether or not this patch makes sense: those patches won't apply. Please look at tip/master or tip's locking/core https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=locking/core where PeterZ rewrote part of the code. Sebastian
Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper
On 2017-11-13 12:56:53 [-0500], Mikulas Patocka wrote: > Hi Hi, > I'm submitting this patch for the CONFIG_PREEMPT_RT patch. It fixes > deadlocks in device mapper when real time preemption is used. applied, thank you. > Mikulas Sebastian
Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper
On 2017-11-21 22:20:51 [+0100], Mike Galbraith wrote:
> On Tue, 2017-11-21 at 14:56 -0500, Mikulas Patocka wrote:
> >
> > If we don't have any reason why it is needed to unplug block requests when
> > a spinlock is taken - so let's not do this.
>
> That's perfectly fine.  I guess I shouldn't have even mentioned having
> encountered unplug at mutex being insufficient.

While at it, I intend to drop fs-jbd2-pull-your-plug-when-waiting-for-space.patch
from the -RT queue for v4.14 which does

--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -116,6 +116,8 @@ void __jbd2_log_wait_for_space(journal_t
 	nblocks = jbd2_space_needed(journal);
 	while (jbd2_log_space_left(journal) < nblocks) {
 		write_unlock(&journal->j_state_lock);
+		if (current->plug)
+			io_schedule();
 		mutex_lock(&journal->j_checkpoint_mutex);

 		/*

and is/was probably a workaround for the missing schedule while blocking
on mutex/rwsem.

> -Mike

Sebastian
Re: [PATCH PREEMPT RT] rt-mutex: fix deadlock in device mapper
On 2017-11-18 19:37:10 [+0100], Mike Galbraith wrote: > Below is my 2012 3.0-rt version of that for reference; at that time we > were using slab, and slab_lock referenced below was a local_lock. The > comment came from crash analysis of a deadlock I met before adding the > (yeah, hacky) __migrate_disabled() blocker. At the time, that was not > an optional hack, you WOULD deadlock if you ground disks to fine powder > the way SUSE QA did. Pulling the plug before blocking cured the > xfs/ext[34] IO deadlocks they griped about, but I had to add that hack > to not trade their nasty IO deadlocks for the more mundane variety. So > my question is: are we sure that in the here and now flush won't want > any lock we may be holding? In days of yore, it most definitely would > turn your box into a doorstop if you tried hard enough. I have a deadlock in ftest01/LTP which is cured by that. The root-problem (as I understand it) is that !RT does schedule() -> sched_submit_work() -> blk_schedule_flush_plug() on a lock contention (that is that bit_spinlock or rwsem in jbd2/ext4 for instance). On RT this does not happen because of tsk_is_pi_blocked() check in sched_submit_work(). That check is needed because an additional (rtmutex) lock can not be acquired at this point. With this change we get closer to what !RT does. In regard to that migrate_disable() we could check if it is possible to do that after we acquired the lock (which is something we tried before but failed due to CPU-hotplug requirement, maybe something changed now) or flush the plug before disabling migration if it is really a problem. To your question whether or not delaying IO can cause any deadlocks is something that I can't answer and this something that would affect !RT, too. I tried to add lockdep to bit-spinlocks but this does not work because one context acquires the lock and another does the unlock. 
It has been explained to me that no deadlocks should happen as long as the
IO is flushed before we block/wait on a lock.

> Subject: rt: pull your plug before blocking
> Date: Wed Jul 18 14:43:15 CEST 2012
> From: Mike Galbraith
>
> Queued IO can lead to IO deadlock should a task require wakeup from a task
> which is blocked on that queued IO.
>
> ext3: dbench1 queues a buffer, blocks on journal mutex, it's plug is not
> pulled.  dbench2 mutex owner is waiting for kjournald, who is waiting for
> the buffer queued by dbench1.  Game over.
>
> Signed-off-by: Mike Galbraith
> ---
>  kernel/locking/rtmutex.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -22,6 +22,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "rtmutex_common.h"
>
> @@ -930,8 +931,18 @@ static inline void rt_spin_lock_fastlock
>
>  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
>  		rt_mutex_deadlock_account_lock(lock, current);
> -	else
> +	else {
> +		/*
> +		 * We can't pull the plug if we're already holding a lock
> +		 * else we can deadlock. eg, if we're holding slab_lock,
> +		 * ksoftirqd can block while processing BLOCK_SOFTIRQ after
> +		 * having acquired q->queue_lock.  If _we_ then block on
> +		 * that q->queue_lock while flushing our plug, deadlock.
> +		 */
> +		if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current))
> +			blk_schedule_flush_plug(current);
>  		slowfn(lock);
> +	}
>  }
>
>  static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
> @@ -1892,9 +1903,12 @@ rt_mutex_fastlock(struct rt_mutex *lock,
>  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) {
>  		rt_mutex_deadlock_account_lock(lock, current);
>  		return 0;
> -	} else
> +	} else {
> +		if (blk_needs_flush_plug(current))
> +			blk_schedule_flush_plug(current);
>  		return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK, ww_ctx);
> +	}
>  }
>
>  static inline int

Sebastian