Re: percpu crash on NetBurst

2011-08-08 Thread Tejun Heo
Hello, Avi. On Sun, Aug 07, 2011 at 06:32:35PM +0300, Avi Kivity wrote: > qemu, under some conditions (-cpu host or -cpu kvm64), erroneously > passes family=15 as the virtual cpuid. This causes a BUG() in > percpu code during late boot: > > [ cut here ] > kernel BUG at mm

[PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution

2012-07-19 Thread Tejun Heo
Hello, kthread_worker was introduced together with concurrency managed workqueue to serve workqueue users which need a special dedicated worker - e.g. RT scheduling. It is a minimal queue / flush / flush all interface on top of kthread, and each provided interface matches the workqueue counterpart
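For readers who haven't seen the API, a minimal usage sketch of the interface as it looked in that era (the my_* names are made up for illustration):

    #include <linux/kthread.h>

    static struct kthread_worker my_worker;         /* hypothetical names */
    static struct kthread_work my_work;

    static void my_work_fn(struct kthread_work *work)
    {
            /* runs in the dedicated kthread; may sleep */
    }

    static int __init my_init(void)
    {
            struct task_struct *task;

            init_kthread_worker(&my_worker);
            task = kthread_run(kthread_worker_fn, &my_worker, "my-worker");
            if (IS_ERR(task))
                    return PTR_ERR(task);
            /* an RT user would sched_setscheduler(task, ...) here */

            init_kthread_work(&my_work, my_work_fn);
            queue_kthread_work(&my_worker, &my_work);
            flush_kthread_work(&my_work);           /* wait for this item */
            flush_kthread_worker(&my_worker);       /* drain everything queued */
            return 0;
    }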

[PATCH 1/2] kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation

2012-07-19 Thread Tejun Heo
From c9bba34243a86fb3ac82d1bdd0ce4bf796b79559 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 19 Jul 2012 13:52:53 -0700 Make the following two non-functional changes. * Separate out insert_kthread_work() from queue_kthread_work(). * Relocate struct kthread_flush_work

[PATCH 2/2] kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed

2012-07-19 Thread Tejun Heo
From 06f9a06f4aeecdb9d07014713ab41b548ae219b5 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 19 Jul 2012 13:52:53 -0700 kthread_worker provides a minimalistic workqueue-like interface for users which need a dedicated worker thread (e.g. for realtime priority). It has basic queue, flush_w
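The gist of the reimplementation, condensed from memory (treat as a sketch; the actual patch also handles a work item being requeued to a different worker via a retry loop): queue a dedicated flush work right behind the target and wait on an on-stack completion, so the caller never touches the work item after its callback has started.

    struct kthread_flush_work {
            struct kthread_work     work;
            struct completion       done;
    };

    void flush_kthread_work(struct kthread_work *work)
    {
            struct kthread_flush_work fwork = {
                    KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
                    COMPLETION_INITIALIZER_ONSTACK(fwork.done),
            };
            struct kthread_worker *worker = work->worker;
            bool noop = false;

            if (!worker)
                    return;

            spin_lock_irq(&worker->lock);
            if (!list_empty(&work->node))           /* pending: flush runs right after it */
                    insert_kthread_work(worker, &fwork.work, work->node.next);
            else if (worker->current_work == work)  /* executing: flush runs next */
                    insert_kthread_work(worker, &fwork.work, worker->work_list.next);
            else
                    noop = true;                    /* idle: nothing to wait for */
            spin_unlock_irq(&worker->lock);

            if (!noop)
                    wait_for_completion(&fwork.done);
    }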

Re: [PATCH 1/2] kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation

2012-07-22 Thread Tejun Heo
Hello, On Sat, Jul 21, 2012 at 01:13:27PM -0400, Andy Walls wrote: > > +/* insert @work before @pos in @worker */ > > Hi Tejun, > > Would a comment that the caller should be holding worker->lock be useful > here? Anyway, comment or not: > > Acked-by: Andy Walls Will add lockdep_assert_held()
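i.e. the helper ends up looking roughly like this in the updated version (sketch):

    /* insert @work before @pos in @worker; caller must hold worker->lock */
    static void insert_kthread_work(struct kthread_worker *worker,
                                    struct kthread_work *work,
                                    struct list_head *pos)
    {
            lockdep_assert_held(&worker->lock);

            list_add_tail(&work->node, pos);
            work->worker = worker;
            if (likely(worker->task))
                    wake_up_process(worker->task);
    }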

Re: [PATCH 2/2] kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed

2012-07-22 Thread Tejun Heo
Hello, On Sat, Jul 21, 2012 at 02:20:06PM -0400, Andy Walls wrote: > > + worker->current_work = work; > > spin_unlock_irq(&worker->lock); > > > > if (work) { > > __set_current_state(TASK_RUNNING); > > work->func(work); > > If the call to 'work->func(work);' fre

[PATCH UPDATED 1/2] kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation

2012-07-22 Thread Tejun Heo
From 9a2e03d8ed518a61154f18d83d6466628e519f94 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 19 Jul 2012 13:52:53 -0700 Make the following two non-functional changes. * Separate out insert_kthread_work() from queue_kthread_work(). * Relocate struct kthread_flush_work

Re: [PATCH 2/2] kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed

2012-07-23 Thread Tejun Heo
Hello, On Sun, Jul 22, 2012 at 04:46:54PM -0400, Andy Walls wrote: > Hmmm, I didn't know about the constraint about 'known to be alive' in > the other email I just sent. > > That might make calling flush_kthread_work() hard for a user to use, if > the user lets the work get freed by another threa

Re: kvm deadlock

2011-12-14 Thread Tejun Heo
Hello, On Wed, Dec 14, 2011 at 12:22:34PM -0500, Vivek Goyal wrote: > [..] > > __GFP_WAIT isn't the problem, you can block in the IO path. You cannot, > > however, recurse back into IO submission. That's why CFQ is using > > GFP_NOIO, implying that waiting is OK, but submitting new IO to satisfy >
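To illustrate the distinction being drawn here (purely illustrative allocation; pd and q are placeholders): sleeping in the IO path is fine, recursing into IO submission to satisfy the allocation is not, which is exactly what GFP_NOIO expresses.

    /* may sleep, but must not recurse into block-layer IO to make progress */
    pd = kzalloc_node(sizeof(*pd), GFP_NOIO, q->node);
    if (!pd)
            return -ENOMEM;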

Re: [RFT PATCH] blkio: alloc per cpu data from worker thread context( Re: kvm deadlock)

2011-12-19 Thread Tejun Heo
On Mon, Dec 19, 2011 at 12:27:17PM -0500, Vivek Goyal wrote: > On Sun, Dec 18, 2011 at 03:25:48PM -0600, Nate Custer wrote: > > > > On Dec 16, 2011, at 2:29 PM, Vivek Goyal wrote: > > > Thanks for testing it Nate. I did some debugging and found out that patch > > > is doing double free on per cpu

Re: [RFT PATCH] blkio: alloc per cpu data from worker thread context( Re: kvm deadlock)

2011-12-19 Thread Tejun Heo
Hello, Vivek. On Mon, Dec 19, 2011 at 01:27:17PM -0500, Vivek Goyal wrote: > Ok, that's good to know. If per cpu allocator can support this use case, > it will be good for 3.3 onwards. This seems to be right way to go to fix > the problem. Ummm... if we're gonna make percpu usable w/ GFP_NOIO, th

Re: [RFT PATCH] blkio: alloc per cpu data from worker thread context( Re: kvm deadlock)

2011-12-20 Thread Tejun Heo
Hello, On Tue, Dec 20, 2011 at 09:50:24AM -0500, Vivek Goyal wrote: > So IIUC, existing mempool implementation is not directly usable for my > requirement and I need to write some code of my own for the caching > layer which always allocates objects from reserve and fills in the > pool asynchronou

Re: [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()

2012-05-21 Thread Tejun Heo
On Mon, May 21, 2012 at 05:08:29PM +0800, Asias He wrote: > When a user hot-unplugs a disk which is busy serving I/O, __blk_run_queue > might be unable to drain all the requests. As a result, > blk_drain_queue() would loop forever and blk_cleanup_queue() would not > return, so hot-unplug will fail.

Re: [RFC PATCH 1/5] block: Introduce q->abort_queue_fn()

2012-05-22 Thread Tejun Heo
Hello, On Tue, May 22, 2012 at 03:30:37PM +0800, Asias He wrote: > On 05/21/2012 11:42 PM, Tejun Heo wrote: > 1) if the queue is stopped, q->request_fn() will never be called. > we will be stuck in the loop forever. This can happen if the remove > method is called after th

Re: [PATCH RFC 1/2] block: Add blk_bio_map_sg() helper

2012-06-13 Thread Tejun Heo
On Wed, Jun 13, 2012 at 03:41:46PM +0800, Asias He wrote: > Add a helper to map a bio to a scatterlist, modelled after > blk_rq_map_sg. > > This helper is useful for any driver that wants to create > a scatterlist from its ->make_request_fn method. This may not be possible but I really wanna avoi
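For context, the call site the patch is aiming for would look roughly like this in a bio-based ->make_request_fn driver (the sgl sizing and names are hypothetical):

    struct scatterlist sgl[MAX_SG_ENTRIES];     /* hypothetical per-request table */
    int nents;

    sg_init_table(sgl, ARRAY_SIZE(sgl));
    nents = blk_bio_map_sg(q, bio, sgl);        /* proposed helper: bio -> sglist */
    /* the first nents entries of sgl[] then feed the driver's DMA setup */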

Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk

2012-06-18 Thread Tejun Heo
Hello, guys. On Mon, Jun 18, 2012 at 07:35:22PM +0930, Rusty Russell wrote: > On Mon, 18 Jun 2012 16:03:23 +0800, Asias He wrote: > > On 06/18/2012 03:46 PM, Rusty Russell wrote: > > > On Mon, 18 Jun 2012 14:53:10 +0800, Asias He wrote: > > >> This patch introduces bio-based IO path for virtio-b

Re: [PATCH 1/3] block: Introduce __blk_segment_map_sg() helper

2012-06-18 Thread Tejun Heo
Hello, Asias. On Mon, Jun 18, 2012 at 02:53:08PM +0800, Asias He wrote: > Split the mapping code in blk_rq_map_sg() to a helper > __blk_segment_map_sg(), so that other mapping function, e.g. > blk_bio_map_sg(), can share the code. > > Cc: Jens Axboe > Cc: Tejun Heo >

Re: [PATCH 1/3] block: Introduce __blk_segment_map_sg() helper

2012-06-19 Thread Tejun Heo
Hello, On Mon, Jun 18, 2012 at 7:02 PM, Asias He wrote: >> I *hope* this is a bit prettier.  e.g. Do we really need to pass in >> @sglist and keep using "goto new_segment"? > > I think this deserves another patch on top of this splitting one. I'd like > to clean this up later. Yeap, doing it in

Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming

2012-03-30 Thread Tejun Heo
On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote: > The current virtblk's naming algorithm only supports 26^3 disks. > If there are masses of virtblks (exceeding 26^3), there will be disks > with the same name. > > By renaming "sd_format_disk_name()" to "disk_name_format()" > and moving it
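For reference, the sd logic being generalized here is roughly the following (reconstructed from memory of sd_format_disk_name(); note the unit/base pair, which comes up again in a later thread below):

    static int disk_name_format(char *prefix, int index, char *buf, int buflen)
    {
            const int base = 'z' - 'a' + 1;     /* 26 */
            char *begin = buf + strlen(prefix);
            char *end = buf + buflen;
            char *p;
            int unit;

            p = end - 1;
            *p = '\0';
            unit = base;
            do {
                    if (p == begin)
                            return -EINVAL;
                    *--p = 'a' + (index % unit);
                    index = (index / unit) - 1;
            } while (index >= 0);

            memmove(begin, p, end - p);
            memcpy(buf, prefix, strlen(prefix));
            return 0;
    }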

Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming

2012-03-30 Thread Tejun Heo
On Fri, Mar 30, 2012 at 08:26:06AM -0700, Tejun Heo wrote: > On Fri, Mar 30, 2012 at 05:53:52PM +0800, Ren Mingxin wrote: > > The current virtblk's naming algorithm only supports 26^3 disks. > > If there are masses of virtblks (exceeding 26^3), there will be disks > > with

Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming

2012-04-02 Thread Tejun Heo
Hello, On Mon, Apr 02, 2012 at 10:20:09AM +0300, Michael S. Tsirkin wrote: > Please don't rename virtio disks, it is way too late for that: > virtio block driver was merged around 2007, it is not new by > any measure, and there are many systems out there using > the current naming scheme. There's

Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming

2012-04-02 Thread Tejun Heo
Hello, James. On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote: > So if we're agreed no other devices going forwards should ever use this > interface, is there any point unifying the interface? No matter how > many caveats you hedge it round with, putting the API in a central place

Re: [PATCH] virtio_blk: Add help function to format mass of disks

2012-04-10 Thread Tejun Heo
Hello, guys. On Tue, Apr 10, 2012 at 04:34:06PM +0300, Michael S. Tsirkin wrote: > > Why not use 'base' below? neither unit nor base change. > > Yes it's a bit strange, it was the same in Tejun's patch. > Tejun, any idea? It was years ago, so I don't recall much. I think I wanted to use a vari

[PATCH] pci-stub: ignore zero-length id parameters

2010-12-22 Thread Tejun Heo
pci-stub uses strsep() to separate list of ids and generates a warning message when it fails to parse an id. However, not specifying the parameter results in ids set to an empty string. strsep() happily returns the empty string as the first token and thus triggers the warning message spuriously.
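The fix therefore amounts to skipping empty tokens in the module-parameter parse loop, roughly (sketch based on pci-stub.c of that period):

    p = ids;
    while ((id = strsep(&p, ","))) {
            unsigned int vendor, device, subvendor = PCI_ANY_ID,
                         subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
            int fields;

            if (!strlen(id))
                    continue;       /* empty "ids=" parameter - nothing to parse */

            fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
                            &vendor, &device, &subvendor, &subdevice,
                            &class, &class_mask);
            if (fields < 2) {
                    printk(KERN_WARNING
                           "pci-stub: invalid id string \"%s\"\n", id);
                    continue;
            }
            /* register the dynamic id as before */
    }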

Re: [RFC -v3 PATCH 2/3] sched: add yield_to function

2011-01-05 Thread Tejun Heo
Hello, On Wed, Jan 05, 2011 at 06:39:19PM +0100, Peter Zijlstra wrote: > On Wed, 2011-01-05 at 19:35 +0200, Avi Kivity wrote: > > > Tejun, why did you end up not using preempt_notifiers in cmwq? > > Because I told him to use explicit function calls because that keeps the > code easier to read.

Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-09 Thread Tejun Heo
We still need to use a lock for safety. What's > > your opinion? > > > > commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6 > > Author: Tejun Heo > > Date: Sat Feb 21 11:04:45 2009 +0900 > > > > [SCSI] sd: revive sd_index_lock > > > >

Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-15 Thread Tejun Heo
Hello, On Wed, Jun 15, 2011 at 02:21:51PM +0930, Rusty Russell wrote: > > + if (index_to_minor(index) >= 1 << MINORBITS) { > > + err = -ENOSPC; > > + goto out_free_index; > > + } > > Is this *really* how this is supposed to be used? > > Tejun, this is your code. What do

Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-16 Thread Tejun Heo
Hello, On Thu, Jun 16, 2011 at 09:35:34AM +0930, Rusty Russell wrote: > On Wed, 15 Jun 2011 09:06:38 +0200, Tejun Heo wrote: > > It's inherited from idr which was designed to have separate > > prepare/allocation stages so that allocation can happen inside an > > outer
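The two-stage pattern under discussion looks like this at a call site (lock and variable names are illustrative, mirroring the sd_index_lock precedent quoted earlier in the thread):

    static DEFINE_IDA(vd_index_ida);
    static DEFINE_SPINLOCK(vd_index_lock);      /* hypothetical, like sd_index_lock */
    int index, err;

    do {
            if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
                    return -ENOMEM;             /* preload outside the lock */

            spin_lock(&vd_index_lock);
            err = ida_get_new(&vd_index_ida, &index);
            spin_unlock(&vd_index_lock);
    } while (err == -EAGAIN);

    if (err)
            return err;
    if (index_to_minor(index) >= 1 << MINORBITS) {
            err = -ENOSPC;
            goto out_free_index;
    }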

[PATCH 50/62] vfio: convert to idr_alloc()

2013-02-02 Thread Tejun Heo
Convert to the much saner new idr interface. Only compile tested. Signed-off-by: Tejun Heo Cc: Alex Williamson Cc: kvm@vger.kernel.org --- This patch depends on an earlier idr changes and I think it would be best to route these together through -mm. Please holler if there's any obje
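The shape of the conversion, for anyone not following the idr series (old two-stage interface vs. new idr_alloc(); the vfio.group_idr/group_lock names follow drivers/vfio/vfio.c but treat the snippet as a sketch):

    /* before: preload + get_new retry loop */
    do {
            if (!idr_pre_get(&vfio.group_idr, GFP_KERNEL))
                    return ERR_PTR(-ENOMEM);
            mutex_lock(&vfio.group_lock);
            ret = idr_get_new(&vfio.group_idr, group, &minor);
            mutex_unlock(&vfio.group_lock);
    } while (ret == -EAGAIN);

    /* after: a single call that returns the allocated id or -errno */
    mutex_lock(&vfio.group_lock);
    minor = idr_alloc(&vfio.group_idr, group, 0, 0, GFP_KERNEL);
    mutex_unlock(&vfio.group_lock);
    if (minor < 0)
            return ERR_PTR(minor);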

[PATCH v2 50/62] vfio: convert to idr_alloc()

2013-02-04 Thread Tejun Heo
Convert to the much saner new idr interface. Only compile tested. v2: Restore accidentally dropped "index 0" comment as suggested by Alex. Signed-off-by: Tejun Heo Acked-by: Alex Williamson Cc: kvm@vger.kernel.org --- drivers/vfio/vfio.c | 17 + 1 file

[PATCH 66/77] vfio: convert to idr_alloc()

2013-02-06 Thread Tejun Heo
Convert to the much saner new idr interface. Only compile tested. v2: Restore accidentally dropped "index 0" comment as suggested by Alex. Signed-off-by: Tejun Heo Acked-by: Alex Williamson Cc: kvm@vger.kernel.org --- drivers/vfio/vfio.c | 17 + 1 file

Re: [PATCH v2 0/9] Phase out pci_enable_msi_block()

2014-01-18 Thread Tejun Heo
er works for you ;) > > > > I am only concerned with a regression fix "ahci: Fix broken fallback to > > single MSI mode" which would be nice to have in 3.14. But it seems pretty > > much too late. > > Tejun, if you want to ack that one, I can put it in either

Re: [PATCHSET] kthread_worker: reimplement flush_kthread_work() to allow freeing during execution

2012-09-17 Thread Tejun Heo
On Fri, Sep 14, 2012 at 03:50:40PM -0700, Colin Cross wrote: > This patch set fixes a reproducible crash I'm seeing on a 3.4.10 > kernel. flush_kthread_worker (which is different from > flush_kthread_work) is initializing a kthread_work and a completion on > the stack, then queuing it and calling

Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello, On 05/27/2010 03:12 PM, Michael S. Tsirkin wrote: >> I don't understand the reasons for this patch, but this doesn't matter. > > Depending on the userspace application, a driver can create a lot of work > for a workqueue to handle. By making the workqueue thread > belong in a cgroup, we make it

Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello, On 05/27/2010 06:39 PM, Michael S. Tsirkin wrote: >> Unless you're gonna convert every driver to use this >> special kind of workqueue (and what happens when multiple tasks from >> different cgroups share the driver?), > > We'll then create a workqueue per task. Each workqueue will have th

Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-27 Thread Tejun Heo
Hello, Michael. On 05/27/2010 07:32 PM, Michael S. Tsirkin wrote: > Well, this is why I proposed adding a new API for creating > workqueue within workqueue.c, rather than exposing the task > and attaching it to cgroups in our driver: so that workqueue > maintainers can fix the implementation if it

Re: [PATCH 2/3] workqueue: Add an API to create a singlethread workqueue attached to the current task's cgroup

2010-05-28 Thread Tejun Heo
Hello, On 05/28/2010 05:08 PM, Michael S. Tsirkin wrote: > Well, we have create_singlethread_workqueue, right? > This is not very different ... is it? > > Just copying structures and code from workqueue.c, > adding vhost_ in front of it will definitely work: Sure it will, but you'll probably be

[PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup

2010-05-30 Thread Tejun Heo
From: Sridhar Samudrala Add a new kernel API to attach a task to current task's cgroup in all the active hierarchies. Signed-off-by: Sridhar Samudrala --- include/linux/cgroup.h |1 + kernel/cgroup.c| 23 +++ 2 files changed, 24 insertions(+) Index: work/incl
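The API is essentially a loop over the active hierarchies attaching @tsk wherever current already sits; roughly (sketch using the cgroup internals of that period):

    /**
     * cgroup_attach_task_current_cg - attach task @tsk to current task's cgroup
     * @tsk: the task to be attached
     */
    int cgroup_attach_task_current_cg(struct task_struct *tsk)
    {
            struct cgroupfs_root *root;
            int retval = 0;

            cgroup_lock();
            for_each_active_root(root) {
                    struct cgroup *cur_cg = task_cgroup_from_root(current, root);

                    retval = cgroup_attach_task(cur_cg, tsk);
                    if (retval)
                            break;
            }
            cgroup_unlock();

            return retval;
    }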

[PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-05-30 Thread Tejun Heo
Replace vhost_workqueue with per-vhost kthread. Other than callback argument change from struct work_struct * to struct vhost_poll *, there's no visible change to vhost_poll_*() interface. This conversion is to make each vhost use a dedicated kthread so that resource control via cgroup can be app

[PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers

2010-05-30 Thread Tejun Heo
Apply the cpumask and cgroup of the initializing task to the created vhost poller. Based on Sridhar Samudrala's patch. Cc: Michael S. Tsirkin Cc: Sridhar Samudrala --- drivers/vhost/vhost.c | 36 +++- 1 file changed, 31 insertions(+), 5 deletions(-) Index: wo

[PATCH UPDATED 3/3] vhost: apply cpumask and cgroup to vhost pollers

2010-05-31 Thread Tejun Heo
Apply the cpumask and cgroup of the initializing task to the created vhost poller. Based on Sridhar Samudrala's patch. Li Zefan spotted a bug in error path, fixed. Cc: Michael S. Tsirkin Cc: Sridhar Samudrala Cc: Li Zefan --- Updated accordingly. Thanks. drivers/vhost/vhost.c | 36 ++

Re: [PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup

2010-05-31 Thread Tejun Heo
On 05/31/2010 03:07 AM, Li Zefan wrote: > 04:24, Tejun Heo wrote: >> From: Sridhar Samudrala >> >> Add a new kernel API to attach a task to current task's cgroup >> in all the active hierarchies. >> >> Signed-off-by: Sridhar Samudrala > > Acke

Re: [PATCH UPDATED2 3/3] vhost: apply cpumask and cgroup to vhost pollers

2010-05-31 Thread Tejun Heo
Apply the cpumask and cgroup of the initializing task to the created vhost poller. Based on Sridhar Samudrala's patch. Li Zefan spotted a bug in error path (twice), fixed (twice). Cc: Michael S. Tsirkin Cc: Sridhar Samudrala Cc: Li Zefan --- Heh... that's embarrassing. Let's see if I can get

Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-05-31 Thread Tejun Heo
Hello, On 05/31/2010 04:39 PM, Oleg Nesterov wrote: > On 05/30, Tejun Heo wrote: >> >> This conversion is to make each vhost use a dedicated kthread so that >> resource control via cgroup can be applied. > > Personally, I agree. I think this is better than playing with

Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-05-31 Thread Tejun Heo
Hello, On 05/31/2010 05:31 PM, Oleg Nesterov wrote: >> I might have slightly over engineered this part not knowing the >> expected workload. ->queue_seq/->done_seq pair is to guarantee that >> flushers never get starved. > > Ah, indeed. > > Well, afaics we do not need 2 counters anyway, both vh

Re: [PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-05-31 Thread Tejun Heo
Hello, On 05/31/2010 05:22 PM, Michael S. Tsirkin wrote: > On Sun, May 30, 2010 at 10:24:01PM +0200, Tejun Heo wrote: >> Replace vhost_workqueue with per-vhost kthread. Other than callback >> argument change from struct work_struct * to struct vhost_poll *, >> there's

[PATCH 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-06-01 Thread Tejun Heo
Replace vhost_workqueue with per-vhost kthread. Other than callback argument change from struct work_struct * to struct vhost_work *, there's no visible change to vhost_poll_*() interface. This conversion is to make each vhost use a dedicated kthread so that resource control via cgroup can be app
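The core of the conversion is a small dedicated-worker loop feeding off a spinlock-protected list; condensed sketch (the queue_seq/done_seq bookkeeping used by flushing is omitted here):

    static int vhost_worker(void *data)
    {
            struct vhost_dev *dev = data;
            struct vhost_work *work;

            for (;;) {
                    set_current_state(TASK_INTERRUPTIBLE);

                    spin_lock_irq(&dev->work_lock);
                    if (kthread_should_stop()) {
                            spin_unlock_irq(&dev->work_lock);
                            __set_current_state(TASK_RUNNING);
                            return 0;
                    }
                    if (!list_empty(&dev->work_list)) {
                            work = list_first_entry(&dev->work_list,
                                                    struct vhost_work, node);
                            list_del_init(&work->node);
                    } else {
                            work = NULL;
                    }
                    spin_unlock_irq(&dev->work_lock);

                    if (work) {
                            __set_current_state(TASK_RUNNING);
                            work->fn(work);         /* callback now takes vhost_work * */
                    } else {
                            schedule();
                    }
            }
    }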

[PATCH 2/3] cgroups: Add an API to attach a task to current task's cgroup

2010-06-01 Thread Tejun Heo
From: Sridhar Samudrala Add a new kernel API to attach a task to current task's cgroup in all the active hierarchies. Signed-off-by: Sridhar Samudrala Reviewed-by: Paul Menage Acked-by: Li Zefan --- include/linux/cgroup.h |1 + kernel/cgroup.c| 23 +++ 2 fil

[PATCH 3/3] vhost: apply cpumask and cgroup to vhost workers

2010-06-01 Thread Tejun Heo
Apply the cpumask and cgroup of the initializing task to the created vhost worker. Based on Sridhar Samudrala's patch. Li Zefan spotted a bug in error path (twice), fixed (twice). Signed-off-by: Tejun Heo Cc: Michael S. Tsirkin Cc: Sridhar Samudrala Cc: Li Zefan --- drivers/vhost/vh
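The gist: create the worker thread, copy the initializing task's CPU affinity onto it, and pull it into the caller's cgroups before letting it run. Condensed sketch (error handling trimmed; relies on sched_setaffinity/getaffinity being usable from modules per the related thread):

    struct task_struct *worker;
    cpumask_var_t mask;
    int err;

    if (!alloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;

    worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
    if (IS_ERR(worker)) {
            err = PTR_ERR(worker);
            goto out_free_mask;
    }

    /* inherit the initializing task's allowed CPUs and cgroups */
    err = sched_getaffinity(current->pid, mask);
    if (!err)
            err = sched_setaffinity(worker->pid, mask);
    if (!err)
            err = cgroup_attach_task_current_cg(worker);
    if (err)
            goto out_stop_worker;

    wake_up_process(worker);
    free_cpumask_var(mask);
    return 0;

    out_stop_worker:
            kthread_stop(worker);
    out_free_mask:
            free_cpumask_var(mask);
            return err;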

Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost workers

2010-06-01 Thread Tejun Heo
Hello, On 06/01/2010 12:17 PM, Michael S. Tsirkin wrote: > Something that I wanted to figure out - what happens if the > CPU mask limits us to a certain CPU that subsequently goes offline? The thread gets unbound during the last steps of cpu offlining. > Will e.g. flush block forever or until th

Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost workers

2010-06-01 Thread Tejun Heo
On 06/01/2010 07:19 PM, Sridhar Samudrala wrote: >> -int i; >> +cpumask_var_t mask; >> +int i, ret = -ENOMEM; >> + >> +if (!alloc_cpumask_var(&mask, GFP_KERNEL)) >> +goto out_free_mask; > > I think this is another bug in the error path. You should simply > do a return i

[PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-06-02 Thread Tejun Heo
This conversion is to make each vhost use a dedicated kthread so that resource control via cgroup can be applied. Partially based on Sridhar Samudrala's patch. * Updated to use sub structure vhost_work instead of directly using vhost_poll at Michael's suggestion. * Added flusher wake_up() optimization at Michael's suggestion. Signed-off-by: Tejun Heo Cc: Michael S. Tsirkin C

Re: kernel bug in kvm_intel

2009-11-01 Thread Tejun Heo
Hello, Avi Kivity wrote: > Only, that merge doesn't change virt/kvm or arch/x86/kvm. > > Tejun, anything known bad about that merge? ada3fa15 kills kvm. Nothing rings a bell at the moment. How does it kill kvm? One big difference caused by that merge is use of sparse areas near the top of vma

Re: kernel bug in kvm_intel

2009-11-01 Thread Tejun Heo
Hello, Avi Kivity wrote: > We get a page fault immediately (next instruction) after returning from > the guest when running with oprofile. The page fault address does not > match anything the instruction does, so presumably it is one of the > accesses the processor performs in order to service an

Re: kernel bug in kvm_intel

2009-11-18 Thread Tejun Heo
Hello, 11/01/2009 08:31 PM, Avi Kivity wrote: >>> Here is the code in question: >>> >>> 3ae7: 75 05 jne 3aee >>> 3ae9: 0f 01 c2 vmlaunch >>> 3aec: eb 03 jmp 3af1 >>> 3aee:

Re: kernel bug in kvm_intel

2009-11-25 Thread Tejun Heo
Hello, 11/26/2009 10:35 AM, Andrew Theurer wrote: > I just tried testing tip of kvm.git, but unfortunately I think I might > be hitting a different problem, where processes run 100% in kernel mode. > In my case, cpus 9 and 13 were stuck, running qemu processes. A stack > backtrace for both cpus a

Re: WARNING: kernel/smp.c:292 smp_call_function_single [Was: mmotm 2009-11-24-16-47 uploaded]

2009-11-30 Thread Tejun Heo
Hello, On 11/28/2009 09:12 PM, Avi Kivity wrote: >> Hmm, commit 498657a moved the fire_sched_in_preempt_notifiers() call >> into the irqs disabled section recently. >> >> sched, kvm: Fix race condition involving sched_in_preempt_notifers >> >> In finish_task_switch(), fire_sched_in_preem

Re: WARNING: kernel/smp.c:292 smp_call_function_single [Was: mmotm 2009-11-24-16-47 uploaded]

2009-11-30 Thread Tejun Heo
Hello, On 11/30/2009 07:02 PM, Thomas Gleixner wrote: > No, it _CANNOT_ be preempted at that point: > > schedule() > { > preempt_disable(); > > switch_to(); > > preempt_enable(); > } Yes, you're right. >> For the time being, maybe it's best to back out the fix given that the

[PATCH tip/sched/urgent] sched: revert 498657a478c60be092208422fefa9c7b248729c2

2009-11-30 Thread Tejun Heo
Revert the incorrect commit and add a comment describing the different contexts under which the two callbacks are invoked. Signed-off-by: Tejun Heo Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Avi Kivity --- Again, my apologies for the unnecessary fuss. I for some reason was thinking schedule_tail() is always call

Re: [PATCH repost] sched: export sched_set/getaffinity to modules

2010-07-01 Thread Tejun Heo
Hello, On 07/01/2010 03:39 PM, Michael S. Tsirkin wrote: >> I think that's called kernel_thread() see >> kernel/kthread.c:create_kthread(). >> >> Doing the whole kthreadd dance and then copying bits and pieces back >> sounds very fragile, so yeah, something like that should work. > > Thanks! > Sr

Re: [PATCH repost] sched: export sched_set/getaffinity to modules

2010-07-01 Thread Tejun Heo
Hello, On 07/01/2010 04:46 PM, Oleg Nesterov wrote: >> It might be a good idea to make the function take extra clone flags >> but anyways once created cloned task can be treated the same way as >> other kthreads, so nothing else needs to be changed. > > This makes kthread_stop() work. Otherwise t

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-22 Thread Tejun Heo
Hello, On 07/22/2010 05:58 PM, Michael S. Tsirkin wrote: > All the tricky barrier pairing made me uncomfortable. So I came up with > this on top (untested): if we do all operations under the spinlock, we > can get by without barriers and atomics. And since we need the lock for > list operations

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-25 Thread Tejun Heo
Hello, On 07/24/2010 09:14 PM, Michael S. Tsirkin wrote: >> I've created kthread_worker in wq#for-next tree and already converted >> ivtv to use it. Once this lands in mainline, I think converting vhost >> to use it would be better choice. kthread worker code uses basically >> the same logic use

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
Hello, On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote: > BTW, kthread_worker would benefit from the optimization I implemented > here as well. Hmmm... I'm not quite sure whether it's an optimization. I thought the patch was due to feeling uncomfortable about using barriers? Is it an optimizat

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
On 07/26/2010 05:34 PM, Tejun Heo wrote: > Hello, > > On 07/26/2010 05:25 PM, Michael S. Tsirkin wrote: >> BTW, kthread_worker would benefit from the optimization I implemented >> here as well. > > Hmmm... I'm not quite sure whether it's an optimization. I t

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
Hello, On 07/26/2010 05:50 PM, Michael S. Tsirkin wrote: >> Hmmm... I'm not quite sure whether it's an optimization. I thought >> the patch was due to feeling uncomfortable about using barriers? > > Oh yes. But getting rid of barriers is what motivated me originally. Yeah, getting rid of barrie

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
Just one more thing. On 07/26/2010 06:05 PM, Tejun Heo wrote: > * Placing try_to_freeze() could be a bit annoying. It shouldn't be > executed when there's a work to flush. * Similar issue exists for kthread_stop(). The kthread shouldn't exit while there's a work

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
Hello, On 07/26/2010 06:31 PM, Michael S. Tsirkin wrote: >> On 07/26/2010 06:05 PM, Tejun Heo wrote: >>> * Placing try_to_freeze() could be a bit annoying. It shouldn't be >>> executed when there's a work to flush. > > BTW why is this important? >

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
On 07/26/2010 06:23 PM, Michael S. Tsirkin wrote: >> * Can you please keep the outer goto repeat loop? I just don't like >> outermost for (;;). > > Okay ... can we put the code in a {} scope to make it clear > where does the loop starts and ends? If we're gonna do that, it would be better to p

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
On 07/26/2010 06:51 PM, Michael S. Tsirkin wrote: > On Mon, Jul 26, 2010 at 06:14:30PM +0200, Tejun Heo wrote: >> Just one more thing. > > I noticed that with vhost, flush_work was getting the worker > pointer as well. Can we live with this API change? Yeah, the flushing mecha

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-26 Thread Tejun Heo
Hello, On 07/26/2010 09:14 PM, Tejun Heo wrote: > On 07/26/2010 06:51 PM, Michael S. Tsirkin wrote: >> I noticed that with vhost, flush_work was getting the worker >> pointer as well. Can we live with this API change? > > Yeah, the flushing mechanism wouldn't wo

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-27 Thread Tejun Heo
Hello, On 07/26/2010 09:57 PM, Michael S. Tsirkin wrote: >> For freeze, it probably is okay but for stop, I think it's better to >> keep the semantics straight forward. > > What are the semantics then? What do we want stop followed > by queue and flush to do? One scenario I can think of is the f

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-27 Thread Tejun Heo
Hello, On 07/26/2010 10:19 PM, Michael S. Tsirkin wrote: > Let's try to define what do we want to achieve then. Do you want > code that flushes workers not to block when workers are frozen? How > will we handle work submitted when worker is frozen? As I wrote earlier, it's not necessarily about

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-28 Thread Tejun Heo
On 07/27/2010 09:19 PM, Michael S. Tsirkin wrote: >> Thinking a bit more about it, it kind of sucks that queueing to >> another worker from worker->func() breaks flush. Maybe the right >> thing to do there is using atomic_t for done_seq? > > I don't believe it will help: we might have: > > worke

Re: [PATCH UPDATED 1/3] vhost: replace vhost_workqueue with per-vhost kthread

2010-07-28 Thread Tejun Heo
Hello, On 07/28/2010 12:48 PM, Michael S. Tsirkin wrote: > I'm unsure how flush_work operates under these conditions. E.g. in > workqueue.c, this seems to work by keeping a pointer to current > workqueue in the work. But what prevents us from destroying the > workqueue when work might not be run

Re: [PATCH repost] sched: export sched_set/getaffinity to modules

2010-07-30 Thread Tejun Heo
Hello, On 07/30/2010 04:19 PM, Oleg Nesterov wrote: > But I must admit, I personally dislike this idea. A kernel thread which > is the child of a user-space process is in fact not a "real" > kernel thread. I think this goes against the common case. If you do not > care about the signals/repare

Re: [PATCH] vhost: locking/rcu cleanup

2010-07-30 Thread Tejun Heo
Hello, On 07/29/2010 02:23 PM, Michael S. Tsirkin wrote: > I saw WARN_ON(!list_empty(&dev->work_list)) trigger > so our custom flush is not as airtight as need be. Could be but it's also possible that something has queued something after the last flush? Is the problem reproducible? > This patch

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Tejun Heo
Hello, On Tue, Sep 15, 2015 at 03:36:34PM +0200, Christian Borntraeger wrote: > >> The problem seems to be that the newly used percpu_rwsem does a > >> rcu_synchronize_sched_expedited for all write downs/ups. > > > > Can you try: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linu

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Tejun Heo
Hello, On Tue, Sep 15, 2015 at 11:11:45PM +0200, Christian Borntraeger wrote: > > In fact, I would say that any userspace-controlled call to *_expedited() > > is a bug waiting to happen and a bad idea---because userspace can, with > > little effort, end up calling it in a loop. > > Right. This al

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Tejun Heo
Hello, On Tue, Sep 15, 2015 at 02:38:30PM -0700, Paul E. McKenney wrote: > I did take a shot at adding the rcu_sync stuff during this past merge > window, but it did not converge quickly enough to make it. It looks > quite good for the next merge window. There have been changes in most > of the

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Tejun Heo
Hello, Paul. On Tue, Sep 15, 2015 at 04:38:18PM -0700, Paul E. McKenney wrote: > Well, the decision as to what is too big for -stable is owned by the > -stable maintainers, not by me. Is it tho? Usually the subsystem maintainer knows the best and has most say in it. I was mostly curious whether

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-16 Thread Tejun Heo
Hello, On Wed, Sep 16, 2015 at 12:58:00PM +0200, Christian Borntraeger wrote: > FWIW, I added a printk to percpu_down_write. With KVM and uprobes disabled, > just booting up a fedora20 gives me __6749__ percpu_down_write calls on 4.2. > systemd seems to do that for the processes. > > So a revert

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-16 Thread Tejun Heo
Hello, On Tue, Sep 15, 2015 at 09:35:47PM -0700, Paul E. McKenney wrote: > > > I am suggesting trying the options and seeing what works best, then > > > working to convince people as needed. > > > > Yeah, sure thing. Let's wait for Christian. > > Indeed. Is there enough benefit to risk jamming

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-16 Thread Tejun Heo
Hello, On Wed, Sep 16, 2015 at 02:22:49PM +0200, Oleg Nesterov wrote: > > If the revert isn't easy, I think backporting rcu_sync is the best bet. > > I leave this to Paul and Tejun... at least I think this is not v4.2 material. Will route reverts through cgroup branch. Should be pretty painless

[PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking"

2015-09-16 Thread Tejun Heo
From f9f9e7b776142fb1c0782cade004cc8e0147a199 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 16 Sep 2015 11:51:12 -0400 This reverts commit b5ba75b5fc0e8404e2c50cb68f39bb6a53fc916f. d59cfc09c32a ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_

[PATCH cgroup/for-4.3-fixes 2/2] Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"

2015-09-16 Thread Tejun Heo
From 0c986253b939cc14c69d4adbe2b4121bdf4aa220 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 16 Sep 2015 11:51:12 -0400 This reverts commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14. d59cfc09c32a ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_