Oleg Nesterov wrote:
> On 06/01, Mark Hounschell wrote:
>> Oleg Nesterov wrote:
>>> Yes, but see above. flush_scheduled_work() needs a cooperation from events/2
>>> which is bound to CPU 2.
>>>
>> Again I don't understand why flush_scheduled_work() running on behalf of a
>> process affinitized to processor-1 requires cooperation from events/2
>> (affinitized to processor-2) when there is an events/1 already affinitized
>> to processor 1?
>
> flush_workqueue() blocks until any scheduled work on any CPU has run to
> completion. If we have some work_struct pending on CPU 2, it can be completed
> only when events/2 executes it.
>
>>> If you changed irq/X/smp_affinity, the patch I sent should help, because
>>> floppy_work can't be scheduled on CPU 2, but still I don't think it is right
>>> to run 100% cpu-bound RT-process.
>> The patch you sent helps with no other intervention from me. But then so does
>> the patch mentioned in the original post. I am able to bang on the floppies
>> pretty hard doing all kinds of things with no trouble using either.
>
> This patch replaces flush_scheduled_work() with cancel_work_sync(). The latter
> can still hang if the floppy interrupt happens on CPU 2 and does schedule_bh(),
> events/2 starts running floppy_work->func() and preempted by RT-thread. This is
> very unlikely, but possible.
>
> From your original post:
>
> > The application only runs on SMP machines and uses process and irq affinities
>
> if the floppy interrupt can't happen on CPU 2, the above scenario is not
> possible.
>
All the irq affinities but one are set to processor-1. The only irq that is not
comes from an rtom (Real-Time Option Module). Its irq is handled by processor-2.

>> As far as a 100% cpu-bound RT-process goes, well I say I don't intentionally
>> relinquish the processor but it's not really 100% cpu-bound. Running xosview
>> I see some spare time.
>
> Well, I don't know what is xosview, sorry :) so I don't understand what does
> "spare time" precisely mean. If this thread does some i/o or something which
> can sleep, then...
>

I don't understand the _real_ meaning of spare time either, but xosview is just
a little graphical window showing information obtained from the proc fs, I think.

> OK. In that case we may have another reason for deadlock, say a pending
> floppy_work needs open_lock or test_and_set_bit(0, &fdc_busy).
>
> Could you apply the trivial patch below, and change the i/o thread to do
>
>	prctl(1234);	// hangs ???
>	printf(something);
>	ioctl(Q->DevSpec1, FDSETPRM, &medprm);	// this hangs
>
> to see if prctl() hangs or not? This way we can narrow the problem.
> (of course, you can just kill the above ioctl() if this is possible).
>
> Thanks!
>
> Oleg.
>
> --- OLD/kernel/sys.c~	2007-04-03 13:05:02.000000000 +0400
> +++ OLD/kernel/sys.c	2007-06-01 18:56:22.000000000 +0400
> @@ -2147,6 +2147,11 @@ asmlinkage long sys_prctl(int option, un
>  {
>  	long error;
>
> +	if (option == 1234) {
> +		flush_scheduled_work();
> +		return 0;
> +	}
> +
>  	error = security_task_prctl(option, arg2, arg3, arg4, arg5);
>  	if (error)
>  		return error;
>

OK, the prctl never returned. I just replaced the ioctl with it and added a
printf before and after. I only get the one before. The thread is hung at this
point, just as if I'd done the ioctl.

Regards
Mark
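
For reference, the user-space side of the test described above could look
roughly like the sketch below. This is only an illustration, not the actual
application code: the function name probe_flush_scheduled_work() is made up
here, and option 1234 means something only on a kernel carrying the debugging
patch quoted above. On an unpatched kernel the call is expected to fail and
return -1 instead of blocking. The messages go to stderr, which is unbuffered,
so the "before" line is visible even if the call never returns.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Ad-hoc option number from the debugging patch; NOT a real prctl option. */
#define PRCTL_FLUSH_PROBE 1234

/*
 * Hypothetical helper: call it from the i/o thread in place of the
 * FDSETPRM ioctl. Returns the raw prctl() result: 0 once the patched
 * kernel's flush_scheduled_work() has returned, -1 if the kernel
 * rejects the option (i.e. the patch is not applied).
 */
int probe_flush_scheduled_work(void)
{
	int rc;

	fprintf(stderr, "before prctl(%d)\n", PRCTL_FLUSH_PROBE);

	/* On a patched kernel this blocks until the work queue is flushed. */
	rc = prctl(PRCTL_FLUSH_PROBE, 0UL, 0UL, 0UL, 0UL);
	if (rc == -1)
		fprintf(stderr, "prctl failed: %s\n", strerror(errno));

	fprintf(stderr, "after prctl(%d), rc=%d\n", PRCTL_FLUSH_PROBE, rc);
	return rc;
}
```

If only the "before" line shows up, the hang is in flush_scheduled_work()
itself and not in anything specific to the floppy ioctl path.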