Arjan van de Ven wrote:
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > > Er, it won't play well if that happens when tasks are frozen for
> > > suspend.
> >
> > right now any suspend attempt times out after 20 seconds:
> >
> > $ grep TIMEOUT kernel/power/process.c
> > #define TIMEOUT (20 * HZ)
> > en
On Monday, 3 of December 2007, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
>
> > > This feature will save one full reporter-developer round-trip during
> > > investigation of a significant number of bug reports.
> > >
> > > It might be more practical if it were to dump
* Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:
> > This feature will save one full reporter-developer round-trip during
> > investigation of a significant number of bug reports.
> >
> > It might be more practical if it were to dump the traces for _all_
> > D-state processes when it fires - bas
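[Archive note] The suggestion above, reporting every D-state process at once, can be approximated from user space. The sketch below is only an illustration and not part of the patch under discussion: it assumes a Linux /proc filesystem, and the helper names parse_state and d_state_tasks are hypothetical. It lists the tasks whose state field in /proc/<pid>/stat is "D" (uninterruptible sleep) and does not print kernel backtraces.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux system, or /proc is not mounted
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'sys' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling d_state_tasks() returns a possibly empty list of (pid, comm) pairs. It is a snapshot only: unlike the detector discussed in the thread, it says nothing about how long each task has been blocked.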
On Monday, 3 of December 2007, Andrew Morton wrote:
> On Mon, 3 Dec 2007 15:19:25 +0100
> Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
> >
>
On Mon, 3 Dec 2007 15:19:25 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:
> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
>
> -->
> INFO: task prctl:3042 blocked for more tha
On Dec 3, 2007 6:17 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> That won't address my concerns about already "breaking" (as in
> frightening the user etc.) common error handling scenarios by default.
Andi, may I respectfully submit that you're not understanding real users here?
Real users either:
> the scsi layer will have the IO totally aborted within that time anyway;
> the retry timeout for disks is 30 seconds after all.
There are blocking waits that wait for multiple IOs.
Also i think the SCSI driver can tune this anyways and I suspect
iSCSI and friends increase it (?)
-Andi
--
To u
On Mon, 3 Dec 2007 11:27:15 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly
> > broken.
>
> What should it do when the NFS server doesn't answer anymore or
> when the network to the SAN RAID array located a few hundred KM away
> devel
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > debugging feature can be disabled/enabled on a wide scale already:
> >
> > - in the .config
> >
> > - runtime, temporarily, via:
> >
> > echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>
> That won't address my concerns about already "breaki
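[Archive note] The runtime switch quoted above (echo 0 > /proc/sys/kernel/hung_task_timeout_secs) can also be driven from a script. The following is a minimal sketch, assuming the sysctl file named in the message exists on the running kernel. The function names read_timeout and set_timeout are hypothetical, and writing to the real file requires root.

```python
from pathlib import Path

# Path taken from the quoted message; it is absent on kernels built
# without the hung-task check.
HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_timeout(path=HUNG_TASK_TIMEOUT):
    """Return the current timeout in seconds, or None if the file is absent."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def set_timeout(seconds, path=HUNG_TASK_TIMEOUT):
    """Write a new timeout; 0 mirrors the 'echo 0' shown in the message."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    path.write_text(f"{seconds}\n")
```

The path is a parameter so the helpers can be tried against a scratch file before touching the real sysctl.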
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> Now Ingo's latest unreleased version with single line messages might
> be actually ok if he turns off the backtraces by default.
> Unfortunately I wasn't able to find out so far if he has done that or
> not, he always cuts away these parts of the email
On Mon, Dec 03, 2007 at 02:55:47PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > I would still appreciate if you could state what default value you
> > plan to set the backtrace sysctl to in the submitted patch.
>
> there's no "backtrace sysctl" planned for the mom
On Mon, Dec 03, 2007 at 02:59:16PM +0100, Ingo Molnar wrote:
> Andi, is that true? If yes, why didn't Andi state this concern outright,
> instead of pooh-pooh-ing the patch on various other grounds?
No of course not. Radoslaw is talking nonsense.
-Andi
> It's more like "lets warn about it and fix the problems when we find
> some."
It is already known there are lots of problems. I won't repeat
them because I already wrote too much about them. Feel free
to read back in the thread.
Now if all the known problems are fixed and only some hard to kno
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:
> On Mon, 3 Dec 2007 14:29:56 +0100
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > > feedback about an impending catastrophe has been duly noted
> > >
> > > The point was less about an impending catastrophe, but more of a
> > > timebomb
* Pekka Enberg <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > "audit thousands of callsites in 8 million lines of code first" is a
> > > nice euphemism for hiding from the blame forever. We had 10 years for it
>
> On Dec 3, 2007 2:13 PM,
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> I would still appreciate if you could state what default value you
> plan to set the backtrace sysctl to in the submitted patch.
there's no "backtrace sysctl" planned for the moment. This "hung tasks"
debugging feature can be disabled/enabled on a wide
Hi,
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > "audit thousands of callsites in 8 million lines of code first" is a
> > nice euphemism for hiding from the blame forever. We had 10 years for it
On Dec 3, 2007 2:13 PM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> Ok your approach i
I would still appreciate if you could state what default value
you plan to set the backtrace sysctl to in the submitted patch.
-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.
On Mon, 3 Dec 2007 14:29:56 +0100
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > feedback about an impending catastrophe has been duly noted
> >
> > The point was less about an impending catastrophe, but more of a
> > timebomb ticking until the next widely used release.
I think I know why An
> negative
I would consider it positive, but ok. If I was negative I would
probably not care and just make always sure to disable SOFTLOCKUP
in the kernels I use.
> feedback about an impending catastrophe has been duly noted
The point was less about an impending catastrophe, but more of a tim
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > you are over-designing it way too much - a backtrace is obviously
> > very helpful and it must be printed by default. There's enough
> > configurability in it already so that you can turn it off if you
> > want.
>
> So it will hit everybody first be
On Mon, Dec 03, 2007 at 01:28:33PM +0100, Ingo Molnar wrote:
>
> > On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > > no. (that's why i added the '(or a kill -9)' qualification above - if
> > > NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> > > should no
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > no. (that's why i added the '(or a kill -9)' qualification above - if
> > NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> > should not have an interrupting effect.
On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> no. (that's why i added the '(or a kill -9)' qualification above - if
> NFS is mounted noninterruptible then standard signals (such as Ctrl-C)
> should not have an interrupting effect.)
NFS is already interruptible with umount -f (I
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
> >
> > * Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> > >
> > > What should it do when the NFS server doesn't ans
On Mon, Dec 03, 2007 at 11:38:15AM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
> >
> > What should it do when the NFS server doesn't answer anymore or when
> > the network to the SAN RAID arr
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
>
> What should it do when the NFS server doesn't answer anymore or when
> the network to the SAN RAID array located a few hundred KM away
> develops some hiccup? [...]
maybe: if
> Kernel waiting 2 minutes on TASK_UNINTERRUPTIBLE is certainly broken.
What should it do when the NFS server doesn't answer anymore or
when the network to the SAN RAID array located a few hundred KM away develops
some hiccup? Or just the SCSI driver decides to do lengthy error
recovery -- yo
* Radoslaw Szkodzinski <[EMAIL PROTECTED]> wrote:
> > iirc TASK_KILLABLE fixed NFS only. While that's a good thing there
> > are unfortunately a lot more subsystems that would need the same
> > treatment.
>
> Yes, that's exactly why the patch is needed - to find the bugs and fix
> them. Other
On Mon, 3 Dec 2007 10:55:01 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> > On Mon, 3 Dec 2007 01:07:41 +0100
> > Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > This patch will likely work against that by breaking error paths.
>
On Sun, Dec 02, 2007 at 04:59:13PM -0800, Arjan van de Ven wrote:
> On Mon, 3 Dec 2007 01:07:41 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > > We really need to get better diagnostics for the
> > > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > > get to the scenario w
On Mon, 3 Dec 2007 01:07:41 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > We really need to get better diagnostics for the
> > bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to
> > get to the scenario where we have a more or less robust measure of
> > kernel quality (and we're no
> We really need to get better diagnostics for the
> bad-kernel-behavior-that-is-seen-as-bug cases. If we ever want to get
> to the scenario where we have a more or less robust measure of kernel
> quality (and we're not all that far off for several cases), one thing
One measure to kernel quality i
> Delay accounting (or the /proc/<pid>/sched fields that i added recently)
> only get updated once a task has finished its unreasonably long delay
> and has scheduled.
If it is stuck forever then you can just use sysrq-t
If it recovers delay accounting will catch it.
> detected_ this way. This is
On Sun, 2 Dec 2007 21:47:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Out of direct experience, 95% of the "too long delay" cases are
> > plain old bugs. The rest we can (and must!) convert to
> > TASK_KILLABLE or could
>
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
> > something that most humans consider as "buggy" in the overwhelming
> > majority of cases, regardless of the reason? Yes, there are and will
> > be some exceptions, but not nearly
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Until now users had little direct recourse to get such problems
> > fixed. (we had sysrq-t, but that included no real metric of how long
> > a task was
>
> Actually task delay accounting can measure this now. iirc someone had
> a latencytop based o
On Sun, 2 Dec 2007 22:19:25 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > Until now users had little direct recourse to get such problems
> > fixed. (we had sysrq-t, but that included no real metric of how
> > long a task was
>
> Actually task delay accounting can measure this now. iirc
Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_
> something that most humans consider as "buggy" in the overwhelming
> majority of cases, regardless of the reason? Yes, there are and will be
> some exceptions, but not nearly as coun
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> > what if you considered - just for a minute - the possibility of this
> > debug tool being the thing that actually animates developers to fix such
> > long delay bugs that have bothered use
On Sun, Dec 02, 2007 at 10:10:27PM +0100, Ingo Molnar wrote:
> what if you considered - just for a minute - the possibility of this
> debug tool being the thing that actually animates developers to fix such
> long delay bugs that have bothered users for almost a decade meanwhile?
Throwing freque
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > Out of direct experience, 95% of the "too long delay" cases are plain
> > old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
>
> I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs). It
> would be pretty bad to merg
> Out of direct experience, 95% of the "too long delay" cases are plain
> old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could
I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).
It would be pretty bad to merge this patch without converting them to
TASK_KILLA
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> > .. and it's even a tool to show where we missed making something
> > TASK_KILLABLE... anything that triggers from NFS and the like really
> > ought to be TASK_KILLABLE after all. This patch will point any
> > omissions out quite nicely without having
* Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > TASK_KILLABLE should be the right solution i think.
>
> .. and it's even a tool to show where we missed making something
> TASK_KILLABLE... anything that triggers from NFS and the like really
> ought to be TASK_KILLABLE after all. This patch wi
> .. and it's even a tool to show where we missed making something
> TASK_KILLABLE... anything that triggers from NFS and the like really
> ought to be TASK_KILLABLE after all. This patch will point any
> omissions out quite nicely without having to do any kind of destructive
> testing.
It would b
On Sun, 2 Dec 2007 19:59:45 +0100
Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > Ingo Molnar <[EMAIL PROTECTED]> writes:
> >
> > > this patch extends the soft-lockup detector to automatically
> > > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks
* Andi Kleen <[EMAIL PROTECTED]> wrote:
> Ingo Molnar <[EMAIL PROTECTED]> writes:
>
> > this patch extends the soft-lockup detector to automatically
> > detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> > printed the following way:
>
> That will likely trigger anytime a hard nfs/cif
Ingo Molnar <[EMAIL PROTECTED]> writes:
> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
That will likely trigger anytime a hard nfs/cifs mount loses its
server for 120s. To make this work you
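[Archive note] The objection above is easier to follow with a toy model of the check. The sketch below is not the patch's code. It assumes, purely for illustration, that a task counts as hung when it stays in uninterruptible sleep and its context-switch count has not changed for the whole timeout. The Task class, its fields and check_hung are hypothetical names; the 120-second figure is the one used in the thread.

```python
from dataclasses import dataclass

TIMEOUT_SECS = 120  # the figure discussed in the thread


@dataclass
class Task:
    name: str
    pid: int
    uninterruptible: bool
    switch_count: int          # total context switches so far
    last_seen_count: int = -1  # switch count recorded at the previous check
    last_change: float = 0.0   # time at which the count last changed


def check_hung(task, now, timeout=TIMEOUT_SECS):
    """Return True if the task has been blocked for at least `timeout` seconds."""
    if not task.uninterruptible or task.switch_count != task.last_seen_count:
        # The task ran (or is not in D state): restart the clock.
        task.last_seen_count = task.switch_count
        task.last_change = now
        return False
    return now - task.last_change >= timeout
```

In this model the check sees only elapsed time. A task waiting on an unreachable hard-mounted server and a task stuck because of a kernel bug look identical, which is the false-positive concern raised in the message.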
On Sun, 2 Dec 2007, Ingo Oeser wrote:
> > maybe, but we'd have to see how often this gets triggered. An OOM is
> > something that could happen in any overloaded system - while a hung task
> > is likely due to a kernel bug.
>
> What about a client using hard mounted NFS shares here? That shouldn
* Ingo Oeser <[EMAIL PROTECTED]> wrote:
> On Saturday 01 December 2007, Ingo Molnar wrote:
> > maybe, but we'd have to see how often this gets triggered. An OOM is
> > something that could happen in any overloaded system - while a hung task
> > is likely due to a kernel bug.
>
> What about a c
On Saturday 01 December 2007, Ingo Molnar wrote:
> maybe, but we'd have to see how often this gets triggered. An OOM is
> something that could happen in any overloaded system - while a hung task
> is likely due to a kernel bug.
What about a client using hard mounted NFS shares here? That shouldn
* David Rientjes <[EMAIL PROTECTED]> wrote:
> > this patch extends the soft-lockup detector to automatically detect
> > hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are printed the
> > following way:
>
> Wouldn't a natural extension of this feature be to mark these hung
> TASK_UNINTERRUPT
On Sat, 1 Dec 2007, Ingo Molnar wrote:
> this patch extends the soft-lockup detector to automatically
> detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are
> printed the following way:
>
Wouldn't a natural extension of this feature be to mark these hung
TASK_UNINTERRUPTIBLE tasks with a
* David Rientjes <[EMAIL PROTECTED]> wrote:
> The checked auto variable isn't doing anything in
> check_hung_uninterruptible_tasks().
yeah, i used to print it out in a printk but removed it in the final
version.
Ingo
The checked auto variable isn't doing anything in
check_hung_uninterruptible_tasks().
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
kernel/softlockup.c |5 +
1 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/kernel/softlockup.c b/kernel/softlockup.c
--- a/kernel/softl