On Sunday, 29 April 2007 10:23, Pavel Machek wrote: > Hi! > > > > > The freezer has *caused* those deadlocks (eg by stopping threads that > > > > were > > > > needed for the suspend writeouts to succeed!), not solved them. > > > > > > I can't remember anything like this, but I believe you have a specific > > > test > > > case in mind. > > > > Ehh.. Why do you thik we _have_ that PF_NOFREEZE thing in the first place? > > > > Rafael, you really don't know what you're talking about, do you? > > > > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > > exactly *because* they do IO. You claim that kernel threads shouldn't do > > IO, but that's the point: if you cannot do IO when snapshotting to disk, > > here's a damn big clue for you: how do you think that snapshot is going to > > get written? > > > > I *guarantee* you that we've had a lot more problems with threads that > > should *not* have been frozen than with those hypothetical threads that > > you think should have been frozen. > > Well, we had nasty corruption on XFS, caused by thread that was not > frozen and should be. (While the other case leads "only" to deadlocks, > so it is easier to debug.) > > The locking point.. when I added freezing to swsusp, I knew very > little about kernel locking, so I "simply" decided to avoid the > problem altogether... using the freezer. > > You may be right that locks are not a big problem for the hibernation > after all; I just do not know.
Still, I think, if a kernel thread is a part of a device driver, then _in_ _principle_ it needs _some_ synchronization with the driver's suspend/freeze and resume/thaw callbacks. For example, it's reasonable to assume that the thread should be quiet between suspend/freeze and resume/thaw. With the freezing of kernel threads we provide a simple means of such synchronization: use try_to_freeze() in a suitable place of your kernel thread and you're done. [Well, there should be a second part for making the thread die if the thaw callback doesn't find the device, but that's in the works.] Without it, there may be race conditions that we are not even aware of and that may trigger in, say, 1 in 10 suspends or so and I wish you good luck with debugging such things. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/