At Tue, 4 Dec 2012 23:54:39 +0800, Daniel J Blueman wrote: > > On 4 December 2012 23:03, Takashi Iwai <ti...@suse.de> wrote: > > At Tue, 4 Dec 2012 22:46:47 +0800, > > Daniel J Blueman wrote: > >> > >> On 4 December 2012 21:55, Takashi Iwai <ti...@suse.de> wrote: > >> > At Tue, 04 Dec 2012 14:23:05 +0100, > >> > Takashi Iwai wrote: > >> >> > >> >> At Tue, 4 Dec 2012 20:58:55 +0800, > >> >> Daniel J Blueman wrote: > >> >> > > >> >> > On 4 December 2012 01:10, Takashi Iwai <ti...@suse.de> wrote: > >> >> > > At Tue, 4 Dec 2012 00:50:56 +0800, > >> >> > > Daniel J Blueman wrote: > >> >> > >> > >> >> > >> On 4 December 2012 00:23, Takashi Iwai <ti...@suse.de> wrote: > >> >> > >> > At Mon, 3 Dec 2012 23:08:28 +0800, > >> >> > >> > Daniel J Blueman wrote: > >> >> > >> >> > >> >> > >> >> On 3 December 2012 22:40, Takashi Iwai <ti...@suse.de> wrote: > >> >> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800, > >> >> > >> >> > Daniel J Blueman wrote: > >> >> > >> >> >> > >> >> > >> >> >> On 3 December 2012 19:17, Takashi Iwai <ti...@suse.de> wrote: > >> >> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100, > >> >> > >> >> >> > Takashi Iwai wrote: > >> >> > >> >> >> >> > >> >> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800, > >> >> > >> >> >> >> Daniel J Blueman wrote: > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > Hi Seth, Dave, Takashi, > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > If I power down the unused discrete GPU before lightdm > >> >> > >> >> >> >> > starts by > >> >> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, > >> >> > >> >> >> >> > I see a race > >> >> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing > >> >> > >> >> >> >> > out to > >> >> > >> >> >> >> > commands [2]. > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > Adding some debug, I see that the registered audio > >> >> > >> >> >> >> > devices are put > >> >> > >> >> >> >> > into D3 before the GPU is, but it turns out that the > >> >> > >> >> >> >> > discrete (and > >> >> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit > >> >> > >> >> >> >> > later, so the > >> >> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's > >> >> > >> >> >> >> > talking to > >> >> > >> >> >> >> > hardware which is now in D3. > >> >> > >> >> >> >> > > >> >> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for > >> >> > >> >> >> >> > the DGPU HDA > >> >> > >> >> >> >> > controller, but perhaps this should be solved at a > >> >> > >> >> >> >> > higher level in the > >> >> > >> >> >> >> > vgaswitcheroo code; what do you think? > >> >> > >> >> >> >> > >> >> > >> >> >> >> Maybe it's a side effect for the recent effort to fix > >> >> > >> >> >> >> another race in > >> >> > >> >> >> >> the probe. A part of them problem is that the > >> >> > >> >> >> >> registration is done at > >> >> > >> >> >> >> the very last of probing. > >> >> > >> >> >> >> > >> >> > >> >> >> >> Instead of delaying the registration, how about the patch > >> >> > >> >> >> >> below? > >> >> > >> >> >> > > >> >> > >> >> >> > Ping. If this really works, I'd like to queue it for 3.8 > >> >> > >> >> >> > merge, at > >> >> > >> >> >> > least... > >> >> > >> >> >> > >> >> > >> >> >> Ping ack; I was trying to find time to understand another > >> >> > >> >> >> race that > >> >> > >> >> >> occurs with GPU probing after switching, but is separate > >> >> > >> >> >> from the > >> >> > >> >> >> situation before switching, here. > >> >> > >> >> >> > >> >> > >> >> >> In the context of writing the switch, it looks like struct > >> >> > >> >> >> azx isn't > >> >> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; > >> >> > >> >> >> racing with > >> >> > >> >> >> azx_codec_create? > >> >> > >> >> > > >> >> > >> >> > It was allocated, but it wasn't assigned properly in pci > >> >> > >> >> > drvdata. > >> >> > >> >> > > >> >> > >> >> > Below is the revised patch. Just moved pci_set_drvdata() > >> >> > >> >> > before > >> >> > >> >> > register_vga_switcheroo(). Could you retest with it? > >> >> > >> >> > >> >> > >> >> Superb; this addresses the oops. > >> >> > >> > > >> >> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable. > >> >> > >> > > >> >> > >> >> ~1 second after the DGPU is put into D3, I still often see > >> >> > >> >> "hda-intel: > >> >> > >> >> spurious response 0x0:0x0, last cmd=0x170500": > >> >> > >> >> http://quora.org/2012/hda-switch-spurious.txt > >> >> > >> > > >> >> > >> > Hm, it's not clear who triggers these messages. I'll try to > >> >> > >> > check the > >> >> > >> > code paths. > >> >> > >> > > >> >> > >> >> Presumably this implies the read of the ring-buffer pointer > >> >> > >> >> returned > >> >> > >> >> 0xffffffff, so the HDA driver understands the pointer to have > >> >> > >> >> wrapped > >> >> > >> >> and processes the 191 unwritten entries? > >> >> > >> > > >> >> > >> > Good point. Actually there is one bug that looks obviously wrong > >> >> > >> > (writing 32bit value to CORBWP). Maybe it has been working just > >> >> > >> > because writing CORBRP doesn't influence except for the reset > >> >> > >> > bit. > >> >> > >> > > >> >> > >> > Reading CORBWP as a byte is OK, but this could be better in a > >> >> > >> > word so > >> >> > >> > that we can check 0xffff as invalid. > >> >> > >> > > >> >> > >> > A test patch is below. Hopefully this improves the situation... > >> >> > >> > >> >> > >> I'll check this out tomorrow and also instrument the code to get a > >> >> > >> backtrace, since there may still be an underlying race with the > >> >> > >> previous patches: > >> >> > > >> >> > [...] > >> >> > > >> >> > > That's odd. The Cirrus one (0000:00:1b.0) must be independent from > >> >> > > the vga switcheroo things for Nvidia... > >> >> > > > >> >> > > The patch below is the revised patch of the first one. Now the vga > >> >> > > switcheroo registration is done before the video controller D3 > >> >> > > check, > >> >> > > so the race should be smaller. > >> >> > > >> >> > This patch improves things further of course; a major improvement over > >> >> > without. ~15% of the time, I still get the 'spurious response' > >> >> > messages in this callpath: > >> >> > >> >> A possible scenario is that the graphics went in D3 before probing > >> >> hd-audio, and the hd-audio continues to initialize the hardware > >> >> without knowing the graphics counterpart is disabled. > >> >> > >> >> There is a code check_hdmi_disabled() in hda_intel.c that checks the > >> >> availability of the video driver, and it might be that this doesn't > >> >> work as expected... > >> > > >> > I think I understand this path. You are setting "OFF", right? > >> > This will set the audio client OFF before can_switch() is called. > >> > Thus it can be called even before the probing process finished. > >> > > >> > In short, wait_for_completion() must be put at the beginning of > >> > azx_vs_set_state() in addition to azx_vs_can_switch(). > >> > > >> > The revised patch is attached below. Hopefully this sorts out all > >> > races. > >> > >> This works great and instead, I get "Cannot lock devices!" sometimes > >> (see http://quora.org/2012/hda-switch-lock.txt ). > > > > This means that some apps (likely udev, PulseAudio or whatever) are > > accessing sound devices during you turn off the audio / D-GPU. It > > wouldn't be too big trouble, but it's still a bit risky. > > It looks like this prevents suspending, since the waiting process is > sleeping uninterruptably in snd_power_wait (updated > http://quora.org/2012/hda-switch-lock.txt with the details).
Grrr, yet missing piece. The patch below should fix it. Takashi --- From: Takashi Iwai <ti...@suse.de> Subject: [PATCH] ALSA: hda - Avoid doubly suspend after vga switcheroo The HD-audio driver artificially calls the suspend and the resume code path in the VGA switcheroo state changes. When a machine goes to suspend, it tries to suspend the device again, and it stalls at snd_power_wait(). This patch adds checks whether the devices were already in (forced) suspend in PM callbacks for avoiding the doubly suspend. Reported-by: Daniel J Blueman <dan...@quora.org> Signed-off-by: Takashi Iwai <ti...@suse.de> --- sound/pci/hda/hda_intel.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 22ecadc..6a251ec 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -2624,6 +2624,9 @@ static int azx_suspend(struct device *dev) struct azx *chip = card->private_data; struct azx_pcm *p; + if (chip->disabled) + return 0; + snd_power_change_state(card, SNDRV_CTL_POWER_D3hot); azx_clear_irq_pending(chip); list_for_each_entry(p, &chip->pcm_list, list) @@ -2649,6 +2652,9 @@ static int azx_resume(struct device *dev) struct snd_card *card = dev_get_drvdata(dev); struct azx *chip = card->private_data; + if (chip->disabled) + return 0; + pci_set_power_state(pci, PCI_D0); pci_restore_state(pci); if (pci_enable_device(pci) < 0) { -- 1.8.0.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/