At Tue, 04 Dec 2012 14:23:05 +0100,
Takashi Iwai wrote:
> 
> At Tue, 4 Dec 2012 20:58:55 +0800,
> Daniel J Blueman wrote:
> > 
> > On 4 December 2012 01:10, Takashi Iwai <ti...@suse.de> wrote:
> > > At Tue, 4 Dec 2012 00:50:56 +0800,
> > > Daniel J Blueman wrote:
> > >>
> > >> On 4 December 2012 00:23, Takashi Iwai <ti...@suse.de> wrote:
> > >> > At Mon, 3 Dec 2012 23:08:28 +0800,
> > >> > Daniel J Blueman wrote:
> > >> >>
> > >> >> On 3 December 2012 22:40, Takashi Iwai <ti...@suse.de> wrote:
> > >> >> > At Mon, 3 Dec 2012 22:25:52 +0800,
> > >> >> > Daniel J Blueman wrote:
> > >> >> >>
> > >> >> >> On 3 December 2012 19:17, Takashi Iwai <ti...@suse.de> wrote:
> > >> >> >> > At Wed, 28 Nov 2012 09:45:39 +0100,
> > >> >> >> > Takashi Iwai wrote:
> > >> >> >> >>
> > >> >> >> >> At Wed, 28 Nov 2012 11:45:07 +0800,
> > >> >> >> >> Daniel J Blueman wrote:
> > >> >> >> >> >
> > >> >> >> >> > Hi Seth, Dave, Takashi,
> > >> >> >> >> >
> > >> >> >> >> > If I power down the unused discrete GPU before lightdm starts 
> > >> >> >> >> > by
> > >> >> >> >> > fiddling with the sysfs file [1] in the upstart script, I see 
> > >> >> >> >> > a race
> > >> >> >> >> > manifesting as the discrete GPU's HDA controller timing out to
> > >> >> >> >> > commands [2].
> > >> >> >> >> >
> > >> >> >> >> > Adding some debug, I see that the registered audio devices 
> > >> >> >> >> > are put
> > >> >> >> >> > into D3 before the GPU is, but it turns out that the discrete 
> > >> >> >> >> > (and
> > >> >> >> >> > internal) GPU's HDA controller gets registered a bit later, 
> > >> >> >> >> > so the
> > >> >> >> >> > list is empty. The symptom is since the HDA driver it's 
> > >> >> >> >> > talking to
> > >> >> >> >> > hardware which is now in D3.
> > >> >> >> >> >
> > >> >> >> >> > We could add a mutex to nouveau to allow us to wait for the 
> > >> >> >> >> > DGPU HDA
> > >> >> >> >> > controller, but perhaps this should be solved at a higher 
> > >> >> >> >> > level in the
> > >> >> >> >> > vgaswitcheroo code; what do you think?
> > >> >> >> >>
> > >> >> >> >> Maybe it's a side effect for the recent effort to fix another 
> > >> >> >> >> race in
> > >> >> >> >> the probe.  A part of them problem is that the registration is 
> > >> >> >> >> done at
> > >> >> >> >> the very last of probing.
> > >> >> >> >>
> > >> >> >> >> Instead of delaying the registration, how about the patch below?
> > >> >> >> >
> > >> >> >> > Ping.  If this really works, I'd like to queue it for 3.8 merge, 
> > >> >> >> > at
> > >> >> >> > least...
> > >> >> >>
> > >> >> >> Ping ack; I was trying to find time to understand another race that
> > >> >> >> occurs with GPU probing after switching, but is separate from the
> > >> >> >> situation before switching, here.
> > >> >> >>
> > >> >> >> In the context of writing the switch, it looks like struct azx 
> > >> >> >> isn't
> > >> >> >> allocated by the time azx_vs_set_state accesses it [1,2]; racing 
> > >> >> >> with
> > >> >> >> azx_codec_create?
> > >> >> >
> > >> >> > It was allocated, but it wasn't assigned properly in pci drvdata.
> > >> >> >
> > >> >> > Below is the revised patch.  Just moved pci_set_drvdata() before
> > >> >> > register_vga_switcheroo().  Could you retest with it?
> > >> >>
> > >> >> Superb; this addresses the oops.
> > >> >
> > >> > OK, I'll queue it to sound tree for 3.8 kernel with Cc to stable.
> > >> >
> > >> >> ~1 second after the DGPU is put into D3, I still often see "hda-intel:
> > >> >> spurious response 0x0:0x0, last cmd=0x170500":
> > >> >> http://quora.org/2012/hda-switch-spurious.txt
> > >> >
> > >> > Hm, it's not clear who triggers these messages.  I'll try to check the
> > >> > code paths.
> > >> >
> > >> >> Presumably this implies the read of the ring-buffer pointer returned
> > >> >> 0xffffffff, so the HDA driver understands the pointer to have wrapped
> > >> >> and processes the 191 unwritten entries?
> > >> >
> > >> > Good point.  Actually there is one bug that looks obviously wrong
> > >> > (writing 32bit value to CORBWP).  Maybe it has been working just
> > >> > because writing CORBRP doesn't influence except for the reset bit.
> > >> >
> > >> > Reading CORBWP as a byte is OK, but this could be better in a word so
> > >> > that we can check 0xffff as invalid.
> > >> >
> > >> > A test patch is below.  Hopefully this improves the situation...
> > >>
> > >> I'll check this out tomorrow and also instrument the code to get a
> > >> backtrace, since there may still be an underlying race with the
> > >> previous patches:
> > 
> > [...]
> > 
> > > That's odd.  The Cirrus one (0000:00:1b.0) must be independent from
> > > the vga switcheroo things for Nvidia...
> > >
> > > The patch below is the revised patch of the first one.  Now the vga
> > > switcheroo registration is done before the video controller D3 check,
> > > so the race should be smaller.
> > 
> > This patch improves things further of course; a major improvement over
> > without. ~15% of the time, I still get the 'spurious response'
> > messages in this callpath:
> 
> A possible scenario is that the graphics went in D3 before probing
> hd-audio, and the hd-audio continues to initialize the hardware
> without knowing the graphics counterpart is disabled.
> 
> There is a code check_hdmi_disabled() in hda_intel.c that checks the 
> availability of the video driver, and it might be that this doesn't
> work as expected...

I think I understand this path.  You are setting "OFF", right?
This will set the audio client OFF before can_switch() is called.
Thus it can be called even before the probing process finished.

In short, wait_for_completion() must be put at the beginning of
azx_vs_set_state() in addition to azx_vs_can_switch().

The revised patch is attached below.  Hopefully this sorts out all
races.


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 4bb52da..22ecadc 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -49,6 +49,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/clocksource.h>
 #include <linux/time.h>
+#include <linux/completion.h>
 
 #ifdef CONFIG_X86
 /* for snoop control */
@@ -469,6 +470,7 @@ struct azx {
        /* locks */
        spinlock_t reg_lock;
        struct mutex open_mutex;
+       struct completion probe_wait;
 
        /* streams (x num_streams) */
        struct azx_dev *azx_dev;
@@ -2745,6 +2747,7 @@ static void azx_vs_set_state(struct pci_dev *pci,
        struct azx *chip = card->private_data;
        bool disabled;
 
+       wait_for_completion(&chip->probe_wait);
        if (chip->init_failed)
                return;
 
@@ -2790,6 +2793,7 @@ static bool azx_vs_can_switch(struct pci_dev *pci)
        struct snd_card *card = pci_get_drvdata(pci);
        struct azx *chip = card->private_data;
 
+       wait_for_completion(&chip->probe_wait);
        if (chip->init_failed)
                return false;
        if (chip->disabled || !chip->bus)
@@ -2851,6 +2855,9 @@ static int azx_free(struct azx *chip)
 
        azx_notifier_unregister(chip);
 
+       chip->init_failed = 1; /* to be sure */
+       complete(&chip->probe_wait);
+
        if (use_vga_switcheroo(chip)) {
                if (chip->disabled && chip->bus)
                        snd_hda_unlock_devices(chip->bus);
@@ -3156,6 +3163,7 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
        INIT_LIST_HEAD(&chip->pcm_list);
        INIT_LIST_HEAD(&chip->list);
        init_vga_switcheroo(chip);
+       init_completion(&chip->probe_wait);
 
        chip->position_fix[0] = chip->position_fix[1] =
                check_position_fix(chip, position_fix[dev]);
@@ -3183,26 +3191,6 @@ static int __devinit azx_create(struct snd_card *card, 
struct pci_dev *pci,
                }
        }
 
-       if (check_hdmi_disabled(pci)) {
-               snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n",
-                          pci_name(pci));
-               if (use_vga_switcheroo(chip)) {
-                       snd_printk(KERN_INFO SFX "Delaying initialization\n");
-                       chip->disabled = true;
-                       goto ok;
-               }
-               kfree(chip);
-               pci_disable_device(pci);
-               return -ENXIO;
-       }
-
-       err = azx_first_init(chip);
-       if (err < 0) {
-               azx_free(chip);
-               return err;
-       }
-
- ok:
        err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops);
        if (err < 0) {
                snd_printk(KERN_ERR SFX "Error creating device [card]!\n");
@@ -3447,7 +3435,29 @@ static int __devinit azx_probe(struct pci_dev *pci,
        if (err < 0)
                goto out_free;
        card->private_data = chip;
+
+       pci_set_drvdata(pci, card);
+
+       err = register_vga_switcheroo(chip);
+       if (err < 0) {
+               snd_printk(KERN_ERR SFX
+                          "Error registering VGA-switcheroo client\n");
+               goto out_free;
+       }
+
+       if (check_hdmi_disabled(pci)) {
+               snd_printk(KERN_INFO SFX "VGA controller for %s is disabled\n",
+                          pci_name(pci));
+               snd_printk(KERN_INFO SFX "Delaying initialization\n");
+               chip->disabled = true;
+       }
+
        probe_now = !chip->disabled;
+       if (probe_now) {
+               err = azx_first_init(chip);
+               if (err < 0)
+                       goto out_free;
+       }
 
 #ifdef CONFIG_SND_HDA_PATCH_LOADER
        if (patch[dev] && *patch[dev]) {
@@ -3468,23 +3478,16 @@ static int __devinit azx_probe(struct pci_dev *pci,
                        goto out_free;
        }
 
-       pci_set_drvdata(pci, card);
-
        if (pci_dev_run_wake(pci))
                pm_runtime_put_noidle(&pci->dev);
 
-       err = register_vga_switcheroo(chip);
-       if (err < 0) {
-               snd_printk(KERN_ERR SFX
-                          "Error registering VGA-switcheroo client\n");
-               goto out_free;
-       }
-
        dev++;
+       complete(&chip->probe_wait);
        return 0;
 
 out_free:
        snd_card_free(card);
+       pci_set_drvdata(pci, NULL);
        return err;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to