Launchpad has imported 75 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=109681.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2015-12-20T03:39:56+00:00 wengxt wrote:

Wifi looks fine at first after boot, but after hours of use or even
idle, it goes into an unrecoverable state (unable to send or receive)
until the next reboot. dmesg shows many of the following messages:

[ 6822.586704] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6822.586731] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6822.586899] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6822.586904] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6827.925487] mwifiex_pcie 0000:02:00.0: 4296924600 : Tx timeout(#40), bss_type-num = 0-0
[ 6828.584671] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6828.584687] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6828.584755] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6828.584763] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6834.587608] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6834.587623] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6834.587698] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6834.587706] mwifiex_pcie 0000:02:00.0: failed to get signal information
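
To gauge how often the firmware reports this state, the messages can be tallied from a saved dmesg capture. The following is a minimal sketch and not part of the original report: the function name `summarize_dmesg` and the `SAMPLE` text are illustrative assumptions, and the pattern only targets lines shaped like the ones quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 6822.586704] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+mwifiex_pcie\s+\S+:\s+(.*)$")


def summarize_dmesg(text):
    """Count mwifiex_pcie messages; return (counts, first_ts, last_ts)."""
    counts = Counter()
    first = last = None
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an mwifiex_pcie line
        ts, msg = float(m.group(1)), m.group(2).strip()
        counts[msg] += 1
        if first is None:
            first = ts
        last = ts
    return counts, first, last


# Hypothetical sample built from three of the lines quoted above.
SAMPLE = """\
[ 6822.586704] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
[ 6822.586731] mwifiex_pcie 0000:02:00.0: failed to get signal information
[ 6828.584671] mwifiex_pcie 0000:02:00.0: PREP_CMD: FW in reset state
"""

if __name__ == "__main__":
    counts, first, last = summarize_dmesg(SAMPLE)
    for msg, n in counts.most_common():
        print(n, msg)
```

Feeding it the output of `dmesg` saved to a file gives a quick count per message and the time span they cover.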

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/0

------------------------------------------------------------------------
On 2015-12-21T06:39:55+00:00 wengxt wrote:

Created attachment 197921
a more complete log when problem happens.

Attached is a more complete log from when the problem happens.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/1

------------------------------------------------------------------------
On 2015-12-21T06:45:09+00:00 akarwar wrote:

Thanks for reporting the problem. This is a command timeout problem, usually
caused by a firmware bug.
Which firmware is being used here?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/2

------------------------------------------------------------------------
On 2015-12-21T06:47:45+00:00 wengxt wrote:

It's commit bbe4917 from the linux-firmware git repository; I'm not sure
about the actual version.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/3

------------------------------------------------------------------------
On 2015-12-21T06:53:15+00:00 akarwar wrote:

Could you try the following latest firmware?
http://git.marvell.com/?p=mwifiex-firmware.git;a=commit;h=495e89368ea4f59bf6f8e54e94b5e8448fadd541

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/4

------------------------------------------------------------------------
On 2015-12-22T00:14:21+00:00 wengxt wrote:

Created attachment 197961
log with firmware 15.68.7.p53

At first I thought it was fixed, but I just ran into a similar situation.
The log is attached.

BTW, I also seem to have an unstable connection when wifi is working. E.g.
I open google.com in Firefox with Google's instant search feature
enabled. While typing, the results are usually not displayed and I have to
refresh the page several times to get it displayed. The other laptop I
have doesn't show the same problem, so it's not a network issue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/5

------------------------------------------------------------------------
On 2016-01-03T16:27:33+00:00 anton wrote:

I have the same problem with mwifiex_pcie (latest git firmware) on
Surface Pro 3

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/6

------------------------------------------------------------------------
On 2016-01-23T02:25:07+00:00 wengxt wrote:

Created attachment 201031
A more complete log with debug flag set to 0xffffff

4.3.3 with firmware 15.68.7.p53

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/7

------------------------------------------------------------------------
On 2016-01-25T22:42:37+00:00 mikael wrote:

I have the same problem. In addition, using mwifiex_pcie on Surface Pro
4 seems to cause intermittent system crashes.

Using kernel 4.3.3-5 compiled from linux-source.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/8

------------------------------------------------------------------------
On 2016-01-26T12:39:25+00:00 andrea wrote:

I have the same problem on my Microsoft Surface 3 (not Pro model), same issues 
(even though the rest of system is stable) and same dmesg printout.
Lspci lists the card on my tablet as: Marvell Technology Group Ltd. 88W8897 
[AVASTAR] 802.11ac Wireless.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/9

------------------------------------------------------------------------
On 2016-01-26T12:41:12+00:00 andrea wrote:

I have also noticed that the issues appear as soon as I start using the
wireless card heavily (downloading files, apt update, etc.). As long as
I just use ping, everything is fine.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/10

------------------------------------------------------------------------
On 2016-03-14T21:25:40+00:00 wengxt wrote:

It becomes much worse in Linux 4.5 compared with 4.4.2.

Is there any update on this issue?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/11

------------------------------------------------------------------------
On 2016-03-15T16:35:20+00:00 anton wrote:

Created attachment 209311
attachment-2572-0.html

The firmware was updated recently:

From git://git.marvell.com/mwifiex-firmware
   e92f8b3..a96efa0  master     -> origin/master
Updating e92f8b3..a96efa0
Fast-forward
 mrvl/pcie8897_uapsta.bin | Bin 803884 -> 816772 bytes
 mrvl/sd8897_uapsta.bin   | Bin 780760 -> 794148 bytes


Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/12

------------------------------------------------------------------------
On 2016-03-16T03:52:19+00:00 wengxt wrote:

Actually I'm aware of that.

However, with the latest firmware + Linux 4.5, my system simply freezes
within a few minutes, so I can't get any useful information for this bug.

With 15.68.7.p53 my kernel survives longer, but with kernel 4.5, wifi
also enters the bad state quite quickly compared with 4.4.2 or 4.3.3.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/13

------------------------------------------------------------------------
On 2016-03-16T03:55:05+00:00 wengxt wrote:

The only thing I notice so far is that:

With latest firmware + linux 4.3.3, I notice a lot of "mwifiex_pcie
0000:02:00.0: mwifiex_process_sleep_confirm_resp: cmd size is 0" when
wifi enters bad state, which doesn't happen with 15.68.7.p53.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/14

------------------------------------------------------------------------
On 2016-03-16T06:05:09+00:00 akarwar wrote:

Hi Weng,

Can you provide a complete log for the issue mentioned in comment #14?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/15

------------------------------------------------------------------------
On 2016-03-16T09:35:54+00:00 anton wrote:

Created attachment 209441
attachment-7733-0.html

My system also freezes with the 4.5 kernel, with both the latest and the
previous firmware versions.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/16

------------------------------------------------------------------------
On 2016-03-24T17:29:41+00:00 wengxt wrote:

Created attachment 210651
kernel 4.5 log with debug mask 0xfffffff

The complete log is too large (286 MB), so I cut it down to the last minute
before the "PREP_CMD: FW is in bad state" message appears.

mwifiex_pcie 0000:02:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p66)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/17

------------------------------------------------------------------------
On 2016-04-04T19:17:14+00:00 jwhite wrote:

I'm experiencing a variety of problems with this driver as well.

Using Linux v4.6-rc1-64-g6ddf37d, firmware 15.68.7.p66.

The first failure is an inability to connect.  This is sporadic; it seems to 
happen about 1 time in 4 boots, but one time I had it three or four times in a 
row.  I will attach a dmesg log with debug increased part way through.  It 
seems as though this may be similar to the problem reported here:
  https://www.mail-archive.com/linux-wireless@vger.kernel.org/msg19911.html

The second failure is horrific performance, accompanied by constantly
repeated mwifiex_process_sleep_confirm_resp: cmd size is 0 messages.
The user experience is harder to quantify; you get absolutely terrible
network performance (7-10% packet loss), and general instability.  I'll
attach a dmesg log for that as well, again with debug increased part way
through.  This is similar to what Weng Xuetian reported in Comment 14.

Finally, as I use this device, I find one consistent and persistent
oddity.  That is, even when everything is 'working', I get poor latency
connecting to the device.  That is, my latency when I ping 'out' is fine
(< 10 ms, not much variation).  But when I ping the device from a
different system, I get highly variable latency.   (Ranging from 10 ms
to 4000 ms).  No dropped packets, but horrible latency.  You really
notice it when you're ssh'd in :-/.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/18

------------------------------------------------------------------------
On 2016-04-04T19:17:51+00:00 jwhite wrote:

Created attachment 211711
Log of the failure to connect.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/19

------------------------------------------------------------------------
On 2016-04-04T19:18:46+00:00 jwhite wrote:

Created attachment 211721
Log of the cmd size 0 failure mode.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/20

------------------------------------------------------------------------
On 2016-04-13T14:44:56+00:00 jwhite wrote:

I've spent some time looking at the source code and trying to debug what
is happening.  I'm using the kvalo wireless-drivers-next pending branch,
and I've applied recent patches sent to lkml that seem valuable; but the
problems persist.

It seems as though in the 'normal' case we get an unusual number of 'max
count reached while accessing sleep cookie' messages.  The usleep calls
in that case make me suspicious, and being slow to wake up would seem to
be a good explanation of the performance we see.  (That is, sending a
packet is fast, but responding is slow).  It also seems like we're in a
rather startling loop of awake/sleep events; doing multiple spins within
a millisecond, as far as I can tell.

In the disconnect case, I see an 'invalid cmd resp' error followed by
'There is no command but got a command response'.

I have now started booting with debug_mask=0xffffffff; I'll replace my
logs with those, as they appear to be more complete.

For reference, here are related areas of the code:

Sleep message:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next/+/pending/drivers/net/wireless/marvell/mwifiex/pcie.c#377

There is no command:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next/+/pending/drivers/net/wireless/marvell/mwifiex/pcie.c#1637
(although intriguingly to my naive mind, that code block is related to the 
sleep check as well).

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/21

------------------------------------------------------------------------
On 2016-04-13T14:45:42+00:00 jwhite wrote:

Created attachment 212611
Log of a failure to connect, with debug mask set from boot.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/22

------------------------------------------------------------------------
On 2016-04-13T14:47:18+00:00 jwhite wrote:

Created attachment 212621
Log of a 'working' session, which has poor latency.  debug_mask ffffffff from 
start.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/23

------------------------------------------------------------------------
On 2016-04-13T15:44:26+00:00 jwhite wrote:

Disabling power management (via a Network Manager dispatcher.d script)
seems to help the first problem (latency issues).  But it leaves the
sporadic failures to connect.
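
The dispatcher script itself is not attached to this report. As a rough sketch of what such a script might look like, assuming NetworkManager's convention of passing the interface name and the action as the first two arguments, and assuming the `iw` utility is installed (the file name `99-wifi-powersave-off` and the helper `build_command` are hypothetical):

```python
#!/usr/bin/env python3
# Hypothetical /etc/NetworkManager/dispatcher.d/99-wifi-powersave-off
# Assumption: NetworkManager calls this as "<script> <interface> <action>".
import subprocess
import sys


def build_command(interface, action):
    """Return the iw command for this event, or None to do nothing."""
    if action != "up" or not interface:
        return None
    # `iw dev <iface> set power_save off` turns off 802.11 power save.
    return ["iw", "dev", interface, "set", "power_save", "off"]


def main(argv):
    if len(argv) < 3:
        return 0  # not called by the dispatcher; nothing to do
    cmd = build_command(argv[1], argv[2])
    if cmd is None:
        return 0
    # Assumption: the script runs as root, as dispatcher scripts normally do.
    return subprocess.call(cmd)


if __name__ == "__main__":
    main(sys.argv)
```

On a non-wireless interface the `iw` call simply fails, which is harmless here. Whether power save is really off afterwards can be checked with `iw dev <iface> get power_save`.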

A hack to prevent power management from ever starting seems to resolve
those failures as well, although it's hard to claim that as true; it
could be I was simply not patient enough to reboot enough times to
reproduce the failure.

ping latency and dropped packets can still suffer, so there is clearly
something else still wrong, but I'm hoping this is a clue to help
understand the issue.

I'll attach the hack.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/24

------------------------------------------------------------------------
On 2016-04-13T15:45:07+00:00 jwhite wrote:

Created attachment 212631
Hack to prevent power saving from ever starting

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/25

------------------------------------------------------------------------
On 2016-04-25T15:16:03+00:00 akarwar wrote:

Created attachment 214061
Debug patch for sleep issue

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/26

------------------------------------------------------------------------
On 2016-04-25T15:16:38+00:00 akarwar wrote:

We could not reproduce the error messages below on our reference platform.

[   16.506296] mwifiex_pcie 0000:02:00.0: mwifiex_process_sleep_confirm_resp: cmd size is 0
[   16.506513] mwifiex_pcie 0000:02:00.0: max count reached while accessing sleep cookie

Could you apply the attached debug patch and share the dmesg log for the issue?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/27

------------------------------------------------------------------------
On 2016-04-25T16:05:36+00:00 jwhite wrote:

Created attachment 214081
Log with new debug patch; this is a case where we cannot connect.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/28

------------------------------------------------------------------------
On 2016-04-25T16:07:20+00:00 jwhite wrote:

Created attachment 214091
Case where we get just a few slow pings, but it otherwise works.

Interestingly, the constant problem I was having (most packets delayed)
seems to have softened, for no discernable reason.  But I do still see a
few slow packets (e.g. replies in 80-300 ms range), and in the log, I do
see a few of the 'max count reached while accessing sleep cookie'
messages.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/29

------------------------------------------------------------------------
On 2016-04-25T18:47:21+00:00 akarwar wrote:

Created attachment 214141
debug sleep issue changes

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/30

------------------------------------------------------------------------
On 2016-04-25T18:48:27+00:00 akarwar wrote:

Please try the attached patch. It includes a fix and debug messages.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/31

------------------------------------------------------------------------
On 2016-04-25T19:41:36+00:00 jwhite wrote:

Created attachment 214161
Run with dbg3

This does seem to fix the lack of connection, although again, it's tough
to claim that for sure.  But I did get 8 or 9 clean connections.

I do still get hangs at reboot and shutdown sporadically; I don't have a
strong sense of whether my hack eliminated those or not.  (And they seem
to happen after file systems are unmounted so I don't have an easy way
to capture them.)

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/32

------------------------------------------------------------------------
On 2016-04-26T15:01:38+00:00 akarwar wrote:

Thanks for the tests.
Please share the logs for the observed issues.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/33

------------------------------------------------------------------------
On 2016-04-26T17:14:08+00:00 duyhieu.bui wrote:

Created attachment 214431
mwifiex on Surface Pro 3

I have a Surface Pro 3 and I had the same problems. Yesterday I started
booting my system with the debug flag set for the mwifiex module on the
wireless-next kernel. I have some logs and I hope that they will help you
find the problems.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/34

------------------------------------------------------------------------
On 2016-04-27T02:39:00+00:00 jwhite wrote:

My apologies; I'm traveling for work, and am away from my normal work
environment, which is making it harder for me to be thorough and
responsive.  However, what I can tell with the tablet and my laptop is
that the DBG3 patch seems to improve matters, but it does not appear to
be a fix.

I spent some time rebooting the tablet again and again.  I eventually
was able to get a failure to connect.  That was without the
debug_mask=0xffffffff; but I will attach that log regardless.  I will
try to reproduce it with debug on.

I also get frequent kernel panics at reboot or shutdown.  They seem to
come after the file system is unmounted; I can't seem to find the logs
in /var/log or using journalctl -k -b -1.  Forgive me if I am forgetting
some kernel dev 101 for capturing those properly.

I will try to attach a photograph of one of the kernel panics; the stack
trace, at least, appears to be legible.

I do not seem to get the kernel panics on reboot with a 4.4 kernel using
my hack.

Full disclosure:  I do not have a 'pure' 4.6 tree; I'm using the
wireless-next tree, and I have one or two of the interesting mwifiex
patches from lkml applied.  I will also try to reproduce these tests
with a pure 4.6 tree.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/35

------------------------------------------------------------------------
On 2016-04-27T02:39:25+00:00 jwhite wrote:

Created attachment 214471
dmesg log of a failure to connect

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/36

------------------------------------------------------------------------
On 2016-04-27T02:39:51+00:00 jwhite wrote:

Created attachment 214481
Screenshot of sample kernel panic with dbg3

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/37

------------------------------------------------------------------------
On 2016-04-27T23:51:58+00:00 duyhieu.bui wrote:

Created attachment 214581
FW in bad state

With the wireless-next kernel plus the power saving patch and the debug
sleep patch, I sometimes get this error very frequently. The driver says the
firmware is in a bad state.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/38

------------------------------------------------------------------------
On 2016-04-29T05:57:01+00:00 duyhieu.bui wrote:

Created attachment 214681
Multiple "FW in bad state"

I got multiple "FW in bad state" errors with the latest wifi-next
kernel.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/39

------------------------------------------------------------------------
On 2016-05-04T01:32:33+00:00 fc wrote:

I am also experiencing `PREP_CMD: FW in reset state` from mwifiex_pcie
on a Surface Book, often after long idle periods.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/40

------------------------------------------------------------------------
On 2016-05-09T16:33:27+00:00 jwhite wrote:

Studying some of the logs, and the logic change in DBG3, a potential clue is 
that in pcie.c, around line 1610, there is logic something like:
  skb_trim(skb, rx_len)
  skb_pull(skb, INTF_HEADER_LEN)
  if (...) {
    skb_push(skb, INTF_HEADER_LEN)
  }

It appears logical that the skb_push and skb_pull are meant to balance
each other.  However, I see from studying logs that we get cases
where rx_len is 0.  In that case, the skb_pull will be a NOP, and the
skb_push will move the data pointer, potentially incorrectly.

I have one skb_under_crash that seems likely to be related to that.
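
The pointer arithmetic behind that concern can be illustrated with a toy model. This is a sketch only, not the driver code: the `Skb` class, the `rx_path` helper and the headroom and length values are made up for illustration, `INTF_HEADER_LEN = 4` is an assumed value, and the `if (...)` branch is assumed to be taken. The model relies on skb_pull() refusing to pull more than the current length and on skb_push() always moving the data pointer back.

```python
INTF_HEADER_LEN = 4  # assumed value, for illustration only


class Skb:
    """Toy model of an sk_buff: offsets into one buffer, no real data."""

    def __init__(self, headroom, length):
        self.head = 0
        self.data = headroom
        self.len = length

    def trim(self, new_len):
        # Like skb_trim(): only ever shortens the buffer.
        if self.len > new_len:
            self.len = new_len

    def pull(self, n):
        # Like skb_pull(): a no-op when n exceeds the current length.
        if n > self.len:
            return False
        self.data += n
        self.len -= n
        return True

    def push(self, n):
        # Like skb_push(): always moves data back; the kernel panics
        # if data would fall below head.
        self.data -= n
        self.len += n
        if self.data < self.head:
            raise RuntimeError("skb underrun: data below head")


def rx_path(skb, rx_len):
    """Replay trim/pull/push and return how far skb.data drifted."""
    start = skb.data
    skb.trim(rx_len)
    skb.pull(INTF_HEADER_LEN)
    skb.push(INTF_HEADER_LEN)
    return skb.data - start


if __name__ == "__main__":
    print(rx_path(Skb(headroom=64, length=100), rx_len=12))  # balanced
    print(rx_path(Skb(headroom=64, length=100), rx_len=0))   # drifts back
```

With a non-zero rx_len the pull and push cancel out. With rx_len of 0 the pull is skipped and the push alone moves the data pointer back by the header length, so repeated occurrences on a reused buffer would eventually run below the head.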

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/41

------------------------------------------------------------------------
On 2016-05-10T21:48:44+00:00 jwhite wrote:

I have a hypothesis as to what I am seeing.

That is, with a fair bit of debug logging, I found that two interrupts in a row
could arrive, even if the handler for the first one sent a mask of 0 to the
PCIE_HOST_INT_MASK port.  That combined with a unique situation involving the
while loop in mwifiex_process_pcie_int().  That is, it seems that the flow can
be:

1) Interrupt #1 arrives (sets int_status).
2) We start processing interrupt #1, beginning by clearing int_status.  While
   processing, interrupt #2 arrives, once again setting int_status.
3) After interrupt #1 is processed, the while loop manually checks the
   interrupt status on the card and reloops.  That causes it to fully process
   the second interrupt *without clearing int_status*.
4) Now we re-enter, see that int_status is set to 0x4, and try to process a
   command that is no longer there.  That is a 0 length command, and we get
   into a lot of trouble.

I have found that if I comment out the recheck (and effectively disable the 
while loop) here:
  
https://kernel.googlesource.com/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next/+/pending/drivers/net/wireless/marvell/mwifiex/pcie.c#2238

I get stable results.  It's not clear if that's a correct fix.  And,
frankly, my results are sporadic enough that I'm going to need some time
before I believe it's a real result.  But I thought I'd share what seems
like a promising lead.
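
The suspected interleaving can be replayed in a small model. This is a sketch of the hypothesis only, not the driver code: the `run` function and its state dictionary are hypothetical, and it assumes that both the stray interrupt handler and the recheck at the bottom of the loop observe the same command-response bit.

```python
EVENT, CMD_RESP = 0x8, 0x4  # bit values quoted in this thread


def run(recheck_enabled):
    """Replay the suspected interleaving; return the number of
    command responses processed when none was buffered."""
    state = {"card": 0, "int_status": 0, "cmd_buf": 0, "zero_byte": 0}

    def serve(bits):
        # Serving a command-response bit consumes one buffered response.
        if bits & CMD_RESP:
            if state["cmd_buf"] > 0:
                state["cmd_buf"] -= 1
            else:
                state["zero_byte"] += 1  # nothing there: "cmd size is 0"

    def process(stray_irq):
        pending, state["int_status"] = state["int_status"], 0
        while pending:
            serve(pending)
            if stray_irq:
                stray_irq = False
                # The card posts a command response while we are busy...
                state["card"] |= CMD_RESP
                state["cmd_buf"] += 1
                # ...and the handler runs although interrupts should be
                # masked, stashing the bit in int_status (assumption).
                state["int_status"] |= CMD_RESP
                if not recheck_enabled:
                    state["card"] = 0  # only the handler reads and clears
            if not recheck_enabled:
                break
            # Recheck: read and clear the card register, loop on the result.
            pending, state["card"] = state["card"], 0

    # Interrupt #1: an event; the handler latches it into int_status.
    state["int_status"] |= EVENT
    process(stray_irq=True)
    # Re-entry: int_status may still be set from the stray interrupt.
    if state["int_status"]:
        process(stray_irq=False)
    return state["zero_byte"]


if __name__ == "__main__":
    print("with recheck:", run(True))
    print("without recheck:", run(False))
```

In this model the recheck path serves the command response once, and the stale bit left in int_status makes the next pass serve it again with nothing buffered. With the recheck disabled, the response is served exactly once.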

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/42

------------------------------------------------------------------------
On 2016-05-11T14:20:48+00:00 jwhite wrote:

Created attachment 215931
Diff that shows problem, and has current working solution

The attached diff is the patch set I was running with (against
wireless-drivers-next/master; HEAD is e1ca790c8a32c0c77b9d89089ac7e73b72c2adfc).

Parts of it are the hack to prevent power saving.  Parts of it are debug
messages that seem to exacerbate the problem.  That is, with these
messages in, I get a fairly reproducible problem set.

Finally, there is an #ifdef, JPW_EXPERIMENT_1, that 'fixes' the problems
for me.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/43

------------------------------------------------------------------------
On 2016-05-11T18:50:12+00:00 viz wrote:

I have tried the patch from Jeremy White (but without disabling the
power-saving feature) on Linux 4.5.3 (Gentoo Source and the latest firmware,
15.68.7.p66) and it has run for several hours without any problem, thanks.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/44

------------------------------------------------------------------------
On 2016-05-12T10:38:41+00:00 fc wrote:

Thanks Jeremy, I applied the patch on top of tigerite's kernel
(https://launchpad.net/~tigerite/+archive/ubuntu/kernel) on top of
4.4.6-3-surface. I will run it for a while on my Surface Book and see if
that solves the problem.

It's here in case someone wants to save some time:
https://github.com/EiNSTeiN-/mwifiex

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/45

------------------------------------------------------------------------
On 2016-05-12T11:28:39+00:00 akarwar wrote:

Hi Jeremy,

I got the race condition you described in comment #42. It’s not expected
as per our design. Here is the flow.

1) Interrupt is received over PCIe interface
2) The interrupt handler does the following:
    a) Read and clear the interrupt register, and copy the interrupt bitmap to
       "adapter->int_status".
    b) Disable interrupts and exit.

3) Another thread calls mwifiex_process_int_status() to serve the
interrupts in a while loop.

4) At the end of loop, we read interrupt register for any further
interrupts and serve them.

5) We exit from while loop if there are no more interrupts.

6) Interrupts are enabled while exiting mwifiex_process_int_status()

Basically we don't expect the interrupt handler to be called while serving
the interrupts, because interrupts are disabled at that time.

It looks like the mwifiex_pcie_disable_host_int() call in
mwifiex_interrupt_status() didn't take effect for some reason.

Can you confirm, by adding a debug message, whether you receive an interrupt
while interrupts are disabled?
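
The kind of check being requested can be sketched as a small state tracker. This is an illustration only: `HostIntTracker` is a hypothetical stand-in for driver state, not a real mwifiex structure, and the timestamps in the usage lines are made up.

```python
class HostIntTracker:
    """Toy stand-in for driver state; not the real mwifiex structures."""

    def __init__(self):
        self.masked = False    # what the driver believes about the mask
        self.unexpected = []   # interrupts seen while supposedly masked

    def disable_host_int(self):
        self.masked = True

    def enable_host_int(self):
        self.masked = False

    def interrupt(self, when, bits):
        """Call on every interrupt; True if it arrived while masked."""
        if self.masked:
            # In a driver this is where the debug message would go.
            self.unexpected.append((when, bits))
            return True
        return False


tracker = HostIntTracker()
tracker.interrupt(1, 0x8)     # normal interrupt, mask is open
tracker.disable_host_int()    # handler masks further interrupts
tracker.interrupt(2, 0x4)     # should be impossible if masking works
tracker.enable_host_int()
print(tracker.unexpected)
```

Any entry in `unexpected` corresponds to an interrupt delivered while the driver believed interrupts were disabled, which is the condition the debug message is meant to confirm or rule out.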

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/46

------------------------------------------------------------------------
On 2016-05-12T11:35:29+00:00 akarwar wrote:

The fix from Jeremy is ok and doesn't have any side-effects. It removes
the code which was added for improving the performance.

I am curious to know why an interrupt is received even after disabling it,
which caused the race condition.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/47

------------------------------------------------------------------------
On 2016-05-12T13:31:19+00:00 jwhite wrote:

Created attachment 216101
Log of failing session

The attached log should reflect the wireless next tree, with the patch I
provided applied.  (Except that I've left the bottom part of the while
loop enabled, so as to reproduce the failure).

I believe the problem case starts at time index 30.893034.

(I had instrumented the 0 byte command failure; so I generally search
for '0 byte cmd' and then look backwards to hunt for the reason we got a
0 byte command).

Again, my sense of the events is:
  30.893034 Event (0x00000008) interrupt arrives, we disable
  30.893300 We detect a command response, reloop, and start processing
  30.893300 Command (0x00000004) interrupt arrives, we stash it into
            adapter->int_status
  30.893314 We re-enable (but too late; damage is done)
  30.893374 We 'pop' the already processed command from adapter->int_status,
            and think it's a 0 byte cmd response
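
For a sense of scale, the gaps between those time indexes can be computed directly. A small sketch: the `TIMELINE` list just restates the entries above, and `offsets_us` is a hypothetical helper name.

```python
from decimal import Decimal

# Time indexes and events as listed above.
TIMELINE = [
    ("30.893034", "event (0x00000008) interrupt arrives, we disable"),
    ("30.893300", "command response detected, reloop"),
    ("30.893300", "command (0x00000004) interrupt stashed in int_status"),
    ("30.893314", "interrupts re-enabled"),
    ("30.893374", "stale command popped, seen as 0 byte cmd response"),
]


def offsets_us(timeline):
    """Microseconds elapsed since the first entry, per entry."""
    t0 = Decimal(timeline[0][0])
    return [int((Decimal(ts) - t0) * 1_000_000) for ts, _ in timeline]


if __name__ == "__main__":
    for us, (_, what) in zip(offsets_us(TIMELINE), TIMELINE):
        print(f"+{us:4d} us  {what}")
```

Decimal is used instead of float so the microsecond differences come out exact; the whole sequence spans 340 microseconds.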

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/48

------------------------------------------------------------------------
On 2016-05-12T17:08:47+00:00 anton wrote:

Created attachment 216161
attachment-8673-0.html

I have applied the patch above to the 4.6-rc7 kernel on my Surface Pro 3.
After some time of usage I am unable to turn off wifi or change the AP.
The dmesg log:

[ 8155.017776] mwifiex_pcie 0000:01:00.0: disable_host_int
[ 8155.017781] mwifiex_pcie 0000:01:00.0: jpw:
mwifiex_pcie_process_cmd_complete skb ffff880244ee1700, skb->len 12,
skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a980,
skb->tail 76, skb->end 2432
[ 8155.017784] mwifiex_pcie 0000:01:00.0: enable_host_int
[ 8155.017789] mwifiex_pcie 0000:01:00.0: jpw: mwifiex_pcie_cmdrsp_complete
skb ffff880244ee1700, skb->len 8, skb->data_len 0, skb->head
ffff880242b2a940, skb->data ffff880242b2a984, skb->tail 76, skb->end 2432
[ 8155.017867] mwifiex_pcie 0000:01:00.0: disable_host_int
[ 8155.017873] mwifiex_pcie 0000:01:00.0: jpw:
mwifiex_pcie_process_cmd_complete skb ffff880244ee1700, skb->len 12,
skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a980,
skb->tail 76, skb->end 2432
[ 8155.017875] mwifiex_pcie 0000:01:00.0: enable_host_int
[ 8155.017879] mwifiex_pcie 0000:01:00.0: jpw: mwifiex_pcie_cmdrsp_complete
skb ffff880244ee1700, skb->len 8, skb->data_len 0, skb->head
ffff880242b2a940, skb->data ffff880242b2a984, skb->tail 76, skb->end 2432


Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/49

------------------------------------------------------------------------
On 2016-05-14T19:36:02+00:00 anton wrote:

Created attachment 216261
attachment-11994-0.html

Well, I have tested it for 3 days and it seems to work fine

2016-05-12 20:08 GMT+03:00 Anton Anikin <an...@anikin.name>:

> I have applied the patch above to the 4.6-rc7 kernel on my Surface Pro 3.
> After some time of usage I am unable to turn of wifi or change the AP.
> The dmesg log:
>
> [ 8155.017776] mwifiex_pcie 0000:01:00.0: disable_host_int
> [ 8155.017781] mwifiex_pcie 0000:01:00.0: jpw:
> mwifiex_pcie_process_cmd_complete skb ffff880244ee1700, skb->len 12,
> skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a980,
> skb->tail 76, skb->end 2432
> [ 8155.017784] mwifiex_pcie 0000:01:00.0: enable_host_int
> [ 8155.017789] mwifiex_pcie 0000:01:00.0: jpw:
> mwifiex_pcie_cmdrsp_complete skb ffff880244ee1700, skb->len 8,
> skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a984,
> skb->tail 76, skb->end 2432
> [ 8155.017867] mwifiex_pcie 0000:01:00.0: disable_host_int
> [ 8155.017873] mwifiex_pcie 0000:01:00.0: jpw:
> mwifiex_pcie_process_cmd_complete skb ffff880244ee1700, skb->len 12,
> skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a980,
> skb->tail 76, skb->end 2432
> [ 8155.017875] mwifiex_pcie 0000:01:00.0: enable_host_int
> [ 8155.017879] mwifiex_pcie 0000:01:00.0: jpw:
> mwifiex_pcie_cmdrsp_complete skb ffff880244ee1700, skb->len 8,
> skb->data_len 0, skb->head ffff880242b2a940, skb->data ffff880242b2a984,
> skb->tail 76, skb->end 2432
>
>
> 2016-05-12 16:31 GMT+03:00 <bugzilla-dae...@bugzilla.kernel.org>:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=109681
>>
>> --- Comment #48 from Jeremy White <jwh...@codeweavers.com> ---
>> Created attachment 216101
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=216101&action=edit
>> Log of failing session
>>
>> The attached log should reflect the wireless next tree, with the patch I
>> provided applied.  (Except that I've left the bottom part of the while loop
>> enabled, so as to reproduce the failure).
>>
>> I believe the problem case starts at time index 30.893034.
>>
>> (I had instrumented the 0 byte command failure; so I generally search for
>> '0 byte cmd' and then look backwards to hunt for the reason we got a 0 byte
>> command).
>>
>> Again, my sense of the events is:
>>   30.893034 Event (0x00000008) interrupt arrives, we disable
>>   30.893300 We detect a command response, reloop, and start processing
>>   30.893300 Command (0x00000004) interrupt arrives, we stash it into
>>             adapter->int_status
>>   30.893314 We re-enable (but too late; damage is done)
>>   30.893374 We 'pop' the already processed command from adapter->int_status,
>>             and think it's a 0 byte cmd response
>>

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/50

------------------------------------------------------------------------
On 2016-05-17T16:31:45+00:00 fc wrote:

I have been running a version of the patch with the power management
changes removed (i.e. power management turned on) to see if it made a
difference. I had several instances of unusable wifi (until next reboot)
in 2 days. So disabling power management definitely seems to be part of
the solution.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/51

------------------------------------------------------------------------
On 2016-05-29T17:46:17+00:00 david.lopez wrote:

Created attachment 218101
Trace error Surface Pro 3 (kernel 4.6, Jeremy White's patch, archlinux)

I use Arch Linux on a Surface Pro 3, which I think has the same wifi card
as the SP4. I use the package linux-surfacepro3 4.6-1
(https://aur.archlinux.org/packages/linux-surfacepro3/), which includes
kernel 4.6 and Jeremy White's patch. Although wifi works, I receive an
error message in dmesg which didn't appear in previous versions of the
package.

I don't know if this error message could be of interest.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/52

------------------------------------------------------------------------
On 2016-06-03T15:50:32+00:00 jwhite wrote:

(In reply to David López from comment #52)
> Created attachment 218101 [details]
> Trace error Surface Pro 3 (kernel 4.6, Jeremy White's patch, archlinux)
> 
> I use archlinux in a Surface Pro 3, I think is the same wifi card as SP4. I
> use the package linux-surfacepro3 4.6-1
> (https://aur.archlinux.org/packages/linux-surfacepro3/), which includes
> kernel 4.6 and Jeremy White's patch. Although wifi works, I receive an error
> message in dmesg which didn't appear in previous versions of the package.
> 
> I don't know if this error message could be of interest.

That's not related or (as far as I can tell) especially harmful.  If you
look at the referenced line of code (net/wireless/core.c, line 363) you
see it's a message generated because we have a set_tx_power function
without a get_tx_power function.  (The second message is the same; but
for get_antenna).  I suspect you're seeing it because you've updated
your driver and/or firmware.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/53

------------------------------------------------------------------------
On 2016-06-17T19:58:02+00:00 stephenjust wrote:

There is some new firmware available from Marvell [1]. So far, I've had
no issues with it, but I have only tried it for a few days. Perhaps this
new firmware makes any patch unnecessary?

I've actually managed to dig up some release notes in ChromiumOS's
commit logs that suggest the newer firmware may help with stability [2].

[1] http://git.marvell.com/?p=mwifiex-firmware.git;a=tree;f=mrvl;hb=HEAD
[2] https://chromium.googlesource.com/chromiumos/third_party/marvell/

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/54

------------------------------------------------------------------------
On 2016-07-12T21:59:26+00:00 cfeck wrote:

The Marvell firmware from link [1] in (previous) comment #54 does not
help avoid the random disconnects on my Surface Pro 3 with openSUSE
Tumbleweed kernel 4.6.3. Not sure if this kernel carries any relevant
patches compared to upstream.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/55

------------------------------------------------------------------------
On 2016-07-15T13:42:09+00:00 akarwar wrote:

I have just submitted patches to resolve the original issue reported in
this bug, i.e. the error messages below followed by a command timeout.

[   11.522123] mwifiex_pcie 0000:01:00.0: CMD_RESP: invalid cmd resp
[   11.680412] mwifiex_pcie 0000:01:00.0: There is no command but got cmdrsp

https://patchwork.kernel.org/patch/9232091/
https://patchwork.kernel.org/patch/9232093/

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/56

------------------------------------------------------------------------
On 2016-08-02T02:40:20+00:00 duyhieu.bui wrote:

(In reply to Amitkumar Karwar from comment #56)
> I have just submitted patches to resolve original issue reported in this
> bug. i.e. below error messages followed by command timeout.
> 
> [   11.522123] mwifiex_pcie 0000:01:00.0: CMD_RESP: invalid cmd resp
> [   11.680412] mwifiex_pcie 0000:01:00.0: There is no command but got cmdrsp
> 
> https://patchwork.kernel.org/patch/9232091/
> https://patchwork.kernel.org/patch/9232093/

It seems more stable but the 'CMD_RESP' problem persists on my system
(Surface Pro 3 running Linux 4.7.0).

[    7.720645] mwifiex_pcie 0000:01:00.0: WLAN FW is active
[    7.808716] mwifiex_pcie 0000:01:00.0: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p77)
[    7.808721] mwifiex_pcie 0000:01:00.0: driver_version = mwifiex 1.0 (15.68.7.p77)
[    7.812614] mwifiex_pcie 0000:01:00.0 eth0: renamed from mlan0
[    9.696401] mwifiex_pcie 0000:01:00.0: info: trying to associate to 'n0wifi' bssid 4c:e6:76:71:25:1b
[    9.726026] mwifiex_pcie 0000:01:00.0: info: associated to bssid 4c:e6:76:71:25:1b successfully
[ 1145.239295] mwifiex_pcie 0000:01:00.0: CMD_RESP: invalid cmd resp
[ 1146.012946] mwifiex_pcie 0000:01:00.0: There is no command but got cmdrsp
[ 1624.849858] mwifiex_pcie 0000:01:00.0: CMD_RESP: invalid cmd resp
[ 1625.236366] mwifiex_pcie 0000:01:00.0: CMD_RESP: invalid cmd resp
[ 1626.009534] mwifiex_pcie 0000:01:00.0: There is no command but got cmdrsp
[ 3988.474123] mwifiex_pcie 0000:01:00.0: info: successfully disconnected from 4c:e6:76:71:25:1b: reason code 3
[ 3988.474151] mwifiex_pcie 0000:01:00.0: info: successfully disconnected from 00:00:00:00:00:00:       reason code 3

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/57

------------------------------------------------------------------------
On 2016-08-02T05:13:19+00:00 duyhieu.bui wrote:

Created attachment 227261
Firmware in bad state after sleep on Surface pro 3

I got the previous errors (firmware in bad state, 'CMD_RESP') after
sleep with the new patch. I tried to reset the pci card/rmmod and
modprobe the module again without success.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/58

------------------------------------------------------------------------
On 2016-09-25T09:27:57+00:00 mikael wrote:

I'm now running a 4.7 kernel which has Amitkumar's patches from
2016-07-15 included. I get errors like this:

Sep 25 11:22:14 hat kernel: [ 1058.526154] mwifiex_pcie 0000:02:00.0: CMD_RESP: cmd 0x10f error, result=0x2

and the driver crashes after a couple of hours.

With Jeremy White's patch from 2016-05-11 above, the driver is
completely stable.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/59

------------------------------------------------------------------------
On 2016-09-25T09:36:56+00:00 mikael wrote:

To clarify my last comment:

With JW's patch, I *still* get the CMD_RESP error (which I assume is
harmless) but the driver doesn't crash.

Could this have something to do with the following change?:

--- a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c    
+++ a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c    
@@ -2237,7 +2237,7 @@ int mwifiex_sta_init_cmd(struct mwifiex_private *priv, u8 first_sta, bool init)
                if (ret)
                        return -1;
 
-               if (priv->bss_type != MWIFIEX_BSS_TYPE_UAP) {
+               if (0 && priv->bss_type != MWIFIEX_BSS_TYPE_UAP) {
                        /* Enable IEEE PS by default */
                        priv->adapter->ps_mode = MWIFIEX_802_11_POWER_MODE_PSP;
                        ret = mwifiex_send_cmd(priv,
@@ -2300,7 +2300,7 @@ int mwifiex_sta_init_cmd(struct mwifiex_private *priv, u8 first_sta, bool init)
        if (ret)
                return -1;
 
-       if (!disable_auto_ds &&
+       if (0 && !disable_auto_ds &&
            first_sta && priv->adapter->iface_type != MWIFIEX_USB &&
            priv->bss_type != MWIFIEX_BSS_TYPE_UAP) {
                /* Enable auto deep sleep */

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/60

------------------------------------------------------------------------
On 2016-10-06T07:39:51+00:00 mikael wrote:

This is the behavior for 4.8.0 (without any patches):

The driver works for 25 s. Then I get the message:

Oct  6 09:34:15 hat wpa_supplicant[1959]: wlp2s0: CTRL-EVENT-SCAN-FAILED ret=-16 retry=1

every second.

If I shut down network-manager and reload the module, it again works for
25 s, followed by the above message. I also cannot scan for any wireless
networks when the message starts, although network communication works.

No sign, yet, of the CMD_RESP error which I've reported about above for
previous kernels.

Should I, perhaps, report this as a new bug?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/61

------------------------------------------------------------------------
On 2016-10-06T11:23:59+00:00 mikael wrote:

I can now report that I've tested the kernel from Linus's master branch.
It seems that the patches by Amitkumar Karwar from July, which went into
the mainline kernel yesterday, finally make the mwifiex driver stable on
MS Surface Pro 4, so I suggest that this bug report can now be closed.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/62

------------------------------------------------------------------------
On 2017-02-17T17:24:59+00:00 kj.kuhns wrote:

Mikael,

What version of both the kernel and mwifiex are you using?

I am using mwifiex 1.0 (15.68.7.p77)
and kernel 4.8.6-300.fc25 (I am trying to install Fedora25)

If the console ever goes into a suspend state, I end up with what you
were seeing: "PREP_CMD: FW in reset state"

Thanks, 
Kevin

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/63

------------------------------------------------------------------------
On 2017-05-11T11:48:58+00:00 cfeck wrote:

I can confirm that with newest firmware and kernel 4.10, the connection
is stable on a SP3. I do get constant 5% CPU usage in a kworker/uN:N
process caused by the mwifiex driver, though. Not sure if this is
expected, or worth a separate bug report.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/64

------------------------------------------------------------------------
On 2017-05-19T16:37:19+00:00 cfeck wrote:

With kernel 4.11.0, the issue is back.

[    7.890616] mwifiex_pcie 0000:01:00.0: info: trying to associate to '<routername>' bssid <ID>
[    7.920245] mwifiex_pcie 0000:01:00.0: info: associated to bssid <ID> successfully
[ 7834.643967] mwifiex_pcie 0000:01:00.0: Firmware wakeup failed
[ 8947.336070] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/65

------------------------------------------------------------------------
On 2017-05-20T12:48:19+00:00 mikael wrote:

Created attachment 256641
no-power-save

Dear Kevin,

Sorry for taking so long to respond.

I have got this error in many different kernels. I still get it in 4.11.0.
But I also know what causes it: The current linux driver doesn't handle
power management properly. So, it is necessary to add if-up rules which
switch off power management. In my case, where the interface is governed
by NetworkManager, I wrote a file
/etc/NetworkManager/dispatcher.d/no-power-save which I attach to this email.

With this fix, my mwifiex driver is now fully stable under kernel
4.11.0.
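
(The attachment itself is not reproduced here. As a rough illustration
only, a dispatcher script of this kind might look like the sketch below.
It is not the attached file: the function name disable_power_save and the
IW override are made up for this example, and it assumes that
NetworkManager passes the interface name as the first argument and the
action as the second, and that the iw tool is installed.)

```shell
#!/bin/sh
# Hypothetical sketch of a NetworkManager dispatcher script; NOT the attached file.
# Assumption: NetworkManager calls the script as: <script> <interface> <action>

# IW may be overridden (for example IW=echo) to preview the command
# without touching any hardware.
IW="${IW:-iw}"

disable_power_save() {
    iface="$1"
    action="$2"
    # Only act when an interface has just come up.
    if [ "$action" = "up" ] && [ -n "$iface" ]; then
        "$IW" dev "$iface" set power_save off
    fi
}

disable_power_save "${1:-}" "${2:-}"
```

Run by hand as "IW=echo sh no-power-save wlp2s0 up", the sketch only
prints the iw arguments it would use. On non-wireless interfaces the iw
call would presumably just fail, so a real script may want to filter on
the interface name.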

Best regards,
Mikael

On Fri, Feb 17, 2017 at 6:24 PM, <bugzilla-dae...@bugzilla.kernel.org>
wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=109681
>
> Kevin (kj.ku...@gmail.com) changed:
>
>            What    |Removed                     |Added
> ------------------------------------------------------------
> ----------------
>                  CC|                            |kj.ku...@gmail.com
>
> --- Comment #63 from Kevin (kj.ku...@gmail.com) ---
> Mikael,
>
> What version of both the kernel and mwifiex are you using.
>
> I am using mwifiex 1.0 (15.68.7.p77)
> and kernel 4.8.6-300.fc25 (I am trying to install Fedora25)
>
> if the console ever goes into a suspend state, I end up with what you were
> seeing: "PREP_CMD: FW in reset state"
>
> Thanks,
> Kevin
>

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/66

------------------------------------------------------------------------
On 2017-11-02T15:06:59+00:00 gbhat wrote:

Hi Christoph

(In reply to Christoph from comment #65)
> With kernel 4.11.0, the issue is back.
> 
> [    7.890616] mwifiex_pcie 0000:01:00.0: info: trying to associate to
> '<routername>' bssid <ID>
> [    7.920245] mwifiex_pcie 0000:01:00.0: info: associated to bssid <ID>
> successfully
> [ 7834.643967] mwifiex_pcie 0000:01:00.0: Firmware wakeup failed
> [ 8947.336070] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state

Can you please share the firmware dump after the above issue occurs?

1. To manually trigger the dump, you can use the command below:

cat /sys/kernel/debug/mwifiex/mlan0/device_dump

2. When the dump is complete, dmesg will log "mwifiex firmware dump end".

3. Collect the dump with the script below:
-------------------------------------------------------------
#!/bin/bash
/sbin/ethtool --set-dump mlan0 0
/sbin/ethtool --get-dump mlan0
/sbin/ethtool --get-dump mlan0 data /tmp/ITCM.log

/sbin/ethtool --set-dump mlan0 1
/sbin/ethtool --get-dump mlan0
/sbin/ethtool --get-dump mlan0 data /tmp/DTCM.log

/sbin/ethtool --set-dump mlan0 2
/sbin/ethtool --get-dump mlan0
/sbin/ethtool --get-dump mlan0 data /tmp/SQRAM.log

/sbin/ethtool --set-dump mlan0 3
/sbin/ethtool --get-dump mlan0
/sbin/ethtool --get-dump mlan0 data /tmp/IRAM.log
-------------------------------------------------------------

4. Share the /tmp/*.log files collected above for further debugging.


5. Also, please share the dmesg logs generated after you have run the
command above.
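
(The four near-identical stanzas in the script above can also be written
as a loop. The sketch below is only an illustration of that idea:
collect_dumps, the ETHTOOL override and the output directory argument are
names introduced here, not part of the original script. The interface
name is a parameter because the interface is not always called mlan0; it
appears as wlp2s0 and wlp1s0 elsewhere in this report.)

```shell
#!/bin/sh
# Sketch only: same ethtool calls as the script above, written as a loop.
# ETHTOOL may be overridden (for example ETHTOOL=echo) for a dry run.
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"

collect_dumps() {
    iface="$1"              # network interface, e.g. mlan0
    outdir="${2:-/tmp}"     # where the .log files are written
    flag=0
    # Dump flags 0..3 map to these memory regions in the original script.
    for name in ITCM DTCM SQRAM IRAM; do
        "$ETHTOOL" --set-dump "$iface" "$flag" || return 1
        "$ETHTOOL" --get-dump "$iface" || return 1
        "$ETHTOOL" --get-dump "$iface" data "$outdir/$name.log" || return 1
        flag=$((flag + 1))
    done
}

# Example (needs root and the real hardware):
#   collect_dumps mlan0 /tmp
```

The function stops at the first failing ethtool call, so a "no such
device" error is reported once instead of twelve times.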

Thanks,
Ganapathi

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/67

------------------------------------------------------------------------
On 2017-11-21T16:42:58+00:00 jwhite wrote:

I see this issue on a newer Surface Pro 2017, with kernel 4.14, firmware
15.68.7.p119, and the symptoms are very familiar.  Unfortunately,
disabling power save does not seem to be as persistent a cure; it
improves matters, but does not resolve the issue.

I've attempted to capture a dump, but I am being told that --set-dump
mlan0 0 has no such device, and --set-dump wlp1s0 0 returns no such
device.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/81

------------------------------------------------------------------------
On 2017-11-22T05:00:30+00:00 gbhat wrote:

Hi Jeremy,

Thanks for the intimation.

>>I've attempted to capture a dump, but I am being told that --set-dump mlan0 0
>>has no such device, and --set-dump wlp1s0 0 returns no such device.

OK. The firmware dump is written to the file below:
/sys/devices/virtual/devcoredump/devcd1/data

Can you please check whether this file is generated for you? If it is,
kindly share it.
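
(One possible way to save such a file is sketched below. The sketch is
illustrative only: save_devcoredumps and DEVCD_ROOT are names made up
here, and it assumes that dumps appear as devcdN/data entries under the
directory named above.)

```shell
#!/bin/sh
# Sketch only: copy any pending devcoredump data files to an output directory.
# DEVCD_ROOT defaults to the directory named above; override it for testing.
DEVCD_ROOT="${DEVCD_ROOT:-/sys/devices/virtual/devcoredump}"

save_devcoredumps() {
    outdir="${1:-/tmp}"
    found=1     # exit status: 0 once at least one dump has been copied
    for dir in "$DEVCD_ROOT"/devcd*; do
        # An unmatched glob stays literal, so check that the data file is readable.
        [ -r "$dir/data" ] || continue
        cat "$dir/data" > "$outdir/$(basename "$dir").bin" && found=0
    done
    return "$found"
}

# Example (usually needs root):
#   save_devcoredumps /tmp && ls -l /tmp/devcd*.bin
```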

Regards,
Ganapathi

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/82

------------------------------------------------------------------------
On 2017-11-22T14:44:26+00:00 jwhite wrote:

Created attachment 260791
dmesg log and data file from firmware dump

Attached is a tarball with a dmesg log and the
/sys/devices/virtual/devcoredump/devcd1/data file resulting after
requesting a firmware dump.

Note that this is a 4.14 kernel, modified for Surface Pro use, but with
the mwifiex module identical to the current tip of the 4.14 tree.  (This
tree to be precise https://github.com/jakeday/linux-surface).

I also get this result with a variety of other kernels; it seems
specific to this hardware + firmware, and perhaps to this particular use
case.  (It is an odd case; a modified Fedora Core 23 stack.)

Nicely, I now get fairly consistent behavior.  That is, upon first
launch, the wifi stack is very laggy - you'll get ping times of 43, 100,
243, 79, and so on (when you'd expect steady 1.43).  That shows up in
interactive ssh sessions.  Further, after a modest amount of use
(usually just a few meg will do it), the wireless will stop working.
You also can no longer successfully reboot or poweroff; you get a hang
at kernel shutdown time.

The attached came from a session with a clean boot, debug_mask
0xffffffff, and about 3M of the kernel source downloaded before the wifi
stopped functioning.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/83

------------------------------------------------------------------------
On 2018-01-04T21:11:00+00:00 andy.shevchenko wrote:

Any updates on this from Marvell side?

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/88

------------------------------------------------------------------------
On 2018-01-05T15:19:17+00:00 gbhat wrote:

Hi Jeremy,

Sorry for the delay. I checked the logs you shared. Here it is a scan
command (0x107) timeout, but the firmware dump did not give any hint of
the firmware going into a bad state. Moreover, the scratch register dumps
tell us that the timed-out command 0x107 did not reach the firmware at
all. We are checking the firmware dumps and will share updates soon.


On the other hand, are you not getting the error below?

> With kernel 4.11.0, the issue is back.
> 
> [    7.890616] mwifiex_pcie 0000:01:00.0: info: trying to associate to
> '<routername>' bssid <ID>
> [    7.920245] mwifiex_pcie 0000:01:00.0: info: associated to bssid <ID>
> successfully
> [ 7834.643967] mwifiex_pcie 0000:01:00.0: Firmware wakeup failed
> [ 8947.336070] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state


Thanks,
Ganapathi

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/89

------------------------------------------------------------------------
On 2018-01-05T17:48:40+00:00 jwhite wrote:

(In reply to Ganapathi Bhat from comment #72)
> Hi Jeremy,
> 
> Sorry for the delay. I checked the logs you shared. Here it is a scan
> command(0x107) timeout. But the firmware dump did not give any hints for
> firmware going to bad state. Moreover the scratch registers dumps tell us
> that the timed out command 0x107 did not reach the firmware at all. We are
> checking firmware dumps and will share the updates soon.

Thanks, I look forward to the update.

> 
> 
> On the other hand, are you not getting below error:

No, I don't recall seeing that error recently.  Note that was Christoph
that reported that, not I. I haven't tried with 4.11.

Cheers,

Jeremy

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/90

------------------------------------------------------------------------
On 2018-02-27T07:30:34+00:00 hugh-coleman wrote:

I am seeing that error message on a Surface Pro 2017. The connection appears 
stable regardless of powersaving but after suspending I get:
[  361.092570] mwifiex_pcie 0000:01:00.0: Firmware wakeup failed

Followed by an endless loop of:
[  366.098914] mwifiex_pcie 0000:01:00.0: PREP_CMD: FW in reset state
[  366.098926] mwifiex_pcie 0000:01:00.0: scan failed: -1

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730924/comments/91


** Changed in: linux
       Status: Unknown => Confirmed

** Changed in: linux
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1730924

Title:
  Wifi does down "crash" in Surface Pro 4

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  I have a Surface Pro 4. The wifi works well in principle, but unfortunately
  it drops every x minutes. The only way to fix it I've found is to reboot the
  computer.
  lsb_release -rd
  Description:    Ubuntu 17.10
  Release:        17.10

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: linux-image-4.13.0-16-generic 4.13.0-16.19
  ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
  Uname: Linux 4.13.0-16-generic x86_64
  ApportVersion: 2.20.7-0ubuntu3.1
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  predatux   1537 F.... pulseaudio
  CurrentDesktop: KDE
  Date: Wed Nov  8 10:41:26 2017
  HibernationDevice: RESUME=UUID=147af4ba-a4ce-41fe-a176-b36a1f6a590b
  Lsusb:
   Bus 002 Device 002: ID 045e:090c Microsoft Corp. 
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 045e:07e8 Microsoft Corp. 
   Bus 001 Device 003: ID 1286:204c Marvell Semiconductor, Inc. 
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Microsoft Corporation Surface Pro 4
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-16-generic 
root=UUID=125200b0-7377-4985-a217-15503781a525 ro quiet splash vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-4.13.0-16-generic N/A
   linux-backports-modules-4.13.0-16-generic  N/A
   linux-firmware                             1.169
  SourcePackage: linux
  UpgradeStatus: Upgraded to artful on 2017-10-22 (16 days ago)
  dmi.bios.date: 02/24/2017
  dmi.bios.vendor: Microsoft Corporation
  dmi.bios.version: 106.1624.768
  dmi.board.name: Surface Pro 4
  dmi.board.vendor: Microsoft Corporation
  dmi.chassis.type: 9
  dmi.chassis.vendor: Microsoft Corporation
  dmi.modalias: 
dmi:bvnMicrosoftCorporation:bvr106.1624.768:bd02/24/2017:svnMicrosoftCorporation:pnSurfacePro4:pvrD0B08F1C03P38:rvnMicrosoftCorporation:rnSurfacePro4:rvr:cvnMicrosoftCorporation:ct9:cvr:
  dmi.product.family: Surface
  dmi.product.name: Surface Pro 4
  dmi.product.version: D:0B:08F:1C:03P:38
  dmi.sys.vendor: Microsoft Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1730924/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
