https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #76 from Warner Losh ---
(In reply to Kevin Zheng from comment #75)
>The issue that I'm writing about is the system behavior. It seemed that all
>I/O (or maybe just writes?) to the ZFS pool was stalled waiting on the disk
>t
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #75 from Kevin Zheng ---
(In reply to Warner Losh from comment #74)
I concur that there is definitely a hardware problem somewhere that I need to
investigate.
The issue that I'm writing about is the system behavior. It seemed that all
I/O (or maybe just writes?) to the ZFS pool was stalled waiting on the disk t
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #74 from Warner Losh ---
(In reply to Kevin Zheng from comment #73)
CAM tried to send a command to the drive and it took too long to reply. Either
sync cache sometimes needs a lot more than 30s, or the drive has gone out to
lunch.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Kevin Zheng changed:
CC added: kevinz5...@gmail.com
--- Comment #73
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Jonathan Reynolds changed:
CC added: jreynolds1...@gmail.com
--- Co
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #71 from Chris ---
(In reply to Daniel Menelkir from comment #70)
I didn't get any zpool errors. The system was hanging with CAM status timeouts.
Since I changed the disks, the performance of the system has improved considerably.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #70 from Daniel Menelkir ---
(In reply to Chris from comment #69)
A zpool error isn't supposed to clear with a reboot. The test I've mentioned
can be done with a live USB.
--
You are receiving this mail because:
You are the assignee for the bug.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #69 from Chris ---
(In reply to Daniel Menelkir from comment #68)
The issue happens with 12 as well and requires vfs.zfs.cache_flush_disable=1
and a power cycle to clear.
I don't think it's just a ZFS issue, since a reboot ju
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #68 from Daniel Menelkir ---
(In reply to Chris from comment #67)
I know; I meant to try different platforms.
And to mention, this is a specific issue with 13.1+ZFS. I don't think it's
something with the motherboard itself, since
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #67 from Chris ---
In reply to Daniel (comment #66)
Smartmontools are available on FreeBSD. The drives show no issues with
smartmontools. I believe there is a compatibility issue between the drives and
the motherboard. I upd
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Daniel Menelkir changed:
CC added: dmenel...@gmail.com
--- Comment
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #65 from Andriy Gapon ---
It could be a combination of what FreeBSD 13 or, more specifically,
OpenZFS/FreeBSD does (legitimately) and a bug in the drives' firmware or even
hardware.
E.g., I have a mirror of two SSDs from different
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #64 from Chris ---
This is a hardware issue. I have 5 Western Digital Ultrastar Data Center hard
drives that show up as HGST with camcontrol. The errors popped up when I
upgraded to 12.2 and stopped after adding vfs.zfs.cache_flush
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #63 from Daniel Menelkir ---
(In reply to Warner Losh from comment #62)
This issue only happens specifically with FreeBSD 13.1 and with ZFS.
So S.M.A.R.T. is wrong; everything else I throw at it doesn't generate this
issue, on
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #62 from Warner Losh ---
This bug should be closed. There are too many different symptoms co-located in
this bug that are likely unrelated. There's clearly some bad hardware here.
There are clearly some issues with ah
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Daniel Menelkir changed:
CC added: dmenel...@gmail.com
--- Comment
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Chris changed:
CC added: ch...@tellme3times.com
--- Comment #60 fro
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #59 from ekoort ---
Hello again!
In my case setting vfs.zfs.cache_flush_disable="1" did not help.
Instead this seems to work:
root@pine:~ # camcontrol tags ada0
(pass0:ahcich0:0:0:0): device openings: 32
root@pine:~ # camc
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
eimar.ko...@tutamail.com changed:
CC added: eimar.ko...@tutamail.com
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #57 from sec ---
(In reply to Wayne Willcox from comment #56)
My drives are at 32 at the moment (unless vfs.zfs.cache_flush_disable=1
changes that). Without that tweak, I had those errors.
Is there any update on this issue, as I'm
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #56 from Wayne Willcox ---
OK, so it looks like I solved this for my system. I did it by reducing the
command tags.
The default was 255, and I started by just cutting that in half. I kept dividing
by 2 until I got to 25. At 50 I
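For reference, the tag-reduction experiment described above can be reproduced with camcontrol(8). The device name ada0 and the count of 32 are assumptions for illustration; substitute your own device and value:

```shell
# Show the current number of tagged openings for the device
# (ada0 is an assumption; use your own device name).
camcontrol tags ada0 -v

# Reduce the number of concurrent tagged commands, e.g. to 32.
# The change takes effect immediately but does not survive a reboot.
camcontrol tags ada0 -N 32
```

Halving repeatedly, as described in the comment, is a simple way to narrow down the largest queue depth the drive handles reliably.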
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Nicolas Richeton changed:
CC added: nicolas.riche...@gmail.com
---
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Daniel Morante changed:
CC added: dan...@morante.net
--- Comment #5
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #53 from sec ---
(In reply to Wayne Willcox from comment #52)
At the moment my drives are:
WDC WD1005FBYZ-01YCBB2 RR07 (WD Gold 1TB)
WDC WD1005FBYZ-01YCBB1 RR04 (WD Gold 1TB)
I had two with RR07, but I RMA'd one of the drives b
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #52 from Wayne Willcox ---
Well, this is interesting: could these Western Digital drives also be using SMR?
Might that explain the problem?
It seems Western Digital is trying to include SMR without telling us?
https://www.tomshardw
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #51 from Wayne Willcox ---
OK, I have some new information in my case.
I commented out from /boot/loader.conf
#vfs.zfs.cache_flush_disable="1"
I commented out from /boot/device.hints
#hint.ahcich.0.sata_rev=2
so that I was back t
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #50 from sec ---
(In reply to Warner Losh from comment #48)
My errors were like this:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745#c20
There were WRITE, READ and FLUSHCACHE48; first there were timeout messages,
the
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #49 from Wayne Willcox ---
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command, 0 more tries remain
ahcic
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #48 from Warner Losh ---
(In reply to sec from comment #47)
Thanks for the feedback. And was your issue 'timeout' or was it just
disappearing?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #47 from sec ---
(In reply to Warner Losh from comment #46)
Hi,
From my experience:
1) I didn't have this issue until I updated to 11.2, IIRC (that was almost
one year ago)
2) First I replaced my drives with new ones - issue w
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #46 from Warner Losh ---
I've re-read things, and I think everybody is hanging the 'timeout' issue on
this bug report when really it looks like there are several different
pathologies... Those that clear up after a power cycle, f
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #45 from Wayne Willcox ---
(In reply to Warner Losh from comment #44)
One of the power supplies is a 600 Corsair (I'm not going to take the other
system apart that also had this issue). I will try a third power supply when I
ta
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #44 from Warner Losh ---
(In reply to Wayne Willcox from comment #42)
I ask because we have many thousands of machines deployed, and a thousand or so
of those have intermittent power supply issues which means at any time we hav
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #43 from Daniel Bell ---
(In reply to sec from comment #38)
I'm sorry I didn't respond sooner, but for my issue, sec was 100% right: a cold
power cycle, as bizarre as that sounded to me, along with the loader.conf
tweaks mentio
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #42 from Wayne Willcox ---
Absolutely sure? No. Having said that, I have seen the problem on 2 different
systems with 2 different power supplies. Additionally, the power supplies have
been fine and not had any other problems, AND
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Warner Losh changed:
CC added: i...@freebsd.org
--- Comment #41 fro
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #40 from Wayne Willcox ---
OK, so even with the write_cache disabled it started getting the errors and
hung. So this is a pretty serious issue; at this point the WD Blue drives (and
maybe others) are not at all reliable with ZFS
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Wayne Willcox changed:
CC added: wwillc...@gmail.com
--- Comment #3
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #38 from sec ---
(In reply to Daniel Bell from comment #37)
I'm running on 11.3-STABLE without issues; I just have this in /boot/loader.conf:
vfs.zfs.cache_flush_disable=1
Just be sure to power off the server (maybe even take the po
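As a sketch, the workaround recurring throughout this thread amounts to the following /boot/loader.conf fragment. Note the trade-off: suppressing cache flushes risks data loss on crash or power failure, so this is a diagnostic workaround, not a recommended default:

```shell
# /boot/loader.conf
# Stop ZFS from issuing cache-flush commands to the disks.
# Workaround only: trades crash/power-loss safety for stability.
vfs.zfs.cache_flush_disable="1"
```

Per several comments above, the setting only takes reliable effect after a full power cycle (power off + power on), not a plain reboot.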
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #37 from Daniel Bell ---
I'm having this problem on 12.1-RELEASE-p2 GENERIC, with a Supermicro X11DPi-N,
6 drive 3-mirror stripe zpool of "WD Gold" disks.
at scbus6 target 0 lun 0 (ada0,pass1)
at scbus7 target 0 lun
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #36 from John Baldwin ---
My brand new drives all detached after about 10 days of use:
[-- MARK -- Sat Dec 28 06:00:00 2019]
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: s/n WD-WCC6Y2ZRADXP detached
ada1 at ahcich1 bus 0
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #35 from John Baldwin ---
I tried the wdidle3 program, but disabling the idle timer did not help in my
case. I ended up replacing the drives with WD Blue drives. I talked with
Allan Jude a bit and apparently he has only seen th
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Simon G. changed:
CC added: sem...@fly777.net
--- Comment #34 from
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #33 from John Baldwin ---
(In reply to John Baldwin from comment #31)
I was fine for about a week with two active drives, then I tried to add a 3rd
drive to replace one that had died and to bring my zpool back to full health.
I
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #32 from d...@jetcafe.org ---
(In reply to dave from comment #27)
OK, more data is good, right? :)
This timeout issue has resurfaced on the weekly scrubs. It is not solved by
what I said in comment #27.
For me, it's time to get
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
John Baldwin changed:
CC added: j...@freebsd.org
--- Comment #31 fr
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #30 from Rick ---
(In reply to Rick from comment #29)
Please forget this remark. It lasted only for a few days :S
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Rick changed:
CC added: r...@blommersit.nl
--- Comment #29 from Ric
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Samuel Chow changed:
CC added: cysc...@shaw.ca
--- Comment #28 from
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #27 from d...@jetcafe.org ---
(In reply to Alexey from comment #26)
I have another more drastic workaround that works. In /boot/device.hints I
placed the following lines:
hint.ahcich.0.sata_rev=2
hint.ahcich.1.sata_rev=2
hint.a
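For illustration, the pattern of these hints is one line per AHCI channel; sata_rev=2 forces the link to negotiate at SATA revision 2 (3 Gbps) instead of 6 Gbps. The channel numbers below are examples only:

```shell
# /boot/device.hints
# Force each AHCI channel down to SATA revision 2 (3 Gbps).
# Repeat for every channel that has an affected drive attached.
hint.ahcich.0.sata_rev="2"
hint.ahcich.1.sata_rev="2"
```

This caps link speed for everything on those channels, so it is a blunt instrument, but it can help distinguish signaling problems from firmware ones.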
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #26 from Alexey ---
It has worked for me since August 2018 with vfs.zfs.cache_flush_disable=1,
without reboots or disk timeouts.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #25 from d...@jetcafe.org ---
(In reply to Allan Jude from comment #24)
Did you find a suitable workaround?
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Allan Jude changed:
CC added: allanj...@freebsd.org
--- Comment #24
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #23 from d...@jetcafe.org ---
I've taken smartctl out of the picture and this problem still occurs. I can
trigger it consistently with high write load on the disk subsystem.
Isn't this the same bug as
https://bugs.freebsd.org/bu
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #22 from d...@jetcafe.org ---
Comment on attachment 202427
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=202427
Log messages in sequence for the problem
Having the same problem here, just when I upgraded to FreeBSD 1
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
d...@jetcafe.org changed:
CC added: d...@jetcafe.org
--- Comment #21
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #20 from sec ---
(In reply to sec from comment #19)
OK, so I tried to downgrade the pool to 11.1 - it didn't help.
Then I also started to get those errors with only one drive connected (which
was fine before).
I also tried to boot on 1
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
sec changed:
CC added: szcze...@szczepan.net
--- Comment #19 from s
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #18 from s...@os2.kiev.ua ---
One more update - after disabling NCQ on both affected drives things are going
well; I do not see any errors anymore. So from my POV, it is buggy drive
firmware.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #17 from s...@os2.kiev.ua ---
One update - after disabling NCQ on ada0 I do not see any problems with it, but
ada2 is still failing. I am disabling NCQ on both for now.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #16 from s...@os2.kiev.ua ---
To confirm whether or not it is NCQ-related, I decided to disable it on one of
the 2 affected drives:
camcontrol negotiate ada0 -T disable
camcontrol reset ada0
The second affected drive will still use NCQ f
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #15 from s...@os2.kiev.ua ---
TBH, it makes me think that the drives themselves or the controller may have
issues with NCQ.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #14 from s...@os2.kiev.ua ---
P.S. We are also using
11.2-RELEASE-p4/amd64 on an X10SLM-F server:
ahci0: port
0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f mem
0xf7232000-0xf72327ff irq 19 at device 31.2 on p
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #13 from s...@os2.kiev.ua ---
Also, from a SMART standpoint the drives look healthy: self-tests pass fine,
attributes are good, no UDMA CRC errors, etc.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
s...@os2.kiev.ua changed:
CC added: s...@os2.kiev.ua
--- Comment #12
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #11 from cryx-free...@h3q.com ---
(In reply to cryx-freebsd from comment #10)
Setting kern.cam.ada.write_cache=0 in loader.conf and doing a power cycle makes
the problem go away, with the obvious hit on I/O performance.
IMHO th
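As a sketch, that workaround is a one-line loader tunable; per the comment it must be paired with a full power cycle to take effect:

```shell
# /boot/loader.conf
# Disable the on-disk write cache for all ada(4) disks.
# Reported to make the problem go away, at a cost in write performance.
kern.cam.ada.write_cache="0"
```

Because it turns off the drive's volatile write cache entirely, this sidesteps the flush path altogether, which is consistent with cache-flush handling being implicated elsewhere in the thread.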
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
cryx-free...@h3q.com changed:
CC added: cryx-free...@h3q.com
--- Com
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #9 from skillcoder ---
A simple shutdown -r now.
But I tried shutdown -p now (with "vfs.zfs.cache_flush_disable=1" in
"/boot/loader.conf") and switched the power off.
And now uptime is 18+ hours with not a single hang, and the log is without any ah
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #8 from Alexey ---
Power off, or a simple /sbin/reboot? You will need power off + power on, not
reboot or shutdown -r, if timeouts have already started.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #7 from skillcoder ---
Unfortunately, this workaround did not help me.
After adding "vfs.zfs.cache_flush_disable=1" to "/boot/loader.conf" and
rebooting, I still have the same log after boot:
Aug 2 01:30:32 skillcoder kernel: ahcich
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #6 from Alexey ---
(In reply to Alexey from comment #5)
But the problem is, if we check the spec for the HGST HUS722T1TALA604 drive,
https://www.hgst.com/sites/default/files/resources/Ultrastar-7K2-EN-US-DS.pdf I
do not see that the drive is "S
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #5 from Alexey ---
Yes, it looks like vfs.zfs.cache_flush_disable=1 hides the problem. (At least,
the servers with a raidz1 ZFS pool, which used to start timeouts after no more
than 2 days of uptime, now have 6 days of uptime and no timeouts on the conso
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #4 from skillcoder ---
The problem is still relevant.
Maybe this problem is related to bug #224536:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224536
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Alexey changed:
Severity: Affects Only Me -> Affects Some People
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #3 from Alexey ---
Yes, possibly the problem is related to the disks. We have 8 Supermicro
X10DRW-i servers; 4 of them have HGST HUS722T1TALA604 and the other 4 have
TOSHIBA DT01ACA100 MS2OA750 HDDs. All of them used to be installed near
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
skillcoder changed:
CC added: m...@skillcoder.com
--- Comment #2 fr
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
--- Comment #1 from Alexey ---
Also, we have some recently installed Supermicro X10DRW-i servers with the same
problems. These servers were installed with 11.2, so we have no statistics for
11.1:
ahci0: port
0x70b0-0x70b7,0x70a0-0x70a3,0x7090-0x709
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745
Bug ID: 229745
Summary: ahcich: CAM status: Command timeout
Product: Base System
Version: 11.2-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: