Re: Unable to shutdown

2011-08-30 Thread Kevin Oberman
On Mon, Aug 29, 2011 at 1:06 PM, Eli Dart wrote:
>
>
> On 8/28/11 1:06 PM, Bengt Ahlgren wrote:
>>
>> Kevin Oberman  writes:
>>
>>> I've run into an odd problem with dismounting file systems on a
>>> Seagate Expansion portable USB drive. Running 8-stable on an amd64
>>> system and with two FAT32 (msdosfs) file systems on the drive.
>>>
>>> The drive is "green" and spins down when idle.  If an attempt is made
>>> to shutdown the system while the drive is spun down, the system goes
>>> through the usual shutdown including flushing all buffers out to disk,
>>> but when the final disk access is made to mark the file systems as
>>> clean, the drive never spins up and the system hangs until it is
>>> powered down. I've found no way to avoid this other than to remember
>>> to access the disk and cause it to spin up before shutting down.
>>>
>>> If I attempt to unmount the file systems when the drive is shut down,
>>> the same thing happens, but I can recover as the second file system
>>> is still mounted and an ls(1) to that file system will cause the disk
>>> to spin up and everything is fine.
>>>
>>> This looks like a bug, but I don't see why the unmounting of an
>>> msdosfs system does not spin up the drive. It's clearly hanging on
>>> some operation that is not spinning up the drive, but does block.
>>>
>>> Any ideas what is going on? Possible fix?
>>
>> Not a solution to your problem, but a data point:
>>
>> I have a WD Passport 750GB (2.5") drive with an UFS filesystem on it.  I
>> don't think I've tried shutdown with the drive mounted, but I've
>> experienced no problems after the drive has spun down, including umount.
>> There is just a delay while it spins up.  This is on 8.2-REL/i386, that
>> is, with the new USB stack.
>
> In my experience, the issues don't show up at lower capacities.  I've seen
> problems with 2TB drives, but 1TB and 1.5TB drives seem to work fine.
>
> Kevin - how big is the disk in question?

"Only" 750G. It's just a little portable drive and not even a new one.
It was big back when I bought it, but not any more. I think it might be
more of an issue with the particular firmware on the drive. Some CAM
operation seems to never complete when the drive is spun down. Either:
1. The command cannot be completed until the drive is spun up, but a
firmware bug is not triggering a spin-up
or:
2. The command does not need the drive spun up, but a bug in the
firmware is not allowing the completion when the drive is not spinning.

The more I look at this, the more it seems to me that it is an issue
with the Seagate drive and not a FreeBSD issue. Probably a bug that is
never triggered on Windows, so is largely unnoticed. I suspect Windows
probably issues the commands in a subtly different order.

It is probably an issue that FreeBSD fails to ever time out when this
happens, though. That makes me suspect that the command in question is
one that should always return something immediately. I suppose it is
also possible that it is some oddity in the USB stack, too, but I still
suspect that the root issue is a firmware bug in the drive.
-- 
R. Kevin Oberman, Network Engineer - Retired
E-mail: kob6...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Unable to shutdown

2011-08-30 Thread David Magda
On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
[...]
> The more I look at this, the more it seems to me that it is an issue
> with the Seagate drive and not a FreeBSD issue. Probably a bug that is
> never triggered on Windows, so is largely unnoticed. I suspect Widows
> probably orders the command is a subtly different order.
[...]

Or not the drive per se, but the USB-to-IDE/SATA chipset.

A while back on the OpenSolaris zfs-discuss list there was an issue where
USB drives would have corrupt ZFS pools if a drive was yanked without a
'zpool export' being run. Even though ZFS is supposed to always be
consistent on-disk (because it's transactional), this wasn't happening.

It turned out that the chipset had a list of particular SATA commands
that it allowed through to the drive, and all others were simply
answered with "OK", regardless of what actual actions needed to be
taken. One of the SATA commands that was NOT whitelisted was the 'cache
flush' command--which ZFS needs to make sure that its data structures
are written in the proper order.

Turns out the drive and its firmware were fine and doing things properly,
it's just that the necessary commands weren't getting to it because of the
USB adaptor's chipset.




Re: Unable to shutdown

2011-08-30 Thread Jeremy Chadwick
On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
> On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
> [...]
> > The more I look at this, the more it seems to me that it is an issue
> > with the Seagate drive and not a FreeBSD issue. Probably a bug that is
> > never triggered on Windows, so is largely unnoticed. I suspect Widows
> > probably orders the command is a subtly different order.
> [...]
> 
> Or not the drive per se, but the USB-to-IDE/SATA chipset.
> 
> A while back on the OpenSolaris zfs-discuss list there was an issue where
> USB drives would have corrupt ZFS pools if a drive was yanked without a
> 'zpool export' being run. Even though ZFS is supposed to always be
> consistent on-disk (because it's transactional), this wasn't happening.
> 
> It turned that the chipset had a list of particular SATA commands that it
> allowed through to the drive, and all others were simply answered with
> "OK", regardless of what actual actions needed to be taken. One of the
> SATA commands that was NOT whitelisted was the 'cache flush'
> command--which ZFS needs to make sure that it's data structures were
> written in the proper order.
> 
> Turns out the drive and its firmware were fine and doing things properly,
> it's just that the necessary commands weren't getting to it because of the
> USB adaptor's chipsset.

I don't think that advice is applicable in this situation.  Here's why:

Kevin's original description indicates that when the drive (or enclosure
translation ASIC for that matter) is in standby, when the system is shut
down, the drive/ASIC never spins back up on I/O (flushing all I/O
buffers to disk).

If he issues "ls" commands or similar userland-induced I/O to the drive
prior to shutting the system down, the drive/ASIC spins up normally.

Here's Kevin's original quote:

>> The drive is "green" and spins down when idle.  If an attempt is made
>> to shutdown the system while the drive is spun down, the system goes
>> through the usual shutdown including flushing all buffer out to disk,
>> but when the final disk access to mark the file systems as clean, the
>> drive never spins up and the system hangs until it is powered down.
>> I've found no way to avoid this other then to remember to access the
>> disk and cause it to spin up before shutting down.
>>
>> If I attempt to unmount the file systems when the drive is shut down.
>> the same thing happens, but I can recover as the second file system
>> is still mounted and an ls(1) to that file system will cause the disk
>> to spin up and everything is fine.

So the question is what's "unique" about flushing all I/O buffers to
disk during shutdown compared to issuing standard I/O in userland.  I
can speculate all day as to what the cause is, but it's highly unlikely
that the USB-to-SATA controller ASIC is causing the problem.

Furthermore, Windows doesn't have "special disk/enclosure drivers" for
such drives, so there's nothing "unique" Windows would be sending across
the wire, ATA-protocol-wise, that would explain why Windows works and
FreeBSD doesn't.  At least that's my opinion.

With ATA/SATA, the FLUSH CACHE (0xe7) and -EXT (0xea) (for 48-bit LBAs)
commands are separate from WRITE DMA (0xca) and -EXT (0x35) (for 48-bit
LBAs).  Both FLUSH CACHE commands do not take LBAs in their input CDB.
To "flush buffers to disk" I imagine what the kernel should be doing is
issuing WRITE commands followed by FLUSH CACHE.  The WRITEs should be
"waking" the drive up.
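
For anyone following along, the opcodes above can be collected into a
small table. This is only an illustrative sketch in Python: the opcode
values are the ones quoted in this message, the names ATA_COMMANDS and
flush_sequence are made up for the example, and the write-then-flush
ordering is my speculation from the paragraph above, not something
confirmed from the kernel source.

```python
# ATA command opcodes as quoted in the message above.
ATA_COMMANDS = {
    "FLUSH CACHE": 0xE7,
    "FLUSH CACHE EXT": 0xEA,   # 48-bit LBA variant
    "WRITE DMA": 0xCA,
    "WRITE DMA EXT": 0x35,     # 48-bit LBA variant
}


def flush_sequence(lba48):
    """Return the speculated command order: a WRITE first, then a FLUSH.

    Hypothetical helper for illustration only; it does not talk to a drive.
    """
    suffix = " EXT" if lba48 else ""
    names = ["WRITE DMA" + suffix, "FLUSH CACHE" + suffix]
    return [(name, ATA_COMMANDS[name]) for name in names]


if __name__ == "__main__":
    for name, opcode in flush_sequence(lba48=True):
        print("%-16s 0x%02x" % (name, opcode))
```

If the drive only wakes on commands that address an LBA, the WRITE in
this sequence would be the one expected to trigger the spin-up.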

But wait, there's more.

I want to point out to people that "sleep" and "standby" are two very
different things (they're separate ATA commands too).  So if you're
using "camcontrol sleep" you probably should be using "camcontrol
standby".  The man page is quite clear about the repercussions of the
former (and in the latter case I can imagine I/O to the drive failing or
simply timing out given that a bus reset is not performed during
shutdown TMK).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB |



Re: Unable to shutdown

2011-08-30 Thread Kevin Oberman
On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick wrote:
> On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
>> On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
>> [...]
>> > The more I look at this, the more it seems to me that it is an issue
>> > with the Seagate drive and not a FreeBSD issue. Probably a bug that is
>> > never triggered on Windows, so is largely unnoticed. I suspect Widows
>> > probably orders the command is a subtly different order.
>> [...]
>>
>> Or not the drive per se, but the USB-to-IDE/SATA chipset.
>>
>> A while back on the OpenSolaris zfs-discuss list there was an issue where
>> USB drives would have corrupt ZFS pools if a drive was yanked without a
>> 'zpool export' being run. Even though ZFS is supposed to always be
>> consistent on-disk (because it's transactional), this wasn't happening.
>>
>> It turned that the chipset had a list of particular SATA commands that it
>> allowed through to the drive, and all others were simply answered with
>> "OK", regardless of what actual actions needed to be taken. One of the
>> SATA commands that was NOT whitelisted was the 'cache flush'
>> command--which ZFS needs to make sure that it's data structures were
>> written in the proper order.
>>
>> Turns out the drive and its firmware were fine and doing things properly,
>> it's just that the necessary commands weren't getting to it because of the
>> USB adaptor's chipsset.
>
> I don't think that advice is applicable in this situation.  Here's why:
>
> Kevin's original description indicates that when the drive (or enclosure
> translation ASIC for that matter) is in standby, when the system is shut
> down, the drive/ASIC never spins back up on I/O (flushing all I/O
> buffers to disk).
>
> If he issues "ls" commands or similar userland-induced I/O to the drive
> prior to shutting the system down, the drive/ASIC spins up normally.
>
> Here's Kevin's original quote:
>
>>> The drive is "green" and spins down when idle.  If an attempt is made
>>> to shutdown the system while the drive is spun down, the system goes
>>> through the usual shutdown including flushing all buffer out to disk,
>>> but when the final disk access to mark the file systems as clean, the
>>> drive never spins up and the system hangs until it is powered down.
>>> I've found no way to avoid this other then to remember to access the
>>> disk and cause it to spin up before shutting down.
>>>
>>> If I attempt to unmount the file systems when the drive is shut down.
>>> the same thing happens, but I can recover as the second file system
>>> is still mounted and an ls(1) to that file system will cause the disk
>>> to spin up and everything is fine.
>
> So the question is what's "unique" about flushing all I/O buffers to
> disk during shutdown compared to issuing standard I/O in userland.  I
> can speculate all day as to what the cause is, but it's highly unlikely
> that the USB-to-SATA controller ASIC is causing the problem.

You are perhaps assuming a bit too much. Since I know that a disk read
or write WILL spin up the drive, I can only assume that msdosfs is not
finding anything to flush, so is not writing. I see the full "flushing
all buffers" countdown and it always runs successfully to zero. This,
without the drive spinning up. This raises at least the question of
whether the drive is receiving any writes or whether the "writes" are
simply being cached by the drive to save energy. I suspect that the
drive only spins up when enough of its write cache is filled.

In that case, the "flush cache" might actually be what is issued, but I
can't claim any certainty about that. I'm not willing to completely
clear the USB-SATA chip as the culprit.

> Furthermore, Windows doesn't have "special disk/enclosure drivers" for
> such drives, so there's nothing "unique" Windows would be sending across
> the wire, ATA-protocol-wise, that would explain why Windows works and
> FreeBSD doesn't.  At least that's my opinion.

This is not always quite true, but it is true for the general case. (I
know some WD enclosures do install a custom driver.)
>
> With ATA/SATA, the FLUSH CACHE (0xe7) and -EXT (0xea) (for 48-bit LBAs)
> commands are separate from WRITE DMA (0xca) and -EXT (0x35) (for 48-bit
> LBAs).  Both FLUSH CACHE commands do not take LBAs in their input CDB.
> To "flush buffers to disk" I imagine what the kernel should be doing is
> issuing WRITE commands followed by FLUSH CACHE.  The WRITEs should be
> "waking" the drive up.

Should they? As I pointed out above, that is not necessarily the case.
>
> But wait, there's more.
>
> I want to point out to people that "sleep" and "standby" are two very
> different things (they're separate ATA commands too).  So if you're
> using "camcontrol sleep" you probably should be using "camcontrol
> standby".  The man page is quite clear about the repercussions of the
> former (and in the latter case I can imagine I/O to the drive failing or
> simply timing out given that a bus reset is not pe

Re: Unable to shutdown

2011-08-30 Thread Jeremy Chadwick
On Tue, Aug 30, 2011 at 04:10:13PM -0700, Kevin Oberman wrote:
> On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
>  wrote:
> > On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
> >> On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
> >> [...]
> >> > The more I look at this, the more it seems to me that it is an issue
> >> > with the Seagate drive and not a FreeBSD issue. Probably a bug that is
> >> > never triggered on Windows, so is largely unnoticed. I suspect Widows
> >> > probably orders the command is a subtly different order.
> >> [...]
> >>
> >> Or not the drive per se, but the USB-to-IDE/SATA chipset.
> >>
> >> A while back on the OpenSolaris zfs-discuss list there was an issue where
> >> USB drives would have corrupt ZFS pools if a drive was yanked without a
> >> 'zpool export' being run. Even though ZFS is supposed to always be
> >> consistent on-disk (because it's transactional), this wasn't happening.
> >>
> >> It turned that the chipset had a list of particular SATA commands that it
> >> allowed through to the drive, and all others were simply answered with
> >> "OK", regardless of what actual actions needed to be taken. One of the
> >> SATA commands that was NOT whitelisted was the 'cache flush'
> >> command--which ZFS needs to make sure that it's data structures were
> >> written in the proper order.
> >>
> >> Turns out the drive and its firmware were fine and doing things properly,
> >> it's just that the necessary commands weren't getting to it because of the
> >> USB adaptor's chipsset.
> >
> > I don't think that advice is applicable in this situation.  Here's why:
> >
> > Kevin's original description indicates that when the drive (or enclosure
> > translation ASIC for that matter) is in standby, when the system is shut
> > down, the drive/ASIC never spins back up on I/O (flushing all I/O
> > buffers to disk).
> >
> > If he issues "ls" commands or similar userland-induced I/O to the drive
> > prior to shutting the system down, the drive/ASIC spins up normally.
> >
> > Here's Kevin's original quote:
> >
> >>> The drive is "green" and spins down when idle.  If an attempt is made
> >>> to shutdown the system while the drive is spun down, the system goes
> >>> through the usual shutdown including flushing all buffer out to disk,
> >>> but when the final disk access to mark the file systems as clean, the
> >>> drive never spins up and the system hangs until it is powered down.
> >>> I've found no way to avoid this other then to remember to access the
> >>> disk and cause it to spin up before shutting down.
> >>>
> >>> If I attempt to unmount the file systems when the drive is shut down.
> >>> the same thing happens, but I can recover as the second file system
> >>> is still mounted and an ls(1) to that file system will cause the disk
> >>> to spin up and everything is fine.
> >
> > So the question is what's "unique" about flushing all I/O buffers to
> > disk during shutdown compared to issuing standard I/O in userland.  I
> > can speculate all day as to what the cause is, but it's highly unlikely
> > that the USB-to-SATA controller ASIC is causing the problem.
> 
> You are perhaps assuming a bit too much. Since I know that a disk read
> or write WILL spin up the drive, I can only assume that the msdosfs is
> not finding anything to flush, so is not writing. I see the full
> "flushing all buffers" countdown and it always runs successfully to
> zero. This, without the drive spinning up. This begs at least the
> question of whether the drive is receiving any writes or whether the
> "writes" are simply being cached by the drive to save energy. I
> suspect that the drive only spins up when enough of its write cache is
> filled.

If there's "nothing to flush", then why is the kernel indefinitely
looping (finally giving up, and it usually prints something when it
encounters that condition) when trying to flush buffers when the drive
is spun down?  What exactly is it trying to flush if there's "nothing to
flush"?

Let me ask you this: can you stop using msdosfs on said USB device and
instead use UFS2 and see if the problem disappears?  This is in no way a
permanent solution.  If this workaround fixes the problem, then I'm
inclined to believe msdosfs is to blame.  There has been a lot of
discussion of this driver in the kernel as of late, and the general
opinion of it is that it's crummy.

And here's another thought: what if the issue is limited, somehow, to
just writes?  Meaning, could the kernel issue a "false" read to the
device (for some random LBA, even LBA 0 for all I care) and then proceed
with its write/flushing?  I wonder if that would cause the drive to spin
up first.  That would be a "quirk" in my opinion.
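
To make the "false read" idea concrete, here is a rough userland sketch
in Python. It is not the in-kernel quirk I'm describing, just a way to
test the theory by hand before unmounting: read one sector from the raw
device and see whether the drive wakes up. The function name
wake_with_read, the 512-byte sector size, and a device path such as
/dev/da0 are all assumptions for illustration; adjust them for the
actual hardware.

```python
import os


def wake_with_read(device_path, sector_size=512, lba=0):
    """Read one sector at the given LBA and return the number of bytes read.

    Hypothetical helper: the read exists only to generate I/O that might
    make a spun-down drive spin up. The data itself is discarded.
    """
    fd = os.open(device_path, os.O_RDONLY)
    try:
        # Seek to the start of the requested sector, then read exactly one.
        os.lseek(fd, lba * sector_size, os.SEEK_SET)
        return len(os.read(fd, sector_size))
    finally:
        os.close(fd)
```

Something like wake_with_read("/dev/da0") followed by the umount would
show whether a read ahead of the write/flush changes the behaviour. If
it does, that points at a write-specific problem rather than at the
file system.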

There's also the possibility the USB stack on FreeBSD is doing something
really stupid... man, I don't even want to go down that road.  Hans
should be able to help determine if that's the case, but not using
msdosfs as a test would be a good start.

> In that case, the "flu

Re: Unable to shutdown

2011-08-30 Thread Kevin Oberman
Jeremy,

I guess we are simply not communicating. You are arguing points with
which I agree.

Comments in line:
On Tue, Aug 30, 2011 at 4:43 PM, Jeremy Chadwick wrote:
> On Tue, Aug 30, 2011 at 04:10:13PM -0700, Kevin Oberman wrote:
>> On Tue, Aug 30, 2011 at 2:48 PM, Jeremy Chadwick
>>  wrote:
>> > On Tue, Aug 30, 2011 at 01:29:02PM -0400, David Magda wrote:
>> >> On Tue, August 30, 2011 11:50, Kevin Oberman wrote:
>> >> [...]
>> >> > The more I look at this, the more it seems to me that it is an issue
>> >> > with the Seagate drive and not a FreeBSD issue. Probably a bug that is
>> >> > never triggered on Windows, so is largely unnoticed. I suspect Widows
>> >> > probably orders the command is a subtly different order.
>> >> [...]
>> >>
>> >> Or not the drive per se, but the USB-to-IDE/SATA chipset.
>> >>
>> >> A while back on the OpenSolaris zfs-discuss list there was an issue where
>> >> USB drives would have corrupt ZFS pools if a drive was yanked without a
>> >> 'zpool export' being run. Even though ZFS is supposed to always be
>> >> consistent on-disk (because it's transactional), this wasn't happening.
>> >>
>> >> It turned that the chipset had a list of particular SATA commands that it
>> >> allowed through to the drive, and all others were simply answered with
>> >> "OK", regardless of what actual actions needed to be taken. One of the
>> >> SATA commands that was NOT whitelisted was the 'cache flush'
>> >> command--which ZFS needs to make sure that it's data structures were
>> >> written in the proper order.
>> >>
>> >> Turns out the drive and its firmware were fine and doing things properly,
>> >> it's just that the necessary commands weren't getting to it because of the
>> >> USB adaptor's chipsset.
>> >
>> > I don't think that advice is applicable in this situation.  Here's why:
>> >
>> > Kevin's original description indicates that when the drive (or enclosure
>> > translation ASIC for that matter) is in standby, when the system is shut
>> > down, the drive/ASIC never spins back up on I/O (flushing all I/O
>> > buffers to disk).
>> >
>> > If he issues "ls" commands or similar userland-induced I/O to the drive
>> > prior to shutting the system down, the drive/ASIC spins up normally.
>> >
>> > Here's Kevin's original quote:
>> >
>> >>> The drive is "green" and spins down when idle.  If an attempt is made
>> >>> to shutdown the system while the drive is spun down, the system goes
>> >>> through the usual shutdown including flushing all buffer out to disk,
>> >>> but when the final disk access to mark the file systems as clean, the
>> >>> drive never spins up and the system hangs until it is powered down.
>> >>> I've found no way to avoid this other then to remember to access the
>> >>> disk and cause it to spin up before shutting down.
>> >>>
>> >>> If I attempt to unmount the file systems when the drive is shut down.
>> >>> the same thing happens, but I can recover as the second file system
>> >>> is still mounted and an ls(1) to that file system will cause the disk
>> >>> to spin up and everything is fine.
>> >
>> > So the question is what's "unique" about flushing all I/O buffers to
>> > disk during shutdown compared to issuing standard I/O in userland.  I
>> > can speculate all day as to what the cause is, but it's highly unlikely
>> > that the USB-to-SATA controller ASIC is causing the problem.
>>
>> You are perhaps assuming a bit too much. Since I know that a disk read
>> or write WILL spin up the drive, I can only assume that the msdosfs is
>> not finding anything to flush, so is not writing. I see the full
>> "flushing all buffers" countdown and it always runs successfully to
>> zero. This, without the drive spinning up. This begs at least the
>> question of whether the drive is receiving any writes or whether the
>> "writes" are simply being cached by the drive to save energy. I
>> suspect that the drive only spins up when enough of its write cache is
>> filled.
>
> If there's "nothing to flush", then why is the kernel indefinitely
> looping (finally giving up, and it usually prints something when it
> encounters that condition) when trying to flush buffers when the drive
> is spun down?  What exactly is it trying to flush if there's "nothing to
> flush"?

I think you may be focusing on things you believe I meant when I didn't
mean or say them. I don't have any reason to believe that a cache flush
is or is not the command that is hanging. I have absolutely no doubt
that a flush is requested by the OS during the unmount process.  I'm
just not sure what other commands might be issued. And, of course, they
are CAM operations that the box is probably converting to SATA, but I
can't even say this for sure as the Seagate drive in question is a SATA
drive in the box. I can only say that the drive is not a standard 9mm
laptop drive. It is longer, thicker and heavier than a laptop drive. It
is the same width as a normal 2.5 in. drive.

As to the issue of "nothing to flush", that was my f