Re: Has anyone used HAST in production yet - opinions?

2010-10-04 Thread Jeremy Chadwick
On Mon, Oct 04, 2010 at 09:37:40AM +0100, Pete French wrote:
> I've been testing out HAST internally here, and as a replacement for the
> gmirror+ggate stack which we are currently using it seems excellent. I am
> actually about to take the plunge and deploy this in production for our
> main database server, as it seems to operate fine in testing, but before I
> do: has anyone else done this, and are there any interesting things I need
> to know?

Please see the freebsd-fs mailing list, which has quite a large number
of problem reports/issues being posted to it on a regular basis (and
patches are often provided).

http://lists.freebsd.org/pipermail/freebsd-fs/

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Has anyone used HAST in production yet - opinions?

2010-10-04 Thread Pete French
I've been testing out HAST internally here, and as a replacement for the
gmirror+ggate stack which we are currently using it seems excellent. I am
actually about to take the plunge and deploy this in production for our
main database server, as it seems to operate fine in testing, but before I
do: has anyone else done this, and are there any interesting things I need
to know?

cheers,

-pete.


Re: zfs send/receive: is this slow?

2010-10-04 Thread Martin Matuska
Try using zfs receive with the -v flag (gives you some stats at the end):
# zfs send storage/bac...@transfer | zfs receive -v
storage/compressed/bacula

And use the following sysctl (you may set that in /boot/loader.conf, too):
# sysctl vfs.zfs.txg.write_limit_override=805306368

I have good results with the 768MB write limit on systems with at least
8GB RAM. With 4GB RAM, you might want to try setting the TXG write limit
to a lower threshold (e.g. 256MB):
# sysctl vfs.zfs.txg.write_limit_override=268435456

You can experiment with that setting to get the best results on your
system. A value of 0 means using the calculated default (which is very high).
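
If you want the value to persist across reboots, it can go into
/boot/loader.conf instead, as noted above; a minimal sketch, using the
256MB figure suggested for 4GB-RAM machines:

# /boot/loader.conf
vfs.zfs.txg.write_limit_override="268435456"

The runtime sysctl remains the easier way to experiment, since it takes
effect immediately and without a reboot.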

During the operation you can observe what your disks actually do:
a) via ZFS pool I/O statistics:
# zpool iostat -v 1
b) via GEOM:
# gstat -a

mm

On 4. 10. 2010 4:06, Artem Belevich  wrote:
> On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille  wrote:
>> I'm rerunning my test after I had a drive go offline[1].  But I'm not
>> getting anything like the previous test:
>>
>> time zfs send storage/bac...@transfer | mbuffer | zfs receive
>> storage/compressed/bacula-buffer
>>
>> $ zpool iostat 10 10
>>                capacity     operations    bandwidth
>> pool         used  avail   read  write   read  write
>> ----------  -----  -----  -----  -----  -----  -----
>> storage     6.83T  5.86T      8     31  1.00M  2.11M
>> storage     6.83T  5.86T    207    481  25.7M  17.8M
> 
> It may be worth checking individual disk activity using gstat -f 'da.$'
> 
> Some time back I had one drive that was noticeably slower than the
> rest of the  drives in RAID-Z2 vdev and was holding everything back.
> SMART looked OK, there were no obvious errors and yet performance was
> much worse than what I'd expect. gstat clearly showed that one drive
> was almost constantly busy with much lower number of reads and writes
> per second than its peers.
> 
> Perhaps previously fast transfer rates were due to caching effects.
> I.e. if all metadata already made it into ARC, subsequent "zfs send"
> commands would avoid a lot of random seeks and would show much better
> throughput.
> 
> --Artem


Re: out of HDD space - zfs degraded

2010-10-04 Thread Alexander Motin
Alexander Leidinger wrote:
> On Sat, 02 Oct 2010 22:25:18 -0400 Steve Polyack 
> wrote:
> 
>> I think it's worth thinking about TLER (or the absence of it) here -
>> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery .  Your
>> consumer / SATA Hitachi drives likely do not put a limit on the time
>> the drive may block on a command while handling internal errors.  If
>> we consider that gpt/disk06-live encountered some kind of error and
>> had to relocate a significant number of blocks or perform some other
>> error recovery, then it very well may have timed out long enough for
>> siis(4) to drop the device.  I have no idea what the timeouts are set
>> to in the siis(4) driver, nor does anything in your SMART report
>> stick out to me (though I'm certainly no expert with SMART data, and
>> my understanding is that many drive manufacturers report the various
>> parameters in different ways).

Timeouts for commands are usually defined by the ada(4) peripheral driver
and the ATA transport layer of CAM. Most timeouts are set to 30 seconds.
The only time value defined by siis(4) is the hard reset time - currently
15 seconds.

Since the drive didn't reappear after `camcontrol reset/rescan ...` was
done after a significant period of time, but instead required a power
cycle, I doubt that any timeout value could have helped it.

It is also theoretically possible that it was the controller firmware
that got stuck, not the drive. It would be interesting to power cycle
that specific drive if the problem repeats.

> IIRC mav@ (CCed) made a commit regarding this to -current in the not so
> distant past. I do not know about the MFC status of this, or if it may
> have helped or not in this situation.

My last commit to siis(4), two weeks ago (merged recently), fixed a
specific bug in timeout handling that led to a system crash. I don't see
similar symptoms here.

If there were any messages before "Oct  2 00:50:53 kraken kernel:
(ada0:siisch0:0:0:0): lost device", they could give some hints about the
original problem. The messages after it could be a consequence.

Enabling verbose kernel messages could give some more information about
what happened there.
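
A sketch of one way to do that on the next boot - this is the standard
loader tunable rather than anything siis(4)-specific, so treat it as an
illustration:

# /boot/loader.conf
boot_verbose="YES"

Alternatively, the verbose boot option can be picked once from the loader
menu at boot time.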

-- 
Alexander Motin


Re: Panic: attempted pmap_enter on 2MB page

2010-10-04 Thread Dave Hayes
Alan Cox  writes:
> I'm afraid that I can't offer much insight without a stack trace.  At
> initialization time, we map the kernel with 2MB pages.  I suspect that
> something within the kernel is later trying to change one of those mappings.
> If I had to guess, it's related to the mfs root.

Here is the stack trace. The machine is sitting here in KDB if you
need me to extract any information from it.

  db> bt
  Tracing pid 0 tid 0 td 0x80c67140
  kdb_enter() at kdb_enter+0x3d
  panic() at panic+0x17b
  pmap_enter() at pmap_enter+0x641
  kmem_malloc() at kmem_malloc+0x1b5
  uma_large_malloc() at uma_large_malloc+0x4a
  malloc() at malloc+0xd7
  acpi_alloc_wakeup_handler() at acpi_alloc_wakeup_handler+0x82
  mi_startup() at mi_startup+0x59
  btext() at btext+0x2c
  db>

-- 
Dave Hayes - Consultant - Altadena CA, USA - d...@jetcafe.org 
>>> The opinions expressed above are entirely my own <<<

Only one who is seeking certainty can be uncertain.





Re: out of HDD space - zfs degraded

2010-10-04 Thread Jeremy Chadwick
On Mon, Oct 04, 2010 at 05:03:47PM +0300, Alexander Motin wrote:
> Alexander Leidinger wrote:
> > On Sat, 02 Oct 2010 22:25:18 -0400 Steve Polyack 
> > wrote:
> > 
> >> I think it's worth thinking about TLER (or the absence of it) here -
> >> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery .  Your
> >> consumer / SATA Hitachi drives likely do not put a limit on the time
> >> the drive may block on a command while handling internal errors.  If
> >> we consider that gpt/disk06-live encountered some kind of error and
> >> had to relocate a significant number of blocks or perform some other
> >> error recovery, then it very well may have timed out long enough for
> >> siis(4) to drop the device.  I have no idea what the timeouts are set
> >> to in the siis(4) driver, nor does anything in your SMART report
> >> stick out to me (though I'm certainly no expert with SMART data, and
> >> my understanding is that many drive manufacturers report the various
> >> parameters in different ways).
> 
> Timeouts for commands are usually defined by the ada(4) peripheral driver
> and the ATA transport layer of CAM. Most timeouts are set to 30 seconds.
> The only time value defined by siis(4) is the hard reset time - currently
> 15 seconds.
> 
> Since the drive didn't reappear after `camcontrol reset/rescan ...` was
> done after a significant period of time, but instead required a power
> cycle, I doubt that any timeout value could have helped it.
>
> It is also theoretically possible that it was the controller firmware
> that got stuck, not the drive. It would be interesting to power cycle
> that specific drive if the problem repeats.

FWIW, I agree with m...@.  I also wanted to talk a bit about TLER, since
it wouldn't have helped in this situation.

TLER wouldn't have helped because the drive (either mechanically or the
firmware) went catatonic and required a power-cycle.  TLER would have
caused siis(4) to witness a proper ATA error code for the read/write it
submitted to the drive, and the timeout would (ideally) be shorter than
that of the siis(4) or ada(4) layers.  So that ATA command would have
failed and the OS, almost certainly, would have continued to submit
more requests, each resulting in an error response, and so on.

Depending on how the drivers were written, this situation could cause
the storage driver to get stuck in an infinite loop trying to read or
write to a device that's catatonic.  I've seen this happen on Solaris
with, again, those Fujitsu disks (the drives return an error status in
response to the CDB, the OS/driver says "that's nice" and continues to
submit commands to the drive because it's still responding).

What I'm trying to say: what ada(4) and/or siis(4) did was the Correct
Thing(tm) in my opinion.

This is one of the reasons I don't blindly enable TLER on WDC Black
disks that I have.  Someone would need to quite honestly test the
behaviour of TLER with FreeBSD (testing both disk catatonic state as
well as transient error + recovery) to see how things behave.
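
For anyone who does run such a test: newer smartmontools (where available)
can read and set the SCT Error Recovery Control timers, which is the
mechanism behind TLER, on drives that support it. A sketch - the device
name and the 7-second values are purely illustrative, not taken from this
thread:

# smartctl -l scterc /dev/ada1         (show the current read/write timers)
# smartctl -l scterc,70,70 /dev/ada1   (set both timers to 7.0 seconds)

Note that on many drives the setting does not survive a power cycle, so it
would have to be reapplied at boot - another thing the testing described
above would need to cover.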

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: out of HDD space - zfs degraded

2010-10-04 Thread Alexander Leidinger
Quoting Dan Langille  (from Sun, 03 Oct 2010  
08:08:19 -0400):



Overnight, the following appeared in /var/log/messages:

Oct  2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage  
path=/dev/gpt/disk06-live offset=123103157760 size=1024
Oct  2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage  
path=/dev/gpt/disk06-live offset=123103159808 size=1024


[...]

Given the outage from yesterday when ada0 was offline for several  
hours, I'm guessing that checksum mismatches on that drive are  
expected.  Yes, /dev/gpt/disk06-live == ada0.


If you have the possibility to run a scrub of the pool, there should
be no additional checksum errors occurring *after* the scrub is
*finished*. If checksum errors still appear on this disk after the
scrub is finished, you should have a look at the hardware (cable/disk)
and take appropriate replacement actions.
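
A minimal sketch of that sequence, using the pool name from earlier in
the thread:

# zpool scrub storage
# zpool status -v storage

The second command shows scrub progress and, once the scrub reports
"completed", the per-device CKSUM counters; only errors that continue to
appear after completion point at the cable or the disk.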


Bye,
Alexander.

--
If your mother knew what you're doing,
she'd probably hang her head and cry.

http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137


Re: Has anyone used HAST in production yet - opinions?

2010-10-04 Thread Pete French
> Please see the freebsd-fs mailing list, which has quite a large number
> of problem reports/issues being posted to it on a regular basis (and
> patches are often provided).

Thanks, have signed up - I was signed up to 'geom' but not that one.
A large number of problem reports is not quite what I was hoping for, but
good to know, and maybe I shall hold off for a while :-)

cheers,

-pete.


Re: is there a bug in AWK on 6.x and 7.x (fixed in 8.x)?

2010-10-04 Thread Peter C. Lai
Is this because the behavior of "FS=" was changed to match the behavior
of awk -F?
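
For what it's worth, the usual explanation is that `FS=","` used as a
pattern is just an assignment expression evaluated after each record has
already been read, so the record in hand was split using the previous FS
and only later records pick up the comma; the awk in 8.x apparently
defers the split long enough that even the first record honours the new
value. Either way, the portable idioms below avoid depending on the
difference - a sketch reusing the file name from Miroslav's test:

# set FS before the first record is read
awk 'BEGIN { FS = "," } { print $1 "-" $2 }' GeoIPCountryWhois.csv

# or, equivalently, from the command line
awk -F, '{ print $1 "-" $2 }' GeoIPCountryWhois.csv

Both print the two quoted addresses for every line, including the first,
on old and new awk alike.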

On 2010-10-02 09:58:27PM +0200, Miroslav Lachman wrote:
> I think there is a bug in AWK in base of FreeBSD 6.x and 7.x (tested on 
> 6.4 i386 and 7.3 i386)
> 
> I have this simple test case, where I want 2 columns from GeoIP CSV file:
> 
> awk 'FS="," { print $1"-"$2 }' GeoIPCountryWhois.csv
> 
> It should produce output like this:
> 
> # awk 'FS="," { print $1"-"$2 }' GeoIPCountryWhois.csv | head -n 5
> "1.0.0.0"-"1.7.255.255"
> "1.9.0.0"-"1.9.255.255"
> "1.10.10.0"-"1.10.10.255"
> "1.11.0.0"-"1.11.255.255"
> "1.12.0.0"-"1.15.255.255"
> 
> (above is taken from FreeBSD 8.1 i386)
> 
> On FreeBSD 6.4 and 7.3 it results in broken first line:
> 
> awk 'FS="," { print $1"-"$2 }' GeoIPCountryWhois.csv | head -n 5
> "1.0.0.0","1.7.255.255","16777216","17301503","AU","Australia"-
> "1.9.0.0"-"1.9.255.255"
> "1.10.10.0"-"1.10.10.255"
> "1.11.0.0"-"1.11.255.255"
> "1.12.0.0"-"1.15.255.255"
> 
> There are no errors in the CSV file; it doesn't matter if I delete the
> affected first line from the file.
> 
> It is reproducible with handmade file:
> 
> # cat test.csv
> "1.9.0.0","1.9.255.255","17367040","17432575","MY","Malaysia"
> "1.10.10.0","1.10.10.255","17435136","17435391","AU","Australia"
> "1.11.0.0","1.11.255.255","17498112","17563647","KR","Korea, Republic of"
> "1.12.0.0","1.15.255.255","17563648","17825791","CN","China"
> "1.16.0.0","1.19.255.255","17825792","18087935","KR","Korea, Republic of"
> "1.21.0.0","1.21.255.255","18153472","18219007","JP","Japan"
> 
> 
> # awk 'FS="," { print $1"-"$2 }' test.csv
> "1.9.0.0","1.9.255.255","17367040","17432575","MY","Malaysia"-
> "1.10.10.0"-"1.10.10.255"
> "1.11.0.0"-"1.11.255.255"
> "1.12.0.0"-"1.15.255.255"
> "1.16.0.0"-"1.19.255.255"
> "1.21.0.0"-"1.21.255.255"
> 
> 
> As it works in 8.1, can it be fixed in 7-STABLE?
> (I don't know if it was purposely fixed, or if it is a coincidence of the
> newer version of AWK in 8.x.)
> 
> Should I file a PR for it?
> 
> Miroslav Lachman

-- 
===
Peter C. Lai | Bard College at Simon's Rock
Systems Administrator| 84 Alford Rd.
Information Technology Svcs. | Gt. Barrington, MA 01230 USA
peter AT simons-rock.edu | (413) 528-7428
===



Re: zfs send/receive: is this slow?

2010-10-04 Thread Dan Langille

On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:
> Try using zfs receive with the -v flag (gives you some stats at the end):
> # zfs send storage/bac...@transfer | zfs receive -v
> storage/compressed/bacula
>
> And use the following sysctl (you may set that in /boot/loader.conf, too):
> # sysctl vfs.zfs.txg.write_limit_override=805306368
>
> I have good results with the 768MB writelimit on systems with at least
> 8GB RAM. With 4GB ram, you might want to try to set the TXG write limit
> to a lower threshold (e.g. 256MB):
> # sysctl vfs.zfs.txg.write_limit_override=268435456
>
> You can experiment with that setting to get the best results on your
> system. A value of 0 means using calculated default (which is very high).

I will experiment with the above.  In the meantime:

> During the operation you can observe what your disks actually do:
> a) via ZFS pool I/O statistics:
> # zpool iostat -v 1
> b) via GEOM:
> # gstat -a

The following output was produced while the original copy was underway.

$ sudo gstat -a -b -I 20s
dT: 20.002s  w: 20.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
    7    452    387  24801    9.5     64   2128    7.1    79.4  ada0
    7    452    387  24801    9.5     64   2128    7.2    79.4  ada0p1
    4    492    427  24655    6.7     64   2128    6.6    63.0  ada1
    4    494    428  24691    6.9     65   2127    6.6    66.9  ada2
    8    379    313  24798   13.5     65   2127    7.5    78.6  ada3
    5    372    306  24774   14.2     64   2127    7.5    77.6  ada4
   10    355    291  24741   15.9     63   2127    7.4    79.6  ada5
    4    380    316  24807   13.2     64   2128    7.7    77.0  ada6
    7    452    387  24801    9.5     64   2128    7.4    79.7  gpt/disk06-live
    4    492    427  24655    6.7     64   2128    6.7    63.1  ada1p1
    4    494    428  24691    6.9     65   2127    6.6    66.9  ada2p1
    8    379    313  24798   13.5     65   2127    7.6    78.6  ada3p1
    5    372    306  24774   14.2     64   2127    7.6    77.6  ada4p1
   10    355    291  24741   15.9     63   2127    7.5    79.6  ada5p1
    4    380    316  24807   13.2     64   2128    7.8    77.0  ada6p1
    4    492    427  24655    6.8     64   2128    6.9    63.4  gpt/disk01-live
    4    494    428  24691    6.9     65   2127    6.8    67.2  gpt/disk02-live
    8    379    313  24798   13.5     65   2127    7.7    78.8  gpt/disk03-live
    5    372    306  24774   14.2     64   2127    7.8    77.8  gpt/disk04-live
   10    355    291  24741   15.9     63   2127    7.7    79.8  gpt/disk05-live
    4    380    316  24807   13.2     64   2128    8.0    77.2  gpt/disk07-live


$ zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     8.08T  4.60T    364    161  41.7M  7.94M
storage     8.08T  4.60T    926    133   112M  5.91M
storage     8.08T  4.60T    738    164  89.0M  9.75M
storage     8.08T  4.60T  1.18K    179   146M  8.10M
storage     8.08T  4.60T  1.09K    193   135M  9.94M
storage     8.08T  4.60T   1010    185   122M  8.68M
storage     8.08T  4.60T  1.06K    184   131M  9.65M
storage     8.08T  4.60T    867    178   105M  11.8M
storage     8.08T  4.60T  1.06K    198   131M  12.0M
storage     8.08T  4.60T  1.06K    185   131M  12.4M

Yesterday's write bandwidth was more like 80-90M.  It's down, a lot.

I'll look closer this evening.
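
One cheap way to test Artem's caching theory while the copy runs - the
sysctl names below are the FreeBSD ZFS ARC counters and are offered as a
sketch, not something suggested earlier in the thread:

$ sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses

If the metadata is already warm in the ARC, the hit counter should grow
much faster than the miss counter between two samples; a cold run shows
the opposite, which would line up with the lower per-disk read rates in
the gstat output above.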


>
> mm
>
> On 4. 10. 2010 4:06, Artem Belevich  wrote:
>> On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille  wrote:
>>> I'm rerunning my test after I had a drive go offline[1].  But I'm not
>>> getting anything like the previous test:
>>>
>>> time zfs send storage/bac...@transfer | mbuffer | zfs receive
>>> storage/compressed/bacula-buffer
>>>
>>> $ zpool iostat 10 10
>>>                capacity     operations    bandwidth
>>> pool         used  avail   read  write   read  write
>>> ----------  -----  -----  -----  -----  -----  -----
>>> storage     6.83T  5.86T      8     31  1.00M  2.11M
>>> storage     6.83T  5.86T    207    481  25.7M  17.8M
>>
>> It may be worth checking individual disk activity using gstat -f 'da.$'
>>
>> Some time back I had one drive that was noticeably slower than the
>> rest of the  drives in RAID-Z2 vdev and was holding everything back.
>> SMART looked OK, there were no obvious errors and yet performance was
>> much worse than what I'd expect. gstat clearly showed that one drive
>> was almost constantly busy with much lower number of reads and writes
>> per second than its peers.
>>
>> Perhaps previously fast transfer rates were due to caching effects.
>> I.e. if all metadata already made it into ARC, subsequent "zfs send"
>> commands would avoid a lot of random seeks and would show much better
>> throughput.
>>
>> --Artem

Re: zfs send/receive: is this slow?

2010-10-04 Thread Jeremy Chadwick
On Mon, Oct 04, 2010 at 01:31:07PM -0400, Dan Langille wrote:
> 
> On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:
> > Try using zfs receive with the -v flag (gives you some stats at the end):
> > # zfs send storage/bac...@transfer | zfs receive -v
> > storage/compressed/bacula
> >
> > And use the following sysctl (you may set that in /boot/loader.conf, too):
> > # sysctl vfs.zfs.txg.write_limit_override=805306368
> >
> > I have good results with the 768MB writelimit on systems with at least
> > 8GB RAM. With 4GB ram, you might want to try to set the TXG write limit
> > to a lower threshold (e.g. 256MB):
> > # sysctl vfs.zfs.txg.write_limit_override=268435456
> >
> > You can experiment with that setting to get the best results on your
> > system. A value of 0 means using calculated default (which is very high).
> 
> I will experiment with the above.  In the meantime:
> 
> > During the operation you can observe what your disks actually do:
> > a) via ZFS pool I/O statistics:
> > # zpool iostat -v 1
> > b) via GEOM:
> > # gstat -a
> 
> The following output was produced while the original copy was underway.
> 
> $ sudo gstat -a -b -I 20s
> dT: 20.002s  w: 20.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
>     7    452    387  24801    9.5     64   2128    7.1    79.4  ada0
>     7    452    387  24801    9.5     64   2128    7.2    79.4  ada0p1
>     4    492    427  24655    6.7     64   2128    6.6    63.0  ada1
>     4    494    428  24691    6.9     65   2127    6.6    66.9  ada2
>     8    379    313  24798   13.5     65   2127    7.5    78.6  ada3
>     5    372    306  24774   14.2     64   2127    7.5    77.6  ada4
>    10    355    291  24741   15.9     63   2127    7.4    79.6  ada5
>     4    380    316  24807   13.2     64   2128    7.7    77.0  ada6
>     7    452    387  24801    9.5     64   2128    7.4    79.7  gpt/disk06-live
>     4    492    427  24655    6.7     64   2128    6.7    63.1  ada1p1
>     4    494    428  24691    6.9     65   2127    6.6    66.9  ada2p1
>     8    379    313  24798   13.5     65   2127    7.6    78.6  ada3p1
>     5    372    306  24774   14.2     64   2127    7.6    77.6  ada4p1
>    10    355    291  24741   15.9     63   2127    7.5    79.6  ada5p1
>     4    380    316  24807   13.2     64   2128    7.8    77.0  ada6p1
>     4    492    427  24655    6.8     64   2128    6.9    63.4  gpt/disk01-live
>     4    494    428  24691    6.9     65   2127    6.8    67.2  gpt/disk02-live
>     8    379    313  24798   13.5     65   2127    7.7    78.8  gpt/disk03-live
>     5    372    306  24774   14.2     64   2127    7.8    77.8  gpt/disk04-live
>    10    355    291  24741   15.9     63   2127    7.7    79.8  gpt/disk05-live
>     4    380    316  24807   13.2     64   2128    8.0    77.2  gpt/disk07-live
> 
> $ zpool iostat 10
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> storage     8.08T  4.60T    364    161  41.7M  7.94M
> storage     8.08T  4.60T    926    133   112M  5.91M
> storage     8.08T  4.60T    738    164  89.0M  9.75M
> storage     8.08T  4.60T  1.18K    179   146M  8.10M
> storage     8.08T  4.60T  1.09K    193   135M  9.94M
> storage     8.08T  4.60T   1010    185   122M  8.68M
> storage     8.08T  4.60T  1.06K    184   131M  9.65M
> storage     8.08T  4.60T    867    178   105M  11.8M
> storage     8.08T  4.60T  1.06K    198   131M  12.0M
> storage     8.08T  4.60T  1.06K    185   131M  12.4M
> 
> Yesterday's write bandwidth was more like 80-90M.  It's down, a lot.
> 
> I'll look closer this evening.
> 
> 
> >
> > mm
> >
> > On 4. 10. 2010 4:06, Artem Belevich  wrote:
> >> On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille  wrote:
> >>> I'm rerunning my test after I had a drive go offline[1].  But I'm not
> >>> getting anything like the previous test:
> >>>
> >>> time zfs send storage/bac...@transfer | mbuffer | zfs receive
> >>> storage/compressed/bacula-buffer
> >>>
> >>> $ zpool iostat 10 10
> >>>                capacity     operations    bandwidth
> >>> pool         used  avail   read  write   read  write
> >>> ----------  -----  -----  -----  -----  -----  -----
> >>> storage     6.83T  5.86T      8     31  1.00M  2.11M
> >>> storage     6.83T  5.86T    207    481  25.7M  17.8M
> >>
> >> It may be worth checking individual disk activity using gstat -f 'da.$'
> >>
> >> Some time back I had one drive that was noticeably slower than the
> >> rest of the  drives in RAID-Z2 vdev and was holding everything back.
> >> SMART looked OK, there were no obvious errors and yet performance was
> >> much worse than what I'd expect. gstat clearly showed that one drive
> >> was almost constantly busy with much lower number of reads and writes
> >> per second than its peers.
> >>
> >> Perhaps previously fast transfer rates were due to caching effects.
> >> I.e. if all metadata already made it into ARC, subsequent "zf

Re: out of HDD space - zfs degraded

2010-10-04 Thread Dan Langille

On 10/4/2010 7:19 AM, Alexander Leidinger wrote:

Quoting Dan Langille  (from Sun, 03 Oct 2010 08:08:19
-0400):


Overnight, the following appeared in /var/log/messages:

Oct 2 21:56:46 kraken root: ZFS: checksum mismatch, zpool=storage
path=/dev/gpt/disk06-live offset=123103157760 size=1024
Oct 2 21:56:47 kraken root: ZFS: checksum mismatch, zpool=storage
path=/dev/gpt/disk06-live offset=123103159808 size=1024


[...]


Given the outage from yesterday when ada0 was offline for several
hours, I'm guessing that checksum mismatches on that drive are
expected. Yes, /dev/gpt/disk06-live == ada0.


If you have the possibility to run a scrub of the pool, there should be
no additional checksum errors occurring *after* the scrub is *finished*.
If checksum errors still appear on this disk after the scrub is
finished, you should have a look at the hardware (cable/disk) and take
appropriate replacement actions.


For the record, just finished a scrub.

$ zpool status
  pool: storage
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are 
unaffected.

action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 13h48m with 0 errors on Mon Oct  4 
16:54:15 2010

config:

NAME STATE READ WRITE CKSUM
storage  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
gpt/disk01-live  ONLINE   0 0 0
gpt/disk02-live  ONLINE   0 0 0
gpt/disk03-live  ONLINE   0 0 0
gpt/disk04-live  ONLINE   0 0 0
gpt/disk05-live  ONLINE   0 0 0
gpt/disk06-live  ONLINE   0 0  141K  3.47G repaired
gpt/disk07-live  ONLINE   0 0 0

errors: No known data errors
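
Since the scrub finished with no data errors, the one follow-up the
status output itself suggests is clearing the accumulated counters so
that any new errors stand out - a sketch, nothing beyond what 'zpool
status' recommends:

# zpool clear storage

After that, the CKSUM column for gpt/disk06-live should stay at zero
unless the hardware really is failing, per Alexander's advice above.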


--
Dan Langille - http://langille.org/


Re: zfs send/receive: is this slow?

2010-10-04 Thread Dan Langille

On 10/4/2010 2:10 PM, Jeremy Chadwick wrote:

On Mon, Oct 04, 2010 at 01:31:07PM -0400, Dan Langille wrote:


On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:

Try using zfs receive with the -v flag (gives you some stats at the end):
# zfs send storage/bac...@transfer | zfs receive -v
storage/compressed/bacula

And use the following sysctl (you may set that in /boot/loader.conf, too):
# sysctl vfs.zfs.txg.write_limit_override=805306368

I have good results with the 768MB writelimit on systems with at least
8GB RAM. With 4GB ram, you might want to try to set the TXG write limit
to a lower threshold (e.g. 256MB):
# sysctl vfs.zfs.txg.write_limit_override=268435456

You can experiment with that setting to get the best results on your
system. A value of 0 means using calculated default (which is very high).


I will experiment with the above.  In the meantime:


During the operation you can observe what your disks actually do:
a) via ZFS pool I/O statistics:
# zpool iostat -v 1
b) via GEOM:
# gstat -a


The following output was produced while the original copy was underway.

$ sudo gstat -a -b -I 20s
dT: 20.002s  w: 20.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
    7    452    387  24801    9.5     64   2128    7.1    79.4  ada0
    7    452    387  24801    9.5     64   2128    7.2    79.4  ada0p1
    4    492    427  24655    6.7     64   2128    6.6    63.0  ada1
    4    494    428  24691    6.9     65   2127    6.6    66.9  ada2
    8    379    313  24798   13.5     65   2127    7.5    78.6  ada3
    5    372    306  24774   14.2     64   2127    7.5    77.6  ada4
   10    355    291  24741   15.9     63   2127    7.4    79.6  ada5
    4    380    316  24807   13.2     64   2128    7.7    77.0  ada6
    7    452    387  24801    9.5     64   2128    7.4    79.7  gpt/disk06-live
    4    492    427  24655    6.7     64   2128    6.7    63.1  ada1p1
    4    494    428  24691    6.9     65   2127    6.6    66.9  ada2p1
    8    379    313  24798   13.5     65   2127    7.6    78.6  ada3p1
    5    372    306  24774   14.2     64   2127    7.6    77.6  ada4p1
   10    355    291  24741   15.9     63   2127    7.5    79.6  ada5p1
    4    380    316  24807   13.2     64   2128    7.8    77.0  ada6p1
    4    492    427  24655    6.8     64   2128    6.9    63.4  gpt/disk01-live
    4    494    428  24691    6.9     65   2127    6.8    67.2  gpt/disk02-live
    8    379    313  24798   13.5     65   2127    7.7    78.8  gpt/disk03-live
    5    372    306  24774   14.2     64   2127    7.8    77.8  gpt/disk04-live
   10    355    291  24741   15.9     63   2127    7.7    79.8  gpt/disk05-live
    4    380    316  24807   13.2     64   2128    8.0    77.2  gpt/disk07-live

$ zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     8.08T  4.60T    364    161  41.7M  7.94M
storage     8.08T  4.60T    926    133   112M  5.91M
storage     8.08T  4.60T    738    164  89.0M  9.75M
storage     8.08T  4.60T  1.18K    179   146M  8.10M
storage     8.08T  4.60T  1.09K    193   135M  9.94M
storage     8.08T  4.60T   1010    185   122M  8.68M
storage     8.08T  4.60T  1.06K    184   131M  9.65M
storage     8.08T  4.60T    867    178   105M  11.8M
storage     8.08T  4.60T  1.06K    198   131M  12.0M
storage     8.08T  4.60T  1.06K    185   131M  12.4M

Yesterday's write bandwidth was more like 80-90M.  It's down, a lot.

I'll look closer this evening.




mm

On 4. 10. 2010 4:06, Artem Belevich  wrote:

On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille  wrote:

I'm rerunning my test after I had a drive go offline[1].  But I'm not
getting anything like the previous test:

time zfs send storage/bac...@transfer | mbuffer | zfs receive
storage/compressed/bacula-buffer

$ zpool iostat 10 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     6.83T  5.86T      8     31  1.00M  2.11M
storage     6.83T  5.86T    207    481  25.7M  17.8M


It may be worth checking individual disk activity using gstat -f 'da.$'

Some time back I had one drive that was noticeably slower than the
rest of the  drives in RAID-Z2 vdev and was holding everything back.
SMART looked OK, there were no obvious errors and yet performance was
much worse than what I'd expect. gstat clearly showed that one drive
was almost constantly busy with much lower number of reads and writes
per second than its peers.

Perhaps previously fast transfer rates were due to caching effects.
I.e. if all metadata already made it into ARC, subsequent "zfs send"
commands would avoid a lot of random seeks and would show much better
throughput.

--Artem


Please read all of the following items before responding in-line.  Some
are just informational items for other people reading the thread.


Than

Re: Panic: attempted pmap_enter on 2MB page

2010-10-04 Thread Alan Cox

Dave Hayes wrote:

Alan Cox  writes:
  

I'm afraid that I can't offer much insight without a stack trace.  At
initialization time, we map the kernel with 2MB pages.  I suspect that
something within the kernel is later trying to change one of those mappings.
If I had to guess, it's related to the mfs root.



Here is the stack trace. The machine is sitting here in KDB if you
need me to extract any information from it.

  db> bt
  Tracing pid 0 tid 0 td 0x80c67140
  kdb_enter() at kdb_enter+0x3d
  panic() at panic+0x17b
  pmap_enter() at pmap_enter+0x641
  kmem_malloc() at kmem_malloc+0x1b5
  uma_large_malloc() at uma_large_malloc+0x4a
  malloc() at malloc+0xd7
  acpi_alloc_wakeup_handler() at acpi_alloc_wakeup_handler+0x82
  mi_startup() at mi_startup+0x59
  btext() at btext+0x2c
  db>

  


Thanks.

There are two pieces of information that might be helpful: the value of 
the global variable "kernel_vm_end" and the virtual address that was 
passed to pmap_enter().


Is this problem reproducible?  I don't recall if you mentioned that earlier.

Can you take a crash dump?
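
The kernel_vm_end value, at least, should be readable straight from the
ddb prompt the machine is sitting at - a sketch using stock ddb(4)
syntax, not commands requested verbatim above:

  db> x/gx kernel_vm_end      (or plain "x kernel_vm_end" if /g is unsupported)

For the dump, ddb's "dump" command writes a kernel core to the configured
dump device, though whether a dump device is set up this early in boot is
another question; a serial-console capture of the backtrace may be the
more practical fallback.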

Alan



Re: zfs send/receive: is this slow?

2010-10-04 Thread Jeremy Chadwick
On Mon, Oct 04, 2010 at 07:39:16PM -0400, Dan Langille wrote:
> On 10/4/2010 2:10 PM, Jeremy Chadwick wrote:
> >On Mon, Oct 04, 2010 at 01:31:07PM -0400, Dan Langille wrote:
> >>
> >>On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:
> >>>Try using zfs receive with the -v flag (gives you some stats at the end):
> >>># zfs send storage/bac...@transfer | zfs receive -v
> >>>storage/compressed/bacula
> >>>
> >>>And use the following sysctl (you may set that in /boot/loader.conf, too):
> >>># sysctl vfs.zfs.txg.write_limit_override=805306368
> >>>
> >>>I have good results with the 768MB writelimit on systems with at least
> >>>8GB RAM. With 4GB ram, you might want to try to set the TXG write limit
> >>>to a lower threshold (e.g. 256MB):
> >>># sysctl vfs.zfs.txg.write_limit_override=268435456
> >>>
> >>>You can experiment with that setting to get the best results on your
> >>>system. A value of 0 means using calculated default (which is very high).
> >>
> >>I will experiment with the above.  In the meantime:
> >>
> >>>During the operation you can observe what your disks actually do:
> >>>a) via ZFS pool I/O statistics:
> >>># zpool iostat -v 1
> >>>b) via GEOM:
> >>># gstat -a
> >>
> >>The following output was produced while the original copy was underway.
> >>
> >>$ sudo gstat -a -b -I 20s
> >>dT: 20.002s  w: 20.000s
> >>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
> >>     7    452    387  24801    9.5     64   2128    7.1    79.4  ada0
> >>     7    452    387  24801    9.5     64   2128    7.2    79.4  ada0p1
> >>     4    492    427  24655    6.7     64   2128    6.6    63.0  ada1
> >>     4    494    428  24691    6.9     65   2127    6.6    66.9  ada2
> >>     8    379    313  24798   13.5     65   2127    7.5    78.6  ada3
> >>     5    372    306  24774   14.2     64   2127    7.5    77.6  ada4
> >>    10    355    291  24741   15.9     63   2127    7.4    79.6  ada5
> >>     4    380    316  24807   13.2     64   2128    7.7    77.0  ada6
> >>     7    452    387  24801    9.5     64   2128    7.4    79.7  gpt/disk06-live
> >>     4    492    427  24655    6.7     64   2128    6.7    63.1  ada1p1
> >>     4    494    428  24691    6.9     65   2127    6.6    66.9  ada2p1
> >>     8    379    313  24798   13.5     65   2127    7.6    78.6  ada3p1
> >>     5    372    306  24774   14.2     64   2127    7.6    77.6  ada4p1
> >>    10    355    291  24741   15.9     63   2127    7.5    79.6  ada5p1
> >>     4    380    316  24807   13.2     64   2128    7.8    77.0  ada6p1
> >>     4    492    427  24655    6.8     64   2128    6.9    63.4  gpt/disk01-live
> >>     4    494    428  24691    6.9     65   2127    6.8    67.2  gpt/disk02-live
> >>     8    379    313  24798   13.5     65   2127    7.7    78.8  gpt/disk03-live
> >>     5    372    306  24774   14.2     64   2127    7.8    77.8  gpt/disk04-live
> >>    10    355    291  24741   15.9     63   2127    7.7    79.8  gpt/disk05-live
> >>     4    380    316  24807   13.2     64   2128    8.0    77.2  gpt/disk07-live
> >>
> >>$ zpool iostat 10
> >>               capacity     operations    bandwidth
> >>pool         used  avail   read  write   read  write
> >>----------  -----  -----  -----  -----  -----  -----
> >>storage     8.08T  4.60T    364    161  41.7M  7.94M
> >>storage     8.08T  4.60T    926    133   112M  5.91M
> >>storage     8.08T  4.60T    738    164  89.0M  9.75M
> >>storage     8.08T  4.60T  1.18K    179   146M  8.10M
> >>storage     8.08T  4.60T  1.09K    193   135M  9.94M
> >>storage     8.08T  4.60T   1010    185   122M  8.68M
> >>storage     8.08T  4.60T  1.06K    184   131M  9.65M
> >>storage     8.08T  4.60T    867    178   105M  11.8M
> >>storage     8.08T  4.60T  1.06K    198   131M  12.0M
> >>storage     8.08T  4.60T  1.06K    185   131M  12.4M
> >>
> >>Yesterday's write bandwidth was more like 80-90M.  It's down, a lot.
> >>
> >>I'll look closer this evening.
> >>
> >>
> >>>
> >>>mm
> >>>
> >>>On 4. 10. 2010 4:06, Artem Belevich  wrote:
> On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille  wrote:
> >I'm rerunning my test after I had a drive go offline[1].  But I'm not
> >getting anything like the previous test:
> >
> >time zfs send storage/bac...@transfer | mbuffer | zfs receive
> >storage/compressed/bacula-buffer
> >
> >$ zpool iostat 10 10
> >               capacity     operations    bandwidth
> >pool         used  avail   read  write   read  write
> >----------  -----  -----  -----  -----  -----  -----
> >storage     6.83T  5.86T      8     31  1.00M  2.11M
> >storage     6.83T  5.86T    207    481  25.7M  17.8M
> 
> It may be worth checking individual disk activity using gstat -f 'da.$'
> 
> Some time back I had one drive that was noticeably slower than the
> rest of the  drives in RAID-Z2 vdev and was holding everything back.
> SMART looked OK, there were no obvious errors and yet perfor