[zfs-discuss] Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Anton B. Rang
> Initially I wanted a way to  do a dump to tape like ufsdump.  I
> don't know if this makes sense anymore because the tape market is
> crashing slowly.

It makes sense if you need to keep backups for more than a handful of years 
(think regulatory requirements or scientific data), or if cost is important. 
Storing tape is much cheaper than keeping disks running. (Storing disks isn't 
practical over long periods of time; not only does the signal on the media 
degrade, but so do some components.)

> People just don't backup 300MB per night anymore. We
> are looking at terabytes of data and I don't know how
> to backup a terabyte a night.

If you're actually generating a terabyte per day of data, I'm impressed.  :-)

Tape seems a reasonable way to back that up, in any case. A T10000 stores 500 
GB on each tape and runs at 120 MB/sec, so a terabyte would take roughly 2.5 
hours to back up with a single tape drive. LTO-4 is in the same ballpark. Of 
course, that assumes your disk system can keep up.
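
(Back-of-the-envelope: 1 TB at 120 MB/sec is roughly 8,300-8,700 seconds, i.e.
about 2.3-2.4 hours of pure streaming, so "roughly 2.5 hours" once you add
load and positioning time.)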

The SAM-QFS approach of continuous archiving makes a lot of sense here since it 
effectively lets backups run continuously. I don't know how much Sun can say 
about the work going on to add SAM to ZFS.

> Or a really big question that I guess I have to ask, do we even care anymore?

If we're serious about disaster recovery, we do.

In particular, remote replication is NOT a substitute for backups.

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Bottlenecks in building a system

2007-04-20 Thread Anton B. Rang
If you're using this for multimedia, do some serious testing first. ZFS tends 
to have "bursty" write behaviour, and the worst-case latency can be measured in 
seconds. This has been improved a bit in recent builds but it still seems to 
"stall" periodically.

(QFS works extremely well for streaming, as evidenced in recent Sun press 
releases, but I'm not sure what the cost is these days.)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Multi-tera, small-file filesystems

2007-04-20 Thread Anton B. Rang
You should definitely worry about the number of files when it comes to backup & 
management. It will also make a big difference in space overhead.

A ZFS filesystem with 2^35 files will have a minimum of 2^44 bytes of overhead 
just for the file nodes, which is about 16 TB.

If the per-file backup overhead is about 20 ms (2 seeks), then 2^35 files will 
take about 21 years to back up.  ;-)
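
For reference, the arithmetic behind those two numbers (assuming ~512 bytes of
on-disk metadata per file and ~20 ms of seek overhead per file):

   2^35 files * 512 bytes/file = 2^44 bytes  ~= 16 TB
   2^35 files * 0.020 s/file   ~= 6.9 * 10^8 s ~= 21.8 years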

I'm guessing you didn't really mean 2^35, though. (If you did, you're likely to 
need a system along the lines of DARPA's HPCS program.)

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random

2007-04-20 Thread Constantin Gonzalez
Hi,

>> 2. After going through the zfs-bootification, Solaris complains on reboot
>>    that /etc/dfs/sharetab is missing. Somehow this seems to have fallen
>>    through the cracks of the find command. Well, touching /etc/dfs/sharetab
>>    just fixes the issue.
> 
> This is unrelated to ZFS boot issues, and sounds like this bug:
> 
> 6542481 No sharetab after BFU from snv_55
> 
> It's fixed in build 62.

hmm, that doesn't fit what I saw:

- Upgraded from snv_61 to snv_62
- snv_62 booted with no problems (other than the t_optmgmt bug)
- Then migrated to ZFS boot
- Now the sharetab issue shows up.

So why did the sharetab issue only show up after the ZFSification of the
boot process?

Best regards,
Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Tim Thomas

Hi Wee


> I run a setup of SAM-FS for our main file server and we loved the
> backup/restore parts that you described.

That is great to hear.


> The main concerns I have with SAM fronting the entire conversation is
> data integrity. Unlike ZFS, SAMFS does not do end to end checksumming.
My initial reaction is that the world has gotten by for a long time without 
file systems that can do this... so I don't see the absence of it as a big 
deal. On the other hand, it is hard to argue against a feature that improves 
data integrity, so I will not. Anyway, SAM-FS has been enhanced in this 
respect... in SAM-FS 4.6 you can enable the following...


   "If required, you can enable data verification for archive copies. This
   feature checks for data corruption on any data that is copied to secondary
   and/or tertiary media. The data verification process performs a
   read-after-write verification test, and records a confirmation of data
   validity in the metadata properties for that file. An ssum option is used
   to mark files and directories as needing to be verified. Child directories
   inherit the data verification properties of their parent. The normal
   checksum method is employed to verify copies written to tape or disk
   archive.

   Use the ssum -e command to set data verification for a file or directory.
   This forces the generation and use of checksums for archiving and staging,
   and prevents the release of the file until all archive copies have been
   created and their checksums verified. Only a superuser can set this
   attribute on a file or directory."

This is  taken from the Sun StorageTek SAM Archive Configuration and 
Administration Guide Version 4, Update 6 (SAM-FS 4.6 was released April 
6th). You can get all the SAM-FS 4.6 docs from here...


http://www.sun.com/products-n-solutions/hardware/docs/Software/Storage_Software/Sun_SAM-FS_and_Sun_SAM-QFS_Software/index.html

This checksum model is different than ZFS and is more like the way a 
backup product verifies its backups.
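
For example, marking a directory for verification would look something like
this (the path is just an illustration; child directories inherit the
setting):

   ssum -e /sam1/shared/projects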


> We have considered the setup you proposed (samfs copy1 -> ZFS) but you
> will run into problem with fs-cache.  Being only a copy, ZFS probably
> do not need much caching but will win the battle for memory due to the
> way its cache is managed.  Unless there is a visible memory shortfall,
> ZFS will starve (sorry guys) samfs from memory it could use as cache.
> Also, ZFS's data integrity feature is limited by the use of 2nd hand
> data.

I don't know enough about how ZFS manages memory other than what I have 
seen on this alias (I just joined a couple of weeks ago) which seems to 
indicate it is a  memory hog...as is VxFS so we are in good company. I 
am not against keeping data in memory so long as it has also been 
written to somewhere non-volatile as well so that data is not lost if 
the lights go out... and applications don't fight for memory to run. I 
recall stories from years ago where VxFS hogged so much memory on a Sun 
Cluster node that the Cluster services stalled and the cluster failed over!


I need to go read some white papers on this...but I assume that 
something like direct I/O (which UFS, VxFS and QFS all have) is in the 
plans for ZFS so we don't end up double buffering data for apps like 
databases ? - that is just ugly.


Rgds

Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bottlenecks in building a system

2007-04-20 Thread Ian Collins
Adam Lindsay wrote:

> In asking about ZFS performance in streaming IO situations, discussion
> quite quickly turned to potential bottlenecks. By coincidence, I was
> wondering about the same thing.
>
> Richard Elling said:
>
>> We know that channels, controllers, memory, network, and CPU bottlenecks
>> can and will impact actual performance, at least for large configs.
>> Modeling these bottlenecks is possible, but will require more work in
>> the tool.  If you know the hardware topology, you can do a
>> back-of-the-napkin
>> analysis, too.
>
>
> Well, I'm normally a Mac guy, so speccing server hardware is a bit of
> a revelation for me. I'm trying to come up with a ZFS storage server
> for a networked multimedia research project which hopefully has enough
> oomph to be a nice resource that outlasts the (2-year) project, but
> without breaking the bank.
>
> Does anyone have a clue as to where the bottlenecks are going to be
> with this:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)
> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)
> 2x Areca 8-port PCIe (8-lane) RAID controllers
> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
> 8 GiB RAM
>
I'm putting together a similar specified machine (Quad-FX with 8GB RAM),
but fewer drives.  If there any specific tests you want me to run on it
while it's still on my bench, drop me a line.

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[4]: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Robert Milkowski
Hello Wee,

Friday, April 20, 2007, 5:20:00 AM, you wrote:

WYT> On 4/20/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
>> You can limit how much memory zfs can use for its caching.
>>

WYT> Indeed, but that memory will still be locked.  How can you tell the
WYT> system to be "flexible" with the caching?

It shouldn't be locked, but in reality it can be.

WYT> I deem that archiving will not present a cache challenge but we will
WYT> want zfs to prefetch (or do whatever magic) when staging in files.  We
WYT> do not want to limit ZFS's cache but we want to tell the system to
WYT> prefer SAMFS's cache to ZFS's.

I don't know how SAM-FS works (I've never used it) but I'm surprised
that you started paging to swap - or perhaps you meant
something else.

If QFS uses the standard page cache, perhaps increasing segmap would also
help. It's still static, however.

By limiting the ZFS ARC you are not disabling prefetching or any other
features.
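
(For what it's worth, on builds that expose the tunable, capping the ARC is a
one-line /etc/system entry - on earlier Solaris 10 updates you had to adjust
arc c_max with mdb instead. The value below is just an example:)

   * Cap the ZFS ARC at 4 GB (example value only)
   set zfs:zfs_arc_max = 0x100000000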

First, I would be most interested in what exactly happened when your
server started to crawl, because you can't "swap out" the page cache or the
ARC cache...





-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Permanently removing vdevs from a pool

2007-04-20 Thread Robert Milkowski
Hello George,

Friday, April 20, 2007, 7:37:52 AM, you wrote:

GW> This is a high priority for us and is actively being worked.

GW> Vague enough for you? :-) Sorry I can't give you anything more exact 
GW> than that.

Can you at least give us feature list being developed?

Some answers to questions like:

1. evacuating a vdev resulting in a smaller pool for all raid configs - ?

2. adding new vdev and rewriting all existing data to new larger
   stripe - ?

3. expanding stripe width for raid-z1 and raid-z2 - ?

4. live conversion between different raid kinds on the same disk set - ?

5. live data migration from one disk set to another - ?
   [if 1 works it should be simple - first force adding new disks,
even if with different redundancy scheme then evacuate old disks.
This also partly solves 5 but you need different disks.]

6. rewriting data in a dataset (not entire pool) after changing some
   parameters like compression, encryption, ditto blocks, ... so it
   will affect also already written data in a dataset. This should be
   both pool wise and data set wise - ?

7. de-fragmentation of a pool - ?

8. anything else ?


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Robert Milkowski
Hello Anton,

Friday, April 20, 2007, 9:02:12 AM, you wrote:

>> Initially I wanted a way to  do a dump to tape like ufsdump.  I
>> don't know if this makes sense anymore because the tape market is
>> crashing slowly.

ABR> It makes sense if you need to keep backups for more than a
ABR> handful of years (think regulatory requirements or scientific
ABR> data), or if cost is important. Storing tape is much cheaper than
ABR> keeping disks running. (Storing disks isn't practical over long
ABR> periods of time; not only does the signal on the media degrade, but so do 
some components.)

>> People just don't backup 300MB per night anymore. We
>> are looking at terabytes of data and I don't know how
>> to backup a terabyte a night.

ABR> If you're actually generating a terabyte per day of data, I'm impressed.  
:-)

ABR> Tape seems a reasonable way to back that up, in any case. A
ABR> T10000 stores 500 GB on each tape and runs at 120 MB/sec, so a
ABR> terabyte would take roughly 2.5 hours to back up with a single
ABR> tape drive. LTO-4 is in the same ballpark. Of course, that
ABR> assumes your disk system can keep up.

ABR> The SAM-QFS approach of continuous archiving makes a lot of
ABR> sense here since it effectively lets backups run continuously. I
ABR> don't know how much Sun can say about the work going on to add SAM to ZFS.

>> Or a really big question that I guess I have to ask, do we even care anymore?

ABR> If we're serious about disaster recovery, we do.

ABR> In particular, remote replication is NOT a substitute for backups.

I can't entirely agree - it really depends.
If you do remote replication and also provide snapshotting, it will work
extremely well, and your "restore" would be MUCH more efficient than restoring
from tape. Then if your primary array is down you just switch to the
secondary - depending on the environment, that could be all you need.
With tapes, not only will you have to wait for the restore, you also need a
working array so you have a place to restore to.

Of course if you need to take your backup outside then that's
different.

I'm really disappointed that our attempt at adding zfs async replication
hasn't worked out. We'll have to settle for 'while [ 1 ]; do
snapshot; zfs send -i | zfs recv ; sleep 10s; done' ...
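
A slightly fleshed-out sketch of that loop, just to illustrate the idea
(pool/fs, the host "backup" and the receiving pool "tank" are all made-up
names; error handling omitted):

   #!/bin/sh
   FS=pool/fs
   PREV=repl-0
   # initial full send to establish the common base snapshot
   zfs snapshot $FS@$PREV
   zfs send $FS@$PREV | ssh backup zfs receive -d tank
   N=1
   while true; do
           CUR=repl-$N
           zfs snapshot $FS@$CUR
           zfs send -i $FS@$PREV $FS@$CUR | ssh backup zfs receive -d tank
           zfs destroy $FS@$PREV        # $CUR is now the common snapshot
           PREV=$CUR
           N=`expr $N + 1`
           sleep 10
   done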


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Help me understand ZFS caching

2007-04-20 Thread Tony Galway
I have a few questions regarding ZFS, and would appreciate if someone could 
enlighten me as I work my way through.

First write cache.

If I look at traditional UFS / VxFS type file systems, they normally cache 
metadata to RAM before flushing it to disk. This helps increase their perceived 
write performance (perceived in the sense that if a power outage occurs, data 
loss can occur).

ZFS, on the other hand, performs copy-on-write to ensure that the disk is always 
consistent. I see this as sort of being equivalent to using a directio option. 
I understand that the data is written first, then the pointers are updated, but 
if I were to use the directio analogy, would this be correct?

If that is the case, then is it true that ZFS really does not use a write cache 
at all? And if it does, then how is it used?

Read Cache.

Any of us that have started using or benchmarking ZFS have seen its voracious 
appetite for memory, an appetite that is fully shared with VxFS for example, as 
I am not singling out ZFS (I'm rather a fan).  On reboot of my T2000 test 
server (32GB RAM) I see that the arc cache max size is set to 30.88GB - a 
sizeable piece of memory.
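
(As an aside, on builds that export the "arcstats" kstat you can watch the
ARC sizes without going through mdb - on older Solaris 10 updates the kstat
may not be there yet:)

   kstat -m zfs -n arcstats | egrep 'c_max|c_min|size'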

Now, is all that cache space only for read cache? (given my assumption 
regarding write cache)

Tuneable Parameters:
I know that the philosophy of ZFS is that you should never have to tune your 
file system, but  might I suggest, that tuning the FS is not always a bad 
thing. You can't expect a FS to be all things for all people. If there are 
variables that can be modified to provide different performance characteristics 
and profiles, then I would contend that it could strengthen ZFS and lead to 
wider adoption and acceptance if you could, for example, limit the amount of 
memory used by items like the cache without messing with c_max / c_min directly 
in the kernel.

-Tony
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status -v

2007-04-20 Thread eric kustarz


On Apr 19, 2007, at 12:50 PM, Ricardo Correia wrote:


> eric kustarz wrote:
>
>> Two reasons:
>> 1) cluttered the output (as the path name is variable length).  We
>> could perhaps add another flag (-V or -vv or something) to display the
>> ranges.
>> 2) i wasn't convinced that output was useful, especially to most
>> users/admins.
>>
>> If we did provide the range information, how would you actually use
>> that information?
>>
>> or would providing the number of checksum errors per file be what
>> you're really looking for?



> I agree that the current display is more appropriate as a default.
> But yes, I think adding a -vv flag to show the range output would be
> useful. It seems interesting from an observability standpoint, since I
> could easily tell how much damage the file got. Simply telling the
> number of checksum errors per file would be useful too, but not as
> useful, since each checksum error can cover anywhere between 512 bytes
> and 128 KB.


I agree it would be interesting (especially for us developers).  What
I'm curious about (and anyone can answer) is: what action would you take
(or not take) based on this additional information?




>> ps: could you send me the 'zpool status -v' output for curiosity's sake
>
> Sure :)


thanks...

eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: storage type for ZFS

2007-04-20 Thread eric kustarz


> Has an analysis of most common storage system been done on how they
> treat SYNC_NV bit and if any additional tweaking is needed? Would such
> analysis be publicly available?



I am not aware of any analysis and would love to see it done (i'm  
sure any vendors who are lurking on this list that support the  
SYNC_NV would surely want to speak up now).


Because not every vendor supports SYNC_NV, our solution is to first see
whether SYNC_NV is supported and, if not, provide a config file (as a
short-term necessity) in which you can hardcode certain products to act as
if they support SYNC_NV (in which case we would not send a cache flush).
If the SYNC_NV bit is not supported and the config file is not updated for
the device, then we do what we do today.


But if anyone knows for certain if a particular device supports  
SYNC_NV, please post...


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Tony Galway
Let me elaborate slightly on the reason I ask these questions.

I am performing some simple benchmarking, and during this a file is created by 
sequentially writing 64k blocks until the 100Gb file is created. I am seeing, 
and this is the exact same as VxFS, large pauses while the system reclaims the 
memory that it has consumed.

I assume that since ZFS (back to the write cache question) is copy-on-write and 
is not write caching anything (correct me if I am wrong), it is instead using 
memory for my read-cache. Also, since I have 32Gb of memory the reclaim periods 
are quite long while it frees this memory - basically rendering my volume 
unusable until that memory is reclaimed.

With VxFS I was able to tune the file system with write_throttle, and this 
allowed me to find a balance basically whereby the system writes crazy fast, 
and then reclaims memory, and repeats that cycle.

I guess I could modify c_max in the kernel, to provide the same type of result, 
but this is not a supported tuning practice - and thus I do not want to do that.

I am simply trying to determine where ZFS is different, where it is the same, 
and how I can modify its default behaviours (or whether I ever will be able to).

Also, FYI, I'm testing on Solaris 10 11/06 (All testing must be performed in 
production versions of Solaris) but if there are changes in Nevada that will 
show me different results, I would be interested in those as an aside.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Anton B. Rang
ZFS uses caching heavily as well; much more so, in fact, than UFS.

Copy-on-write and direct i/o are not related. As you say, data gets written 
first, then the metadata which points to it, but this isn't anything like 
direct I/O. In particular, direct I/O avoids caching the data, instead 
transferring it directly to/from user buffers, while ZFS-style copy-on-write 
caches all data. ZFS does not have direct I/O at all right now.

One key difference between UFS & ZFS is that ZFS flushes the drive's write 
cache at key points. (It does this rather than using ordered commands, even on 
SCSI disks, which to me is a little disappointing.) This guarantees that the 
data is on-disk before the associated metadata. UFS relies on keeping the write 
cache disabled to ensure that its journal is written to disk before its 
metadata, again with the goal of keeping the file system consistent at all 
times.

I agree with you on tuning. It's clearly desirable that the "out-of-box" 
settings for a file system work well for "general purpose" loads; but there are 
almost always applications which require a different strategy. This is much of 
why UFS/QFS/VxFS added direct i/o, and it's why VxFS (which focuses heavily on 
database) added quick i/o.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Tony Galway
Anton & Roch,

Thank you for helping me understand this. I didn't want to make too many 
assumptions that were unfounded and then incorrectly relay that information 
back to clients.

So if I might just repeat your statements, so my slow mind is sure it 
understands, and Roch, yes your assumption is correct that I am referencing 
File System Cache, not disk cache.

A. Copy-on-write exists solely to ensure on disk data integrity, and as Anton 
pointed out it is completely different than DirectIO.

b. ZFS still avail's itself of a file system cache, and therefore, it is 
possible that data can be lost if it hasn't been written to disk and the server 
fails.

c. The write throttling issue is known, and being looked at - when it is fixed 
we don't know?  I'll add myself to the notification list as an interested party 
:)

Now to another question related to Anton's post. You mention that directIO does 
not exist in ZFS at this point. Are there plans to support DirectIO, any 
functionality that will simulate directIO, or some other non-caching ability 
suitable for critical systems such as databases, if the client still wanted to 
deploy on filesystems?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Anton B. Rang
To clarify, there are at least two issues with remote replication vs. backups 
in my mind. (Feel free to joke about the state of my mind!  ;-)

The first, which as you point out can be alleviated with snapshots, is the 
ability to "go back" in time. If an accident wipes out a file, the missing file 
will shortly be deleted on the remote end. Snapshots help you here ... as long 
as you can keep sufficient space online. If your turnover is 1 TB/day and you 
require the ability to go back to the end of any week in the past year, that's 
52 TB.

The second is protection against file system failures. If a bug in file system 
code, or damage to the metadata structures on disk, results in the master being 
unreadable, then it could easily be replicated to the remote system. (Consider 
a bug which manifests itself only when 10^9 files have been created; both file 
systems will shortly fail.) Keeping backups in a file system independent manner 
(e.g. tar format, netbackup format, etc.) protects against this.

If you're not concerned about the latter, and you can afford to keep all of 
your backups on rotating rust (and have sufficient CPU & I/O bandwidth at the 
remote site to scrub those backups), and have sufficient bandwidth to actually 
move data between sites (for 1 TB/day, assuming continuous modification, that's 
11 MB/second if data is never rewritten during the day, or potentially much 
more in a real environment) then remote replication could work.
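
(For the record, that figure is just 10^12 bytes / 86,400 seconds, i.e. about
11.6 MB/sec averaged over the day.)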
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [nfs-discuss] NFSd and dtrace

2007-04-20 Thread eric kustarz


On Apr 18, 2007, at 9:33 PM, Robert Milkowski wrote:


Hello Robert,

Thursday, April 19, 2007, 1:57:38 AM, you wrote:

RM> Hello nfs-discuss,

RM>   Does anyone have a dtrace script (or any other means) to  
track which
RM>   files are open/read/write (ops and bytes) by nfsd? To make  
things

RM>   little bit harder lets assume that local storage in on zfs, nfsd
RM>   server using nfsv3 and system is S10U3.

RM>   The script would distinguish between cache read and disk read.

RM>   So something like:

RM>   ./nfsd_file.d

RM>   CLIENT_IP   OPERATION   BYTES   TYPE   FILE
RM>   X.X.X.X   READ3 32768   logical/nfs/d1000/fileA

RM>   ...



RM>   and something like:

RM>   ./nfsd_file_summ.d 100s

RM>   CLIENT_IP   OPERATION   OPSBYTES   TYPE   FILE
RM>   X.X.X.X   READ3 230 5MBlogical/nfs/d1000/ 
fileA
RM>   X.X.X.X   READ3  15 1MBphysical   /nfs/d1000/ 
fileA


RM>   ...

RM>
RM>
RM>


Looks like vopstat and rfileio from DTrace toolkit is what I'm looking
for (with some modifications)



very cool, would you mind posting your dscript when you get it working?

eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Robert Milkowski
Hello Anton,

Friday, April 20, 2007, 3:54:52 PM, you wrote:

ABR> To clarify, there are at least two issues with remote
ABR> replication vs. backups in my mind. (Feel free to joke about the state of 
my mind!  ;-)

ABR> The first, which as you point out can be alleviated with
ABR> snapshots, is the ability to "go back" in time. If an accident
ABR> wipes out a file, the missing file will shortly be deleted on the
ABR> remote end. Snapshots help you here ... as long as you can keep
ABR> sufficient space online. If your turnover is 1 TB/day and you
ABR> require the ability to go back to the end of any week in the past year, 
that's 52 TB.

It really depends. With ZFS snapshots, for a snapshot to consume 1TB you
would have to delete 1TB of files or make 1TB of modifications to files
(or both, with 1TB in sum). There certainly are such workloads.
But if you just add new data (append to files, or write new files), then
snapshots practically won't consume any storage. In that case it works
perfectly.


ABR> The second is protection against file system failures. If a bug
ABR> in file system code, or damage to the metadata structures on
ABR> disk, results in the master being unreadable, then it could
ABR> easily be replicated to the remote system. (Consider a bug which
ABR> manifests itself only when 10^9 files have been created; both
ABR> file systems will shortly fail.) Keeping backups in a file system
ABR> independent
ABR> ABR>  manner (e.g. tar format, netbackup format, etc.) protects against 
this.

Lets say I agree. :)


ABR> If you're not concerned about the latter, and you can afford to
ABR> keep all of your backups on rotating rust (and have sufficient
ABR> CPU & I/O bandwidth at the remote site to scrub those backups),
ABR> and have sufficient bandwidth to actually move data between sites
ABR> (for 1 TB/day, assuming continuous modification, that's 11
ABR> MB/second if data is never rewritten during the day, or
ABR> potentially much more in a real environment) then remote replication could 
work.

You need exactly the same bandwidth as with any other classical backup
solution - in the end you have to copy all that (differential) data out of
the box, regardless of whether it goes to tape or disk.

However, instead of doing the backup during the night, which you want to do
so there is limited impact on production performance, with replication you
can do it continuously, 24x7. The actual performance impact should be
minimal, as you should get most data from memory without touching the disks
much on the sending side. That also means you actually need much less
throughput to the remote side. Also, with frequent enough snapshotting you
have a backup basically every 30 minutes or every hour.


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[4]: [zfs-discuss] Re: storage type for ZFS

2007-04-20 Thread Robert Milkowski
Hello eric,

Friday, April 20, 2007, 3:36:20 PM, you wrote:

>>
>> Has an analysis of most common storage system been done on how they
>> treat SYNC_NV bit and if any additional tweaking is needed? Would such
>> analysis be publicly available?
>>

ek> I am not aware of any analysis and would love to see it done (i'm  
ek> sure any vendors who are lurking on this list that support the  
ek> SYNC_NV would surely want to speak up now).

ek> Due to not every vendor not supporting SYNC_NV, our solution is to  
ek> first see if SYNC_NV is supported and if not, then provide a config  
ek> file (as a short term necessity) that you can hardcode certain  
ek> products to act as if they support SYNC_NV (which we would then not  
ek> send a flushing of the cache).  If the SYNC_NV bit is not supported  
ek> and the config file is not updated for the device, then we do what we
ek> do today.

ek> But if anyone knows for certain if a particular device supports  
ek> SYNC_NV, please post...

Why a config file and not a property of the pool?
A pool can have disks from different arrays :)

A useful thing would be to be able to keep that config in the pool itself, so
if one exports/imports to a different server... you get the idea.


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help me understand ZFS caching

2007-04-20 Thread Roch - PAE

Tony Galway writes:

 > I have a few questions regarding ZFS, and would appreciate if someone
 > could enlighten me as I work my way through. 
 > 
 > First write cache.
 > 

We often use "write cache" to designate the cache present at
the disk level. Let's call this "disk write cache".
Most filesystems will cache information in host memory. Let's call
this "FS cache". I think your questions are more about FS
cache behavior for different types of loads.


 > If I look at traditional UFS / VxFS type file systems, they normally
 > cache metadata to RAM before flushing it to disk. This helps increase
 > their perceived write performance (perceived in the sense that if a
 > power outage occurs, data loss can occur). 
 > 

Correct, and applications can influence this behavior with
O_DSYNC, fsync()...

 > ZFS on the other hand, performs copy-on-write to ensure that the disk
 > is always consistent, I see this as sort of being equivalent to using
 > a directio option. I understand that the data is written first, then
 > the points are updated, but if I were to use the directio analogy,
 > would this be correct? 

As pointed out by Anton, that's a no here.
COW ensures that ZFS is always consistent, but it's not
really related to application consistency (that's the job of
O_DSYNC, fsync)...

So ZFS caches data on writes like most filesystems.

 > 
 > If that is the case, then is it true that ZFS really does not use a
 > write cache at all? And if it does, then how is it used? 
 > 

You write to the cache, and every 5 seconds all the dirty data
is shipped to disk in a transaction group. On low memory we
also will not wait for the 5-second clock and will issue a
txg earlier.

The problem you and many others face is lack of write throttling.
This is being worked on and should, I hope, be fixed soon.
The perception that ZFS is RAM hungry will have to be
reevaluated at that time.
See:
6429205 each zpool needs to monitor its throughput and throttle heavy 
writers


 > Read Cache.
 > 
 > Any of us that have started using or benchmakring ZFS, have seen its
 > voracious appetite for memory, an appetite that is fully shared with
 > VxFS for example, as I am not singling out ZFS (I'm rather a fan).  On
 > reboot of my T2000 test server (32GB Ram) I see that the arc cache max
 > size is set to 30.88GB - a sizeable piece of memory.  
 > 
 > Now, is all that cache space only for read cache? (given my assumption
 > regarding write cache) 
 > 
 > Tuneable Parameters:
 > I know that the philosophy of ZFS is that you should never have to
 > tune your file system, but  might I suggest, that tuning the FS is not
 > always a bad thing. You can't expect a FS to be all things for all
 > people. If there are variables that can be modified to provide
 > different performance characteristics and profiles, then I would
 > contend that it could strengthen ZFS and lead to wider adoption and
 > acceptance if you could, for example, limit the amount of memory used
 > by items like the cache without messing with c_max / c_min directly in
 > the kernel. 
 > 

Once we have write throttling, we will be better equipped to 
see whether the ARC's dynamic adjustment works or not. I believe 
most problems will go away and there will be less demand for 
such a tunable...

On to your next mail...


 > -Tony
 >  
 >  
 > This message posted from opensolaris.org
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bottlenecks in building a system

2007-04-20 Thread Adam Lindsay

Richard Elling wrote:

>> Does anyone have a clue as to where the bottlenecks are going to be
>> with this:
>>
>> 16x hot swap SATAII hard drives (plus an internal boot drive)
>
> Be sure to check the actual bandwidth of the drives when installed in the
> final location.  We have been doing some studies on the impact of vibration
> on performance and reliability.  If your enclosure does not dampen
> vibrations, then you should see reduced performance, and it will be obvious
> for streaming workloads.  There was a thread about this a year or so ago
> regarding thumpers, but since then we've seen it in a number of other
> systems, too.  There have also been industry papers on this topic.


Okay, we have a number of the chassis installed here from the same 
source, but none seem to share the high-throughput workflow, so that's 
one thing to quiz the integrator on.



>> Tyan S2895 (K8WE) motherboard
>> Dual GigE (integral nVidia ports)
>
> All I can add to the existing NIC comments in this thread is that Neptune
> kicks ass.  The GbE version is:
>    http://www.sun.com/products/networking/ethernet/sunx8quadgigethernet/index.xml
>
> ... but know that I don't set pricing :-0


Oh, man, I didn't need to know about that NIC. Actually, it's something 
to shoot for.



>> 2x Areca 8-port PCIe (8-lane) RAID controllers
>
> I think this is overkill.


I'm getting convinced of that. With the additional comments in this 
thread, I'm now seriously considering replacing these PCIe cards with 
Supermicro's PCI-X cards, and switching over to a different Tyan board...


- 2x SuperMicro AOC-SAT2-MV8 PCI-X SATA2 interfaces
- Tyan S2892 (K8SE) motherboard, so that ditches nvidia for:
- Dual GigE (integral Broadcom ports)


>> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
>
> This should be a good choice.  For high networking loads, you can burn a lot
> of cycles handling the NICs.  For example, using Opterons to drive the dual
> 10GbE version of Neptune can pretty much consume a significant number of
> cores.  I don't think your workload will come close to this, however.


No, but it's something to shoot for. :)


>> 8 GiB RAM
>
> I recommend ECC memory, not the cheap stuff... but I'm a RAS guy.


So noted.


> Pretty much any SAS/SATA controller will work ok.  You'll be media speed
> bound, not I/O channel bound.


Okay, that message is coming through.


> RAM as a cache presumes two things: prefetching and data re-use.  Most
> likely, you won't have re-use and prefetching only makes sense when the
> disk subsystem is approximately the same speed as the network.  Personally,
> I'd start at 2-4 GBytes and expand as needed (this is easily measured)


I'll start with 4GBytes, because I like to deploy services in 
containers, and so will need some elbow room.


Many thanks to all in this thread: my spec has certainly evolved, and I 
hope the machine has gotten cheaper in the process, with little 
sacrifice in theoretical performance.


adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive question

2007-04-20 Thread Constantin Gonzalez
Hi,

Krzys wrote:
> Ok, so -F option is not in U3, is there any way to replicate file system
> and not be able to mount it automatically? so when I do zfs send/receive
> it wont be mounted and changes would not be made so that further
> replications could be possible? What I did notice was that if I am doing
> zfs send/receive right one after another I am able to replicate all my
> snaps, but when I wait a day or even few hours I get notice that file
> system got changed, and that is because it was mounted and I guess
> because of that I am not able to perform any more snaps to be send...
> any idea what I could do meanwhile I am waiting for -F?

this should work:

  zfs unmount pool/filesystem
  zfs rollback (latest snapshot)
  zfs send ... | zfs receive
  zfs mount pool/filesystem
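
For example, with made-up names, assuming the receiving side is
backuppool/backup and its newest snapshot is @2007-03:

  zfs unmount backuppool/backup
  zfs rollback backuppool/backup@2007-03
  zfs send -i mypool/data@2007-03 mypool/data@2007-04 | \
      zfs receive backuppool/backup
  zfs mount backuppool/backup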

Better yet: Assuming you don't actually want to use the filesystem you
replicate to, but just use it as a sink for backup purposes, you can mark
it unmountable, then just send stuff to it.

  zfs set canmount=off pool/filesystem
  zfs rollback (latest snapshot, one last time)

Then, whenever you want to access the receiving filesystem, clone it.

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread eric kustarz


On Apr 20, 2007, at 10:47 AM, Anton B. Rang wrote:


> ZFS uses caching heavily as well; much more so, in fact, than UFS.
>
> Copy-on-write and direct i/o are not related. As you say, data gets
> written first, then the metadata which points to it, but this isn't
> anything like direct I/O. In particular, direct I/O avoids caching
> the data, instead transferring it directly to/from user buffers,
> while ZFS-style copy-on-write caches all data. ZFS does not have
> direct I/O at all right now.


Your context is correct, but I'd be careful with "direct I/O", as I
think it's an overloaded term; most people don't understand what
it does - just that it got them good performance (somehow).  Roch has
a blog on this:

http://blogs.sun.com/roch/entry/zfs_and_directio

But you are correct that ZFS does not have the ability for the user  
to say "don't cache user data for this filesystem" (which is one part  
of direct I/O).


I've talked to some database people and they aren't convinced having  
this feature would be a win.  So if someone has a real world workload  
where having the ability to purposely not cache user data would be a  
win, please let me know.


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Bottlenecks in building a system

2007-04-20 Thread Adam Lindsay

Anton B. Rang wrote:

> If you're using this for multimedia, do some serious testing first. ZFS tends
> to have "bursty" write behaviour, and the worst-case latency can be measured
> in seconds. This has been improved a bit in recent builds but it still seems
> to "stall" periodically.


I had wondered about that, after reading some old threads. For the 
high-performance stuff, the machine is mostly to be marked as 
experimental and will spend most of its time being "tested".


I'm watching Tony Galway's current thread most closely, as well.

adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re[2]: [nfs-discuss] NFSd and dtrace

2007-04-20 Thread Robert Milkowski
Hello eric,

Friday, April 20, 2007, 4:01:46 PM, you wrote:

ek> On Apr 18, 2007, at 9:33 PM, Robert Milkowski wrote:

>> Hello Robert,
>>
>> Thursday, April 19, 2007, 1:57:38 AM, you wrote:
>>
>> RM> Hello nfs-discuss,
>>
>> RM>   Does anyone have a dtrace script (or any other means) to  
>> track which
>> RM>   files are open/read/write (ops and bytes) by nfsd? To make  
>> things
>> RM>   little bit harder lets assume that local storage in on zfs, nfsd
>> RM>   server using nfsv3 and system is S10U3.
>>
>> RM>   The script would distinguish between cache read and disk read.
>>
>> RM>   So something like:
>>
>> RM>   ./nfsd_file.d
>>
>> RM>   CLIENT_IP   OPERATION   BYTES   TYPE   FILE
>> RM>   X.X.X.X   READ3 32768   logical/nfs/d1000/fileA
>>
>> RM>   ...
>>
>>
>>
>> RM>   and something like:
>>
>> RM>   ./nfsd_file_summ.d 100s
>>
>> RM>   CLIENT_IP   OPERATION   OPSBYTES   TYPE   FILE
>> RM>   X.X.X.X   READ3 230 5MBlogical/nfs/d1000/ 
>> fileA
>> RM>   X.X.X.X   READ3  15 1MBphysical   /nfs/d1000/ 
>> fileA
>>
>> RM>   ...
>>
>> RM>
>> RM>
>> RM>
>>
>>
>> Looks like vopstat and rfileio from DTrace toolkit is what I'm looking
>> for (with some modifications)
>>

ek> very cool, would you mind posting your dscript when you get it working?

ek> eric

Those scripts are from the DTraceToolkit!
I've just made some simple modifications, like a parameterized frequency,
a total summary, ...

As I can see, Brendan hooks into the proper VOP operations.

The question, however, is why, if I want to use the dtrace io provider with
zfs + nfsd, I don't get file names from args[2]->fi_pathname?

Perhaps fsinfo::: could help but it's not on current s10 - I hope it
will be in U4 as it looks that it works with zfs (without manually
looking into vnodes, etc.):

bash-3.00# dtrace -n fsinfo::fop_read:read'{trace(args[0]->fi_pathname);trace(arg1);}' | grep -v unknown
dtrace: description 'fsinfo::fop_read:read' matched 1 probe
CPU     ID              FUNCTION:NAME
  0  65495              fop_read:read   /usr/bin/cat                         8
  0  65495              fop_read:read   /usr/bin/cat                        52
  0  65495              fop_read:read   /usr/bin/cat                       224
  0  65495              fop_read:read   /usr/bin/cat                        17
  0  65495              fop_read:read   /lib/ld.so.1                        52
  0  65495              fop_read:read   /lib/ld.so.1                       160
  0  65495              fop_read:read   /home/milek/hs_err_pid23665.log   8192
  0  65495              fop_read:read   /home/milek/hs_err_pid23665.log   3777
  0  65495              fop_read:read   /home/milek/hs_err_pid23665.log      0
^C
bash-3.00#


/home is on zfs.

Looks like fsinfo::: should work properly with nfsd + zfs!
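
(A minimal D sketch of the kind of thing being discussed - not the actual
DTraceToolkit-derived script - assuming the fsinfo provider is available and
that the NFS service threads show up under execname "nfsd":)

   #!/usr/sbin/dtrace -s
   /* Sum bytes read per file, restricted to nfsd threads. */
   fsinfo:::read
   /execname == "nfsd"/
   {
           @bytes[args[0]->fi_pathname] = sum(arg1);
   }

   END
   {
           printa("%-60s %@d bytes\n", @bytes);
   }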


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bottlenecks in building a system

2007-04-20 Thread Adam Lindsay
Hi, hope you don't mind if I make some portions of your email public in 
a reply--I hadn't seen it come through on the list at all, so it's no 
duplicate to me.


Johansen wrote:
> Adam:
>
> Sorry if this is a duplicate, I had issues sending e-mail this morning.
>
> Based upon your CPU choices, I think you shouldn't have a problem
> saturating a GigE link with a pair of Operton 275's.  Just as a point of
> comparison, Sun sells a server with 48 SATA disks and 4 GigE ports:
>
> http://www.sun.com/servers/x64/x4500/specs.xml
>
> You have fewer disks, and nearly as much CPU power as the x4500.  I
> think you have plenty of CPU in your system.
>
> Your RAID controllers have as many SATA ports as the SATA cards in the
> x4500, and you seem to have the same ratio of disks to controllers.

I'm well aware of the Thumper, and it's fair to say it was an 
inspiration, just without two-thirds of the capacity or any of the 
serious redundancy. I also used the X4500 as a guide for


> I suspect that if you have a bottleneck in your system, it would be due
> to the available bandwidth on the PCI bus.

Mm. yeah, it's what I was worried about, too (mostly through ignorance 
of the issues), which is why I was hoping HyperTransport and PCIe were 
going to give that data enough room on the bus.
But after others expressed the opinion that the Areca PCIe cards were 
overkill, I'm now looking to putting some PCI-X cards on a different 
(probably slower) motherboard.


> Caching isn't going to be a huge help for writes, unless there's another
> thread reading simultaneoulsy from the same file.
>
> Prefetch will definitely use the additional RAM to try to boost the
> performance of sequential reads.  However, in the interest of full
> disclosure, there is a pathology that we've seen where the number of
> sequential readers exceeds the available space in the cache.  In this
> situation, sometimes the competeing prefetches for the different streams
> will cause more temporally favorable data to be evicted from the cache
> and performance will drop.  The workaround right now is just to disable
> prefetch.  We're looking into more comprehensive solutions.

Interesting. So noted. I will expect to have to test thoroughly.

>> I understand I'm not going to get terribly far in thought experiment
>> mode, but I want to be able to spec a box that balances cheap with
>> utility over time.
>
> If that's the case, I'm sure you could get by just fine with the pair of
> 275's.

Thanks,
adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Roch - PAE
Tony Galway writes:

 > Anton & Roch,
 > 
 > Thank you for helping me understand this. I didn't want
to make too many assumptions that were unfounded and then
incorrectly relay that information back to clients. 
 > 
 > So if I might just repeat your statements, so my slow mind is sure it 
 > understands, and Roch, yes your assumption is correct that I am referencing 
 > File System Cache, not disk cache.
 > 
 > A. Copy-on-write exists solely to ensure on disk data
integrity, and as Anton pointed out it is completely
different than DirectIO. 

I would say 'ensure pool integrity' but you get the idea.
 > 
 > b. ZFS still avail's itself of a file system cache, and
therefore, it is possible that data can be lost if it hasn't
been written to disk and the server fails.

Yep.

 > 
 > c. The write throttling issue is known, and being looked
at - when it is fixed we don't know?  I'll add myself to the
notification list as an interested party :)

Yep.

 > 
 > Now to another question related to Anton's post. You mention that directIO 
 > does not exist in ZFS at this point. Are their plan's to support DirectIO; 
 > any functionality that will simulate directIO or some other non-caching 
 > ability suitable for critical systems such as databases if the client still 
 > wanted to deploy on filesystems.
 >  

here Anton and I disagree on this. I believe that ZFS
design would not gain much performance from something we'd call
directio. See:

http://blogs.sun.com/roch/entry/zfs_and_directio

-r

 >  
 > This message posted from opensolaris.org
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] problem mounting one of the zfs file system during boot

2007-04-20 Thread Krzys


hello everyone, I have strange issue and I am not sure why is this happening.

syncing file systems... done
rebooting...

SC Alert: Host System has Reset
Probing system devices
Probing memory
Probing I/O buses

Sun Fire V240, No Keyboard
Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.19, 8192 MB memory installed, Serial #65031515.
Ethernet address 0:3:ba:e0:4d:5b, Host ID: 83e04d5b.



Rebooting with command: boot
Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  
File and args:
SunOS Release 5.10 Version Generic_125100-05 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hardware watchdog enabled
Hostname: chrysek
/kernel/drv/sparcv9/zpool symbol avl_add multiply defined
/kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
mypool2/d3 uncorrectable error
checking ufs filesystems
/dev/rdsk/c1t0d0s7: is logging.

When my system is booting it does complain about mypool2/d3
mypool2/d3 uncorrectable error

but when system boots and I do
[11:31:01] [EMAIL PROTECTED]: /root > mount /d/d3
[11:31:06] [EMAIL PROTECTED]: /root > df -k /d/d3
Filesystem   1k-blocks  Used Available Use% Mounted on
mypool2/d3   648755898 179354764 469401134  28% /d/d3

not a problem
no errors, no complaints - the manual mount works just fine, while the zfs 
mount at boot does not.


this is the entry in vfstab that I have:
mypool2/d3  mypool2/d3  /d/d3   zfs 2   yes logging

Is there anything wrong with what I am doing? As I said, the manual mount 
works just fine, but during boot it complains about mounting it.


[11:35:08] [EMAIL PROTECTED]: /root > zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
mypool 272G  2.12G  24.5K  /mypool
mypool/d   271G  2.12G   143G  /d/d2
mypool/[EMAIL PROTECTED]  3.72G  -   123G  -
mypool/[EMAIL PROTECTED]  22.3G  -   156G  -
mypool/[EMAIL PROTECTED]  23.3G  -   161G  -
mypool/[EMAIL PROTECTED]  16.1G  -   172G  -
mypool/[EMAIL PROTECTED]  13.8G  -   168G  -
mypool/[EMAIL PROTECTED]  15.7G  -   168G  -
mypool2489G   448G52K  /mypool2
mypool2/d3 171G   448G   171G  legacy

Regards,

Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive question

2007-04-20 Thread Krzys
It does not work. I did try to remove every snap, and I ended up destroying 
that pool altogether and had to resend it all. My goal is to use zfs 
send/receive for backup purposes to a big storage system that I have, and keep 
snaps. I don't care if the file system is mounted or not, but I want the 
ability every month to send changes to it and update it with current 
incremental snaps... but because the target file system is being mounted, even 
though I don't go there, something on it changes - like access times - and 
that prevents me from sending incremental zfs snaps... :( So that -F option 
will work, but it's a few months away from what I understand... and I would 
like to just do zfs send/receive now and keep updating it monthly or even daily.


Regards,

Chris


On Fri, 20 Apr 2007, Constantin Gonzalez wrote:


Hi,

Krzys wrote:

Ok, so -F option is not in U3, is there any way to replicate file system
and not be able to mount it automatically? so when I do zfs send/receive
it wont be mounted and changes would not be made so that further
replications could be possible? What I did notice was that if I am doing
zfs send/receive right one after another I am able to replicate all my
snaps, but when I wait a day or even few hours I get notice that file
system got changed, and that is because it was mounted and I guess
because of that I am not able to perform any more snaps to be send...
any idea what I could do meanwhile I am waiting for -F?


this should work:

 zfs unmount pool/filesystem
 zfs rollback (latest snapshot)
 zfs send ... | zfs receive
 zfs mount pool/filesystem

Better yet: Assuming you don't actually want to use the filesystem you
replicate to, but just use it as a sink for backup purposes, you can mark
it unmountable, then just send stuff to it.

 zfs set canmount=off pool/filesystem
 zfs rollback (latest snapshot, one last time)

Then, whenever you want to access the receiving filesystem, clone it.

Hope this helps,
  Constantin

--
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Roch - PAE
Tony Galway writes:
 > Let me elaborate slightly on the reason I ask these questions.
 > 
 > I am performing some simple benchmarking, and during this a file is
 > created by sequentially writing 64k blocks until the 100Gb file is
 > created. I am seeing, and this is the exact same as VxFS, large pauses
 > while the system reclaims the memory that it has consumed. 
 > 
 > I assume that since ZFS (back to the write cache question) is
 > copy-on-write and is not write caching anything (correct me if I am
 > wrong), it is instead using memory for my read-cache. Also, since I
 > have 32Gb of memory the reclaim periods are quite long while it frees
 > this memory - basically rendering my volume unusable until that memory
 > is reclaimed. 
 > 
 > With VxFS I was able to tune the file system with write_throttle, and
 > this allowed me to find a balance basically whereby the system writes
 > crazy fast, and then reclaims memory, and repeats that cycle. 
 > 
 > I guess I could modify c_max in the kernel, to provide the same type
 > of result, but this is not a supported tuning practice - and thus I do
 > not want to do that. 
 > 
 > I am simply trying to determine where ZFS is different, the same, and
 > where how I can modify its default behaviours (or if I ever will). 
 > 
 > Also, FYI, I'm testing on Solaris 10 11/06 (All testing must be
 > performed in production versions of Solaris) but if there are changes
 > in Nevada that will show me different results, I would be interested
 > in those as an aside. 
 >  

Today, a txg sync can take a very long time for this type of 
workload. A first goal of write throttling will be to at
least bound the sync times. The amount of dirty memory (not quickly 
reclaimable) will then be limited and ARC should be much
better at adjusting itself. A second goal will be to keep
sync times close to 5 seconds further limiting the RAM
consumption.



 >  
 > This message posted from opensolaris.org
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive question

2007-04-20 Thread Krzys
Ok, so the -F option is not in U3. Is there any way to replicate a file system 
and not have it mounted automatically? So that when I do zfs send/receive it 
won't be mounted, no changes will be made, and further replications stay 
possible? What I did notice is that if I run zfs send/receive right one after 
another I am able to replicate all my snapshots, but when I wait a day or even a 
few hours I get a notice that the file system got changed, and that is because it 
was mounted, and I guess because of that I am not able to send any more 
snapshots... any idea what I could do while I am waiting for -F?


Thank you.

Chris


On Tue, 17 Apr 2007, Nicholas Lee wrote:


On 4/17/07, Krzys <[EMAIL PROTECTED]> wrote:



and when I did try to run that last command I got the following error:
[16:26:00] [EMAIL PROTECTED]: /root > zfs send -i mypool/[EMAIL PROTECTED]
mypool/[EMAIL PROTECTED] |
zfs receive mypool2/[EMAIL PROTECTED]
cannot receive: destination has been modified since most recent snapshot

is there any way to do such replication by zfs send/receive and avoid such
error message? Is there any way to force the file system not to be mounted? Is
there any way to make it maybe a read-only partition and then, when it's
needed, make it live or whatever?



Check the -F option to zfs receive. This automatically rolls back the
target.
Nicholas




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re[2]: [nfs-discuss] NFSd and dtrace

2007-04-20 Thread Robert Milkowski
Hello Robert,

Friday, April 20, 2007, 4:54:33 PM, you wrote:

RM> Perhaps fsinfo::: could help but it's not on current s10 - I hope it
RM> will be in U4, as it looks like it works with zfs (without manually
RM> looking into vnodes, etc.):


Well, it's already in s10! (122641)
I missed that... :)
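For anyone who wants to try it, a quick sketch (assuming a kernel that has the
fsinfo provider, e.g. the patch above or a recent Nevada build; the aggregation
keys are just an example):

 dtrace -qn 'fsinfo:::read, fsinfo:::write
   /args[0]->fi_fs == "zfs"/
   { @[probename, args[0]->fi_pathname] = count(); }'

Let it run for a while, hit Ctrl-C, and you get per-file read/write counts
without digging into vnodes by hand.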

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Experience with Promise Tech. arrays/jbod's?

2007-04-20 Thread Marion Hakanson
Thanks to all for the helpful comments and questions.


[EMAIL PROTECTED] said:
> Isn't MPXIO support determined by the HBA and hard drive identification (not by
> the enclosure)?  At least I don't see how the enclosure should matter, as long as
> it has 2 active paths.  So if you add the drive vendor info into /kernel/drv/
> scsi_vhci.conf it should work. 

If the enclosure is JBOD, then yes, the drives would be the targets of MPXIO.
But for a RAID enclosure, it's the RAID controller which speaks SCSI, adds and
removes LUN's, etc.  The three different arrays I've used have all had settings
where you specify what kind of alternate-path "protocol" to speak to the
various hosts involved.


[EMAIL PROTECTED] said:
> In a so called symmetric mode it should work as you described. But many entry
> level and midsize arrays aren't actually symmetric and they have to be
> treated specifically. 

This matches my limited experience.  What Sun calls "asymmetric" seems to
match what some array vendors call "active/active with LUN affinity" (or
"LUN ownership").  MPXIO "knows" about such asymmetric arrays, but some
arrays don't speak the right protocol (T10 ALUA), and there's so far no
way to manually tell MPXIO to do the asymmetric thing with them.

For example, our low-end HDS array looks to MPXIO as if it's symmetric, since
both controllers show their configured LUN's all the time.  But only one
controller can do I/O to a given LUN at one time, and the array takes a long
while to swap ownership between controllers, so MPXIO's default round-robin
load balancing yields terrible performance.  The workaround is to manually
set load-balancing to "none" and hope MPXIO uses the controller that you
were wanting to be primary.
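For reference, that workaround lives in /kernel/drv/scsi_vhci.conf and looks
roughly like this (a sketch only -- the vendor/product string is a placeholder
that must match your array's SCSI inquiry data, and a reconfiguration reboot is
needed afterwards):

 # turn off round-robin globally
 load-balance="none";

 # optionally, force scsi_vhci to treat a non-ALUA array as symmetric
 device-type-scsi-options-list =
         "VENDOR  PRODUCT", "symmetric-option";
 symmetric-option = 0x1000000;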

And some people wonder why I prefer NAS over SAN...(:-).

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Richard Elling

Tim Thomas wrote:
I don't know enough about how ZFS manages memory other than what I have 
seen on this alias (I just joined a couple of weeks ago) which seems to 
indicate it is a  memory hog...as is VxFS so we are in good company. I 
am not against keeping data in memory so long as it has also been 
written to somewhere non-volatile as well so that data is not lost if 
the lights go out... and applications don't fight for memory to run. I 
recall stories from years ago where VxFS hogged so much memory on a Sun 
Cluster node that the Cluster services stalled and the cluster failed over!


Even after many years, I can still get mileage from this one :-)
http://www.sun.com/blueprints/0400/ram-vxfs.pdf

ZFS behaves differently, however, so the symptoms and prescriptions are
slightly different.

I need to go read some white papers on this... but I assume that 
something like direct I/O (which UFS, VxFS and QFS all have) is in the 
plans for ZFS, so we don't end up double-buffering data for apps like 
databases? That is just ugly.


Before you get very far down this path: it gets regularly rehashed here,
and Roch Bourbonnais and Bob Sneed have written some good blogs on the topic.
Especially:
http://blogs.sun.com/bobs/entry/one_i_o_two_i
http://blogs.sun.com/roch/entry/zfs_and_directio

 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanently removing vdevs from a pool

2007-04-20 Thread Matty

On 4/20/07, George Wilson <[EMAIL PROTECTED]> wrote:

This is a high priority for us and is actively being worked.

Vague enough for you. :-) Sorry I can't give you anything more exact
than that.


Hi George,

If ZFS is supposed to be part of "open"solaris, then why can't the
community get additional details? It really seems like much of the
development and design of ZFS goes on behind closed doors, and the
community as a whole is involved after the fact (Eric Schrock has
requested feedback from list members, which is awesome!). This makes
it difficult for folks to contribute, and to offer suggestions (or
code) that would improve ZFS as a whole. Is there a reason that more of
the ZFS development discussions aren't occurring in public?

Thanks,
- Ryan
--
UNIX Administrator
http://prefetch.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: zfs boot image conversion kit is posted

2007-04-20 Thread MC
> Now the original question by MC I believe was about providing
VMware and/or Xen image with guest OS being snv_62 with / as zfs.

This is true.

I'm not sure what Jim meant about the host system needing to support zfs.  
Maybe you're on a different page, Jim :)

> I will setup a VM image that can be downloaded (I hope to get it done
tomorrow, but if not definitely by early next week) and played with
by anyone who is interested.

That would be golden, Brian.  Let me know if you can't get suitable hosting for 
it!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Help me understand ZFS caching

2007-04-20 Thread Anton B. Rang
> So if someone has a real world workload where having the ability to purposely 
> not cache user
> data would be a win, please let me know.

Multimedia streaming is an obvious one.

For databases, it depends on the application, but in general the database will 
do a better job of selecting which data to keep in memory than the file system 
can. (Of course, some low-end databases rely on the file system for this.)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Andy Lubel

We are having a really tough time accepting the performance of the ZFS and NFS 
interaction.  I have tried so many different ways to make it work (even setting 
zfs:zil_disable = 1) and I'm still nowhere near the performance of a standard 
NFS-mounted UFS filesystem - insanely slow, especially on file 
rewrites.

We have been combing the message boards, and it looks like there was a lot of 
talk about this zfs+nfs interaction back in November and before, but since then I 
have not seen much.  It seems the only fix up to that date was to disable the 
ZIL; is that still the case?  Did anyone ever get closure on this?

We are running Solaris 10 (SPARC), latest patched 11/06 release, connected 
directly via FC to a 6120 with 2 RAID-5 volumes, serving NFS over a bge interface 
(gigabit).  I tried raidz, mirror and stripe with no noticeable difference in 
speed.  The clients connecting to this machine are HP-UX 11i and OS X 10.4.9, and 
they both show the same performance characteristics.

Any insight would be appreciated - we really like zfs compared to any 
filesystem we have EVER worked on and don't want to revert if at all possible!


TIA,

Andy Lubel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: zfs boot image conversion kit is posted

2007-04-20 Thread Brian Hechinger
On Fri, Apr 20, 2007 at 12:25:30PM -0700, MC wrote:
> 
> > I will setup a VM image that can be downloaded (I hope to get it done
> tomorrow, but if not definitely by early next week) and played with
> by anyone who is interested.
> 
> That would be golden, Brian.  Let me know if you can't get suitable hosting 
> for it!

I have somewhere I can put it, thanks for the offer though.  :)

I'm not going to get it done today.  What I will do, however, is upload
the tarball of the patched b62 DVD image (once it's done compressing)
for anyone who wants to snag it.  I'll probably turn it back into an
ISO come Monday as well.

Probably a little later today.  I'll let you all know when it's up.

-brian
-- 
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly."  -- Jonathan 
Patschke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Anton B. Rang
> You need exactly the same bandwidth as with any other
> classical backup solution - it doesn't matter how at the end you need
> to copy all those data (differential) out of the box regardless if it's
> a tape or a disk.

Sure.  However, it's somewhat cheaper to buy 100 MB/sec of local-attached tape 
than 100 MB/sec of long-distance networking.  (The pedant in me points out that 
you also need to move the tape to the remote site, which isn't entirely 
free)
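A rough back-of-the-envelope, using the 1 TB/night figure from earlier in the
thread and assuming an 8-hour backup window:

 1 TB / 8 h  =  10^12 bytes / 28,800 s  ~=  35 MB/s sustained
             ~=  280 Mbit/s of WAN capacity, versus roughly a third of a
                 single T10000/LTO-4 class drive at ~120 MB/s native

Either way the hardware can keep up; the difference is what that sustained
capacity costs to provision.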

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Frank Cusack

On April 20, 2007 9:54:07 AM +0100 Tim Thomas <[EMAIL PROTECTED]> wrote:

My initial reaction is that the world has got by without file systems
that can do [end-to-end data integrity] for a long time...so I don't see
the absence of this as a big deal.


How about

My initial reaction is that the world has got by without [email|cellphone|
other technology] for a long time ... so not a big deal.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bottlenecks in building a system

2007-04-20 Thread johansen-osdev
Adam:

> Hi, hope you don't mind if I make some portions of your email public in 
> a reply--I hadn't seen it come through on the list at all, so it's no 
> duplicate to me.

I don't mind at all.  I had hoped to avoid sending the list a duplicate
e-mail, although it looks like my first post never made it here.

> > I suspect that if you have a bottleneck in your system, it would be due
> > to the available bandwidth on the PCI bus.
> 
> Mm. yeah, it's what I was worried about, too (mostly through ignorance 
> of the issues), which is why I was hoping HyperTransport and PCIe were 
> going to give that data enough room on the bus.
> But after others expressed the opinion that the Areca PCIe cards were 
> overkill, I'm now looking to putting some PCI-X cards on a different 
> (probably slower) motherboard.

I dug up a copy of the S2895 block diagram and asked Bill Moore about
it.  He said that you should be able to get about 700 MB/s off of each of
the PCI-X channels and that you only need about 100 MB/s to saturate a GigE
link.  He also observed that the RAID card you were using was
unnecessary and would probably hamper performance.  He recommended
non-RAID SATA cards based upon the Marvell chipset.

Here's the e-mail trail on this list where he discusses Marvell SATA
cards in a bit more detail:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

It sounds like if getting disk -> network is the concern, you'll have
plenty of bandwidth, assuming you have a reasonable controller card.

> > Caching isn't going to be a huge help for writes, unless there's another
> > thread reading simultaneously from the same file.
> >
> > Prefetch will definitely use the additional RAM to try to boost the
> > performance of sequential reads.  However, in the interest of full
> > disclosure, there is a pathology that we've seen where the number of
> > sequential readers exceeds the available space in the cache.  In this
> > situation, sometimes the competing prefetches for the different streams
> > will cause more temporally favorable data to be evicted from the cache
> > and performance will drop.  The workaround right now is just to disable
> > prefetch.  We're looking into more comprehensive solutions.
> 
> Interesting. So noted. I will expect to have to test thoroughly.

If you run across this problem and are willing to let me debug on your
system, shoot me an e-mail.  We've only seen this in a couple of
situations and it was combined with another problem where we were seeing
excessive overhead for kcopyout.  It's unlikely, but possible that you'll
hit this.
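For reference, the prefetch workaround mentioned above is just a kernel
tunable (unsupported, test-only; a sketch):

 # /etc/system, then reboot
 set zfs:zfs_prefetch_disable = 1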

-K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread johansen-osdev
Tony:

> Now to another question related to Anton's post. You mention that
> directIO does not exist in ZFS at this point. Are their plan's to
> support DirectIO; any functionality that will simulate directIO or
> some other non-caching ability suitable for critical systems such as
> databases if the client still wanted to deploy on filesystems.

I would describe DirectIO as the ability to map the application's
buffers directly for disk DMAs.  You need to disable the filesystem's
cache to do this correctly.  Having the cache disabled is an
implementation requirement for this feature.

Based upon this definition, are you seeking the ability to disable the
filesystem's cache or the ability to directly map application buffers
for DMA?

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Permanently removing vdevs from a pool

2007-04-20 Thread Mario Goebbels
Knowing that this is a planned feature and the ZFS team is actively working on 
it answers my question more than expected. Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: zfs boot image conversion kit is posted

2007-04-20 Thread MC
Good deal.  We'll have a race to build a a vm image, then :)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Bill Moore
When you say rewrites, can you give more detail?  For example, are you
rewriting in 8K chunks, random sizes, etc?  The reason I ask is because
ZFS will, by default, use 128K blocks for large files.  If you then
rewrite a small chunk at a time, ZFS is forced to read 128K, modify the
small chunk you're changing, and then write 128K.  Obviously, this has
adverse effects on performance.  :)  If your typical workload has a
preferred block size that it uses, you might try setting the recordsize
property in ZFS to match - that should help.

If you're completely rewriting the file, then I can't imagine why it
would be slow.  The only thing I can think of is the forced sync that
NFS does on file close.  But if you set zil_disable in /etc/system
and reboot, you shouldn't see poor performance in that case.
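Concretely, something like this (a sketch only -- pool/fs is a placeholder, and
recordsize affects only files written after the change):

 # match an 8K application I/O size, if that's what the workload does
 zfs set recordsize=8k pool/fs

 # test-only, in /etc/system (weakens NFS sync guarantees); reboot afterwards
 set zfs:zil_disable = 1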

Other folks have had good success with NFS/ZFS performance (while others
have not).  If it's possible, could you characterize your workload in a
bit more detail?


--Bill

On Fri, Apr 20, 2007 at 04:07:44PM -0400, Andy Lubel wrote:
> 
> We are having a really tough time accepting the performance with ZFS
> and NFS interaction.  I have tried so many different ways trying to
> make it work (even zfs set:zil_disable 1) and I'm still no where near
> the performance of using a standard NFS mounted UFS filesystem -
> insanely slow; especially on file rewrites.
> 
> We have been combing the message boards and it looks like there was a
> lot of talk about this interaction of zfs+nfs back in november and
> before but since i have not seen much.  It seems the only fix up to
> that date was to disable zil, is that still the case?  Did anyone ever
> get closure on this?
> 
> We are running solaris 10 (SPARC) .latest patched 11/06 release
> connecting directly via FC to a 6120 with 2 raid 5 volumes over a bge
> interface (gigabit).  tried raidz, mirror and stripe with no
> negligible difference in speed.  the clients connecting to this
> machine are HP-UX 11i and OS X 10.4.9 and they both have corresponding
> performance characteristics.
> 
> Any insight would be appreciated - we really like zfs compared to any
> filesystem we have EVER worked on and dont want to revert if at all
> possible!
> 
> 
> TIA,
> 
> Andy Lubel
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> We have been combing the message boards and it looks like there was a lot of
> talk about this interaction of zfs+nfs back in november and before but since
> i have not seen much.  It seems the only fix up to that date was to disable
> zil, is that still the case?  Did anyone ever get closure on this? 

There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS
learns to do that itself.  See:
  http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Torrey McMahon

Marion Hakanson wrote:

[EMAIL PROTECTED] said:
  

We have been combing the message boards and it looks like there was a lot of
talk about this interaction of zfs+nfs back in november and before but since
i have not seen much.  It seems the only fix up to that date was to disable
zil, is that still the case?  Did anyone ever get closure on this? 



There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS
learns to do that itself.  See:
  http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html

  


The 6120 isn't the same as a 6130/6140/6540. The instructions 
referenced above won't work on a T3/T3+/6120/6320.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Andy Lubel

Yeah, I saw that post about the other arrays, but none for this EOL'd hunk of 
metal.  I have some 6130s, but hopefully by the time they are implemented we will 
have retired this NFS stuff and stepped into zvol iSCSI targets.

Thanks anyway... back to the drawing board on how to resolve this!

-Andy

-Original Message-
From: [EMAIL PROTECTED] on behalf of Torrey McMahon
Sent: Fri 4/20/2007 6:00 PM
To: Marion Hakanson
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
 
Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
>   
>> We have been combing the message boards and it looks like there was a lot of
>> talk about this interaction of zfs+nfs back in november and before but since
>> i have not seen much.  It seems the only fix up to that date was to disable
>> zil, is that still the case?  Did anyone ever get closure on this? 
>> 
>
> There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS
> learns to do that itself.  See:
>   http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html
>
>   

The 6120 isn't the same as a 6130/61340/6540. The instructions 
referenced above won't work on a T3/T3+/6120/6320

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Tim Thomas
My initial reaction is that the world has got by without 
[email|cellphone|
other technology] for a long time ... so not a big deal. 

Well, I did say I viewed it as an indefensible position :-)

Now shall we debate if the world is a better place because of cell 
phones :-P



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> The 6120 isn't the same as a 6130/61340/6540. The instructions  referenced
> above won't work on a T3/T3+/6120/6320 

Sigh.  I can't keep up (:-).  Thanks for the correction.

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Andy Lubel
I'm not sure about the workload, but I did configure the volumes with the block 
size in mind... it didn't seem to do much.  It could be because I'm basically 
layering ZFS RAID on top of HW RAID, and I just don't know the equation for 
picking a smarter blocksize.  It seems like if I have 2 arrays with a 64k stripe 
size striped together, then 128k would be ideal for my ZFS datasets, but again, 
my logic isn't infinite when it comes to this fun stuff ;)

The 6120 has 2 volumes, each with a 64k stripe size.  I then raidz'ed the 2 
volumes and tried both 64k and 128k recordsize; I do get a bit of a performance 
gain on rewrite at 128k.

These are dd tests by the way:


*this one is locally, and works just great.

bash-3.00# date ; uname -a 
Thu Apr 19 21:11:22 EDT 2007 
SunOS yuryaku 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-V210 
 ^---^

bash-3.00# df -k 
Filesystem            kbytes    used     avail capacity  Mounted on 
... 
se6120             697761792      26 666303904     1%    /pool/se6120 
se6120/rfs-v10      31457280 9710895  21746384    31%    /pool/se6120/rfs-v10

bash-3.00# time dd if=/dev/zero of=/pool/se6120/rfs-v10/rw-test-1.loo bs=8192 
count=131072 
131072+0 records in 
131072+0 records out 
real    0m13.783s    real    0m14.136s 
user    0m0.331s 
sys     0m9.947s


*this one is from a HP-UX 11i system mounted to the v210 listed above:

onyx:/rfs># date ; uname -a 
Thu Apr 19 21:15:02 EDT 2007 
HP-UX onyx B.11.11 U 9000/800 1196424606 unlimited-user license 
 ^^ 
onyx:/rfs># bdf 
Filesystem            kbytes    used    avail %used Mounted on 
... 
yuryaku.sol:/pool/se6120/rfs-v10 
   31457280 9710896 21746384   31% /rfs/v10

onyx:/rfs># time dd if=/dev/zero of=/rfs/v10/rw-test-2.loo bs=8192 count=131072 
131072+0 records in 
131072+0 records out

real    1m2.25s    real    0m29.02s    real    0m50.49s 
user    0m0.30s 
sys     0m8.16s

*my 6120 tidbits of interest:

6120 Release 3.2.6 Mon Feb  5 02:26:22 MST 2007 (xxx.xxx.xxx.xxx) 
Copyright (C) 1997-2006 Sun Microsystems, Inc.  All Rights Reserved. 
daikakuji:/:<1>vol mode 
volume    mounted    cache        mirror 
v1        yes        writebehind  off 
v2        yes        writebehind  off 

daikakuji:/:<5>vol list 
volume    capacity    raid  data      standby 
v1        340.851 GB  5     u1d01-06  u1d07 
v2        340.851 GB  5     u1d08-13  u1d14 
daikakuji:/:<6>sys list 
controller : 2.5 
blocksize  : 64k 
cache  : auto 
mirror : auto 
mp_support : none 
naca   : off 
rd_ahead   : off 
recon_rate : med 
sys memsize: 256 MBytes 
cache memsize  : 1024 MBytes 
fc_topology: auto 
fc_speed   : 2Gb 
disk_scrubber  : on 
ondg   : befit


Am I missing something?  As for the RW test, I will tinker some more and 
paste the results soonish.

Thanks in advance,

Andy Lubel

-Original Message-
From: Bill Moore [mailto:[EMAIL PROTECTED]
Sent: Fri 4/20/2007 5:13 PM
To: Andy Lubel
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
 
When you say rewrites, can you give more detail?  For example, are you
rewriting in 8K chunks, random sizes, etc?  The reason I ask is because
ZFS will, by default, use 128K blocks for large files.  If you then
rewrite a small chunk at a time, ZFS is forced to read 128K, modify the
small chunk you're changing, and then write 128K.  Obviously, this has
adverse effects on performance.  :)  If your typical workload has a
preferred block size that it uses, you might try setting the recordsize
property in ZFS to match - that should help.

If you're completely rewriting the file, then I can't imagine why it
would be slow.  The only thing I can think of is the forced sync that
NFS does on a file closed.  But if you set zil_disable in /etc/system
and reboot, you shouldn't see poor performance in that case.

Other folks have had good success with NFS/ZFS performance (while other
have not).  If it's possible, could you characterize your workload in a
bit more detail?


--Bill

On Fri, Apr 20, 2007 at 04:07:44PM -0400, Andy Lubel wrote:
> 
> We are having a really tough time accepting the performance with ZFS
> and NFS interaction.  I have tried so many different ways trying to
> make it work (even zfs set:zil_disable 1) and I'm still no where near
> the performance of using a standard NFS mounted UFS filesystem -
> insanely slow; especially on file rewrites.
> 
> We have been combing the message boards and it looks like there was a
> lot of talk about this interaction of zfs+nfs back in november and
> before but since i have not seen much.  It seems the only fix up to
> that date was to disable zil, is that still the case?  Did anyone ever
> get closure on this?
> 
> We are running solaris 10 (SPARC) .latest patched 11/06 release
> connecting directly via FC to a 6120 with 2 raid 5 volumes over a bge
> interface (gigabit).  tried rai

Re: [zfs-discuss] Permanently removing vdevs from a pool

2007-04-20 Thread Lori Alt

Matty wrote:

On 4/20/07, George Wilson <[EMAIL PROTECTED]> wrote:

This is a high priority for us and is actively being worked.

Vague enough for you. :-) Sorry I can't give you anything more exact
than that.


Hi George,

If ZFS is supposed to be part of "open"solaris, then why can't the
community get additional details? It really seems like much of the
development and design of ZFS goes on behind closed doors, and the
community as a whole is involved after the fact (Eric Shrock has
requested feedback from list members, which is awesome!). This makes
it difficult for folks to contribute, and to offer suggestions (or
code) that would better ZFS as a whole. Is there a reason that more of
the ZFS development discussions aren't occurring in public?


I can't speak for the zpool-shrink work, but I've been inspired (partly
by this post) to start collecting more input from the community regarding
the use of zfs as a root file system.  I've just started a blog
(http://blogs.sun.com/lalt/) and I'll be putting out posts
there and on this alias regarding some of the design issues that need
to get resolved.   So watch for it.

Lori

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Ian Collins
Anton B. Rang wrote:

>>You need exactly the same bandwidth as with any other
>>classical backup solution - it doesn't matter how at the end you need
>>to copy all those data (differential) out of the box regardless if it's
>>a tape or a disk.
>>
>>
>
>Sure.  However, it's somewhat cheaper to buy 100 MB/sec of local-attached tape 
>than 100 MB/sec of long-distance networking.  (The pedant in me points out 
>that you also need to move the tape to the remote site, which isn't entirely 
>free)
>
>  
>
But a tape in a van is a very high bandwidth connection :)

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Lyndon Nerenberg

But a tape in a van is a very high bandwidth connection :)


Australia used to get its usenet feed on FedExed 9-tracks.

--lyndon

  The two most common elements in the universe are Hydrogen and stupidity.
-- Harlan Ellison
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Preferred backup mechanism for ZFS?

2007-04-20 Thread Toby Thain


On 20-Apr-07, at 5:54 AM, Tim Thomas wrote:


Hi Wee


I  run a setup of SAM-FS for our main file server and we loved the
backup/restore parts that you described.

That is great to hear.


The main concerns I have with SAM fronting the entire conversation is
data integrity. Unlike ZFS, SAMFS does not do end to end  
checksumming.
My initial reaction is that the world has got by without file  
systems that can do this for a long time.


Indeed. Progress is one-way like that...


..so I don't see the absence of this as a big deal.


Except that it dilutes the ZFS promise to, "we can keep your data  
safe... unless we have to restore any of it from a backup." It's a  
big deal if you want the integrity promise to extend beyond the pool  
itself! Or so it seems to me.


--T



Rgds

Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bottlenecks in building a system

2007-04-20 Thread Adam Lindsay

[EMAIL PROTECTED] wrote:

I suspect that if you have a bottleneck in your system, it would be due
to the available bandwidth on the PCI bus.
Mm. yeah, it's what I was worried about, too (mostly through ignorance 
of the issues), which is why I was hoping HyperTransport and PCIe were 
going to give that data enough room on the bus.
But after others expressed the opinion that the Areca PCIe cards were 
overkill, I'm now looking to putting some PCI-X cards on a different 
(probably slower) motherboard.


I dug up a copy of the S2895 block diagram and asked Bill Moore about
it.  He said that you should be able to get about 700 MB/s off of each of
the PCI-X channels and that you only need about 100 MB/s to saturate a GigE
link.  He also observed that the RAID card you were using was
unnecessary and would probably hamper performance.  He recommended
non-RAID SATA cards based upon the Marvell chipset.

Here's the e-mail trail on this list where he discusses Marvell SATA
cards in a bit more detail:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

It sounds like if getting disk -> network is the concern, you'll have
plenty of bandwidth, assuming you have a reasonable controller card.


Well, if that isn't from the horse's mouth, I don't know what is.

Elsewhere in the thread, I mention that I'm trying to go for a simpler 
system (well, less dependent upon PCIe) in favour of the S2892, which 
has the added benefit of having a NIC that is less maligned in the 
community. From what I can tell of the block diagram, it looks like the 
PCI-X subsystem is similar enough (except that it's shared with the 
NIC). It's sounding like a safe compromise to me, to use the Marvell 
chips on the oft-cited SuperMicro cards.



Caching isn't going to be a huge help for writes, unless there's another
thread reading simultaneously from the same file.

Prefetch will definitely use the additional RAM to try to boost the
performance of sequential reads.  However, in the interest of full
disclosure, there is a pathology that we've seen where the number of
sequential readers exceeds the available space in the cache.  In this
situation, sometimes the competing prefetches for the different streams
will cause more temporally favorable data to be evicted from the cache
and performance will drop.  The workaround right now is just to disable
prefetch.  We're looking into more comprehensive solutions.

Interesting. So noted. I will expect to have to test thoroughly.


If you run across this problem and are willing to let me debug on your
system, shoot me an e-mail.  We've only seen this in a couple of
situations and it was combined with another problem where we were seeing
excessive overhead for kcopyout.  It's unlikely, but possible that you'll
hit this.


That's one heck of an offer. I'd have no problem with this, nor with 
taking requests for particular benchmarks from the community. It's 
essentially a research machine, and if it can help others out, I'm all 
for it.


Now time to check on the project budget... :)

thanks,
adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: solaris - ata over ethernet - zfs - HPC

2007-04-20 Thread Andrew Chace
A new driver was released recently that has support for ZFS; check the Coraid 
website for details.

We have a Coraid at work that we are testing and hope to (eventually) put on 
our production network. We're running Solaris 9, so I'm not sure how comparable 
our results are with your situation. Anyway, we have ours configured with 4 
RAID-5 volumes across 12 disks. The main reason that it's not being used in 
production yet is that we have yet to get anywhere close to their advertised 
throughput. We're getting around 30 MB/sec reads and around 23 MB/sec writes. 
The Coraid is attached via a cross-over cable to a V240. Last I knew, the 
Coraid development team was aware of throughput issues on Solaris, and was 
working to improve their drivers. We have not yet tested with jumbo frames; I 
would expect that that would improve things somewhat. 

-Andrew
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Brian Hechinger
On Sat, Apr 21, 2007 at 11:14:02AM +1200, Ian Collins wrote:
> 
> >Sure.  However, it's somewhat cheaper to buy 100 MB/sec of local-attached 
> >tape than 100 MB/sec of long-distance networking.  (The pedant in me points 
> >out that you also need to move the tape to the remote site, which isn't 
> >entirely free)
> >
> But a tape in a van is a very high bandwidth connection :)

What's the old quote?  "Never underestimate the bandwidth of a
stationwagon full of tapes."  ;)

-brian
-- 
"Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly."  -- Jonathan 
Patschke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: zfs boot image conversion kit is posted

2007-04-20 Thread Shawn Walker

Remember that Solaris Express can only be distributed by authorized parties.

On 20/04/07, MC <[EMAIL PROTECTED]> wrote:

> Now the original question by MC I belive was about providing
VMware and/or Xen image with guest OS being snv_62 with / as zfs.

This is true.

I'm not sure what Jim meant about the host system needing to support zfs.
Maybe you're on a different page, Jim :)

> I will setup a VM image that can be downloaded (I hope to get it done
tomorrow, but if not definitely by early next week) and played with
by anyone who is interested.

That would be golden, Brian.  Let me know if you can't get suitable hosting
for it!


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




--
"Less is only more where more is no good." --Frank Lloyd Wright

Shawn Walker, Software and Systems Analyst
[EMAIL PROTECTED] - http://binarycrusader.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on the desktop

2007-04-20 Thread Bill Sommerfeld
On Tue, 2007-04-17 at 17:25 -0500, Shawn Walker wrote:
> > I would think the average person would want
> >  to have access to 1000s of DVDs / CDs within
> >  a small box versus taking up the full wall.
> 
> This is already being done now, and most of the companies doing it are
> being sued like crazy :)

The legal entanglements seem to specifically be around hard-disk-based
DVD jukeboxes.  But it's not completely hopeless -- one of them recently
won a first round in court:

http://www.kaleidescape.com/company/pr/PR-20070329-DVDCCA.html


- Bill


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Generic "filesystem code" list/community for opensolaris ?

2007-04-20 Thread Rich Brown
> Hi,
> 
> 
> so far, discussing filesystem code via opensolaris
> means a certain 
> "specialization", in the sense that we do have:
> 
> zfs-discuss
> ufs-discuss
> fuse-discuss
> 
> Likewise, there are ZFS, NFS and UFS communities
> (though I can't quite 
> figure out if we have nfs-discuss ?).
> 
> What's not there is a generic "FS thingies not in
> either of these". I.e. a 
> forum with the purpose of talking filesystem code in
> general (how to port 
> a *BSD filesystem, for example), or to contribute and
> discuss community 
> filesystem patches or early-access code.
> 
> Internally, we've been having a fs-interest mailing
> list for such a 
> purpose for decades - why no generic "FS forum" on
> OpenSolaris.org ?
> 
> There's more filesystems in the world than just ZFS,
> NFS and UFS. We do 
> have the legacy stuff, but there's also SMB/CIFS,
> NTFS, Linux-things, etc. 
> etc. etc.; I think these alone will never be
> high-volume enough to warrant 
> communities or even discussion lists of their own,
> but combined there's 
> surely enough to fill one mailing list ?
> 
> Why _not_ have a
> "[EMAIL PROTECTED]", and a fs
> community 
> that deals with anything that's not [NUZ]FS ?
> 
> Thanks for some thoughts on this,
> FrankH.
> 
> ==
> ==
> No good can come from selling your freedom, not for
> all gold of the world,
> for the value of this heavenly gift exceeds that of
> any fortune on earth.
> ==
> ==
> ___
> ufs-discuss mailing list
> [EMAIL PROTECTED]
> 

Hi Frank,

I'm about to discuss/announce some changes that are coming into ONNV within the 
next few months.  My understanding is that ufs-discuss is the right place to 
talk about generic file system issues.

As I was scanning through old threads, I found this one.  Are there plans to 
create a "file system code" list/community?  Is the ufs-discuss alias still the 
right place at present to discuss VFS-level changes?

Thanks,

Rich
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 6410 expansion shelf

2007-04-20 Thread Albert Chin
On Thu, Mar 22, 2007 at 01:21:04PM -0700, Frank Cusack wrote:
> Does anyone have a 6140 expansion shelf that they can hook directly to
> a host?  Just wondering if this configuration works.  Previously I
> though the expansion connector was proprietary but now I see it's
> just fibre channel.

The 6140 controller unit has either 2GB or 4GB cache. Does the 6140
expansion shelf have cache as well or is the cache in the controller
unit used for all expansion shelves?

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS+NFS on storedge 6120 (sun t4)

2007-04-20 Thread Leon Koll
Welcome to the club, Andy...

I tried several times to attract the attention of the community to the dramatic 
performance degradation (about 3x) of the NFS/ZFS combination vs. NFS/UFS - 
without any result: [1] http://www.opensolaris.org/jive/thread.jspa?messageID=98592 , 
[2] http://www.opensolaris.org/jive/thread.jspa?threadID=24015 .

Just look at the two graphs in my posting from August 2006 
( http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html ) 
to see how bad the situation was; unfortunately, the situation hasn't changed 
much recently: http://photos1.blogger.com/blogger/7591/428/1600/sfs.1.png

I don't think the storage array is a source of the problems you reported. It's 
somewhere else...

-- leon
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss