[zfs-discuss] SunOS neptune 5.11 snv_127 sun4u sparc SUNW, Sun-Fire-880
I just went through a BFU update to snv_127 on a V880 :

neptune console login: root
Password:
Nov 3 08:19:12 neptune login: ROOT LOGIN /dev/console
Last login: Mon Nov 2 16:40:36 on console
Sun Microsystems Inc.  SunOS 5.11  snv_127  Nov. 02, 2009
SunOS Internal Development: root 2009-Nov-02 [onnv_127-tonic]
bfu'ed from /build/archives-nightly-osol/sparc on 2009-11-03

I have [ high ] hopes that there was a small tarball somewhere which contained the sources listed in :

http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

Is there such a tarball anywhere at all or shall I just wait for the putback to hit the mercurial repo ?

Yes .. this is sort of begging .. but I call it "enthusiasm" :-)

--
Dennis Clarke
dcla...@opensolaris.ca <- Email related to the open source Solaris
dcla...@blastwave.org <- Email related to open source for Solaris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS dedup issue
Hi,

Let's take a look:

# zpool list
NAME    SIZE   USED  AVAIL   CAP   DEDUP   HEALTH  ALTROOT
rpool    68G  13.9G  54.1G   20%  42.27x   ONLINE  -

# zfs get all rpool/export/data
NAME               PROPERTY                        VALUE                  SOURCE
rpool/export/data  type                            filesystem             -
rpool/export/data  creation                        Mon Nov  2 16:11 2009  -
rpool/export/data  used                            46.7G                  -
rpool/export/data  available                       38.7M                  -
rpool/export/data  referenced                      46.7G                  -
rpool/export/data  compressratio                   1.00x                  -
rpool/export/data  mounted                         yes                    -
rpool/export/data  quota                           none                   default
rpool/export/data  reservation                     none                   default
rpool/export/data  recordsize                      128K                   default
rpool/export/data  mountpoint                      /export/data           inherited from rpool/export
rpool/export/data  sharenfs                        off                    default
rpool/export/data  checksum                        on                     default
rpool/export/data  compression                     off                    default
rpool/export/data  atime                           on                     default
rpool/export/data  devices                         on                     default
rpool/export/data  exec                            on                     default
rpool/export/data  setuid                          on                     default
rpool/export/data  readonly                        off                    default
rpool/export/data  zoned                           off                    default
rpool/export/data  snapdir                         hidden                 default
rpool/export/data  aclmode                         groupmask              default
rpool/export/data  aclinherit                      restricted             default
rpool/export/data  canmount                        on                     default
rpool/export/data  shareiscsi                      off                    default
rpool/export/data  xattr                           on                     default
rpool/export/data  copies                          1                      default
rpool/export/data  version                         4                      -
rpool/export/data  utf8only                        off                    -
rpool/export/data  normalization                   none                   -
rpool/export/data  casesensitivity                 sensitive              -
rpool/export/data  vscan                           off                    default
rpool/export/data  nbmand                          off                    default
rpool/export/data  sharesmb                        off                    default
rpool/export/data  refquota                        none                   default
rpool/export/data  refreservation                  none                   default
rpool/export/data  primarycache                    all                    default
rpool/export/data  secondarycache                  all                    default
rpool/export/data  usedbysnapshots                 0                      -
rpool/export/data  usedbydataset                   46.7G                  -
rpool/export/data  usedbychildren                  0                      -
rpool/export/data  usedbyrefreservation            0                      -
rpool/export/data  logbias                         latency                default
rpool/export/data  dedup                           on                     local
rpool/export/data  org.opensolaris.caiman:install  ready                  inherited from rpool

# df -h
Filesystem                      Size  Used  Avail  Use%  Mounted on
rpool/ROOT/os_b123_dev          2.4G  2.4G    40M   99%  /
swap                            9.1G  336K   9.1G    1%  /etc/svc/volatile
/usr/lib/libc/libc_hwcap1.so.1  2.4G  2.4G    40M   99%  /lib/libc.so.1
swap                            9.1G     0   9.1G    0%  /tmp
swap                            9.1G   40K   9.1G    1%  /var/run
rpool/export                     40M   25K    40M    1%  /export
rpool/export/home                40M   30K    40M    1%  /export/home
rpool/export/home/admin         460M  421M    40M   92%  /export/home/admin
rpool                            40M   83K    40M    1%  /rpool
rpool/expo
Re: [zfs-discuss] dedup question
On 2-Nov-09, at 3:16 PM, Nicolas Williams wrote:

> On Mon, Nov 02, 2009 at 11:01:34AM -0800, Jeremy Kitchen wrote:
>> forgive my ignorance, but what's the advantage of this new dedup over the existing compression option? Wouldn't full-filesystem compression naturally de-dupe?
> ...
> There are many examples where snapshot/clone isn't feasible but dedup can help.
>
> For example: mail stores (though they can do dedup at the application layer by using message IDs and hashes).
>
> For example: home directories (think of users saving documents sent via e-mail).
>
> For example: source code workspaces (ONNV, Xorg, Linux, whatever), where users might not think ahead to snapshot/clone a local clone (I also tend to maintain a local SCM clone that I then snapshot/clone to get workspaces for bug fixes and projects; it's a pain, really).
>
> I'm sure there are many, many other examples.

A couple that come to mind... Some patterns become much cheaper with dedup:

- The Subversion working copy format, where you have the reference checked-out file alongside the working file
- A QA/testing system where you might have dozens or hundreds of build iterations of an application, mostly identical

Exposing checksum metadata might have interesting implications for operations like diff, cmp, rsync, even tar.

--Toby

> The workspace example is particularly interesting: with the snapshot/clone approach you get to deduplicate the _source code_, but not the _object code_, while with dedup you get both dedup'ed automatically.
>
> As for compression, that helps whether you dedup or not, and it helps by about the same factor either way -- dedup and compression are unrelated, really.
>
> Nico

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SunOS neptune 5.11 snv_127 sun4u sparc SUNW, Sun-Fire-880
Dennis Clarke wrote:
> I just went through a BFU update to snv_127 on a V880 :
>
> neptune console login: root
> Password:
> Nov 3 08:19:12 neptune login: ROOT LOGIN /dev/console
> Last login: Mon Nov 2 16:40:36 on console
> Sun Microsystems Inc.  SunOS 5.11  snv_127  Nov. 02, 2009
> SunOS Internal Development: root 2009-Nov-02 [onnv_127-tonic]
> bfu'ed from /build/archives-nightly-osol/sparc on 2009-11-03
>
> I have [ high ] hopes that there was a small tarball somewhere which contained the sources listed in :
>
> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html
>
> Is there such a tarball anywhere at all or shall I just wait for the putback to hit the mercurial repo ?
>
> Yes .. this is sort of begging .. but I call it "enthusiasm" :-)

Hi Dennis,
we haven't done source tarballs or Mercurial bundles in quite some time, since it's more efficient for you to pull from the Mercurial repo and build it yourself :)

Also, the build 127 tonic bits that I generated today (and which you appear to be using) won't contain Jeff's push from yesterday, because that changeset is part of build 128 - and I haven't closed the build yet.

The push is in the repo, btw:

changeset:   10922:e2081f502306
user:        Jeff Bonwick
date:        Sun Nov 01 14:14:46 2009 -0800
comments:
    PSARC 2009/571 ZFS Deduplication Properties
    6677093 zfs should have dedup capability

cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
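[Editorial note: for anyone who wants to follow James' suggestion and pull the gate themselves, a minimal sketch; the anonymous onnv-gate URL here is an assumption, and the changeset ID is the one quoted above.]

$ hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-gate onnv-gate   # first-time clone (URL assumed)
$ cd onnv-gate && hg pull -u                                           # or refresh an existing clone
$ hg log -r e2081f502306                                               # confirm the dedup push is present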
[zfs-discuss] zfs on multiple machines
Hi, is it possible to link multiple machines into one storage pool using zfs? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] More Dedupe Questions...
Tristan Ball wrote:
> I'm curious as to how send/recv intersects with dedupe... if I send/recv a deduped filesystem, is the data sent in its de-duped form, ie just sent once, followed by the pointers for subsequent dupe data, or is the data sent in expanded form, with the recv side system then having to redo the dedupe process?

The on disk dedup and dedup of the stream are actually separate features. Stream dedup hasn't been integrated yet. It will be a choice at *send* time if the stream is to be deduplicated.

> Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve?

A stream can be deduped even if the on disk format isn't, and vice versa.

> Also - do we know yet what effect block size has on dedupe? My guess is that a smaller block size will perhaps give a better duplication match rate, but at the cost of higher CPU usage and perhaps reduced performance, as the system will need to store larger de-dupe hash tables?

That really depends on how the applications write blocks and what your data is like. It could go either way very easily. As with all dedup it is a trade off between IO bandwidth and CPU/memory. Sometimes dedup will improve performance, since like compression it can reduce IO requirements, but depending on workload the CPU/memory overhead may or may not be worth it (same with compression).

--
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
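[Editorial note: once stream deduplication does integrate, the send-time choice Darren describes would presumably be expressed as a flag on zfs send. This is a sketch only; the -D flag name is an assumption, since the feature had not been putback when this thread was written.]

# send a deduplicated replication stream to another host (flag name assumed)
zfs send -D -R tank/fs@monday | ssh otherhost zfs recv -d backuppool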
Re: [zfs-discuss] zfs on multiple machines
Miha Voncina wrote:
> Hi, is it possible to link multiple machines into one storage pool using zfs?

Depends what you mean by this.

Multiple machines can not import the same ZFS pool at the same time; doing so *will* cause corruption, and ZFS tries hard to protect against multiple imports.

However ZFS can use iSCSI LUNs from multiple target machines for the disks that make up a given pool.

ZFS volumes (ZVOLs) can also be used as iSCSI targets and thus shared out to multiple machines.

ZFS file systems can be shared over NFS and CIFS and thus shared by multiple machines.

ZFS pools can be used in a Sun Cluster configuration but will only be imported into a single node of a Sun Cluster configuration at a time.

--
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
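[Editorial note: a minimal sketch of the sharing options Darren lists, with made-up pool and dataset names.]

# a ZVOL exported as an iSCSI target, usable as a raw disk by another machine
zfs create -V 100g tank/vm01
zfs set shareiscsi=on tank/vm01

# a file system shared to many machines over NFS and CIFS
zfs set sharenfs=on tank/home
zfs set sharesmb=on tank/home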
Re: [zfs-discuss] SunOS neptune 5.11 snv_127 sun4u sparc SUNW, Sun-Fire-880
> Dennis Clarke wrote:
>> I just went through a BFU update to snv_127 on a V880 :
>>
>> neptune console login: root
>> Password:
>> Nov 3 08:19:12 neptune login: ROOT LOGIN /dev/console
>> Last login: Mon Nov 2 16:40:36 on console
>> Sun Microsystems Inc. SunOS 5.11 snv_127 Nov. 02, 2009
>> SunOS Internal Development: root 2009-Nov-02 [onnv_127-tonic]
>> bfu'ed from /build/archives-nightly-osol/sparc on 2009-11-03
>>
>> I have [ high ] hopes that there was a small tarball somewhere which
>> contained the sources listed in :
>>
>> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html
>>
>> Is there such a tarball anywhere at all or shall I just wait
>> for the putback to hit the mercurial repo ?
>>
>> Yes .. this is sort of begging .. but I call it "enthusiasm" :-)
>
> Hi Dennis,
> we haven't done source tarballs or Mercurial bundles in quite
> some time, since it's more efficient for you to pull from the
> Mercurial repo and build it yourself :)

Well, funny you should mention it. I was this close ( -->|.|<-- ) to running a nightly build and then I had a minor brainwave .. "why bother?" because the sparc archive bits were there already.

> Also, the build 127 tonic bits that I generated today (and
> which you appear to be using) won't contain Jeff's push from
> yesterday, because that changeset is part of build 128 - and
> I haven't closed the build yet.
>
> The push is in the repo, btw:
>
> changeset: 10922:e2081f502306
> user:    Jeff Bonwick
> date:    Sun Nov 01 14:14:46 2009 -0800
> comments:
> PSARC 2009/571 ZFS Deduplication Properties
> 6677093 zfs should have dedup capability

funny .. I didn't see it last night. :-\

I'll blame the coffee and go get a "nightly" happening right away :-)

Thanks for the reply!

--
Dennis Clarke
dcla...@opensolaris.ca <- Email related to the open source Solaris
dcla...@blastwave.org <- Email related to open source for Solaris

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
> So.. it seems that data is deduplicated, zpool has
> 54.1G of free space, but I can use only 40M.
>
> It's x86, ONNV revision 10924, debug build, bfu'ed from b125.

I think I'm observing the same (with changeset 10936) ...

I created a 2GB file, and a "tank" zpool on top of that file, with compression and dedup enabled:

mkfile 2g /var/tmp/tank.img
zpool create tank /var/tmp/tank.img
zfs set dedup=on tank
zfs set compression=on tank

Now I tried to create four zfs filesystems, and filled them by pulling and updating the same set of onnv sources from mercurial. One copy needs ~ 800MB of disk space uncompressed, or ~ 520MB compressed. During the 4th "hg update":

> hg update
abort: No space left on device: /tank/snv_128_yy/usr/src/lib/libast/sparcv9/src/lib/libast/FEATURE/common

> zpool list tank
NAME   SIZE   USED  AVAIL   CAP  DEDUP  HEALTH  ALTROOT
tank  1,98G   720M  1,28G   35%  3.70x  ONLINE  -

> zfs list -r tank
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank             1,95G      0    26K  /tank
tank/snv_128      529M      0   529M  /tank/snv_128
tank/snv_128_jk   530M      0   530M  /tank/snv_128_jk
tank/snv_128_xx   530M      0   530M  /tank/snv_128_xx
tank/snv_128_yy   368M      0   368M  /tank/snv_128_yy

--
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
> I think I'm observing the same (with changeset 10936) ...

# mkfile 2g /var/tmp/tank.img
# zpool create tank /var/tmp/tank.img
# zfs set dedup=on tank
# zfs create tank/foobar

> dd if=/dev/urandom of=/tank/foobar/file1 bs=1024k count=512
512+0 records in
512+0 records out

> cp /tank/foobar/file1 /tank/foobar/file2
> cp /tank/foobar/file1 /tank/foobar/file3
> cp /tank/foobar/file1 /tank/foobar/file4
/tank/foobar/file4: No space left on device

> zfs list -r tank
NAME          USED  AVAIL  REFER  MOUNTPOINT
tank         1.95G      0    22K  /tank
tank/foobar  1.95G      0  1.95G  /tank/foobar

> zpool list tank
NAME   SIZE  USED  AVAIL   CAP  DEDUP  HEALTH  ALTROOT
tank  1.98G  515M  1.48G   25%  3.90x  ONLINE  -

--
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris disk confusion ?
On 11/2/2009 9:23 PM, Marion Hakanson wrote:
> Could it be that c12t1d0 was at some time in the past (either in this machine or another machine) known as c3t11d0, and was part of a pool called "dbzpool"?

Quite possibly. But certainly not this host's dbzpool.

> You'll need to give the same "dd" treatment to the end of the disk as well; ZFS puts copies of its labels at the beginning and at the end.

Oh, and I'm not sure what you mean here - I thought p0 was the entire disk in x86 - and s2 was the whole disk in the partition. What else should I overwrite?

Thanks,

--
Jeremy Kister
http://jeremy.kister.net./
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Nov 2, 2009, at 2:38 PM, "Paul B. Henson" wrote:

> On Sat, 31 Oct 2009, Al Hopper wrote:
>> Kudos to you - nice technical analysis and presentation. Keep lobbying your point of view - I think interoperability should win out if it comes down to an arbitrary decision.
>
> Thanks; but so far that doesn't look promising. Right now I've got a cron job running every hour on the backend servers crawling around and fixing permissions on new directories :(. You would have thought something like this would have been noticed in one of the NFS interoperability bake offs.

Paul,

Maybe you're approaching this the wrong way. Maybe this isn't an interoperability fix, but a security fix, as it allows non-Sun clients to bypass security restrictions placed on a sgid protected directory tree because it doesn't properly test the existence of that bit upon file creation.

If an appropriate scenario can be made, and I'm sure it can, one might even post a CERT advisory to make sure operators are made aware of this potential security problem.

-Ross
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupe is in
I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Location of ZFS documentation (source)?
Alex, You can download the man page source files from this URL: http://dlc.sun.com/osol/man/downloads/current/ If you want a different version, you can navigate to the available source consolidations from the Downloads page on opensolaris.org. Thanks, Cindy On 11/02/09 16:39, Cindy Swearingen wrote: Hi Alex, I'm checking with some folks on how we handled this handoff for the previous project. I'll get back to you shortly. Thanks, Cindy On 11/02/09 16:07, Alex Blewitt wrote: The man pages documentation from the old Apple port (http://github.com/alblue/mac-zfs/tree/master/zfs_documentation/man8/) don't seem to have a corresponding source file in the onnv-gate repository (http://hub.opensolaris.org/bin/view/Project+onnv/WebHome) although I've found the text on-line (http://docs.sun.com/app/docs/doc/819-2240/zfs-1m) Can anyone point me to where these are stored, so that we can update the documentation in the Apple fork? Alex ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] De-Dupe and iSCSI
Good morning all...

Great work on the De-Dupe stuff. Can't wait to try it out. But a quick question about iSCSI and De-Dupe: will it work? If I share out a ZVOL to another machine and copy some similar files to it (thinking VMs), will they get de-duplicated?

Thanks.

--
Tiernan O'Toole
blog.lotas-smartman.net
www.tiernanotoolephotography.com
www.the-hairy-one.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupe is in
Orvar Korvar wrote: I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong? you don't even have to create a new dataset just do: # zfs set dedup=on -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
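[Editorial note: for completeness, the property takes a dataset (or pool) name, and it only affects data written after it is set; the names below are examples.]

# zfs set dedup=on tank/export/data
# zfs get dedup tank/export/data
# zpool list tank        # the DEDUP column shows the pool-wide dedup ratio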
Re: [zfs-discuss] De-Dupe and iSCSI
Tiernan OToole wrote:
> Good morning all... Great work on the De-Dupe stuff. cant wait to try it out. but quick question about iSCSI and De-Dupe. will it work? if i share out a ZVOL to another machine and copy some simular files to it (thinking VMs) will they get de-duplicated?

It works, but how much benefit you will get from it, since it is block- not file-based, depends on what type of filesystem and/or application is on the iSCSI target.

--
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
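[Editorial note: a sketch of Tiernan's scenario with assumed names and sizes. Because dedup works on ZFS blocks, picking a volblocksize that matches the block/cluster size of the filesystem the initiator puts on the LUN tends to improve the match rate.]

zfs create -V 200g -o volblocksize=8k -o dedup=on tank/vmstore
zfs set shareiscsi=on tank/vmstore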
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Tue, 3 Nov 2009, Ross Walker wrote:

> Maybe this isn't an interoperability fix, but a security fix as it allows
> non-Sun clients to bypass security restrictions placed on a sgid
> protected directory tree because it doesn't properly test the existence
> of that bit upon file creation.
>
> If an appropriate scenario can be made, and I'm sure it can, one might
> even post a CERT advisory to make sure operators are made aware of this
> potential security problem.

I agree it's a security issue, I think I mentioned that at some point in this thread. However, it doesn't allow a client to do something they couldn't do anyway. If the sgid bit was respected and the directory was created with the right group, the client could chgrp it to their primary group afterwards.

The security issue isn't that an evil client will avail of this to end up with a directory owned by the wrong group, it's that a poor innocent client will end up with a directory owned by their primary group rather than the group of the parent directory, and any inherited group@ ACL will apply to the primary group, resulting in insecure and unintended access :(.

Another possible security issue that came up while I was discussing this issue with one of the Linux NFSv4 developers is that relying upon the client to set the ownership of the directory results in a race condition and is in their opinion buggy. In between the time the client generates the mkdir request and sends it over the wire and the server receives it, someone else might have changed the permissions or group ownership of the parent directory, resulting in the explicitly specified group provided by the client being wrong. They refuse to implement this buggy behavior, and to quote them, "You should get Sun to fix their server". I'm trying to do that, but no luck so far ...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] More Dedupe Questions...
Hi Darren,

More below...

Darren J Moffat wrote:
> Tristan Ball wrote:
>> Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve?
>
> A stream can be deduped even if the on disk format isn't and vice versa.

Is the send dedup'ing more efficient if the filesystem is already dedup'd? If both are enabled do they share anything?

-Kyle
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SNV_125 MPT warning in logfile
We see the same issue on a x4540 Thor system with 500G disks: lots of:

...
Nov 3 16:41:46 uva.nl scsi: [ID 107833 kern.warning] WARNING: /p...@3c,0/pci10de,3...@f/pci1000,1...@0 (mpt5):
Nov 3 16:41:46 encore.science.uva.nl Disconnected command timeout for Target 7
...

This system is running nv125 XvM. Seems to occur more when we are using vm-s. This of course causes very long interruptions on the vm-s as well...

--
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] More Dedupe Questions...
Kyle McDonald wrote: Hi Darren, More below... Darren J Moffat wrote: Tristan Ball wrote: Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve? A stream can be deduped even if the on disk format isn't and vice versa. Is the send dedup'ing more efficient if the filesystem is already depdup'd? If both are enabled do they share anything? -Kyle At this time, no. But very shortly we hope to tie the two together better to make use of the existing checksums and duplication info available in the on-disk and in-kernel structures. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote:

>> I think I'm observing the same (with changeset 10936) ...
>
> # mkfile 2g /var/tmp/tank.img
> # zpool create tank /var/tmp/tank.img
> # zfs set dedup=on tank
> # zfs create tank/foobar

This has to do with the fact that dedup space accounting is charged to all filesystems, regardless of whether blocks are deduped. To do otherwise is impossible, as there is no true "owner" of a block, and the fact that it may or may not be deduped is often beyond the control of a single filesystem.

This has some interesting pathologies as the pool gets full. Namely, that ZFS will artificially enforce a limit on the logical size of the pool based on non-deduped data. This is obviously something that should be addressed.

- Eric

> dd if=/dev/urandom of=/tank/foobar/file1 bs=1024k count=512
> 512+0 records in
> 512+0 records out
>
> cp /tank/foobar/file1 /tank/foobar/file2
> cp /tank/foobar/file1 /tank/foobar/file3
> cp /tank/foobar/file1 /tank/foobar/file4
> /tank/foobar/file4: No space left on device
>
> zfs list -r tank
> NAME          USED  AVAIL  REFER  MOUNTPOINT
> tank         1.95G      0    22K  /tank
> tank/foobar  1.95G      0  1.95G  /tank/foobar
>
> zpool list tank
> NAME   SIZE  USED  AVAIL   CAP  DEDUP  HEALTH  ALTROOT
> tank  1.98G  515M  1.48G   25%  3.90x  ONLINE  -
>
> --
> This message posted from opensolaris.org

--
Eric Schrock, Fishworks    http://blogs.sun.com/eschrock
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris disk confusion ?
On Mon, November 2, 2009 20:23, Marion Hakanson wrote:
> You'll need to give the same "dd" treatment to the end of the disk as well;
> ZFS puts copies of its labels at the beginning and at the end.

Does anybody else see this as rather troubling?

Obviously it's dangerous to get in the habit of doing this as a routine operation, which to read advice here is how people are thinking of it. It seems to me that something in ZFS's protective procedures is missing or astray or over-active -- being protective is good, but there needs to be a way to re-use a disk that's been used before, too. And frequently people are at a loss to even understand what the possible conflict might be.

Maybe a doubling of the -f option should give as full an explanation as possible of what the evidence shows as previous use, and then let you override it if you really really insist? Or some other option? Or an entirely separate utility (or script)?

What I basically want, I think, is a standard way to get an explanation of exactly what ZFS thinks the conflict in my new proposed use of a disk might be -- and then a standard and as-safe-as-possible way to tell it to go ahead and use the disk.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
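[Editorial note: part of what David asks for does exist today. zdb will dump whatever ZFS labels it finds on a device, including the pool name and GUID that zpool is objecting to; the device name below is just an example.]

# zdb -l /dev/rdsk/c12t1d0s0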
[zfs-discuss] ZFS dedup accounting
Hi Eric and all,

Eric Schrock wrote:
> On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote:
>>> I think I'm observing the same (with changeset 10936) ...
>>
>> # mkfile 2g /var/tmp/tank.img
>> # zpool create tank /var/tmp/tank.img
>> # zfs set dedup=on tank
>> # zfs create tank/foobar
>
> This has to do with the fact that dedup space accounting is charged to all filesystems, regardless of whether blocks are deduped. To do otherwise is impossible, as there is no true "owner" of a block

It would be great if someone could explain why it is hard (impossible? not a good idea?) to account all data sets for at least one reference to each dedup'ed block and add this space to the total free space?

> This has some interesting pathologies as the pool gets full. Namely, that ZFS will artificially enforce a limit on the logical size of the pool based on non-deduped data. This is obviously something that should be addressed.

Would the idea I mentioned not address this issue as well?

Thanks, Nils
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting
Hi, It looks interesting problem. Would it help if as ZFS detects dedup blocks, it can start increasing effective size of pool. It will create an anomaly with respect to total disk space, but it will still be accurate from each file system usage point of view. Basically, dedup is at block level, so space freed can effectively be accounted as extra free blocks added to pool. Just a thought. Regards, Anurag. On Tue, Nov 3, 2009 at 9:39 PM, Nils Goroll wrote: > Hi Eric and all, > > Eric Schrock wrote: > >> >> On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote: >> >> I think I'm observing the same (with changeset 10936) ... >>> >>> # mkfile 2g /var/tmp/tank.img >>> # zpool create tank /var/tmp/tank.img >>> # zfs set dedup=on tank >>> # zfs create tank/foobar >>> >> >> This has to do with the fact that dedup space accounting is charged to all >> filesystems, regardless of whether blocks are deduped. To do otherwise is >> impossible, as there is no true "owner" of a block >> > > It would be great if someone could explain why it is hard (impossible? not > a > good idea?) to account all data sets for at least one reference to each > dedup'ed > block and add this space to the total free space? > > This has some interesting pathologies as the pool gets full. Namely, that >> ZFS will artificially enforce a limit on the logical size of the pool based >> on non-deduped data. This is obviously something that should be addressed. >> > > Would the idea I mentioned not address this issue as well? > > Thanks, Nils > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- Anurag Agarwal CEO, Founder KQ Infotech, Pune www.kqinfotech.com 9881254401 Coordinator Akshar Bharati www.aksharbharati.org Spreading joy through reading ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting
Well, then you could have more "logical space" than "physical space", and that would be extremely cool, but what happens if for some reason you wanted to turn off dedup on one of the filesystems? It might exhaust all the pool's space to do this. I think good idea would be another pool's/filesystem's property, that when turned on, would allow allocating more "logical data" than pool's capacity, but then you would accept risks that involve it. Then administrator could decide which is better for his system. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris disk confusion ?
Hi David, This RFE is filed for this feature: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6893282 Allow the zpool command to wipe labels from disks Cindy On 11/03/09 09:00, David Dyer-Bennet wrote: On Mon, November 2, 2009 20:23, Marion Hakanson wrote: You'll need to give the same "dd" treatment to the end of the disk as well; ZFS puts copies of its labels at the beginning and at the end. Does anybody else see this as rather troubling? Obviously it's dangerous to get in the habit of doing this as a routine operation, which to read advice here is how people are thinking of it. It seems to me that something in ZFS's protective procedures is missing or astray or over-active -- being protective is good, but there needs to be a way to re-use a disk that's been used before, too. And frequently people are at a loss to even understand what the possible conflict might be. Maybe a doubling of the -f option should give as full an explanation as possible of what the evidence shows as previous use, and then let you override it if you really really insist? Or some other option? Or an entirely separate utility (or script)? What I basically want, I think, is a standard way to get an explanation of exactly what ZFS thinks the conflict in my new proposed use of a disk might be -- and then a standard and as-safe-as-possible way to tell it to go ahead and use the disk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
Cyril Plisko wrote:
>>> I think I'm observing the same (with changeset 10936) ...
>>>
>>> # mkfile 2g /var/tmp/tank.img
>>> # zpool create tank /var/tmp/tank.img
>>> # zfs set dedup=on tank
>>> # zfs create tank/foobar
>>
>> This has to do with the fact that dedup space accounting is charged to all filesystems, regardless of whether blocks are deduped. To do otherwise is impossible, as there is no true "owner" of a block, and the fact that it may or may not be deduped is often beyond the control of a single filesystem.
>>
>> This has some interesting pathologies as the pool gets full. Namely, that ZFS will artificially enforce a limit on the logical size of the pool based on non-deduped data. This is obviously something that should be addressed.
>
> Eric,
>
> Many people (me included) perceive deduplication as a mean to save disk space and allow more data to be squeezed into a storage. What you are saying is that effectively ZFS dedup does a wonderful job in detecting duplicate blocks and goes into all the trouble of removing an extra copies and keep accounting of everything. However, when it comes to letting me use the freed space I will be plainly denied to do so. If that so, what would be the reason to use ZFS deduplication at all ?

c'mon it is obviously a bug and not a design feature. (it is I hope/think that is the case)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] marvell88sx2 driver build126
On Mon, Nov 2, 2009 at 6:34 AM, Orvar Korvar wrote: > I have the same card and might have seen the same problem. Yesterday I > upgraded to b126 and started to migrate all my data to 8 disc raidz2 > connected to such a card. And suddenly ZFS reported checksum errors. I > thought the drives were faulty. But you suggest the problem could have been > the driver? I also noticed that one of the drives had resilvered a small > amount, just like yours. > > I now use b125 and there are no checksum errors. So, is there a bug in the > new b126 driver? > Can any of you Sun folks comment on this? --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting
On Tue, November 3, 2009 10:32, Bartlomiej Pelc wrote: > Well, then you could have more "logical space" than "physical space", and > that would be extremely cool, but what happens if for some reason you > wanted to turn off dedup on one of the filesystems? It might exhaust all > the pool's space to do this. I think good idea would be another > pool's/filesystem's property, that when turned on, would allow allocating > more "logical data" than pool's capacity, but then you would accept risks > that involve it. Then administrator could decide which is better for his > system. Compression has the same issues; how is that handled? (Well, except that compression is limited to the filesystem, it doesn't have cross-filesystem interactions.) They ought to behave the same with regard to reservations and quotas unless there is a very good reason for a difference. Generally speaking, I don't find "but what if you turned off dedupe?" to be a very important question. Or rather, I consider it such an important question that I'd have to consider it very carefully in light of the particular characteristics of a particular pool; no GENERAL answer is going to be generally right. Reserving physical space for blocks not currently stored seems like the wrong choice; it violates my expectations, and goes against the purpose of dedupe, which as I understand it is to save space so you can use it for other things. It's obvious to me that changing the dedupe setting (or the compression setting) would have consequences on space use, and it seems natural that I as the sysadmin am on the hook for those consequences. (I'd expect to find in the documentation explanations of what things I need to consider and how to find the detailed data to make a rational decision in any particular case.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs mount error
Hello A customer recently had a power outage. Prior to the outage, they did a graceful shutdown of their system. On power-up, the system is not coming up due to zfs errors as follows: cannot mount 'rpool/export': Number of symbolic links encountered during path name traversal exceeds MAXSYMLINKS mount '/export/home': failed to create mountpoint. The possible cause of this might be that a symlink is created pointing to itself since the customer stated that they created lots of symlink to get their env ready. However, since /export is not getting mounted, they can not go back and delete/fix the symlinks. Can someone suggest a way to fix this issue? Thanks Ramin Moazeni ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris disk confusion ?
On 11/3/2009 3:49 PM, Marion Hakanson wrote:
> If the disk is going to be part of whole-disk zpool, I like to make sure there is not an old VTOC-style partition table on there. That can be done either via some "format -e" commands, or with "fdisk -E", to put an EFI label on there.

unfortunately, fdisk won't help me at all:

# fdisk -E /dev/rdsk/c12t1d0p0
# zpool create -f testp c12t1d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c3t11d0s0 is part of active ZFS pool dbzpool. Please see zpool(1M).

and i can't find anything in format that lets me do anything:

# format -e c12t1d0
selecting c12t1d0
[disk formatted]
/dev/dsk/c3t11d0s0 is part of active ZFS pool dbzpool. Please see zpool(1M).
[...]
format> label
Cannot label disk when partitions are in use as described.

I wonder if getting my hands on a pre-sol10 x86 format binary would help...

--
Jeremy Kister
http://jeremy.kister.net./
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
On Nov 3, 2009, at 12:24 PM, Cyril Plisko wrote: I think I'm observing the same (with changeset 10936) ... # mkfile 2g /var/tmp/tank.img # zpool create tank /var/tmp/tank.img # zfs set dedup=on tank # zfs create tank/foobar This has to do with the fact that dedup space accounting is charged to all filesystems, regardless of whether blocks are deduped. To do otherwise is impossible, as there is no true "owner" of a block, and the fact that it may or may not be deduped is often beyond the control of a single filesystem. This has some interesting pathologies as the pool gets full. Namely, that ZFS will artificially enforce a limit on the logical size of the pool based on non-deduped data. This is obviously something that should be addressed. Eric, Many people (me included) perceive deduplication as a mean to save disk space and allow more data to be squeezed into a storage. What you are saying is that effectively ZFS dedup does a wonderful job in detecting duplicate blocks and goes into all the trouble of removing an extra copies and keep accounting of everything. However, when it comes to letting me use the freed space I will be plainly denied to do so. If that so, what would be the reason to use ZFS deduplication at all ? Please read my response before you respond. What do you think "this is obviously something that should be addressed" means? There is already a CR filed and the ZFS team is working on it. - Eric -- Regards, Cyril -- Eric Schrock, Fishworkshttp://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris disk confusion ?
>I said:
>> You'll need to give the same "dd" treatment to the end of the disk as well;
>> ZFS puts copies of its labels at the beginning and at the end.

Oh, and zfs...@jeremykister.com said:
> im not sure what you mean here - I thought p0 was the entire disk in x86 -
> and s2 was the whole disk in the partition. what else should i overwrite?

Sorry, yes, you did get the whole slice overwritten. Most people just add a "count=10" or something similar, to overwrite the beginning of the drive, but your invocation would overwrite the whole thing.

If the disk is going to be part of whole-disk zpool, I like to make sure there is not an old VTOC-style partition table on there. That can be done either via some "format -e" commands, or with "fdisk -E", to put an EFI label on there.

Anyway, I agree with the desire for "zpool" to be able to do this itself, with less possibility of human error in partitioning, etc. Glad to hear there's already an RFE filed for it.

Regards, Marion
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
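[Editorial note: a sketch of the full treatment Marion describes, clearing the labels at both ends of the device. The device name and block count are examples only and must be taken from your own disk, e.g. from prtvtoc or format's verify output; the arithmetic needs ksh or bash.]

DEV=/dev/rdsk/c12t1d0p0
DEVBLKS=143374000                 # total 512-byte blocks on this disk (example value)
# ZFS keeps two labels at the front and two at the back of the device
dd if=/dev/zero of=$DEV bs=512 count=2048
dd if=/dev/zero of=$DEV bs=512 seek=$((DEVBLKS - 2048)) count=2048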
Re: [zfs-discuss] ZFS dedup accounting & reservations
Hi Cyril,

>> But: Isn't there an implicit expectation for a space guarantee associated with a
>> dataset? In other words, if a dataset has 1GB of data, isn't it natural to
>> expect to be able to overwrite that space with other data? One
>
> I'd say that expectation is not [always] valid. Assume you have a dataset of 1GB of data and the pool free space is 200 MB. You are cloning that dataset and trying to overwrite the data on the cloned dataset. You will hit "no more space left on device" pretty soon. Wonders of virtualization :)

The point I wanted to make is that by defining a (ref)reservation for that clone, ZFS won't even create it if space does not suffice:

r...@haggis:~# zpool list
NAME   SIZE  USED  AVAIL   CAP  HEALTH  ALTROOT
rpool  416G  187G   229G   44%  ONLINE  -
r...@haggis:~# zfs clone -o refreservation=230g rpool/export/home/slink/t...@zfs-auto-snap:frequent-2009-11-03-22:04:46 rpool/test
cannot create 'rpool/test': out of space

I don't see how a similar guarantee could be given with de-dup.

Nils
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] More Dedupe Questions...
Kyle McDonald wrote: Hi Darren, More below... Darren J Moffat wrote: Tristan Ball wrote: Obviously sending it deduped is more efficient in terms of bandwidth and CPU time on the recv side, but it may also be more complicated to achieve? A stream can be deduped even if the on disk format isn't and vice versa. Is the send dedup'ing more efficient if the filesystem is already depdup'd? If both are enabled do they share anything? ZFS send deduplication is still in development so I'd rather let the engineers working on it say what they are doing if they wish to. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedupe is in
Trevor Pretty wrote:
> Darren J Moffat wrote:
>> Orvar Korvar wrote:
>>> I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong?
>>
>> you don't even have to create a new dataset just do:
>>
>> # zfs set dedup=on
>
> But like all ZFS functions will that not only get applied, when you (re)write (old)new data, like compression=on ?

Correct, but if you are creating a new dataset you are writing new data anyway.

> Which leads to the question would a scrub activate dedupe?

Not at this time.

--
Darren J Moffat
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
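[Editorial note: since nothing rewrites existing blocks for you, one way to get already-written data dedup'ed is to rewrite it yourself, for example by sending it into a new dataset. A sketch with assumed names; it needs enough free pool space for the copy.]

zfs set dedup=on tank
zfs snapshot tank/data@prededup
zfs send tank/data@prededup | zfs recv tank/data_dedup   # received blocks are written fresh, so they pass through dedup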
[zfs-discuss] Where is green-bytes dedup code?
Green-bytes is publicly selling their hardware and dedup solution today. From the feedback of others, and from testing by someone on our team, we've found the quality of the initial putback to be buggy and not even close to production ready. (That's fine since nobody has stated it was production ready.)

It brings up the question though of where the green-bytes code is. They are obligated under the CDDL to release their changes *unless* they privately bought a license from Sun. It seems the conflicts from the lawsuit may or may not be resolved, but still.. Where's the code?
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting & reservations
> Well, then you could have more "logical space" than "physical space"

Reconsidering my own question again, it seems to me that the question of space management is probably more fundamental than I had initially thought, and I assume members of the core team will have thought through much of it. I will try to share my thoughts and I would very much appreciate any corrections or additional explanations.

For dedup, my understanding at this point is that, first of all, every reference to dedup'ed data must be accounted to the respective dataset. Obviously, a decision has been made to account that space as "used", rather than "referenced". I am trying to understand why.

At first sight, referring to the definition of "used" space as being unique to the respective dataset, it would seem natural to account all de-duped space as "referenced". But this could lead to much space never being accounted as "used" anywhere (but for the pool). This would differ from the observed behavior of non-deduped datasets, where, to my understanding, all "referred" space is "used" by some other dataset. Despite being a little counter-intuitive, at first I found this simple solution quite attractive, because it wouldn't alter the semantics of used vs. referenced space (under the assumption that my understanding is correct).

My understanding from Eric's explanation is that it has been decided to go an alternative route and account all de-duped space as "used" to all datasets referencing it because, in contrast to snapshots/clones, it is impossible (?) to differentiate between used and referred space for de-dup. Also, at first sight, this seems to be a way to keep the current semantics for (ref)reservations. But while without de-dup all the usedsnap and usedds values should roughly sum up to the pool used space, they can't with this concept - which is why I thought a solution could be to compensate for multiply accounted "used" space by artificially increasing the pool size.

Instead, from the examples given here, what seems to have been implemented with de-dup is to simply maintain space statistics for the pool on the basis of actually used space. While one might find it counter-intuitive that the used sizes of all datasets/snapshots will exceed the pool used size with de-dup, if my understanding is correct, this design seems to be consistent. I am very interested in the reasons why this particular approach has been chosen and why others have been dropped.

Now to the more general question: If all datasets of a pool contained the same data and got de-duped, the sums of their "used" space still seem to be limited by the "logical" pool size, as we've seen in examples given by Jürgen and others, and, to get a benefit of de-dup, this implementation obviously needs to be changed.

But: Isn't there an implicit expectation for a space guarantee associated with a dataset? In other words, if a dataset has 1GB of data, isn't it natural to expect to be able to overwrite that space with other data? One might want to define space guarantees (like with (ref)reservation), but I don't see how those should work with the currently implemented concept. Do we need something like a de-dup-reservation, which is subtracted from the pool free space?

Thank you for reading, Nils
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting & reservations
> No point in trying to preserve a naive mental model that simply can't stand up to reality.

I kind of dislike the idea to talk about naiveness here. Being able to give guarantees (in this case: reserve space) can be vital for running critical business applications. Think about the analogy in memory management (proper swap space reservation vs. the oom-killer).

But I realize that talking about an "implicit expectation" to give some motivation for reservations probably led to some misunderstanding.

Sorry, Nils
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting & reservations
> But: Isn't there an implicit expectation for a space guarantee associated with a
> dataset? In other words, if a dataset has 1GB of data, isn't it natural to
> expect to be able to overwrite that space with other data?

Is there such a space guarantee for compressed or cloned zfs?

--
This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs mount error
On Mon, Nov 2, 2009 at 1:34 PM, Ramin Moazeni wrote: > Hello > > A customer recently had a power outage. Prior to the outage, they did a > graceful shutdown of their system. > On power-up, the system is not coming up due to zfs errors as follows: > cannot mount 'rpool/export': Number of symbolic links encountered during > path name traversal exceeds MAXSYMLINKS > mount '/export/home': failed to create mountpoint. > > The possible cause of this might be that a symlink is created pointing to > itself since the customer stated > that they created lots of symlink to get their env ready. However, since > /export is not getting mounted, they > can not go back and delete/fix the symlinks. > > Can someone suggest a way to fix this issue? > > Thanks > Ramin Moazeni > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > I see these very frequently on my systems, regardless of a clean shutdown or not, 1/3 of the time filesystems cannot mount. What I do, is boot into single user mode, make sure the filesystem in question is NOT mounted, and just delete the directory that its trying to mount into. -- Brent Jones br...@servuhome.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS dedup vs compression vs ZFS user/group quotas
We recently found that the ZFS user/group quota accounting for disk-usage worked "opposite" to what we were expecting. Ie, any space saved from compression was a benefit to the customer, not to us. (We expected the Google style: Give a customer 2GB quota, and if compression saves space, that is profit to us) Is the space saved with dedup charged in the same manner? I would expect so, I figured some of you would just know. I will check when b128 is out. I don't suppose I can change the model? :) Lund -- Jorgen Lundman | Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) Japan| +81 (0)3 -3375-1767 (home) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
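[Editorial note: one way to see exactly what is charged per user today; the dataset and user names below are examples. Whether dedup savings end up being passed to the user the same way compression savings are was exactly Jorgen's open question.]

# zfs userspace tank/customers
# zfs get userused@alice tank/customers
# zfs get userquota@alice tank/customers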
Re: [zfs-discuss] zfs mount error
Ramin

I don't know but.. Is the error not from mount, and it's /export/home that can't be created?

  "mount '/export/home': failed to create mountpoint."

Have you tried mounting 'rpool/export' somewhere else, like /mnt?

Ramin Moazeni wrote:
> Hello
>
> A customer recently had a power outage. Prior to the outage, they did a graceful shutdown of their system. On power-up, the system is not coming up due to zfs errors as follows:
> cannot mount 'rpool/export': Number of symbolic links encountered during path name traversal exceeds MAXSYMLINKS
> mount '/export/home': failed to create mountpoint.
>
> The possible cause of this might be that a symlink is created pointing to itself since the customer stated that they created lots of symlink to get their env ready. However, since /export is not getting mounted, they can not go back and delete/fix the symlinks.
>
> Can someone suggest a way to fix this issue?
>
> Thanks
> Ramin Moazeni

www.eagle.co.nz
This email is confidential and may be legally privileged. If received in error please destroy and immediately notify us.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
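[Editorial note: a sketch of Trevor's suggestion, run from single-user mode (or after importing the pool from install media). The dataset and paths are taken from the error messages; the clean-up step itself is left to the admin.]

zfs set mountpoint=/mnt rpool/export
zfs mount rpool/export
# ...find and remove the self-referencing symlinks under /mnt, then put it back...
zfs set mountpoint=/export rpool/export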
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll wrote: > Now to the more general question: If all datasets of a pool contained the > same data and got de-duped, the sums of their "used" space still seems to be > limited by the "locical" pool size, as we've seen in examples given by > Jürgen and others and, to get a benefit of de-dup, this implementation > obviously needs to be changed. Agreed. > > But: Isn't there an implicit expectation for a space guarantee associated > with a dataset? In other words, if a dataset has 1GB of data, isn't it > natural to expect to be able to overwrite that space with other data? One I'd say that expectation is not [always] valid. Assume you have a dataset of 1GB of data and the pool free space is 200 MB. You are cloning that dataset and trying to overwrite the data on the cloned dataset. You will hit "no more space left on device" pretty soon. Wonders of virtualization :) -- Regards, Cyril ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs on multiple machines
Miha

If you do want multi-reader, multi-writer block access (and not use iSCSI) then QFS is what you want.
http://www.sun.com/storage/management_software/data_management/qfs/features.xml

You can use ZFS pools as lumps of disk under SAM-QFS:-
https://blogs.communication.utexas.edu/groups/techteam/weblog/5e700/

I successfully mocked this up on VirtualBox on my laptop for a customer.

Trevor

Darren J Moffat wrote:
> Miha Voncina wrote:
>> Hi, is it possible to link multiple machines into one storage pool using zfs?
>
> Depends what you mean by this. Multiple machines can not import the same ZFS pool at the same time, doing so *will* cause corruption and ZFS tries hard to protect against multiple imports. However ZFS can use iSCSI LUNs from multiple target machines for its disks that make up a given pool. ZFS volumes (ZVOLS) can also be used as iSCSI targets and thus shared out to multiple machines. ZFS file systems can be shared over NFS and CIFS and thus shared by multiple machines. ZFS pools can be used in a Sun Cluster configuration but will only be imported into a single node of a Sun Cluster configuration at a time.
>
> --
> Darren J Moffat

www.eagle.co.nz
This email is confidential and may be legally privileged. If received in error please destroy and immediately notify us.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, November 3, 2009 16:36, Nils Goroll wrote: > > No point in trying to preserve a naive mental model that >> simply can't stand up to reality. > > I kind of dislike the idea to talk about naiveness here. Maybe it was a poor choice of words; I mean something more along the lines of "simplistic". The point is, "space" is no longer as simple a concept as it was 40 years ago. Even without deduplication, there is the possibility of clones and compression causing things not to behave the same way a simple filesystem on a hard drive did long ago. > Being able to give guarantees (in this case: reserve space) can be vital > for > running critical business applications. Think about the analogy in memory > management (proper swap space reservation vs. the oom-killer). In my experience, systems that run on the edge of their resources and depend on guarantees to make them work have endless problems, whereas if they are not running on the edge of their resources, they work fine regardless of guarantees. For a very few kinds of embedded systems I can see the need to work to the edges (aircraft flight systems, for example), but that's not something you do in a general-purpose computer with a general-purpose OS. > But I realize that talking about an "implicit expectation" to give some > motivation for reservations probably lead to some misunderstanding. > > Sorry, Nils There's plenty of real stuff worth discussing around this issue, and I apologize for choosing a belittling term to express disagreement. I hope it doesn't derail the discussion. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedup question
On 11/ 2/09 07:42 PM, Craig S. Bell wrote: I just stumbled across a clever visual representation of deduplication: http://loveallthis.tumblr.com/post/166124704 It's a flowchart of the lyrics to "Hey Jude". =-) Nothing is compressed, so you can still read all of the words. Instead, all of the duplicates have been folded together. -cheers, CSB This should reference the prior (April 1, 1984) research by Donald Knuth at http://www.cs.utexas.edu/users/arvindn/misc/knuth_song_complexity.pdf :-) Jeff -- Jeff Savit Principal Field Technologist Sun Microsystems, Inc.Phone: 732-537-3451 (x63451) 2398 E Camelback Rd Email: jeff.sa...@sun.com Phoenix, AZ 85016http://blogs.sun.com/jsavit/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Where is green-bytes dedup code?
On Tuesday, November 3, 2009, "C. Bergström" wrote: > Green-bytes is publicly selling their hardware and dedup solution today. > From the feedback of others with testing from someone on our team we've found > the quality of the initial putback to be buggy and not even close to > production ready. (That's fine since nobody has stated it was production > ready) > > It brings up the question though of where is the green-bytes code? They are > obligated under the CDDL to release their changes *unless* they privately > bought a license from Sun. It seems the conflicts from the lawsuit may or > may not be resolved, but still.. > > Where's the code? I highly doubt you're going to get any commentary from sun engineers on pending litigation. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] MPxIO and removing physical devices
I am a bit of a Solaris newbie. I have a brand spankin' new Solaris 10u8 machine (x4250) that is running an attached J4400 and some internal drives. We're using multipathed SAS I/O (enabled via stmsboot), so the device mount points have been moved from their "normal" c0t5d0 to long strings -- in the case of c0t5d0, it's now /dev/rdsk/c6t5000CCA00A274EDCd0. (I can see the cross-referenced devices with stmsboot -L.)

Normally, when replacing a disk on a Solaris system, I would run cfgadm -c unconfigure c0::dsk/c0t5d0. However, cfgadm -l does not list c6, nor does it list any disks. In fact, running cfgadm against the places where I think things are supposed to live gets me the following:

bash# cfgadm -l /dev/rdsk/c0t5d0
Ap_Id                           Type    Receptacle  Occupant  Condition
/dev/rdsk/c0t5d0: No matching library found

bash# cfgadm -l /dev/rdsk/c6t5000CCA00A274EDCd0
cfgadm: Attachment point not found

bash# cfgadm -l /dev/dsk/c6t5000CCA00A274EDCd0
Ap_Id                           Type    Receptacle  Occupant  Condition
/dev/dsk/c6t5000CCA00A274EDCd0: No matching library found

bash# cfgadm -l c6t5000CCA00A274EDCd0
Ap_Id                           Type    Receptacle  Occupant  Condition
c6t5000CCA00A274EDCd0: No matching library found

I ran devfsadm -C -v and it removed all of the old attachment points for the /dev/dsk/c0t5d0 devices and created some for the c6 devices. Running cfgadm -al shows a c0, c4, and c5 -- these correspond to the actual controllers, but no devices are attached to the controllers.

I found an old email on this list about MPxIO that said the solution was basically to yank the physical device after making sure that no I/O was happening to it. While this worked and allowed us to return the device to service as a spare in the zpool it inhabits, more concerning was what happened when we ran mpathadm list lu after yanking the device and returning it to service:

bash# mpathadm list lu
        /dev/rdsk/c6t5000CCA00A2A9398d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A29EE2Cd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2BDBFCd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2A8E68d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A0537ECd0s2
                Total Path Count: 1
                Operational Path Count: 1
mpathadm: Error: Unable to get configuration information.
mpathadm: Unable to complete operation

(Side note: Some of the disks are single path via an internal controller, and some of them are multi path in the J4400 via two external controllers.)

A reboot fixed the 'issue' with mpathadm and it now outputs complete data.

So -- how do I administer and remove physical devices that are in multipath-managed controllers on Solaris 10u8 without breaking multipath and causing configuration changes that interfere with the services and devices attached via mpathadm and the other voodoo and black magic inside? I can't seem to find this documented anywhere, even though the instructions to enable multipathing with stmsboot -e were quite complete and worked well!

Thanks, Karl Katzke

-- Karl Katzke Systems Analyst II TAMU - RGS
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
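[Not an answer to the cfgadm question, but a hedged sketch of the ZFS side of a disk swap: take the disk out of the pool before pulling it and tell ZFS about the replacement afterwards. The pool name is made up; the device name is copied from the post above, and with MPxIO the replacement disk may appear under a different WWN-based name.]

# check which vdev the disk belongs to and its current state
zpool status tank
# take the disk offline so no I/O is in flight when it is pulled
zpool offline tank c6t5000CCA00A274EDCd0
# ... physically swap the drive ...
# if the new disk enumerates under the same name, replace it in place;
# if it shows up under a new WWN-based name, pass both old and new names
zpool replace tank c6t5000CCA00A274EDCd0
# watch the resilver progress
zpool status tank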
Re: [zfs-discuss] ZFS dedup accounting
> Well, then you could have more "logical space" than "physical space",
> and that would be extremely cool,

I think we already have that, with zfs clones. I often clone a zfs onnv workspace, and everything is "deduped" between the zfs parent snapshot and the clone filesystem. The clone (initially) needs no extra zpool space. And with a zfs clone I can actually use all the remaining free space from the zpool. With zfs deduped blocks, I can't ...

> but what happens if for some reason you wanted to turn off dedup on one
> of the filesystems? It might exhaust all the pool's space to do this.

As far as I understand it, nothing happens to existing deduped blocks when you turn off dedup for a zfs filesystem. The new dedup=off setting affects newly written blocks only.

-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
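[A hedged sketch of the clone-based sharing described above; the dataset names are made up. The clone initially references the same blocks as the snapshot, so it costs almost no pool space until it diverges.]

# snapshot the shared workspace and clone it for a private copy
zfs snapshot tank/ws/onnv-gate@today
zfs clone tank/ws/onnv-gate@today tank/ws/bugfix-1234
# "used" on the clone starts near zero; "refer" and "origin" show the shared blocks
zfs list -o name,used,refer,origin tank/ws/onnv-gate tank/ws/bugfix-1234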
Re: [zfs-discuss] ZFS dedup issue
I'm fairly new to all this and I think that is the intended behavior. Also, from my limited understanding of dedup behavior, I believe it would significantly cut down on access times. For the most part, though, this is such new code that I would wait a bit to see where they take it.

On Tue, Nov 3, 2009 at 3:24 PM, Cyril Plisko wrote:
> >>> I think I'm observing the same (with changeset 10936) ...
> >>
> >> # mkfile 2g /var/tmp/tank.img
> >> # zpool create tank /var/tmp/tank.img
> >> # zfs set dedup=on tank
> >> # zfs create tank/foobar
> >
> > This has to do with the fact that dedup space accounting is charged to all
> > filesystems, regardless of whether blocks are deduped. To do otherwise is
> > impossible, as there is no true "owner" of a block, and the fact that it may
> > or may not be deduped is often beyond the control of a single filesystem.
> >
> > This has some interesting pathologies as the pool gets full. Namely, that
> > ZFS will artificially enforce a limit on the logical size of the pool based
> > on non-deduped data. This is obviously something that should be addressed.
> >
> Eric,
>
> Many people (me included) perceive deduplication as a mean to save
> disk space and allow more data to be squeezed into a storage. What you
> are saying is that effectively ZFS dedup does a wonderful job in
> detecting duplicate blocks and goes into all the trouble of removing
> an extra copies and keep accounting of everything. However, when it
> comes to letting me use the freed space I will be plainly denied to do
> so. If that so, what would be the reason to use ZFS deduplication at
> all ?
>
> --
> Regards,
> Cyril
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup issue
>>> I think I'm observing the same (with changeset 10936) ...
>>
>> # mkfile 2g /var/tmp/tank.img
>> # zpool create tank /var/tmp/tank.img
>> # zfs set dedup=on tank
>> # zfs create tank/foobar
>
> This has to do with the fact that dedup space accounting is charged to all
> filesystems, regardless of whether blocks are deduped. To do otherwise is
> impossible, as there is no true "owner" of a block, and the fact that it may
> or may not be deduped is often beyond the control of a single filesystem.
>
> This has some interesting pathologies as the pool gets full. Namely, that
> ZFS will artificially enforce a limit on the logical size of the pool based
> on non-deduped data. This is obviously something that should be addressed.
>

Eric,

Many people (me included) perceive deduplication as a means to save disk space and allow more data to be squeezed into storage. What you are saying is that effectively ZFS dedup does a wonderful job of detecting duplicate blocks and goes to all the trouble of removing the extra copies and keeping accounting of everything. However, when it comes to letting me use the freed space, I will be plainly denied the ability to do so. If that is so, what would be the reason to use ZFS deduplication at all?

-- Regards, Cyril
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
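[A hedged sketch of how the mismatch shows up in practice; the pool name is made up. The pool-level dedup ratio climbs as duplicates are folded together, but per-dataset accounting still charges the logical size, so a dataset can report "no space" while the pool has physical blocks to spare.]

# pool-wide view: the DEDUP column / dedupratio property show the savings
zpool list tank
zpool get dedupratio tank
# per-dataset view: "used" and "avail" are still charged at the logical (un-deduped) size
zfs list -o name,used,avail,refer -r tank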
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, November 3, 2009 15:06, Cyril Plisko wrote:
> On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll wrote:
>> But: Isn't there an implicit expectation for a space guarantee associated
>> with a dataset? In other words, if a dataset has 1GB of data, isn't it
>> natural to expect to be able to overwrite that space with other data? One
>
> I'd say that expectation is not [always] valid. Assume you have a
> dataset of 1GB of data and the pool free space is 200 MB. You are
> cloning that dataset and trying to overwrite the data on the cloned
> dataset. You will hit "no more space left on device" pretty soon.
> Wonders of virtualization :)

Yes, and the same is potentially true with compression as well; if the old data blocks are actually deleted and freed up (meaning no snapshots or other things keeping them around), the new data still may not fit in those blocks due to differing compression based on what the data actually is. So that's an assumption we're just going to have to get over making in general. No point in trying to preserve a naive mental model that simply can't stand up to reality.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
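[A hedged illustration of the compression point above: the same logical amount of data can occupy very different physical space depending on how well it compresses. Pool and dataset names are made up.]

# create a compressed dataset and write equal amounts of very compressible and incompressible data
zfs create -o compression=on tank/demo
dd if=/dev/zero of=/tank/demo/zeros bs=1024k count=512      # compresses extremely well
dd if=/dev/urandom of=/tank/demo/random bs=1024k count=512  # barely compresses at all
# compare logical vs. physical outcome
zfs get compressratio tank/demo
zfs list -o name,used,refer tank/demo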
[zfs-discuss] virsh troubling zfs!?
Hi and hello, I have a problem that is confusing me. I hope someone can help me with it. I followed a "best practice" - I think - using dedicated zfs filesystems for my virtual machines. Commands (for completeness):

zfs create rpool/vms
zfs create rpool/vms/vm1
zfs create -V 10G rpool/vms/vm1/vm1-dsk

The last command creates the file system /rpool/vms/vm1/vm1-dsk and the corresponding /dev/zvol/dsk/rpool/vms/vm1/vm1-dsk. If I delete a VM I set up using this filesystem via virsh undefine vm1, the /rpool/vms/vm1/vm1-dsk also gets deleted, but the /dev/zvol/dsk/rpool/vms/vm1/vm1-dsk is left. Without /rpool/vms/vm1/vm1-dsk I am not able to do zfs destroy rpool/vms/vm1/vm1-dsk, so the /dev/zvol/dsk/rpool/vms/vm1/vm1-dsk cannot be destroyed "and will be left forever"!? How can I get rid of this problem?

-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
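[For what it's worth, a hedged sketch of how one would normally check for and remove a leftover volume. zfs destroy operates on the dataset name, not on any path under /rpool or /dev/zvol, and devfsadm can sweep stale /dev entries; this is a sketch, not a verified fix for the virsh interaction described above.]

# list any volume datasets still present under the VM hierarchy
zfs list -r -t volume rpool/vms
# if the volume dataset still exists, destroy it by its dataset name
zfs destroy rpool/vms/vm1/vm1-dsk
# or remove the whole per-VM subtree
zfs destroy -r rpool/vms/vm1
# if the dataset really is gone but /dev/zvol nodes remain, clean up stale device links
devfsadm -C -v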
Re: [zfs-discuss] ZFS dedup accounting & reservations
Hi David,

>>> simply can't stand up to reality.
>>
>> I kind of dislike the idea to talk about naiveness here.
>
> Maybe it was a poor choice of words; I mean something more along the lines
> of "simplistic". The point is, "space" is no longer as simple a concept as
> it was 40 years ago. Even without deduplication, there is the possibility
> of clones and compression causing things not to behave the same way a
> simple filesystem on a hard drive did long ago.

Thanks for emphasizing this again - I do absolutely agree that with today's technologies proper monitoring and proactive management are much more important than ever before. But, again, risks can be reduced.

>> Being able to give guarantees (in this case: reserve space) can be vital
>> for running critical business applications. Think about the analogy in
>> memory management (proper swap space reservation vs. the oom-killer).
>
> In my experience, systems that run on the edge of their resources and
> depend on guarantees to make them work have endless problems, whereas if
> they are not running on the edge of their resources, they work fine
> regardless of guarantees.

Agree. But what if things go wrong and a process eats up all your storage in error? If it's got its own dataset and you've used a reservation for your critical application on another dataset, you have a higher chance of surviving.

> There's plenty of real stuff worth discussing around this issue, and I
> apologize for choosing a belittling term to express disagreement. I hope
> it doesn't derail the discussion.

It certainly won't on my side. Thank you for the clarification.

Thanks, Nils
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
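[A hedged sketch of the kind of guarantee Nils describes: a reservation so a runaway writer in one dataset cannot eat the space a critical application depends on. Dataset names and sizes are made up.]

# give the critical application's dataset a hard space guarantee
zfs set reservation=10g tank/critical-app
# optionally cap the dataset a misbehaving process writes into
zfs set quota=50g tank/scratch
# review what is guaranteed and what is capped across the pool
zfs get reservation,refreservation,quota -r tank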
Re: [zfs-discuss] dedupe is in
Darren J Moffat wrote:
> Orvar Korvar wrote:
>> I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong?
>
> you don't even have to create a new dataset just do:
>
> # zfs set dedup=on

But like all ZFS functions, won't that only get applied when you (re)write old or new data, like compression=on? Which leads to the question: would a scrub activate dedupe?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
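[As far as I know, a scrub only reads and repairs blocks; it does not rewrite them, so it would not retroactively dedup existing data. To dedup what is already on disk, the data has to be rewritten through the normal write path. A hedged sketch, with made-up dataset names:]

# enable dedup where the rewritten copy will live
zfs set dedup=on tank
# one way to rewrite existing data so new blocks go through the dedup path:
# send it to a new dataset (and optionally swap names afterwards)
zfs snapshot tank/data@pre-dedup
zfs send tank/data@pre-dedup | zfs recv tank/data-deduped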
[zfs-discuss] ZFS non-zero checksum and permanent error with deleted file
Hello, I am actually using ZFS under FreeBSD, but maybe someone over here can help me anyway. I'd like some advice on whether I can still rely on one of my ZFS pools:

[u...@host ~]$ sudo zpool clear zpool01
...
[u...@host ~]$ sudo zpool scrub zpool01
...
[u...@host ~]$ sudo zpool status -v zpool01
  pool: zpool01
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool01     ONLINE       0     0     4
          raidz1    ONLINE       0     0     4
            ad12    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad18    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zpool01:<0x3736a>

How can there be an error in a file that does not seem to exist? How can I clear / recover from the error? I have read the corresponding documentation and did the obligatory research, but so far, the only option I can see is a full destroy/create cycle - which seems like overkill, considering the pool size and the fact that there seems to be only one (deleted?) file involved.

[u...@host ~]$ df -h /mnt/zpool01/
Filesystem    Size    Used    Avail  Capacity  Mounted on
zpool01       1.3T    1.2T     133G       90%  /mnt/zpool01

[u...@host ~]$ uname -a
FreeBSD host.domain 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Fri May 1 07:18:07 UTC 2009 r...@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

Cheers, ssc
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
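[For what it's worth, the zpool01:<0x3736a> form usually indicates the damaged object no longer has a resolvable path, e.g. the file was deleted. A hedged sketch of the usual sequence follows; note that the pasted status still says "scrub: none requested", which suggests it was captured before the scrub actually ran, and the permanent-error entry normally only drops out after a scrub (sometimes two) completes without hitting the bad block again.]

sudo zpool clear zpool01        # reset the error counters
sudo zpool scrub zpool01        # re-read and verify every block in the pool
sudo zpool status -v zpool01    # check again only after the scrub has finished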
Re: [zfs-discuss] ZFS dedup issue
Eric Schrock wrote: On Nov 3, 2009, at 12:24 PM, Cyril Plisko wrote: I think I'm observing the same (with changeset 10936) ... # mkfile 2g /var/tmp/tank.img # zpool create tank /var/tmp/tank.img # zfs set dedup=on tank # zfs create tank/foobar This has to do with the fact that dedup space accounting is charged to all filesystems, regardless of whether blocks are deduped. To do otherwise is impossible, as there is no true "owner" of a block, and the fact that it may or may not be deduped is often beyond the control of a single filesystem. This has some interesting pathologies as the pool gets full. Namely, that ZFS will artificially enforce a limit on the logical size of the pool based on non-deduped data. This is obviously something that should be addressed. Eric, Many people (me included) perceive deduplication as a mean to save disk space and allow more data to be squeezed into a storage. What you are saying is that effectively ZFS dedup does a wonderful job in detecting duplicate blocks and goes into all the trouble of removing an extra copies and keep accounting of everything. However, when it comes to letting me use the freed space I will be plainly denied to do so. If that so, what would be the reason to use ZFS deduplication at all ? Please read my response before you respond. What do you think "this is obviously something that should be addressed" means? There is already a CR filed and the ZFS team is working on it. We have a fix for this and it should be available in a couple of days. - George - Eric -- Regards, Cyril -- Eric Schrock, Fishworks http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss