Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Aubrey Li
On Mon, Mar 26, 2012 at 4:33 PM,   wrote:
>
>>I'm migrating a webserver(apache+php) from RHEL to solaris. During the
>>stress testing comparison, I found under the same session number of client
>>request, CPU% is ~70% on RHEL while CPU% is full on solaris.
>
> Which version of Solaris is this?

This is Solaris 11.

>
>>The apache root documentation is apparently in zfs dataset. It looks like each
>>file system operation will run into zfs root mutex contention.
>>
>>Is this an expected behavior? If so, is there any zfs tunable to
>>address this issue?
>
> The zfs_root() function is the function which is used when a
> mountpoint is traversed.  Where is the apache document root
> directory and under which mountpoints?

I have an external storage server and I created a zfs pool on it.

# zpool list
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
export1   800G   277G   523G  34%  1.00x  ONLINE  -
rpool     117G   109G  8.44G  92%  1.00x  ONLINE  -

This pool is mounted under /export; all of the apache documents
are in this pool at /export/webserver/apache2/htdocs.

The php temporary folder is set to /tmp, which is tmpfs.

Thanks,
-Aubrey

>
> Casper


Re: [zfs-discuss] volblocksize for VMware VMFS-5

2012-03-26 Thread Jim Klimov

2012-03-26 7:58, Yuri Vorobyev wrote:

Hello.

What are the best practices for choosing the ZFS volume volblocksize setting for
VMware VMFS-5?
The VMFS-5 block size is 1MB. Not sure how it corresponds with ZFS.

Setup details follow:
- 11 pairs of mirrors;
- 600Gb 15k SAS disks;
- SSDs for L2ARC and ZIL
- COMSTAR FC target;
- about 30 virtual machines, mostly Windows (so underlying file systems
is NTFS with 4k block)
- 3 ESXi hosts.

Also, I will be glad to hear volume layout suggestions.
I see several options:
- one big zvol with size equal size of the pool;
- one big zvol with size of the pool-20% (to avoid fragmentation);
- several zvols (size?).

Thanks for attention.


You will still see fragmentation, because that's the way
ZFS works: it never overwrites recently-live data. It will
try to combine new updates to the pool (a transaction
group, TXG) into a few large writes if contiguous blocks of
free space permit. And yes, reserving some space as unused
should help against fragmentation.

I asked on the list, but got no response, whether space
reserved as an unused zvol can be used as such an
anti-fragmentation reservation (to forbid other datasets
from writing into the otherwise-unallocated bytes).

Regarding the question on "many zvols": this can be useful.
For example, when using dedicated datasets (either zvols
or filesystem datasets) for each VM, you can easily clone
golden images into preconfigured VM guests on the ZFS
side with near-zero overhead (like Sun VDI does). You
can also easily expand a (cloned) zvol and then resize
the guest FS with its mechanisms, but shrinking is tough
(if at all possible) if you ever need it.
Also you could store VMs or their parts (i.e. their system
disks vs. data disks, or critical VMs vs. testing VMs) in
differently configured datasets (perhaps hosted on different
disks in the future, like 15k RPM vs. 7k RPM) - then it
would make sense to use different zvols (and pools).
Or perhaps, if you make a backup store for Veeam Backup
or any other solution and emulate a tape drive for bulk
storage, you might want a zvol with the maximum volblocksize,
while you might use something else for live VM images...

Also note that if you ever plan to use ZFS snapshots,
then in the case of zvols the system will reserve another
zvol's worth of space when making even the first snapshot
(e.g. with a 1GB swap zvol, if you make a snapshot even while
it's empty, the reservation becomes 2GB with 1GB available to
the user as the block device). This allows the zvol block
device contents to be completely rewritten while guaranteeing
you'll have enough space to keep both the snapshot and the
new data.

So if snapshots are planned, zvols shouldn't exceed
half of the pool size. There is no such problem with
filesystem snapshots - those only use what has been
allocated and is still referred to (not deleted) by some snapshot.

Also beware that if you use really small volblocksizes,
the pool metadata needed to address the volume's blocks
adds a considerable amount of overhead compared to your
userdata (I'd expect roughly 1md:1ud with the minimal
blocksize). I had such problems (and numbers) a year ago
and wrote about them on the Sun forums; parts of my woes
might have made it into this list's archive.
Again, this should not be a big problem with files,
because those use variable-length blocks and tend to
use large ones when there are enough pending writes,
so the metadata portion is smaller.

So... my counter-question to you and the list: are there
substantial benefits to using ZFS as an iSCSI/zvol/VMFS5
provider instead of publishing an NFS service and storing
VM images as files? Both resources can be shared to several
clients (ESX hosts). I think that for a number of reasons the
NFS/files variant is more flexible. What are its drawbacks?

I see that you plan to make a COMSTAR FC target, so that
networking nuance is one reason for iSCSI vs. "IP over FC
to make NFS"... but in general, over jumbo ethernet -
which tool suits the task better? :)

I heard that VMWare has some smallish limit on the number
of NFS connections, but 30 should be bearable...

HTH,
//Jim Klimov


Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Jim Klimov

2012-03-26 14:27, Aubrey Li wrote:

The php temporary folder is set to /tmp, which is tmpfs.



By the way, how much RAM does the box have available?
"tmpfs" in Solaris is backed by "virtual memory".
It is like a RAM disk (although maybe slower than the
ramdisk FS seen in the livecd) as long as there is enough
free RAM, but tmpfs can get swapped out to disk.

If your installation followed the default wizard, your
swap (or part of it) could be in rpool/swap and backed
by ZFS - leading to both many tmpfs accesses and many
ZFS accesses. (No idea about the mutex spinning part
though - if it is normal or not).

Also, I'm not sure whether tmpfs gets the benefit of caching,
and the ZFS ARC can consume lots of RAM and thus push
tmpfs out to swap.

As a random guess, try pointing PHP tmp directory to
/var/tmp (backed by zfs) and see if any behaviors change?

Good luck,
//Jim



Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Jim Mauro

You care about #2 and #3 because you are fixated on a ZFS root
lock contention problem, and not open to a broader discussion
about what your real problem actually is. I am not saying there is
not lock contention, and I am not saying there is - I'll look at the
data carefully later when I have more time.

Your problem statement, which took 20 emails to glean, is that the
Solaris system consumes more CPU than Linux on the same
hardware, doing roughly the same amount of work, and delivering
roughly the same level of performance - is that correct?

Please consider that, in Linux, you have no observability into
kernel lock statistics (at least, none that I know of) - Linux uses kernel
locks also, and for this workload, it seems likely to me that, could
you observe those statistics, you would see numbers that would 
lead you to conclude you have lock contention in Linux.

Let's talk about THE PROBLEM - Linux is 15% sys, 55% usr,
Solaris is 30% sys, 70% usr, running the same workload,
doing the same amount of work, delivering the same level
of performance. Please validate that problem statement.


On Mar 25, 2012, at 9:51 PM, Aubrey Li wrote:

> On Mon, Mar 26, 2012 at 4:18 AM, Jim Mauro  wrote:
>> If you're chasing CPU utilization, specifically %sys (time in the kernel),
>> I would start with a time-based kernel profile.
>> 
>> #dtrace -n 'profile-997hz /arg0/ { @[stack()] = count(); } tick-60sec { 
>> trunc(@, 20); printa(@); }'
>> 
>> I would be curious to see where the CPU cycles are being consumed first,
>> before going down the lock path…
>> 
>> This assume that most or all of CPU utilization is %sys. If it's %usr, we 
>> take
>> a different approach.
>> 
> 
> Here is the output, I changed to "tick-5sec" and "trunc(@, 5)".
> No.2 and No.3 is what I care about.
> 
> Thanks,
> -Aubrey
> 
> 21  80536   :tick-5sec
> == 1 =
>  genunix`avl_walk+0x6a
>  genunix`as_gap_aligned+0x2b7
>  unix`map_addr_proc+0x179
>  unix`map_addr+0x8e
>  genunix`choose_addr+0x9e
>  zfs`zfs_map+0x161
>  genunix`fop_map+0xc5
>  genunix`smmap_common+0x268
>  genunix`smmaplf32+0xa2
>  genunix`syscall_ap+0x92
>  unix`_sys_sysenter_post_swapgs+0x149
> 1427
> 
> = 2 =
>  unix`mutex_delay_default+0x7
>  unix`mutex_vector_enter+0x2ae
>  zfs`zfs_zget+0x46
>  zfs`zfs_root+0x55
>  genunix`fsop_root+0x2d
>  genunix`traverse+0x65
>  genunix`lookuppnvp+0x446
>  genunix`lookuppnatcred+0x119
>  genunix`lookupnameatcred+0x97
>  genunix`lookupnameat+0x6b
>  genunix`vn_openat+0x147
>  genunix`copen+0x493
>  genunix`openat64+0x2d
>  unix`_sys_sysenter_post_swapgs+0x149
> 2645
> 
> == 3 =
>  unix`mutex_delay_default+0x7
>  unix`mutex_vector_enter+0x2ae
>  zfs`zfs_zget+0x46
>  zfs`zfs_root+0x55
>  genunix`fsop_root+0x2d
>  genunix`traverse+0x65
>  genunix`lookuppnvp+0x446
>  genunix`lookuppnatcred+0x119
>  genunix`lookupnameatcred+0x97
>  genunix`lookupnameat+0x6b
>  genunix`cstatat_getvp+0x11e
>  genunix`cstatat64_32+0x5d
>  genunix`fstatat64_32+0x4c
>  unix`_sys_sysenter_post_swapgs+0x149
> 3201
> 
>  4 ===
>  unix`i86_mwait+0xd
>  unix`cpu_idle_mwait+0x154
>  unix`idle+0x116
>  unix`thread_start+0x8
> 3559
> 
> = 5 ==
>  tmpfs`tmp_readdir+0x138
>  genunix`fop_readdir+0xe8
>  genunix`getdents64+0xd5
>  unix`_sys_sysenter_post_swapgs+0x149
> 4589
> 
> =  6 
>  unix`strlen+0x3
>  genunix`fop_readdir+0xe8
>  genunix`getdents64+0xd5
>  unix`_sys_sysenter_post_swapgs+0x149
> 5005
> 
> === 7 =
>  tmpfs`tmp_readdir+0xc7
>  genunix`fop_readdir+0xe8
>  genunix`getdents64+0xd5
>  unix`_sys_sysenter_post_swapgs+0x149
> 9548
> 
> 
> = 8 ===
>  unix`strlen+0x8
>  genunix`fop_readdir+0xe8
>  genunix`getdents64+0xd5
>  unix`_sys_sysenter_post_swapgs+0x149
>11166
> 
> 
> = 9 ===
>  unix`strlen+0xe
>  genunix`fop_readdir+0xe8
>  genunix`getdents64+0xd5
>  unix`_sys_sysenter_post_swapgs+0x149
>14491
> 
> =  10 ==
>  tmpfs`tmp_readdir+0xbe
>

Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Aubrey Li
On Mon, Mar 26, 2012 at 7:28 PM, Jim Klimov  wrote:
> 2012-03-26 14:27, Aubrey Li wrote:
>>
>> The php temporary folder is set to /tmp, which is tmpfs.
>>
>
> By the way, how much RAM does the box have available?
> "tmpfs" in Solaris is backed by "virtual memory".
> It is like a RAM disk, although maybe slower than ramdisk
> FS as seen in livecd, as long as there is enough free
> RAM, but then tmpfs can get swapped out to disk.
>
> If your installation followed the default wizard, your
> swap (or part of it) could be in rpool/swap and backed
> by ZFS - leading to both many tmpfs accesses and many
> ZFS accesses. (No idea about the mutex spinning part
> though - if it is normal or not).
>
> Also I'm not sure if tmpfs gets the benefits of caching,
> and ZFS ARC cache can consume lots of RAM and thus push
> tmpfs out to swap.
>
> As a random guess, try pointing PHP tmp directory to
> /var/tmp (backed by zfs) and see if any behaviors change?
>
> Good luck,
> //Jim
>

Thanks for your suggestions. Actually the default PHP tmp directory
was /var/tmp, and I changed "/var/tmp" to "/tmp". This reduced the zfs
root lock contention significantly. However, I still see a bunch of lock
contention, so I'm here to ask for help.

Thanks,
-Aubrey


Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Aubrey Li
On Mon, Mar 26, 2012 at 8:24 PM, Jim Mauro  wrote:
>
> You care about #2 and #3 because you are fixated on a ZFS root
> lock contention problem, and not open to a broader discussion
> about what your real problem actually is. I am not saying there is
> not lock contention, and I am not saying there is - I'll look at the
> data carefully later when I have more time.
>
> Your problem statement, which took 20 emails to glean, is that the
> Solaris system consumes more CPU than Linux on the same
> hardware, doing roughly the same amount of work, and delivering
> roughly the same level of performance - is that correct?
>
> Please consider that, in Linux, you have no observability into
> kernel lock statistics (at least, none that I know of) - Linux uses kernel
> locks also, and for this workload, it seems likely to me that could
> you observe those statistics, you would see numbers that would
> lead you to conclude you have lock contention in Linux.
>
> Let's talk about THE PROBLEM - Linux is 15% sys, 55% usr,
> Solaris is 30% sys, 70% usr, running the same workload,
> doing the same amount of work, delivering the same level
> of performance. Please validate that problem statement.
>

You're definitely right.

I'm running the same workload, doing the same amount of work,
delivering the same level of performance on the same hardware
platform; even the root disk type is exactly the same.

So the userland software stack is exactly the same.
The difference is:
Linux is 15% sys, 55% usr.
Solaris is 30% sys, 70% usr.

Basically I agree with Fajar: this is probably not a fair comparison.
A robust system does not only deliver the highest performance; we should
also consider reliability, availability, serviceability, energy
efficiency, and other server-related features. No doubt ZFS is the
most excellent file system on the planet.

As Richard pointed out, if we look at mpstat output
==
SET minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl sze
  0 35140   0 2380 59742 19476 93056 30906 32919 256336 1104 967806  65  35   0   0  32

Including
smtx 256336: spins on mutex

I need to look at

icsw 30906:  involuntary context switches
migr 32919:  thread migration
syscl 967806: system calls

And given that Solaris consumes 70% CPU in userland,
I probably need to break down how much time is spent in apache, libphp, libc,
etc. (BTW, what's your approach for the usr% you mentioned above?)

I am not open to a broader discussion because this is the zfs mailing list.
ZFS root lock contention is what I have observed so far that I can post on this forum.
I can take care of other aspects and may ask for help somewhere else.

I admit I didn't dig into Linux, and I agree there could be lock contention there as well.
But since Solaris consumes more CPU% and hence more system power
than Linux, I think I have to look at Solaris first, to see if there is any
tuning work that needs to be done.

Do you agree this is the right way to go ahead?

Thanks,
-Aubrey


Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Richard Elling
I see nothing unusual in the lockstat data. I think you're barking up
the wrong tree.
 -- richard

On Mar 25, 2012, at 10:51 PM, Aubrey Li wrote:

> On Mon, Mar 26, 2012 at 1:19 PM, Richard Elling
>  wrote:
>> Apologies to the ZFSers, this thread really belongs elsewhere.
>> 
>> Let me explain below:
>> 
>> Root documentation path of apache is in zfs, you see
>> it at No.3 at the above dtrace report.
>> 
>> 
>> The sort is in reverse order. The large number you see below the
>> stack trace is the number of times that stack was seen. By far the
>> most frequently seen is tmpfs`tmp_readdir
> 
> It's true, but I didn't see any potential issues there.
> 
>> 
>> 
>> tmpfs(/tmp) is the place where PHP place the temporary
>> folders and files.
>> 
>> 
>> bingo
>> 
> 
> I have to paste the lock investigation here again,
> 
> Firstly, you can see which lock is spinning:
> ===
> # lockstat -D 10 sleep 2
> 
> Adaptive mutex spin: 1862701 events in 3.678 seconds (506499 events/sec)
> 
> Count indv cuml rcnt nsec Lock   Caller
> ---
> 829064  45%  45% 0.0033280 0xff117624a5d0 rrw_enter_read+0x1b
> 705001  38%  82% 0.0030983 0xff117624a5d0 rrw_exit+0x1d
> 140678   8%  90% 0.0010546 0xff117624a6e0 zfs_zget+0x46
> 37208   2%  92% 0.00 5403 0xff114b136840 vn_rele+0x1e
> 33926   2%  94% 0.00 5417 0xff114b136840 lookuppnatcred+0xc5
> 27188   1%  95% 0.00 1155 vn_vfslocks_buckets+0xd980
> vn_vfslocks_getlock+0x3b
> 11073   1%  96% 0.00 1639 vn_vfslocks_buckets+0x4600
> vn_vfslocks_getlock+0x3b
> 9321   1%  96% 0.00 1961 0xff114b82a680 dnlc_lookup+0x83
> 6929   0%  97% 0.00 1590 0xff11573b8f28
> zfs_fastaccesschk_execute+0x6a
> 5746   0%  97% 0.00 5935 0xff114b136840 lookuppnvp+0x566
> ---
> 
> Then if you look at the caller of lock(0xff117624a5d0), you'll see
> it's ZFS, not tmpfs.
> 
> Count indv cuml rcnt nsec Lock   Caller
> 48494   6%  17% 0.00   145263 0xff117624a5d0 rrw_enter_read+0x1b
> 
>  nsec -- Time Distribution -- count Stack
>   256 |   17rrw_enter+0x2c
>   512 |   1120  zfs_root+0x3b
>  1024 |@  1718  fsop_root+0x2d
>  2048 |@@ 4834  traverse+0x65
>  4096 |@@@18569 lookuppnvp+0x446
>  8192 |   6620  lookuppnatcred+0x119
> 16384 |@  2929  lookupnameatcred+0x97
> 32768 |@  1635  lookupnameat+0x6b
> 65536 |   894   cstatat_getvp+0x11e
>131072 |   1249  cstatat64_32+0x5d
>262144 |@  1620  fstatat64_32+0x4c
>524288 |@  2474
> _sys_sysenter_post_swapgs+0x149
> 
> 
> That's why I post this subject here. Hope it's clear this time.
> 
> Thanks,
> -Aubrey

--
DTrace Conference, April 3, 2012, 
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422





Re: [zfs-discuss] Good tower server for around 1,250 USD?

2012-03-26 Thread John D Groenveld
In message , Bob 
Friesenhahn writes:
>Almost all of the systems listed on the HCL are defunct and no longer 
>purchasable except for on the used market.  Obtaining an "approved" 
>system seems very difficult. In spite of this, Solaris runs very well 
>on many non-approved modern systems.

<http://www.oracle.com/webfolder/technetwork/hcl/data/s11ga/systems/views/nonoracle_systems_all_results.mfg.page1.html>

>I don't know what that means as far as the ability to purchase Solaris 
>"support".

I believe it must pass the HCTS before Oracle will support
Solaris running on third-party hardware.
<http://www.oracle.com/webfolder/technetwork/hcl/hcts/index.html>

John
groenv...@acm.org



Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Jim Klimov
> > As a random guess, try pointing PHP tmp directory to
> > /var/tmp (backed by zfs) and see if any behaviors change?
> >
> > Good luck,
> > //Jim
> >
> 
> Thanks for your suggestions. Actually the default PHP tmp directory
> was /var/tmp, and I changed "/var/tmp" to "/tmp". This reduced zfs
> root lock contention significantly. However, I still see a bunch 
> of lock
> contention. So I'm here ask for help.

Well, as a further attempt down this road, is it possible for you to rule out
ZFS from swapping - i.e., if RAM amounts permit, disable swap altogether
(swap -d /dev/zvol/dsk/rpool/swap) or relocate it to dedicated slices on the
same or, better yet, separate disks?
 
If you do have lots of swapping activity (that can be seen in "vmstat 1" as 
si/so columns) going on in a zvol, you're likely to get much fragmentation
in the pool, and searching for contiguous stretches of space can become
tricky (and time-consuming), or larger writes can get broken down into
many smaller random writes and/or "gang blocks", which is also slower.
At least such waiting on disks can explain the overall large kernel times.
 
You can also see the disk wait times ratio in "iostat -xzn 1" column "%w"
and disk busy times ratio in "%b" (second and third from the right).
I don't remember you posting that.
 
If these are in the tens, or even close or equal to 100%, then
your disks are the actual bottleneck. Speeding up that subsystem,
including the addition of cache (ARC RAM, L2ARC SSD, maybe a ZIL
SSD/DDRDrive) and combating fragmentation by moving swap and
other scratch spaces to dedicated pools or raw slices, might help.

HTH,
//Jim
 


Re: [zfs-discuss] volblocksize for VMware VMFS-5

2012-03-26 Thread Richard Elling
On Mar 25, 2012, at 8:58 PM, Yuri Vorobyev wrote:

> Hello.
> 
> What are the best practices for choosing the ZFS volume volblocksize setting for
> VMware VMFS-5?
> The VMFS-5 block size is 1MB. Not sure how it corresponds with ZFS.

Zero correlation. 

What I see on the wire from VMFS is 16KB random reads followed by 4KB random writes.
VMFS reads the same set of 16KB data again and again, so it tends to get nicely cached
in the MFU portion of the ARC.

> 
> Setup details follow:
> - 11 pairs of mirrors;
> - 600Gb 15k SAS disks;
> - SSDs for L2ARC and ZIL
> - COMSTAR FC target;
> - about 30 virtual machines, mostly Windows (so underlying file systems is 
> NTFS with 4k block)
> - 3 ESXi hosts.
> 
> Also, I will be glad to hear volume layout suggestions.
> I see several options:
> - one big zvol with size equal size of the pool;
> - one big zvol with size of the pool-20% (to avoid fragmentation);

A zvol with a reservation is one way of implementing the reservations often seen in
other file systems.

> - several zvols (size?).

In general, for COMSTAR, more LUs is better.
 -- richard

--
DTrace Conference, April 3, 2012, 
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422








[zfs-discuss] test for holes in a file?

2012-03-26 Thread ольга крыжановская
How can I test if a file on ZFS has holes, i.e. is a sparse file,
using the C api?

Olga
-- 
      ,   _                                    _   ,
     { \/`o;-    Olga Kryzhanovska   -;o`\/ }
.'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Mike Gerdts
2012/3/26 ольга крыжановская :
> How can I test if a file on ZFS has holes, i.e. is a sparse file,
> using the C api?

See SEEK_HOLE in lseek(2).

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread ольга крыжановская
Mike, I was hoping that someone has a complete example of a bool
has_file_one_or_more_holes(const char *path) function.

Olga

2012/3/26 Mike Gerdts :
> 2012/3/26 ольга крыжановская :
>> How can I test if a file on ZFS has holes, i.e. is a sparse file,
>> using the C api?
>
> See SEEK_HOLE in lseek(2).
>
> --
> Mike Gerdts
> http://mgerdts.blogspot.com/



-- 
      ,   _                                    _   ,
     { \/`o;-    Olga Kryzhanovska   -;o`\/ }
.'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
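
A minimal sketch, not part of the original thread: one way to write the bool helper
Olga asks for, built on the SEEK_HOLE support in lseek(2) that Mike pointed at. It
assumes a platform where SEEK_HOLE is defined (e.g. Solaris 10 and later); the function
name simply mirrors Olga's request, and error handling is deliberately thin.

#include <sys/types.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

bool
has_file_one_or_more_holes(const char *path)
{
    int fd;
    off_t eof, hole;
    bool sparse = false;

    if ((fd = open(path, O_RDONLY)) < 0)
        return false;               /* could not open; caller may check errno */

    /* Find end-of-file, then ask for the first hole at or after offset 0. */
    if ((eof = lseek(fd, 0, SEEK_END)) > 0) {
        hole = lseek(fd, 0, SEEK_HOLE);
        /*
         * Every file has an implicit hole at EOF, so only a hole that
         * starts strictly before EOF means the file is actually sparse.
         */
        if (hole >= 0 && hole < eof)
            sparse = true;
    }
    (void) close(fd);
    return sparse;
}

Andrew Gabriel's standalone program further down in the thread performs the same
check for each command-line argument, with per-file error reporting.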


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Joerg Schilling
ольга крыжановская wrote:

> How can I test if a file on ZFS has holes, i.e. is a sparse file,
> using the C api?

See star .

ftp://ftp.berlios.de/pub/star/

or

http://hg.berlios.de/repos/schillix-on/file/e3829115a7a4/usr/src/cmd/star/hole.c

The interface was defined for star in September 2004; star added support in May
2005, after the interface was implemented.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Andrew Gabriel
I just played and knocked this up (note the stunning lack of comments, 
missing optarg processing, etc)...

Give it a list of files to check...

#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        int fd;

        fd = open(argv[i], O_RDONLY);
        if (fd < 0) {
            perror(argv[i]);
        } else {
            off_t eof;
            off_t hole;

            /* Find EOF, then rewind to the start of the file. */
            if (((eof = lseek(fd, 0, SEEK_END)) < 0) ||
                lseek(fd, 0, SEEK_SET) < 0) {
                perror(argv[i]);
            } else if (eof == 0) {
                printf("%s: empty\n", argv[i]);
            } else {
                /* A hole before EOF means the file is sparse. */
                hole = lseek(fd, 0, SEEK_HOLE);
                if (hole < 0) {
                    perror(argv[i]);
                } else if (hole < eof) {
                    printf("%s: sparse\n", argv[i]);
                } else {
                    printf("%s: not sparse\n", argv[i]);
                }
            }
            close(fd);
        }
    }
    return 0;
}


On 03/26/12 10:06 PM, ольга крыжановская wrote:

Mike, I was hoping that some one has a complete example for a bool
has_file_one_or_more_holes(const char *path) function.

Olga

2012/3/26 Mike Gerdts:

2012/3/26 ольга крыжановская:

How can I test if a file on ZFS has holes, i.e. is a sparse file,
using the C api?

See SEEK_HOLE in lseek(2).

--
Mike Gerdts
http://mgerdts.blogspot.com/







Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Bob Friesenhahn

On Mon, 26 Mar 2012, Andrew Gabriel wrote:

I just played and knocked this up (note the stunning lack of comments, 
missing optarg processing, etc)...

Give it a list of files to check...


This is a cool program, but programmers were asking (and answering) 
this same question 20+ years ago before there was anything like 
SEEK_HOLE.


If file space usage is less than file directory size then it must 
contain a hole.  Even for compressed files, I am pretty sure that 
Solaris reports the uncompressed space usage.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Mike Gerdts
On Mon, Mar 26, 2012 at 6:18 PM, Bob Friesenhahn
 wrote:
> On Mon, 26 Mar 2012, Andrew Gabriel wrote:
>
>> I just played and knocked this up (note the stunning lack of comments,
>> missing optarg processing, etc)...
>> Give it a list of files to check...
>
>
> This is a cool program, but programmers were asking (and answering) this
> same question 20+ years ago before there was anything like SEEK_HOLE.
>
> If file space usage is less than file directory size then it must contain a
> hole.  Even for compressed files, I am pretty sure that Solaris reports the
> uncompressed space usage.

That's not the case.

# zfs create -o compression=on rpool/junk
# perl -e 'print "foo" x 10' > /rpool/junk/foo
# ls -ld /rpool/junk/foo
-rw-r--r--   1 root root  30 Mar 26 18:25 /rpool/junk/foo
# du -h /rpool/junk/foo
  16K   /rpool/junk/foo
# truss -t stat -v stat du  /rpool/junk/foo
...
lstat64("foo", 0x08047C40)  = 0
d=0x02B90028 i=8 m=0100644 l=1  u=0 g=0 sz=30
at = Mar 26 18:25:25 CDT 2012  [ 1332804325.742827733 ]
mt = Mar 26 18:25:25 CDT 2012  [ 1332804325.889143166 ]
ct = Mar 26 18:25:25 CDT 2012  [ 1332804325.889143166 ]
bsz=131072 blks=32    fs=zfs

Notice that it says it has 32 512-byte blocks.

The mechanism you suggest does work for every other file system that
I've tried it on.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] test for holes in a file?

2012-03-26 Thread Richard Elling
On Mar 26, 2012, at 4:18 PM, Bob Friesenhahn wrote:

> On Mon, 26 Mar 2012, Andrew Gabriel wrote:
> 
>> I just played and knocked this up (note the stunning lack of comments, 
>> missing optarg processing, etc)...
>> Give it a list of files to check...
> 
> This is a cool program, but programmers were asking (and answering) this same 
> question 20+ years ago before there was anything like SEEK_HOLE.
> 
> If file space usage is less than file directory size then it must contain a 
> hole.  Even for compressed files, I am pretty sure that Solaris reports the 
> uncompressed space usage.

+1

Also, prior to ZFS, you could look at the length of the file (ls -l or stat struct
st_size) and compare to the size (ls -ls or stat struct st_blocks). If length > size
(unit adjusted, rounded up) then there are holes.

In ZFS, this can be more difficult, because the size can be larger than the length (!)
due to copies. Also, if you have compression enabled, size can be < length.
 -- richard
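
A minimal sketch, again not from the thread itself, of the pre-SEEK_HOLE heuristic Bob
and Richard describe: compare the file length (st_size) with the space actually
allocated (st_blocks, counted in 512-byte units). The helper name is made up for
illustration, and per the caveats above the result is only a hint on ZFS, where
compression and copies can skew st_blocks in either direction.

#include <sys/types.h>
#include <sys/stat.h>
#include <stdbool.h>

bool
probably_has_holes(const char *path)
{
    struct stat st;
    off_t allocated;

    if (stat(path, &st) != 0)
        return false;

    /* st_blocks counts 512-byte units; convert to bytes actually allocated. */
    allocated = (off_t)st.st_blocks * 512;

    /* Less space allocated than the length implies at least one hole. */
    return allocated < st.st_size;
}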

--
DTrace Conference, April 3, 2012, 
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422








Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-26 Thread Aubrey Li
On Tue, Mar 27, 2012 at 1:15 AM, Jim Klimov  wrote:
> Well, as a further attempt down this road, is it possible for you to rule
> out
> ZFS from swapping - i.e. if RAM amounts permit, disable the swap at all
> (swap -d /dev/zvol/dsk/rpool/swap) or relocate it to dedicated slices of
> same or better yet separate disks?
>

Thanks Jim for your suggestion!


> If you do have lots of swapping activity (that can be seen in "vmstat 1" as
> si/so columns) going on in a zvol, you're likely to get much fragmentation
> in the pool, and searching for contiguous stretches of space can become
> tricky (and time-consuming), or larger writes can get broken down into
> many smaller random writes and/or "gang blocks", which is also slower.
> At least such waiting on disks can explain the overall large kernel times.

I took swapping activity into account: even when the CPU% is 100%, "si"
(swap-ins) and "so" (swap-outs) are always zero.

>
> You can also see the disk wait times ratio in "iostat -xzn 1" column "%w"
> and disk busy times ratio in "%b" (second and third from the right).
> I dont't remember you posting that.
>
> If these are accounting in tens, or even close or equal to 100%, then
> your disks are the actual bottleneck. Speeding up that subsystem,
> including addition of cache (ARC RAM, L2ARC SSD, maybe ZIL
> SSD/DDRDrive) and combatting fragmentation by moving swap and
> other scratch spaces to dedicated pools or raw slices might help.

My storage system is not very busy, and there are only read operations.
=
# iostat -xnz 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  112.4    0.0 1691.5    0.0  0.0  0.5    0.0    4.8   0  41 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  118.7    0.0 1867.0    0.0  0.0  0.5    0.0    4.5   0  42 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  127.7    0.0 2121.6    0.0  0.0  0.6    0.0    4.7   0  44 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  141.3    0.0 2158.5    0.0  0.0  0.7    0.0    4.6   0  48 c11t0d0
==

Thanks,
-Aubrey