Richard Elling wrote:
On Aug 11, 2009, at 7:39 AM, Ed Spencer wrote:
On Tue, 2009-08-11 at 07:58, Alex Lam S.L. wrote:
At first glance, your production server's numbers look fairly
similar to the "small file workload" results of your development
server.
I thought you were saying that the development server has faster
performance?
The development server was running only one cp -pr command.
The production mail server was running two concurrent backup jobs and, of
course, the mail system, with each job having the same throughput as if it
were the only job running. The single-threaded backup jobs do not conflict
with each other over performance.
Agree.
If we ran 20 concurrent backup jobs, overall performance would scale up
quite a bit (I would guess between 5 and 10 times the throughput). I just
read Mike's post and will do some concurrency testing.
Yes.
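(As a rough, purely hypothetical sketch of what that concurrency testing
might look like: one backup job per filesystem, launched in parallel from
the shell and waited on. The paths and the tar command below are
placeholders for the real backup jobs.)

#!/bin/sh
# Hypothetical sketch: one backup job per mail filesystem, run in parallel.
for fs in /mail1 /mail2 /mail3 /mail4 /mail5
do
    tar cf /backup/`basename $fs`.tar $fs &   # placeholder for the real backup command
done
wait    # block until all background jobs have finished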
Users are currently evenly distributed over 5 filesystems (I previously
mentioned 7, but it's really 5 filesystems for users and 1 for system
data, totalling 6, plus one test filesystem).
We back up 2 filesystems on Tuesday, 2 on Thursday, and 2 on Saturday. We
back up to disk and then clone to tape. Our backup people can only handle
doing 2 filesystems per night.
Creating more filesystems to increase the parallelism of our backups is
one solution, but it's a major redesign of the mail system.
Really? I presume this is because of the way you originally
allocated accounts to file systems. Creating file systems in ZFS is
easy, so could you explain in a new thread?
Ed, this would be a good idea.
This issue has been discussed many times on the iMS mailing list for the
Sun Messaging server, which, in the way it stores messages on disk, is
very similar to Cyrus (in fact, I think it was once based on the same
code base).
The upshot is what Mike explained: these types of store create millions
of little files that NetBackup or Legato must walk over and back up one
after another, sequentially. This does not scale well at all, for the
reasons Mike explained.
The issue commonly discussed on the iMS list has been one of file system
size. In general, the rule of thumb most people had was around 100 to
250 GB per file system, and lots of them, mostly to increase the
parallelism of the backup process rather than for performance gains in
the actual functioning of the application.
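(A minimal sketch along those lines, assuming a hypothetical pool called
mailpool and made-up disk names: carving out several quota-capped mail
filesystems is only a couple of commands per store.)

[r...@xxx]#> zpool create mailpool mirror c1t0d0 c1t1d0
[r...@xxx]#> zfs create -o quota=250g mailpool/mail1
[r...@xxx]#> zfs create -o quota=250g mailpool/mail2
[r...@xxx]#> zfs create -o quota=250g mailpool/mail3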
As a rule of thumb, I group my large users, who have large mailboxes
(which in turn hold lots of large attachments), into particular larger
file systems. Students, who have small quotas and generally lots of small
messages (small files, in this case), go into other, smaller file
systems. Here, one size really does not suit all. To keep backups within
the time allocation, a bit of filesystem monitoring is useful. In the
days of UFS I used to use a command like this to help make decisions:
[r...@xxx]#> df -F ufs -o i
Filesystem             iused     ifree  %iused  Mounted on
/dev/md/dsk/d0        605765   6674235      8%  /
/dev/md/dsk/d50      2387509  28198091      8%  /mail1
/dev/md/dsk/d70      2090768  30669232      6%  /mail3
/dev/md/dsk/d60      2447548  30312452      7%  /mail2
I used this to balance the inodes. My guess is that around 85-90% of the
inodes in a messaging server store are files, with the remainder being
directories. Either way, it is a simple way to make sure the stores are
reasonably balanced. I am sure there will be a good way to do this for
ZFS?
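(One hedged suggestion, not something I have verified against this
workload: on Solaris, df -o i also reports against ZFS mountpoints,
although the ifree figure there is synthetic because ZFS allocates inodes
dynamically from free space; the iused column should still give a rough
file-and-directory count per filesystem, which is what the balancing
needs. zdb -d <pool> also prints an object count per dataset. Pool name
below is hypothetical.)

[r...@xxx]#> df -F zfs -o i
[r...@xxx]#> zdb -d mailpool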
Adding a second server to halve the pool, and thereby halve the problem,
is another solution (and we would also create more filesystems at the
same time).
It can be a good idea, but it really depends on how many file systems you
split your message stores into. It is also good for relocating message
stores to if the first server fails. This of course depends on your
message store architecture: easy to do with Sun Messaging, not so sure
about Cyrus. But I did once run a Simeon message server for a university
in London that was based on Cyrus, and it was pretty similar, from
recollection.
I'm not convinced this is a good idea. It is a lot of work based on
the assumption that the server is the bottleneck.
Moving the pool to an FC SAN or a JBOD may also increase performance.
(Fewer layers, by removing those introduced by the appliance, thereby
increasing performance.)
Disagree.
I suspect that if we rsync'd one of these filesystems to a second
server/pool, we would also see a performance increase equal to what we
see on the development server. (I don't know how zfs send and receive
work, so I don't know whether they would address this "filesystem
entropy" or specifically reorganize the files and directories.) However,
when we created a testfs filesystem in the ZFS pool on the production
server and copied data to it, we saw the same performance as the other
filesystems in the same pool.
Directory walkers, like NetBackup or rsync, will not scale well as
the number of files increases. It doesn't matter what file system you
use; the scalability will look more or less similar. For millions of
files, ZFS send/receive works much better. More details are in my paper.
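(For anyone wanting to experiment, a minimal sketch of snapshot-based
replication with zfs send/receive; the pool, dataset, snapshot, and host
names here are made up.)

[r...@xxx]#> zfs snapshot mailpool/mail1@mon
[r...@xxx]#> zfs send mailpool/mail1@mon | ssh backuphost zfs receive backuppool/mail1
Later, send only the blocks changed since the previous snapshot:
[r...@xxx]#> zfs snapshot mailpool/mail1@tue
[r...@xxx]#> zfs send -i mon mailpool/mail1@tue | ssh backuphost zfs receive backuppool/mail1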
I look forward to reading this, Richard. I think it will be an
interesting read for members of this list.
We will have to do something to address the problem. A combination of
what I just listed is our probable course of action. (Much testing will
have to be done to ensure our solution addresses the problem, because we
are not 100% sure what the cause of the performance degradation is.) I'm
also dealing with Network Appliance to see if there is anything we can do
at the filer end to increase performance, but I'm holding out little
hope.
DNLC hit rate?
Also, is atime on?
Turning atime off may make a big difference for you. It certainly does
for the Sun Messaging server.
Maybe worth doing and reposting the results?
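(A minimal sketch, using a hypothetical dataset name: atime can be
switched off per filesystem, and the DNLC hit rate Richard asks about
shows up in the "name lookups" line of vmstat -s on Solaris.)

[r...@xxx]#> zfs set atime=off mailpool/mail1
[r...@xxx]#> vmstat -s | grep 'name lookups'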
But please, don't miss the point I'm trying to make. ZFS would benefit
from a utility or a background process that would reorganize files and
directories in the pool to optimize performance: a utility to deal with
filesystem entropy. Currently a ZFS pool will live as long as the
lifetime of the disks it is on, without reorganization. This can be a
long, long time. Not to mention that slowly expanding the pool over time
contributes to the issue.
This does not come "for free" in either performance or risk. It will
do nothing to solve the directory walker's problem.
Agree. It will have little bearing on the outcome for the reason you
mention.
NB, people who use UFS don't tend to see this because UFS can't
handle millions of files.
It can, but only if your file systems are around 1 TB or smaller, which
is not large by ZFS standards. They do work, but with the same
performance issues for directory-walker backups. And heaven help you
fsck'ing them after a system crash: hours and hours.
-- richard
--
_______________________________________________________________________
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006, Manukau City, Auckland, New Zealand
Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611
mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz
________________________________________________________________________
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
________________________________________________________________________
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss