On 09.09.2020 at 17:52, John Stoffel wrote:
Miloslav> There is one PCIe RAID controller in the chassis, an AVAGO
Miloslav> MegaRAID SAS 9361-8i, with 16x SAS 15k drives connected to
Miloslav> it. Because the controller does not support pass-through for
Miloslav> the drives, we use 16x RAID-0 on the controller. So we get
Miloslav> /dev/sda ... /dev/sdp (roughly) in the OS. And over that we
Miloslav> have a single btrfs RAID-10, composed of 16 devices, mounted
Miloslav> as /data.
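
(For reference, a filesystem laid out like that would have been
created with something along these lines; device names illustrative:)

    # btrfs RAID-10 for both data and metadata across all 16 drives
    mkfs.btrfs -m raid10 -d raid10 /dev/sd[a-p]
    mount /dev/sda /data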

I will bet that this is one of your bottlenecks as well.  Get a second
or third controller and split your disks across them evenly.

That's the plan for a next step.

Miloslav> We run 'rsync' to remote NAS daily. It takes about 6.5 hours
Miloslav> to finish, 12'265'387 files last night.
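
(For illustration, such a nightly job is presumably something along
these lines; the exact rsync options and paths are assumptions:)

    # mirror the mail store to the NAS, deleting files removed locally
    rsync -aH --delete /data/ nas:/backup/mail/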

That's.... sucky.  So basically you're hitting the drives hard with
random IOPs and you're probably running out of performance.  How much
space are you using on the filesystem?

Miloslav> It's not as sucky as it seems. rsync runs during the
Miloslav> night. And even when reading is high, the server load stays
Miloslav> low. We have problems with writes.

Ok.  So putting in an SSD pair to cache things should help.
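
One way to do that caching on Linux is bcache; a rough sketch with
hypothetical device names (note the backing drives have to be
re-initialized as bcache devices, so it's not a drop-in change):

    # set up an SSD as the cache device and one drive as a backing device
    make-bcache -C /dev/nvme0n1
    make-bcache -B /dev/sda
    # attach the backing device to the cache set (UUID from bcache-super-show)
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach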

And why not use btrfs send to ship off snapshots instead of using
rsync?  I'm sure that would be an improvement...
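
Roughly like this (snapshot names and the receiving host are made up,
and the receiving side would also need to be btrfs):

    # full initial send
    btrfs subvolume snapshot -r /data /data/snap-1
    btrfs send /data/snap-1 | ssh backuphost btrfs receive /backup
    # later runs send only the delta against the previous snapshot
    btrfs subvolume snapshot -r /data /data/snap-2
    btrfs send -p /data/snap-1 /data/snap-2 | ssh backuphost btrfs receive /backup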

Miloslav> We run backups to an external NAS (NetApp) for a disaster
Miloslav> recovery scenario.  Moreover, the NAS is spread across
Miloslav> multiple locations. Then we create NAS snapshots, going ten
Miloslav> days back. All snapshots are easily available via NFS mount.
Miloslav> And NAS capacity is cheaper.

So why not run the backend storage on the Netapp, and just keep the
indexes and such local to the system?  I've run Netapps for many years
and they work really well.  And then you'd get automatic backups using
schedule snapshots.

Keep the index files local on disk/SSDs and put the maildirs out to
NFSv3 volume(s) on the Netapp(s).  Should do wonders.  And you'll stop
needing to do rsync at night.
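
Assuming the IMAP server is Dovecot, that split is a one-line change
in mail_location; a sketch with hypothetical paths:

    # /etc/dovecot/conf.d/10-mail.conf
    # maildirs on the NetApp NFS mount, indexes on local disk/SSD
    mail_location = maildir:/nfs/mail/%u:INDEX=/var/dovecot/index/%u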

It's an option we have in mind. As you wrote, NetApp is very solid.
The main reason for local storage is that the IMAP server is
completely isolated from the network. But maybe one day we will use
it.

Miloslav> In the last half year, we ran into performance
Miloslav> troubles. Server load grows up to 30 in rush hours, due to
Miloslav> IO waits. We tried attaching additional hard drives (the 838G
Miloslav> ones in the list below) and increasing free space by
Miloslav> rebalancing. I think it helped a little bit, but not
Miloslav> dramatically.
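
(For reference, that kind of grow-and-rebalance is typically done
like this; the device name and usage filter threshold are
illustrative:)

    # add a new drive, then spread existing chunks across all devices
    btrfs device add /dev/sdq /data
    btrfs balance start -dusage=50 -musage=50 /data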

If you're IOPs bound, but not space bound, then you *really* want to
get an SSD in there for the indexes and such.  Basically the stuff
that gets written/read from all the time no matter what, but which
isn't large in terms of space.

Miloslav> Yes. We are now at 66% capacity. Adding an SSD for indexes
Miloslav> is our next step.

This *should* give you a boost in performance.  But finding a way to
take before and after latency/performance measurements is key.  I
would look into using 'fio' to test your latency numbers.  You might
also want to try using XFS or even ext4 as your filesystem.  I
understand not wanting to 'fsck', so that might be right out.
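
As a starting point, something like this measures small random-write
latency, which is close to the mail-delivery workload (sizes and
runtime are placeholders):

    # 4k random writes; watch the clat percentiles in the output
    fio --name=maillat --directory=/data/fio-test --rw=randwrite \
        --bs=4k --size=1G --numjobs=4 --iodepth=16 --ioengine=libaio \
        --runtime=60 --time_based --group_reporting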

Unfortunately, to quickly fix the problem and make the server usable
again, we already added an SSD and moved the indexes onto it. So we
have no measurements of the old state.

The situation is better, but I guess the problem still exists. It
takes some time for the load to grow. We will see.

Thank you for the fio tip. I'll definitely try that.

Kind regards
Milo
