Hi,

The new cluster is set up with two physical servers with HDDs and a VM backed 
by an all-flash stretched vSAN.
The old cluster will be set up the same way.

The main volume that I'm concerned about usually takes about 20-30 minutes to 
finish the self-heal. The network is 10 Gbps.


Best regards
--
THORGEIR MARTHINUSSEN
Senior Systems Consultant
BASEFARM

-----Original Message-----
From: Strahil <[email protected]>
To: Thorgeir <[email protected]>, gluster-users <[email protected]>
Subject: Re: [Gluster-users] Adding arbiter on a large existing replica 2 set
Date: Wed, 16 Oct 2019 21:04:50 +0300


Hi Thorgeir,

Did you try adding an arbiter with SSD brick/bricks?

SSD/NVMe is the best type of storage for an arbiter. Yes, it's more expensive, 
but you will need fewer disks than for a data brick.

Of course, the arbiter is only one side of the equation, and the time to heal 
might depend on your data bricks' IOPS.

How much time does a node in the cluster need to heal after being rebooted?

Best Regards,
Strahil Nikolov

On Oct 16, 2019 16:37, Thorgeir Marthinussen 
<[email protected]> wrote:
Hi,

We have an old Gluster cluster running replica 2 across two datacenters, 
currently on version 4.1.5.

I need to add an arbiter to this setup, but I'm concerned about the performance 
impact of this on the volumes.

I recently set up a new cluster, for a different purpose, and decided to test 
adding an arbiter to the volume after adding in some data.
The volume had ~435,000 files totaling about 12 TB.
Adding the arbiter initiated a heal operation that took almost 3 hours.

On the older cluster, one of the volumes is about 14 TB but holds ~45.5 million 
files.

Since the arbiter is only concerned with metadata and checksums, I'm worried 
that we have about 100 times as many files, i.e. 100 times as many I/O 
operations to execute during healing, and possibly 100 times the heal time, 
which would mean about 12.5 days.
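
The extrapolation above can be written out as a short calculation. This is only a sketch: it takes the poster's own assumption that heal time grows linearly with file count, which is not a documented property of Gluster, and the function name is made up for illustration. Using the exact file counts from the message instead of the rounded factor of 100 gives a slightly higher figure, roughly 13 days.

```python
def estimate_heal_hours(ref_files, ref_hours, target_files):
    """Scale a measured heal time linearly by file count (an assumption)."""
    return ref_hours * target_files / ref_files

# Figures from the message: ~435,000 files healed in almost 3 hours,
# and the older volume holds ~45.5 million files.
ratio = 45_500_000 / 435_000
hours = estimate_heal_hours(435_000, 3.0, 45_500_000)
days = hours / 24

print(f"file ratio: {ratio:.1f}x, estimated heal: {hours:.0f} h ({days:.1f} days)")
```

Real heal time also depends on brick IOPS and network latency, so the result is at best an order-of-magnitude guide.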

Another "issue" is that the 'gluster volume heal <vol-name> info summary' 
command seems to "count" all the files, so the command can take a very long 
time to complete.
The metrics-scraping script I created for us, with a timeout of 110 seconds, 
fails to complete when a volume has more than ~800-900 unsynced files (which 
happens regularly when taking one cluster node down for patching).


Does anyone have experience with adding an arbiter afterwards: performance 
impact, time to heal, etc.?
I'm also interested in other ways to get the status of healing.

Any advice would be appreciated.


Best regards
--
THORGEIR MARTHINUSSEN
Senior Systems Consultant
BASEFARM