[ceph-users] Re: Advice on sizing WAL/DB cluster for Optane and SATA SSD disks.

2020-03-16 Thread Janne Johansson
On Sun, 15 Mar 2020 at 14:06, Виталий Филиппов wrote: > WAL is 1G (you can allocate 2 to be sure), DB should always be 30G. And > this doesn't depend on the size of the data partition :-) > DB should be either 3, 30 or 300 depending on how much you can spare on the fast devices. 30 is probably g
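
For reference, a minimal sketch of how such a split layout is typically provisioned with ceph-volume, assuming hypothetical device names (/dev/sdb for the SATA SSD data device, /dev/nvme0n1p1 for an Optane partition sized per the advice above):

    # Create a BlueStore OSD with data on the SATA SSD and block.db
    # (which also holds the WAL unless a separate --block.wal is given)
    # on the Optane partition. Device names are examples only.
    ceph-volume lvm create --bluestore \
        --data /dev/sdb \
        --block.db /dev/nvme0n1p1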

[ceph-users] HEALTH_WARN 1 pools have too few placement groups

2020-03-16 Thread Dietmar Rieder
Hi, I was planning to activate the pg_autoscaler on an EC (6+3) pool which I created two years ago. Back then I calculated the total # of pgs for this pool with a target per-OSD pg # of 150 (this was the recommended per-OSD pg number as far as I recall). I used the RedHat ceph pg per pool calculator [
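
A minimal sketch of enabling the autoscaler on such a pool, assuming a hypothetical pool name ec63pool:

    # Enable the autoscaler module (Nautilus does not enable it by default)
    ceph mgr module enable pg_autoscaler
    # Review what the autoscaler would do before letting it act
    ceph osd pool autoscale-status
    # Turn it on for the pool (or use "warn" to only get health warnings)
    ceph osd pool set ec63pool pg_autoscale_mode on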

[ceph-users] Re: MGRs failing once per day and generally slow response times

2020-03-16 Thread Janek Bevendorff
Over the weekend, all five MGRs failed, which means we have no more Prometheus monitoring data. We are obviously monitoring the MGR status as well, so we can detect the failure, but it's still a pretty serious issue. Any ideas as to why this might happen? On 13/03/2020 16:56, Janek Bevendorff
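
A few generic checks that may help narrow down repeated MGR failovers, assuming standard Nautilus tooling (the MGR name is a placeholder):

    # Which MGR is currently active, and which are on standby
    ceph mgr stat
    # List enabled modules (e.g. prometheus) on the active MGR
    ceph mgr module ls
    # Force a failover to a standby to see whether the problem follows
    ceph mgr fail <active-mgr-name>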

[ceph-users] Re: HEALTH_WARN 1 pools have too few placement groups

2020-03-16 Thread Ashley Merrick
This was a bug in 14.2.7 in the calculation for EC pools. It has been fixed in 14.2.8 On Mon, 16 Mar 2020 16:21:41 +0800 Dietmar Rieder wrote Hi, I was planning to activate the pg_autoscaler on an EC (6+3) pool which I created two years ago. Back then I calculated the total #
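
To confirm which release the cluster daemons are actually running before relying on the fix:

    # Summarizes the Ceph version of every running daemon type
    ceph versions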

[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-16 Thread mj
On 3/16/20 5:21 AM, Konstantin Shalygin wrote: On 3/13/20 8:49 PM, Marc Roos wrote: Can you also create snapshots via the vfs_ceph solution? Yes! Since Samba 4.11 this is supported via the vfs_ceph_snapshots module. Just out of curiosity: We are currently running a samba server with RBD disks a
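
A minimal smb.conf sketch of stacking the two modules, with a hypothetical share name and cephx user; the values are examples, not a tested configuration:

    [cephfs-share]
        # ceph_snapshots is stacked above ceph so CephFS snapshots
        # appear as "previous versions" to Windows clients
        vfs objects = ceph_snapshots ceph
        path = /
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        # vfs_ceph bypasses the kernel, so kernel share modes must be off
        kernel share modes = no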

[ceph-users] Re: Is there a better way to make a samba/nfs gateway?

2020-03-16 Thread Konstantin Shalygin
On 3/16/20 4:10 PM, mj wrote: Just out of curiosity: We are currently running a samba server with RBD disks as a VM on our proxmox/ceph cluster. I see the advantage of having vfs_ceph_snapshots of the samba user-data. But then again: re-sharing data using samba vfs_ceph adds a layer of comp

[ceph-users] Zabbix module failed to send data - SSL support

2020-03-16 Thread tdados
Hello all, We are having an issue with the Ceph Zabbix module: it is failing to send data. The reason is that in our Zabbix infrastructure we use encryption, and agent connections use certificates as well. On the Zabbix proxy servers I see log entries showing failures for exactly that reason. 1329:
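
For context, the mgr module is normally configured as sketched below (hostname and identifier are placeholders); the module shells out to zabbix_sender, which is where any TLS/certificate options would have to take effect:

    ceph mgr module enable zabbix
    ceph zabbix config-set zabbix_host zabbix-proxy.example.com
    ceph zabbix config-set identifier ceph-cluster
    ceph zabbix config-show
    # Trigger an immediate send to reproduce the failure on demand
    ceph zabbix send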

[ceph-users] Re: bluefs enospc

2020-03-16 Thread Igor Fedotov
Hi Derek, first of all, some BlueStore design overview to make sure we're on the same page. BlueFS doesn't keep all the BlueStore data but just the RocksDB part of it. In your case BlueFS shares the same device with BlueStore user data. A space rebalancing procedure periodically takes place t
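
Two ways to see how much space BlueFS could still obtain on an OSD on recent releases, online via the admin socket or offline with the OSD stopped (the osd id and path are placeholders):

    # Online: space BlueFS can still get, including from the shared main device
    ceph daemon osd.<id> bluestore bluefs available
    # Offline (OSD stopped): report BlueFS device sizes
    ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-<id>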

[ceph-users] Re: Advice on sizing WAL/DB cluster for Optane and SATA SSD disks.

2020-03-16 Thread vitalif
Hi Victor, 1) RocksDB doesn't put L4 on the fast device if it's less than ~ 286 GB, so no. But, anyway, there's usually no L4, so 30 GB is usually sufficient. I had ~17 GB block.dbs even for 8 TB hard drives used for RBD... RGW probably uses slightly more if stored objects are small... but yo
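
A quick way to check how much block.db space an existing OSD actually consumes, which is where observations like the ~17 GB above come from (the osd id is a placeholder):

    # db_used_bytes / db_total_bytes show current RocksDB usage on the
    # fast device; slow_used_bytes indicates spillover to the data device
    ceph daemon osd.<id> perf dump bluefs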

[ceph-users] Re: bluefs enospc

2020-03-16 Thread Derek Yarnell
13:51' starts looking like peering a few pgs and then at '2020-03-15 14:40' on 716 fails, and then for example 719 it fails 1 min later at '2020-03-15 14:41'. [1] - ftp://ftp.umiacs.umd.edu/pub/derek/ceph-osd.716.log-20200316.gz [2] - ftp://ftp.umiacs.umd.edu/pub/derek/ceph

[ceph-users] Re: HEALTH_WARN 1 pools have too few placement groups

2020-03-16 Thread Dietmar Rieder
Oh, didn't realize, Thanks Dietmar On 2020-03-16 09:44, Ashley Merrick wrote: > This was a bug in 14.2.7 and calculation for EC pools. > > It has been fixed in 14.2.8 > > > On Mon, 16 Mar 2020 16:21:41 +0800 *Dietmar Rieder > * wrote > > Hi, > > I was planing to activate th

[ceph-users] Re: bluefs enospc

2020-03-16 Thread Igor Fedotov
other OSDs reported(-ing) something similar? Here is another node which at around '2020-03-15 13:51' starts looking like peering a few pgs and then at '2020-03-15 14:40' on 716 fails, and then for example 719 it fails 1 min later at '2020-03-15 14:41'. [1] -

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-16 Thread Thomas Schneider
Hi Wido, can you please share some detailed instructions on how to do this? And what do you mean with "respect your failure domain"? THX On 04.03.2020 at 11:27, Wido den Hollander wrote: > On 3/4/20 11:15 AM, Thomas Schneider wrote: >> Hi, >> >> Ceph balancer is not working correctly; there's an o
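
A minimal sketch of what moving a PG by hand with upmap looks like, using hypothetical PG and OSD ids; as the follow-up in this thread points out, the resulting mapping must still respect the CRUSH rule's failure domain:

    # upmap requires all clients to speak Luminous or newer
    ceph osd set-require-min-compat-client luminous
    # Remap PG 1.2f so that its copy on the full OSD 68 lands on the
    # empty OSD 12 instead (ids are examples)
    ceph osd pg-upmap-items 1.2f 68 12
    # Remove the exception again later
    ceph osd rm-pg-upmap-items 1.2f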

[ceph-users] Re: upmap balancer

2020-03-16 Thread Thomas Schneider
Hi Dan, I have opened this bug report for the balancer not working as expected. https://tracker.ceph.com/issues/43586 Then I thought it could make sense to balance the cluster manually by moving PGs from a heavily loaded OSD to another. I found your slides "Luminous: pg upmap (dev)

[ceph-users] Re: Advice on sizing WAL/DB cluster for Optane and SATA SSD disks.

2020-03-16 Thread Igor Fedotov
On 3/16/2020 3:25 PM, vita...@yourcmc.ru wrote: Hi Victor, 1) RocksDB doesn't put L4 on the fast device if it's less than ~ 286 GB, so no. But, anyway, there's usually no L4, so 30 GB is usually sufficient. I had ~17 GB block.dbs even for 8 TB hard drives used for RBD... RGW probably uses s

[ceph-users] Re: upmap balancer

2020-03-16 Thread Dan van der Ster
Hi Thomas, I lost track of your issue. Are you just trying to balance the PGs ? 14.2.8 has big improvements -- check the release notes / blog post about setting the upmap_max_deviations down to 2 or 1. -- Dan On Mon, Mar 16, 2020 at 4:00 PM Thomas Schneider <74cmo...@gmail.com> wrote: > > Hi Dan,
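
A sketch of the settings referenced above, assuming 14.2.8 (the config key is spelled upmap_max_deviation):

    ceph balancer mode upmap
    ceph balancer on
    # Allow at most 1 PG of deviation per OSD before the balancer acts
    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    # Check progress
    ceph balancer status
    ceph balancer eval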

[ceph-users] Re: upmap balancer

2020-03-16 Thread Thomas Schneider
Hi Dan, indeed I'm trying to balance the PGs. In order to keep the Ceph cluster operational I used OSD reweight, meaning some specific OSDs are now at reweight 0.8 and 0.9 respectively. Question: Can I upgrade to Ceph 14.2.8 w/o resetting the weight to 1.0? Or should I clean up this reweight first,

[ceph-users] Re: upmap balancer

2020-03-16 Thread Dan van der Ster
Hi, I would upgrade, configure the balancer correctly, then wait a bit for it to smooth things out. Afterwards you can reweight back to 1.0. -- dan On Mon, Mar 16, 2020 at 4:19 PM Thomas Schneider <74cmo...@gmail.com> wrote: > > Hi Dan, > > indeed I'm trying to balance the PGs. > > In order to ens
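
Resetting the legacy reweight afterwards is a single command per OSD (osd ids are placeholders):

    # Return the override reweight to its default so the upmap balancer
    # is the only mechanism shifting data
    ceph osd reweight osd.17 1.0
    ceph osd reweight osd.23 1.0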

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-16 Thread Anthony D'Atri
He means that if, e.g., you enforce 1 copy of a PG per rack, any upmaps you enter must not result in 2 or 3 copies in the same rack. If your CRUSH policy is one copy per *host*, the danger is even higher that you could have data become unavailable or even lost in case of a failure. > On Mar 16, 2020,
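
One way to double-check what the failure domain actually is before entering manual upmaps (pool and rule names are placeholders):

    # Which CRUSH rule does the pool use?
    ceph osd pool get <pool> crush_rule
    # Inspect the rule; the bucket type of its chooseleaf step
    # (host, rack, ...) is the failure domain the upmaps must respect
    ceph osd crush rule dump <rule-name>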

[ceph-users] Re: bluefs enospc

2020-03-16 Thread Derek Yarnell
Hi Igor, On 3/16/20 10:34 AM, Igor Fedotov wrote: > I can suggest the following non-straightforward way for now: > > 1) Check osd startup log for the following line: > > 2020-03-15 14:43:27.845 7f41bb6baa80  1 > bluestore(/var/lib/ceph/osd/ceph-681) _open_alloc loaded 23 GiB in 97 > extents > >
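
The startup line quoted above can be pulled out of the OSD log directly, e.g. for the same OSD (assuming the default log path):

    grep _open_alloc /var/log/ceph/ceph-osd.681.log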

[ceph-users] Re: Forcibly move PGs from full to empty OSD

2020-03-16 Thread Thomas Schneider
Hi, thanks for this clarification. I'm running a 7-node-cluster and this risk should be manageable. On 16.03.2020 at 16:57, Anthony D'Atri wrote: > He means that if eg. you enforce 1 copy of a PG per rack, that any upmaps you > enter don’t result in 2 or 3 in the same rack. If your CRUSH poil

[ceph-users] Re: v14.2.8 Nautilus released

2020-03-16 Thread Dietmar Rieder
On 2020-03-03 13:36, Abhishek Lekshmanan wrote: > > This is the eighth update to the Ceph Nautilus release series. This release > fixes issues across a range of subsystems. We recommend that all users upgrade > to this release. Please note the following important changes in this > release; as alwa

[ceph-users] OSD failing to restart with "no available blob id"

2020-03-16 Thread Gilles Mocellin
Hi! I'm stuck with "no available blob id" during the start of an OSD. It seems there's a workaround backported only to Nautilus (Bug https://tracker.ceph.com/issues/38272), but I use Mimic for now. Does someone have an operational workaround? Or should I recreate my OSD? And what is the easiest way
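
Independent of the backport question, an offline consistency check is often the first step before deciding to recreate the OSD; this is a generic sketch, not a confirmed workaround for issue 38272 (the path is a placeholder):

    # Run with the OSD stopped; fsck is read-only, repair attempts fixes
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>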

[ceph-users] Upmap balancing - pools grouped together?

2020-03-16 Thread Andras Pataki
I've been trying the upmap balancer on a new Nautilus cluster. We have three main pools: a triple-replicated pool (id: 1) and two 6+3 erasure-coded pools (ids: 4 and 5). The balancer does a very nice job on the triple-replicated pool, but does something strange on the EC pools. Here is a sample of
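
The balancer can be asked to score and optimize pools individually, which may help show whether it treats the EC pools separately; a sketch with placeholder pool and plan names:

    # Score the distribution of a single pool (lower is better)
    ceph balancer eval <ec-pool-name>
    # Build a plan restricted to specific pools, then inspect it
    ceph balancer optimize myplan <ec-pool-name>
    ceph balancer show myplan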