[ceph-users] CephFS metadata: Large omap object found

2019-10-01 Thread Eugen Block

Hi all,

we have a new issue in our Nautilus cluster.
The large omap warning seems to be more common for RGW usage, but we  
currently only use CephFS and RBD. I found one thread [1] regarding  
metadata pool, but it doesn't really help in our case.


The deep-scrub of PG 36.6 brought up this message (deep-scrub finished  
with "ok"):


2019-09-30 20:18:22.548401 osd.9 (osd.9) 275 : cluster [WRN] Large  
omap object found. Object: 36:654134d2:::mds0_openfiles.0:head Key  
count: 238621 Size (bytes): 9994510



I checked xattr (none) and omapheader:

ceph01:~ # rados -p cephfs-metadata listxattr mds0_openfiles.0
ceph01:~ # rados -p cephfs-metadata getomapheader mds0_openfiles.0
header (42 bytes) :
0000  13 00 00 00 63 65 70 68  20 66 73 20 76 6f 6c 75  |....ceph fs volu|
0010  6d 65 20 76 30 31 31 01  01 0d 00 00 00 74 c3 12  |me v011......t..|
0020  00 00 00 00 00 01 00 00  00 00                    |..........|
002a

ceph01:~ # ceph fs volume ls
[
  {
"name": "cephfs"
  }
]


The respective OSD has default thresholds regarding large_omap:

ceph02:~ # ceph daemon osd.9 config show | grep large_omap
"osd_deep_scrub_large_omap_object_key_threshold": "20",
"osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",


Can anyone point me to a solution for this?

Best regards,
Eugen


[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033813.html
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata: Large omap object found

2019-10-01 Thread Paul Emmerich
The thresholds were recently reduced by a factor of 10. I guess you
have a lot of (open) files? Maybe use more active MDS servers?

Or increase the thresholds, I wouldn't worry at all about 200k omap
keys if you are running on reasonable hardware.
The usual argument for a low number of omap keys is recovery time, but
if you are running a metadata-heavy workload on something that has
problems recovering 200k keys in less than a few seconds, then you are
doing something wrong anyways.
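
For example, something along these lines should raise the key threshold
(500000 is just an illustrative value, not a recommendation):

ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 500000

The warning should clear the next time the affected PG is deep-scrubbed.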


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Oct 1, 2019 at 9:10 AM Eugen Block  wrote:
>
> Hi all,
>
> we have a new issue in our Nautilus cluster.
> The large omap warning seems to be more common for RGW usage, but we
> currently only use CephFS and RBD. I found one thread [1] regarding
> metadata pool, but it doesn't really help in our case.
>
> The deep-scrub of PG 36.6 brought up this message (deep-scrub finished
> with "ok"):
>
> 2019-09-30 20:18:22.548401 osd.9 (osd.9) 275 : cluster [WRN] Large
> omap object found. Object: 36:654134d2:::mds0_openfiles.0:head Key
> count: 238621 Size (bytes): 9994510
>
>
> I checked xattr (none) and omapheader:
>
> ceph01:~ # rados -p cephfs-metadata listxattr mds0_openfiles.0
> ceph01:~ # rados -p cephfs-metadata getomapheader mds0_openfiles.0
> header (42 bytes) :
> 0000  13 00 00 00 63 65 70 68  20 66 73 20 76 6f 6c 75  |....ceph fs volu|
> 0010  6d 65 20 76 30 31 31 01  01 0d 00 00 00 74 c3 12  |me v011......t..|
> 0020  00 00 00 00 00 01 00 00  00 00                    |..........|
> 002a
>
> ceph01:~ # ceph fs volume ls
> [
>{
>  "name": "cephfs"
>}
> ]
>
>
> The respective OSD has default thresholds regarding large_omap:
>
> ceph02:~ # ceph daemon osd.9 config show | grep large_omap
>  "osd_deep_scrub_large_omap_object_key_threshold": "20",
>  "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",
>
>
> Can anyone point me to a solution for this?
>
> Best regards,
> Eugen
>
>
> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033813.html
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus pg autoscale, data lost?

2019-10-01 Thread Raymond Berg Hansen
Hi. I am new to ceph but have set it up on my homelab and started using it. It 
seemed very good until I decided to try pg autoscale.
After enabling autoscale on 3 of my pools, the autoscaler tried(?) to reduce the 
number of PGs and the pools are now inaccessible.
I have tried to turn it off again, but no luck! Please help.

ceph status:
https://pastebin.com/88qNivJi  (do not know why it lists 4 pools, I have 3. 
Maybe one of the pools I created later and deleted is in limbo?)

ceph osd pool ls detail:
https://pastebin.com/HZLz6yHL

ceph health detail:
https://pastebin.com/Kqd2YMtm
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Wido den Hollander



On 10/1/19 12:16 PM, Raymond Berg Hansen wrote:
> Hi. I am new to ceph but have set it up on my homelab and started using it. 
> It seemed very good intil I desided to try pg autoscale.
> After enabling autoscale to 3 of my pools, autoscale tried(?) to reduce the 
> number of PGs and the pools are now unaccessible.
> I have tried to turn it off again, but no luck! Please help.
> 

Are you sure the data is not available? The 'unknown' status can
sometimes happen if the Mgr isn't receiving the data.

Have you tried to restart the active Manager?
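
For example (substitute the name of the active mgr as shown by 'ceph -s'; just
a sketch, not verified against your setup):

ceph mgr fail <active-mgr-name>
# or, on the node that runs the active mgr:
systemctl restart ceph-mgr.target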

Wido

> ceph status:
> https://pastebin.com/88qNivJi  (do not know why it lists 4 pools, I have 3. 
> Maybe one of the pools I created after and deleted are in limbo?)
> 
> ceph osd pool ls detail:
> https://pastebin.com/HZLz6yHL
> 
> ceph health detail:
> https://pastebin.com/Kqd2YMtm
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Raymond Berg Hansen
Yes, I am sure. I have tried restarting the whole cluster.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata: Large omap object found

2019-10-01 Thread Eugen Block

Thank you, Paul.


The thresholds were recently reduced by a factor of 10. I guess you
have a lot of (open) files? Maybe use more active MDS servers?


We'll consider adding more MDS servers, although the workload hasn't  
been an issue yet.



Or increase the thresholds, I wouldn't worry at all about 200k omap
keys if you are running on reasonable hardware.
The usual argument for a low number of omap keys is recovery time, but
if you are running a metadata-heavy workload on something that has
problems recovering 200k keys in less than a few seconds, then you are
doing something wrong anyways.



We haven't had any issues with MDS failovers and/or recovery yet, so I
guess higher thresholds would be fine.
To get rid of the warning (for a week) it was sufficient to issue a  
deep-scrub on the affected PG while the listomapkeys output was lower  
than 200k. Maybe we were just "lucky" until now because the  
deep-scrubs are issued outside of business hours, so the number of  
open files should be lower.
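
For reference, that was roughly:

ceph01:~ # rados -p cephfs-metadata listomapkeys mds0_openfiles.0 | wc -l
ceph01:~ # ceph pg deep-scrub 36.6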


Anyway, thank you for your input, it seems as if this is not a problem  
at the moment.


Regards,
Eugen


Quoting Paul Emmerich:


The thresholds were recently reduced by a factor of 10. I guess you
have a lot of (open) files? Maybe use more active MDS servers?

Or increase the thresholds, I wouldn't worry at all about 200k omap
keys if you are running on reasonable hardware.
The usual argument for a low number of omap keys is recovery time, but
if you are running a metadata-heavy workload on something that has
problems recovering 200k keys in less than a few seconds, then you are
doing something wrong anyways.


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Oct 1, 2019 at 9:10 AM Eugen Block  wrote:


Hi all,

we have a new issue in our Nautilus cluster.
The large omap warning seems to be more common for RGW usage, but we
currently only use CephFS and RBD. I found one thread [1] regarding
metadata pool, but it doesn't really help in our case.

The deep-scrub of PG 36.6 brought up this message (deep-scrub finished
with "ok"):

2019-09-30 20:18:22.548401 osd.9 (osd.9) 275 : cluster [WRN] Large
omap object found. Object: 36:654134d2:::mds0_openfiles.0:head Key
count: 238621 Size (bytes): 9994510


I checked xattr (none) and omapheader:

ceph01:~ # rados -p cephfs-metadata listxattr mds0_openfiles.0
ceph01:~ # rados -p cephfs-metadata getomapheader mds0_openfiles.0
header (42 bytes) :
0000  13 00 00 00 63 65 70 68  20 66 73 20 76 6f 6c 75  |....ceph fs volu|
0010  6d 65 20 76 30 31 31 01  01 0d 00 00 00 74 c3 12  |me v011......t..|
0020  00 00 00 00 00 01 00 00  00 00                    |..........|
002a

ceph01:~ # ceph fs volume ls
[
   {
 "name": "cephfs"
   }
]


The respective OSD has default thresholds regarding large_omap:

ceph02:~ # ceph daemon osd.9 config show | grep large_omap
 "osd_deep_scrub_large_omap_object_key_threshold": "20",
 "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824",


Can anyone point me to a solution for this?

Best regards,
Eugen


[1]  
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033813.html

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Eugen Block

Hi,

we had a problem with the autoscaler just recently; we had to turn it
off because the MONs suddenly became laggy [1]. Did you check the MON
processes? Try disabling the autoscaler, wait until that change is applied,
and then restart the MONs one by one.
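
Disabling it per pool is something like this (autoscale-status shows whether
the change has been picked up, assuming the pg_autoscaler module is enabled):

ceph osd pool set <pool> pg_autoscale_mode off
ceph osd pool autoscale-status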


Regards,
Eugen

[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/TZZIRVRGQK4WDDT52UNYIDMZH2TR7467/



Quoting Raymond Berg Hansen:


Yes I am sure, tried to restart the whole cluster.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: moving EC pool from HDD to SSD without downtime

2019-10-01 Thread Frank Schilder
Thanks Poul!

For reference to everyone finding this thread: this procedure does indeed work 
as intended:

ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt
# edit crush rule: "step take ServerRoom class hdd" --> "step take ServerRoom 
class ssd"
crushtool -o crush-new.map -c crush.txt
ceph osd set norebalance
ceph osd set nobackfill
ceph osd setcrushmap -i crush-new.map
# wait for peering to finish, you will see 100% objects misplaced but all PGs 
active+...
ceph osd unset norebalance
ceph osd unset nobackfill

Ceph will now happily move objects while storage is fully redundant and r/w 
accessible.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Raymond Berg Hansen
Thanks for the tip, but I already tried that too. It was actually the first thing I tried.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Eugen Block
It seems that the metadata pool is not affected; those PGs are all
active+clean. Are different rulesets applied to 'cephfs_data' and
'cephfs_metadata'?


Have you changed anything regarding crush rules or device classes or  
something related to the ceph osd tree?



Quoting Raymond Berg Hansen:

Thanks for the tip, but I already tried that to. First thing I tried  
actually.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Objects degraded after adding disks

2019-10-01 Thread Frank Schilder
I'm running a ceph fs with an 8+2 EC data pool. Disks are on 10 hosts and 
failure domain is host. Version is mimic 13.2.2. Today I added a few OSDs to 
one of the hosts and observed that a lot of PGs became inactive even though 9 
out of 10 hosts were up all the time. After getting the 10th host and all disks 
up, I still ended up with a large number of undersized PGs and degraded 
objects, which I don't understand as no OSD was removed.


Here some details about the steps taken on the host with new disks, main 
questions at the end:

- shut down OSDs (systemctl stop docker)
- reboot host (this is necessary due to OS deployment via warewulf)

Devices got renamed and not all disks came back up (4 OSDs remained down). This 
is expected; I need to re-deploy the containers to adjust for device name 
changes. Around this point PGs started peering and some failed, waiting for 1 of 
the down OSDs. I don't understand why they didn't just remain active with 9 out 
of 10 disks. Until the moment some OSDs came up, all PGs were active. 
With min_size=9 I would expect all PGs to remain active with no changes to 9 
out of the 10 hosts.

- redeploy docker containers
- all disks/OSDs come up, including the 4 OSDs from above
- inactive PGs complete peering and become active
- now I have a lot of degraded objects and undersized PGs even though not a 
single OSD was removed

I don't understand why I have degraded objects. I should just have misplaced 
objects:

HEALTH_ERR
22995992/145698909 objects misplaced (15.783%)
Degraded data redundancy: 5213734/145698909 objects degraded 
(3.578%), 208 pgs degraded, 208
pgs undersized
Degraded data redundancy (low space): 169 pgs backfill_toofull

Note: The backfill_toofull with low utilization (usage: 38 TiB used, 1.5 PiB / 
1.5 PiB avail) is a known issue in ceph (https://tracker.ceph.com/issues/39555)

Also, I should be able to do whatever with 1 out of 10 hosts without losing 
data access. What could be the problem here?


Questions summary:

Why does peering not succeed in keeping all PGs active with 9 out of 10 OSDs up 
and in?
Why do undersized PGs arise even though all OSDs are up?
Why do degraded objects arise even though no OSD was removed?

Thanks!

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Raymond Berg Hansen
You are absolutely right, I had made a crush rule for device class hdd. I did 
not connect that with this problem. When I put the pools back in the default 
crush rule, things seem to start fixing themselves.
Have I done something wrong with this crush rule?

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-hdd {
id 1
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type datacenter
step emit
}
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doubt about ceph-iscsi and Vmware

2019-10-01 Thread Gesiel Galvão Bernardes
Thank you everyone for the explanations.

Still on this subject: I created a host and attached a disk. For a second
host, to use the iSCSI storage as shared storage, do I just need to add the
disk to the second client?
I tried this:
> disk add pool1/vmware_iscsi1
Warning: 'pool1/vmware_iscsi1' mapped to 1 other client(s)
ok

Is this a problem, or is it correct?

Regards
Gesiel


On Fri, 20 Sep 2019 at 20:13, Mike Christie 
wrote:

> On 09/20/2019 01:52 PM, Gesiel Galvão Bernardes wrote:
> > Hi,
> > I'm testing Ceph with Vmware, using Ceph-iscsi gateway. I reading
> > documentation*  and have doubts some points:
> >
> > - If I understanded, in general terms, for each VMFS datastore in VMware
> > will match the an RBD image. (consequently in an RBD image I will
> > possible have many VMWare disks). Its correct?
> >
> > - In documentation is this: "gwcli requires a pool with the name rbd, so
> > it can store metadata like the iSCSI configuration". In part 4 of
> > "Configuration", have: "Add a RBD image with the name disk_1 in the pool
> > rbd". In this part, the use of "rbd" pool is a example and I could use
> > any pool for storage of image, or the pool should be "rbd"?
> > Resuming: gwcli require "rbd" pool for metadata and I could use any pool
> > for image, or i will use just "rbd pool" for storage image and metadata?
> >
> > - How much memory ceph-iscsi use? Which  is a good number of RAM?
> >
>
> The major memory use is:
>
> 1. In RHEL 7.5 kernels and older we allocate max_data_area_mb of kernel
> memory per device. The default value for that is 8. You can use gwcli to
> configure it. It is allocated when the device is created. In newer
> kernels, there is pool of memory and each device can use up to
> max_data_area_mb worth of it. The per device default is the same and you
> can change it with gwcli. The total pool limit is 2 GB. There is a sysfs
> file:
>
> /sys/module/target_core_user/parameters/global_max_data_area_mb
>
> that can be used to change it.
>
> 2. Each device uses about 20 MB of memory in userspace. This is not
> configurable.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Marc Roos
 

Some time ago on Luminous I also had to change the crush rules on an 
all-hdd cluster to hdd (to prepare for adding ssd's and ssd pools), and 
pg's started migrating even though everything was already on hdd's. 
Looks like this is still not fixed?





-Original Message-
From: Raymond Berg Hansen [mailto:raymon...@gmail.com] 
Sent: Tuesday, 1 October 2019 14:32
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Nautilus pg autoscale, data lost?

You are absolutly right, I had made a crush rule for device class hdd. 
Did not put this in connection with this problem. When I put the pools 
back in the default crush rule things are starting to fix itself it 
seems.
Have I done something wrong with this crush rule?

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-hdd {
id 1
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type datacenter
step emit
}
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Eugen Block

Some time ago on Luminous I also had to change the crush rules on a all
hdd cluster to hdd (to prepare for adding ssd's and ssd pools). And pg's
started migrating while everything already was on hdd's, looks like this
is still not fixed?


Sage responded to a thread yesterday about how to change crush device
classes without rebalancing (crushtool reclassify):


https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/675QZ2JXXX4RPRNPK2NL7FB5MVANKUB2/
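
If I remember the documented workflow correctly, it is roughly this (sketch 
only, please double-check the crushtool man page for your release):

ceph osd getcrushmap -o original
crushtool -i original --reclassify \
  --set-subtree-class default hdd \
  --reclassify-root default hdd \
  -o adjusted
crushtool -i original --compare adjusted   # verify that no unexpected mappings change
ceph osd setcrushmap -i adjusted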


Quoting Marc Roos:


Some time ago on Luminous I also had to change the crush rules on a all
hdd cluster to hdd (to prepare for adding ssd's and ssd pools). And pg's
started migrating while everything already was on hdd's, looks like this
is still not fixed?





-Original Message-
From: Raymond Berg Hansen [mailto:raymon...@gmail.com]
Sent: Tuesday, 1 October 2019 14:32
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Nautilus pg autoscale, data lost?

You are absolutly right, I had made a crush rule for device class hdd.
Did not put this in connection with this problem. When I put the pools
back in the default crush rule things are starting to fix itself it
seems.
Have I done something wrong with this crush rule?

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicated-hdd {
id 1
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type datacenter
step emit
}
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph and centos 8

2019-10-01 Thread fleg
Hi,
We have a ceph+cephfs cluster running nautilus version 14.2.4.
We have debian buster/ubuntu bionic clients mounting cephfs in kernel mode 
without problems.
We now want to mount cephfs from our new centos 8 clients. Unfortunately, 
ceph-common is needed but there are no packages available for el8 (only el7), 
and there is no way to install the el7 packages on centos 8 (missing deps).
Thus, despite the fact that centos 8 has a 4.18 kernel (required to use quota, 
snapshots etc...), it seems impossible to mount in kernel mode (good 
performance) and we still have to use the much slower fuse mode.
Is it possible to work around this problem? Or when is it planned to provide 
(even as beta) the ceph packages for centos 8?
Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph and centos 8

2019-10-01 Thread Sage Weil
On Tue, 1 Oct 2019, f...@lpnhe.in2p3.fr wrote:
> Hi,
> We have a ceph+cephfs cluster runing nautilus version 14.2.4
> We have debian buster/ubuntu bionic clients mounting cephfs in kernel mode 
> without problems.
> We now want to mount cephfs from our new centos 8 clients. Unfortunately, 
> ceph-common is needed but there are no packages available for el8 (only el7). 
> And no way to install the el7 packages on centos 8 (missing deps).
> Thus, despite the fact that centos 8 have a 4.18 kernel (required to use 
> quota, snapshots etc...), it seems impossible to mount in kernel mode (good 
> perfs) and we still have to use the so slow fuse mode.
> Is it possible to workaround this problem ? Or when is it planned to provides 
> (even as beta) the ceph packages for centos 8 ?

We should have el8 packages built Real Soon Now.

In the meantime, you don't need ceph-common to mount cephfs; the only real 
function of mount.ceph is to resolve the monitor name to an IP and pull 
the secret out of the secretfile.  If you specify the monitor IP then

  mount -t ceph $monip:/ $mountpoint -o secret=$secret,name=...

should work.

sage
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph and centos 8

2019-10-01 Thread fleg
Thanks. Happy to hear that el8 packages will soon be available.
F.


By the way, it seems that mount.ceph is called by mount. 
I already tried this:
mount -t ceph 123.456.789.000:6789:/  /data -o 
name=xxx_user,secretfile=/etc/ceph/client.xxx_user.keyring
and get
mount: /data: wrong fs type, bad option, bad superblock on 
123.456.789.000:6789:/, missing codepage or helper program, or other error.
Anyway, we will wait!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph and centos 8

2019-10-01 Thread Ilya Dryomov
On Tue, Oct 1, 2019 at 4:14 PM  wrote:
>
> Thanks. Happy to ear that el8 packages will soon be available.
> F.
>
>
> By the way, it seems that mount.ceph is called by mount.
> I already tryed that :
> mount -t ceph 123.456.789.000:6789:/  /data -o 
> name=xxx_user,secretfile=/etc/ceph/client.xxx_user.keyring
> and get
> mount: /data: wrong fs type, bad option, bad superblock on 
> 123.456.789.000:6789:/, missing codepage or helper program, or other error.
> Anyway, we will wait !

You need to pass secret= instead of secretfile.  The
kernel won't go and read a random file somewhere, that is mount.ceph's
job.

You may need to pass -i to prevent mount from attempting to call
mount.ceph, although it should work fine without -i as long as you
specify only those options that the kernel can handle on its own.
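
In other words, something like this should work (the secret below is a
placeholder; use the base64 string after "key =" from your keyring file):

mount -i -t ceph 123.456.789.000:6789:/ /data -o name=xxx_user,secret=AQBx1234...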

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus pg autoscale, data lost?

2019-10-01 Thread Mattia Belluco
Hi Raymond,

I believe the "type datacenter" bit in your replicated-hdd rule should
read "type host" instead, as in the original replicated rule.

Best
Mattia

On 10/1/19 2:31 PM, Raymond Berg Hansen wrote:
> You are absolutly right, I had made a crush rule for device class hdd. Did 
> not put this in connection with this problem. When I put the pools back in 
> the default crush rule things are starting to fix itself it seems.
> Have I done something wrong with this crush rule?
> 
> # rules
> rule replicated_rule {
>   id 0
>   type replicated
>   min_size 1
>   max_size 10
>   step take default
>   step chooseleaf firstn 0 type host
>   step emit
> }
> rule replicated-hdd {
>   id 1
>   type replicated
>   min_size 1
>   max_size 10
>   step take default class hdd
>   step chooseleaf firstn 0 type datacenter
>   step emit
> }
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Objects degraded after adding disks

2019-10-01 Thread Robert LeBlanc
On Tue, Oct 1, 2019 at 5:25 AM Frank Schilder  wrote:
>
> I'm running a cepf fs with an 8+2 EC data pool. Disks are on 10 hosts and 
> failure domain is host. Version is mimic 13.2.2. Today I added a few OSDs to 
> one of the hosts and observed that a lot of PGs became inactive even though 9 
> out of 10 hosts were up all the time. After getting the 10th host and all 
> disks up, I still ended up with a large amount of undersized PGs and degraded 
> objects, which I don't understand as no OSD was removed.
>
>
> Here some details about the steps taken on the host with new disks, main 
> questions at the end:
>
> - shut down OSDs (systemctl stop docker)
> - reboot host (this is necessary due to OS deployment via warewulf)
>
> Devices got renamed and not all disks came back up (4 OSDs remained down). 
> This is expected, I need to re-deploy the containers to adjust for device 
> name changes. Around this point PGs started peering and some failed waiting 
> for 1 of the down OSDs. I don't understand why they didn't just remain active 
> with 9 out of 10 disks. Until this moment of some OSDs coming up, all PGs 
> were active. With min_size=9 I would expect all PGs to remain active with no 
> changes to 9 out of the 10 hosts.
>
> - redeploy docker containers
> - all disks/OSDs come up, including the 4 OSDs from above
> - inactive PGs complete peering and become active
> - now I have a los of degraded Objects and undersized PGs even though not a 
> single OSD was removed
>
> I don't understand why I have degraded objects. I should just have misplaced 
> objects:
>
> HEALTH_ERR
> 22995992/145698909 objects misplaced (15.783%)
> Degraded data redundancy: 5213734/145698909 objects degraded 
> (3.578%), 208 pgs degraded, 208
> pgs undersized
> Degraded data redundancy (low space): 169 pgs backfill_toofull
>
> Note: The backfill_toofull with low utilization (usage: 38 TiB used, 1.5 PiB 
> / 1.5 PiB avail) is a known issue in ceph 
> (https://tracker.ceph.com/issues/39555)
>
> Also, I should be able to do whatever with 1 out of 10 hosts without loosing 
> data access. What could be the problem here?
>
>
> Questions summary:
>
> Why does peering not succeed to keep all PGs active with 9 out of 10 OSDs up 
> and in?

I would just double-check that min_size=9 for your pool. It should be
set to that, but that is the only reason I can think of for why you are
seeing this problem.
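
For example (substitute your EC data pool name):

ceph osd pool get <ec-data-pool> min_size
ceph osd pool ls detail | grep <ec-data-pool>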

> Why do undersized PGs arise even though all OSDs are up?

I've noticed on my cluster that sometimes when an OSD goes down, the
EC considers the OSD missing when it comes back online and needs to
resync. Not sure what exactly causes this to happen, but it happens
more often than it should.

> Why do degraded objects arise even though no OSD was removed?

If you are writing objects while the PGs are undersized (host/osds
down), then it will have to sync those writes to the OSDs that were
down. This is the number of degraded objects.


Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph and centos 8

2019-10-01 Thread fleg
Thanks a lot ! That was the trick !!! It works.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doubt about ceph-iscsi and Vmware

2019-10-01 Thread Mike Christie
On 10/01/2019 07:32 AM, Gesiel Galvão Bernardes wrote:
> Thank you everyone for the explanations. 
> 
> Still on this subject: I created an host and attached a disk. For a
> second host, for use a shared iscsi storage, I just need add disk to
> second client?
> I tried this:
>> disk add pool1/vmware_iscsi1
> Warning: 'pool1/vmware_iscsi1' mapped to 1 other client(s)
> ok
> 
> It's a problem, or is correct?

Normally you would do this in a host group.
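
Roughly like this in gwcli (the target and client IQNs below are placeholders,
and I'm writing the syntax from memory, so check the built-in help):

> cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:ceph-igw/host-groups
> create group_name=vmware_hosts
> cd vmware_hosts
> host add client_iqn=iqn.1998-01.com.vmware:esx1
> host add client_iqn=iqn.1998-01.com.vmware:esx2
> disk add pool1/vmware_iscsi1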


> 
> Regards
> Gesiel
> 
> 
On Fri, 20 Sep 2019 at 20:13, Mike Christie wrote:
> 
> On 09/20/2019 01:52 PM, Gesiel Galvão Bernardes wrote:
> > Hi,
> > I'm testing Ceph with Vmware, using Ceph-iscsi gateway. I reading
> > documentation*  and have doubts some points:
> >
> > - If I understanded, in general terms, for each VMFS datastore in
> VMware
> > will match the an RBD image. (consequently in an RBD image I will
> > possible have many VMWare disks). Its correct?
> >
> > - In documentation is this: "gwcli requires a pool with the name
> rbd, so
> > it can store metadata like the iSCSI configuration". In part 4 of
> > "Configuration", have: "Add a RBD image with the name disk_1 in
> the pool
> > rbd". In this part, the use of "rbd" pool is a example and I could use
> > any pool for storage of image, or the pool should be "rbd"?
> > Resuming: gwcli require "rbd" pool for metadata and I could use
> any pool
> > for image, or i will use just "rbd pool" for storage image and
> metadata?
> >
> > - How much memory ceph-iscsi use? Which  is a good number of RAM?
> >
> 
> The major memory use is:
> 
> 1. In RHEL 7.5 kernels and older we allocate max_data_area_mb of kernel
> memory per device. The default value for that is 8. You can use gwcli to
> configure it. It is allocated when the device is created. In newer
> kernels, there is pool of memory and each device can use up to
> max_data_area_mb worth of it. The per device default is the same and you
> can change it with gwcli. The total pool limit is 2 GB. There is a sysfs
> file:
> 
> /sys/module/target_core_user/parameters/global_max_data_area_mb
> 
> that can be used to change it.
> 
> 2. Each device uses about 20 MB of memory in userspace. This is not
> configurable.
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] one read/write, many read only

2019-10-01 Thread khaled atteya
Hi,

Is it possible to do this scenario:
If someone opens a file first, they get read/write permission, and others
get read-only permission if they open the file after the first one.

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RAM recommendation with large OSDs?

2019-10-01 Thread Darrell Enns
The standard advice is "1GB RAM per 1TB of OSD". Does this actually still hold 
with large OSDs on bluestore? Can it be reasonably reduced with tuning?

From the docs, it looks like bluestore should target the "osd_memory_target" 
value by default. This is a fixed value (4GB by default), which does not 
depend on OSD size. So shouldn't the advice really be "4GB per OSD", rather 
than "1GB per TB"? Would it also be reasonable to reduce osd_memory_target for 
further RAM savings?

For example, suppose we have 90 12TB OSD drives:

  *   "1GB per TB" rule: 1080GB RAM
  *   "4GB per OSD" rule: 360GB RAM
  *   "2GB per OSD" (osd_memory_target reduced to 2GB): 180GB RAM

Those are some massively different RAM values. Perhaps the old advice was for 
filestore? Or is there something to consider beyond the bluestore memory 
target? What about when using very dense nodes (for example, 60 12TB OSDs on a 
single node)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-osd@n crash dumps

2019-10-01 Thread Del Monaco, Andrea
Hi list,

After the nodes ran OOM and after reboot, we are not able to restart the 
ceph-osd@x services anymore. (Details about the setup at the end).

I am trying to do this manually so we can see the error, but all I see is 
several crash dumps - this is just one of the OSDs which is not starting. Any 
idea how to get past this?
[root@ceph001 ~]# /usr/bin/ceph-osd --debug_osd 10 -f --cluster ceph --id 83 
--setuser ceph --setgroup ceph  > /tmp/dump 2>&1
starting osd.83 at - osd_data /var/lib/ceph/osd/ceph-83 
/var/lib/ceph/osd/ceph-83/journal
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
 In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 
2aaf5540 time 2019-10-01 14:19:49.494368
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
 34: FAILED assert(stripe_width % stripe_size == 0)
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14b) [0x2af3d36b]
 2: (()+0x26e4f7) [0x2af3d4f7]
 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, 
boost::intrusive_ptr&, ObjectStore*, CephContext*, 
std::shared_ptr, unsigned long)+0x46d) 
[0x55c0bd3d]
 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map, std::allocator > > const&, PGBackend::Listener*, coll_t, 
boost::intrusive_ptr&, ObjectStore*, 
CephContext*)+0x30a) [0x55b0ba8a]
 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr, 
PGPool const&, std::map, 
std::allocator > > const&, 
spg_t)+0x140) [0x55abd100]
 6: (OSD::_make_pg(std::shared_ptr, spg_t)+0x10cb) 
[0x55914ecb]
 7: (OSD::load_pgs()+0x4a9) [0x55917e39]
 8: (OSD::init()+0xc99) [0x559238e9]
 9: (main()+0x23a3) [0x558017a3]
 10: (__libc_start_main()+0xf5) [0x2aaab77de495]
 11: (()+0x385900) [0x558d9900]
2019-10-01 14:19:49.500 2aaf5540 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
 In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 
2aaf5540 time 2019-10-01 14:19:49.494368
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
 34: FAILED assert(stripe_width % stripe_size == 0)

 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x14b) [0x2af3d36b]
 2: (()+0x26e4f7) [0x2af3d4f7]
 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, 
boost::intrusive_ptr&, ObjectStore*, CephContext*, 
std::shared_ptr, unsigned long)+0x46d) 
[0x55c0bd3d]
 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map, std::allocator > > const&, PGBackend::Listener*, coll_t, 
boost::intrusive_ptr&, ObjectStore*, 
CephContext*)+0x30a) [0x55b0ba8a]
 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr, 
PGPool const&, std::map, 
std::allocator > > const&, 
spg_t)+0x140) [0x55abd100]
 6: (OSD::_make_pg(std::shared_ptr, spg_t)+0x10cb) 
[0x55914ecb]
 7: (OSD::load_pgs()+0x4a9) [0x55917e39]
 8: (OSD::init()+0xc99) [0x559238e9]
 9: (main()+0x23a3) [0x558017a3]
 10: (__libc_start_main()+0xf5) [0x2aaab77de495]
 11: (()+0x385900) [0x558d9900]

*** Caught signal (Aborted) **
 in thread 2aaf5540 thread_name:ceph-osd
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x2aaab69765d0]
 2: (gsignal()+0x37) [0x2aaab77f22c7]
 3: (abort()+0x148) [0x2aaab77f39b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x248) [0x2af3d468]
 5: (()+0x26e4f7) [0x2af3d4f7]
 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, 
boost::intrusive_ptr&, ObjectStore*, CephContext*, 
std::shared_ptr, unsigned long)+0x46d) 
[0x55c0bd3d]
 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map, std::allocator > > const&, PGBackend::Listener*, coll_t, 
boost::intrusive_ptr&, ObjectStore*, 
CephContext*)+0x30a) [0x55b0ba8a]
 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr, 
PGPool const&, std::map, 
std::allocator > > const&, 
spg_t)+0x140) [0x55abd100]
 9: (OSD::_make_pg(std::shared_ptr, spg_t)+0x10cb) 
[0x55914ecb]
 10: (OSD::load_pgs()+0x4a9) [0x55917e39]
 11: (OSD::init()+0xc99) [0x559238e9]
 12: (main()+0x23a3) [0x558017a3]
 13: (__libc_start_main()+0xf5) [0x2aaab77de495]
 14: (()+0x385900) [0x558d9900]
2019-10-01 14:19:49.509 2aaf5540 -1 *** Caught signal (Aborted) *


[ceph-users] Re: one read/write, many read only

2019-10-01 Thread Robert LeBlanc
On Tue, Oct 1, 2019 at 9:10 AM khaled atteya  wrote:
>
> Hi,
>
> Is it possible to do this scenario :
> If one open a file first , he will get read/write permissions  and other will 
> get read-only permission if they open the file after the first one.

CephFS will follow POSIX. If a client requests an exclusive write lock
than no one else will be able to get a write lock. This does not mean
that other clients will automatically receive a read-only lock, it
depends on the client if it will fall back to read-only or fail if it
can't get a write lock.
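
A quick way to see the advisory-locking behaviour from two CephFS clients (the
path below is made up for the example):

# client 1: take an exclusive lock and hold it for a while
flock --exclusive /mnt/cephfs/shared.dat -c 'sleep 300'
# client 2: a non-blocking shared (read) lock fails while the exclusive lock is held
flock --shared --nonblock /mnt/cephfs/shared.dat -c 'cat /mnt/cephfs/shared.dat'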


Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Dashboard doesn't respond after failover

2019-10-01 Thread Matthew Stroud
For some reason the active MGR process just resets the connection after 
failover. Nothing really sticks out in the logs to explain this. However if I 
restart the MGR process, it will start responding just fine.

Thoughts?

Thanks,
Matt



CONFIDENTIALITY NOTICE: This message is intended only for the use and review of 
the individual or entity to which it is addressed and may contain information 
that is privileged and confidential. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message solely to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify 
sender immediately by telephone or return email. Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RAM recommendation with large OSDs?

2019-10-01 Thread Paul Emmerich
On Tue, Oct 1, 2019 at 6:12 PM Darrell Enns  wrote:
>
> The standard advice is “1GB RAM per 1TB of OSD”. Does this actually still 
> hold with large OSDs on bluestore?

No

> Can it be reasonably reduced with tuning?

Yes


> From the docs, it looks like bluestore should target the “osd_memory_target” 
> value by default. This is a fixed value (4GB by default), which does not 
> depend on OSD size. So shouldn’t the advice really by “4GB per OSD”, rather 
> than “1GB per TB”? Would it also be reasonable to reduce osd_memory_target 
> for further RAM savings?

Yes

> For example, suppose we have 90 12TB OSD drives:

Please don't put 90 drives in one node, that's not a good idea in
99.9% of the use cases.

>
> “1GB per TB” rule: 1080GB RAM
> “4GB per OSD” rule: 360GB RAM
> “2GB per OSD” (osd_memory_target reduced to 2GB): 180GB RAM
>
>
>
> Those are some massively different RAM values. Perhaps the old advice was for 
> filestore? Or there is something to consider beyond the bluestore memory 
> target? What about when using very dense nodes (for example, 60 12TB OSDs on 
> a single node)?

Keep in mind that it's only a target value, it will use more during
recovery if you set a low value.
We usually set a target of 3 GB per OSD and recommend 4 GB of RAM per OSD.
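
E.g., to set a 3 GB target cluster-wide:

ceph config set osd osd_memory_target 3221225472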

RAM saving trick: use fewer PGs than recommended.


Paul



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RAM recommendation with large OSDs?

2019-10-01 Thread Paul Emmerich
The problem with lots of OSDs per node is that this usually means you
have too few nodes. It's perfectly fine to run 60 OSDs per node if you
got a total of 1000 OSDs or so.
But I've seen too many setups with 3-5 nodes where each node runs 60
OSDs which makes no sense (and usually isn't even cheaper than more
nodes, especially once you consider the lost opportunity for running
erasure coding).

The usual backup cluster we are seeing is in the single-digit petabyte
range with about 12 to 24 disks per server running ~8+3 erasure
coding.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Oct 2, 2019 at 12:53 AM Darrell Enns  wrote:
>
> Thanks Paul. I was speaking more about total OSDs and RAM, rather than a 
> single node. However, I am considering building a cluster with a large 
> OSD/node count. This would be for archival use, with reduced performance and 
> availability requirements. What issues would you anticipate with a large 
> OSD/node count? Is the concern just the large rebalance if a node fails and 
> takes out a large portion of the OSDs at once?
>
> -Original Message-
> From: Paul Emmerich 
> Sent: Tuesday, October 01, 2019 3:00 PM
> To: Darrell Enns 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] RAM recommendation with large OSDs?
>
> On Tue, Oct 1, 2019 at 6:12 PM Darrell Enns  wrote:
> >
> > The standard advice is “1GB RAM per 1TB of OSD”. Does this actually still 
> > hold with large OSDs on bluestore?
>
> No
>
> > Can it be reasonably reduced with tuning?
>
> Yes
>
>
> > From the docs, it looks like bluestore should target the 
> > “osd_memory_target” value by default. This is a fixed value (4GB by 
> > default), which does not depend on OSD size. So shouldn’t the advice really 
> > by “4GB per OSD”, rather than “1GB per TB”? Would it also be reasonable to 
> > reduce osd_memory_target for further RAM savings?
>
> Yes
>
> > For example, suppose we have 90 12TB OSD drives:
>
> Please don't put 90 drives in one node, that's not a good idea in 99.9% of 
> the use cases.
>
> >
> > “1GB per TB” rule: 1080GB RAM
> > “4GB per OSD” rule: 360GB RAM
> > “2GB per OSD” (osd_memory_target reduced to 2GB): 180GB RAM
> >
> >
> >
> > Those are some massively different RAM values. Perhaps the old advice was 
> > for filestore? Or there is something to consider beyond the bluestore 
> > memory target? What about when using very dense nodes (for example, 60 12TB 
> > OSDs on a single node)?
>
> Keep in mind that it's only a target value, it will use more during recovery 
> if you set a low value.
> We usually set a target of 3 GB per OSD and recommend 4 GB of RAM per OSD.
>
> RAM saving trick: use fewer PGs than recommended.
>
>
> Paul
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: one read/write, many read only

2019-10-01 Thread iwesley
There is a lock on the object. If the file has not been closed for writing, the 
other clients can only read it.

Regards


> On Oct 2, 2019 - 12:09 AM, khaled.att...@gmail.com wrote:
>
>
> Hi,
> 
> Is it possible to do this scenario :
> If one open a file first , he will get read/write permissions  and other will 
> get read-only permission if they open the file after the first one.
> 
> Thanks
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io