[ceph-users] Re: cephadm/podman :: upgrade to pacific stuck

2021-04-08 Thread Adrian Sevcenco

Hi! (and thanks for taking your time to answer my email :) )

On 4/8/21 1:18 AM, Sage Weil wrote:

You would normally tell cephadm to deploy another mgr with 'ceph orch
apply mgr 2'.  In this case, the default placement policy for mgrs is
already either 2 or 3, though--the problem is that you only have 1
host in your cluster, and cephadm currently doesn't handle placing

i had the idea of temporarily starting a vm and deploying a temporary mgr there,
but unfortunately it seems that after the latest bios update the bios
settings were reset, with virtualization defaulting to off :(
and i do not know when i can get to my office again


multiple mgrs on a single host (the ports will conflict).  And upgrade
needs a standby.  So.. a single-host cephadm cluster won't upgrade
itself.

yup.. i searched for a way to define custom ports for the dashboard so as to start
a second mgr with non-clashing ports, but i did not find a way to do it
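
For reference, the dashboard module's port can be changed through the config
database; a minimal sketch, assuming the mgr/dashboard settings documented for
Octopus/Pacific (note this only covers the dashboard itself, other mgr module
ports such as prometheus would still clash):

ceph config set mgr mgr/dashboard/server_port 8081
ceph config set mgr mgr/dashboard/ssl_server_port 8444
# or scoped to a single mgr instance, <name> being that mgr daemon's name
ceph config set mgr mgr/dashboard/<name>/server_port 8081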


You can get around this by manually tweaking the mgr container.. vi
/var/lib/ceph/$fsid/mgr.$whatever/unit.run and change the container
image path on the docker or podman run line to be ceph/ceph:v16.2.0,
and then systemctl restart ceph-$fsid@mgr.$whatever

i wanted to say that i had already tried that, but when doing a grep to report
the result i noticed that in unit.run there are _2_ specifications of the image:
there is a
-e CONTAINER_IMAGE=docker.io/ceph/ceph:v16.2.0  (this is what i tried to change)
then towards the end of the line, after the -v specification for /etc/ceph/ceph.conf,
i saw a standalone docker.io/ceph/ceph:v15

after also changing this part i can see in podman ps that the mgr container
is started
with docker.io/ceph/ceph:v16.2.0
so, SUCCESS :)

But why is there a need for two specifications of the same thing?
(-e CONTAINER_IMAGE and then the bare image again)
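
For what it's worth, the -e CONTAINER_IMAGE=... entry is presumably just an
environment variable exported into the container (used for reporting), while the
bare image at the end of the podman/docker run line is what actually gets pulled
and run, so both need to agree. A minimal sketch of the whole manual edit,
assuming the old reference is exactly docker.io/ceph/ceph:v15 and using the fsid
and mgr name from this thread as placeholders:

fsid=d9f4c810-8270-11eb-97a7-faa3b09dcf67
name=mgr.sev.spacescience.ro.wpozds
# rewrite both occurrences of the image (the env var and the trailing run argument)
sed -i 's|docker.io/ceph/ceph:v15|docker.io/ceph/ceph:v16.2.0|g' \
    /var/lib/ceph/$fsid/$name/unit.run
systemctl restart ceph-$fsid@$name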

Thanks a lot!!
Adrian



Supporting automated single-node upgrades is high on the list.. we
hope to have it fixed soon.

s

On Thu, Apr 1, 2021 at 1:24 PM Adrian Sevcenco  wrote:


On 4/1/21 8:19 PM, Anthony D'Atri wrote:

I think what it’s saying is that it wants for more than one mgr daemon to be 
provisioned, so that it can failover

unfortunately that is not allowed as the port usage is clashing ...
i found out the name of the daemon by grepping the ps output (it would be nice
to have a ceph orch daemon ls)
and i stopped it .. but then the message was :
cluster:
id: d9f4c810-8270-11eb-97a7-faa3b09dcf67
health: HEALTH_WARN
no active mgr
Upgrade: Need standby mgr daemon

so, it seems that there is a specific requirement for the mgr daemon to be in a
state named "standby"

then i tried to start it again with:
ceph orch daemon start 

but the command is stuck ...

i tried to get the ceph:v16.2 image and
ceph orch daemon redeploy mgr ceph:v16.2.0

but it also is stuck?

so, what can i do? is there anything besides deleting everything and starting from
scratch?

Thank you!
Adrian



when the primary is restarted.  I suspect you would then run into the same 
thing with the mon.  All sorts of things
tend to crop up on a cluster this minimal.



On Apr 1, 2021, at 10:15 AM, Adrian Sevcenco  wrote:

Hi! I have a single machine ceph installation and after trying to update to 
pacific the upgrade is stuck with:

ceph -s
  cluster:
    id:     d9f4c810-8270-11eb-97a7-faa3b09dcf67
    health: HEALTH_WARN
            Upgrade: Need standby mgr daemon

  services:
    mon: 1 daemons, quorum sev.spacescience.ro (age 3w)
    mgr: sev.spacescience.ro.wpozds(active, since 2w)
    mds: sev-ceph:1 {0=sev-ceph.sev.vmvwrm=up:active}
    osd: 2 osds: 2 up (since 2w), 2 in (since 2w)

  data:
    pools:   4 pools, 194 pgs
    objects: 32 objects, 8.4 KiB
    usage:   2.0 GiB used, 930 GiB / 932 GiB avail
    pgs:     194 active+clean

  progress:
    Upgrade to docker.io/ceph/ceph:v16.2.0 (0s)
      []

How can i put the mgr on standby? so far i did not find anything relevant..

Thanks a lot! Adrian


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
--
Adrian Sevcenco, Ph.D.   |
Institute of Space Science - ISS, Romania|
adrian.sevcenco at {cern.ch,spacescience.ro} |
--



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: KRBD failed to mount rbd image if mapping it to the host with read-only option

2021-04-08 Thread Wido den Hollander



On 08/04/2021 14:09, Ha, Son Hai wrote:

Hi everyone,

We encountered an issue with KRBD mounting after mapping an image to the host with
the read-only option.
We tried to pinpoint where the problem is, but were not able to.



See my reply down below.


The image is mounted well if we map it without the "read-only" option.
This leads to an issue that the pod in k8s cannot use the snapshotted 
persistent volume created by ceph-csi rbd provisioner.
Thank you for reading.

I have reported the bug here: Bug #50234: krbd failed to mount after map image with 
read-only option - Ceph - Ceph

Context
- Using admin keyring
- Linux Kernel: 3.10.0-1160.15.2.el7.x86_64
- Linux Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
- Ceph version: "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) 
nautilus (stable)"

rbd image 'csi-vol-85919409-9797-11eb-80ba-720b2b57c790':
 size 10 GiB in 2560 objects
 order 22 (4 MiB objects)
 snapshot_count: 0
 id: 533a03bba388ea
 block_name_prefix: rbd_data.533a03bba388ea
 format: 2
 features: layering
 op_features:
 flags:
 create_timestamp: Wed Apr  7 13:51:02 2021
 access_timestamp: Wed Apr  7 13:51:02 2021
 modify_timestamp: Wed Apr  7 13:51:02 2021

Bug Reproduction
# Map RBD image WITH read-only option, CANNOT mount with both readonly or 
readwrite option
sudo rbd device map -p k8s-sharedpool 
csi-vol-85919409-9797-11eb-80ba-720b2b57c790 -ro
   /dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
   mount: cannot mount /dev/rbd0 read-only

sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
   mount: /dev/rbd0 is write-protected, mounting read-only
   mount: cannot mount /dev/rbd0 read-only



ext4 will always try to recover its journal during mount and this means
it wants to write. That fails.


Try this with mounting:

sudo mount -t ext4 -o norecover /dev/rbd0 /mnt/test1

or

sudo mount -t ext4 -o noload /dev/rbd0 /mnt/test1

Wido


# Map RBD image WITHOUT read-only option, CAN mount with both readonly or 
readwrite option
sudo rbd device map -p k8s-sharedpool 
csi-vol-85919409-9797-11eb-80ba-720b2b57c790
   /dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1

   mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system.  For more details see restorecon(8) and mount(8).
   mount: /dev/rbd0 mounted on /mnt/test1.

sudo mount -v -t ext4 /dev/rbd0 /mnt/test1
   mount: /mnt/test1 does not contain SELinux labels.
You just mounted an file system that supports labels which does not
contain labels, onto an SELinux box. It is likely that confined
applications will generate AVC messages and not be allowed access to
this file system.  For more details see restorecon(8) and mount(8).
   mount: /dev/rbd0 mounted on /mnt/test1.

With my best regards,
Son Hai Ha


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



[ceph-users] Re: Ceph CFP Coordination for 2021

2021-04-08 Thread Mike Perez
KubeCon NA has extended their CFP dates to May 23rd.

https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/cfp/#overview

DevConf.US also has its CFP open until May 31st.

https://www.devconf.info/us/

And lastly, we have Cloud-Native Data Management Day on May 4th with
its CFP open:

https://cndmday.com/

Please make sure to coordinate on the CFP etherpad:

https://pad.ceph.com/p/cfp-coordination


On Fri, Mar 26, 2021 at 7:22 AM Mike Perez  wrote:
>
> Hi everyone,
>
> I cleaned up the CFP coordination etherpad with some events coming up.
> Please add other events you think the community should be considering
> proposing content on Ceph or adjacent projects like Rook.
>
> KubeCon NA CFP, for example, is ending April 11. Take a look:
>
> https://pad.ceph.com/p/cfp-coordination
>
> I have also added this to our wiki for discovery.
>
> https://tracker.ceph.com/projects/ceph/wiki/Community
>
> --
> Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] KRBD failed to mount rbd image if mapping it to the host with read-only option

2021-04-08 Thread Ha, Son Hai
Hi everyone,

We encountered an issue with KRBD mounting after mapping an image to the host with
the read-only option.
We tried to pinpoint where the problem is, but were not able to.

The image is mounted well if we map it without the "read-only" option.
This leads to an issue that the pod in k8s cannot use the snapshotted 
persistent volume created by ceph-csi rbd provisioner.
Thank you for reading.

I have reported the bug here: Bug #50234: krbd failed to mount after map image 
with read-only option - Ceph - Ceph

Context
- Using admin keyring
- Linux Kernel: 3.10.0-1160.15.2.el7.x86_64
- Linux Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
- Ceph version: "ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) 
nautilus (stable)"

rbd image 'csi-vol-85919409-9797-11eb-80ba-720b2b57c790':
size 10 GiB in 2560 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 533a03bba388ea
block_name_prefix: rbd_data.533a03bba388ea
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Apr  7 13:51:02 2021
access_timestamp: Wed Apr  7 13:51:02 2021
modify_timestamp: Wed Apr  7 13:51:02 2021

Bug Reproduction
# Map RBD image WITH read-only option, CANNOT mount with both readonly or 
readwrite option
sudo rbd device map -p k8s-sharedpool 
csi-vol-85919409-9797-11eb-80ba-720b2b57c790 -ro
  /dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
  mount: cannot mount /dev/rbd0 read-only

sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
  mount: /dev/rbd0 is write-protected, mounting read-only
  mount: cannot mount /dev/rbd0 read-only

# Map RBD image WITHOUT read-only option, CAN mount with both readonly or 
readwrite option
sudo rbd device map -p k8s-sharedpool 
csi-vol-85919409-9797-11eb-80ba-720b2b57c790
  /dev/rbd0
sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1

  mount: /mnt/test1 does not contain SELinux labels.
   You just mounted an file system that supports labels which does not
   contain labels, onto an SELinux box. It is likely that confined
   applications will generate AVC messages and not be allowed access to
   this file system.  For more details see restorecon(8) and mount(8).
  mount: /dev/rbd0 mounted on /mnt/test1.

sudo mount -v -t ext4 /dev/rbd0 /mnt/test1
  mount: /mnt/test1 does not contain SELinux labels.
   You just mounted an file system that supports labels which does not
   contain labels, onto an SELinux box. It is likely that confined
   applications will generate AVC messages and not be allowed access to
   this file system.  For more details see restorecon(8) and mount(8).
  mount: /dev/rbd0 mounted on /mnt/test1.

With my best regards,
Son Hai Ha


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
converted the last batch of FileStore OSDs to BlueStore about 36 hours ago.
Yesterday our monitor cluster went nuts and started constantly calling
elections because monitor nodes were at 100% and wouldn't respond to
heartbeats. I reduced the monitor cluster to one to prevent the constant
elections and that let the system limp along until the backfills finished.
There are large stretches of time where ceph commands hang while the CPU is at
100%; when the CPU drops I see a lot of work getting done in the monitor
logs, which stops as soon as the CPU is at 100% again.

I did a `perf top` on the node to see what's taking all the time and it
appears to be in the rocksdb code path. I've set `mon_compact_on_start =
true` in the ceph.conf but that does not appear to help. The
`/var/lib/ceph/mon/` directory is 311MB which is down from 3.0 GB while the
backfills were going on. I've tried adding a second monitor, but it goes
back to the constant elections. I tried restarting all the services without
luck. I also pulled the monitor off the network and tried restarting
the mon service in isolation (this helped a couple of weeks ago when `ceph -s`
would cause 100% CPU and lock up the service much worse than this) and
didn't see the high CPU load. So I'm guessing it's triggered from some
external source.
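
As an aside, the mon store can also be compacted on demand without a restart; a
minimal sketch, assuming the mon id matches the short hostname:

ceph tell mon.$(hostname -s) compact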

I'm happy to provide more info, just let me know what would be helpful.

Thank you,
Robert LeBlanc

[image: image.png]

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc  wrote:
>
> I upgraded our Luminous cluster to Nautilus a couple of weeks ago and 
> converted the last batch of FileStore OSDs to BlueStore about 36 hours ago. 
> Yesterday our monitor cluster went nuts and started constantly calling 
> elections because monitor nodes were at 100% and wouldn't respond to 
> heartbeats. I reduced the monitor cluster to one to prevent the constant 
> elections and that let the system limp along until the backfills finished. 
> There are large amounts of time where ceph commands hang with the CPU is at 
> 100%, when the CPU drops I see a lot of work getting done in the monitor logs 
> which stops as soon as the CPU is at 100% again.
>
> I did a `perf top` on the node to see what's taking all the time and it 
> appears to be in the rocksdb code path. I've set `mon_compact_on_start = 
> true` in the ceph.conf but that does not appear to help. The 
> `/var/lib/ceph/mon/` directory is 311MB which is down from 3.0 GB while the 
> backfills were going on. I've tried adding a second monitor, but it goes back 
> to the constant elections. I tried restarting all the services without luck. 
> I also pulled the monitor from the network work and tried restarting the mon 
> service isolated (this helped a couple of weeks ago when `ceph -s` would 
> cause 100% CPU and lock up the service much worse than this) and didn't see 
> the high CPU load. So I'm guessing it's triggered from some external source.
>
> I'm happy to provide more info, just let me know what would be helpful.

Sent this to the dev list, but forgot it needed to be plain text. Here
is text output of the `perf top` taken a bit later, so not exactly the
same as the screenshot earlier.

Samples: 20M of event 'cycles', 4000 Hz, Event count (approx.):
61966526527 lost: 0/0 drop: 0/0
Overhead  Shared Object Symbol
 11.52%  ceph-mon  [.]
rocksdb::MemTable::KeyComparator::operator()
  6.80%  ceph-mon  [.]
rocksdb::MemTable::KeyComparator::operator()
  4.75%  ceph-mon  [.]
rocksdb::InlineSkipList::FindGreaterOrEqual
  2.89%  libc-2.27.so  [.] vfprintf
  2.54%  libtcmalloc.so.4.3.0  [.] tc_deletearray_nothrow
  2.31%  ceph-mon  [.] TLS init
function for rocksdb::perf_context
  2.14%  ceph-mon  [.] rocksdb::DBImpl::GetImpl
  1.53%  libc-2.27.so  [.] 0x0018acf8
  1.44%  libc-2.27.so  [.] _IO_default_xsputn
  1.34%  ceph-mon  [.] memcmp@plt
  1.32%  libtcmalloc.so.4.3.0  [.] tc_malloc
  1.28%  ceph-mon  [.] rocksdb::Version::Get
  1.27%  libc-2.27.so  [.] 0x0018abf4
  1.17%  ceph-mon  [.] RocksDBStore::get
  1.08%  ceph-mon  [.] 0x00639a33
  1.04%  ceph-mon  [.] 0x00639a0e
  0.89%  ceph-mon  [.] 0x00639a46
  0.86%  ceph-mon  [.] rocksdb::TableCache::Get
  0.72%  libc-2.27.so  [.] 0x0018abfe
  0.68%  libceph-common.so.0   [.] ceph_str_hash_rjenkins
  0.66%  ceph-mon  [.] rocksdb::Hash
  0.63%  ceph-mon  [.] rocksdb::MemTable::Get
  0.62%  ceph-mon  [.] 0x006399ff
  0.57%  libc-2.27.so  [.] 0x0018abf0
  0.57%  ceph-mon  [.]
rocksdb::GetContext::GetContext
  0.57%  ceph-mon  [.]
rocksdb::BlockBasedTable::Get
  0.57%  ceph-mon  [.]
rocksdb::BlockBasedTable::GetFilter
  0.55%  [vdso][.] __vdso_clock_gettime
  0.54%  ceph-mon  [.] 0x005afa17
  0.53%  ceph-mgr  [.]
std::_Rb_tree, std::less,
std::allocator >::equal_range
  0.51%  libceph-common.so.0   [.] PerfCounters::tinc
  0.50%  ceph-mon  [.]
OSDMonitor::make_snap_epoch_key[abi:cxx11]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
On Thu, Apr 8, 2021 at 11:24 AM Robert LeBlanc  wrote:
>
> On Thu, Apr 8, 2021 at 10:22 AM Robert LeBlanc  wrote:
> >
> > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and 
> > converted the last batch of FileStore OSDs to BlueStore about 36 hours ago. 
> > Yesterday our monitor cluster went nuts and started constantly calling 
> > elections because monitor nodes were at 100% and wouldn't respond to 
> > heartbeats. I reduced the monitor cluster to one to prevent the constant 
> > elections and that let the system limp along until the backfills finished. 
> > There are large amounts of time where ceph commands hang with the CPU is at 
> > 100%, when the CPU drops I see a lot of work getting done in the monitor 
> > logs which stops as soon as the CPU is at 100% again.
> >
> > I did a `perf top` on the node to see what's taking all the time and it 
> > appears to be in the rocksdb code path. I've set `mon_compact_on_start = 
> > true` in the ceph.conf but that does not appear to help. The 
> > `/var/lib/ceph/mon/` directory is 311MB which is down from 3.0 GB while the 
> > backfills were going on. I've tried adding a second monitor, but it goes 
> > back to the constant elections. I tried restarting all the services without 
> > luck. I also pulled the monitor from the network work and tried restarting 
> > the mon service isolated (this helped a couple of weeks ago when `ceph -s` 
> > would cause 100% CPU and lock up the service much worse than this) and 
> > didn't see the high CPU load. So I'm guessing it's triggered from some 
> > external source.
> >
> > I'm happy to provide more info, just let me know what would be helpful.
>
> Sent this to the dev list, but forgot it needed to be plain text. Here
> is text output of the `perf top` taken a bit later, so not exactly the
> same as the screenshot earlier.
>
> Samples: 20M of event 'cycles', 4000 Hz, Event count (approx.):
> 61966526527 lost: 0/0 drop: 0/0
> Overhead  Shared Object Symbol
>  11.52%  ceph-mon  [.]
> rocksdb::MemTable::KeyComparator::operator()
>   6.80%  ceph-mon  [.]
> rocksdb::MemTable::KeyComparator::operator()
>   4.75%  ceph-mon  [.]
> rocksdb::InlineSkipList const&>::FindGreaterOrEqual
>   2.89%  libc-2.27.so  [.] vfprintf
>   2.54%  libtcmalloc.so.4.3.0  [.] tc_deletearray_nothrow
>   2.31%  ceph-mon  [.] TLS init
> function for rocksdb::perf_context
>   2.14%  ceph-mon  [.] 
> rocksdb::DBImpl::GetImpl
>   1.53%  libc-2.27.so  [.] 0x0018acf8
>   1.44%  libc-2.27.so  [.] _IO_default_xsputn
>   1.34%  ceph-mon  [.] memcmp@plt
>   1.32%  libtcmalloc.so.4.3.0  [.] tc_malloc
>   1.28%  ceph-mon  [.] rocksdb::Version::Get
>   1.27%  libc-2.27.so  [.] 0x0018abf4
>   1.17%  ceph-mon  [.] RocksDBStore::get
>   1.08%  ceph-mon  [.] 0x00639a33
>   1.04%  ceph-mon  [.] 0x00639a0e
>   0.89%  ceph-mon  [.] 0x00639a46
>   0.86%  ceph-mon  [.] 
> rocksdb::TableCache::Get
>   0.72%  libc-2.27.so  [.] 0x0018abfe
>   0.68%  libceph-common.so.0   [.] ceph_str_hash_rjenkins
>   0.66%  ceph-mon  [.] rocksdb::Hash
>   0.63%  ceph-mon  [.] rocksdb::MemTable::Get
>   0.62%  ceph-mon  [.] 0x006399ff
>   0.57%  libc-2.27.so  [.] 0x0018abf0
>   0.57%  ceph-mon  [.]
> rocksdb::GetContext::GetContext
>   0.57%  ceph-mon  [.]
> rocksdb::BlockBasedTable::Get
>   0.57%  ceph-mon  [.]
> rocksdb::BlockBasedTable::GetFilter
>   0.55%  [vdso][.] __vdso_clock_gettime
>   0.54%  ceph-mon  [.] 0x005afa17
>   0.53%  ceph-mgr  [.]
> std::_Rb_tree, std::less,
> std::allocator >::equal_range
>   0.51%  libceph-common.so.0   [.] PerfCounters::tinc
>   0.50%  ceph-mon  [.]
> OSDMonitor::make_snap_epoch_key[abi:cxx11]

Okay, I think I sent it to the old dev list. Trying again.

Thank you,
Robert LeBlanc
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: KRBD failed to mount rbd image if mapping it to the host with read-only option

2021-04-08 Thread Ha, Son Hai
Thank you. The option "noload" works as expected.

-Original Message-
From: Wido den Hollander  
Sent: Thursday, April 8, 2021 3:56 PM
To: Ha, Son Hai; ceph-users@ceph.io; ceph-us...@lists.ceph.com
Subject: Re: [ceph-users] KRBD failed to mount rbd image if mapping it to the 
host with read-only option



On 08/04/2021 14:09, Ha, Son Hai wrote:
> Hi everyone,
> 
> We encountered an issue with KRBD mounting after mapping it to the host with 
> read-only option.
> We try to pinpoint where the problem is, but not able to do it.
> 

See my reply down below.

> The image is mounted well if we map it without the "read-only" option.
> This leads to an issue that the pod in k8s cannot use the snapshotted 
> persistent volume created by ceph-csi rbd provisioner.
> Thank you for reading.
> 
> I have reported the bug here: Bug #50234: krbd failed to mount after 
> map image with read-only option - Ceph - 
> Ceph
> 
> Context
> - Using admin keyring
> - Linux Kernel: 3.10.0-1160.15.2.el7.x86_64
> - Linux Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
> - Ceph version: "ceph version 14.2.8 
> (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable)"
> 
> rbd image 'csi-vol-85919409-9797-11eb-80ba-720b2b57c790':
>  size 10 GiB in 2560 objects
>  order 22 (4 MiB objects)
>  snapshot_count: 0
>  id: 533a03bba388ea
>  block_name_prefix: rbd_data.533a03bba388ea
>  format: 2
>  features: layering
>  op_features:
>  flags:
>  create_timestamp: Wed Apr  7 13:51:02 2021
>  access_timestamp: Wed Apr  7 13:51:02 2021
>  modify_timestamp: Wed Apr  7 13:51:02 2021
> 
> Bug Reproduction
> # Map RBD image WITH read-only option, CANNOT mount with both readonly 
> or readwrite option sudo rbd device map -p k8s-sharedpool 
> csi-vol-85919409-9797-11eb-80ba-720b2b57c790 -ro
>/dev/rbd0
> sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
>mount: cannot mount /dev/rbd0 read-only
> 
> sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
>mount: /dev/rbd0 is write-protected, mounting read-only
>mount: cannot mount /dev/rbd0 read-only
> 

ext4 will always try to recover it's journal during mount and this means it 
wants to write. That fails.

Try this with mounting:

sudo mount -t ext4 -o norecover /dev/rbd0 /mnt/test1

or

sudo mount -t ext4 -o noload /dev/rbd0 /mnt/test1

Wido

> # Map RBD image WITHOUT read-only option, CAN mount with both readonly 
> or readwrite option sudo rbd device map -p k8s-sharedpool 
> csi-vol-85919409-9797-11eb-80ba-720b2b57c790
>/dev/rbd0
> sudo mount -v -r -t ext4 /dev/rbd0 /mnt/test1
> 
>mount: /mnt/test1 does not contain SELinux labels.
> You just mounted an file system that supports labels which does not
> contain labels, onto an SELinux box. It is likely that confined
> applications will generate AVC messages and not be allowed access to
> this file system.  For more details see restorecon(8) and mount(8).
>mount: /dev/rbd0 mounted on /mnt/test1.
> 
> sudo mount -v -t ext4 /dev/rbd0 /mnt/test1
>mount: /mnt/test1 does not contain SELinux labels.
> You just mounted an file system that supports labels which does not
> contain labels, onto an SELinux box. It is likely that confined
> applications will generate AVC messages and not be allowed access to
> this file system.  For more details see restorecon(8) and mount(8).
>mount: /dev/rbd0 mounted on /mnt/test1.
> 
> With my best regards,
> Son Hai Ha
> 
> 

[ceph-users] Nautilus 14.2.19 radosgw ignoring ceph config

2021-04-08 Thread Graham Allan
We just updated one of our ceph clusters from 14.2.15 to 14.2.19, and 
see some unexpected behavior by radosgw - it seems to ignore parameters 
set by the ceph config database. Specifically this is making it start up 
listening only on port 7480, and not the configured 80 and 443 (ssl) ports.


Downgrading ceph on the rgw nodes back to 14.2.15 restores the expected 
behavior (I haven't yet tried any intermediate versions). The host OS is 
CentOS 7, if that matters...


Here's a ceph config dump for one of the affected nodes, along with the 
radosgw startup log:



# ceph config dump | grep tier2-gw02
client.rgw.tier2-gw02  basic     log_file              /var/log/ceph/radosgw.log                                          *
client.rgw.tier2-gw02  advanced  rgw_dns_name          s3.msi.umn.edu                                                     *
client.rgw.tier2-gw02  advanced  rgw_enable_usage_log  true
client.rgw.tier2-gw02  basic     rgw_frontends         beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/civetweb.pem  *
client.rgw.tier2-gw02  basic     rgw_thread_pool_size  512




# tail /var/log/ceph/radosgw.log
2021-04-08 11:51:07.956 7f420b78f700 -1 received  signal: Terminated from 
/usr/lib/systemd/systemd --switched-root --system --deserialize 22  (PID: 1) 
UID: 0
2021-04-08 11:51:07.956 7f420b78f700  1 handle_sigterm
2021-04-08 11:51:07.956 7f4220bc5900 -1 shutting down
2021-04-08 11:51:07.956 7f420b78f700  1 handle_sigterm set alarm for 120
2021-04-08 11:51:08.010 7f4220bc5900  1 final shutdown
2021-04-08 11:51:08.159 7f2ac6105900  0 deferred set uid:gid to 167:167 
(ceph:ceph)
2021-04-08 11:51:08.159 7f2ac6105900  0 ceph version 14.2.19 
(bb796b9b5bab9463106022eef406373182465d11) nautilus (stable), process radosgw, 
pid 88256
2021-04-08 11:51:08.300 7f2ac6105900  0 starting handler: beast
2021-04-08 11:51:08.302 7f2ac6105900  0 set uid:gid to 167:167 (ceph:ceph)
2021-04-08 11:51:08.317 7f2ac6105900  1 mgrc service_daemon_register 
rgw.tier2-gw02 metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph 
version 14.2.19 (bb796b9b5bab9463106022eef406373182465d11) nautilus 
(stable),ceph_version_short=14.2.19,cpu=AMD EPYC 7302P 16-Core 
Processor,distro=centos,distro_description=CentOS Linux 7 
(Core),distro_version=7,frontend_config#0=beast 
port=7480,frontend_type#0=beast,hostname=tier2-gw02.msi.umn.edu,kernel_description=#1
 SMP Tue Mar 16 18:28:22 UTC 
2021,kernel_version=3.10.0-1160.21.1.el7.x86_64,mem_swap_kb=4194300,mem_total_kb=131754828,num_handles=1,os=Linux,pid=88256,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}


BTW I can also change "rgw_frontends" to specify a civetweb frontend 
instead and it will still start the default beast...


I haven't seen anyone else report such a problem so I wonder if this is 
something local to us - like perhaps I'm using "ceph config" incorrectly 
in a way which happened to be accepted before? Has anyone else seen this 
behavior?


Graham
--
Graham Allan - g...@umn.edu
Associate Director of Operations - Minnesota Supercomputing Institute
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Version of podman for Ceph 15.2.10

2021-04-08 Thread mabi
Hello,

I would like to install Ceph 15.2.10 using cephadm and just found the following 
table by checking the requirements on the host:

https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-podman-versions

Do I understand this table correctly that I should be using podman version 2.1?

And what happens if I use the latest podman version, 3.0?

Best regards,
Mabi

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
I found this thread that matches a lot of what I'm seeing. I see the
ms_dispatch thread going to 100%, but I'm at a single MON, the
recovery is done and the rocksdb MON database is ~300MB. I've tried
all the settings mentioned in that thread with no noticeable
improvement. I was hoping that once the recovery was done (backfills
to reformatted OSDs) that it would clear up, but not yet. So any other
ideas would be really helpful. Our MDS is functioning, but stalls a
lot because the mons miss heartbeats.

mon_compact_on_start = true
rocksdb_cache_size = 1342177280
mon_lease = 30
mon_osd_cache_size = 20
mon_sync_max_payload_size = 4096
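
For reference, a sketch of applying such settings at runtime through the config
database instead of ceph.conf, assuming Nautilus's centralized config (some of
them, like mon_compact_on_start, only take effect on the next mon start):

ceph config set mon mon_sync_max_payload_size 4096
ceph config set mon mon_compact_on_start true
ceph config set mon mon_lease 30
systemctl restart ceph-mon@$(hostname -s)   # pick up the start-time options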


Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Thu, Apr 8, 2021 at 1:11 PM Stefan Kooman  wrote:
>
> On 4/8/21 6:22 PM, Robert LeBlanc wrote:
> > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
> > converted the last batch of FileStore OSDs to BlueStore about 36 hours
> > ago. Yesterday our monitor cluster went nuts and started constantly
> > calling elections because monitor nodes were at 100% and wouldn't
> > respond to heartbeats. I reduced the monitor cluster to one to prevent
> > the constant elections and that let the system limp along until the
> > backfills finished. There are large amounts of time where ceph commands
> > hang with the CPU is at 100%, when the CPU drops I see a lot of work
> > getting done in the monitor logs which stops as soon as the CPU is at
> > 100% again.
>
>
> Try reducing mon_sync_max_payload_size=4096. I have seen Frank Schilder
> advise this several times because of monitor issues. Also recently for a
> cluster that got upgraded from Luminous -> Mimic -> Nautilus.
>
> Worth a shot.
>
> Otherwise I'll try to look in depth and see if I can come up with
> something smart (for now I need to go catch some sleep).
>
> Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_min_alloc_size_hdd on Octopus (15.2.10) / XFS formatted RBDs

2021-04-08 Thread Igor Fedotov

Hi David,


On 4/7/2021 7:43 PM, David Orman wrote:

Now that the hybrid allocator appears to be enabled by default in
Octopus, is it safe to change bluestore_min_alloc_size_hdd to 4k from
64k on Octopus 15.2.10 clusters, and then redeploy every OSD to switch
to the smaller allocation size, without massive performance impact for
RBD? We're seeing a lot of storage usage amplification on EC 8+3
clusters which are HDD backed that lines up with a lot of the mailing
list posts we've seen here. Upgrading to Pacific before making this
change is also a possibility once a more stable release arrives, if
that's necessary.


I wouldn't recommend switching to a 4K min alloc size for pre-Pacific
clusters. Additional fixes besides the hybrid allocator are required to
avoid performance degradation.


And we decided not to backport those changes to Octopus as they look too
complicated.





Second part of this question - we are using RBDs currently on the
clusters impacted. These have XFS filesystems on top, which detect the
sector size of the RBD as 512byte, and XFS has a block size of 4k.
With the default of 64k for bluestore_min_alloc_size_hdd, let's say a
1G file is written out to the XFS filesystem backed by the RBD. On the
ceph side, is this being seen as a lot of 4k objects, with significant
space waste occurring, or is RBD able to coalesce these
into 64k objects, even though XFS is using a 4k block size?

XFS details below, you can see the allocation groups are quite large:

meta-data=/dev/rbd0  isize=512agcount=501, agsize=268435440 blks
  =   sectsz=512   attr=2, projid32bit=1
  =   crc=1finobt=1, sparse=1, rmapbt=0
  =   reflink=1
data =   bsize=4096   blocks=134217728000, imaxpct=1
  =   sunit=16 swidth=16 blks
naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=521728, version=2
  =   sectsz=512   sunit=16 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0

I'm curious if people have been tuning XFS on RBD for better
performance, as well.


I presume that the actual write block sizes are determined primarily by the
application - e.g. whether buffered/direct I/O is in use and how often
flush/sync calls are made.


Speculating rather than knowing for sure, though.




Thank you!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Abandon incomplete (damaged EC) pgs - How to manage the impact on cephfs?

2021-04-08 Thread Joshua West
Hey everyone.

Inside of cephfs, I have a directory on which I set up a directory layout
field to use an erasure coded (CLAY) pool, specific to the task. The
rest of my cephfs is using normal replication.

Fast forward some time, and the EC directory has been used pretty
extensively, and through some bad luck and poor timing, ~200pgs are in
an incomplete state, and the OSDs are completely gone and
unrecoverable. (Specifically OSD 31 and 34, not that it matters at
this point)

# ceph pg ls incomplete --> is attached for reference.

Fortunately, it's primarily (only) my on-site backups, and other
replaceable data inside of

I tried for a few days to recover the PGs:
 - Recreate blank OSDs with the correct ID (was blocked by non-existent OSDs)
 - Deep Scrub
 - osd_find_best_info_ignore_history_les = true (`pg query` was
showing related error)
etc.

I've finally just accepted this pool to be a lesson learned, and want
to get the rest of my cephfs back to normal.

My questions:

 -- `ceph osd force-create-pg` doesn't appear to fix pgs, even for pgs
with 0 objects
 -- Deleting the pool seems like an appropriate step, but as I am
using an xattr within cephfs, which is otherwise on another pool, I am
not confident that this approach is safe?
 -- cephfs currently blocks when attempting to impact every third file
in the EC directory. Once I delete the pool, how will I remove the
files if even `rm` is blocking?

Thank you for your time,

Joshua West
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Stefan Kooman

On 4/8/21 6:22 PM, Robert LeBlanc wrote:
I upgraded our Luminous cluster to Nautilus a couple of weeks ago and 
converted the last batch of FileStore OSDs to BlueStore about 36 hours 
ago. Yesterday our monitor cluster went nuts and started constantly 
calling elections because monitor nodes were at 100% and wouldn't 
respond to heartbeats. I reduced the monitor cluster to one to prevent 
the constant elections and that let the system limp along until the 
backfills finished. There are large amounts of time where ceph commands 
hang with the CPU is at 100%, when the CPU drops I see a lot of work 
getting done in the monitor logs which stops as soon as the CPU is at 
100% again.



Try reducing mon_sync_max_payload_size=4096. I have seen Frank Schilder 
advise this several times because of monitor issues. Also recently for a 
cluster that got upgraded from Luminous -> Mimic -> Nautilus.


Worth a shot.

Otherwise I'll try to look in depth and see if I can come up with 
something smart (for now I need to go catch some sleep).


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] short pages when listing RADOSGW buckets via Swift API

2021-04-08 Thread Paul Collins
Hi,

I noticed while using rclone to migrate some data from a Swift
cluster into a RADOSGW cluster that, when listing a bucket, RADOSGW
will sometimes return fewer results than specified
by the "limit" parameter, even when more objects remain to list.

This results in rclone believing on subsequent runs that the
objects do not exist, since it performs an initial comparison
based on bucket listings, and so it needlessly recopies data.

This seems contrary to how pagination is specified by Swift:

https://docs.openstack.org/swift/latest/api/pagination.html

Is this known behaviour, or should I go ahead and file a bug?

I believe the cluster is running 15.2.8 or so, but will confirm.

Thanks,
Paul

---

Further observations:

 * Here's a summary of the reply lengths I got when listing
   various buckets in our RADOSGW cluster.  (This is not all of
   the buckets in the tenant; the other 100 or so are fine.)

reply lengths: 1000 999 1000 1000 1000 1000 1000 1000 1000 1000 119
reply lengths: 1000 992 1000 1000 1000 1000 1000 935 1000 1000 257
reply lengths: 1000 1000 1000 1000 1000 975 1000 948
reply lengths: 953 1000 1000 1000 1000 1000 954 1000 1000 70
reply lengths: 1000 1000 1000 1000 998 15
reply lengths: 1000 1000 1000 1000 974 1000 1000 1000 1000 1000 1000 1000 1000 
1000 1000 1000 939 1000 1000 1000 1000 949 1000 1000 1000 644
reply lengths: 1000 1000 1000 1000 999 1000 1000 937 1000 1000 538
reply lengths: 1000 998 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 
551
reply lengths: 1000 1000 1000 1000 1000 1000 1000 931 1000 986 1000 1000 1000 
975 1000 989 1000 1000 1000 966 1000 998 921 994 1000 1000 973 58
reply lengths: 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 976 
1000 366
reply lengths: 1000 1000 1000 1000 1000 983 1000 1000 1000 1000 1000 1000 1000 
517
reply lengths: 1000 1000 1000 984 1000 1000 971 1000 1000 401
reply lengths: 949 1000 1000 1000 1000 1000 1000 403
reply lengths: 1000 998 532
reply lengths: 951 1000 1000 1000 1000 1000 976 1000 877

 * rclone uses a default $limit of 1,000, in contrast to the
   Python swiftclient's default of 10,000.

 * The Swift API doc seems clear that $limit results should always
   be returned if at least $limit results are available, and that
   receiving fewer than $limit results indicates no more exist.

   (It doesn't *explicitly* say the last, but the document could
   be a lot shorter if it were not intended for that to follow.)

 * When swiftclient is asked to fetch a listing, and full_listing
   is set to True, instead of implementing pagingation as
   described in the document above, swiftclient simply keeps
   fetching pages until it receives an empty page.

   So Swift API implementations that don't strictly implement
   paging per the docs may not even be noticed by most users.

 * From a review of its code, swiftclient seems to have done this
   since the very beginning.  Perhaps the code was written first
   and then pagination on the server side was nailed down later?

-- 
Paul Collins
Wellington, New Zealand
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Abandon incomplete (damaged EC) pgs - How to manage the impact on cephfs?

2021-04-08 Thread Michael Thomas

Hi Joshua,

I have had a similar issue three different times on one of my cephfs 
pools (15.2.10). The first time this happened I had lost some OSDs.  In 
all cases I ended up with degraded PGs with unfound objects that could 
not be recovered.


Here's how I recovered from the situation.  Note that this will 
permanently remove the affected files from ceph.  Restoring them from 
backup is an excercise left to the reader.


* Make a list of the affected PGs:
  ceph pg dump_stuck  | grep recovery_unfound > pg.txt

* Make a list of the affected objects (OIDs):
  cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg 
$pg list_unfound | jq '.objects[].oid.oid' ; done | sed -e 's/"//g' > 
oid.txt


* Convert the OID numbers to inodes using 'printf "%d\n" 0x${oid}' and 
put the results in a file called 'inum.txt'
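
A minimal sketch of that conversion, assuming each unfound OID is a CephFS data
object whose name starts with the file's inode number in hex (everything before
the first dot), and filtering out the short pg ids that the previous loop also
echoed into oid.txt:

grep -E '^[0-9a-f]{8,}\.' oid.txt | while read oid ; do
    printf "%d\n" "0x${oid%%.*}"   # strip the object index, convert hex inode to decimal
done > inum.txt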


* On a ceph client, find the files that correspond to the affected inodes:
  cat inum.txt | while read inum ; do echo -n "${inum} " ; find 
/ceph/frames/O3/raw -inum ${inum} ; done > files.txt


* It may be helpful to put this table of PG, OID, inum, and files into a 
spreadsheet to keep track of what's been done.


* On the ceph client, use 'unlink' to remove the files from the 
filesystem.  Do not use 'rm', as it will hang while calling 'stat()' on 
each file.  Even unlink may hang when you first try it.  If it does 
hang, do the following to get it unstuck:

  - Reboot the client
  - Restart each mon and the mgr.  I rebooted each mon/mgr, but it may 
be sufficient to restart the services without a reboot.

  - Try using 'unlink' again

* After all of the affected files have been removed, go through the list 
of PGs and remove the unfound OIDs:

  ceph pg $pgid mark_unfound_lost delete

...or if you're feeling brave, delete them all at once:
  cat pg.txt | awk '{print $1}' | while read pg ; do echo $pg ; ceph pg 
$pg mark_unfound_lost delete ; done


* Watch the output of 'ceph -s' to see the health of the pools/pgs recover.

* Restore the deleted files from backup, or decide that you don't care 
about them and don't do anything
This procedure lets you fix the problem without deleting the affected 
pool.  To be honest, the first time it happened, my solution was to 
first copy all of the data off of the affected pool and onto a new pool. 
 I later found this to be unnecessary.  But if you want to pursue this, 
here's what I suggest:


* Follow the steps above to get rid of the affected files.  I feel this 
should still be done even though you don't care about saving the data, 
to prevent corruption in the cephfs metadata.


* Go through the entire filesystem and look for:
  - files that are located on the pool (ceph.file.layout.pool = $pool_name)
  - directories that are set to write files to the pool 
(ceph.dir.layout.pool = $pool_name)
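
One way to do that scan, sketched here with getfattr and hypothetical pool/mount
names (ceph.dir.layout.pool only exists on directories with an explicit layout,
hence the stderr redirect; this walks the whole tree, so it can be slow):

pool=myfs-ec-pool   # the pool being removed (hypothetical name)
mnt=/ceph           # cephfs mount point
find $mnt -type d | while read d ; do
    [ "$(getfattr -n ceph.dir.layout.pool --only-values "$d" 2>/dev/null)" = "$pool" ] && echo "$d"
done
find $mnt -type f | while read f ; do
    [ "$(getfattr -n ceph.file.layout.pool --only-values "$f" 2>/dev/null)" = "$pool" ] && echo "$f"
done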


* After you confirm that no files or directories are pointing at the 
pool anymore, run 'ceph df' and look at the number of objects in the 
pool.  Ideally, it would be zero.  But more than likely it isn't.  This 
could be a simple mismatch in the object count in cephfs (harmless), or 
there could be clients with open filehandles on files that have been 
removed.  such objects will still appear in the rados listing of the 
pool[1]:

  rados -p $pool_name ls
  for obj in $(rados -p $pool_name ls); do echo $obj; rados -p
$pool_name getxattr $obj parent | strings; done


* To check for clients with access to these stray objects, dump the mds 
cache:

  ceph daemon mds.ceph1 dump cache /tmp/cache.txt

* Look for lines that refer to the stray objects, like this:
  [inode 0x1020fbc [2,head] ~mds0/stray6/1020fbc auth v7440537 
s=252778863 nl=0 n(v0 rc2020-12-11T21:17:59.454863-0600 b252778863 
1=1+0) (iversion lock) caps={9541437=pAsLsXsFscr/pFscr@2},l=9541437 | 
caps=1 authpin=0 0x563a7e52a000]


* The 'caps' field in the output above contains the client session id 
(eg 9541437).  Search the MDS for sessions that match to identify the 
client:

  ceph daemon mds.ceph1 session ls > session.txt
  Search through 'session.txt' for matching entries.  This will give 
you the IP address of the client:

"id": 9541437,
"entity": {
"name": {
"type": "client",
"num": 9541437
},
"addr": {
"type": "v1",
"addr": "10.13.5.48:0",
"nonce": 2011077845
}
},
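
If jq is handy, one way to pull the matching entry out of that dump, assuming
session.txt holds the JSON array written by 'session ls' above:

jq '.[] | select(.id == 9541437) | .entity.addr.addr' session.txt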

* Restart the client's connection to ceph to get it to drop the cap.  I 
did this by rebooting the client, but there may be gentler ways to do it.
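
A possibly gentler alternative (an assumption on my part, not something tested in
this thread) is to evict just that session from the MDS side, keeping in mind that
eviction normally blocklists the client until it reconnects:

ceph tell mds.ceph1 client evict id=9541437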


* Once you've done this clean up, it should be safe to remove the pool 
from cephfs:

  ceph fs rm_data_pool $fs_name $pool_name

* Once the pool has been detached from cephfs, you can remove it from 
ceph altogether:

  ceph osd pool rm $pool_name $pool_name --yes-i-really-really-mean-it

Hope this helps,

--Mike
[1]http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005234.html



On 4/8/21 5:41 PM, Joshu

[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-08 Thread David Orman
The latest podman 3.0.1 release is fine (we have many production clusters 
running this). We have not tested 3.1 yet, however, but will soon.

> On Apr 8, 2021, at 10:32, mabi  wrote:
> 
> Hello,
> 
> I would like to install Ceph 15.2.10 using cephadm and just found the 
> following table by checking the requirements on the host:
> 
> https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-podman-versions
> 
> Do I understand this table correctly that I should be using podman version 
> 2.1?
> 
> and what happens if I use the latest podman version 3.0
> 
> Best regards,
> Mabi
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 radosgw ignoring ceph config

2021-04-08 Thread Arnaud Lefebvre
Hello Graham,

We have the same issue after an upgrade from 14.2.16 to 14.2.19. I
tracked down the issue today and made a bug report a few hours ago:
https://tracker.ceph.com/issues/50249. Maybe the title can be adjusted
if more than rgw_frontends is impacted.
First nautilus release I found with the commit I mentioned in the bug
report is 14.2.17.

Hope this helps.
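
For anyone hitting this before a fix lands, a possible interim workaround is to
put the frontend settings in the local ceph.conf on the rgw host instead of the
config database, assuming ceph.conf is still honored by 14.2.19 (I have not
verified this myself):

[client.rgw.tier2-gw02]
    rgw_frontends = beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/civetweb.pem
    rgw_dns_name = s3.msi.umn.edu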

On Thu, 8 Apr 2021 at 21:00, Graham Allan  wrote:
>
> We just updated one of our ceph clusters from 14.2.15 to 14.2.19, and
> see some unexpected behavior by radosgw - it seems to ignore parameters
> set by the ceph config database. Specifically this is making it start up
> listening only on port 7480, and not the configured 80 and 443 (ssl) ports.
>
> Downgrading ceph on the rgw nodes back to 14.2.15 restores the expected
> behavior (I haven't yet tried any intermediate versions). The host OS is
> CentOS 7, if that matters...
>
> Here's a ceph config dump for one of the affected nodes, along with the
> radosgw startup log:
>
> > # ceph config dump|grep tier2-gw02
> > client.rgw.tier2-gw02basiclog_file  
> >  /var/log/ceph/radosgw.log  
> >*
> > client.rgw.tier2-gw02advanced rgw_dns_name  
> >  s3.msi.umn.edu 
> >*
> > client.rgw.tier2-gw02advanced rgw_enable_usage_log  
> >  true
> > client.rgw.tier2-gw02basicrgw_frontends 
> >  beast port=80 ssl_port=443 
> > ssl_certificate=/etc/ceph/civetweb.pem *
> > client.rgw.tier2-gw02basicrgw_thread_pool_size  
> >  512
>
>
> > # tail /var/log/ceph/radosgw.log
> > 2021-04-08 11:51:07.956 7f420b78f700 -1 received  signal: Terminated from 
> > /usr/lib/systemd/systemd --switched-root --system --deserialize 22  (PID: 
> > 1) UID: 0
> > 2021-04-08 11:51:07.956 7f420b78f700  1 handle_sigterm
> > 2021-04-08 11:51:07.956 7f4220bc5900 -1 shutting down
> > 2021-04-08 11:51:07.956 7f420b78f700  1 handle_sigterm set alarm for 120
> > 2021-04-08 11:51:08.010 7f4220bc5900  1 final shutdown
> > 2021-04-08 11:51:08.159 7f2ac6105900  0 deferred set uid:gid to 167:167 
> > (ceph:ceph)
> > 2021-04-08 11:51:08.159 7f2ac6105900  0 ceph version 14.2.19 
> > (bb796b9b5bab9463106022eef406373182465d11) nautilus (stable), process 
> > radosgw, pid 88256
> > 2021-04-08 11:51:08.300 7f2ac6105900  0 starting handler: beast
> > 2021-04-08 11:51:08.302 7f2ac6105900  0 set uid:gid to 167:167 (ceph:ceph)
> > 2021-04-08 11:51:08.317 7f2ac6105900  1 mgrc service_daemon_register 
> > rgw.tier2-gw02 metadata 
> > {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.19 
> > (bb796b9b5bab9463106022eef406373182465d11) nautilus 
> > (stable),ceph_version_short=14.2.19,cpu=AMD EPYC 7302P 16-Core 
> > Processor,distro=centos,distro_description=CentOS Linux 7 
> > (Core),distro_version=7,frontend_config#0=beast 
> > port=7480,frontend_type#0=beast,hostname=tier2-gw02.msi.umn.edu,kernel_description=#1
> >  SMP Tue Mar 16 18:28:22 UTC 
> > 2021,kernel_version=3.10.0-1160.21.1.el7.x86_64,mem_swap_kb=4194300,mem_total_kb=131754828,num_handles=1,os=Linux,pid=88256,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}
>
> BTW I can also change "rgw_frontends" to specify a civetweb frontend
> instead and it will still start the default beast...
>
> I haven't seen anyone else report such a problem so I wonder if this is
> something local to us - like perhaps I'm using "ceph config" incorrectly
> in a way which happened to be accepted before? Has anyone else seen this
> behavior?
>
> Graham
> --
> Graham Allan - g...@umn.edu
> Associate Director of Operations - Minnesota Supercomputing Institute
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Arnaud Lefebvre
Clever Cloud
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-08 Thread Robert LeBlanc
Good thought. The storage for the monitor data is a RAID-0 over three
NVMe devices. Watching iostat, they are completely idle, maybe 0.8% to
1.4% for a second every minute or so.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Thu, Apr 8, 2021 at 7:48 PM Zizon Qiu  wrote:
>
> Could it be related to some kind of disk issue on the host that mon is located on, which
> may occasionally
> slow down IO and, in turn, the rocksdb?
>
>
> On Fri, Apr 9, 2021 at 4:29 AM Robert LeBlanc  wrote:
>>
>> I found this thread that matches a lot of what I'm seeing. I see the
>> ms_dispatch thread going to 100%, but I'm at a single MON, the
>> recovery is done and the rocksdb MON database is ~300MB. I've tried
>> all the settings mentioned in that thread with no noticeable
>> improvement. I was hoping that once the recovery was done (backfills
>> to reformatted OSDs) that it would clear up, but not yet. So any other
>> ideas would be really helpful. Our MDS is functioning, but stalls a
>> lot because the mons miss heartbeats.
>>
>> mon_compact_on_start = true
>> rocksdb_cache_size = 1342177280
>> mon_lease = 30
>> mon_osd_cache_size = 20
>> mon_sync_max_payload_size = 4096
>>
>> 
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>> On Thu, Apr 8, 2021 at 1:11 PM Stefan Kooman  wrote:
>> >
>> > On 4/8/21 6:22 PM, Robert LeBlanc wrote:
>> > > I upgraded our Luminous cluster to Nautilus a couple of weeks ago and
>> > > converted the last batch of FileStore OSDs to BlueStore about 36 hours
>> > > ago. Yesterday our monitor cluster went nuts and started constantly
>> > > calling elections because monitor nodes were at 100% and wouldn't
>> > > respond to heartbeats. I reduced the monitor cluster to one to prevent
>> > > the constant elections and that let the system limp along until the
>> > > backfills finished. There are large amounts of time where ceph commands
>> > > hang with the CPU is at 100%, when the CPU drops I see a lot of work
>> > > getting done in the monitor logs which stops as soon as the CPU is at
>> > > 100% again.
>> >
>> >
>> > Try reducing mon_sync_max_payload_size=4096. I have seen Frank Schilder
>> > advise this several times because of monitor issues. Also recently for a
>> > cluster that got upgraded from Luminous -> Mimic -> Nautilus.
>> >
>> > Worth a shot.
>> >
>> > Otherwise I'll try to look in depth and see if I can come up with
>> > something smart (for now I need to go catch some sleep).
>> >
>> > Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Abandon incomplete (damaged EC) pgs - How to manage the impact on cephfs?

2021-04-08 Thread Szabo, Istvan (Agoda)
Hi,

So finally how did you solve it? Which method out of the three?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Joshua West 
Sent: Friday, April 9, 2021 5:41 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Abandon incomplete (damaged EC) pgs - How to manage the 
impact on cephfs?

Hey everyone.

Inside of cephfs, I have a directory which I setup a directory layout field to 
use an erasure coded (CLAY) pool, specific to the task. The rest of my cephfs 
is using normal replication.

Fast forward some time, and the EC directory has been used pretty extensively, 
and through some bad luck and poor timing, ~200pgs are in an incomplete state, 
and the OSDs are completely gone and unrecoverable. (Specifically OSD 31 and 
34, not that it matters at this point)

# ceph pg ls incomplete --> is attached for reference.

Fortunately, it's primarily (only) my on-site backups, and other replaceable 
data inside of

I tried for a few days to recover the PGs:
 - Recreate blank OSDs with correct ID (was blocked by non-existant OSDs)
 - Deep Scrub
 - osd_find_best_info_ignore_history_les = true (`pg query` was showing related 
error) etc.

I've finally just accepted this pool to be a lesson learned, and want to get 
the rest of my cephfs back to normal.

My questions:

 -- `ceph osd force-create-pg` doesn't appear to fix pgs, even for pgs with 0 
objects
 -- Deleting the pool seems like an appropriate step, but as I am using an 
xattr within cephfs, which is otherwise on another pool, I am not confident 
that this approach is safe?
 -- cephfs currently blocks when attemping to impact every third file in the EC 
directory. Once I delete the pool, how will I remove the files if even `rm` is 
blocking?

Thank you for your time,

Joshua West
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io