[ceph-users] Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Ralph Soika

Hello,

I have installed and bootstrapped a Ceph manager node via cephadm with the
options:


    --initial-dashboard-user admin --initial-dashboard-password 
[PASSWORD] --dashboard-password-noupdate


Everything works fine. I also have the Grafana board to monitor my
cluster. But access to Grafana is open to anonymous users because
the grafana.ini template sets the option:


[auth.anonymous]
enabled = true


I can't figure out how to tweak the default grafana.ini file. Can
someone tell me how to do this?



I tried to do this with the command:

# ceph config-key set mgr/cephadm/services/grafana/grafana.ini \
  -i /tmp//grafana.ini.j2

# ceph orch reconfig grafana

But it had no effect. I also did not really understand where I should
place the grafana.ini file on my host.


Thanks for any help

===
Ralph

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Eugen Block

Hi,

you can edit the config file  
/var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created by  
cephadm) and then restart the container. This works in my octopus lab  
environment.
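For the anonymous access specifically, a minimal sketch of the change to make there (assuming the stock template; <fsid> stands for the cluster fsid segment that is elided in the path above):

---snip---
# edit /var/lib/ceph/<fsid>/grafana.host1/etc/grafana/grafana.ini
[auth.anonymous]
enabled = false

# then restart the container, e.g.
# ceph orch stop grafana
# ceph orch start grafana
---snip---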


Regards,
Eugen


Quoting Ralph Soika :


Hello,

I have installed and bootstrapped a Ceph manager node via cephadm and  
the options:


    --initial-dashboard-user admin --initial-dashboard-password  
[PASSWORD] --dashboard-password-noupdate


Everything works fine. I also have the Grafana Board to monitor my  
cluster. But the access to Grafana is open for anonymous users  
because of the grafana.ini template with the option:


[auth.anonymous]
enabled = true


I can't figure out how to tweak the default grafana.ini file. Can  
someone help me how to do this?



I tried to do this with the command:

# ceph config-key set mgr/cephadm/services/grafana/grafana.ini \
  -i /tmp//grafana.ini.j2

# ceph orch reconfig grafana

But without any effect. I also did not really understand where I  
should place the grafana.ini file on my Host?


Thanks for any help

===
Ralph

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Integration of openstack to ceph

2021-06-10 Thread Michel Niyoyita
Dear Ceph Users,

Can anyone help with guidance on how I can integrate Ceph with OpenStack,
especially RGW?

Regards

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Integration of openstack to ceph

2021-06-10 Thread Janne Johansson
Have you checked https://docs.ceph.com/en/latest/radosgw/keystone/ ?
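Roughly, the Keystone integration boils down to a handful of rgw options in ceph.conf; a sketch (the endpoint, user, password and roles below are placeholders, and the exact option names should be checked against that page):

[client.rgw.gateway1]
rgw keystone url = http://keystone.example.com:5000
rgw keystone api version = 3
rgw keystone admin user = rgw
rgw keystone admin password = secret
rgw keystone admin project = service
rgw keystone admin domain = default
rgw keystone accepted roles = member,admin
rgw s3 auth use keystone = true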

On Thu, 10 June 2021 at 10:06, Michel Niyoyita wrote:
>
> Dear Ceph Users,
>
> Anyone can help on the guidance of how I can integrate ceph to openstack ?
> especially RGW.
>
> Regards
>
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Integration of openstack to ceph

2021-06-10 Thread David Caro

Hi, we are working on doing something similar, and there are mainly two ways we
integrate it:

* cinder (OpenStack) and rbd (Ceph), for volumes; this has been working
well for a while (see the sketch below).
* swift (OpenStack) and rgw (Ceph), for object storage; this is under
evaluation.
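For the cinder/rbd path, a minimal cinder.conf backend sketch, with placeholder pool/user/UUID values rather than our real ones, looks roughly like:

[DEFAULT]
enabled_backends = ceph

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337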

You might be able to use a different integration that skips the OpenStack project
layer, but we have that as a requirement. The OpenStack project layer also allows quota and user
management on the OpenStack side, so it's easier for us to adopt.

Let us know if you find another way, and how it goes for you :)

On 06/10 10:06, Michel Niyoyita wrote:
> Dear Ceph Users,
> 
> Anyone can help on the guidance of how I can integrate ceph to openstack ?
> especially RGW.
> 
> Regards
> 
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Manuel Lausch
Hi,

does no one have an idea what could cause this issue, or how I could debug it?

In a few days I have to go live with this cluster. If I don't have a
solution, I will have to go live with Nautilus.


Manuel

On Mon, 7 Jun 2021 15:46:18 +0200
Manuel Lausch  wrote:

> Hello,
> 
> I implemented a new cluster with 48 nodes with 24 OSDs each.
> I have a replicated pool with 4 replicas. The crush rule distributes the
> replicas to different racks.
> 
> With this cluster I tested an upgrade from Nautilus (14.2.20) to
> Octopus (15.2.13). The update itself worked well until I began
> restarting the OSDs in the 4th rack. Since then I get slow ops while
> stopping OSDs. I think something happened here once all replica
> partners were running the new version. This issue remains after
> completing the upgrade.
> 
> With Nautilus I had similar issues with slow ops when stopping OSDs. I
> could resolve this with the option "osd_fast_shutdown = false". I left
> this option set to false while upgrading. For testing/debugging, I set
> it to true (the default value) and got better results when stopping
> OSDs, but the problem has not completely vanished.
> 
> Has anyone else had this problem and been able to fix it? What can I do to get
> rid of slow ops when restarting OSDs?
> 
> All servers are connected with 2x10G network links.
> 
> 
> Manuel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Manuel Lausch

Systemadministrator
Storage Services

1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 |
76135 Karlsruhe | Germany Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de

Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 5452

Geschäftsführer: Alexander Charles, Thomas Ludwig, Jan Oetjen, Sascha
Vollmer


Member of United Internet


This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient of this e-mail, you are hereby
notified that saving, distribution or use of the content of this e-mail
in any way is prohibited. If you have received this e-mail in error,
please notify the sender and delete the e-mail.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Peter Lieven
On 10.06.21 at 11:08, Manuel Lausch wrote:
> Hi,
>
> has no one a idea what could cause this issue. Or how I could debug it?
>
> In some days I have to go live with this cluster. If I don't have a
> solution I have to go live with nautilus. 


Hi Manuel,


I had similar issues with Octopus and I am thus stuck with Nautilus.

Can you debug the slow ops and see whether they are caused by the status
"waiting for readable"?

I suspected that it has something to do with the new feature in Octopus to read
from all OSDs, regardless of whether they are the primary for a PG or not.


Can you also verify that osd_op_queue_cut_off is set to high and that icmp rate 
limiting is disabled on your hosts?
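For reference, a quick way to check both (a sketch; adjust OSD ids and hosts to your setup):

# ceph config get osd osd_op_queue_cut_off        # should print "high"
# ceph daemon osd.0 dump_historic_slow_ops        # look for "waiting for readable" events
# sysctl net.ipv4.icmp_ratelimit                  # 0 disables ICMP rate limiting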


Peter


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: nautilus: rbd ls returns ENOENT for some images

2021-06-10 Thread Peter Lieven
On 09.06.21 at 13:52, Ilya Dryomov wrote:
> On Wed, Jun 9, 2021 at 1:36 PM Peter Lieven  wrote:
>> On 09.06.21 at 13:28, Ilya Dryomov wrote:
>>> On Wed, Jun 9, 2021 at 11:24 AM Peter Lieven  wrote:
 Hi,


 we currently run into an issue where a rbd ls for a namespace returns 
 ENOENT for some of the images in that namespace.


 /usr/bin/rbd --conf=XXX --id XXX ls 
 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' -l --format=json
 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55ca2390 fail: (2) No such file or directory
 2021-06-09 11:03:34.916 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55caccd2b920 fail: (2) No such file or directory
 2021-06-09 11:03:34.920 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55caccd9b4e0 fail: (2) No such file or directory
 rbd: error opening 34810ac2-3112-4fef-938c-b76338b0eeaf.raw: (2) No such 
 file or directory
 rbd: error opening c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw: (2) No such 
 file or directory
 rbd: error opening 5d5251d1-f017-4382-845c-65e504683742.raw: (2) No such 
 file or directory
 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55cacce07b00 fail: (2) No such file or directory
 rbd: error opening c625b898-ec34-4446-9455-d2b70d9e378f.raw: (2) No such 
 file or directory
 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55caccd7cce0 fail: (2) No such file or directory
 rbd: error opening 990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw: (2) No such 
 file or directory
 2021-06-09 11:03:34.924 7f2225ffb700 -1 librbd::io::AioCompletion: 
 0x55cacce336f0 fail: (2) No such file or directory
 rbd: error opening 7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw: (2) No such 
 file or directory
 [{"image":"108600c6-2312-4d61-9f5b-35b351112512.raw","size":3145728,"format":2,"lock_type":"exclusive"},{"image":"1292ef0c-2333-44f1-be30-39105f7d176e.raw","size":262149242880,"format":2,"lock_type":"exclusive"},{"image":"8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","size":262149242880,"format":2,"lock_type":"exclusive"}]
 rbd: listing images failed: (2) No such file or directory


 The way to trigger this state was that the images which show "No such file 
 or directory" were deleted with rbd rm, but the operation was interrupted 
 (rbd process was killed) due to a timeout.

 What is the best way to recover from this and how to properly clean up?


 Release is nautilus 14.2.20
>>> Hi Peter,
>>>
>>> Does "rbd ls" without "-l" succeed?
>>
>> Yes, it does:
>>
>>
>> /usr/bin/rbd --conf=XXX --id XXX ls 
>> 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c' --format=json
>>
>>  
>> ["108600c6-2312-4d61-9f5b-35b351112512.raw","1292ef0c-2333-44f1-be30-39105f7d176e.raw","8cda5c3f-cdbd-42f4-918f-1480354e7965.raw","34810ac2-3112-4fef-938c-b76338b0eeaf.raw","c9882583-6dd5-4eca-bb82-3e81f7d63fa9.raw","5d5251d1-f017-4382-845c-65e504683742.raw","c625b898-ec34-4446-9455-d2b70d9e378f.raw","990c4bbe-6a7b-4adf-aab8-432e18d79e58.raw","7382eb5b-a3eb-41e2-89b6-512f7b1d86c0.raw"]
> I think simply re-running interrupted "rbd rm" commands would work and
> clean up properly.


That worked.
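For the record, the cleanup amounted to re-running the delete for each leftover image, e.g. (same elided conf/id as above):

/usr/bin/rbd --conf=XXX --id XXX rm 'mypool/28ef9470-76eb-4f77-bc1b-99077764ff7c/34810ac2-3112-4fef-938c-b76338b0eeaf.raw'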


Thank you,

Peter


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and openstack throttling experience

2021-06-10 Thread David Caro
We have a similar setup, way smaller though (~120 osds right now) :)

We have differently capped VMs, but most have a 500 write / 1000 read IOPS cap; you
can see it in effect here:
https://cloud-ceph-performance-tests.toolforge.org/
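The caps are applied on the OpenStack side; as a rough sketch (the flavor name is a placeholder, and our tooling may set it slightly differently), they correspond to libvirt disk quotas set via flavor extra specs:

openstack flavor set m1.medium \
  --property quota:disk_read_iops_sec=1000 \
  --property quota:disk_write_iops_sec=500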

We are currently running Octopus v15.2.11.

It's a very 'bare' UI (under construction), but check the
'after_ceph_upgrade_v2' run for example, in the 'vm_disk' suite, the
'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests that hit
the cap.

From there you can also see the numbers of the tests running uncapped (in the 
'rbd_from_hypervisor' or 'rbd_from_osd'
suites).

You can see the current iops of our ceph cluster here:
https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1

Of our openstack setup:
https://grafana.wikimedia.org/d/00579/wmcs-openstack-eqiad1?orgId=1&refresh=15m

And some details on the traffic OpenStack puts on each ceph osd host here:
https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m

We are working on revamping those graphs right now, so it might become easier 
to see numbers in a few weeks.


We don't usually see slow ops with the current load, though we recommend not
using Ceph for very latency-sensitive VMs (like etcd), as on the network layer there are
some hardware limits we can't remove right now.

Hope that helps.

On 06/10 10:54, Marcel Kuiper wrote:
> Hi
> 
> We're running ceph nautilus 14.2.21 (going to octopus latest in a few weeks)
> as volume and instance backend for our openstack vm's. Our clusters run
> somewhere between 500 - 1000 OSDs on SAS HDDs with NVMe's as journal and db
> device
> 
> Currently we do not have our vm's capped on iops and throughput. We
> regularly get slowops warnings (once or twice per day) and wonder whether
> there are more users with sort of the same setup that do throttle their
> openstack vm's.
> 
> - What kind of numbers are used in the field for IOPS and throughput
> limiting?
> 
> - As a side question, is there an easy way to get rid of the slowops warning
> besides restarting the involved osd. Otherwise the warning seems to stay
> forever
> 
> Regards
> 
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Ralph Soika

Hi,

thanks a lot for this hint! Yes, I can now edit the file and restart the
Grafana daemon with:


# ceph orch stop grafana
# ceph orch start grafana

And the new configuration is used.

What I expected was that I could define a different path on my host, like
/home/grafana.ini, that Ceph would fetch during startup. But this seems
to be impossible. You need to:


1. start the grafana with
   # ceph orch apply grafana 1
2. edit the file
   /var/lib/ceph//grafana.host1/etc/grafana/grafana.ini
3. restart grafana with
   # ceph orch stop grafana
   # ceph orch start grafana
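As a quick sanity check after the restart (the hostname is a placeholder; the cephadm-deployed Grafana listens on port 3000 and is usually behind TLS, hence the -k; drop it and use http:// if your grafana.ini says protocol = http), anonymous requests should now be rejected:

# curl -ik https://ceph-node1:3000/api/dashboards/home
# expect an HTTP 401 once [auth.anonymous] enabled = false is in place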


Thanks for your help


===

Ralph


On 10.06.21 09:31, Eugen Block wrote:

Hi,

you can edit the config file 
/var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created by 
cephadm) and then restart the container. This works in my octopus lab 
environment.


Regards,
Eugen


Quoting Ralph Soika :


Hello,

I have installed and bootstrapped a Ceph manager node via cephadm and 
the options:


    --initial-dashboard-user admin --initial-dashboard-password 
[PASSWORD] --dashboard-password-noupdate


Everything works fine. I also have the Grafana Board to monitor my 
cluster. But the access to Grafana is open for anonymous users 
because of the grafana.ini template with the option:


[auth.anonymous]
enabled = true


I can't figure out how to tweak the default grafana.ini file. Can 
someone help me how to do this?



I tried to do this with the command:

# ceph config-key set mgr/cephadm/services/grafana/grafana.ini \
  -i /tmp//grafana.ini.j2

# ceph orch reconfig grafana

But without any effect. I also did not really understand where I 
should place the grafana.ini file on my Host?


Thanks for any help

===
Ralph

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: delete stray OSD daemon after replacing disk

2021-06-10 Thread mabi
Small correction to my mail below: I meant to say Octopus and not Nautilus, so
I am running Ceph 15.2.13.


‐‐‐ Original Message ‐‐‐
On Wednesday, June 9, 2021 2:25 PM, mabi  wrote:

> Hello,
>
> I replaced an OSD disk on one of my Nautilus OSD nodes, which created a new osd
> number. Now ceph shows that there is one cephadm stray daemon (the old OSD #1
> which I replaced) which I can't remove, as you can see below:
>
> ceph health detail
>
> ===
>
> HEALTH_WARN 1 stray daemon(s) not managed by cephadm
> [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
> stray daemon osd.1 on host ceph1e not managed by cephadm
>
> ceph orch daemon rm osd.1 --force
>
> ==
>
> Error EINVAL: Unable to find daemon(s) ['osd.1']
>
> Is there another command I am missing?
>
> Best regards,
> Mabi

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: delete stray OSD daemon after replacing disk

2021-06-10 Thread mabi
Thanks Eugen for your answer. I saw it a bit late because, from the Ceph Manager
web interface, I had already managed to get rid of that OSD the "Windows" way and
clicked on the purge option. That worked.

So I suppose your "ceph osd purge" command would have worked as well; I just
did not find this command in the documentation.

Strange that this happened: I just replaced one OSD disk with a bigger one and
used "ceph orch osd rm 1 --replace" for that purpose. I am not quite sure if
this is the right way to do it.
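For what it's worth, the workflow I understand to be intended is roughly the following sketch (the device path is a placeholder, and this is not verified against the docs):

# ceph orch osd rm 1 --replace      # drain the OSD and mark it "destroyed", keeping its id
# ceph orch osd rm status           # watch the drain/removal progress
# swap the disk; a matching drivegroup spec should recreate osd.1, or add it explicitly:
# ceph orch daemon add osd ceph1e:/dev/sdX

With --replace the id is supposed to be reused for the new disk, so ending up with a new id plus a stray osd.1 suggests something went sideways in between.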


‐‐‐ Original Message ‐‐‐
On Thursday, June 10, 2021 8:44 AM, Eugen Block  wrote:

> Can you share your 'ceph osd tree'?
> You can remove the stray osd "old school" with 'ceph osd purge 1
> [--force]' if you're really sure.
>
> Zitat von mabi m...@protonmail.ch:
>
> > Small correction in my mail below, I meant to say Octopus and not
> > Nautilus, so I am running ceph 15.2.13.
> > ‐‐‐ Original Message ‐‐‐
> > On Wednesday, June 9, 2021 2:25 PM, mabi m...@protonmail.ch wrote:
> >
> > > Hello,
> > > I replaced an OSD disk on one of my Nautilus OSD node which created
> > > a new osd number. Now ceph shows that there is one cephadm stray
> > > daemon (the old OSD #1 which I replaced) and which I can't remove
> > > as you can see below:
> > > ceph health detail
> > > ===
> > > HEALTH_WARN 1 stray daemon(s) not managed by cephadm
> > > [WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
> > > stray daemon osd.1 on host ceph1e not managed by cephadm
> > > ceph orch daemon rm osd.1 --force
> > > ==
> > > Error EINVAL: Unable to find daemon(s) ['osd.1']
> > > Is there another command I am missing?
> > > Best regards,
> > > Mabi
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus - How to customize the Grafana configuration

2021-06-10 Thread Eugen Block
You could play with the grafana-cli and see if it brings you anywhere.  
You seem to be able to override the default config file/directory, but  
I haven't tried that myself yet:


---snip---
host1:~ # podman exec c8167ed2efde grafana-cli --help
NAME:
   Grafana CLI - A new cli application

USAGE:
   grafana-cli [global options] command [command options] [arguments...]

VERSION:
   7.0.3

AUTHOR:
   Grafana Project 

COMMANDS:
   plugins  Manage plugins for grafana
   adminGrafana admin commands
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --pluginsDir value   Path to the Grafana plugin directory  
(default: "/var/lib/grafana/plugins") [$GF_PLUGIN_DIR]
   --repo value URL to the plugin repository (default:  
"https://grafana.com/api/plugins") [$GF_PLUGIN_REPO]
   --pluginUrl valueFull url to the plugin zip file instead  
of downloading the plugin from grafana.com/api [$GF_PLUGIN_URL]

   --insecure   Skip TLS verification (insecure) (default: false)
   --debug  Enable debug logging (default: false)
   --configOverrides value  Configuration options to override  
defaults as a string. e.g. cfg:default.paths.log=/dev/null
   --homepath value Path to Grafana install/home path,  
defaults to working directory

   --config value   Path to config file
   --help, -h   show help (default: false)
   --version, -vprint the version (default: false)
---snip---



Quoting Ralph Soika :


Hi,

thanks a lot for this hint! Yes I now can edit the file and restart  
the grafana host with


# ceph orch stop grafana
# ceph orch start grafana

And the new configuration is used.

What I expected was, that I can define a different path on my host  
like /home/grafana.ini  that ceph will fetch during startup. But   
this seems to be impossible. You need to:


1. start the grafana with
   # ceph orch apply grafana 1
2. edit the file
   /var/lib/ceph//grafana.host1/etc/grafana/grafana.ini
3. restart grafana with
   # ceph orch stop grafana
   # ceph orch start grafana


Thanks for your help


===

Ralph


On 10.06.21 09:31, Eugen Block wrote:

Hi,

you can edit the config file  
/var/lib/ceph//grafana.host1/etc/grafana/grafana.ini (created  
by cephadm) and then restart the container. This works in my  
octopus lab environment.


Regards,
Eugen


Quoting Ralph Soika :


Hello,

I have installed and bootstrapped a Ceph manager node via cephadm  
and the options:


    --initial-dashboard-user admin --initial-dashboard-password  
[PASSWORD] --dashboard-password-noupdate


Everything works fine. I also have the Grafana Board to monitor my  
cluster. But the access to Grafana is open for anonymous users  
because of the grafana.ini template with the option:


[auth.anonymous]
enabled = true


I can't figure out how to tweak the default grafana.ini file. Can  
someone help me how to do this?



I tried to do this with the command:

# ceph config-key set mgr/cephadm/services/grafana/grafana.ini \
  -i /tmp//grafana.ini.j2

# ceph orch reconfig grafana

But without any effect. I also did not really understand where I  
should place the grafana.ini file on my Host?


Thanks for any help

===
Ralph

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Creating a role in another tenant seems to be possible

2021-06-10 Thread Daniel Iwan
Hi Pritha

My answers inline.
Forgot to add that I'm on Ceph 1.2.1.


> How did you check whether the role was created in tenant1 or tenant2?
> It shouldn't be created in tenant2, if it is, then it's a bug, please open
> a tracker issue for it.
>

I checked that with
radosgw-admin role list --tenant tenant1

Example commands with output follow.
The user creating the roles has the roles:* capability in this case.

When creating without a tenant prefix, the role is created in the tenant the user
belongs to:

aws --profile=user-from-tenant1 --endpoint=$HOST_S3_API --region="" iam
create-role --role-name=TemporaryRole --assume-role-policy-document
file://json/trust-policy-assume-role.json

{
"Role": {
"Path": "/",
"RoleName": "TemporaryRole",
"RoleId": "507f990e-46cd-418c-ad4e-cc59276500dc",
"Arn": "arn:aws:iam::tenant1:role/TemporaryRole",
"CreateDate": "2021-06-10T11:17:15.638000+00:00",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": [
"arn:aws:iam:::oidc-provider/
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringEquals": {
"
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id": "account"
}
}
}
]
},
"MaxSessionDuration": 3600
}
}

root@:~# radosgw-admin role list --tenant tenant1
[
{
"RoleId": "507f990e-46cd-418c-ad4e-cc59276500dc",
"RoleName": "TemporaryRole",
"Path": "/",
"Arn": "arn:aws:iam::tenant1:role/TemporaryRole",
"CreateDate": "2021-06-10T11:17:15.638Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument":
"{\n\t\"Version\":\"2012-10-17\",\n\t\"Statement\":[\n\t\t{\n\t\t\t\"Effect\":\"Allow\",\n\t\t\t\"Principal\":{\n\t\t\t\t\"Federated\":[\n\t\t\t\t\t\"arn:aws:iam:::oidc-provider/
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1\
"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"Action\":[\n\t\t\t\t\"sts:AssumeRoleWithWebIdentity\"\n\t\t\t],\n\t\t\t\"Condition\":{\n\t\t\t\t\"StringEquals\":{\n\t\t\t\t\t\"
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id\
":\"account\"\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t]\n}"
}
]

Then I created a role with another tenant name:

aws --profile=user-from-tenant1 --endpoint=$HOST_S3_API --region="" iam
create-role --role-name="tenant2\$TemporaryRole"
--assume-role-policy-document file://json/trust-policy-assume-role.json
{
"Role": {
"Path": "/",
"RoleName": "TemporaryRole",
"RoleId": "9086dc3c-3654-465c-9524-dd60cee6ec09",
"Arn": "arn:aws:iam::tenant2:role/TemporaryRole",
"CreateDate": "2021-06-10T11:17:52.11+00:00",
"AssumeRolePolicyDocument": {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": [
"arn:aws:iam:::oidc-provider/
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1"
]
},
"Action": [
"sts:AssumeRoleWithWebIdentity"
],
"Condition": {
"StringEquals": {
"
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id": "account"
}
}
}
]
},
"MaxSessionDuration": 3600
}
}

root@:~# radosgw-admin role list --tenant tenant2
[
{
"RoleId": "9086dc3c-3654-465c-9524-dd60cee6ec09",
"RoleName": "TemporaryRole",
"Path": "/",
"Arn": "arn:aws:iam::tenant2:role/TemporaryRole",
"CreateDate": "2021-06-10T11:17:52.110Z",
"MaxSessionDuration": 3600,
"AssumeRolePolicyDocument":
"{\n\t\"Version\":\"2012-10-17\",\n\t\"Statement\":[\n\t\t{\n\t\t\t\"Effect\":\"Allow\",\n\t\t\t\"Principal\":{\n\t\t\t\t\"Federated\":[\n\t\t\t\t\t\"arn:aws:iam:::oidc-provider/
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1\
"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"Action\":[\n\t\t\t\t\"sts:AssumeRoleWithWebIdentity\"\n\t\t\t],\n\t\t\t\"Condition\":{\n\t\t\t\t\"StringEquals\":{\n\t\t\t\t\t\"
localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id\
":\"account\"\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t]\n}"
}
]
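Until the parsing is fixed, the stray role can presumably be removed from the admin side with something like the following (not verified; check radosgw-admin --help for the exact subcommand):

radosgw-admin role rm --role-name=TemporaryRole --tenant=tenant2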

>> Similarly, a federated user who assumes a role with iam:CreateRole
>> permission
>> can create an arbitrary role like below.
>>
>> aws --endpoint=$HOST_S3_API --region="" iam create-role
>> --role-name="tenant2\$Tempo

[ceph-users] Re: Creating a role in another tenant seems to be possible

2021-06-10 Thread Pritha Srivastava
Hi Daniel,

Yes, it looks like a bug in the way the role name is being parsed in the
code. Please open a tracker issue for the same, and I'll fix it when I can.

Thanks,
Pritha

On Thu, Jun 10, 2021 at 5:09 PM Daniel Iwan  wrote:

> Hi Pritha
>
> y answers inline.
> Forgot to add I'm on Ceph 1.2.1
>
>
>> How did you check whether the role was created in tenant1 or tenant2?
>> It shouldn't be created in tenant2, if it is, then it's a bug, please
>> open a tracker issue for it.
>>
>
> I checked that with
> radosgw-admin role list --tenant tenant1
>
> Example commands with output
> User creating roles has in this case roles:* capability.
>
> When creating without tenant prefix role is created in the tenant user
> belongs to
>
> aws --profile=user-from-tenant1 --endpoint=$HOST_S3_API --region="" iam
> create-role --role-name=TemporaryRole --assume-role-policy-document
> file://json/trust-policy-assume-role.json
>
> {
> "Role": {
> "Path": "/",
> "RoleName": "TemporaryRole",
> "RoleId": "507f990e-46cd-418c-ad4e-cc59276500dc",
> "Arn": "arn:aws:iam::tenant1:role/TemporaryRole",
> "CreateDate": "2021-06-10T11:17:15.638000+00:00",
> "AssumeRolePolicyDocument": {
> "Version": "2012-10-17",
> "Statement": [
> {
> "Effect": "Allow",
> "Principal": {
> "Federated": [
> "arn:aws:iam:::oidc-provider/
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1"
> ]
> },
> "Action": [
> "sts:AssumeRoleWithWebIdentity"
> ],
> "Condition": {
> "StringEquals": {
> "
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id": "account"
> }
> }
> }
> ]
> },
> "MaxSessionDuration": 3600
> }
> }
>
> root@:~# radosgw-admin role list --tenant tenant1
> [
> {
> "RoleId": "507f990e-46cd-418c-ad4e-cc59276500dc",
> "RoleName": "TemporaryRole",
> "Path": "/",
> "Arn": "arn:aws:iam::tenant1:role/TemporaryRole",
> "CreateDate": "2021-06-10T11:17:15.638Z",
> "MaxSessionDuration": 3600,
> "AssumeRolePolicyDocument":
> "{\n\t\"Version\":\"2012-10-17\",\n\t\"Statement\":[\n\t\t{\n\t\t\t\"Effect\":\"Allow\",\n\t\t\t\"Principal\":{\n\t\t\t\t\"Federated\":[\n\t\t\t\t\t\"arn:aws:iam:::oidc-provider/
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1\
> 
> "\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"Action\":[\n\t\t\t\t\"sts:AssumeRoleWithWebIdentity\"\n\t\t\t],\n\t\t\t\"Condition\":{\n\t\t\t\t\"StringEquals\":{\n\t\t\t\t\t\"
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id\
> 
> ":\"account\"\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t]\n}"
> }
> ]
>
> then created with another tenant name
>
> aws --profile=user-from-tenant1 --endpoint=$HOST_S3_API --region="" iam
> create-role --role-name="tenant2\$TemporaryRole"
> --assume-role-policy-document file://json/trust-policy-assume-role.json
> {
> "Role": {
> "Path": "/",
> "RoleName": "TemporaryRole",
> "RoleId": "9086dc3c-3654-465c-9524-dd60cee6ec09",
> "Arn": "arn:aws:iam::tenant2:role/TemporaryRole",
> "CreateDate": "2021-06-10T11:17:52.11+00:00",
> "AssumeRolePolicyDocument": {
> "Version": "2012-10-17",
> "Statement": [
> {
> "Effect": "Allow",
> "Principal": {
> "Federated": [
> "arn:aws:iam:::oidc-provider/
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1"
> ]
> },
> "Action": [
> "sts:AssumeRoleWithWebIdentity"
> ],
> "Condition": {
> "StringEquals": {
> "
> localhost.ceph-om-vm-node3.com:8443/auth/realms/tenant1:app_id": "account"
> }
> }
> }
> ]
> },
> "MaxSessionDuration": 3600
> }
> }
>
> root@:~# radosgw-admin role list --tenant tenant2
> [
> {
> "RoleId": "9086dc3c-3654-465c-9524-dd60cee6ec09",
> "RoleName": "TemporaryRole",
> "Path": "/",
> "Arn": "arn:aws:iam::tenant2:role/TemporaryRole",
> "CreateDate": "2021-06-10T11:17:52.110Z",
> "MaxSessionDuration": 3600,
> "AssumeRolePolicyDocument":
> "{\n\t\"Version\":\"2012-10-17\",\n\t\"Statement\":[\n\t\t{\n\t\t\t\"Effect\":\"Allow\",\n\t\t\t\"Principal\":{\n\t\t\t\t\"

[ceph-users] ceph and openstack throttling experience

2021-06-10 Thread Marcel Kuiper

Hi

We're running Ceph Nautilus 14.2.21 (going to the latest Octopus in a few
weeks) as the volume and instance backend for our OpenStack VMs. Our
clusters run somewhere between 500 and 1000 OSDs on SAS HDDs with NVMes
as journal and DB devices.


Currently we do not have our VMs capped on IOPS or throughput. We
regularly get slow ops warnings (once or twice per day) and wonder
whether there are other users with roughly the same setup who do
throttle their OpenStack VMs.


- What kind of numbers are used in the field for IOPS and throughput 
limiting?


- As a side question, is there an easy way to get rid of the slow ops
warning besides restarting the involved OSD? Otherwise the warning seems
to stay forever.


Regards

Marcel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Month June Schedule Now Available

2021-06-10 Thread Mike Perez
Hi everyone,

We're about to start Ceph Month 2021 with Casey Bodley giving an RGW update!

Afterward we'll have two BoF discussions on:

9:30 ET / 15:30 CEST [BoF] Ceph in Research & Scientific Computing
[Kevin Hrpcek]

10:10 ET / 16:10 CEST [BoF] The go-ceph get together [John Mulligan]

Join us now on the stream:

https://bluejeans.com/908675367

On Tue, Jun 1, 2021 at 6:50 AM Mike Perez  wrote:
>
> Hi everyone,
>
> In ten minutes, join us for the start of the Ceph Month June event!
> The schedule and meeting link can be found on this etherpad:
>
> https://pad.ceph.com/p/ceph-month-june-2021
>
> On Tue, May 25, 2021 at 11:56 AM Mike Perez  wrote:
> >
> > Hi everyone,
> >
> > The Ceph Month June schedule is now available:
> >
> > https://pad.ceph.com/p/ceph-month-june-2021
> >
> > We have great sessions from component updates, performance best
> > practices, Ceph on different architectures, BoF sessions to get more
> > involved with working groups in the community, and more! You may also
> > leave open discussion topics for the listed talks that we'll get to
> > each Q/A portion.
> >
> > I will provide the video stream link on this thread and etherpad once
> > it's available. You can also add the Ceph community calendar, which
> > will have the Ceph Month sessions prefixed with "Ceph Month" to get
> > local timezone conversions.
> >
> > https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%40group.calendar.google.com
> >
> > Thank you to our speakers for taking the time to share with us all the
> > latest best practices and usage with Ceph!
> >
> > --
> > Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Manuel Lausch
Hi Peter,

your suggestion pointed me to the right spot. 
I didn't know about the feature that Ceph will read from replica
PGs.

So, moving on: I found two functions in osd/PrimaryLogPG.cc,
"check_laggy" and "check_laggy_requeue". Both first check whether
the peers have the Octopus features; if not, the function is
skipped. This explains why the problem began after about half
the cluster was updated.

To verify this, I added "return true" in the first line of both
functions. The issue is gone with that, but
I don't know what problems this could trigger, and I know the root cause
is not fixed by it.
I think I will open a bug ticket with this knowledge.


osd_op_queue_cut_off is set to high,
and ICMP rate limiting should not be happening.


Thanks
Manuel


On Thu, 10 Jun 2021 11:28:48 +0200
Peter Lieven  wrote:

> On 10.06.21 at 11:08, Manuel Lausch wrote:
> > Hi,
> >
> > has no one a idea what could cause this issue. Or how I could debug
> > it?
> >
> > In some days I have to go live with this cluster. If I don't have a
> > solution I have to go live with nautilus.   
> 
> 
> Hi Manuel,
> 
> 
> I had similar issues with Octopus and i am thus stuck with Nautilus.
> 
> Can you debug the slow ops and see if the slow ops are caused by the
> status "waiting for readable".
> 
> I suspected that it has something to do with the new feature in
> Octopus to read from all OSDs regardless if
> 
> they are master for a PG or not.
> 
> 
> Can you also verify that osd_op_queue_cut_off is set to high and that
> icmp rate limiting is disabled on your hosts?
> 
> 
> Peter
> 
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Dan van der Ster
Hi,

At which point in the update procedure did you

ceph osd require-osd-release octopus

?

And are you sure it was set to nautilus before the update? (`ceph osd dump`
will show)

Cheers , Dan



On Thu, Jun 10, 2021, 5:45 PM Manuel Lausch  wrote:

> Hi Peter,
>
> your suggestion pointed me to the right spot.
> I didn't know about the feature, that ceph will read from replica
> PGs.
>
> So on. I found two functions in the osd/PrimaryLogPG.cc:
> "check_laggy" and "check_laggy_requeue". On both is first a check, if
> the partners have the octopus features. if not, the function is
> skipped. This explains the beginning of the problem after about the
> half cluster was updated.
>
> To verifiy this, I added "return true" in the first line of the
> functions. The issue is gone with it. But
> I don't know what problems this could trigger. I know, the root cause
> is not fixed with it.
> I think I will open a bug ticket with this knowlage.
>
>
> osd_op_queue_cutoff is set to high
> and a icmp rate limiting should not happen
>
>
> Thanks
> Manuel
>
>
> On Thu, 10 Jun 2021 11:28:48 +0200
> Peter Lieven  wrote:
>
> > On 10.06.21 at 11:08, Manuel Lausch wrote:
> > > Hi,
> > >
> > > has no one a idea what could cause this issue. Or how I could debug
> > > it?
> > >
> > > In some days I have to go live with this cluster. If I don't have a
> > > solution I have to go live with nautilus.
> >
> >
> > Hi Manuel,
> >
> >
> > I had similar issues with Octopus and i am thus stuck with Nautilus.
> >
> > Can you debug the slow ops and see if the slow ops are caused by the
> > status "waiting for readable".
> >
> > I suspected that it has something to do with the new feature in
> > Octopus to read from all OSDs regardless if
> >
> > they are master for a PG or not.
> >
> >
> > Can you also verify that osd_op_queue_cut_off is set to high and that
> > icmp rate limiting is disabled on your hosts?
> >
> >
> > Peter
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and openstack throttling experience

2021-06-10 Thread Marcel Kuiper

Hi David,

That is very helpful, thank you. Looking at the graphs, I notice that
the bandwidth used looks very low. Or am I misinterpreting
the bandwidth graphs?


Regards

Marcel

David Caro wrote on 2021-06-10 11:49:

We have a similar setup, way smaller though (~120 osds right now) :)

We have different capped VMs, but most have 500 write, 1000 read iops
cap, you can see it in effect here:
https://cloud-ceph-performance-tests.toolforge.org/

We are currently running Octopus v15.2.11.

It's a very 'bare' ui (under construction), but check the
'after_ceph_upgrade_v2' for example, the 'vm_disk' suite, the
'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
that hit the cap.

From there you can also see the numbers of the tests running uncapped
(in the 'rbd_from_hypervisor' or 'rbd_from_osd'
suites).

You can see the current iops of our ceph cluster here:
https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1

Of our openstack setup:
https://grafana.wikimedia.org/d/00579/wmcs-openstack-eqiad1?orgId=1&refresh=15m

And some details on the traffic OpenStack puts on each ceph osd host 
here:

https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m

We are working on revamping those graphs right now, so it might become
easier to see numbers in a few weeks.


We don't usually see slow ops with the current load, though we
recommend not using ceph for very latency sensitive VMs
(like etcd), as on the network layer there's some hardware limits we
can't remove right now.

Hope that helps.

On 06/10 10:54, Marcel Kuiper wrote:

Hi

We're running ceph nautilus 14.2.21 (going to octopus latest in a few 
weeks)
as volume and instance backend for our openstack vm's. Our clusters 
run
somewhere between 500 - 1000 OSDs on SAS HDDs with NVMe's as journal 
and db

device

Currently we do not have our vm's capped on iops and throughput. We
regularly get slowops warnings (once or twice per day) and wonder 
whether
there are more users with sort of the same setup that do throttle 
their

openstack vm's.

- What kind of numbers are used in the field for IOPS and throughput
limiting?

- As a side question, is there an easy way to get rid of the slowops 
warning
besides restarting the involved osd. Otherwise the warning seems to 
stay

forever

Regards

Marcel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Manuel Lausch
Hi Dan,

The cluster was initially deployed with Nautilus (14.2.20). I am sure
require-osd-release was nautilus at that point.
I set it to octopus after all components were updated.


Manuel



On Thu, 10 Jun 2021 17:54:49 +0200
Dan van der Ster  wrote:

> Hi,
> 
> At which point in the update procedure did you
> 
> ceph osd require-osd-release octopus
> 
> ?
> 
> And are you sure it was set to nautilus before the update? (`ceph osd
> dump` will show)
> 
> Cheers , Dan
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and openstack throttling experience

2021-06-10 Thread David Caro
On 06/10 14:05, Marcel Kuiper wrote:
> Hi David,
> 
> That is very helpful thank you. When looking at the graphs I notice that the
> bandwidth used looks as if this is very low. Or am I misinterpreting the
> bandwidth graphs?

Hey, sorry for the delay, something broke :)

Which graphs specifically are you looking at?

> 
> Regards
> 
> Marcel
> 
> David Caro schreef op 2021-06-10 11:49:
> > We have a similar setup, way smaller though (~120 osds right now) :)
> > 
> > We have different capped VMs, but most have 500 write, 1000 read iops
> > cap, you can see it in effect here:
> > https://cloud-ceph-performance-tests.toolforge.org/
> > 
> > We are currently running Octopus v15.2.11.
> > 
> > It's a very 'bare' ui (under construction), but check the
> > 'after_ceph_upgrade_v2' for example, the 'vm_disk' suite, the
> > 'RunConfig(rw=randread, bs=4096, ioengine=libaio, iodepth=1)' or
> > 'RunConfig(rw=randwrite, bs=4096, ioengine=libaio, iodepth=1)' tests
> > that hit the cap.
> > 
> > From there you can also see the numbers of the tests running uncapped
> > (in the 'rbd_from_hypervisor' or 'rbd_from_osd'
> > suites).
> > 
> > You can see the current iops of our ceph cluster here:
> > https://grafana.wikimedia.org/d/7TjJENEWz/wmcs-ceph-eqiad-cluster-overview?orgId=1
> > 
> > Of our openstack setup:
> > https://grafana.wikimedia.org/d/00579/wmcs-openstack-eqiad1?orgId=1&refresh=15m
> > 
> > And some details on the traffic OpenStack puts on each ceph osd host
> > here:
> > https://grafana.wikimedia.org/d/wsoKtElZk/wmcs-ceph-eqiad-network-utilization?orgId=1&refresh=5m
> > 
> > We are working on revamping those graphs right now, so it might become
> > easier to see numbers in a few weeks.
> > 
> > 
> > We don't usually see slow ops with the current load, though we
> > recommend not using ceph for very latency sensitive VMs
> > (like etcd), as on the network layer there's some hardware limits we
> > can't remove right now.
> > 
> > Hope that helps.
> > 
> > On 06/10 10:54, Marcel Kuiper wrote:
> > > Hi
> > > 
> > > We're running ceph nautilus 14.2.21 (going to octopus latest in a
> > > few weeks)
> > > as volume and instance backend for our openstack vm's. Our clusters
> > > run
> > > somewhere between 500 - 1000 OSDs on SAS HDDs with NVMe's as journal
> > > and db
> > > device
> > > 
> > > Currently we do not have our vm's capped on iops and throughput. We
> > > regularly get slowops warnings (once or twice per day) and wonder
> > > whether
> > > there are more users with sort of the same setup that do throttle
> > > their
> > > openstack vm's.
> > > 
> > > - What kind of numbers are used in the field for IOPS and throughput
> > > limiting?
> > > 
> > > - As a side question, is there an easy way to get rid of the slowops
> > > warning
> > > besides restarting the involved osd. Otherwise the warning seems to
> > > stay
> > > forever
> > > 
> > > Regards
> > > 
> > > Marcel
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow ops at restarting OSDs (octopus)

2021-06-10 Thread Dan van der Ster
Ok sounds correct. This was the only thing that came to mind which might
explain your problem.

Cheers, Dan




On Thu, Jun 10, 2021, 6:06 PM Manuel Lausch  wrote:

> Hi Dan,
>
> The cluster was initialy deployed with nautilus (14.2.20). I am sure
> require-osd-release was nautilus at this point.
> I did set this to octopus, after all components was updatated.
>
>
> Manuel
>
>
>
> On Thu, 10 Jun 2021 17:54:49 +0200
> Dan van der Ster  wrote:
>
> > Hi,
> >
> > At which point in the update procedure did you
> >
> > ceph osd require-osd-release octopus
> >
> > ?
> >
> > And are you sure it was set to nautilus before the update? (`ceph osd
> > dump` will show)
> >
> > Cheers , Dan
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] stretched cluster or not, with mon in 3 DC and osds on 2 DC

2021-06-10 Thread aderumier
Hi,

I'm currently reading the documentation about stretched clusters.

I would like to know whether one is needed or not with this kind of 3-DC
setup:


           3 km (0.2 ms)
     DC1 ----------------- DC2
       \                   /
  30 km (3 ms)      30 km (2-3 ms)
         \               /
               DC3


DC1 and DC2 are near each other, with small latency (0.2 ms).
DC3 is 30 km away, with higher latency (2-3 ms).
There are separate links between the DCs, each on a different physical path.

1 monitor in each DC
OSDs in DC1 and DC2 only, with size=4

The cluster is all NVMe or SSD; the lowest possible latency is required for OSD
replication.

Now, I really don't know whether the higher-latency monitor at DC3 could have an
impact on OSD read/write latency if that monitor is elected leader,

versus a stretched cluster where the OSDs only use the local DC monitors?

What is the advantage of a stretch cluster here (with good redundant
links between the sites)?
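For the non-stretch variant, the size=4 layout over DC1/DC2 would roughly correspond to a CRUSH rule like the following sketch (assuming datacenter buckets named DC1 and DC2 exist in the CRUSH map):

rule replicated_dc1_dc2 {
    id 1
    type replicated
    min_size 4
    max_size 4
    step take DC1
    step chooseleaf firstn 2 type host
    step emit
    step take DC2
    step chooseleaf firstn 2 type host
    step emit
}

As far as I understand it, stretch mode mainly adds monitor-side handling (the DC3 mon as tie-breaker and automatic min_size adjustment during a site outage); the data placement itself can be expressed with a rule like the above either way.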









___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error on Ceph Dashboard

2021-06-10 Thread Ernesto Puerta
Hi Robert,

I just launched a 16.2.4 cluster and I don't reproduce that error. Could you
please file a tracker issue at
https://tracker.ceph.com/projects/dashboard/issues/new and attach the mgr
logs and cluster details (e.g. the number of mgrs)?

Thanks!

Kind Regards,
Ernesto


On Thu, Jun 10, 2021 at 4:05 AM Robert W. Eckert 
wrote:

> Hi - this just started happening in the past few days using Ceph Pacific
> 16.2.4 via cephadm (Podman containers).
> The dashboard is returning
>
> No active ceph-mgr instance is currently running the dashboard. A failover
> may be in progress. Retrying in 5 seconds...
>
> And ceph status returns
>
>   cluster:
> id: fe3a7cb0-69ca-11eb-8d45-c86000d08867
> health: HEALTH_WARN
> Module 'dashboard' has failed dependency: cannot import name
> 'AuthManager'
> clock skew detected on mon.cube
>
>   services:
> mon: 3 daemons, quorum story,cube,rhel1 (age 46h)
> mgr: cube.tvlgnp(active, since 47h), standbys: rhel1.zpzsjc,
> story.gffann
> mds: 2/2 daemons up, 1 standby
> osd: 13 osds: 13 up (since 46h), 13 in (since 46h)
> rgw: 3 daemons active (3 hosts, 1 zones)
>
>   data:
> volumes: 1/1 healthy
> pools:   11 pools, 497 pgs
> objects: 1.50M objects, 2.1 TiB
> usage:   6.2 TiB used, 32 TiB / 38 TiB avail
> pgs: 497 active+clean
>
>   io:
> client:   255 B/s rd, 2.7 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> The only thing that has happened on the cluster was that one of the servers was
> rebooted.  No configuration changes were performed.
>
> Any suggestions?
>
> Thanks,
> rob
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error on Ceph Dashboard

2021-06-10 Thread Robert W. Eckert
Hi Ernesto, I couldn't register for an account there (it was giving me a 503),
but I think the issue is the deployed container. I managed to clean it up, but I'm
not 100% sure of the cause. I think it is the referenced container: all of
the unit.run files reference
docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
but I don't see that sha digest against the ceph/ceph:v16.2.4 image in Docker.
To clean it up I did the following (assume the servers are named a, b, c).
On each server, I ran podman pull docker.io/ceph/ceph:v16.2.4

On server a, which was running the manager, I did

ceph orch apply mon --placement='a.domain'
ceph orch apply mgr --placement='a.domain'


I didn’t expect any immediate miracles, but just wanted to isolate the issue.  
However when I did this, the dashboard started working again.  I then 
redeployed mgr and mon to all 3 servers, and things are back up

I then applied the mon and mgr to all servers:
ceph orch apply mon --placement='a.domain,b.domain, c.domain'
ceph orch apply mgr --placement='a.domain,b.domain, c.domain'

Things still worked, so I removed a from the placement (to reset it):
ceph orch apply mon --placement='b.domain, c.domain'
ceph orch apply mgr --placement='b.domain, c.domain'

Finally, to get all 3 back up:
ceph orch apply mon --placement='a.domain,b.domain, c.domain'
ceph orch apply mgr --placement='a.domain,b.domain, c.domain'


And I am up and running.
I am thinking the pull of docker.io/ceph/ceph:v16.2.4 is what did it,
because there was a module downloaded on each server. So I am not 100% sure
the sha digest tag matches 16.2.4, but it is working again.
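In case it helps others hitting the same thing, two commands that should show and fix the image mismatch more directly (a sketch, not exactly what I ran):

# ceph orch ps          # the image name/id columns show what each daemon is actually running
# ceph orch upgrade start --image docker.io/ceph/ceph:v16.2.4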

Thanks,
Rob

p.s. I do have an extracted image of the container before I did all of this if 
that would help.


From: Ernesto Puerta 
Sent: Thursday, June 10, 2021 2:44 PM
To: Robert W. Eckert 
Cc: ceph-users 
Subject: Re: [ceph-users] Error on Ceph Dashboard

Hi Robert,

I just launched a 16.2.4 cluster and I don't reproduce that error. Could please 
file a tracker in https://tracker.ceph.com/projects/dashboard/issues/new and 
attach the mgr logs and cluster details (e.g.: number of mgrs)?

Thanks!

Kind Regards,
Ernesto


On Thu, Jun 10, 2021 at 4:05 AM Robert W. Eckert wrote:
Hi - this just started happening in the past few days using Ceph Pacific 16.2.4 
via cephadmin (Podman containers)
The dashboard is returning

No active ceph-mgr instance is currently running the dashboard. A failover may 
be in progress. Retrying in 5 seconds...

And ceph status returns

  cluster:
id: fe3a7cb0-69ca-11eb-8d45-c86000d08867
health: HEALTH_WARN
Module 'dashboard' has failed dependency: cannot import name 
'AuthManager'
clock skew detected on mon.cube

  services:
mon: 3 daemons, quorum story,cube,rhel1 (age 46h)
mgr: cube.tvlgnp(active, since 47h), standbys: rhel1.zpzsjc, story.gffann
mds: 2/2 daemons up, 1 standby
osd: 13 osds: 13 up (since 46h), 13 in (since 46h)
rgw: 3 daemons active (3 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   11 pools, 497 pgs
objects: 1.50M objects, 2.1 TiB
usage:   6.2 TiB used, 32 TiB / 38 TiB avail
pgs: 497 active+clean

  io:
client:   255 B/s rd, 2.7 KiB/s wr, 0 op/s rd, 0 op/s wr

The only thing that has happened on the cluster was one of the servers was 
rebooted.  No configuration changes were performed

Any suggestions?

Thanks,
rob
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] lib remoto in ubuntu

2021-06-10 Thread Alfredo Rezinovsky
I cannot enable cephadm because it cannot find the remoto lib.

Even when I installed it using "pip3 install remoto" and then installed it
from a deb package built from the git sources at
https://github.com/alfredodeza/remoto/

If I type "import remoto" in a python3 prompt it works.
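A minimal thing to try, assuming the mgr simply is not seeing the system-wide module and assuming the package is named python3-remoto on Ubuntu (both are assumptions on my side): install it as a distro package and restart the active mgr so it re-imports its modules:

apt install python3-remoto                            # assumption: package name
systemctl restart ceph-mgr@$(hostname -s).service     # unit name depends on how the mgr was deployed
ceph mgr module enable cephadm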

-- 
Alfrenovsky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] suggestion for Ceph client network config

2021-06-10 Thread Götz Reinicke
Hi all

We are getting a new Samba SMB fileserver which mounts our CephFS to export some
shares. What might be a good or better network setup for that server?

Should I configure two interfaces - one for the SMB share export towards our
workstations and desktops, and one towards the Ceph cluster?

Or would it be "ok" for all traffic to be on one interface?

The server has 40G ports.

Thanks for your suggestions and feedback. Regards, Götz




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io