[ceph-users] Re: Thank you!

2020-07-21 Thread Olivier AUDRY

hello

ceph is the definitive solution for storage. That's all. 

I've been a happy user since 2014 and I have never lost any data. When I
remember how painful the firmware upgrades of EMC, NetApp, and HP storage
were, and the time spent recovering lost data... Ceph is just amazing!

so many thx to you guys. Thx !!

oau


On Monday, 20 July 2020 at 19:35 +0200, Marc Roos wrote:
>  
> 
> I agree, thanks from me as well. I am also really impressed by this
> storage solution, as well as something like Apache Mesos. Those are the
> most impressive technologies introduced and developed in the last 5(?)
> years.
> 
> 
> 
> -Original Message-
> To: ceph-users
> Cc: dhils...@performair.com
> Subject: [ceph-users] Re: Thank you!
> 
> If there was a “like” button, I would have just clicked that to keep 
> the list noise down. I have smaller operations and so my cluster
> goes 
> down a lot more often. I keep dreading my abuse of the cluster and
> it 
> just keeps coming back for more. 
> 
> Ceph really is amazing, and it’s hard to fully appreciate the
> efforts 
> of the team other than to repeat the same: Thank you!
> 
> Brian
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph (rhel) packages rebuilt without release change ?

2020-07-21 Thread SCHAER Frederic
Hi,

On a previously installed machine I get:

# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date  : Thu 25 Jun 2020 08:08:52 PM CEST
Build Host  : braggi01.front.sepia.ceph.com

# rpm -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-252.el7_7.6

On a VM set up the same way, but now failing the Ceph install, this is what I
get with the exact same repos (which I sync from the Ceph ones):

# repoquery -q --requires ceph-selinux-14.2.10-0.el7.x86_64 |grep selinux
libselinux-utils
selinux-policy-base >= 3.13.1-266.el7_8.1

# After OS update and then ceph install:
# rpm -qi ceph-selinux-14.2.10-0.el7.x86_64 |grep Build
Build Date  : Thu 09 Jul 2020 08:09:46 PM CEST
Build Host  : braggi03.front.sepia.ceph.com


You'll notice the requirement on selinux-policy-base changed from
3.13.1-252.el7_7.6 to 3.13.1-266.el7_8.1, and the build info changed too.
But the RPM package version and release did not change, and I don't get it.

I would have assumed an RPM change would at least imply a minor release bump?
Are the RPMs being rebuilt with the same version/release but against different
OS versions (frequently??)?
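
For anyone who wants to spot such silent rebuilds when syncing repos, here is a
minimal sketch (the repo paths are made up; adjust them to your local mirrors)
that compares the build metadata, payload checksums and dependency sets of two
same-named RPMs:

# hypothetical local paths to the two copies of the package
OLD=/srv/repos/ceph-old/ceph-selinux-14.2.10-0.el7.x86_64.rpm
NEW=/srv/repos/ceph-new/ceph-selinux-14.2.10-0.el7.x86_64.rpm

# same name-version-release, but a different build time/host and checksum
# indicate a silent rebuild
for p in "$OLD" "$NEW"; do
    rpm -qp --qf '%{NAME}-%{VERSION}-%{RELEASE}  built %{BUILDTIME:date} on %{BUILDHOST}\n' "$p"
    sha256sum "$p"
done

# compare the dependency sets (e.g. the selinux-policy-base requirement)
diff <(rpm -qp --requires "$OLD" | sort) <(rpm -qp --requires "$NEW" | sort)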

Thanks && Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Dashboard and Firefox

2020-07-21 Thread Tiago Melo
Hi,

I have created an issue to track this: https://tracker.ceph.com/issues/46653

Could you please tell me which version of Ceph you are using?
Thanks

-Original Message-
From: bioh...@yahoo.com  
Sent: 21 July 2020 10:02
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph Dashboard and Firefox

Hi

I'm using the Ceph Dashboard in Firefox on a MacBook, and lately it has been
hanging with "A web page is loading slowly - stop it or wait". Some pages load
fine, but some show that warning and stop loading.

I'm on "Firefox Extended Support release 68.10.0esr (64-bit)"

Anyone else seen this issue?
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] script for compiling and running the Ceph source code

2020-07-21 Thread Bobby
Hi,

I am trying to profile the number of invocations of a particular function in
the Ceph source code. I have instrumented the code with timing functions.

Can someone please share the script for compiling and running the Ceph source
code? I am struggling with it. That would be a great help!
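
For reference, a minimal sketch of the usual developer workflow, using the
helper scripts that ship in the Ceph source tree (exact flags differ a bit
between releases, so treat this as a starting point rather than the official
procedure):

# clone the tree and pull in the submodules
git clone https://github.com/ceph/ceph.git
cd ceph
git submodule update --init --recursive

# install build dependencies, generate the build directory, and compile
./install-deps.sh
./do_cmake.sh -DCMAKE_BUILD_TYPE=Debug
cd build
ninja -j"$(nproc)"

# start a throwaway development cluster and talk to it with the freshly built binaries
../src/vstart.sh -d -n -x -l    # debug logging, new cluster, cephx on, bind to localhost
./bin/ceph -s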

BR
Bobby !
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC profile datastore usage - question

2020-07-21 Thread Igor Fedotov

Hi Steven,

IMO your statement about "not supporting higher block sizes" is too strong.
In my experience, excessive space usage for EC pools tends to depend on the
write access pattern, input block sizes and/or object sizes. Hence I'm pretty
sure this issue isn't present/visible for every cluster with an allocation
size > 4K and an EC pool enabled. E.g. one wouldn't notice it when the
majority of objects are large enough and/or written just once.
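
As a rough illustration of the arithmetic (a sketch only; the 4K write, the
k=6/m=4 profile and the 64K allocation size are just example numbers): a small
client write is split into k data chunks plus m coding chunks, and in the worst
case each chunk is rounded up to one allocation unit on its OSD.

write=4096      # client write size in bytes
k=6; m=4        # example EC profile
alloc=65536     # example bluestore min_alloc_size_hdd (64K)

chunk=$(( (write + k - 1) / k ))    # per-chunk payload, ~683 bytes here
stored=$(( (k + m) * alloc ))       # worst case: every chunk pads to a full allocation unit
echo "logical $write bytes -> up to $stored bytes on disk ($(( stored / write ))x)"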


But honestly, I haven't performed a comprehensive investigation of the topic.


And perhaps you're right and we should cover the issue in the documentation.
Please feel free to file a corresponding ticket in the Ceph tracker...



Thanks,

Igor


On 7/20/2020 7:02 PM, Steven Pine wrote:

Hi Igor,

Given the patch histories and the rejection of the previous patch in favor of
the one defaulting to a 4k block size, does this essentially mean Ceph does
not support higher block sizes when using erasure coding? Will the Ceph
project be updating its documentation and references to let everyone know
that larger blocks don't interact with EC pools as intended?


Sincerely,

On Mon, Jul 20, 2020 at 9:06 AM Igor Fedotov wrote:


Hi Mateusz,

I think you might be hit by:

https://tracker.ceph.com/issues/44213


This is fixed in the upcoming Pacific release. The Nautilus/Octopus backport
is under discussion for now.


Thanks,

Igor

On 7/18/2020 8:35 AM, Mateusz Skała wrote:
> Hello Community,
> I would like to ask for help in explaining a situation.
> There is a Rados Gateway with an EC pool profile of k=6 m=4, so it should
> take something like 1.4 - 2.0x the raw data ((k+m)/k = 10/6 ≈ 1.67x expected)
> if I'm correct.
> Rados df shows me:
> 116 TiB used and WR 26 TiB
> Can you explain this? It is about 4.5*WR of used data. Why?
> Regards
> Mateusz Skała
> ___
> ceph-users mailing list -- ceph-users@ceph.io

> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io




--
Steven Pine
webair.com

P: 516.938.4100 x
E: steven.p...@webair.com




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: script for compiling and running the Ceph source code

2020-07-21 Thread Bobby
And to put it more precisely, I would like to figure out how many times this
particular function is called during the execution of the program.

BR
Bobby !

On Tue, Jul 21, 2020 at 1:24 PM Bobby  wrote:

>
> Hi,
>
> I am trying to profile the number of invocations to a particular function
> in  Ceph source code. I have instrumented the code with time functions.
>
> Can someone please share the script for compiling and running the Ceph
> source code? I am struggling with it. That would be great help !
>
> BR
> Bobby !
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [ceph] [nautilus][ceph-ansible] - Dynamic bucket resharding problem

2020-07-21 Thread Erik Johansson


Hello!

I've run into a bit of an issue with one of our radosgw production clusters.

The setup is two radosgw nodes behind HAProxy load balancing, which in turn
are connected to the Ceph cluster. Everything is running 14.2.2, so Nautilus.
It's tied to an OpenStack cluster, so Keystone is the authentication backend
(that shouldn't really matter though).

Today both rgw backends crashed. Checking the logs, it seems to be related to
dynamic resharding of a bucket, causing lock errors:

Logs snippet: https://pastebin.com/uBCnhinF

Checking
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021368.html
(old), I performed a manual reshard of the affected bucket with success
(radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256).

Checking the metadata for the bucket, it now correctly shows 256, up from 128.

HOWEVER, the dynamic resharding still kept happening and bringing down the
backends. I suspect it is because of the old reshard op hanging around when
checking `reshard list`: https://pastebin.com/dPChwBCT

As the resharding seems to have been successful when run manually, I now want
to remove that reshard op, but can't; I get this error when trying:
https://pastebin.com/071kfAsa

Right now I have had to resort to setting rgw_dynamic_resharding = false in
ceph.conf to stop the problem from occurring.
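
For reference, the commands involved (a sketch; the stale-instances
subcommands only exist in recent point releases, so they may not be available
everywhere):

# show pending reshard operations
radosgw-admin reshard list

# cancel a stale entry -- this is the step that fails for me with the error above
radosgw-admin reshard cancel --bucket="XXX/YYY"

# list / remove leftover bucket index instances from old reshards
radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm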

Ideas? 

Cheers
Erik

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Thank you!

2020-07-21 Thread Marc Roos


>> I've been a happy user since 2014 and I have never lost any data. When I
>> remember how painful the firmware upgrades of EMC, NetApp, and HP storage
>> were, and the time spent recovering lost data... Ceph is just amazing!

Interesting, I have always wondered how Ceph compares to proprietary
solutions. I am getting the impression that closed-source environments will
not survive in the long run. If you see how e.g. CERN is handling this 'bug
of the year', it just shows the value of a large support base, and of having
access to detailed info like this lz4 patch.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] osd out vs crush reweight

2020-07-21 Thread Marcel Kuiper
Hi list,

I ran a test with marking an OSD out versus setting its crush weight to 0,
and compared which OSDs the PGs were sent to. The crush map has 3 rooms. This
is what happened.

On `ceph osd out 111` (first room; this node has OSDs 108 - 116), PGs were
sent to the following OSDs:

NR PG's   OSD
  2   1
  1   4
  1   5
  1   6
  1   7
  2   8
  1   31
  1   34
  1   35
  1   56
  2   57
  1   58
  1   61
  1   83
  1   84
  1   88
  1   99
  1   100
  2   107
  1   114
  2   117
  1   118
  1   119
  1   121

All PGs were sent to OSDs on other nodes in the same room, except for 1 PG on
OSD 114. I think this works as expected.

Now I marked the OSD in again and waited until everything stabilized. Then I
set the crush weight to 0: `ceph osd crush reweight osd.111 0`. I thought this
lowers the crush weight of the node, so there would be even less chance that
PGs end up on an OSD of the same node. However, the results are:

NR PG's   OSD
  1   61
  1   83
  1   86
  3   108
  4   109
  5   110
  2   112
  5   113
  7   114
  5   115
  2   116

Except for 3 PGs, all other PGs ended up on an OSD belonging to the same node
:-O. Is this expected behaviour? Can someone explain? This is on Nautilus
14.2.8.
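
(For anyone who wants to reproduce the comparison, a crude sketch of one way
to capture it: snapshot the PG mapping, make the change, snapshot again and
diff.)

# snapshot the PG-to-OSD mapping before the change
ceph pg dump pgs_brief > pgmap.before

# mark the OSD out, or set its crush weight to 0, then wait for peering to settle
ceph osd crush reweight osd.111 0

# snapshot again and see which PGs changed their up/acting sets
ceph pg dump pgs_brief > pgmap.after
diff pgmap.before pgmap.after | grep '^[<>]'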

Thanks

Marcel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Dashboard and Firefox

2020-07-21 Thread biohazd
Hi

I'm using the Ceph Dashboard in Firefox on a MacBook, and lately it has been
hanging with "A web page is loading slowly - stop it or wait". Some pages load
fine, but some show that warning and stop loading.

I'm on "Firefox Extended Support release 68.10.0esr (64-bit)"

Anyone else seen this issue?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Thank you!

2020-07-21 Thread biohazd
Thanks so much to the Ceph teams and community; all your efforts are amazing.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Thank you!

2020-07-21 Thread Olivier AUDRY
Just sharing my experience:

- Storing photos for Photoways, now Photobox, in the early 2000s. A bug in
the HP storage enclosure erased the whole RAID group. Three weeks to
recalculate all the thumbnails with a dedicated server specialized in
resizing images.

- A little EMC array with something like 10 disks, 3 of them for the OS... an
embedded Windows, as I understood it. And hundreds of dollars per month to
get the performance monitoring. I used to be the monitoring guy for a UK
telco company.

- The smart NetApp... too smart for me.

Proprietary solutions are too expensive and too magic; it's very difficult to
understand how they work, and pretty much impossible to get your hands into
them.

So Ceph is the definitive way of doing storage. Clearly. Many thanks for your
great work.

On Tuesday, 21 July 2020 at 15:40 +0200, Marc Roos wrote:
> > > I've been a happy user since 2014 and I have never lost any data. When
> > > I remember how painful the firmware upgrades of EMC, NetApp, and HP
> > > storage were, and the time spent recovering lost data... Ceph is just
> > > amazing!
> 
> Interesting, I have always wondered how Ceph compares to proprietary
> solutions. I am getting the impression that closed-source environments will
> not survive in the long run. If you see how e.g. CERN is handling this 'bug
> of the year', it just shows the value of a large support base, and of
> having access to detailed info like this lz4 patch.
> 
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD memory leak?

2020-07-21 Thread Frank Schilder
Quick question: Is there a way to change the frequency of heap dumps? On this
page, http://goog-perftools.sourceforge.net/doc/heap_profiler.html, a function
HeapProfilerSetAllocationInterval() is mentioned, but no other way of
configuring this is described. Is there a config parameter or a ceph daemon
call to adjust this?

If not, can I change the dump path?

It's likely to overrun my log partition quickly if I cannot adjust either of
the two.
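
For context, what I have found so far (a sketch; the environment variables are
generic gperftools/tcmalloc ones, and whether they take effect alongside the
Ceph-driven profiler is an assumption on my part):

# the profiler is normally driven through the tell/admin-socket interface
ceph tell osd.0 heap start_profiler
ceph tell osd.0 heap dump
ceph tell osd.0 heap stats
ceph tell osd.0 heap stop_profiler

# gperftools itself honours these environment variables; they would have to be
# set in the OSD's environment (e.g. via the systemd unit)
# HEAP_PROFILE_ALLOCATION_INTERVAL=1073741824   # bytes allocated between dumps
# HEAPPROFILE=/some/big/partition/osd.0.heap    # file prefix for the dumps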

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 20 July 2020 15:19:05
To: Mark Nelson; Dan van der Ster
Cc: ceph-users
Subject: [ceph-users] Re: OSD memory leak?

Dear Mark,

thank you very much for the very helpful answers. I will raise 
osd_memory_cache_min, leave everything else alone and watch what happens. I 
will report back here.

Thanks also for raising this as an issue.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Nelson 
Sent: 20 July 2020 15:08:11
To: Frank Schilder; Dan van der Ster
Cc: ceph-users
Subject: Re: [ceph-users] Re: OSD memory leak?

On 7/20/20 3:23 AM, Frank Schilder wrote:
> Dear Mark and Dan,
>
> I'm in the process of restarting all OSDs and could use some quick advice on 
> bluestore cache settings. My plan is to set higher minimum values and deal 
> with accumulated excess usage via regular restarts. Looking at the 
> documentation 
> (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/), 
> I find the following relevant options (with defaults):
>
> # Automatic Cache Sizing
> osd_memory_target {4294967296} # 4GB
> osd_memory_base {805306368} # 768MB
> osd_memory_cache_min {134217728} # 128MB
>
> # Manual Cache Sizing
> bluestore_cache_meta_ratio {.4} # 40% ?
> bluestore_cache_kv_ratio {.4} # 40% ?
> bluestore_cache_kv_max {512 * 1024*1024} # 512MB
>
> Q1) If I increase osd_memory_cache_min, should I also increase 
> osd_memory_base by the same or some other amount?


osd_memory_base is a hint at how much memory the OSD could consume
outside the cache once it's reached steady state.  It basically sets a
hard cap on how much memory the cache will use to avoid over-committing
memory and thrashing when we exceed the memory limit. It's not necessary
to get it right, it just helps smooth things out by making the automatic
memory tuning less aggressive.  IE if you have a 2 GB memory target and
a 512MB base, you'll never assign more than 1.5GB to the cache on the
assumption that the rest of the OSD will eventually need 512MB to
operate even if it's not using that much right now.  I think you can
probably just leave it alone.  What you and Dan appear to be seeing is
that this number isn't static in your case but increases over time anyway.
Eventually I'm hoping that we can automatically account for more
and more of that memory by reading the data from the mempools.

> Q2) The cache ratio options are shown under the section "Manual Cache 
> Sizing". Do they also apply when cache auto tuning is enabled? If so, is it 
> worth changing these defaults for higher values of osd_memory_cache_min?


They actually do have an effect on the automatic cache sizing and
probably shouldn't only be under the manual section.  When you have the
automatic cache sizing enabled, those options will affect the "fair
share" values of the different caches at each cache priority level.  IE
at priority level 0, if both caches want more memory than is available,
those ratios will determine how much each cache gets.  If there is more
memory available than requested, each cache gets as much as they want
and we move on to the next priority level and do the same thing again.
So in this case the ratios end up being sort of more like fallback
settings for when you don't have enough memory to fulfill all cache
requests at a given priority level, but otherwise are not utilized until
we hit that limit.  The goal with this scheme is to make sure that "high
priority" items in each cache get first dibs at the memory even if it
might skew the ratios.  This might be things like rocksdb bloom filters
and indexes, or potentially very recent hot items in one cache vs very
old items in another cache.  The ratios become more like guidelines than
hard limits.


When you change to manual mode, you set an overall bluestore cache size and
each cache gets a flat percentage of it based on the ratios.  With 0.4/0.4 you
will always have 40% for onode, 40% for omap, and 20% for data, even if one of
those caches does not use all of its memory.
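
For anyone following along, a minimal sketch of how those settings could be
applied through the config database (the numbers are only examples, and some
of the bluestore_cache_* options may need an OSD restart to take full effect):

# automatic cache sizing: raise the cache floor, leave the rest alone
ceph config set osd osd_memory_target 4294967296
ceph config set osd osd_memory_cache_min 805306368

# manual cache sizing instead: fixed cache size with flat ratios
# ceph config set osd bluestore_cache_autotune false
# ceph config set osd bluestore_cache_size_hdd 3221225472
# ceph config set osd bluestore_cache_meta_ratio 0.4
# ceph config set osd bluestore_cache_kv_ratio 0.4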


>
> Many thanks for your help with this. I can't find answers to these questions 
> in the docs.
>
> There might be two reasons for high osd_map memory usage. One is, that our 
> OSDs seem to hold a large number of OSD maps:


I brought this up in our core team standup last week.  Not sure if
anyone has had time to look at it yet though.

[ceph-users] Re: osd out vs crush reweight

2020-07-21 Thread DHilsbos
Marcel;

Short answer: yes, it might be expected behavior.

PG placement is highly dependent on the cluster layout, and CRUSH rules.  So... 
Some clarifying questions.

What version of Ceph are you running?
How many nodes do you have?
How many pools do you have, and what are their failure domains?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com


-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 6:52 AM
To: ceph-users@ceph.io
Subject: [ceph-users] osd out vs crush reweight

Hi list,

I ran a test with marking an osd out versus setting its crush weight to 0.
I compared to what osds pages were send. The crush map has 3 rooms. This
is what happened.

On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
send to the following osds

NR PG's   OSD
  2   1
  1   4
  1   5
  1   6
  1   7
  2   8
  1   31
  1   34
  1   35
  1   56
  2   57
  1   58
  1   61
  1   83
  1   84
  1   88
  1   99
  1   100
  2   107
  1   114
  2   117
  1   118
  1   119
  1   121

All PG's were send to osds on other nodes in the same room, except for 1
PG on osd 114. I think this works as expected

Now I  marked the osd in and wait until all stabilized. Then I set the
crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
lowers the crush weight of the node so even less chances that PG's end up
on an osd of the same node. However the result are

NR PG's   OSD
  1   61
  1   83
  1   86
  3   108
  4   109
  5   110
  2   112
  5   113
  7   114
  5   115
  2   116

except for 3 PG's all other PG's ended up on an osd belonging to the same
node :-O. Is this expected behaviour? Can someone explain?? This is on
nautilus 14.2.8.

Thanks

Marcel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw, public and private access on the same cluster ?

2020-07-21 Thread Jean-Sebastien Landry
Hi everyone, we have a ceph cluster for object storage only, the rgws are 
accessible from the internet, and everything is ok.

Now, one of our teams/clients requires that their data never be accessible
from the internet. In case of any security bug/breach/whatever, they want
access to their data to be limited to the local network.

Before creating a second "private" cluster, is there a way to achieve this on 
our current "public" cluster?

Would a multi-zone setup without replication help me with that?

Public rgws for public access in the "pub_zone", and private rgws for private
access in the "prv_zone"?

pubzone.rgw.buckets.data
prvzone.rgw.buckets.data

If the "public" rgws is hacked, without the access_key/secret_key of the 
private zone, is there any possibilities to access the private zone?

Would multiple realms help me secure it further?
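
In case it clarifies what I mean, a rough sketch of a separate realm/zone for
the private data (the names and endpoints are made up, and the exact flags
should be checked against the multisite docs for our release):

radosgw-admin realm create --rgw-realm=private
radosgw-admin zonegroup create --rgw-realm=private --rgw-zonegroup=prv-zg \
    --endpoints=http://rgw-int.example.local:8080 --master --default
radosgw-admin zone create --rgw-realm=private --rgw-zonegroup=prv-zg \
    --rgw-zone=prvzone --endpoints=http://rgw-int.example.local:8080 \
    --master --default
radosgw-admin period update --rgw-realm=private --commit

# then bind an internal-only radosgw instance to that zone, e.g. in ceph.conf:
# [client.rgw.internal]
#     rgw_realm = private
#     rgw_zonegroup = prv-zg
#     rgw_zone = prvzone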

Any input would be really appreciated.

I don't want to put too much energy into false security and/or security by
obscurity, so if these multi-site/multi-realm scenarios are useless from a
security point of view, please tell me. :-)

Thanks!
JS
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with OSD::osd_op_tp thread had timed out and other connected issues

2020-07-21 Thread Jan Pekař - Imatic

Hi Ben,

we are not using EC pool on that cluster.

The OSD-out behaviour almost stopped when we solved the memory issues (less
memory allocated to the OSDs). We are not working on that cluster anymore, so
we have no further information about the problem.

Jan

On 20/07/2020 07.59, Benoît Knecht wrote:

Hi Jan,

Jan Pekař wrote:

Also I'm concerned, that this OSD restart caused data degradation and recovery 
- cluster should be clean immediately after OSD up when no
client was uploading/modifying data during my tests.

We're experiencing the same thing on our 14.2.10 cluster. After marking an OSD 
out, if it's briefly marked down (due to the missed heartbeats or because the 
daemon was manually restarted) the PGs that were still mapped on it disappear 
all at once, and we get degraded objects as a result.

In our case, those PGs belong to an EC pool, and we use the PG balancer in 
upmap mode, so we have a few upmapped PGs on that OSD. Is that the case for you 
too?

We're going to run some tests to try and better understand what's going on 
there, but we welcome any feedback in the meantime.

Cheers,

--
Ben


--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz | +420326555326

--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread Marcel Kuiper


Hi Dominic,

This cluster is running 14.2.8 (Nautilus).
There are 172 OSDs divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replicas of the data.
There are 3968 PGs (the cluster is not yet fully in use; the number of PGs is
expected to grow).

Marcel

> Marcel;
>
> Short answer; yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout, and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an osd out versus setting its crush weight to 0.
> I compared to what osds pages were send. The crush map has 3 rooms. This
> is what happened.
>
> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
> send to the following osds
>
> NR PG's   OSD
>   2   1
>   1   4
>   1   5
>   1   6
>   1   7
>   2   8
>   1   31
>   1   34
>   1   35
>   1   56
>   2   57
>   1   58
>   1   61
>   1   83
>   1   84
>   1   88
>   1   99
>   1   100
>   2   107
>   1   114
>   2   117
>   1   118
>   1   119
>   1   121
>
> All PG's were send to osds on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected
>
> Now I  marked the osd in and wait until all stabilized. Then I set the
> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
> lowers the crush weight of the node so even less chances that PG's end up
> on an osd of the same node. However the result are
>
> NR PG's   OSD
>   1   61
>   1   83
>   1   86
>   3   108
>   4   109
>   5   110
>   2   112
>   5   113
>   7   114
>   5   115
>   2   116
>
> except for 3 PG's all other PG's ended up on an osd belonging to the same
> node :-O. Is this expected behaviour? Can someone explain?? This is on
> nautilus 14.2.8.
>
> Thanks
>
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Thank you for the information.

Could you send the output of:
ceph osd crush rule dump

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 9:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]


Hi Dominic,

This cluster is running 14.2.8 (nautilus)
There's 172 osds divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replica's of data
There are 3968 PG's (the cluster is not yet fully in use. The number of
PGs is expected to grow)

Marcel

> Marcel;
>
> Short answer; yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout, and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an osd out versus setting its crush weight to 0.
> I compared to what osds pages were send. The crush map has 3 rooms. This
> is what happened.
>
> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
> send to the following osds
>
> NR PG's   OSD
>   2   1
>   1   4
>   1   5
>   1   6
>   1   7
>   2   8
>   1   31
>   1   34
>   1   35
>   1   56
>   2   57
>   1   58
>   1   61
>   1   83
>   1   84
>   1   88
>   1   99
>   1   100
>   2   107
>   1   114
>   2   117
>   1   118
>   1   119
>   1   121
>
> All PG's were send to osds on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected
>
> Now I  marked the osd in and wait until all stabilized. Then I set the
> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
> lowers the crush weight of the node so even less chances that PG's end up
> on an osd of the same node. However the result are
>
> NR PG's   OSD
>   1   61
>   1   83
>   1   86
>   3   108
>   4   109
>   5   110
>   2   112
>   5   113
>   7   114
>   5   115
>   2   116
>
> except for 3 PG's all other PG's ended up on an osd belonging to the same
> node :-O. Is this expected behaviour? Can someone explain?? This is on
> nautilus 14.2.8.
>
> Thanks
>
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Sorry, could also send the output of:
ceph osd tree

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: dhils...@performair.com [mailto:dhils...@performair.com] 
Sent: Tuesday, July 21, 2020 9:41 AM
To: c...@mknet.nl; ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]

Marcel;

Thank you for the information.

Could you send the output of:
ceph osd crush rule dump

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 9:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: osd out vs crush reweight]


Hi Dominic,

This cluster is running 14.2.8 (nautilus)
There's 172 osds divided over 19 nodes.
There are currently 10 pools.
All pools have 3 replica's of data
There are 3968 PG's (the cluster is not yet fully in use. The number of
PGs is expected to grow)

Marcel

> Marcel;
>
> Short answer; yes, it might be expected behavior.
>
> PG placement is highly dependent on the cluster layout, and CRUSH rules.
> So... Some clarifying questions.
>
> What version of Ceph are you running?
> How many nodes do you have?
> How many pools do you have, and what are their failure domains?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 6:52 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] osd out vs crush reweight
>
> Hi list,
>
> I ran a test with marking an osd out versus setting its crush weight to 0.
> I compared to what osds pages were send. The crush map has 3 rooms. This
> is what happened.
>
> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
> send to the following osds
>
> NR PG's   OSD
>   2   1
>   1   4
>   1   5
>   1   6
>   1   7
>   2   8
>   1   31
>   1   34
>   1   35
>   1   56
>   2   57
>   1   58
>   1   61
>   1   83
>   1   84
>   1   88
>   1   99
>   1   100
>   2   107
>   1   114
>   2   117
>   1   118
>   1   119
>   1   121
>
> All PG's were send to osds on other nodes in the same room, except for 1
> PG on osd 114. I think this works as expected
>
> Now I  marked the osd in and wait until all stabilized. Then I set the
> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
> lowers the crush weight of the node so even less chances that PG's end up
> on an osd of the same node. However the result are
>
> NR PG's   OSD
>   1   61
>   1   83
>   1   86
>   3   108
>   4   109
>   5   110
>   2   112
>   5   113
>   7   114
>   5   115
>   2   116
>
> except for 3 PG's all other PG's ended up on an osd belonging to the same
> node :-O. Is this expected behaviour? Can someone explain?? This is on
> nautilus 14.2.8.
>
> Thanks
>
> Marcel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread Marcel Kuiper
Dominic

The crush rule dump and tree are attached (hope that works). All pools use
crush_rule 1

Marcel

> Marcel;
>
> Sorry, could also send the output of:
> ceph osd tree
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: dhils...@performair.com [mailto:dhils...@performair.com]
> Sent: Tuesday, July 21, 2020 9:41 AM
> To: c...@mknet.nl; ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
> Marcel;
>
> Thank you for the information.
>
> Could you send the output of:
> ceph osd crush rule dump
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 9:38 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
>
> Hi Dominic,
>
> This cluster is running 14.2.8 (nautilus)
> There's 172 osds divided over 19 nodes.
> There are currently 10 pools.
> All pools have 3 replica's of data
> There are 3968 PG's (the cluster is not yet fully in use. The number of
> PGs is expected to grow)
>
> Marcel
>
>> Marcel;
>>
>> Short answer; yes, it might be expected behavior.
>>
>> PG placement is highly dependent on the cluster layout, and CRUSH rules.
>> So... Some clarifying questions.
>>
>> What version of Ceph are you running?
>> How many nodes do you have?
>> How many pools do you have, and what are their failure domains?
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 6:52 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] osd out vs crush reweight
>>
>> Hi list,
>>
>> I ran a test with marking an osd out versus setting its crush weight to
>> 0.
>> I compared to what osds pages were send. The crush map has 3 rooms. This
>> is what happened.
>>
>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's were
>> send to the following osds
>>
>> NR PG's   OSD
>>   2   1
>>   1   4
>>   1   5
>>   1   6
>>   1   7
>>   2   8
>>   1   31
>>   1   34
>>   1   35
>>   1   56
>>   2   57
>>   1   58
>>   1   61
>>   1   83
>>   1   84
>>   1   88
>>   1   99
>>   1   100
>>   2   107
>>   1   114
>>   2   117
>>   1   118
>>   1   119
>>   1   121
>>
>> All PG's were send to osds on other nodes in the same room, except for 1
>> PG on osd 114. I think this works as expected
>>
>> Now I  marked the osd in and wait until all stabilized. Then I set the
>> crush weight to 0. ceph osd crush reweight osd.111 0. I thought this
>> lowers the crush weight of the node so even less chances that PG's end
>> up
>> on an osd of the same node. However the result are
>>
>> NR PG's   OSD
>>   1   61
>>   1   83
>>   1   86
>>   3   108
>>   4   109
>>   5   110
>>   2   112
>>   5   113
>>   7   114
>>   5   115
>>   2   116
>>
>> except for 3 PG's all other PG's ended up on an osd belonging to the
>> same
>> node :-O. Is this expected behaviour? Can someone explain?? This is on
>> nautilus 14.2.8.
>>
>> Thanks
>>
>> Marcel
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

To answer your question, I don't see anything that would be keeping these PGs 
on the same node.  Someone with more knowledge of how the Crush rules are 
applied, and the code around these operations, would need to weigh in.

I am somewhat curious though; you define racks, and even rooms in your tree, 
but your failure domain is set to host.  Is that intentional?

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 10:14 AM
To: ceph-users@ceph.io
Cc: Dominic Hilsbos
Subject: Re: [ceph-users] Re: osd out vs crush reweight]

Dominic

The crush rule dump and tree are attached (hope that works). All pools use 
crush_rule 1

Marcel

> Marcel;
>
> Sorry, could also send the output of:
> ceph osd tree
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: dhils...@performair.com [mailto:dhils...@performair.com]
> Sent: Tuesday, July 21, 2020 9:41 AM
> To: c...@mknet.nl; ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
> Marcel;
>
> Thank you for the information.
>
> Could you send the output of:
> ceph osd crush rule dump
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 9:38 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: osd out vs crush reweight]
>
>
> Hi Dominic,
>
> This cluster is running 14.2.8 (nautilus) There's 172 osds divided 
> over 19 nodes.
> There are currently 10 pools.
> All pools have 3 replica's of data
> There are 3968 PG's (the cluster is not yet fully in use. The number 
> of PGs is expected to grow)
>
> Marcel
>
>> Marcel;
>>
>> Short answer; yes, it might be expected behavior.
>>
>> PG placement is highly dependent on the cluster layout, and CRUSH rules.
>> So... Some clarifying questions.
>>
>> What version of Ceph are you running?
>> How many nodes do you have?
>> How many pools do you have, and what are their failure domains?
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 6:52 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] osd out vs crush reweight
>>
>> Hi list,
>>
>> I ran a test with marking an osd out versus setting its crush weight 
>> to 0.
>> I compared to what osds pages were send. The crush map has 3 rooms. 
>> This is what happened.
>>
>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's 
>> were send to the following osds
>>
>> NR PG's   OSD
>>   2   1
>>   1   4
>>   1   5
>>   1   6
>>   1   7
>>   2   8
>>   1   31
>>   1   34
>>   1   35
>>   1   56
>>   2   57
>>   1   58
>>   1   61
>>   1   83
>>   1   84
>>   1   88
>>   1   99
>>   1   100
>>   2   107
>>   1   114
>>   2   117
>>   1   118
>>   1   119
>>   1   121
>>
>> All PG's were send to osds on other nodes in the same room, except 
>> for 1 PG on osd 114. I think this works as expected
>>
>> Now I  marked the osd in and wait until all stabilized. Then I set 
>> the crush weight to 0. ceph osd crush reweight osd.111 0. I thought 
>> this lowers the crush weight of the node so even less chances that 
>> PG's end up on an osd of the same node. However the result are
>>
>> NR PG's   OSD
>>   1   61
>>   1   83
>>   1   86
>>   3   108
>>   4   109
>>   5   110
>>   2   112
>>   5   113
>>   7   114
>>   5   115
>>   2   116
>>
>> except for 3 PG's all other PG's ended up on an osd belonging to the 
>> same node :-O. Is this expected behaviour? Can someone explain?? This 
>> is on nautilus 14.2.8.
>>
>> Thanks
>>
>> Marcel
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
>>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io 
> ___
> ceph-users mail

[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread Marcel Kuiper
Hi Dominic,

I must say that I inherited this cluster and did not develop the crush rule
used. The rule reads:

"rule_id": 1,
"rule_name": "hdd",
"ruleset": 1,
"type": 1,
"min_size": 2,
"max_size": 3,
"steps": [
{
"op": "take",
"item": -31,
"item_name": "DC3"
},
{
"op": "choose_firstn",
"num": 0,
"type": "room"
},
{
"op": "chooseleaf_firstn",
"num": 1,
"type": "host"
},

Doesn't that say it will choose DC3, then a room within DC3, and then a host?
(I agree that the racks in the tree are superfluous, but they do no harm
either.)
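
(For what it's worth, one way to sanity-check what the rule actually selects
is to replay it offline with crushtool -- a sketch, assuming the compiled
crush map has been extracted first; osd.111 is the OSD from my test:)

# grab the cluster's crush map and test rule 1 with 3 replicas
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings | head

# set an OSD's weight to 0 in the offline copy and compare the mappings
crushtool -i crushmap.bin --reweight-item osd.111 0 -o crushmap.reweighted
crushtool -i crushmap.reweighted --test --rule 1 --num-rep 3 --show-mappings | head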

Anyway, thanks for your effort. I hope someone else can explain why setting
the crush weight of an OSD to 0 results in surprisingly many PGs going to
other OSDs on the same node instead of going to other nodes.

Marcel

> Marcel;
>
> To answer your question, I don't see anything that would be keeping these
> PGs on the same node.  Someone with more knowledge of how the Crush rules
> are applied, and the code around these operations, would need to weigh in.
>
> I am somewhat curious though; you define racks, and even rooms in your
> tree, but your failure domain is set to host.  Is that intentional?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 10:14 AM
> To: ceph-users@ceph.io
> Cc: Dominic Hilsbos
> Subject: Re: [ceph-users] Re: osd out vs crush reweight]
>
> Dominic
>
> The crush rule dump and tree are attached (hope that works). All pools use
> crush_rule 1
>
> Marcel
>
>> Marcel;
>>
>> Sorry, could also send the output of:
>> ceph osd tree
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: dhils...@performair.com [mailto:dhils...@performair.com]
>> Sent: Tuesday, July 21, 2020 9:41 AM
>> To: c...@mknet.nl; ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>> Marcel;
>>
>> Thank you for the information.
>>
>> Could you send the output of:
>> ceph osd crush rule dump
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 9:38 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>>
>> Hi Dominic,
>>
>> This cluster is running 14.2.8 (nautilus) There's 172 osds divided
>> over 19 nodes.
>> There are currently 10 pools.
>> All pools have 3 replica's of data
>> There are 3968 PG's (the cluster is not yet fully in use. The number
>> of PGs is expected to grow)
>>
>> Marcel
>>
>>> Marcel;
>>>
>>> Short answer; yes, it might be expected behavior.
>>>
>>> PG placement is highly dependent on the cluster layout, and CRUSH
>>> rules.
>>> So... Some clarifying questions.
>>>
>>> What version of Ceph are you running?
>>> How many nodes do you have?
>>> How many pools do you have, and what are their failure domains?
>>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director - Information Technology
>>> Perform Air International, Inc.
>>> dhils...@performair.com
>>> www.PerformAir.com
>>>
>>>
>>> -Original Message-
>>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>>> Sent: Tuesday, July 21, 2020 6:52 AM
>>> To: ceph-users@ceph.io
>>> Subject: [ceph-users] osd out vs crush reweight
>>>
>>> Hi list,
>>>
>>> I ran a test with marking an osd out versus setting its crush weight
>>> to 0.
>>> I compared to what osds pages were send. The crush map has 3 rooms.
>>> This is what happened.
>>>
>>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's
>>> were send to the following osds
>>>
>>> NR PG's   OSD
>>>   2   1
>>>   1   4
>>>   1   5
>>>   1   6
>>>   1   7
>>>   2   8
>>>   1   31
>>>   1   34
>>>   1   35
>>>   1   56
>>>   2   57
>>>   1   58
>>>   1   61
>>>   1   83
>>>   1   84
>>>   1   88
>>>   1   99
>>>   1   100
>>>   2   107
>>>   1   114
>>>   2   117
>>>   1   118
>>>   1   119
>>>   1   121
>>>
>>> All PG's were send to osds on other nodes in the same room, except
>>> for 1 PG on osd 114. I think this works as expected
>>>
>>> Now I  marked the osd in and wait until all stabilized. Then I set
>>> the crush weight to 0. ceph osd crush reweight osd.111 0. I thought
>>> this lowers the crush weight of the node so even less chances that

[ceph-users] Re: Module 'cephadm' has failed: auth get failed: failed to find client.crash.ceph0-ote in keyring retval:

2020-07-21 Thread zicherka
Hi!
I have, or actually had, a similar problem.
I found the solution on this page:
https://segmentfault.com/a/119023292938
I used this command:
> ceph auth add client.crash.nodeX.xxx.com mgr "profile crash" mon "profile crash"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: osd out vs crush reweight]

2020-07-21 Thread DHilsbos
Marcel;

Yep, you're right.  I focused in on the last op, and missed the ones above it.

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International, Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Marcel Kuiper [mailto:c...@mknet.nl] 
Sent: Tuesday, July 21, 2020 11:49 AM
To: ceph-users@ceph.io
Cc: Dominic Hilsbos
Subject: RE: [ceph-users] Re: osd out vs crush reweight]

Hi Dominiq

I must say that I inherited this cluster and did not develop the cursh
rule used. The rule reads:

"rule_id": 1,
"rule_name": "hdd",
"ruleset": 1,
"type": 1,
"min_size": 2,
"max_size": 3,
"steps": [
{
"op": "take",
"item": -31,
"item_name": "DC3"
},
{
"op": "choose_firstn",
"num": 0,
"type": "room"
},
{
"op": "chooseleaf_firstn",
"num": 1,
"type": "host"
},

Doesn't that say it will choose DC3, then a room within DC3 and then a
host? (I agree that racks in the tree are superfluous, but it does not
harm either)

Anyway thanks for your effort. I hope someone else can explain why setting
the crushweight of an osd to 0 results in surprisingly much PG's going to
other osd;s on the same node instead of going to other nodes

Marcel

> Marcel;
>
> To answer your question, I don't see anything that would be keeping these
> PGs on the same node.  Someone with more knowledge of how the Crush rules
> are applied, and the code around these operations, would need to weigh in.
>
> I am somewhat curious though; you define racks, and even rooms in your
> tree, but your failure domain is set to host.  Is that intentional?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>
> -Original Message-
> From: Marcel Kuiper [mailto:c...@mknet.nl]
> Sent: Tuesday, July 21, 2020 10:14 AM
> To: ceph-users@ceph.io
> Cc: Dominic Hilsbos
> Subject: Re: [ceph-users] Re: osd out vs crush reweight]
>
> Dominic
>
> The crush rule dump and tree are attached (hope that works). All pools use
> crush_rule 1
>
> Marcel
>
>> Marcel;
>>
>> Sorry, could also send the output of:
>> ceph osd tree
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: dhils...@performair.com [mailto:dhils...@performair.com]
>> Sent: Tuesday, July 21, 2020 9:41 AM
>> To: c...@mknet.nl; ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>> Marcel;
>>
>> Thank you for the information.
>>
>> Could you send the output of:
>> ceph osd crush rule dump
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director - Information Technology
>> Perform Air International, Inc.
>> dhils...@performair.com
>> www.PerformAir.com
>>
>>
>>
>> -Original Message-
>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>> Sent: Tuesday, July 21, 2020 9:38 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: osd out vs crush reweight]
>>
>>
>> Hi Dominic,
>>
>> This cluster is running 14.2.8 (nautilus) There's 172 osds divided
>> over 19 nodes.
>> There are currently 10 pools.
>> All pools have 3 replica's of data
>> There are 3968 PG's (the cluster is not yet fully in use. The number
>> of PGs is expected to grow)
>>
>> Marcel
>>
>>> Marcel;
>>>
>>> Short answer; yes, it might be expected behavior.
>>>
>>> PG placement is highly dependent on the cluster layout, and CRUSH
>>> rules.
>>> So... Some clarifying questions.
>>>
>>> What version of Ceph are you running?
>>> How many nodes do you have?
>>> How many pools do you have, and what are their failure domains?
>>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director - Information Technology
>>> Perform Air International, Inc.
>>> dhils...@performair.com
>>> www.PerformAir.com
>>>
>>>
>>> -Original Message-
>>> From: Marcel Kuiper [mailto:c...@mknet.nl]
>>> Sent: Tuesday, July 21, 2020 6:52 AM
>>> To: ceph-users@ceph.io
>>> Subject: [ceph-users] osd out vs crush reweight
>>>
>>> Hi list,
>>>
>>> I ran a test with marking an osd out versus setting its crush weight
>>> to 0.
>>> I compared to what osds pages were send. The crush map has 3 rooms.
>>> This is what happened.
>>>
>>> On ceph osd out 111 (first room; this node has osds 108 - 116) pg's
>>> were send to the following osds
>>>
>>> NR PG's   OSD
>>>   2   1
>>>   1   4
>>>   1   5
>>>   1   6
>>>   1   7
>>>   2   8
>>>   1   31
>>>   1   34
>>>   1   35
>>>   1   56
>>>   2   57
>>>   1   58
>>>   1   61
>>>   1   83
>>>   1   84
>>>   1   88
>>>   1   99
>>>   1   100
>>>   

[ceph-users] bluestore_prefer_deferred_size_hdd

2020-07-21 Thread Frank Ritchie
Hi all,

Is it safe to change bluestore_prefer_deferred_size_hdd for an OSD at runtime?

thx
Frank
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: script for compiling and running the Ceph source code

2020-07-21 Thread Yanhu Cao
Hi Bobby,

You can use a SystemTap script to get this, using statistical aggregates.

My SystemTap tools are on GitHub: https://github.com/gmayyyha/stap-tools

e.g.
func-latency.stp
-
@define BIN_MDS %( "/tiger/source/ceph/build/bin/ceph-mds" %)

global ms
global count
global latency

probe   process(@BIN_MDS).function("MDSDaemon::ms_dispatch2")
{
    # on entry: remember the timestamp and count the invocation
    ms[tid(), ppfunc()] = gettimeofday_us();
    count[tid(), ppfunc()] <<< 1;
}

probe   process(@BIN_MDS).function("MDSDaemon::ms_dispatch2").return
{
    # on return: record the elapsed time for this call
    us = ms[tid(), ppfunc()];
    latency[tid(), ppfunc()] <<< gettimeofday_us() - us;

    ms[tid(), ppfunc()] = gettimeofday_us();
}

probe   timer.s(1), end
{
    # every second (and at exit): print per-thread call counts and latency histograms
    foreach ([tid, func] in latency) {
        printf("TID: %d\tFUNC: %s\n", tid, func);
        printf("count: %d\n", @count(count[tid, func]));
        printf("latency:\n");
        print(@hist_log(latency[tid, func]));
    }

    delete count;
    delete latency;
}

output

TID: 18960 FUNC: MDSDaemon::ms_dispatch2
count: 3
latency:
value |-- count
   16 |   0
   32 |   0
   64 |@  1
  128 |@  1
  256 |   0
  512 |@  1
 1024 |   0
 2048 |   0

On Tue, Jul 21, 2020 at 8:01 PM Bobby  wrote:
>
>
> And to put it more precisely, I would like to figure out how many times this 
> particular function is called during the execution of the program?
>
> BR
> Bobby !
>
> On Tue, Jul 21, 2020 at 1:24 PM Bobby  wrote:
>>
>>
>> Hi,
>>
>> I am trying to profile the number of invocations to a particular function in 
>>  Ceph source code. I have instrumented the code with time functions.
>>
>> Can someone please share the script for compiling and running the Ceph 
>> source code? I am struggling with it. That would be great help !
>>
>> BR
>> Bobby !
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Help add node to cluster using cephadm

2020-07-21 Thread Hoài Thương
Dear Support,

I need help with adding a node when installing Ceph with cephadm.

When I run `ceph orch add host ceph2`, I get:

 error enoent: new host ceph2 (ceph2) failed check: ['traceback (most
recent call last):',

Please help me fix it.

Thanks & Best Regards

David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Help add node to cluster using cephadm

2020-07-21 Thread davidthuong2424
I need help with adding a node when installing Ceph with cephadm.

When I run `ceph orch add host ceph2`, I get:

 error enoent: new host ceph2 (ceph2) failed check: ['traceback (most recent 
call last):', 

Please help me fix it. 

Thanks & Best Regards 

David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello,

Is podman installed on the new node? Also make sure NTP time sync is enabled
on the new node. ceph orch checks those on the new node and then fails with an
error like the one you see if they are not ready.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread davidthuong2424
Hello,
I use Docker; I will check NTP.

Does the new node need anything to be installed?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread steven prothero
Hello,

Yes, make sure Docker & NTP are set up on the new node first.
Also, make sure the cluster's public key is added to the new node and the
firewall is allowing it through.
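
For completeness, a rough sketch of those steps as commands (assuming a
Docker-based setup on a Debian/Ubuntu-style node; the package names and the
root SSH access are assumptions to adapt to your environment):

# on the new node: container runtime, time sync and python3
apt-get install -y docker.io chrony python3
systemctl enable --now docker chrony

# from the node where cephadm was bootstrapped: push the cluster SSH key, then add the host
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
ceph orch host add ceph2
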
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bluestore_prefer_deferred_size_hdd

2020-07-21 Thread Irek Fasikhov
No. You need to recreate the OSD.

On Wed, 22 Jul 2020 at 2:52, Frank Ritchie wrote:

> Hi all,
>
> Is it safe to change bluestore_prefer_deferred_size_hdd for an OSD at
> runtime?
>
> thx
> Frank
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help add node to cluster using cephadm

2020-07-21 Thread Hoài Thương
Will do, thanks!

On Wed, 22 Jul 2020 at 12:27, steven prothero <ste...@marimo-tech.com> wrote:

> Hello,
>
> Yes, make sure docker & ntp is setup on the new node first.
> Also, make sure the public key is added on the new node and firewall
> is allowing it through
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io