[ceph-users] Re: OSD crash on Onode::put

2023-01-13 Thread Frank Schilder
Hi Anthony and Serkan,

I think Anthony had the right idea. I forgot that we re-deployed a number of 
OSDs on existing drives and also did a PG split over Christmas. The relatively 
few disks that stick out with cache_other usage all seem to be these newly 
deployed OSDs. So, it looks like the cache_other item leakage is rather 
mild in normal operations, but can be substantial after backfilling new disks.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder
Sent: 11 January 2023 12:21:30
To: Serkan Çoban; Anthony D'Atri
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSD crash on Onode::put

Hi Anthony and Serkan,

I checked the drive temperatures and there is nothing special about this slot. 
The disks in this slot are from different vendors and were not populated 
incrementally. It might be a very weird coincidence. I seem to have an OSD 
developing this problem in another slot on a different host now. Let's see what 
happens in the future. No reason to turn superstitious :)

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD crash on Onode::put

2023-01-13 Thread Frank Schilder
Hi Igor,

my approach here, before doing something crazy like a daily cron job for 
restarting OSDs, is to do at least a minimum of threat analysis. How much of a 
problem is it really? I'm here also mostly guided by performance loss. As far 
as I know, the onode cache should be one of the most important caches regarding 
performance. Of course, only if the hit-rate is decent, and that I can't pull 
out. Since I can't check the hit-rate, the second-best thing is to see how an 
OSD's item count compares with the average, how it develops on a restarted OSD and 
so on, to get an idea of what is normal, what is degraded and what requires action.
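
For the cluster-wide average, a rough sketch like this should do (slow, one 
"ceph tell" per OSD; it reuses the same mempool path as the checks further down):

for o in $(ceph osd ls); do
  ceph tell "osd.$o" dump_mempools | jq '.mempool.by_pool.bluestore_cache_onode.items'
done | awk '{ s += $1; n++ } END { if (n) printf "average onode items: %.0f over %d OSDs\n", s/n, n }'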

As far as I can tell after this relatively short amount of time, the item leak 
is a rather mild problem on our cluster. The few OSDs that were exceptional are 
all OSDs that were newly deployed and not restarted since backfill completed. 
It seems that backfill is an operation that triggers a measurable amount of 
cache_other to be lost to cleanup. Otherwise, a restart every 2-3 months might 
be warranted. Since we plan to upgrade to pacific this summer, this means not 
too much needs to be done. I will just keep an eye on onode item counts and 
restart one or the other OSD when warranted.

About "its just a restart". Most of the time it is. However, there was just 
recently a case where a restart meant complete loss of an OSD. The bug causing 
the restart corrupted the rocks DB beyond repair. Therefore, I think its always 
worth checking, doing some thread analysis and preventing unintended restarts 
if possible.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Igor Fedotov 
Sent: 12 January 2023 13:07:11
To: Frank Schilder; Dongdong Tao; ceph-users@ceph.io
Cc: d...@ceph.io
Subject: Re: [ceph-users] Re: OSD crash on Onode::put

Hi Frank,

IMO all the below logic is a bit of overkill and no one can provide 100% valid 
guidance on specific numbers atm. Generally I agree with Dongdong's point that a 
crash is effectively an OSD restart and hence there is not much sense in 
performing such a restart manually - well, the rationale might be to do that 
gracefully and avoid some potential issues though...

Anyway, I'd rather recommend doing periodic(!) manual OSD restarts, e.g. on a 
daily basis at off-peak hours, instead of using tricks with mempool stats 
analysis.


Thanks,

Igor


On 1/10/2023 1:15 PM, Frank Schilder wrote:

Hi Dongdong and Igor,

thanks for pointing out this issue. I guess if it's a memory leak issue (well, 
cache pool trim issue), checking for some indicator and an OSD restart should 
be a work-around? Dongdong promised a work-around but talks only about a patch 
(fix).

Looking at the tracker items, my conclusion is that unusually low values of 
.mempool.by_pool.bluestore_cache_onode.items of an OSD might be such an 
indicator. I just ran a very simple check on all our OSDs:

for o in $(ceph osd ls); do
  n_onode="$(ceph tell "osd.$o" dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items")"
  echo -n "$o: "; ((n_onode<10)) && echo "$n_onode"
done; echo ""

and found 2 with seemingly very unusual values:

: 3098
1112: 7403

Comparing two OSDs with same disk on the same host gives:

# ceph daemon osd. dump_mempools | jq 
".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
3200
1971200
260924
900303680

# ceph daemon osd.1030 dump_mempools | jq 
".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
60281
37133096
8908591
255862680

OSD  does look somewhat bad. Shortly after restarting this OSD I get

# ceph daemon osd. dump_mempools | jq 
".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_onode.bytes,.mempool.by_pool.bluestore_cache_other.items,.mempool.by_pool.bluestore_cache_other.bytes"
20775
12797400
803582
24017100

So, the above procedure seems to work and, yes, there seems to be a leak of 
items in cache_other that pushes other pools down to 0. There seem to be 2 
useful indicators:

- very low .mempool.by_pool.bluestore_cache_onode.items
- very high 
.mempool.by_pool.bluestore_cache_other.bytes/.mempool.by_pool.bluestore_cache_other.items

Here a command to get both numbers with OSD ID in an awk-friendly format:

for o in $(ceph osd ls); do
  printf "%6d %8d %7.2f\n" "$o" $(ceph tell "osd.$o" dump_mempools | jq ".mempool.by_pool.bluestore_cache_onode.items,.mempool.by_pool.bluestore_cache_other.bytes/.mempool.by_pool.bluestore_cache_other.items")
done

Pipe it to a file and do things like:

awk '$2<5 || $3>200' FILE

For example, I still get:

# awk '$2<5 || $3>200' cache_onode.txt
  109249225   43.74
  109346193   43.70
  109847550   43.47
  110148873  

[ceph-users] Re: RGW error Coundn't init storage provider (RADOS)

2023-01-13 Thread Alexander Y. Fomichev
Hi

I faced a similar error a couple of days ago:
radosgw-admin --cluster=cl00 realm create --rgw-realm=data00 --default
...
(0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
(34) Numerical result out of range (this can be due to a pool or placement
group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
exceeded)
...
obviously radosgw-admin is unable to create the pool .rgw.root (at the same time
"ceph pool create" works as expected).
Crawling through the mon logs with debug=20 leads to this record:
"... prepare_new_pool got -34 'pgp_num' must be greater than 0 and lower or
equal than 'pg_num', which in this case is 1"
To me pg_num=1 looks strange, because the default value of
osd_pool_default_pg_num is 32.
On the other hand, the default osd_pool_default_pgp_num is 0, so I tried setting
osd_pool_default_pgp_num=1 and it worked:
the pool .rgw.root was created.
What looks really strange is that after the first success I can't reproduce it
any more.
After that, "radosgw-admin ... realm create" successfully builds .rgw.root
even with osd_pool_default_pgp_num=0. Nevertheless, I suspect the record
"pgp_num must be greater than 0 and lower or equal than 'pg_num', which in
this case is 1"
points to an existing bug. It looks like the default values of
osd_pool_default_pg[p]_num are somehow ignored/omitted.
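
For anyone hitting the same thing, the defaults can be inspected and the
workaround applied via the central config; a rough sketch (assuming a recent
ceph CLI with the config database):

# check what the monitors currently hand out as pool defaults
ceph config get mon osd_pool_default_pg_num
ceph config get mon osd_pool_default_pgp_num

# the workaround described above: give pgp_num a non-zero default, then retry
ceph config set global osd_pool_default_pgp_num 1
radosgw-admin --cluster=cl00 realm create --rgw-realm=data00 --default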


On Tue, Jul 19, 2022 at 9:11 AM Robert Reihs  wrote:

> Yes, I checked pg_num, pgp_num and mon_max_pg_per_osd. I also set up a
> single node cluster with the same ansible script we have, using cephadm for
> setting up and managing the cluster. I had the same problem on the new
> single node cluster without setting up any other services. When I created the
> pools manually the service started and the dashboard connection also
> worked directly.
>
> On Mon, Jul 18, 2022 at 10:20 AM Janne Johansson 
> wrote:
>
> > No, rgw should have the ability to create its own pools. Check the caps on
> > the keys used by the rgw daemon.
> >
> > Den mån 18 juli 2022 09:59Robert Reihs  skrev:
> >
> >> Hi,
> >> I had to manually create the pools, then the service automatically started
> >> and is now available.
> >> pools:
> >> .rgw.root
> >> default.rgw.log
> >> default.rgw.control
> >> default.rgw.meta
> >> default.rgw.buckets.index
> >> default.rgw.buckets.data
> >> default.rgw.buckets.non-ec
> >>
> >> Is this normal behavior? Should the error message then be changed? Or is
> >> this a bug?
> >> Best
> >> Robert Reihs
> >>
> >>
> >> On Fri, Jul 15, 2022 at 3:47 PM Robert Reihs 
> >> wrote:
> >>
> >> > Hi,
> >> > I have had no luck yet solving the issue, but I can add some
> >> > more information. The system pools ".rgw.root" and "default.rgw.log" are
> >> > not created. I have created them manually. Now there is more log activity,
> >> > but I am still getting the same error message in the log:
> >> > rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
> >> (34)
> >> > Numerical result out of range (this can be due to a pool or placement
> >> group
> >> > misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
> exceeded)
> >> > I can't find the correct pool to create manually.
> >> > Thanks for any help
> >> > Best
> >> > Robert
> >> >
> >> > On Tue, Jul 12, 2022 at 5:22 PM Robert Reihs 
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We have a problem with deploying radosgw via cephadm. We have a Ceph cluster
> >> >> with 3 nodes deployed via cephadm. Pool creation, cephfs and block storage
> >> >> are working.
> >> >>
> >> >> ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
> >> >> (stable)
> >> >>
> >> >> The service spec looks like this for the rgw:
> >> >>
> >> >> ---
> >> >> service_type: rgw
> >> >> service_id: rgw
> >> >> placement:
> >> >>   count: 3
> >> >>   label: "rgw"
> >> >> ---
> >> >> service_type: ingress
> >> >> service_id: rgw.rgw
> >> >> placement:
> >> >>   count: 3
> >> >>   label: "ingress"
> >> >> spec:
> >> >>   backend_service: rgw.rgw
> >> >>   virtual_ip: [IPV6]
> >> >>   virtual_interface_networks: [IPV6 CIDR]
> >> >>   frontend_port: 8080
> >> >>   monitor_port: 1967
> >> >>
> >> >> The error I get in the logfiles:
> >> >>
> >> >> 0 deferred set uid:gid to 167:167 (ceph:ceph)
> >> >> 0 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
> >> >> (stable), process radosgw, pid 2
> >> >> 0 framework: beast
> >> >> 0 framework conf key: port, val: 80
> >> >> 1 radosgw_Main not setting numa affinity
> >> >> 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
> >> >> 1 D3N datacache enabled: 0
> >> >> 0 rgw main: rgw_init_ioctx ERROR: librados::Rados::pool_create returned
> >> >> (34) Numerical result out of range (this can be due to a pool or placement
> >> >> group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd
> >> >> exceeded)
> >> >>
> >> >> 0 rgw

[ceph-users] heavy rotation in store.db folder alongside with traces and exceptions in the .log

2023-01-13 Thread Jürgen Stawska
Hi everyone,

I'm facing a weird issue with one of my pacific clusters.

Brief intro:
- 5 nodes, Ubuntu 20.04, on 16.2.7 (ceph01…05)
- bootstrapped with cephadm, recent image from quay.io (around 1 year ago)
- approx. 200 TB capacity, 5% used
- 5 OSDs (2 HDD / 2 SSD / 1 NVMe) on each node
- each node has a MON, yeah, 5 MONs in charge
- 3 RGW
- 2 MGR
- 3 MDS (2 active and 1 standby)
The cluster is serving S3 files and CephFS for k8s PVCs and is doing very well.

But:

During regular maintenance I found a heavily rotating store.db on EVERY node. 
Taking a closer look, I found weird stuff going on in the #.log. 
The log is growing at a rate of approx. 400k/s and rotates when it reaches 
a certain size.

store.db
-rw-r--r-- 1 ceph ceph 11445745 Jan 13 09:53 1546576.log
-rw-r--r-- 1 ceph ceph 67352998 Jan 13 09:53 1546578.sst
-rw-r--r-- 1 ceph ceph 67349926 Jan 13 09:53 1546579.sst
-rw-r--r-- 1 ceph ceph 67363989 Jan 13 09:53 1546580.sst
-rw-r--r-- 1 ceph ceph 41063487 Jan 13 09:53 1546581.sst



executing refresh((['ceph01', 'ceph02', 'ceph03', 'ceph04', 'ceph05'],)) failed.
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 48, in 
bootstrap_exec
s = io.read(1)
  File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 402, in read
raise EOFError("expected %d bytes, got %d" % (numbytes, len(buf)))
EOFError: expected 1 bytes, got 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1357, in _remote_connection
conn, connr = self.mgr._get_connection(addr)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1340, in _get_connection
sudo=True if self.ssh_user != 'root' else False)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in 
__init__
self.gateway = self._make_gateway(hostname)
  File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in 
_make_gateway
self._make_connection_string(hostname)
  File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
gw = gateway_bootstrap.bootstrap(io, spec)
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, 
in bootstrap
bootstrap_exec(io, spec)
  File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 53, in 
bootstrap_exec
raise HostNotFound(io.remoteaddress)
execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-6p_ae5op -i 
/tmp/cephadm-identity-hc1rt28x ubuntuadmin@<< IP_OF_CEPH-01 REPLACED >>

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
return f(*arg)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 312, in refresh
with self._remote_connection(host) as tpl:
  File "/lib64/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1391, in _remote_connection
raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to ceph01 << 
IP_OF_CEPH-01 REPLACED >>).
Please make sure that the host is reachable and accepts connections using the 
cephadm SSH key
...
... [some binary stuff here] …
...
ceph01.sjtrntß$Skd???>ö#?cZ+Removing orphan daemon mds.cephfs.ceph02…cephadm
ceph01.sjtrntß$Skd???>ö#?cXx??Z-Removing daemon mds.cephfs.ceph02 from 
ceph01cephadm
ceph01.sjtrntß$Skd???>_#?cԕ?0?Z"Removing key for mds.cephfs.ceph02cephadm
ceph01.sjtrntß$Skd???>_#?cUƾ0?Z=Reconfiguring mds.cephfs.ceph02 (unknown last 
config time)...cephadm
ceph01.sjtrntß$Skd???>_#?cE?"2?Z0Reconfiguring daemon mds.cephfs.ceph02 on 
ceph01cephadm
ceph01.sjtrntß$Skd???>`#?c??&?Zcephadm exited with an error code: 1, 
stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format 
{{.State.Status}} ceph-<>-mds-cephfs-ceph02
/usr/bin/docker: stdout 
/usr/bin/docker: stderr Error: No such container: ceph-<>-mds-cephfs-ceph02
Non-zero exit code 1 from /usr/bin/docker container inspect --format 
{{.State.Status}} ceph-<>-mds.cephfs.ceph02
/usr/bin/docker: stdout 
/usr/bin/docker: stderr Error: No such container: ceph-<>-mds.cephfs.ceph02
Reconfig daemon mds.cephfs.ceph02 ...
ERROR: cannot reconfig, data path /var/lib/ceph/<>/mds.cephfs.ceph02 does not exist
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 
1, stderr:Non-zero exit code 1 from /usr/bin/docker container inspect --format 
{{.State.Status}} ceph-<>-mds-cephfs-ceph02
/usr/bin/docker: stdout 
/usr/bin/docker: stderr Error: No such container: ceph-<>-mds-cephfs-ceph02
Non-zero exit code 1 from /usr/bin/docker con

[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2023-01-13 Thread Manuel Holtgrewe
Dear Xiubo,

could you explain how to enable kernel debug logs (I assume this is on the
client)?

Thanks,
Manuel

On Fri, May 13, 2022 at 9:39 AM Xiubo Li  wrote:

>
> On 5/12/22 12:06 AM, Stefan Kooman wrote:
> > Hi List,
> >
> > We have quite a few linux kernel clients for CephFS. One of our
> > customers has been running mainline kernels (CentOS 7 elrepo) for the
> > past two years. They started out with 3.x kernels (default CentOS 7),
> > but upgraded to mainline when those kernels would frequently generate
> > MDS warnings like "failing to respond to capability release". That
> > worked fine until 5.14 kernel. 5.14 and up would use a lot of CPU and
> > *way* more bandwidth on CephFS than older kernels (order of
> > magnitude). After the MDS was upgraded from Nautilus to Octopus that
> > behavior is gone (comparable CPU / bandwidth usage as older kernels).
> > However, the newer kernels are now the ones that give "failing to
> > respond to capability release", and worse, clients get evicted
> > (unresponsive as far as the MDS is concerned). Even the latest 5.17
> > kernels have that. No difference is observed between using messenger
> > v1 or v2. MDS version is 15.2.16.
> > Surprisingly the latest stable kernels from CentOS 7 work flawlessly
> > now. Although that is good news, newer operating systems come with
> > newer kernels.
> >
> > Does anyone else observe the same behavior with newish kernel clients?
>
> There have some known bugs, which have been fixed or under fixing
> recently, even in the mainline and, not sure whether are they related.
> Such as [1][2][3][4]. More detail please see ceph-client repo testing
> branch [5].
>
> I have never see the "failing to respond to capability release" issue
> yet, if you have the MDS logs(debug_mds = 25 and debug_ms = 1) and
> kernel debug logs will be better to help debug it further, or provide
> the steps to reproduce it.
>
> [1] https://tracker.ceph.com/issues/55332
> [2] https://tracker.ceph.com/issues/55421
> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2063929
> [4] https://tracker.ceph.com/issues/55377
> [5] https://github.com/ceph/ceph-client/commits/testing
>
> Thanks
>
> -- Xiubo
>
> >
> > Gr. Stefan
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror ceph quincy Not able to find rbd_mirror_journal_max_fetch_bytes config in rbd mirror

2023-01-13 Thread Eugen Block

Hi,

apparently this config option has been removed between the N and O  
releases. I found this revision [1] from 2019 and the pull request [2]  
in favor of adjusting the journal fetch based on the memory target. I  
didn't read the whole conversation, but to me it looks like the docs  
are outdated and I'd recommend creating a tracker issue for that. I  
also don't have an answer on how to tune the journal fetch performance,  
hopefully someone can chime in here. But if it's based on the  
rbd_mirror memory target I'd try to play with some of these values  
(maybe in a test environment first):


rbd_mirror_memory_target
rbd_mirror_memory_cache_min

If you have the resources, maybe increase the memory_target and see if  
the speed increases.
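
Something along these lines, for example (only a sketch, assuming the  
rbd-mirror daemon picks these up from the client section of the central  
config; adjust the target to what the host can spare):

# current values
ceph config get client rbd_mirror_memory_target
ceph config get client rbd_mirror_memory_cache_min

# try a larger memory target, e.g. 4 GiB, then restart the rbd-mirror daemon
ceph config set client rbd_mirror_memory_target 4294967296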


Regards,
Eugen

[1]  
https://tracker.ceph.com/projects/ceph/repository/revisions/1ef12ea0d29f95545dc0350143e4a94b115d5e5f

[2] https://github.com/ceph/ceph/pull/27670

Zitat von ankit raikwar :


Hello All,
In ceph quincy I am not able to find the rbd_mirror_journal_max_fetch_bytes
config for rbd-mirror.
I configured the ceph cluster (almost 400 TB) and enabled rbd-mirror. In the
starting stage I was able to achieve almost 9 GB speed, but after the rebalance
of all the images completed, the rbd-mirror speed automatically dropped to
between 4 and 5 mbps.
In my primary cluster we are continuously writing 50 to 400 mbps of data, but
the replication speed we get is only 4 to 5 mbps, even though we have 10 Gbps
of replication network bandwidth.


Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes, but
I am not able to find this option in the configuration. Also, when I try to set
it from the command line, it shows an error like:

command:
 ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432

error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'

cluster version
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)  
quincy (stable)


Please suggest an alternative way to configure this option, or how I can
improve the replication network speed.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw ceph.conf question

2023-01-13 Thread Boris Behrens
Hi,
I am just reading through this document (
https://docs.ceph.com/en/octopus/radosgw/config-ref/) and at the top it
states:

The following settings may added to the Ceph configuration file (i.e.,
> usually ceph.conf) under the [client.radosgw.{instance-name}] section.
>

And my ceph.conf looks like this:

[client.eu-central-1-s3db3]
> rgw_frontends = beast endpoint=[::]:7482
> rgw_region = eu
> rgw_zone = eu-central-1
>
> [client.eu-central-1-s3db3-old]
> rgw_frontends = beast endpoint=[::]:7480
> rgw_region = eu
> rgw_zone = eu-central-1
>
> [client.eu-customer-1-s3db3]
> rgw_frontends = beast endpoint=[::]:7481
> rgw_region = eu-someother
> rgw_zone = eu-someother-1
>

Do I need to change the section names? It also seems that rgw_region is a
non-existent config value (this might have come from very old RHCS
documentation).

Would be very nice if someone could help me clarify this.

Cheers and happy weekend
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Current min_alloc_size of OSD?

2023-01-13 Thread Konstantin Shalygin
Hi,

> On 12 Jan 2023, at 04:35, Robert Sander  wrote:
> 
> How can I get the current min_alloc_size of OSDs that were created with 
> older Ceph versions? Is there a command that shows this info from the on-disk 
> format of a bluestore OSD?

You can see this via kvstore-tool:


ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0/ get S min_alloc_size

4096:

  00 10 00 00 00 00 00 00   ||
0008

65536:

  00 00 01 00 00 00 00 00   ||
0008
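
The value is an 8-byte little-endian integer, so if you want to convert the
hex bytes yourself (assuming python3 is available on the host):

python3 -c "print(int.from_bytes(bytes.fromhex('0010000000000000'), 'little'))"   # -> 4096
python3 -c "print(int.from_bytes(bytes.fromhex('0000010000000000'), 'little'))"   # -> 65536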


Cheers,
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Current min_alloc_size of OSD?

2023-01-13 Thread Robert Sander

Hi,

Am 13.01.23 um 14:35 schrieb Konstantin Shalygin:


ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0/ get S min_alloc_size


This only works when the OSD is not running.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Current min_alloc_size of OSD?

2023-01-13 Thread David Orman
I think this would be valuable to have easily accessible during runtime, 
perhaps submit a report (and patch if possible)?

David

On Fri, Jan 13, 2023, at 08:14, Robert Sander wrote:
> Hi,
> 
> Am 13.01.23 um 14:35 schrieb Konstantin Shalygin:
> 
> > ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0/ get S 
> > min_alloc_size
> 
> This only works when the OSD is not running.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> http://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Zwangsangaben lt. §35a GmbHG:
> HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Telemetry service is temporarily down

2023-01-13 Thread Yaarit Hatuka
Hi everyone,

Our telemetry service is up and running again.
Thanks Adam Kraitman and Dan Mick for restoring the service.

We thank you for your patience and appreciate your contribution to the
project!

Thanks,
Yaarit

On Tue, Jan 3, 2023 at 3:14 PM Yaarit Hatuka  wrote:

> Hi everyone,
>
> We are having some infrastructure issues with our telemetry backend, and
> we are working on fixing it.
> Thanks Jan Horacek for opening this issue
>  [1]. We will update once the
> service is back up.
> We are sorry for any inconvenience you may be experiencing, and appreciate
> your patience.
>
> Thanks,
> Yaarit
>
> [1] https://tracker.ceph.com/issues/58371
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS error

2023-01-13 Thread André de Freitas Smaira
Hello!

Yesterday we found some errors on our cephadm disks, which are making it
impossible to access our HPC cluster:

# ceph health detail
HEALTH_WARN 3 failed cephadm daemon(s); insufficient standby MDS daemons
available
[WRN] CEPHADM_FAILED_DAEMON: 3 failed cephadm daemon(s)
daemon mds.cephfs.s1.nvopyf on s1.ceph.infra.ufscar.br is in error state
daemon mds.cephfs.s2.qikxmw on s2.ceph.infra.ufscar.br is in error state
daemon mds.cftv.s2.anybzk on s2.ceph.infra.ufscar.br is in error state
[WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons available
have 0; want 1 more

Googling, we found out that we should remove the failed MDS, but the data on
these disks is relatively important. We would like to know whether we need to
remove it or whether it can be fixed, and, if we have to remove it, whether the
data will be lost. Please tell me if you need more information.

Thanks in advance,
André de Freitas Smaira
Federal University of São Carlos - UFSCar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ._handle_peer_banner peer [v2:***,v1:***] is using msgr V1 protocol

2023-01-13 Thread Frank Schilder
Hi all,

on a cluster running latest octopus I see a lot of these log messages:

Jan 13 20:00:25 ceph-21 journal: 2023-01-13T20:00:25.366+0100 7f47702b8700 -1 
--2- [v2:192.168.16.96:6826/5724,v1:192.168.16.96:6827/5724] >> 
[v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] 
conn(0x55c867624400 0x55c7e9dfa800 unknown :-1 s=BANNER_CONNECTING pgs=22826 
cs=73364 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer 
[v2:192.168.16.93:6928/3503064,v1:192.168.16.93:6929/3503064] is using msgr V1 
protocol

These addresses are on the replication network and both hosts are OSD hosts.

What is the reason for these messages and how can I fix it?

Thanks a lot!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg mapping verification

2023-01-13 Thread Christopher Durham

Eugen,
Thank you for the tip. While writing a script is OK, it would be nice if 
there was an official way to do this.
-Chris
 
 
-Original Message-
From: Eugen Block 
To: ceph-users@ceph.io
Sent: Thu, Jan 12, 2023 8:58 am
Subject: [ceph-users] Re: pg mapping verification

Hi,

I don't have an automation for that. I check a couple of random pg  
mappings to see if they meet my requirements; usually I do that directly with  
the output of crushtool. Here's one example from a small test cluster  
with three different rooms in the crushmap:

# test cluster (note that the columns may differ between ceph versions  
when using awk as I did here)
storage01:~ # ceph pg ls-by-pool  | awk '{print $15}'
ACTING
[8,13,5]p8
[22,4,13]p22
[28,22,26]p28
[21,5,1]p21
[20,34,27]p20
[...]

for i in {20,34,27}; do ceph osd find $i | grep room; done
        "room": "room2",
        "room": "room3",
        "room": "room1",

For this rule I have a room resiliency requirement so I grep for the  
room of each acting set.
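To run the same spot check over every PG of a pool, a small loop might work.
This is an untested sketch: it assumes the JSON field names .pg_stats[].acting
(from "ceph pg ls-by-pool -f json") and .crush_location.room (from "ceph osd
find"), and a 3-room requirement:

ceph pg ls-by-pool <pool> -f json | jq -r '.pg_stats[] | [.pgid, (.acting | join(","))] | @tsv' |
while read -r pgid acting; do
  # count the distinct rooms covered by the acting set of this PG
  rooms=$(for o in ${acting//,/ }; do ceph osd find "$o" | jq -r '.crush_location.room'; done | sort -u | wc -l)
  [ "$rooms" -lt 3 ] && echo "PG $pgid only spans $rooms room(s): $acting"
done
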
The output of crushtool is helpful if you don't want to inject a new  
osdmap into a production cluster. Just one example:

crushtool -i crushmap.bin --test --rule 5 --show-mappings --num-rep 6 | head
CRUSH rule 5 x 0 [19,7,13,22,16,28]
CRUSH rule 5 x 1 [21,3,15,31,19,7]
[...]

Regards,
Eugen

Zitat von Christopher Durham :

> Hi,
> For a given crush rule and pool that uses it, how can I verify that  
> the pgs in that pool follow the rule? I have a requirement to  
> 'prove' that the pgs are mapping correctly.
> I see: https://pypi.org/project/crush/
> This allows me to read in a crushmap file that I could then use to  
> verify a pg with some scripting, but this pypi project is very old and seems  
> not to be maintained or updated since 2017.
> I am sure there is a way, using osdmaptool or something else, but it  
> is not obvious. Before I spend a lot of time searching, I thought I  
> would ask here.
> Basically, having a list of pgs like this:
> [[1,2,3,4,5],[2,3,4,5,6],...]
> Given a read-in crushmap and a specific rule therein, I want to  
> verify that all pgs in my list are consistent with the rule specified.
> Let me know if there is a proper way to do this, and thanks.
> -Chris
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] User access

2023-01-13 Thread Rhys Powell
rhys.g.pow...@gmail.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch osd spec questions

2023-01-13 Thread Wyll Ingersoll


Ceph Pacific 16.2.9

We have a storage server with multiple 1.7TB SSDs dedicated to the bluestore DB 
usage.  The osd spec originally was misconfigured slightly and had set the 
"limit" parameter on the db_devices to 5 (there are 8 SSDs available) and did 
not specify a block_db_size.  Ceph laid out the original 40 OSDs and put 8 DBs 
across 5 of the SSDs (because of the limit param).  Ceph seems to have auto-sized 
the bluestore DB partitions to be about 45GB, which is far less than the 
recommended 1-4% (using 10TB drives).  How does ceph-volume determine the size 
of the bluestore DB/WAL partitions when it is not specified in the spec?

We updated the spec and specified a block_db_size of 300G and removed the 
"limit" value.  Now we can see in the cephadm.log that the ceph-volume command 
being issued is using the correct list of SSD devices (all 8) as options to the 
lvm batch (--db-devices ...), but it keeps failing to create the new OSD 
because we are asking for 300G and it thinks there is only 44G available even 
though the last 3 SSDs in the list are empty (1.7T).  So, it appears that 
somehow the orchestrator is ignoring the last 3 SSDs.  I have verified that 
these SSDs are wiped clean, have no partitions or LVM, and no label (sgdisk -Z, 
wipefs -a). They appear as available in the inventory and not locked or 
otherwise in use.

Also, the "db_slots" spec parameter is ignored in pacific due to a bug so there 
is no way to tell the orchestrator to use "block_db_slots". Adding it to the 
spec like "block_db_size" fails since it is not recognized.

Any help figuring out why these SSDs are being ignored would be much 
appreciated.

Our spec for this host looks like this:
---

spec:
  data_devices:
    rotational: 1
    size: '3TB:'
  db_devices:
    rotational: 0
    size: ':2T'
    vendor: 'SEAGATE'
  block_db_size: 300G

---

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Filesystem is degraded, offline, mds daemon damaged

2023-01-13 Thread bpurvis
I am really hoping you can help.  THANKS in advance. I have inherited a Docker 
swarm running CEPH but I know very little about it.
Currently I have an unhealthy ceph environment that will not mount my data drive.
It's a cluster of 4 VM servers: docker01, docker02, docker03, docker-cloud.
docker-cloud (CL) has the /data that is on a separate drive, currently failing to mount.
How can I recover this without losing the data?

on server docker-cloud, mount /data returns:
mount error 113 = No route to host

docker ps is healthy on all nodes.
bc81d14dde92   ceph/daemon:latest-mimic  "/opt/ceph-container…" 
  2 years ago  Up 34 minutes ceph-mds
d4fecec5e0e8   ceph/daemon:latest-mimic  "/opt/ceph-container…" 
  2 years ago  Up 34 minutes ceph-osd
482ba41803af   ceph/daemon:latest-mimic  "/opt/ceph-container…" 
  2 years ago  Up 34 minutes ceph-mgr
d6a5c44179c7   ceph/daemon:latest-mimic  "/opt/ceph-container…" 
  2 years ago  Up 32 minutes ceph-mon

ceph -s:
  cluster:
id: 7a5b2243-8e92-4e03-aee7-aa64cea666ec
health: HEALTH_ERR
1 filesystem is degraded
1 filesystem is offline
1 mds daemon damaged
noout,noscrub,nodeep-scrub flag(s) set
clock skew detected on mon.docker02, mon.docker03, mon.docker-cloud
mons docker-cloud,docker01,docker02,docker03 are low on available 
space

  services:
mon: 4 daemons, quorum docker01,docker02,docker03,docker-cloud
mgr: docker01(active), standbys: docker02, docker03, docker-cloud
mds: cephfs-0/1/1 up , 4 up:standby, 1 damaged
osd: 4 osds: 4 up, 4 in
 flags noout,noscrub,nodeep-scrub

  data:
pools:   2 pools, 256 pgs
objects: 194.2 k objects, 241 GiB
usage:   499 GiB used, 1.5 TiB / 2.0 TiB avail
pgs: 256 active+clean
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS error

2023-01-13 Thread afsmaira
Additional information:

- We tried to reset both the services and the entire machine
- journalctl part:

jan 13 02:40:18 s1.ceph.infra.ufscar.br 
ceph-bab39b74-c93a-4e34-aae9-a44a5569d52c-mon-s1[6343]: debug 
2023-01-13T05:40:18.653+ 7fc370b64700  0 log_channel(cluster) log [WRN] : 
Replacing daemon mds.cephfs.s1.nvopyf as rank 1 with standby daemon 
mds.cephfs.s2.qikxmw
jan 13 02:40:18 s1.ceph.infra.ufscar.br 
ceph-bab39b74-c93a-4e34-aae9-a44a5569d52c-mon-s1[6343]: debug 
2023-01-13T05:40:18.653+ 7fc370b64700  1 mon.s1@0(leader).mds e653196 
fail_mds_gid 107853765 mds.cephfs.s1.nvopyf role 1
jan 13 02:40:18 s1.ceph.infra.ufscar.br 
ceph-bab39b74-c93a-4e34-aae9-a44a5569d52c-mon-s1[6343]: debug 
2023-01-13T05:40:18.653+ 7fc370b64700  0 log_channel(cluster) log [INF] : 
MDS daemon mds.cephfs.s1.nvopyf is removed because it is dead or otherwise 
unavailable
jan 13 02:40:18 s1.ceph.infra.ufscar.br 
ceph-bab39b74-c93a-4e34-aae9-a44a5569d52c-mon-s1[6343]: debug 
2023-01-13T05:40:18.677+ 7fc370b64700  0 log_channel(cluster) log [WRN] : 
Health check failed: 1 filesystem is degraded (FS_DEGRADED)
jan 13 02:40:18 s1.ceph.infra.ufscar.br 
ceph-bab39b74-c93a-4e34-aae9-a44a5569d52c-mon-s1[6343]: debug 
2023-01-13T05:40:18.677+ 7fc370b64700  0 log_channel(cluster) log [WRN] : 
Health check failed: insufficient standby MDS daemons available 
(MDS_INSUFFICIENT_STANDBY)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Remove failed multi-part uploads?

2023-01-13 Thread rhys . g . powell
Hello,

We are running an older version of ceph - 14.2.22 nautilus

We have a radosgw/s3 implementation and had some issues with multi-part uploads 
failing to complete.

We used s3cmd to delete the failed uploads and clean out the bucket, but when 
reviewing the space utilization of buckets, it seems this one is still 
consuming space:



[ ~]# radosgw-admin bucket stats --bucket=BUCKETNAME
{
"bucket": "BUCKETNAME",
"num_shards": 32,
"tenant": "",
"zonegroup": "c73e02d6-d479-4cdc-bf86-8b09f0a9f6ba",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "50ee73bc-bc08-4f9f-9d5b-4492cb4c5e77.1689003.1695",
"marker": "50ee73bc-bc08-4f9f-9d5b-4492cb4c5e77.1689003.1695",
"index_type": "Normal",
"owner": "BUCKETNAME",
"ver": 
"0#47066,1#30480,2#42797,3#36437,4#47308,5#33285,6#37127,7#24292,8#44567,9#34273,10#29402,11#36228,12#48153,13#32665,14#42314,15#21143,16#34319,17#42818,18#39301,19#23897,20#26225,21#50957,22#39706,23#29723,24#49619,25#44974,26#44020,27#22505,28#46702,29#49390,30#27263,31#21515",
"master_ver": 
"0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0,30#0,31#0",
"mtime": "2021-02-08 13:06:13.311932Z",
"max_marker": 
"0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551613
},
"rgw.main": {
"size": 34247260247640,
"size_actual": 34247284682752,
"size_utilized": 34247260247640,
"size_kb": 33444590086,
"size_kb_actual": 33444613948,
"size_kb_utilized": 33444590086,
"num_objects": 340627
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}



I see under the usage.rgw.main.size_kb_actual the value is 33444613948, or 
roughly 30TB
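
Is something like the following the right way to find and abort the leftover
uploads? This is just a sketch on my side, assuming s3cmd's multipart/abortmp
subcommands (OBJECTNAME and UPLOAD_ID are placeholders taken from the listing
below):

s3cmd multipart s3://BUCKETNAME
s3cmd abortmp s3://BUCKETNAME/OBJECTNAME UPLOAD_ID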

When I use the radosgw-admin tool to list objects, I can see many failed 
multi-part uploads:

[ ~]# radosgw-admin bucket list --bucket BUCKETNAME | jq '.[] | "\(.name), 
\(.meta.mtime), \(.meta.size)"'
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~07YXhKKZn2XYy-6F0itVB4tpuBm1q1J.1,
 2021-02-10 00:57:08.033082Z, 4194304"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~07YXhKKZn2XYy-6F0itVB4tpuBm1q1J.2,
 2021-02-10 00:56:36.463099Z, 8794011"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.1,
 2021-02-10 00:38:44.572199Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.2,
 2021-02-10 00:38:48.680330Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-10.tar.gz.2~b6-C6I3rky3V2Wh4H56jhsfVjvvTMj2.3,
 2021-02-10 00:38:52.232674Z, 95445231"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.1,
 2021-02-11 00:30:55.489965Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.2,
 2021-02-11 00:30:58.832752Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.3,
 2021-02-11 00:31:01.188868Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.4,
 2021-02-11 00:30:53.035172Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~R8SwLZMVNM5kL4Ov7sX47mXdEJf0hfu.5,
 2021-02-11 00:30:21.359861Z, 12448760"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.1,
 2021-02-11 00:11:52.163319Z, 4194304"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.2,
 2021-02-11 00:11:48.293292Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.3,
 2021-02-11 00:11:55.320413Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.4,
 2021-02-11 00:11:55.039628Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-11.tar.gz.2~mPN97GOqO8E93gqVUbt_esJfB4kLu2h.5,
 2021-02-11 00:11:26.493213Z, 2005541"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~05JmbiZqt8tvgVmJ3Ef6WEzBa3Jla7L.1,
 2021-02-12 00:53:24.453273Z, 104857600"
"_multipart_chi-pl-clh-shard-0-0-0-2021-02-12.tar.gz.2~05JmbiZqt8tvgVmJ3Ef6WEzBa3Jla7L.2,
 2021-0