[ceph-users] Re: Removing OSD very slow (objects misplaced)

2022-12-28 Thread E Taka
Thanks, Liang. But those settings no longer help since Ceph 17. Setting the
mclock profile to "high recovery" speeds things up a little. The main problem
remains: 95% of the recovery time is spent on just one PG. This was not the
case before Quincy.
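
For reference, this is roughly how I switch the profile (a sketch, assuming
the default mClock scheduler that Quincy uses):

  # favour recovery/backfill over client traffic
  ceph config set osd osd_mclock_profile high_recovery_ops
  # verify
  ceph config get osd osd_mclock_profile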

郑亮 <Liang Zheng> wrote on Mon., 26 Dec. 2022, 03:52:

> Hi Erich,
> You can refer to the following link:
> https://www.suse.com/support/kb/doc/?id=19693
>
> Thanks,
> Liang Zheng
>
>
> E Taka <0eta...@gmail.com> wrote on Friday, 16 December 2022, 01:52:
>
>> Hi,
>>
>> when removing some OSDs with the command `ceph orch osd rm X`, the
>> rebalancing starts very fast, but after a while it almost stalls at a
>> very low recovery rate:
>>
>> Dec 15 18:47:17 … : cluster [DBG] pgmap v125312: 3361 pgs: 13
>> active+clean+scrubbing+deep, 4 active+remapped+backfilling, 3344
>> active+clean; 95 TiB data, 298 TiB used, 320 TiB / 618 TiB avail; 13 MiB/s
>> rd, 3.9 MiB/s wr, 610 op/s; 403603/330817302 objects misplaced (0.122%);
>> 1.1 MiB/s, 2 objects/s recovering
>>
>> As you can see, the rate is 2 objects/s for over 400,000 misplaced
>> objects. `ceph orch osd rm status` shows long-running draining processes
>> (now over 4 days):
>>
>> OSD  HOST    STATE     PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
>> 64   ceph05  draining  1    False    False  False  2022-12-11 16:18:14.692636+00:00
>> …
>>
>> Is there any way to increase the speed of the draining/rebalancing?
>>
>> Thanks!
>> Erich
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS active-active

2022-12-28 Thread Pavin Joseph

Hi Isaiah,

A simple solution for multi-site redundancy is to have two nearby sites
with < 3 ms latency and set up the CRUSH map [0] for datacenter-level
redundancy instead of the default host level.
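
A rough sketch of what that looks like (bucket, host and pool names are just
placeholders for your own topology):

  # create datacenter buckets under the default root and move hosts into them
  ceph osd crush add-bucket dc1 datacenter
  ceph osd crush add-bucket dc2 datacenter
  ceph osd crush move dc1 root=default
  ceph osd crush move dc2 root=default
  ceph osd crush move host1 datacenter=dc1
  ceph osd crush move host2 datacenter=dc2
  # replicated rule that spreads copies across datacenters, then assign it
  ceph osd crush rule create-replicated rep_dc default datacenter
  ceph osd pool set mypool crush_rule rep_dc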


Performance was adequate in my testing for a large number of small files,
as long as the latency between all nodes was kept below 3 ms. Of course it
also depends on your application.


CephFS snapshot mirroring is asynchronous, so your application would need
to handle the logic of switching to the replica site, operating with some
missing data in a degraded state, synchronizing changes back to the primary
after it comes online, and switching back. Too complicated IMHO.
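
If you do try the mirroring route anyway, the setup is roughly the following
(a sketch; deploying the cephfs-mirror daemon and the peer bootstrap token
exchange are left out):

  # on the source cluster
  ceph mgr module enable mirroring
  ceph fs snapshot mirror enable <fs_name>
  ceph fs snapshot mirror add <fs_name> /path/to/mirror
  # import the bootstrap token created on the target cluster
  ceph fs snapshot mirror peer_bootstrap import <fs_name> <token>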


[0]: https://docs.ceph.com/en/quincy/rados/operations/crush-map/

Kind regards,
Pavin Joseph.

On 28-Dec-22 11:27 AM, Isaiah Tang Yue Shun wrote:

Hi all,

From the documentation, I can only find the Ceph Object Gateway multi-site
implementation. I wonder, if we are using CephFS, how can we achieve an
active-active setup for production?

Any input is appreciated.

Thanks.

Regards,
Isaiah Tang
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Urgent help! RGW Disappeared on Quincy

2022-12-28 Thread Pavin Joseph

1. This is a guess, but check /var/[lib|run]/ceph for any lock files.
2. This is more straightforward to fix: add a faster WAL/block device/LV
for each OSD, or create a fast storage pool just for metadata. Also,
experiment with the MDS cache size/trim settings [0] (example below).


[0]: https://docs.ceph.com/en/latest/cephfs/cache-configuration/
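
For example, two knobs I would start with (values below are purely
illustrative, see [0] for what they actually control):

  ceph config set mds mds_cache_memory_limit 8589934592   # 8 GiB, default is 4 GiB
  ceph config set mds mds_cache_trim_threshold 393216
  ceph config get mds mds_cache_memory_limit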

On 28-Dec-22 7:23 AM, Deep Dish wrote:

Got logging enabled as per
https://ceph.io/en/news/blog/2022/centralized_logging/. My embedded
Grafana doesn't come up in the dashboard, but at least I have log (files)
on my nodes. Interesting.

Two issues plaguing my cluster:

1 - RGWs not manageable
2 - MDS_SLOW_METADATA_IO warning (impact to cephfs)

Issue 1:

I have 4x RGWs deployed. All started / processes running. They all report
similar log entries:

7fcc32b6a5c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
7fcc32b6a5c0  0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process radosgw, pid 2
7fcc32b6a5c0  0 framework: beast
7fcc32b6a5c0  0 framework conf key: port, val: 80
7fcc32b6a5c0  1 radosgw_Main not setting numa affinity
7fcc32b6a5c0  1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
7fcc32b6a5c0  1 D3N datacache enabled: 0
7fcc0869a700  0 INFO: RGWReshardLock::lock found lock on reshard.11 to be held by another RGW process; skipping for now
7fcc0bea1700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.1, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.3, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0bea1700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0bea1700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0bea1700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0dea5700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
7fcc0bea1700  0 lifecycle: RGWLC::process() failed to acquire lock on lc.16, sleep 5, try again
(repeating)

Seems like a stale lock, not previously cleaned up when the cluster was
busy recovering and rebalancing.
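
If it really is a stale advisory lock, it should be visible directly on the
lifecycle shard objects, something like this (pool and namespace are my
assumption of the defaults, default.rgw.log and "lc"; lock and locker names
come from the list output):

  rados -p default.rgw.log --namespace=lc ls | grep '^lc\.'
  rados -p default.rgw.log --namespace=lc lock list lc.16
  rados -p default.rgw.log --namespace=lc lock break lc.16 <lock-name> <locker>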

Issue 2:

ceph health detail:

[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs

 mds.fs01.ceph02mon03.rjcxat(mds.0): 8 slow metadata IOs are blocked >
30 secs, oldest blocked for 39485 secs

Log entries from ceph02mon03 MDS host:

  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131271 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131272 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131273 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131274 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131275 from mon.4
  7fe36c6b8700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 33.126589 secs
  7fe36c6b8700  0 log_channel(cluster) log [WRN] : slow request 33.126588
seconds old, received at 2022-12-27T19:45:45.952225+:
client_request(client.55009:99980 create
#0x1000bc2/vzdump-qemu-30003-2022_12_27-14_43_43.log
2022-12-27T19:45:45.948045+ caller_uid=0, caller_gid=0{}) currently
submit entry: journal_and_reply
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131276 from mon.4
  7fe36c6b8700  0 log_channel(cluster) log [WRN] : 1 slow requests, 0
included below; oldest blocked for > 38.126737 secs
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131277 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131278 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131279 from mon.4
  7fe36debb700  1 mds.fs01.ceph02mon03.rjcxat Updating MDS map to version
131280 from mon.4


I suspect that the file in the log above isn't the culprit. How can I get
to the root cause of the MDS slowdowns?
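
(My guess is that the place to start is dumping the blocked and in-flight
ops on the active MDS and seeing which OSDs its objecter is waiting on, e.g.:

  ceph tell mds.fs01.ceph02mon03.rjcxat dump_blocked_ops
  ceph tell mds.fs01.ceph02mon03.rjcxat dump_ops_in_flight
  ceph tell mds.fs01.ceph02mon03.rjcxat objecter_requests

but I'd appreciate pointers on interpreting the output.)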


On Tue, Dec 27, 2022 at 3:32 PM Pavin Joseph  wrote:


Interesting, the logs show the crash module [0] itself has crashed.
Something sent it a SIGINT or SIGTERM and the module didn't handle it
correctly due to what seems like a bug in the code.

I haven't experienced the crash module itself crashing yet (in Quincy)
because nothing sent a SIG[INT|TERM] to it yet.

So I'd continue investigating into why these signals were sent to the
crash module.

To stop the crash module from crashing, open "/usr/bin/ceph-crash" and
edit the handler function on line 82 like so:

def handler(signum, frame):
print('**

[ceph-users] Re: Does Replica Count Affect Tell Bench Result or Not?

2022-12-28 Thread Erik Lindahl
Hi,

Just to add to the previous discussion, consumer SSDs like these can 
unfortunately be significantly *slower* than plain old HDDs for Ceph. This is 
because Ceph always uses SYNC writes to guarantee that data is on disk before 
returning.

Unfortunately NAND writes are intrinsically quite slow, and tri/quad-level SSDs 
are the worst of them all. Enterprise SSDs solve this by having 
power-loss-protection capacitors, which means they can safely return the data 
as written the second it is in the fast RAM on the device.

Cheap consumer SSDs fall in one of two categories:

1. The drive might lie and return the data as written as soon as it's in the 
write cache when a SYNC write is requested. This gives seemingly great 
performance ... until you have a power loss and your data is corrupted. 
Thankfully, very few drives do this today.
2. The drive treats the SYNC write correctly, which means it can't return until 
the request has been moved from cache to actual NAND memory, which is (very) 
slow.


The short story is likely that all drives without power-loss-protection should 
be avoided, because if the performance looks great, it might mean the drive 
falls in category #1 instead of being a magical & cheap solution.
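
A quick way to see which category a given drive falls into is a single-job,
queue-depth-1 sync write test, for instance with fio (this writes to the raw
device, so only run it on an empty scratch disk):

  fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Drives with power-loss-protection typically sustain thousands of IOPS in
this test, while consumer drives often end up in the tens to low hundreds.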

There is unfortunately no inherent "best" SSD, but it depends on your usage. 
For instance, for our large data partitions we need a lot of space and high 
read performance, but we don't store/update the data that frequently, so we 
opted for Samsung PM883 drives that are only designed for 0.8 DWPD 
(disk-writes-per-day). In contrast, for metadata drives where we have more 
writes (but don't need a ton of storage), we use drives that can handle
3 DWPD, like the Samsung SM883.

Virtually all vendors have such different lines of drives, so you will need to 
start by analyzing how much data you expect to write per day relative to the 
total storage volume and get appropriate drives.

If you are operating a very read/write-intensive cluster with hundreds of 
operations in parallel you will benefit a lot from higher-IOPS-rate drives, but 
be aware that those theoretical numbers listed are typically only achieved for 
very large queue depths (i.e., always having 32-64 operations running in 
parallel).

Since you are currently using consumer SSDs (which definitely don't have the
endurance to handle intensive IO anyway), my guess is that you have a fairly
low-end setup, and then good performance depends more on having consistently
low latency for all operations (including to/from the network cards).

If I were to invest in new servers today, I would likely go with NVMe, mostly 
because it's the future and not *that* much more expensive, but for old servers 
almost any enterprise-class SSD with power-loss-protection from major vendors 
should be fine - but you need to analyse whether you need write-intensive disks 
or not.


Cheers,

Erik

--
Erik Lindahl 
On 28 Dec 2022 at 08:44 +0100, hosseinz8...@yahoo.com , 
wrote:
> Thanks. I am planning to change all of my disks. But do you know which
> enterprise SSD is best in the trade-off between cost and IOPS performance?
> Which model and brand? Thanks in advance.
> On Wednesday, December 28, 2022 at 08:44:34 AM GMT+3:30, Konstantin Shalygin 
>  wrote:
>
> Hi,
>
> The cache was exhausted and the drive is now busy reorganizing it in the
> background. This is not an enterprise device; you should never use it with
> Ceph 🙂
>
>
> k
> Sent from my iPhone
>
> > On 27 Dec 2022, at 16:41, hosseinz8...@yahoo.com wrote:
> >
> > Thanks Anthony. I have a cluster with QLC SSD disks (Samsung QVO 860).
> > The cluster has been running for 2 years. Now all OSDs return 12 IOPS
> > when running tell bench, which is very slow. But I bought new QVO disks
> > yesterday and added one of them to the cluster as a new OSD. For the
> > first hour I got 100 IOPS from this new OSD. But after 1 hour, this new
> > disk (OSD) dropped back to 12 IOPS, the same as the other old OSDs. I
> > cannot imagine what is happening?!!
> >     On Tuesday, December 27, 2022 at 12:18:07 AM GMT+3:30, Anthony D'Atri 
> >  wrote:
> >
> > My understanding is that when you ask an OSD to bench (via the admin 
> > socket), only that OSD executes, there is no replication.  Replication is a 
> > function of PGs.
> >
> > Thus, this is a narrowly-focused tool with both unique advantages and 
> > disadvantages.
> >
> >
> >
> > > > On Dec 26, 2022, at 12:47 PM, hosseinz8...@yahoo.com wrote:
> > > >
> > > > Hi experts, I want to know: when I execute the ceph tell osd.x bench
> > > > command, is replica 3 considered in the bench or not? I mean, for
> > > > example with replica 3, when I execute the tell bench command, does
> > > > replica 1 of the bench data get written to osd.x, replica 2 to osd.y
> > > > and replica 3 to osd.z? If that is the case, it means I cannot
> > > > benchmark just one OSD in the cluster, because the IOPS and
> > > > throughput of the two other (for example, slow) OSDs will affect the
> > > > result of the tell bench command for my target OSD. Is that true?
> > > > 

[ceph-users] Re: Cannot create CephFS subvolume

2022-12-28 Thread Daniel Kovacs

We are on: 17.2.4

Ceph fs volume ls output:
[
    {
    "name": "k8s_ssd"
    },
    {
    "name": "inclust"
    },
    {
    "name": "inclust_ssd"
    }
]


I'd like to create a subvolume in the inclust_ssd volume. I can create a
subvolume with the same name in inclust without any problems.
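
In case it's relevant: my guess is that the EINVAL comes from the
ceph.dir.subvolume flag already being set somewhere along the target path,
but I'm not sure how to verify that. On a client mount I would presumably
check (and, if needed, clear) it roughly like this (the mount point and path
are just an example):

  getfattr -n ceph.dir.subvolume /mnt/inclust_ssd/volumes/_nogroup
  setfattr -n ceph.dir.subvolume -v 0 /mnt/inclust_ssd/volumes/_nogroup

Is that the right direction?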



Best regards,

Daniel

On 2022. 12. 28. 4:42, Milind Changire wrote:

Also, please list the volumes available on your system:

$ ceph fs volume ls


On Wed, Dec 28, 2022 at 9:09 AM Milind Changire  wrote:


What ceph version are you using?

$ ceph versions


On Wed, Dec 28, 2022 at 3:17 AM Daniel Kovacs 
wrote:


Hello!

I'd like to create a CephFS subvolume with this command: ceph fs
subvolume create cephfs_ssd subvol_1
I got this error: Error EINVAL: invalid value specified for
ceph.dir.subvolume
If I use another CephFS volume, no error is reported.

What did I do wrong?

Best regards,

Daniel

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
Milind



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Urgent help! RGW Disappeared on Quincy

2022-12-28 Thread Deep Dish
Hi Pavin,

The following are additional developments. There's one PG that's stuck and
unable to recover. I've attached the relevant ceph -s, health detail and pg
query outputs below.

- There were some remaining lock files pertaining to rgw in /var/run/ceph/,
as you suggested. I removed the service, deleted any stale lock files and
redeployed the RGWs. All started, with the following log entries common
across all of them:

7ff5d9aaf5c0  0 deferred set uid:gid to 167:167 (ceph:ceph)
7ff5d9aaf5c0  0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process radosgw, pid 2
7ff5d9aaf5c0  0 framework: beast
7ff5d9aaf5c0  0 framework conf key: port, val: 80
7ff5d9aaf5c0  1 radosgw_Main not setting numa affinity
7ff5d9aaf5c0  1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
7ff5d9aaf5c0  1 D3N datacache enabled: 0

No additional log entries have been recorded since starting them after the
redeployment, as per above.

The cluster settled and there is no recovery activity. There is one PG
that's stuck, and I have a hunch that it's impacting the MDS and RGW
processes as stated in the thread. The PG is stuck as
active+remapped+backfilling:



  data:
    volumes: 2/2 healthy
    pools:   16 pools, 1504 pgs
    objects: 24.49M objects, 79 TiB
    usage:   119 TiB used, 390 TiB / 508 TiB avail
    pgs:     65210/146755179 objects misplaced (0.044%)
             1503 active+clean
             1    active+remapped+backfilling

  progress:
    Global Recovery Event (6h)
      [===.] (remaining: 73s)

# ceph health detail

HEALTH_WARN 1 MDSs report slow metadata IOs; 1 pgs not deep-scrubbed in time; 1 pgs not scrubbed in time
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.fs01.ceph02mon02.wicrdz(mds.0): 5 slow metadata IOs are blocked > 30 secs, oldest blocked for 74436 secs
[WRN] PG_NOT_DEEP_SCRUBBED: 1 pgs not deep-scrubbed in time
    pg 14.ff not deep-scrubbed since 2022-12-14T19:35:51.893008+
[WRN] PG_NOT_SCRUBBED: 1 pgs not scrubbed in time
    pg 14.ff not scrubbed since 2022-12-17T06:33:40.577932+



From the following pg query:

- "pgid": "14.ffs0" is stuck as peering (osd 5)
- "pgid": "14.ffs4" is stuck as unknown (osd 18)
- "pgid": "14.ffs5" is stuck as unknown (osd 24)
- "pgid": "14.ffs3" is stuck as unknown (osd 42)
- "pgid": "14.ffs2" is stick as unknown (osd 58)
- "pgid": "14.ffs1" is marked as active+clean (osd 36)

# ceph pg 14.ff query

{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+remapped+backfilling",
    "epoch": 19594,
    "up": [5, 36, 58, 42, 18, 24],
    "acting": [50, 36, 5, 26, 15, 46],
    "backfill_targets": ["5(0)", "18(4)", "24(5)", "42(3)", "58(2)"],
    "acting_recovery_backfill": ["5(0)", "5(2)", "15(4)", "18(4)", "24(5)", "26(3)", "36(1)", "42(3)", "46(5)", "50(0)", "58(2)"],
    "info": {
        "pgid": "14.ffs0",
        "last_update": "19550'35077",
        "last_complete": "19550'35077",
        "log_tail": "13761'32157",
        "last_user_version": 35077,
        "last_backfill": "MAX",
        "purged_snaps": [],
        "history": {
            "epoch_created": 4537,
            "epoch_pool_created": 2032,
            "last_epoch_started": 16616,
            "last_interval_started": 16615,
            "last_epoch_clean": 14655,
            "last_interval_clean": 14654,
            "last_epoch_split": 4537,
            "last_epoch_marked_full": 0,
            "same_up_since": 16613,
            "same_interval_since": 16615,
            "same_primary_since": 16615,
            "last_scrub": "3817'25569",
            "last_scrub_stamp": "2022-12-17T06:33:40.577932+",
            "last_deep_scrub": "3756'21592",
            "last_deep_scrub_stamp": "2022-12-14T19:35:51.893008+",
            "last_clean_scrub_stamp": "2022-12-17T06:33:40.577932+",
            "prior_readable_until_ub": 0
        },
        "stats": {
            "version": "19550'35077",
            "reported_seq": 396919,
            "reported_epoch": 19594,
            "state": "active+remapped+backfilling",
            "last_fresh": "2022-12-28T22:03:20.278478+",
            "last_change": "2022-12-26T21:27:51.600940+",
            "last_active": "2022-12-28T22:03:20.278478+",
            "last_peered": "2022-12-28T22:03:20.278478+",
            "last_clean": "2022-12-26T21:27:45.471954+",
            "last_became_active": "2022-12-26T21:27:51.085966+",
            "last_became_peered": "2022-12-26T21:27:51.085966+",
            "last_unstale": "2022-12-28T22:03:20.278478+",
            "last_undegraded": "2022-12-28T22:03:20.278478+",
            "la

[ceph-users] radosgw not working after upgrade to Quincy

2022-12-28 Thread Andrei Mikhailovsky
Hello everyone, 

After the upgrade from Pacific to Quincy the radosgw service is no longer
listening on its network port, although the process is running. I get the
following in the log:

2022-12-29T02:07:35.641+ 7f5df868ccc0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process radosgw, pid 36072
2022-12-29T02:07:35.641+ 7f5df868ccc0 0 framework: civetweb
2022-12-29T02:07:35.641+ 7f5df868ccc0 0 framework conf key: port, val: 443s
2022-12-29T02:07:35.641+ 7f5df868ccc0 0 framework conf key: ssl_certificate, val: /etc/ssl/private/s3.arhont.com-bundle.pem
2022-12-29T02:07:35.641+ 7f5df868ccc0 1 radosgw_Main not setting numa affinity
2022-12-29T02:07:35.645+ 7f5df868ccc0 1 rgw_d3n: rgw_d3n_l1_local_datacache_enabled=0
2022-12-29T02:07:35.645+ 7f5df868ccc0 1 D3N datacache enabled: 0
2022-12-29T02:07:38.917+ 7f5d15ffb700 -1 sync log trim: bool {anonymous}::sanity_check_endpoints(const DoutPrefixProvider*, rgw::sal::RadosStore*):688 WARNING: Cluster is is misconfigured! Zonegroup default (default) in Realm london-ldex (29474c50-f1c2-4155-ac3b-a42e9d413624) has no endpoints!
2022-12-29T02:07:38.917+ 7f5d15ffb700 -1 sync log trim: bool {anonymous}::sanity_check_endpoints(const DoutPrefixProvider*, rgw::sal::RadosStore*):698 ERROR: Cluster is is misconfigured! Zone default (default) in Zonegroup default (default) in Realm london-ldex (29474c50-f1c2-4155-ac3b-a42e9d413624) has no endpoints! Trimming is impossible.
2022-12-29T02:07:38.917+ 7f5d15ffb700 -1 sync log trim: RGWCoroutine* create_meta_log_trim_cr(const DoutPrefixProvider*, rgw::sal::RadosStore*, RGWHTTPManager*, int, utime_t):718 ERROR: Cluster is is misconfigured! Refusing to trim.
2022-12-29T02:07:38.917+ 7f5d15ffb700 -1 rgw rados thread: Bailing out of trim thread!
2022-12-29T02:07:38.917+ 7f5d15ffb700 0 rgw rados thread: ERROR: processor->process() returned error r=-22
2022-12-29T02:07:38.953+ 7f5df868ccc0 0 framework: beast
2022-12-29T02:07:38.953+ 7f5df868ccc0 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
2022-12-29T02:07:38.953+ 7f5df868ccc0 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
2022-12-29T02:07:38.953+ 7f5df868ccc0 0 WARNING: skipping unknown framework: civetweb
2022-12-29T02:07:38.977+ 7f5df868ccc0 1 mgrc service_daemon_register rgw.1371662715 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable),ceph_version_short=17.2.5,cpu=Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz,distro=ubuntu,distro_description=Ubuntu 20.04.5 LTS,distro_version=20.04,frontend_config#0=civetweb port=443s ssl_certificate=/etc/ssl/private/s3.arhont.com-bundle.pem,frontend_type#0=civetweb,hostname=arh-ibstorage1-ib,id=radosgw1.gateway,kernel_description=#62~20.04.1-Ubuntu SMP Tue Nov 22 21:24:20 UTC 2022,kernel_version=5.15.0-56-generic,mem_swap_kb=24686688,mem_total_kb=98747048,num_handles=1,os=Linux,pid=36072,realm_id=29474c50-f1c2-4155-ac3b-a42e9d413624,realm_name=london-ldex,zone_id=default,zone_name=default,zonegroup_id=default,zonegroup_name=default}
2022-12-29T02:07:39.177+ 7f5d057fa700 0 lifecycle: RGWLC::process() failed to acquire lock on lc.29, sleep 5, try again


I had been running the radosgw service on a 15.2.x cluster without any
issues. Last week I upgraded the cluster to 16.2.x, followed by a further
upgrade to 17.2. Here is what my configuration file looks like:

[client.radosgw1.gateway] 
host = arh-ibstorage1-ib 
keyring = /etc/ceph/keyring.radosgw1.gateway 
log_file = /var/log/ceph/radosgw.log 
rgw_dns_name = s3.arhont.com 
rgw_num_rados_handles = 8 
rgw_thread_pool_size = 512 
rgw_cache_enabled = true 
rgw cache lru size = 10 
rgw enable ops log = false 
rgw enable usage log = false 
rgw_frontends = civetweb port=443s 
ssl_certificate=/etc/ssl/private/s3.arhont.com-bundle.pem 

Please let me know how to fix the problem? 

Many thanks 

Andrei 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw not working after upgrade to Quincy

2022-12-28 Thread Konstantin Shalygin
Hi,

Just try to read your logs:

> 2022-12-29T02:07:38.953+ 7f5df868ccc0 0 WARNING: skipping unknown 
> framework: civetweb 

You are trying to use `civetweb`, which was removed in the Quincy release.
You need to update your config and use `beast` instead.
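
For example, your frontend line can be translated roughly like this (same
certificate bundle assumed):

  # old, civetweb (gone in Quincy):
  #   rgw_frontends = civetweb port=443s ssl_certificate=/etc/ssl/private/s3.arhont.com-bundle.pem
  # new, beast:
  rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ssl/private/s3.arhont.com-bundle.pem

The "has no endpoints" errors look like a separate zone/zonegroup
configuration issue (radosgw-admin zone/zonegroup modify --endpoints=...
followed by period update --commit), not the reason the frontend is down.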


k

> On 29 Dec 2022, at 09:20, Andrei Mikhailovsky  wrote:
> 
> Please let me know how to fix the problem? 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Does Replica Count Affect Tell Bench Result or Not?

2022-12-28 Thread Anthony D'Atri

>> Thanks. I am planning to change all of my disks. But do you know which
>> enterprise SSD is best in the trade-off between cost and IOPS performance?

In my prior response I meant to ask what your workload is like.  RBD? RGW? 
Write-heavy? Mostly reads?  This influences what drives make sense.

— aad


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph osd df tree information missing on one node

2022-12-28 Thread Ml Ml
Hello,

after reinstalling one node (ceph06) from backup, the OSDs on that node
do not show any disk information in "ceph osd df tree":
 https://pastebin.com/raw/7zeAx6EC

Any hint on how I could fix this?
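
In case it helps narrow it down: I assume the usual suspects are stale mgr
statistics or missing OSD metadata after the reinstall, so I was planning to
check roughly the following (the OSD id is just an example), but I'm not
sure that's the right direction:

  ceph osd metadata 12
  ceph mgr fail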

Thanks,
Mario
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io