[ceph-users] one PG is stuck on single OSD and does not sync back

2021-02-08 Thread Boris Behrens
Good day everyone, I've got an issue after rebooting an OSD node. It looks like there is one PG that does not sync back to the other UP OSDs. I've tried to restart the ceph processes for all three OSDs, and when I stopped the one on OSD.14 the PG went down. Any ideas what I can do? # ceph pg ls d
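
A typical triage sequence for a PG like this (a sketch; the PG id below is hypothetical, osd.14 is the OSD named above):

    # list degraded/undersized PGs and inspect the stuck one
    ceph pg ls degraded
    ceph pg 11.1a query
    # make osd.14 ineligible to act as primary for its PGs
    ceph osd primary-affinity osd.14 0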

[ceph-users] Re: one PG is stuck on single OSD and does not sync back

2021-02-08 Thread Boris Behrens
Hi Eugen, I've set it to 0 but the "degraded objects" count does not go down. On Mon, 8 Feb 2021 at 14:23, Eugen Block wrote: > Hi, > > one option would be to decrease (or set to 0) the primary-affinity of > osd.14 and see if that brings the pg back. > > Regards, > Eugen > > -- Die Sel

[ceph-users] Re: one PG is stuck on single OSD and does not sync back

2021-02-08 Thread Boris Behrens
> appropriate? I've seen stuck PGs because of OSD weight imbalance. Is > the OSD in the correct subtree? > > > Quoting Boris Behrens: > > > Hi Eugen, > > > > I've set it to 0 but the "degraded objects" count does not go down. >

[ceph-users] Re: one PG is stuck on single OSD and does not sync back

2021-02-08 Thread Boris Behrens
I've outed osd.18 and osd.54 and let it sync for some time and now the problem is gone. *shrugs* Thank you for the hints. On Mon, 8 Feb 2021 at 14:46, Boris Behrens wrote: > Hi, > sure > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > -1 67

[ceph-users] after update to 14.2.16 osd daemons begin to crash

2021-02-17 Thread Boris Behrens
Hi, currently we experience OSD daemon crashes and I can't pin down the issue. I hope someone can help me with it. * We operate multiple clusters (440 SSD - 1PB, 36 SSD - 126TB, 40 SSD - 100TB, 84 HDD - 680TB) * All clusters were updated around the same time (2021-02-03) * We restarted ALL ceph daemons (sy

[ceph-users] Re: after update to 14.2.16 osd daemons begin to crash

2021-02-17 Thread Boris Behrens
to switch back to bitmap or avl allocator. > > Thanks, > > Igor > > > On 2/17/2021 12:36 PM, Boris Behrens wrote: > > Hi, > > > > currently we experience osd daemon crashes and I can't pin the issue. I > > hope someone can help me with it. > > >
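
The allocator switch Igor refers to is a config change plus an OSD restart; a sketch, assuming the crashes trace back to the allocator that became the default in this release:

    # check what a running OSD currently uses
    ceph daemon osd.0 config get bluestore_allocator
    # switch all OSDs back to the bitmap allocator, then restart them
    ceph config set osd bluestore_allocator bitmap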

[ceph-users] ceph pool with a whitespace as name

2021-03-09 Thread Boris Behrens
Good morning ceph people, I have a pool whose name is a single whitespace character, and I want to know what creates it. I already renamed it, but something recreates the pool. Is there a way to find out what created the pool and what the content is? When I checked its content I got [root@s3db1 ~]# ra

[ceph-users] Re: ceph pool with a whitespace as name

2021-03-09 Thread Boris Behrens
b84a-459b-bce2-bccac338b3ef" } On Wed, 10 Mar 2021 at 07:37, Boris Behrens wrote: > Good morning ceph people, > > I have a pool that got a whitespace as name. And I want to know what > creates the pool. > I already renamed it, but something recreates the pool. >

[ceph-users] Re: ceph pool with a whitespace as name

2021-03-09 Thread Boris Behrens
Ok, I changed the value to "metadata_heap": "", but it is still used. Any ideas how to stop this? On Wed, 10 Mar 2021 at 08:14, Boris Behrens wrote: > Found it. > [root@s3db1 ~]# radosgw-admin zone get --rgw-zone=eu-central-1 > { > "id

[ceph-users] Re: ceph pool with a whitespace as name

2021-03-10 Thread Boris Behrens
After doing radosgw-admin period update --commit it looks like it is gone now. Sorry for spamming the ML, but I am not denvercoder9 :) On Wed, 10 Mar 2021 at 08:29, Boris Behrens wrote: > Ok, > I changed the value to > "metadata_heap": "", > but it is
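
Condensed from the thread, the sequence that made the change stick was roughly this (zone name as posted; the actual edit happens in the dumped JSON):

    radosgw-admin zone get --rgw-zone=eu-central-1 > zone.json
    # edit zone.json and set "metadata_heap": ""
    radosgw-admin zone set --rgw-zone=eu-central-1 --infile zone.json
    # the zone change only takes effect once the period is committed
    radosgw-admin period update --commit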

[ceph-users] buckets with negative num_objects

2021-03-10 Thread Boris Behrens
Hi, I am in the process of resharding large buckets and to find them I ran radosgw-admin bucket limit check | grep '"fill_status": "OVER' -B5 and I see that there are two buckets with negative num_objects "bucket": "ncprod", "tenant": "", "num_object

[ceph-users] how to tell balancer to balance

2021-03-11 Thread Boris Behrens
Hi, I know this topic comes up a lot (as far as I can see), but I have reached the end of my google-fu. * We have OSDs that are near full, but there are also OSDs that are only 50% full. * We have 4, 8, and 16 TB rotating disks in the cluster. * The disks that get packed are 4TB disks and

[ceph-users] Re: how to tell balancer to balance

2021-03-11 Thread Boris Behrens
the 8TB will only go to 50% (or 4 TB) - so in > effect wasting 4TB of the 8 TB disk > > our cluster & our pool > All our disks no matter what are 8 TB in size. > > > > > > >>> Boris Behrens 3/11/2021 5:53 AM >>> > Hi, > I know this

[ceph-users] should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
Hello people, I am still struggling with the balancer (https://www.mail-archive.com/ceph-users@ceph.io/msg09124.html) Now I've read some more and think that I might not have enough PGs. Currently I have 84 OSDs and 1024 PGs for the main pool (3008 total). I have the autoscaler enabled, but I doe

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
And I recommend debug_mgr 4/5 so you can see some basic upmap balancer > logging. > > .. Dan > > > > > > > On Sat, Mar 13, 2021, 3:49 PM Boris Behrens wrote: > >> Hello people, >> >> I am still struggeling with the balancer >> (https://www.ma
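
For reference, the settings discussed in this thread map to commands roughly like this (a sketch for the Nautilus upmap balancer):

    # verbose balancer logging in the active mgr
    ceph config set mgr debug_mgr 4/5
    # aim for at most 1 PG deviation from the mean per OSD
    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    ceph balancer mode upmap
    ceph balancer on
    # a mgr failover ('ceph mgr fail <active>') may be needed so the
    # active mgr picks up the new config (see the follow-up below)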

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
e than 200 PGs, you definitely > shouldn't increase the num PGs. > > But anyway with your mixed device sizes it might be challenging to make a > perfectly uniform distribution. Give it a try with 1 though, and let us > know how it goes. > > .. Dan > > > >

[ceph-users] Re: Safe to remove osd or not? Which statement is correct?

2021-03-14 Thread Boris Behrens
Hi, do you know why the OSDs are not starting? When I had the problem that an OSD would not start, I tried 'ceph-volume lvm activate --all' on the host, which brought the OSDs back up. But I can't tell you if it is safe to remove the OSD. Cheers Boris On Sun, 14 Mar 2021 at 02:38,

[ceph-users] Re: should I increase the amount of PGs?

2021-03-15 Thread Boris Behrens
> Btw, you might need to fail over to a new mgr... I'm not sure if the current > active will read that new config. > > .. dan > > On Sat, Mar 13, 2021, 4:36 PM Boris Behrens wrote: > >> Hi, >> >> ok thanks. I just changed the value and reweighted everything ba

[ceph-users] Re: should I increase the amount of PGs?

2021-03-15 Thread Boris Behrens
', 'eu-central-1.rgw.control'] 2021-03-15 13:51:01.224 7f307d5fd700 4 mgr[balancer] prepared 0/10 changes On Mon, 15 Mar 2021 at 14:15, Dan van der Ster < d...@vanderster.com> wrote: > I suggest to just disable the autoscaler until your balancing is > understood. >

[ceph-users] Re: should I increase the amount of PGs?

2021-03-15 Thread Boris Behrens
; d...@vanderster.com>: > OK thanks. Indeed "prepared 0/10 changes" means it thinks things are > balanced. > Could you again share the full ceph osd df tree? > > On Mon, Mar 15, 2021 at 2:54 PM Boris Behrens wrote: > > > > Hi Dan, > > > > I've set the aut

[ceph-users] Re: should I increase the amount of PGs?

2021-03-16 Thread Boris Behrens
pect a trimodal for your cluster. > > 2. You can also use another script from that repo to see the PGs per > OSD normalized to crush weight: > ceph-scripts/tools/ceph-pg-histogram --normalize --pool=15 > >This might explain what is going wrong. > > Cheers, Dan >

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
apped+backfilling 32 active+remapped+backfill_toofull io: client: 27 MiB/s rd, 69 MiB/s wr, 497 op/s rd, 153 op/s wr recovery: 1.5 GiB/s, 922 objects/s On Tue, 16 Mar 2021 at 09:34, Boris Behrens wrote: > Hi Dan, > > my EC profile looks very "defa

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
ow if that's the > case and we can see about changing osd_max_backfills, some weights or > maybe using the upmap-remapped tool. > > -- Dan > > On Tue, Mar 23, 2021 at 6:07 PM Boris Behrens wrote: > > > > Ok, I should have listened to you :) > > > > In th

[ceph-users] Re: should I increase the amount of PGs?

2021-03-23 Thread Boris Behrens
umulating on the mons and osds -- > this itself will start to use a lot of space, and active+clean is the > only way to trim the old maps. > > -- dan > > On Tue, Mar 23, 2021 at 7:05 PM Boris Behrens wrote: > > > > So, > > doing nothing and wait for the ceph to recove

[ceph-users] add and start OSD without rebalancing

2021-03-24 Thread Boris Behrens
Hi people, I am currently trying to add ~30 OSDs to our cluster and wanted to use the gentle-reweight script for that. I use ceph-volume lvm prepare --data /dev/sdX to create the OSD and want to start it without weighting it in. systemctl start ceph-osd@OSD starts the OSD with full weight. Is this po
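
One way to get that behaviour (a sketch, not the only option; OSD id 43 is taken from the follow-up below):

    # have new OSDs join the crush map with weight 0 instead of their size
    ceph config set osd osd_crush_initial_weight 0
    ceph-volume lvm prepare --data /dev/sdX
    systemctl start ceph-osd@43
    # later, ramp the weight up (manually or via the gentle-reweight script)
    ceph osd crush reweight osd.43 7.27739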

[ceph-users] Re: add and start OSD without rebalancing

2021-03-24 Thread Boris Behrens
Oh cool. Thanks :) How do I find the correct weight after it is added? For the current process I just check the other OSDs, but this might be a question that someone will raise. I could imagine that I need to adjust the ceph-gentle-reweight target weight to the correct one. On Wed, 24 Mar 202

[ceph-users] Re: add and start OSD without rebalancing

2021-03-24 Thread Boris Behrens
ned crush_weight for %s' % osd) Exception: Undefined crush_weight for 43 I already tried only a single OSD and leaving the -t option out. On Wed, 24 Mar 2021 at 16:31, Janne Johansson < icepic...@gmail.com> wrote: > On Wed, 24 Mar 2021 at 14:55, Boris Behrens wrote: >

[ceph-users] forceful remap PGs

2021-03-30 Thread Boris Behrens
Hi, I have a couple of OSDs that currently get a lot of data and are running towards a 95% fill rate. I would like to forcefully remap some PGs (they are around 100GB) to emptier OSDs and drop them from the full OSDs. I know this would lead to degraded objects, but I am not sure how long the cluster

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Boris Behrens
I just moved one PG away from the OSD, but the disk space does not get freed. Do I need to do something to clean obsolete objects from the OSD? On Tue, 30 Mar 2021 at 11:47, Boris Behrens wrote: > Hi, > I have a couple OSDs that currently get a lot of data, and are running > t

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
27739 0.95000 7.3 TiB 6.7 TiB 6.7 TiB 322 MiB 16 GiB 548 GiB 92.64 1.18 121 up osd.66 46 hdd 7.27739 1.0 7.3 TiB 6.8 TiB 6.7 TiB 316 MiB 16 GiB 536 GiB 92.81 1.18 119 up osd.46 On Tue, 23 Mar 2021 at 19:59, Boris Behrens wrote: > Good point. Than

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
y to pause them with > upmap or any other trick. > > -- dan > > On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote: > > > > One week later the ceph is still balancing. > > What worries me like hell is the %USE on a lot of those OSDs. Does ceph > > resolv this on

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Boris Behrens
> On 3/30/21 12:55 PM, Boris Behrens wrote: > > I just move one PG away from the OSD, but the diskspace will not get > freed. > > How did you move? I would suggest you use upmap: > > ceph osd pg-upmap-items > Invalid command: missing required parameter pgid() > osd pg
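
For reference, pg-upmap-items takes a PG id followed by one or more <from-osd> <to-osd> pairs, and requires luminous-or-newer clients; the ids below are hypothetical:

    ceph osd set-require-min-compat-client luminous
    # remap PG 11.2f so that its copy on osd.54 moves to osd.68
    ceph osd pg-upmap-items 11.2f 54 68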

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
is current splitting should take. This will help: > > ceph status > ceph osd pool ls detail > > -- dan > > On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote: > > > > I would think due to splitting, because the balancer doesn't refuse > it'

[ceph-users] Re: should I increase the amount of PGs?

2021-03-30 Thread Boris Behrens
90% default limit. > > -- dan > > On Tue, Mar 30, 2021 at 3:18 PM Boris Behrens wrote: > > > > The output from ceph osd pool ls detail tells me nothing, except that the > pgp_num is not where it should be. Can you help me to read the output? How > do I estimate

[ceph-users] s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
Hi, maybe it is just a problem in my understanding, but it looks like our s3 requires twice the space it should use. I ran "radosgw-admin bucket stats", added up all "size_kb_actual" values and divided down to TB (/1024/1024/1024). The resulting space is 135.1636733 TB. When I triple it because o

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
e actual bucket object size, but on the OSD level the > bluestore_min_alloc_size default is 64KB and for SSD 16KB > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/administration_guide/osd-bluestore > > -AmitG > > On Thu, Apr 15, 2021 at 7:29 PM Boris

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
ues, bluestore_min_alloc_size_hdd & > bluestore_min_alloc_size_ssd. If you are using HDD disks then > bluestore_min_alloc_size_hdd is applicable. > > On Thu, Apr 15, 2021 at 8:06 PM Boris Behrens wrote: > >> So, I need to live with it? A value of zero leads to use the
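
A quick way to check the configured values (a sketch; note the allocation size is baked in when an OSD is created, so changing the config only affects OSDs built afterwards):

    ceph daemon osd.23 config get bluestore_min_alloc_size_hdd
    ceph daemon osd.23 config get bluestore_min_alloc_size_ssd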

[ceph-users] Re: s3 requires twice the space it should use

2021-04-15 Thread Boris Behrens
胡 玮文 (Weiwen Hu): > Hi Boris, > > Could you check something like > > ceph daemon osd.23 perf dump | grep numpg > > to see if there are some stray or removing PGs? > > Weiwen Hu > > > On 15 Apr 2021, at 22:53, Boris Behrens wrote: > > > >

[ceph-users] Re: s3 requires twice the space it should use

2021-04-16 Thread Boris Behrens
Could this also be failed multipart uploads? On Thu, 15 Apr 2021 at 18:23, Boris Behrens wrote: > Cheers, > > [root@s3db1 ~]# ceph daemon osd.23 perf dump | grep numpg > "numpg": 187, > "numpg_primary": 64, > "nump

[ceph-users] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
Hi, is there a way to remove multipart uploads that are older than X days? It doesn't need to be built into ceph or fully automated. Just something I don't need to build on my own. I am currently trying to debug a problem where ceph reports a lot more used space than it actually requires ( http
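
Per bucket this maps to an S3 lifecycle rule; a sketch using the aws CLI (endpoint and bucket name are placeholders, and the RGW release must support AbortIncompleteMultipartUpload):

    aws --endpoint-url https://s3.example.com s3api put-bucket-lifecycle-configuration \
        --bucket mybucket --lifecycle-configuration '{"Rules": [{"ID": "abort-mpu",
        "Status": "Enabled", "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 28}}]}'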

[ceph-users] Re: [Suspicious newsletter] cleanup multipart in radosgw

2021-04-19 Thread Boris Behrens
ngineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > --- > > -Original Message- > From: Boris Behrens > Sent: Monday, April 19, 2021 4:10 PM > To: ceph-users@cep

[ceph-users] rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
Hi, I have a customer VM that is running fine, but I cannot make snapshots anymore. rbd snap create rbd/IMAGE@test-bb-1 just hangs forever. When I checked the status with rbd status rbd/IMAGE it shows one watcher, the compute node where the VM is running. What can I do to investigate further, witho
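
Some non-destructive checks first (image name as in the post); forcibly blacklisting the watcher would break the hang but can disrupt the running VM, so it is a last resort:

    rbd status rbd/IMAGE          # shows the watcher's address:port/nonce
    rbd snap ls rbd/IMAGE
    # last resort, disruptive for the VM (address taken from rbd status):
    # ceph osd blacklist add 192.0.2.10:0/123456789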

[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
On Thu, 22 Apr 2021 at 16:43, Ilya Dryomov wrote: > On Thu, Apr 22, 2021 at 4:20 PM Boris Behrens wrote: > > > > Hi, > > > > I have a customer VM that is running fine, but I can not make snapshots > > anymore. > > rbd snap create rbd/IMAGE@test-bb

[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-22 Thread Boris Behrens
On Thu, 22 Apr 2021 at 17:27, Ilya Dryomov wrote: > On Thu, Apr 22, 2021 at 5:08 PM Boris Behrens wrote: > > > > > > > > On Thu, 22 Apr 2021 at 16:43, Ilya Dryomov < idryo...@gmail.com> wrote: > >> > >> On Thu, Apr 22,

[ceph-users] Re: s3 requires twice the space it should use

2021-04-23 Thread Boris Behrens
ge": { "search_stage": "comparing", "shard": 0, "marker": "" } } } }, Am Fr., 16. Apr. 2021 um 10:57 Uhr schrieb Boris Behrens : > Could this also be failed multipart

[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-23 Thread Boris Behrens
On Fri, 23 Apr 2021 at 11:52, Ilya Dryomov wrote: > > This snippet confirms my suspicion. Unfortunately without a verbose > log from that VM from three days ago (i.e. when it got into this state) > it's hard to tell what exactly went wrong. > > The problem is that the VM doesn't consider

[ceph-users] Re: rbd snap create not working and just hangs forever

2021-04-23 Thread Boris Behrens
On Fri, 23 Apr 2021 at 12:16, Ilya Dryomov wrote: > On Fri, Apr 23, 2021 at 12:03 PM Boris Behrens wrote: > > > > > > > > On Fri, 23 Apr 2021 at 11:52, Ilya Dryomov < idryo...@gmail.com> wrote: > >> > >> > >> This

[ceph-users] how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-26 Thread Boris Behrens
Hi, we still have the problem that our rgw eats more disk space than it should. Summing up the "size_kb_actual" of all buckets shows only half of the used disk space. There are 312TiB stored according to "ceph df" but we only need around 158TB. I've already written to this ML about the problem, but the

[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-26 Thread Boris Behrens
hich Ceph release were your OSDs built? BlueStore? Filestore? > What is your RGW object population like? Lots of small objects? Mostly > large objects? Average / median object size? > > > On Apr 26, 2021, at 9:32 PM, Boris Behrens wrote: > > > > HI, > > > > we

[ceph-users] Re: how to handle rgw leaked data (aka data that is not available via buckets but eats diskspace)

2021-04-27 Thread Boris Behrens
Boris Behrens wrote: > Hi Anthony, > > yes we are using replication, the lost space is calculated before it's > replicated. > RAW STORAGE: > CLASS SIZE AVAIL USED RAW USED %RAW USED > hdd 1.1 PiB 191 TiB 968 TiB

[ceph-users] global multipart lc policy in radosgw

2021-05-02 Thread Boris Behrens
Hi, I have a lot of multipart uploads that look like they never finished. Some of them date back to 2019. Is there a way to clean them up when they didn't finish within 28 days? I know I can implement an LC policy per bucket, but how do I implement it cluster-wide? Cheers Boris -- Die Selbsthilfeg

[ceph-users] radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-05 Thread Boris Behrens
Hi, for the last couple of days we have experienced a strange slowness on some radosgw-admin operations. What is the best way to debug this? For example, creating a user takes over 20s. [root@s3db1 ~]# time radosgw-admin user create --uid test-bb-user --display-name=test-bb-user 2021-05-05 14:08:14.297 7f6942

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-10 Thread Boris Behrens
Hi guys, does anyone have an idea? On Wed, 5 May 2021 at 16:16, Boris Behrens wrote: > Hi, > since a couple of days we experience a strange slowness on some > radosgw-admin operations. > What is the best way to debug this? > > For example creating a user takes over

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-10 Thread Boris Behrens
all nodes ping successfully. > > > -AmitG > > On Tue, 11 May 2021 at 12:12 AM, Boris Behrens wrote: > >> Hi guys, >> >> does someone got any idea? >> >> On Wed, 5 May 2021 at 16:16, Boris Behrens wrote: >> >> > Hi, >&g

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
and the problem was gone. > I have no good way to debug the problem since it never occurred again after > we restarted the OSDs. > > Kind regards, > Thomas > > On 11 May 2021 08:47:06 CEST, Boris Behrens wrote: >Hi Amit, > >I just pinged the mons fr

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
>> >> Kind regards, >> Thomas >> >> On 11 May 2021 08:47:06 CEST, Boris Behrens wrote: >> >Hi Amit, >> > >> >I just pinged the mons from every system and they are all available. >> > >> >On Mon, 10 May 2021 at

[ceph-users] "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-11 Thread Boris Behrens
Hi all, I am still searching for orphan objects and came across a strange bug: There is a huge multipart upload happening (around 4TB), and listing the rados objects in the bucket loops over the multipart upload. -- The self-help group "UTF-8-Probleme" is meeting this time in the big

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
I tried to debug it with --debug-ms=1. Maybe someone could help me wrap my head around it? https://pastebin.com/LD9qrm3x On Tue, 11 May 2021 at 11:17, Boris Behrens wrote: > Good call. I just restarted the whole cluster, but the problem still > persists. > I don't

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
It actually WAS the amount of watchers... narf... This is so embarrassing. Thanks a lot for all your input. On Tue, 11 May 2021 at 13:54, Boris Behrens wrote: > I tried to debug it with --debug-ms=1. > Maybe someone could help me to wrap my head around it? > https://pastebin.com
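
For the record, the watchers can be counted directly on the RGW control objects; a sketch (pool name from earlier in the thread, notify.0 through notify.7 are the RGW defaults):

    for i in $(seq 0 7); do
        echo "notify.$i"
        rados -p eu-central-1.rgw.control listwatchers notify.$i
    done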

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
Hi, sorry for replying to this old thread: I tried to add a block.db to an OSD, but now the OSD cannot start with the error: Mai 17 09:50:38 s3db10.fra2.gridscale.it ceph-osd[26038]: -7> 2021-05-17 09:50:38.362 7fc48ec94a80 -1 rocksdb: Corruption: CURRENT file does not end with newline Mai 17 09:5

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
> > Igor > > On 5/17/2021 1:09 PM, Boris Behrens wrote: > > Hi, > > sorry for replying to this old thread: > > > > I tried to add a block.db to an OSD but now the OSD can not start with > the > > error: > > Mai 17 09:50:38 s3db10.fra2.gridscale.it ce

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
lid BlueFS directory > structure - multiple .sst files, CURRENT and IDENTITY files etc? > > If so then please check and share the content of /db/CURRENT > file. > > > Thanks, > > Igor > > On 5/17/2021 1:32 PM, Boris Behrens wrote: > > Hi Igor, > > I posted it on

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
; > > On Mon, 17 May 2021 at 13:45, Igor Fedotov wrote: > > >> You might want to check the file structure of the new DB using > >> ceph-bluestore-tool's bluefs-export command: > >> > >> ceph-bluestore-tool --path --command bluefs-export --out

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
The fsck looks good: [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 fsck fsck success On Mon, 17 May 2021 at 14:39, Boris Behrens wrote: > Here is the new output. I kept both for now. > > [root@s3db10 export-bluefs2]# ls * > db: > 018

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
See my last mail :) On Mon, 17 May 2021 at 14:52, Igor Fedotov wrote: > Would you try fsck without the standalone DB? > > On 5/17/2021 3:39 PM, Boris Behrens wrote: > > Here is the new output. I kept both for now. > > > > [root@s3db10 export-bluefs2]# ls * > &g

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-17 Thread Boris Behrens
on. > > > Thanks, > > Igor > > On 5/17/2021 3:47 PM, Boris Behrens wrote: > > The fsck looks good: > > > > [root@s3db10 export-bluefs2]# ceph-bluestore-tool --path > > /var/lib/ceph/osd/ceph-68 fsck > > fsck success > > > > On Mon, 17

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-18 Thread Boris Behrens
One more question: How do I get rid of the bluestore spillover message? osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of 50 GiB) to slow device I tried an offline compaction, which did not help. On Mon, 17 May 2021 at 15:56, Boris Behrens wrote: &
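
What usually clears this (a sketch of the documented bluefs-bdev-migrate path; run with the OSD stopped) is moving the spilled-over BlueFS data from the slow device back onto the db device:

    systemctl stop ceph-osd@68
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-68 \
        --command bluefs-bdev-migrate \
        --devs-source /var/lib/ceph/osd/ceph-68/block \
        --dev-target /var/lib/ceph/osd/ceph-68/block.db
    systemctl start ceph-osd@68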

[ceph-users] Re: Process for adding a separate block.db to an osd

2021-05-19 Thread Boris Behrens
your support Igor <3 On Tue, 18 May 2021 at 09:54, Boris Behrens wrote: > One more question: > How do I get rid of the bluestore spillover message? > osd.68 spilled over 64 KiB metadata from 'db' device (13 GiB used of > 50 GiB) to slow device > >

[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-20 Thread Boris Behrens
I'll bump this once more, because it makes finding orphan objects nearly impossible. On Tue, 11 May 2021 at 13:03, Boris Behrens wrote: > Hi together, > > I still search for orphan objects and came across a strange bug: > There is a huge multipart upload happening (arou

[ceph-users] Re: "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-20 Thread Boris Behrens
Reading through the bug tracker: https://tracker.ceph.com/issues/50293 Thanks for your patience. On Thu, 20 May 2021 at 15:10, Boris Behrens wrote: > I'll bump this once more, because it makes finding orphan objects nearly > impossible. > > On Tue, 11 May 2021 at 13:03

[ceph-users] question regarding markers in radosgw

2021-05-21 Thread Boris Behrens
Hello everybody, It seems that I have a metric ton of orphan objects in my s3 cluster. They look like this: $ rados -p eu-central-1.rgw.buckets.data stat ff7a8b0c-07e6-463a-861b-78f0adeba8ad.811806.9_1063978/features/2018-02-23.json eu-central-1.rgw.buckets.data/ff7a8b0c-07e6-463a-861b-78f0adeba8a
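
Recent Nautilus releases also ship an experimental helper for exactly this hunt; a sketch (data pool name as in the post):

    # writes a timestamped file listing suspected orphan rados objects
    rgw-orphan-list eu-central-1.rgw.buckets.data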

[ceph-users] summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
Hi, I am still searching for a reason why these two values differ so much. I am currently deleting a giant amount of orphan objects (43 million, most of them under 64kb), but the difference gets larger instead of smaller. This was the state two days ago: > > [root@s3db1 ~]# radosgw-admin bucket stats |

[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
On Tue, 25 May 2021 at 09:39, Konstantin Shalygin wrote: > > Hi, > > On 25 May 2021, at 10:23, Boris Behrens wrote: > > I am still searching for a reason why these two values differ so much. > > I am currently deleting a giant amount of orphan objects (43mio, m

[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
On Tue, 25 May 2021 at 09:23, Boris Behrens wrote: > > Hi, > I am still searching for a reason why these two values differ so much. > > I am currently deleting a giant amount of orphan objects (43mio, most > of them under 64kb), but the difference get larger instead of sma

[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
The more files I delete, the more space is used. How can this be? On Tue, 25 May 2021 at 14:41, Boris Behrens wrote: > > On Tue, 25 May 2021 at 09:23, Boris Behrens wrote: > > > > Hi, > > I am still searching for a reason why these two values diff

[ceph-users] best practice balance mode in HAproxy in front of RGW?

2021-05-26 Thread Boris Behrens
Hello everyone, is there any best practice on the balance mode when I have HAProxy in front of my rgw_frontend? Currently we use "balance leastconn". Cheers Boris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph
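
For context, a minimal sketch of such a backend (addresses and health check are placeholders, not from the thread; 7480 is the default RGW port):

    backend rgw
        balance leastconn
        option httpchk HEAD /
        server rgw1 192.0.2.11:7480 check
        server rgw2 192.0.2.12:7480 check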

[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?

2021-05-26 Thread Boris Behrens
Janne Johansson: > > I guess normal round robin should work out fine too, regardless of if > there are few clients making several separate connections or many > clients making a few. > > On Wed, 26 May 2021 at 12:32, Boris Behrens wrote: > > > > Hello togehter, > > >

[ceph-users] Re: best practice balance mode in HAproxy in front of RGW?

2021-05-27 Thread Boris Behrens
On Thu, 27 May 2021 at 07:47, Janne Johansson wrote: > > On Wed, 26 May 2021 at 16:33, Boris Behrens wrote: > > > > Hi Janne, > > do you know if there can be data duplication which leads to orphan objects? > > > > I am currently hunting strange err

[ceph-users] Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Hi everybody, a customer mentioned that he has problems accessing his rgw data. I checked the bucket index and the file should be available. Then I pulled a list with radosgw-admin radoslist --bucket BUCKET and it seems that the file is gone. Besides the "yikes, is there a way the file might be

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
Is there a way to remove a file from a bucket without removing it from the bucket index? On Fri, 16 Jul 2021 at 17:36, Boris Behrens wrote: > > Hi everybody, > a customer mentioned that he got problems in accessing hist rgw data. > I checked the bucket index and the file should

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
id you reshard lately? > Did you test using client programs like s3cmd & rclone...? > > I didn't have time to work on that this week, but I have to find a > solution too. > Meanwhile, I run with a lower shard number and my customer can access > all his data. > Cheers! &

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-16 Thread Boris Behrens
stats that can confirm that the data has been > deleted and/or is still there. (at the pool level maybe?) > Hoping for you that it's just a data/index/shard mismatch... > > On 7/16/21 12:44 PM, Boris Behrens wrote: > > [Externe UL*] > > > > Hi Jean-Sebas

[ceph-users] difference between rados ls and radosgw-admin bucket radoslist

2021-07-16 Thread Boris Behrens
Hi, is there a difference between those two? I always thought that radosgw-admin radoslist only shows the objects that are somehow associated with a bucket. But if the bucket index is broken, would this be reflected in the output? -- The self-help group "UTF-8-Probleme" is meeting this time somewhere different
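
The usual way to compare the two views is to diff their sorted output; a sketch (pool name hypothetical, and only meaningful while no uploads are in flight):

    rados -p eu-central-1.rgw.buckets.data ls | sort > rados.txt
    radosgw-admin bucket radoslist | sort -u > referenced.txt
    # objects that exist in rados but are referenced by no bucket
    comm -23 rados.txt referenced.txt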

[ceph-users] Re: difference between rados ls and radosgw-admin bucket radoslist

2021-07-17 Thread Boris Behrens
it rebuilds the "bi" from the pool level (rados ls), so I'm not sure the > bucket index is "that" important, knowing that you can rebuild it > from the pool. (?) > > > > > On 7/16/21 1:47 PM, Boris Behrens wrote: > > [Externe UL*] >

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
On Fri, 16 Jul 2021 at 19:35, Boris Behrens wrote: > > exactly. > rados rm wouldn't remove it from the "radosgw-admin bucket radoslist" > list, correct? > > our usage statistics are not really usable because they fluctuate in a > 200TB range. > >

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
Is it possible to not complete a file upload so that the actual file is not there, but it is still listed in the bucket index? I really need help with this issue. On Fri, 16 Jul 2021 at 19:35, Boris Behrens wrote: > > exactly. > rados rm wouldn't remove it from the "radosgw-adm

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-17 Thread Boris Behrens
Hi k, all systems run 14.2.21 Cheers Boris On Sat, 17 Jul 2021 at 22:12, Konstantin Shalygin wrote: > > Boris, what is your Ceph version? > > > k > > On 17 Jul 2021, at 11:04, Boris Behrens wrote: > > I really need help with this issue. > > -- Die

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
-78f0adeba8ad.83821626.6927__shadow_.yscyiu0DpWRh_Agsnii3635ZNnrO16x_5 What are those files? o0 On Sat, 17 Jul 2021 at 22:54, Boris Behrens wrote: > > Hi k, > > all systems run 14.2.21 > > Cheers > Boris > > On Sat, 17 Jul 2021 at 22:12, Konstantin Shalyg

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
t seem to have a filename (_shadow_.Sxj4BEhZS6PZg1HhsvSeqJM4Y0wRCto_4) It doesn't seem to be a careless "rados -p POOL rm OBJECT" because then it should still be in the "radosgw-admin bucket radoslist --bucket BUCKET" output. (just tested that on a test bucket). On Fri, 16 Jul

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
lopers can help diagnose the problem. > > Cheers, Dan > > > On Fri, Jul 16, 2021 at 6:45 PM Boris Behrens wrote: > > > > Hi Jean-Sebastien, > > > > I have the exact opposite. Files can be listed (the are in the bucket > > index), but are no

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-19 Thread Boris Behrens
w index shard much larger than others - ceph-users - > lists.ceph.io" > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/MO7IHRGJ7TGPKT3GXCKMFLR674G3YGUX/ > > On Mon, 19 Jul 2021, 18:00 Boris Behrens, wrote: >> >> Hi Dan, >> how do I find out if

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-21 Thread Boris Behrens
Good morning everybody, we've dug further into it but still don't know how this could happen. What we ruled out for now: * Orphan objects cleanup process. ** There is only one bucket with missing data (I checked all other buckets yesterday) ** The "keep these files" list is generated by radosgw-adm

[ceph-users] Re: Files listed in radosgw BI but is not available in ceph

2021-07-23 Thread Boris Behrens
>> > > https://tracker.ceph.com/issues/47866?next_issue_id=48255#note-59 >> > > https://www.mail-archive.com/ceph-users@ceph.io/msg05312.html >> > > >> > > Basically, a read request on a s3/swift object that took a very long >> > > t

[ceph-users] Deleting large objects via s3 API leads to orphan objects

2021-07-27 Thread Boris Behrens
Hello my dear ceph community, I am now dealing with a lot of orphan objects and today I got the time to dig into it. What I basically found is that large objects get removed from radosgw, but not from rados. This leads to a huge amount of orphan objects. I've found this RH bug from last year (htt
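
Worth checking in this situation (a sketch): deleted tail objects are only removed by the RGW garbage collector, so a backlog there looks exactly like leaked space:

    # entries still queued for deletion, including not-yet-expired ones
    radosgw-admin gc list --include-all | head
    # trigger a garbage collection pass by hand
    radosgw-admin gc process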

[ceph-users] understanding multisite radosgw syncing

2021-07-27 Thread Boris Behrens
Hi, I wanted to set up a multisite radosgw environment where only bucket names and user info get synced. Basically I don't want user data to be synced, but buckets and user IDs should still be unique inside the zonegroup. For this I've gone through this howto ( https://docs.ceph.com/en/latest/rados

[ceph-users] create a Multi-zone-group sync setup

2021-07-30 Thread Boris Behrens
Hi people, I am trying to create a multi-zone-group setup (like it is described here: https://docs.ceph.com/en/latest/radosgw/multisite/) but I simply fail, no matter what I try. I just created a test cluster to mess with it. Is there a howto available? I don't want a multi-zone setup,
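
For reference, the master-side command sequence condensed from the linked howto (realm, zonegroup, zone names and the endpoint are placeholders):

    radosgw-admin realm create --rgw-realm=gold --default
    radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 \
        --rgw-realm=gold --master --default
    radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east \
        --endpoints=http://rgw1:80 --master --default
    radosgw-admin period update --commit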

[ceph-users] Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-12 Thread Boris Behrens
Hi everybody, we just stumbled over a problem where an rbd image does not shrink when files are removed. This only happens when the rbd image is partitioned. * We tested it with centos8/ubuntu20.04 with ext4 and a gpt partition table (/boot and /) * the kvm device is virtio-scsi-pci with krbd
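
Things worth verifying inside the guest (a sketch; whether discards actually reach the image also depends on the partition alignment discussed in the follow-up):

    # does the virtual disk advertise discard support?
    cat /sys/block/sda/queue/discard_granularity
    # trim all mounted filesystems and report how much was discarded
    fstrim -av
    # then compare the actual usage on the ceph side
    rbd du rbd/IMAGE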

[ceph-users] Re: Discard / Trim does not shrink rbd image size when disk is partitioned

2021-08-13 Thread Boris Behrens
create and attach an empty block device, and they will certainly not check if the partitions are aligned correctly. Cheers Boris On Fri, 13 Aug 2021 at 08:44, Janne Johansson < icepic...@gmail.com> wrote: > On Thu, 12 Aug 2021 at 17:04, Boris Behrens wrote: > > Hi ev

[ceph-users] Re: create a Multi-zone-group sync setup

2021-08-17 Thread Boris Behrens
an empty response (because there are no buckets to list). I get this against both radosgw locations. I have an nginx between the internet and radosgw that just proxy-passes every request and sets the Host and X-Forwarded-For headers. On Fri, 30 Jul 2021 at 16:46, Boris Behrens wrote

[ceph-users] Re: [Suspicious newsletter] Re: create a Multi-zone-group sync setup

2021-08-17 Thread Boris Behrens
So > 1 realm, multiple dc BUT no sync? > > Istvan Szabo > Senior Infrastructure Engineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > ----------- > > -Original M
