[ceph-users] Inconsistent Space Usage reporting
Hi Friends,

We have some inconsistent storage space usage reporting. We are using only 46 TB with a single copy, but the space used on the pool is reported as close to 128 TB.

Any idea where the extra space is being used and how we can reclaim it?

Ceph version: 12.2.11 with XFS OSDs. We are planning to upgrade soon.

# ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED    %RAW USED    OBJECTS
    363TiB    131TiB    231TiB      63.83        43.80M
POOLS:
    NAME    ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS     DIRTY     READ       WRITE      RAW USED
    fcp     15    N/A              N/A            23.6TiB    42.69    31.7TiB      3053801     3.05M     6.10GiB    12.6GiB    47.3TiB
    nfs     16    N/A              N/A            128TiB     66.91    63.4TiB      33916181    33.92M    3.93GiB    4.73GiB    128TiB

# df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/nbd0       200T   46T   155T   23%  /vol/dir_research

# ceph osd pool get nfs all
size: 1
min_size: 1
crash_replay_interval: 0
pg_num: 128
pgp_num: 128
crush_rule: replicated_ruleset
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
fast_read: 0

Appreciate your help.

Thanks,
-Vikas
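[For reference, a first step is to see how much of the 128 TiB is actual RBD allocation versus filesystem-level usage. A minimal sketch, assuming the nfs pool holds the rbd-nbd backed image mounted at /vol/dir_research; the image name is whatever `rbd ls` returns:]

# list images in the pool and their provisioned vs. actually allocated size
rbd ls -p nfs
rbd du -p nfs

[If `rbd du` shows far more allocated than the filesystem reports as used, the gap is usually space freed by deleted files that was never discarded back to RADOS.]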
[ceph-users] Re: Inconsistent Space Usage reporting
Any help or direction on the case below is highly appreciated.

Thanks,
-Vikas

-----Original Message-----
From: Vikas Rana
Sent: Monday, November 2, 2020 12:53 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Inconsistent Space Usage reporting

Hi Friends,

We have some inconsistent storage space usage reporting. We are using only 46 TB with a single copy, but the space used on the pool is reported as close to 128 TB. Any idea where the extra space is being used and how we can reclaim it?

Ceph version: 12.2.11 with XFS OSDs. We are planning to upgrade soon.
[...]

Appreciate your help.

Thanks,
-Vikas
[ceph-users] Re: Inconsistent Space Usage reporting
Thanks. Let me try it and I'll report back.

-----Original Message-----
From: Adam Tygart
Sent: Tuesday, November 3, 2020 12:42 PM
To: Vikas Rana
Cc: ceph-users
Subject: Re: [ceph-users] Re: Inconsistent Space Usage reporting

I'm not sure exactly what you're doing with your volumes. It looks like fcp might be size 3. nfs is size 1, possibly with a 200 TB RBD volume inside, nbd-mounted into another box.

If so, it is likely you can reclaim space from deleted files with fstrim, if your filesystem supports it.

--
Adam

On Tue, Nov 3, 2020 at 11:00 AM Vikas Rana wrote:
>
> Any help or direction on the case below is highly appreciated.
>
> Thanks,
> -Vikas
> [...]
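[A minimal sketch of what Adam suggests, assuming /dev/nbd0 is the rbd-nbd mapping of the image in the nfs pool and that the mapping passes discards through; worth testing on a non-critical volume first:]

# discard unused blocks on the mounted XFS filesystem
fstrim -v /vol/dir_research

# the pool's USED value should shrink as the freed RADOS objects are removed
ceph df detail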
[ceph-users] Slow Replication on Campus
Hi Friends,

We have two Ceph clusters on campus, and we set up the second cluster as the DR solution. The images on the DR side are always behind the master.

Ceph version: 12.2.11

VMWARE_LUN0:
  global_id:   23460954-6986-4961-9579-0f2a1e58e2b2
  state:       up+replaying
  description: replaying, master_position=[object_number=2632711, tag_tid=24, entry_tid=1967382595], mirror_position=[object_number=1452837, tag_tid=24, entry_tid=456440697], entries_behind_master=1510941898
  last_update: 2020-11-30 14:13:38

VMWARE_LUN1:
  global_id:   cb579579-13b0-4522-b65f-c64ec44cbfaf
  state:       up+replaying
  description: replaying, master_position=[object_number=1883943, tag_tid=28, entry_tid=1028822927], mirror_position=[object_number=1359161, tag_tid=28, entry_tid=358296085], entries_behind_master=670526842
  last_update: 2020-11-30 14:13:33

Any suggestions on tuning, or any parameters we can set on rbd-mirror, to speed up the replication? Both clusters have very little activity.

Appreciate your help.

Thanks,
-Vikas
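[One tuning knob often suggested for journal-based mirroring that keeps falling behind. This is not from this thread, so treat the option name and value as assumptions to verify against the Luminous documentation before use: let the rbd-mirror daemon fetch more journal data per request, e.g. in ceph.conf on the DR node running rbd-mirror:]

[client]
# read larger chunks of the RBD journal per fetch (default is much smaller)
rbd_mirror_journal_max_fetch_bytes = 33554432

[A related client-side option, rbd_journal_max_payload_bytes, is sometimes raised on the primary side as well. After restarting the rbd-mirror daemon, watch entries_behind_master in `rbd mirror pool status <pool> --verbose` to see whether the backlog is draining.]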
[ceph-users] Data Missing with RBD-Mirror
Hi Friends,

We have a very weird issue with rbd-mirror replication. As per the command output, we are in sync, but the OSD usage on the DR side doesn't match the Prod side.

On Prod we are using close to 52 TB, but on the DR side we see only 22 TB.

We took a snapshot on Prod, mounted the snapshot on the DR side, compared the data, and found a lot of missing data. Please see the output below.

Please help us resolve this issue or point us in the right direction.

Thanks,
-Vikas

DR# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
    1 replaying

research_data:
  global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
  state:       up+replaying
  description: replaying, master_position=[object_number=390133, tag_tid=4, entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4, entry_tid=447832541], entries_behind_master=0
  last_update: 2021-01-29 15:10:13

DR# ceph osd pool ls detail
pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0 application rbd
        removed_snaps [1~5]

PROD# ceph df detail
POOLS:
    NAME    ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS    DIRTY    READ      WRITE     RAW USED
    cifs    17    N/A              N/A            26.0TiB    30.10    60.4TiB      6860550    6.86M    873MiB    509MiB    52.1TiB

DR# ceph df detail
POOLS:
    NAME    ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS    DIRTY    READ       WRITE     RAW USED
    cifs    5     N/A              N/A            11.4TiB    15.78    60.9TiB      3043260    3.04M    2.65MiB    431MiB    22.8TiB

PROD#:/vol/research_data# du -sh *
11T     Flab1
346G    KLab
1.5T    More
4.4T    ReLabs
4.0T    WLab

DR#:/vol/research_data# du -sh *
2.6T    Flab1
14G     KLab
52K     More
8.0K    RLabs
202M    WLab
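[For reference, a rough sketch of the snapshot comparison described above. Pool, image, and cluster names are from this thread; the snapshot name, rbd device, mount point, and XFS mount options are assumptions:]

PROD# rbd snap create cifs/research_data@verify

# once the snapshot has replicated, map and mount it read-only on the DR side
DR#   rbd --cluster cephdr snap ls cifs/research_data
DR#   rbd --cluster cephdr map cifs/research_data@verify --read-only
DR#   mount -o ro,norecovery,nouuid /dev/rbd0 /mnt/verify
DR#   du -sh /mnt/verify/*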
[ceph-users] Re: Data Missing with RBD-Mirror
Friends,

Any help or suggestion here for the missing data?

Thanks,
-Vikas

From: Vikas Rana
Sent: Tuesday, February 16, 2021 12:20 PM
To: 'ceph-users@ceph.io'
Subject: Data Missing with RBD-Mirror

Hi Friends,

We have a very weird issue with rbd-mirror replication. As per the command output, we are in sync, but the OSD usage on the DR side doesn't match the Prod side. On Prod we are using close to 52 TB, but on the DR side we see only 22 TB. We took a snapshot on Prod, mounted the snapshot on the DR side, compared the data, and found a lot of missing data.

Please help us resolve this issue or point us in the right direction.
[...]

Thanks,
-Vikas
[ceph-users] Re: Data Missing with RBD-Mirror
Hello Mykola and Eugen,

There was no interruption, and we are on a campus with a 10G backbone. We are on 12.2.10, I believe.

We wanted to check the data on the DR side, so we created a snapshot on the primary, which was available on the DR side very quickly. That gave me the feeling that rbd-mirror is not stuck.

I will run those commands, restart the rbd-mirror, and report back.

Thanks,
-Vikas

-----Original Message-----
From: Mykola Golub
Sent: Thursday, February 18, 2021 2:51 PM
To: Vikas Rana; Eugen Block
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Data Missing with RBD-Mirror

On Thu, Feb 18, 2021 at 03:28:11PM +, Eugen Block wrote:

> Hi,
>
> was there an interruption between those sites?
>
> > last_update: 2021-01-29 15:10:13
>
> If there was an interruption you'll probably need to resync those images.

If the results you show below are not from back then, then yes, it looks like the rbd-mirror (at least the image replayer) got stuck for some reason a long time ago. Though then I can't see how you could mount a newly created snap, because it would not be replayed.

Probably you had a snapshot with that name previously, it was replayed, then the rbd-mirror got stuck, the snapshot was deleted on the primary, and a new one was created recently. And on the secondary you were still seeing and mounting the old snapshot? This would also explain why you were able to mount it -- if data were really missing I would expect you to be unable to mount the fs due to corruption.

If the rbd-mirror just got stuck then you probably don't need to resync. Just restarting the rbd-mirror should make it start replaying again. Though given how long it was not replaying, if the journal is very large, a resync might be faster. You can try:

  rbd journal info -p cifs --image research_data

to see how large the journal currently is (the difference between the master and the rbd-mirror client positions).

And if this really is a case of the rbd-mirror getting stuck, any additional info you could provide (rbd-mirror logs, the core dump) might be helpful for fixing the bug. It can be reported right to the tracker.

What version are you running BTW?

--
Mykola Golub

> Zitat von Vikas Rana:
>
> > Hi Friends,
> >
> > We have a very weird issue with rbd-mirror replication. As per the command
> > output, we are in sync but the OSD usage on DR side doesn't match the Prod
> > side. On Prod, we are using close to 52TB but on DR side we are only 22TB.
> > [...]
> >
> > Thanks,
> > -Vikas
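[For reference, a minimal sketch of the restart-and-check steps discussed above. The systemd unit name is an assumption and depends on how rbd-mirror was deployed; the journal commands run against the primary cluster, where the image's journal lives:]

# on the DR cluster: restart the mirror daemon (unit name is an assumption)
DR#   systemctl restart ceph-rbd-mirror@admin

# on the primary cluster: check how far the mirror client lags behind in the journal
PROD# rbd journal status -p cifs --image research_data

# on the DR cluster: confirm the image goes back to up+replaying with entries_behind_master shrinking
DR#   rbd --cluster cephdr mirror pool status cifs --verbose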
[ceph-users] Re: Data Missing with RBD-Mirror
Hello Mykola/Eugen,

Here's the output. We also restarted the rbd-mirror process.

# rbd journal info -p cifs --image research_data
rbd journal '11cb6c2ae8944a':
        header_oid: journal.11cb6c2ae8944a
        object_oid_prefix: journal_data.17.11cb6c2ae8944a.
        order: 24 (16MiB objects)
        splay_width: 4

We restarted the rbd-mirror process on the DR side:

# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
    1 replaying

research_data:
  global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
  state:       up+replaying
  description: replaying, master_position=[object_number=396351, tag_tid=4, entry_tid=455084955], mirror_position=[object_number=396351, tag_tid=4, entry_tid=455084955], entries_behind_master=0
  last_update: 2021-02-19 15:36:30

Thanks,
-Vikas

-----Original Message-----
From: Vikas Rana
Sent: Friday, February 19, 2021 2:00 PM
To: 'Mykola Golub'; 'Eugen Block'
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Data Missing with RBD-Mirror

Hello Mykola and Eugen,

There was no interruption, and we are on a campus with a 10G backbone. We are on 12.2.10, I believe. We wanted to check the data on the DR side, so we created a snapshot on the primary, which was available on the DR side very quickly. That gave me the feeling that rbd-mirror is not stuck. I will run those commands, restart the rbd-mirror, and report back.

Thanks,
-Vikas
[...]
[ceph-users] Re: Data Missing with RBD-Mirror
That is correct. On Prod we do have 22 TB, and on DR we only have 5.5 TB.

Thanks,
-Vikas

-----Original Message-----
From: Mykola Golub
Sent: Monday, February 22, 2021 10:47 AM
To: Vikas Rana
Cc: 'Eugen Block'; ceph-users@ceph.io; dilla...@redhat.com
Subject: Re: [ceph-users] Re: Data Missing with RBD-Mirror

On Mon, Feb 22, 2021 at 09:41:44AM -0500, Vikas Rana wrote:

> # rbd journal info -p cifs --image research_data
> rbd journal '11cb6c2ae8944a':
>         header_oid: journal.11cb6c2ae8944a
>         object_oid_prefix: journal_data.17.11cb6c2ae8944a.
>         order: 24 (16MiB objects)
>         splay_width: 4

Eh, I asked for the wrong command. Actually I wanted to see `rbd journal status`. Anyway, I have that info in the mirror status below, which looks up to date now.

> We restarted the rbd-mirror process on the DR side:
>
> # rbd --cluster cephdr mirror pool status cifs --verbose
> health: OK
> images: 1 total
>     1 replaying
>
> research_data:
>   global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
>   state:       up+replaying
>   description: replaying, master_position=[object_number=396351, tag_tid=4, entry_tid=455084955], mirror_position=[object_number=396351, tag_tid=4, entry_tid=455084955], entries_behind_master=0
>   last_update: 2021-02-19 15:36:30

And I suppose, after creating and replaying a snapshot, you still see files missing on the secondary after mounting it?

--
Mykola Golub
[ceph-users] Re: Data Missing with RBD-Mirror
We did compare, and a lot of data was missing. I'll issue the resync and report back.

Thanks,
-Vikas

-----Original Message-----
From: Mykola Golub
Sent: Monday, February 22, 2021 12:09 PM
To: Vikas Rana
Cc: 'Mykola Golub'; 'Eugen Block'; ceph-users@ceph.io; dilla...@redhat.com
Subject: Re: [ceph-users] Re: Data Missing with RBD-Mirror

On Mon, Feb 22, 2021 at 11:37:52AM -0500, Vikas Rana wrote:

> That is correct. On Prod we do have 22TB and on DR we only have 5.5TB

But did you check that you really have missing files/data? Just to make sure it is not some issue with how data is stored/counted in the different clusters.

Assuming you did and data is missing, then the only way to proceed, I think, is to issue a resync, running on the secondary site:

  rbd mirror image resync cifs/research_data

Note, it would mean recreating the image and resyncing all 22TB though.

And it would be nice to add some monitoring to be able to detect the moment the issue happens again, and report it to the tracker attaching the rbd-mirror logs.

--
Mykola Golub
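[For reference, the resync flow Mykola describes, run from the DR side; cluster, pool, and image names are as used earlier in this thread, and the image will be fully re-copied:]

DR# rbd --cluster cephdr mirror image resync cifs/research_data

# the image is recreated and resynced from the primary; watch until it returns to up+replaying
DR# rbd --cluster cephdr mirror pool status cifs --verbose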
[ceph-users] Zero Reclaim/Trim on RBD image
Hi Friends,

We have multiple RBD images/devices. We used fstrim to reclaim space on the XFS filesystems on RBD images, and it works great.

Some of the RBD images we share as block devices to VMware using SCST. Any suggestion on how to reclaim space on these devices? The images are about 20 TB, but show as 100 TB used inside the Ceph pool.

Thanks,
-Vikas
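[A commonly used approach for this kind of setup, offered as an assumption rather than something confirmed in this thread: export the SCST vdisk as thin-provisioned so ESXi sees UNMAP support, then reclaim dead space from the ESXi side, for example:]

# on an ESXi host, reclaim unused VMFS blocks on the datastore backed by the SCST LUN
# ("RBD_DATASTORE" is a placeholder for the real datastore name)
esxcli storage vmfs unmap -l RBD_DATASTORE

[The UNMAPs issued by ESXi only reach RADOS if every layer in between, i.e. SCST and the krbd or rbd-nbd mapping on the gateway host, has discard enabled.]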