[ceph-users] Re: some ceph general questions about the design

2020-04-21 Thread Anthony D'Atri
> > 1. should I use a RAID controller and create, for example, a RAID 5 with all disks > on each OSD server? or should I pass through all disks to the Ceph OSDs? > > If your OSD servers have HDDs, buy a good RAID controller with a > battery-backed write cache and configure it using multiple RAID-0 volume
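For the pass-through option, a minimal sketch of creating OSDs directly on raw disks with ceph-volume (the device names are placeholders):

    # a minimal sketch, assuming the disks are presented to the OS as plain
    # block devices (no RAID volume in front); /dev/sdb../sdd are placeholders
    ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd
    # or one disk at a time:
    ceph-volume lvm create --bluestore --data /dev/sdb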

[ceph-users] Re: RGW and the orphans

2020-04-21 Thread Janne Johansson
On Tue, 21 Apr 2020 at 07:29, Eric Ivancich wrote: > Please be certain to read the associated docs in both: > > doc/radosgw/orphans.rst > doc/man/8/rgw-orphan-list.rst > > so you understand the limitations and potential pitfalls. Generally this > tool will be a precursor to a larg
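A minimal sketch of invoking the tool, assuming it accepts the RGW data pool name as its argument (the pool name below is the usual default and may differ in your cluster):

    # check the actual data pool name first
    ceph osd pool ls | grep buckets.data
    # then list candidate orphan RADOS objects; output files land in the
    # current working directory
    rgw-orphan-list default.rgw.buckets.data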

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Paul Emmerich
On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > Wait for recovery to finish so you know whether any data from the down > OSDs is required. If not just reprovision them. Recovery will not finish from this state as several PGs are down and/or stale. Paul > > If data is required from the
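A minimal sketch of checking which PGs are blocking recovery in that situation (the PG id is a placeholder):

    # show unhealthy PGs and why they are stuck
    ceph health detail
    ceph pg dump_stuck inactive stale
    # inspect a single down/stale PG to see which OSDs it is waiting for
    ceph pg 7.1a query | grep -A5 '"blocked_by"'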

[ceph-users] Re: RGW and the orphans

2020-04-21 Thread Katarzyna Myrek
Hi, I was looking into running the tool. The question is: do I need to compile all of Ceph, or is radosgw-admin available precompiled for download? A nightly build or something? Kind regards, Katarzyna Myrek On Tue, 21 Apr 2020 at 09:57, Janne Johansson wrote: > > Den tis 21 apr
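A minimal sketch, assuming the prebuilt upstream packages from download.ceph.com are acceptable; the package names below are assumptions and vary by distribution:

    # Debian/Ubuntu: radosgw-admin ships with the regular Ceph packages,
    # so no source build should be needed (exact package split may vary)
    apt-get install radosgw
    # RPM-based distros (package name is an assumption):
    yum install ceph-radosgw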

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Marc Roos
I had a test CephFS data pool with 1x replication, which also left me with 1 stale PG. I have no idea how to resolve this. I already marked the OSD as lost. Do I need to manually 'unconfigure' this CephFS data pool? Or can I 'reinitialize' it?
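If the only copy really is gone, one way to get rid of such a stale PG is to recreate it empty; a minimal sketch (the PG id is a placeholder, and this discards whatever data was in that PG):

    # accept the data loss and recreate the PG as empty
    ceph osd force-create-pg 7.1a --yes-i-really-mean-it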

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Robert Sander
Hi, On 21.04.20 10:33, Paul Emmerich wrote: > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: >> >> Wait for recovery to finish so you know whether any data from the down >> OSDs is required. If not just reprovision them. > > Recovery will not finish from this state as several PGs are down a

[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint

2020-04-21 Thread Yuval Lifshitz
Hi Andreas, The message format you tried to use is the standard one (the one being emitted from boto3, or any other AWS SDK [1]). It passes the arguments using 'x-www-form-urlencoded'. For example: POST / HTTP/1.1 Host: localhost:8000 Accept-Encoding: identity Date: Tue, 21 Apr 2020 08:52:35 GMT
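A minimal sketch of that form-urlencoded CreateTopic request sent with curl instead of boto3; the endpoint, credentials, exchange name and the SigV4 signing parameters are all assumptions:

    # the amqp-exchange attribute is passed alongside push-endpoint;
    # --aws-sigv4 needs curl >= 7.75, region/service strings are assumptions
    curl -sS "http://localhost:8000/" \
      --user "$ACCESS_KEY:$SECRET_KEY" --aws-sigv4 "aws:amz:default:s3" \
      -H "Content-Type: application/x-www-form-urlencoded" \
      --data-urlencode "Action=CreateTopic" \
      --data-urlencode "Name=mytopic" \
      --data-urlencode "Attributes.entry.1.key=push-endpoint" \
      --data-urlencode "Attributes.entry.1.value=amqp://localhost:5672" \
      --data-urlencode "Attributes.entry.2.key=amqp-exchange" \
      --data-urlencode "Attributes.entry.2.value=ex1" \
      --data-urlencode "Attributes.entry.3.key=amqp-ack-level" \
      --data-urlencode "Attributes.entry.3.value=broker"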

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Brad Hubbard
On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote: > > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > > > Wait for recovery to finish so you know whether any data from the down > > OSDs is required. If not just reprovision them. > > Recovery will not finish from this state as several P

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Paul Emmerich
On Tue, Apr 21, 2020 at 12:44 PM Brad Hubbard wrote: > > On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote: > > > > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > > > > > Wait for recovery to finish so you know whether any data from the down > > > OSDs is required. If not just reprovi

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Jonas Jelten
Hi! Yes, it looks like you hit the same bug. My corruption back then happened because the server was out of memory and OSDs restarted and crashed quickly again and again for quite some time... What I think happens is that the journals somehow get out of sync between OSDs, which is something that

[ceph-users] block.db symlink missing after each reboot

2020-04-21 Thread Stefan Priebe - Profihost AG
Hi there, I've a bunch of hosts where I migrated HDD-only OSDs to hybrid ones using: sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/bluefs_db1/db-osd${OSD}' While this worked and each OSD was running fine, it loses
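A minimal sketch of verifying that the new DB device is actually attached after such a migration (the OSD id is a placeholder):

    # after bluefs-bdev-new-db the OSD dir should contain a block.db
    # symlink and the label should list a separate DB device
    ls -l /var/lib/ceph/osd/ceph-${OSD}/block.db
    ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-${OSD}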

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Robert Sander
Hi Jonas, On 21.04.20 14:47, Jonas Jelten wrote: > I hope my script still works for you. If you need any help, I'll see what I > can do :) The script currently does not find the info it needs and wants us to increase the logging level. We set the logging level to 10 and tried to restart the OSD
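A minimal sketch of raising the OSD log verbosity before the next start (the OSD id and chosen subsystems are assumptions):

    # persist higher debug levels for one OSD, then restart it
    ceph config set osd.12 debug_osd 20
    ceph config set osd.12 debug_bluestore 20
    systemctl restart ceph-osd@12
    # remember to reset the levels afterwards, the logs grow quickly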

[ceph-users] Re: block.db symlink missing after each reboot

2020-04-21 Thread Igor Fedotov
Hi Stefan, I think that's the cause: https://tracker.ceph.com/issues/42928 On 4/21/2020 4:02 PM, Stefan Priebe - Profihost AG wrote: Hi there, I've a bunch of hosts where I migrated HDD-only OSDs to hybrid ones using: sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/c

[ceph-users] Sporadic mgr segmentation fault

2020-04-21 Thread XuYun
Dear ceph users, we are experiencing sporadic mgr crashes in all three Ceph clusters (versions 14.2.6 and 14.2.8); the crash log is: 2020-04-17 23:10:08.986 7fed7fe07700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/
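A minimal sketch of collecting the crash details with the built-in crash module (the crash id is a placeholder taken from the listing):

    # list recorded crashes and dump the full report for one of them
    ceph crash ls
    ceph crash info <crash-id>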

[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Jonas Jelten
Hi! Since you are on Nautilus and I was on Mimic back then, the messages may have changed. The script only automates deleting many broken PGs; you can perform the procedure by hand first. You can perform the steps in my state machine by hand and identify the right messages, and
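A minimal sketch of removing one broken PG copy by hand with ceph-objectstore-tool (the OSD and PG ids are placeholders; export a backup first):

    systemctl stop ceph-osd@42
    # keep a copy of the PG before deleting anything
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
        --pgid 7.1a --op export --file /root/pg-7.1a.export
    # then remove the broken copy and restart the OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
        --pgid 7.1a --op remove --force
    systemctl start ceph-osd@42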

[ceph-users] Re: block.db symlink missing after each reboot

2020-04-21 Thread Stefan Priebe - Profihost AG
Hi Igor, On 21.04.20 at 15:52, Igor Fedotov wrote: > Hi Stefan, > > I think that's the cause: > > https://tracker.ceph.com/issues/42928 Thanks, yes, that matches. Is there any way to fix this manually? And is this also related to: https://tracker.ceph.com/issues/44509 Greets, Stefan > > On 4

[ceph-users] Re: block.db symlink missing after each reboot

2020-04-21 Thread Igor Fedotov
On 4/21/2020 4:59 PM, Stefan Priebe - Profihost AG wrote: Hi Igor, On 21.04.20 at 15:52, Igor Fedotov wrote: Hi Stefan, I think that's the cause: https://tracker.ceph.com/issues/42928 Thanks, yes, that matches. Is there any way to fix this manually? I think so - AFAIK missed tags are pure L
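If it really is just the LVM tags, a minimal sketch of re-adding the DB-related tags that ceph-volume looks for on the block LV (all values below are placeholders taken from the matching DB LV):

    # ceph-volume reads these tags at activation time to recreate block.db
    lvchange --addtag "ceph.db_device=/dev/bluefs_db1/db-osd0" \
             --addtag "ceph.db_uuid=<LV-UUID-of-the-db-volume>" \
             /dev/<vg-of-the-block-lv>/<osd-block-lv>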

[ceph-users] Re: some ceph general questions about the design

2020-04-21 Thread Antoine Lecrux
Hi Anthony, you bring up a very valid point. My advice is to carefully choose the HBA and the disks, do extensive testing during the initial phase of the project, and run controlled firmware upgrade campaigns with a good pre-production setup. In a multiple-RAID-0 scenario, there are some paramete

[ceph-users] Re: PG deep-scrub does not finish

2020-04-21 Thread Andras Pataki
Hi Brad, Indeed - osd.694 kept crashing with a read error (a medium error on the hard drive) and got restarted by systemd. So, net-net, the system ended up in an infinite loop of deep scrub attempts on the PG for a week. Typically when a scrub encounters a read error, I get an inconsistent pla
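A minimal sketch of confirming the medium error and taking the OSD out of service before replacing the drive (the device and OSD ids are placeholders):

    # check the drive's own error log
    smartctl -a /dev/sdX | grep -i -E 'error|pending|realloc'
    # stop using the OSD so the scrub loop ends and data rebalances
    ceph osd out 694
    systemctl stop ceph-osd@694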

[ceph-users] Re: block.db symlink missing after each reboot

2020-04-21 Thread Stefan Priebe - Profihost AG
Hi Igor, hmm, I updated the missing LV tags: # lvs -o lv_tags /dev/ceph-3a295647-d5a1-423c-81dd-1d2b32d7c4c5/osd-block-c2676c5f-111c-4603-b411-473f7a7638c2 | tr ',' '\n' | sort LV Tags ceph.block_device=/dev/ceph-3a295647-d5a1-423c-81dd-1d2b32d7c4c5/osd-block-c2676c5f-111c-4603-b411-473
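After updating the tags, a minimal sketch of letting ceph-volume rebuild the symlinks from them:

    # re-run activation so /var/lib/ceph/osd/ceph-*/block.db is recreated
    # from the LV tags
    ceph-volume lvm activate --all
    # then check that the symlink survives a reboot:
    ls -l /var/lib/ceph/osd/ceph-*/block.db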

[ceph-users] Rebuilding the Ceph.io site with Jekyll

2020-04-21 Thread Lars Marowsky-Bree
Hi all, as part of the Ceph Foundation, we're considering re-launching the Ceph website and migrating it away from a dated WordPress to Jekyll, backed by Git et al. (either hosted on our own infrastructure or even GitHub Pages). This would involve building/customizing a Jekyll theme, providing feed

[ceph-users] Re: PG deep-scrub does not finish

2020-04-21 Thread Brad Hubbard
Looks like that drive is dying. On Wed, Apr 22, 2020 at 12:25 AM Andras Pataki wrote: > > Hi Brad, > > Indeed - osd.694 kept crashing with a read error (medium error on the > hard drive), and got restarted by systemd. So net net the system ended > up in an infinite loop of deep scrub attempts on