[ceph-users] Re: some ceph general questions about the design
> > 1. Should I use a RAID controller and create, for example, a RAID 5 with all disks on each OSD server? Or should I pass all disks through to the Ceph OSDs?
> >
> If your OSD servers have HDDs, buy a good RAID controller with a battery-backed write cache and configure it using multiple RAID-0 volumes (1 physical disk per volume). That way, reads and writes will be accelerated by the cache on the HBA.

I’ve lived this scenario and hated it. Multiple firmware and manufacturing issues; batteries/supercaps can fail and need to be monitored; bugs causing staged data to be lost before writing to disk; another bug that required replacing the card if there was preserved cache for a failed drive, because it would refuse to boot; difficulties in drive monitoring; an HBA monitoring utility that would lock the HBA or peg the CPU; the list goes on.

For the additional cost of the RoC, cache RAM, a supercap to (fingers crossed) protect the cache, and all the additional monitoring and remote-hands work … you might find that SATA SSDs on a JBOD HBA are no more expensive.

> > 3. If I have a 3-physical-node OSD cluster, do I need 5 physical MONs?
> >
> No. 3 MONs are enough

If you have good hands and spares. If your cluster is on a different continent and colo hands can’t find their own butts ….. it’s nice to survive a double failure. YMMV.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
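As a rough illustration of the drive-monitoring difference (not from the original posts; device names and the megaraid disk ID below are placeholders): behind an RoC HBA that exposes per-disk RAID-0 volumes, SMART data has to be fetched through the controller, whereas a JBOD HBA exposes the disks directly:

  # JBOD HBA: the OS sees the physical disk, so a plain SMART query works
  smartctl -a /dev/sdb

  # per-disk RAID-0 behind a MegaRAID-style controller: reach the physical
  # disk through the logical device with smartmontools' megaraid passthrough
  smartctl -a -d megaraid,4 /dev/sda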
[ceph-users] Re: RGW and the orphans
On Tue, 21 Apr 2020 at 07:29, Eric Ivancich wrote:
> Please be certain to read the associated docs in both:
>
> doc/radosgw/orphans.rst
> doc/man/8/rgw-orphan-list.rst
>
> so you understand the limitations and potential pitfalls. Generally this tool will be a precursor to a large delete job, so understanding what’s going on is important.
> I look forward to your report! And please feel free to post additional questions in this forum.
>

Where are those?
https://github.com/ceph/ceph/tree/master/doc/man/8
https://github.com/ceph/ceph/tree/master/doc/radosgw
don't seem to contain them in master, nor in the nautilus or octopus branches.

This whole issue feels weird: RGW (or its users) produces dead fragments of multiparts, orphans and whatnot that need cleaning up sooner or later, and the info we get is that the old cleaner isn't meant to be used, it hasn't worked for a long while, there is no fixed version, and perhaps there is a script somewhere with caveats. This (slightly frustrated) issue is of course on top of
"bi trim"
"bilog trim"
"mdlog trim"
"usage trim"
"datalog trim"
"sync error trim"
"gc process"
"reshard stale-instances rm"
that we RGW admins are supposed to know when to run, how often, what their quirks are, and so on.

'Docs' for rgw means that "datalog trim" --help says "trims the datalog", and the long version on the web would be "this operation trims the datalog" or something else that doesn't add anything more.

--
"Grumpy cat was an optimist"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > Wait for recovery to finish so you know whether any data from the down > OSDs is required. If not just reprovision them. Recovery will not finish from this state as several PGs are down and/or stale. Paul > > If data is required from the down OSDs you will need to run a query on > the pg(s) to find out what OSDs have the required copies of the > pg/object required. you can then export the pg from the down osd using > the ceph-objectstore-tool, back it up, then import it back into the > cluster. > > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander > wrote: > > > > Hi, > > > > one of our customers had his Ceph cluster crashed due to a power or network > > outage (they still try to figure out what happened). > > > > The cluster is very unhealthy but recovering: > > > > # ceph -s > > cluster: > > id: 1c95ca5d-948b-4113-9246-14761cb9a82a > > health: HEALTH_ERR > > 1 filesystem is degraded > > 1 mds daemon damaged > > 1 osds down > > 1 pools have many more objects per pg than average > > 1/115117480 objects unfound (0.000%) > > Reduced data availability: 71 pgs inactive, 53 pgs down, 18 pgs > > peering, 27 pgs stale > > Possible data damage: 1 pg recovery_unfound > > Degraded data redundancy: 7303464/230234960 objects degraded > > (3.172%), 693 pgs degraded, 945 pgs undersized > > 14 daemons have recently crashed > > > > services: > > mon: 3 daemons, quorum maslxlabstore01,maslxlabstore02,maslxlabstore04 > > (age 64m) > > mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, > > maslxlabstore02, maslxlabstore04 > > mds: cephfs:2/3 > > {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, 1 > > damaged > > osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs > > rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, > > maslxlabstore03.rgw0, maslxlabstore04.rgw0) > > > > data: > > pools: 6 pools, 8328 pgs > > objects: 115.12M objects, 218 TiB > > usage: 425 TiB used, 290 TiB / 715 TiB avail > > pgs: 0.853% pgs not active > > 7303464/230234960 objects degraded (3.172%) > > 13486/230234960 objects misplaced (0.006%) > > 1/115117480 objects unfound (0.000%) > > 7311 active+clean > > 338 active+undersized+degraded+remapped+backfill_wait > > 255 active+undersized+degraded+remapped+backfilling > > 215 active+undersized+remapped+backfilling > > 99 active+undersized+degraded > > 44 down > > 37 active+undersized+remapped+backfill_wait > > 13 stale+peering > > 9stale+down > > 5stale+remapped+peering > > 1active+recovery_unfound+undersized+degraded+remapped > > 1active+clean+remapped > > > > io: > > client: 168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr > > recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s > > > > > > The MDS cluster is unable to start because one of them is damaged. > > > > 10 of the OSDs do not start. 
They crash very early in the boot process: > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 > > (ceph:ceph) > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 > > (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process > > ceph-osd, pid 69463 > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty > > --pid-file > > 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data > > /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > > 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > compaction_readahead_size = 2MB > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > compaction_style = kCompactionStyleLevel > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > compaction_threads = 32 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = > > kNoCompression > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option flusher_threads > > = 8 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > level0_file_num_compaction_trigger = 8 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > level0_slowdown_writes_trigger = 32 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > level0_stop_writes_trigger = 64 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > max_background_compactions = 31 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > max_bytes_for_level_base = 536870912 > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > max_bytes_for_level_multiplier = 8 > > 2
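For reference, the export/import workflow Brad describes would look roughly like the following; the PG and OSD IDs are placeholders, both OSDs must be stopped while ceph-objectstore-tool runs, and the export file should be kept as a backup:

  # find which OSDs hold copies of the affected PG
  ceph pg 1.abc query

  # on the down OSD, export the PG to a file
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
      --op export --pgid 1.abc --file /root/pg-1.abc.export

  # import it into a healthy (stopped) OSD so the cluster can recover from it
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-57 \
      --op import --file /root/pg-1.abc.export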
[ceph-users] Re: RGW and the orphans
Hi I was looking into running the tool. The question is: Do I need to compile the whole Ceph? Or is there radosgw-admin available for download precompiled? A nightly build or sth? Kind regards / Pozdrawiam, Katarzyna Myrek wt., 21 kwi 2020 o 09:57 Janne Johansson napisał(a): > > Den tis 21 apr. 2020 kl 07:29 skrev Eric Ivancich : >> >> Please be certain to read the associated docs in both: >> >> doc/radosgw/orphans.rst >> doc/man/8/rgw-orphan-list.rst >> >> so you understand the limitations and potential pitfalls. Generally this >> tool will be a precursor to a large delete job, so understanding what’s >> going on is important. >> I look forward to your report! And please feel free to post additional >> questions in this forum. >> > > Where are those? > https://github.com/ceph/ceph/tree/master/doc/man/8 > https://github.com/ceph/ceph/tree/master/doc/radosgw > don't seem to contain them in master. Nor in nautilus branch or octopus. > > This whole issue feels weird, rgw (or its users) produces dead fragments of > mulitparts, orphans and whatnot that needs cleaning up sooner or later and > the info we get is that the old cleaner isn't meant to be used, it hasn't > worked for a long while, there is no fixed version, perhaps there is a script > somewhere with caveats. This (slightly frustrated) issue is of course on top > of > "bi trim" > "bilog trim" > "mdlog trim" > "usage trim" > > "datalog trim" > > "sync error trim" > > "gc process" > > "reshard stale-instances rm" > > > > that we rgw admins are supposed to know when to run, how often, what their > quirks are and so on. > > > 'Docs' for rgw means "datalog trim" --help says "trims the datalog", and the > long version on the web would be "this operation trims the datalog" or > something that doesn't add anything more. > > > > > -- > > "Grumpy cat was an optimist" > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
I had a test data cephfs pool with 1x replication, that left me with 1 stale pg also. I have no idea how to resolve this. I already marked the osd as lost. Do I need to manually 'unconfigure' this cepfs data pool? Or can I 'reinitialize' it? -Original Message- To: Brad Hubbard Cc: ceph-users Subject: [ceph-users] Re: Nautilus cluster damaged + crashing OSDs On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > Wait for recovery to finish so you know whether any data from the down > OSDs is required. If not just reprovision them. Recovery will not finish from this state as several PGs are down and/or stale. Paul > > If data is required from the down OSDs you will need to run a query on > the pg(s) to find out what OSDs have the required copies of the > pg/object required. you can then export the pg from the down osd using > the ceph-objectstore-tool, back it up, then import it back into the > cluster. > > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander > wrote: > > > > Hi, > > > > one of our customers had his Ceph cluster crashed due to a power or network outage (they still try to figure out what happened). > > > > The cluster is very unhealthy but recovering: > > > > # ceph -s > > cluster: > > id: 1c95ca5d-948b-4113-9246-14761cb9a82a > > health: HEALTH_ERR > > 1 filesystem is degraded > > 1 mds daemon damaged > > 1 osds down > > 1 pools have many more objects per pg than average > > 1/115117480 objects unfound (0.000%) > > Reduced data availability: 71 pgs inactive, 53 pgs down, 18 pgs peering, 27 pgs stale > > Possible data damage: 1 pg recovery_unfound > > Degraded data redundancy: 7303464/230234960 objects degraded (3.172%), 693 pgs degraded, 945 pgs undersized > > 14 daemons have recently crashed > > > > services: > > mon: 3 daemons, quorum maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m) > > mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, maslxlabstore02, maslxlabstore04 > > mds: cephfs:2/3 {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, 1 damaged > > osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs > > rgw: 4 daemons active (maslxlabstore01.rgw0, > > maslxlabstore02.rgw0, maslxlabstore03.rgw0, maslxlabstore04.rgw0) > > > > data: > > pools: 6 pools, 8328 pgs > > objects: 115.12M objects, 218 TiB > > usage: 425 TiB used, 290 TiB / 715 TiB avail > > pgs: 0.853% pgs not active > > 7303464/230234960 objects degraded (3.172%) > > 13486/230234960 objects misplaced (0.006%) > > 1/115117480 objects unfound (0.000%) > > 7311 active+clean > > 338 active+undersized+degraded+remapped+backfill_wait > > 255 active+undersized+degraded+remapped+backfilling > > 215 active+undersized+remapped+backfilling > > 99 active+undersized+degraded > > 44 down > > 37 active+undersized+remapped+backfill_wait > > 13 stale+peering > > 9stale+down > > 5stale+remapped+peering > > 1 active+recovery_unfound+undersized+degraded+remapped > > 1active+clean+remapped > > > > io: > > client: 168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr > > recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s > > > > > > The MDS cluster is unable to start because one of them is damaged. > > > > 10 of the OSDs do not start. 
They crash very early in the boot process: > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 > > (ceph:ceph) 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version > > 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), > > process ceph-osd, pid 69463 2020-04-20 16:26:14.935 7f818ec8cc00 0 > > pidfile_write: ignore empty --pid-file 2020-04-20 16:26:15.503 > > 7f818ec8cc00 0 starting osd.42 osd_data /var/lib/ceph/osd/ceph-42 > > /var/lib/ceph/osd/ceph-42/journal 2020-04-20 16:26:15.523 > > 7f818ec8cc00 0 load: jerasure load: lrc load: isa 2020-04-20 > > 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > compaction_readahead_size = 2MB 2020-04-20 16:26:16.339 7f818ec8cc00 > > 0 set rocksdb option compaction_style = kCompactionStyleLevel > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > compaction_threads = 32 2020-04-20 16:26:16.339 7f818ec8cc00 0 set > > rocksdb option compression = kNoCompression 2020-04-20 16:26:16.339 > > 7f818ec8cc00 0 set rocksdb option flusher_threads = 8 2020-04-20 > > 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > level0_file_num_compaction_trigger = 8 2020-04-20 16:26:16.339 > > 7f818ec8cc00 0 set rocksdb option level0_slowdown_writes_trigger = > > 32 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option >
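Not an answer from the thread, but for a size-1 pool whose only copy is gone and whose data is expendable, the usual last resort is to recreate the stale PG as empty. The PG ID below is a placeholder, and the command permanently discards whatever that PG contained:

  # confirm the cluster has no surviving copy of the PG
  ceph pg 1.7f query

  # last resort: recreate the PG as empty (data loss for that PG)
  ceph osd force-create-pg 1.7f --yes-i-really-mean-it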
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
Hi, On 21.04.20 10:33, Paul Emmerich wrote: > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: >> >> Wait for recovery to finish so you know whether any data from the down >> OSDs is required. If not just reprovision them. > > Recovery will not finish from this state as several PGs are down and/or stale. > Thanks for your input so far. It looks like this issue: https://tracker.ceph.com/issues/36337 We will try to use the linked Python script to repair the OSD. ceph-bluestore-tool repair did not find anything. Regards -- Robert Sander Heinlein Support GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: 030 / 405051-43 Fax: 030 / 405051-19 Amtsgericht Berlin-Charlottenburg - HRB 93818 B Geschäftsführer: Peer Heinlein - Sitz: Berlin signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint
Hi Andreas, The message format you tried to use is the standard one (the one being emitted from boto3, or any other AWS SDK [1]). It passes the arguments using 'x-www-form-urlencoded'. For example: POST / HTTP/1.1 Host: localhost:8000 Accept-Encoding: identity Date: Tue, 21 Apr 2020 08:52:35 GMT Content-Length: 293 Content-Type: application/x-www-form-urlencoded; charset=utf-8 Authorization: AWS KOC0EIWUFANCC3FX:8PunIZ4F36uK2c+3AKwhaKXgK84= User-Agent: Boto3/1.9.225 Python/2.7.17 Linux/5.5.13-200.fc31.x86_64 Botocore/1.15.28 Name=ajmmvc-1_topic_1& Attributes.entry.2.key=amqp-exchange& Attributes.entry.1.key=amqp-ack-level& Attributes.entry.2.value=amqp.direct& Version=2010-03-31& Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001& Attributes.entry.1.value=none& Action=CreateTopic& Attributes.entry.3.key=push-endpoint Note that the arguments are passed inside the message body (no '?' in the URL), and are using the "Attributes" for all the non-standard parameters we added on top of the standard AWS topic creation command. The format that worked for you, is a non-standard one that we support, as documented for pubsub [2], which is using regular URL encoded parameters. Feel free to use either, but would recommend on the standard one. Anyway, thanks for pointing this confusion, will clarify that in the doc, and also fix the 'push-endpoint' part. Yuval [1] https://docs.aws.amazon.com/sns/latest/api/API_CreateTopic.html [2] https://docs.ceph.com/docs/master/radosgw/pubsub-module/#create-a-topic On Mon, Apr 20, 2020 at 8:05 PM Andreas Unterkircher wrote: > I've tried to debug this a bit. > > > > > amqp:// > rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 > > > Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp:// > rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 > > testtopic > > > > For the above I was using the following request to create the topic - > similar as it is described here [1]: > > > https://ceph.example.com/?Action=CreateTopic&Name=testtopic&Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 > > (of course endpoint then URL-encoded) > > It seems to me that RGWHTTPArgs::parse() is not translating the > "Attributes.entry.1..." strings into keys & values in its map. > > This are the keys & values that can now be found in the map: > > > Found name: Attributes.entry.1.key > Found value: amqp-exchange > Found name: Attributes.entry.1.value > Found value: amqp.direct > Found name: push-endpoint > Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 > > If I simply change the request to: > > > https://ceph.example.com/?Action=CreateTopic&Name=testtopic&amqp-exchange=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672/foobar > > -> at voila, the entries in the map are correct > > > Found name: amqp-exchange > Found value: amqp.direct > Found name: push-endpoint > Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 > > And then the bucket-notification works like it should. > > But I don't think the documentation is wrong, or is it? 
> > Cheers, > Andreas > > > [1] > https://docs.ceph.com/docs/master/radosgw/notifications/#create-a-topic > > > > [2] Index: ceph-15.2.1/src/rgw/rgw_common.cc > === > --- ceph-15.2.1.orig/src/rgw/rgw_common.cc > +++ ceph-15.2.1/src/rgw/rgw_common.cc > @@ -810,6 +810,8 @@ int RGWHTTPArgs::parse() > string& name = nv.get_name(); > string& val = nv.get_val(); > > + cout << "Found name: " << name << std::endl; > + cout << "Found value: " << val << std::endl; > append(name, val); > } > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
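A hedged equivalent of the standard (form-encoded) topic creation Yuval describes, using the AWS CLI against RGW instead of boto3; the endpoint and broker hostnames are taken from the thread, the password is a placeholder, and this assumes an awscli version whose sns create-topic subcommand accepts --attributes:

  aws --endpoint-url https://ceph.example.com sns create-topic \
      --name testtopic \
      --attributes '{"push-endpoint": "amqp://rabbitmquser:PASSWORD@rabbitmq.example.com:5672", "amqp-exchange": "amqp.direct", "amqp-ack-level": "none"}'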
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote: > > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > > > Wait for recovery to finish so you know whether any data from the down > > OSDs is required. If not just reprovision them. > > Recovery will not finish from this state as several PGs are down and/or stale. What I meant was let recovery get as far as it can. > > > Paul > > > > > If data is required from the down OSDs you will need to run a query on > > the pg(s) to find out what OSDs have the required copies of the > > pg/object required. you can then export the pg from the down osd using > > the ceph-objectstore-tool, back it up, then import it back into the > > cluster. > > > > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander > > wrote: > > > > > > Hi, > > > > > > one of our customers had his Ceph cluster crashed due to a power or > > > network outage (they still try to figure out what happened). > > > > > > The cluster is very unhealthy but recovering: > > > > > > # ceph -s > > > cluster: > > > id: 1c95ca5d-948b-4113-9246-14761cb9a82a > > > health: HEALTH_ERR > > > 1 filesystem is degraded > > > 1 mds daemon damaged > > > 1 osds down > > > 1 pools have many more objects per pg than average > > > 1/115117480 objects unfound (0.000%) > > > Reduced data availability: 71 pgs inactive, 53 pgs down, 18 > > > pgs peering, 27 pgs stale > > > Possible data damage: 1 pg recovery_unfound > > > Degraded data redundancy: 7303464/230234960 objects degraded > > > (3.172%), 693 pgs degraded, 945 pgs undersized > > > 14 daemons have recently crashed > > > > > > services: > > > mon: 3 daemons, quorum > > > maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m) > > > mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, > > > maslxlabstore02, maslxlabstore04 > > > mds: cephfs:2/3 > > > {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 up:standby, > > > 1 damaged > > > osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped pgs > > > rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, > > > maslxlabstore03.rgw0, maslxlabstore04.rgw0) > > > > > > data: > > > pools: 6 pools, 8328 pgs > > > objects: 115.12M objects, 218 TiB > > > usage: 425 TiB used, 290 TiB / 715 TiB avail > > > pgs: 0.853% pgs not active > > > 7303464/230234960 objects degraded (3.172%) > > > 13486/230234960 objects misplaced (0.006%) > > > 1/115117480 objects unfound (0.000%) > > > 7311 active+clean > > > 338 active+undersized+degraded+remapped+backfill_wait > > > 255 active+undersized+degraded+remapped+backfilling > > > 215 active+undersized+remapped+backfilling > > > 99 active+undersized+degraded > > > 44 down > > > 37 active+undersized+remapped+backfill_wait > > > 13 stale+peering > > > 9stale+down > > > 5stale+remapped+peering > > > 1active+recovery_unfound+undersized+degraded+remapped > > > 1active+clean+remapped > > > > > > io: > > > client: 168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr > > > recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s > > > > > > > > > The MDS cluster is unable to start because one of them is damaged. > > > > > > 10 of the OSDs do not start. 
They crash very early in the boot process: > > > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 > > > (ceph:ceph) > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 > > > (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process > > > ceph-osd, pid 69463 > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty > > > --pid-file > > > 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data > > > /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > > > 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: isa > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > compaction_readahead_size = 2MB > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > compaction_style = kCompactionStyleLevel > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > compaction_threads = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option compression = > > > kNoCompression > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > flusher_threads = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > level0_file_num_compaction_trigger = 8 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > level0_slowdown_writes_trigger = 32 > > > 2020-04-20 16:26:16.339 7f818ec8cc00 0 set rocksdb option > > > level0_stop_writes_t
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
On Tue, Apr 21, 2020 at 12:44 PM Brad Hubbard wrote: > > On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote: > > > > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote: > > > > > > Wait for recovery to finish so you know whether any data from the down > > > OSDs is required. If not just reprovision them. > > > > Recovery will not finish from this state as several PGs are down and/or > > stale. > > What I meant was let recovery get as far as it can. Which doesn't solve anything, you can already see that you need to get at least some of these OSDs back in order to fix it. No point in waiting for the recovery. I agree that it looks like https://tracker.ceph.com/issues/36337 I happen to know Jonas who opened that issue and wrote the script; I'll poke him maybe he has an idea or additional input Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 > > > > > > > Paul > > > > > > > > If data is required from the down OSDs you will need to run a query on > > > the pg(s) to find out what OSDs have the required copies of the > > > pg/object required. you can then export the pg from the down osd using > > > the ceph-objectstore-tool, back it up, then import it back into the > > > cluster. > > > > > > On Tue, Apr 21, 2020 at 1:05 AM Robert Sander > > > wrote: > > > > > > > > Hi, > > > > > > > > one of our customers had his Ceph cluster crashed due to a power or > > > > network outage (they still try to figure out what happened). > > > > > > > > The cluster is very unhealthy but recovering: > > > > > > > > # ceph -s > > > > cluster: > > > > id: 1c95ca5d-948b-4113-9246-14761cb9a82a > > > > health: HEALTH_ERR > > > > 1 filesystem is degraded > > > > 1 mds daemon damaged > > > > 1 osds down > > > > 1 pools have many more objects per pg than average > > > > 1/115117480 objects unfound (0.000%) > > > > Reduced data availability: 71 pgs inactive, 53 pgs down, 18 > > > > pgs peering, 27 pgs stale > > > > Possible data damage: 1 pg recovery_unfound > > > > Degraded data redundancy: 7303464/230234960 objects > > > > degraded (3.172%), 693 pgs degraded, 945 pgs undersized > > > > 14 daemons have recently crashed > > > > > > > > services: > > > > mon: 3 daemons, quorum > > > > maslxlabstore01,maslxlabstore02,maslxlabstore04 (age 64m) > > > > mgr: maslxlabstore01(active, since 69m), standbys: maslxlabstore03, > > > > maslxlabstore02, maslxlabstore04 > > > > mds: cephfs:2/3 > > > > {0=maslxlabstore03=up:resolve,1=maslxlabstore01=up:resolve} 2 > > > > up:standby, 1 damaged > > > > osd: 140 osds: 130 up (since 4m), 131 in (since 4m); 847 remapped > > > > pgs > > > > rgw: 4 daemons active (maslxlabstore01.rgw0, maslxlabstore02.rgw0, > > > > maslxlabstore03.rgw0, maslxlabstore04.rgw0) > > > > > > > > data: > > > > pools: 6 pools, 8328 pgs > > > > objects: 115.12M objects, 218 TiB > > > > usage: 425 TiB used, 290 TiB / 715 TiB avail > > > > pgs: 0.853% pgs not active > > > > 7303464/230234960 objects degraded (3.172%) > > > > 13486/230234960 objects misplaced (0.006%) > > > > 1/115117480 objects unfound (0.000%) > > > > 7311 active+clean > > > > 338 active+undersized+degraded+remapped+backfill_wait > > > > 255 active+undersized+degraded+remapped+backfilling > > > > 215 active+undersized+remapped+backfilling > > > > 99 active+undersized+degraded > > > > 44 down > > > > 37 active+undersized+remapped+backfill_wait > > > > 13 stale+peering > > > > 9stale+down > > > > 5stale+remapped+peering > > > > 
1active+recovery_unfound+undersized+degraded+remapped > > > > 1active+clean+remapped > > > > > > > > io: > > > > client: 168 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr > > > > recovery: 1.9 GiB/s, 15 keys/s, 948 objects/s > > > > > > > > > > > > The MDS cluster is unable to start because one of them is damaged. > > > > > > > > 10 of the OSDs do not start. They crash very early in the boot process: > > > > > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 set uid:gid to 64045:64045 > > > > (ceph:ceph) > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 ceph version 14.2.9 > > > > (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable), process > > > > ceph-osd, pid 69463 > > > > 2020-04-20 16:26:14.935 7f818ec8cc00 0 pidfile_write: ignore empty > > > > --pid-file > > > > 2020-04-20 16:26:15.503 7f818ec8cc00 0 starting osd.42 osd_data > > > > /var/lib/ceph/osd/ceph-42 /var/lib/ceph/osd/ceph-42/journal > > > > 2020-04-20 16:26:15.523 7f818ec8cc00 0 load: jerasure load: lrc load: > > > > isa > > > > 2020-04-2
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
Hi!

Yes, it looks like you hit the same bug. My corruption back then happened because the server was out of memory and OSDs restarted and crashed quickly again and again for quite some time... What I think happens is that the journals somehow get out of sync between OSDs, which is something that should definitely not happen under the intended consistency guarantees.

However, I managed to resolve it back then by deleting the PG with the older log (under the assumption that the newer one is the more recent and better one). This only works if enough shards of that PG are available, of course, and then the regular recovery process will restore the missing shards again.

I hope my script still works for you. If you need any help, I'll see what I can do :) If things fail, you can still manually import the exported-and-deleted PGs back into any OSD (which will probably cause the other OSDs of the PG to crash, since then the logs won't overlap once again).

Cheers
-- Jonas

On 21/04/2020 11.26, Robert Sander wrote:
> Hi,
>
> On 21.04.20 10:33, Paul Emmerich wrote:
>> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
>>>
>>> Wait for recovery to finish so you know whether any data from the down OSDs is required. If not just reprovision them.
>>
>> Recovery will not finish from this state as several PGs are down and/or stale.
>>
>
> Thanks for your input so far.
>
> It looks like this issue: https://tracker.ceph.com/issues/36337
> We will try to use the linked Python script to repair the OSD.
> ceph-bluestore-tool repair did not find anything.
>
> Regards
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
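What Jonas describes, expressed as commands (a sketch only; the PG and OSD IDs are placeholders, the OSD must be stopped, and enough healthy shards of the PG must exist elsewhere before deleting anything):

  # keep a copy of the suspect PG shard and remove it from the crashing OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
      --op export-remove --pgid 1.abc --file /root/pg-1.abc.bad

  # if that was the wrong shard to sacrifice, it can be imported again later
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 \
      --op import --file /root/pg-1.abc.bad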
[ceph-users] block.db symlink missing after each reboot
Hi there,

I've a bunch of hosts where I migrated HDD-only OSDs to hybrid ones using:

sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/bluefs_db1/db-osd${OSD}'

While this worked fine and each OSD was running fine afterwards, the OSD loses its block.db symlink after a reboot. If I manually recreate the block.db symlink inside /var/lib/ceph/osd/ceph-*, all OSDs start fine.

Can anybody help with who creates those symlinks and why they're not created automatically in the case of a migrated DB?

Greets,
Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
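For anyone hitting the same symptom, the manual workaround Stefan mentions boils down to something like this (a sketch using the LV names from his command; adjust the OSD id):

  ln -s /dev/bluefs_db1/db-osd0 /var/lib/ceph/osd/ceph-0/block.db
  chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block.db
  systemctl restart ceph-osd@0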
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
Hi Jonas,

On 21.04.20 14:47, Jonas Jelten wrote:
> I hope my script still works for you. If you need any help, I'll see what I can do :)

The script currently does not find the info it needs and wants us to increase the logging level. We set the logging level to 10 and tried to restart the OSD (which resulted in a crash), but the script is still not able to find the info.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43 Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin

signature.asc Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db symlink missing after each reboot
Hi Stefan, I think that's the cause: https://tracker.ceph.com/issues/42928 On 4/21/2020 4:02 PM, Stefan Priebe - Profihost AG wrote: Hi there, i've a bunch of hosts where i migrated HDD only OSDs to hybird ones using: sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/bluefs_db1/db-osd${OSD}' while this worked fine and each OSD was running fine. It looses it's block.db symlink after reboot. If i manually recreate the block.db symlink inside: /var/lib/ceph/osd/ceph-* all osds start fine. Can anybody help who creates those symlinks and why they're not created automatically in case of migrated db? Greets, Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Sporadic mgr segmentation fault
Dear ceph users, We are experiencing sporadic mgr crash in all three ceph clusters (version 14.2.6 and version 14.2.8), the crash log is: 2020-04-17 23:10:08.986 7fed7fe07700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/common/buffer.cc: In function 'const char* ceph::buffer::v14_2_0::ptr::c_str() const' thread 7fed7fe07700 time 2020-04-17 23:10:08.984887 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/common/buffer.cc: 578: FAILED ceph_assert(_raw) ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7fed8605c325] 2: (()+0x2534ed) [0x7fed8605c4ed] 3: (()+0x5a21ed) [0x7fed863ab1ed] 4: (PosixConnectedSocketImpl::send(ceph::buffer::v14_2_0::list&, bool)+0xbd) [0x7fed863840ed] 5: (AsyncConnection::_try_send(bool)+0xb6) [0x7fed8632fc76] 6: (ProtocolV2::write_message(Message*, bool)+0x832) [0x7fed8635bf52] 7: (ProtocolV2::write_event()+0x175) [0x7fed863718c5] 8: (AsyncConnection::handle_write()+0x40) [0x7fed86332600] 9: (EventCenter::process_events(unsigned int, std::chrono::duration >*)+0x1397) [0x7fed8637f997] 10: (()+0x57c977) [0x7fed86385977] 11: (()+0x80bdaf) [0x7fed86614daf] 12: (()+0x7e65) [0x7fed8394ce65] 13: (clone()+0x6d) [0x7fed825fa88d] 2020-04-17 23:10:08.990 7fed7ee05700 -1 *** Caught signal (Segmentation fault) ** in thread 7fed7ee05700 thread_name:msgr-worker-2 ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable) 1: (()+0xf5f0) [0x7fed839545f0] 2: (ceph::buffer::v14_2_0::ptr::release()+0x8) [0x7fed863aafd8] 3: (ceph::crypto::onwire::AES128GCM_OnWireTxHandler::~AES128GCM_OnWireTxHandler()+0x59) [0x7fed86388669] 4: (ProtocolV2::reset_recv_state()+0x11f) [0x7fed8635f5af] 5: (ProtocolV2::stop()+0x77) [0x7fed8635f857] 6: (ProtocolV2::handle_existing_connection(boost::intrusive_ptr)+0x5ef) [0x7fed86374f8f] 7: (ProtocolV2::handle_client_ident(ceph::buffer::v14_2_0::list&)+0xd9c) [0x7fed8637673c] 8: (ProtocolV2::handle_frame_payload()+0x1fb) [0x7fed86376c1b] 9: (ProtocolV2::handle_read_frame_dispatch()+0x150) [0x7fed86376e70] 10: (ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr&&, int)+0x44d) [0x7fed863773cd] 11: (ProtocolV2::run_continuation(Ct&)+0x34) [0x7fed86360534] 12: (AsyncConnection::process()+0x186) [0x7fed86330656] 13: (EventCenter::process_events(unsigned int, std::chrono::duration >*)+0xa15) [0x7fed8637f015] 14: (()+0x57c977) [0x7fed86385977] 15: (()+0x80bdaf) [0x7fed86614daf] 16: (()+0x7e65) [0x7fed8394ce65] 17: (clone()+0x6d) [0x7fed825fa88d] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. Any thoughts about this issue? Xu Yun ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Nautilus cluster damaged + crashing OSDs
Hi! Since you are on nautilus and I was on mimic back then, the messages may have changed. The script is only an automatization for deleting many broken PGs, you can perform the procedure by hand first. You can perform the steps in my state machine by hand and identify the right messages, and then update the parser. -- Jonas On 21/04/2020 15.13, Robert Sander wrote: > Hi Jonas, > > On 21.04.20 14:47, Jonas Jelten wrote: > >> I hope my script still works for you. If you need any help, I'll see what I >> can do :) > > The script currently does not find the info it needs and wants us to > increase to logging level. > > We set the logging level to 10 and tried to restart the OSD (which > resulted in a crash) but the script still is not able to find the info. > > Regards > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db symlink missing after each reboot
Hi Igor, Am 21.04.20 um 15:52 schrieb Igor Fedotov: > Hi Stefan, > > I think that's the cause: > > https://tracker.ceph.com/issues/42928 thanks yes that matches. Is there any way to fix this manually? And is this also related to: https://tracker.ceph.com/issues/44509 Greets, Stefan > > On 4/21/2020 4:02 PM, Stefan Priebe - Profihost AG wrote: >> Hi there, >> >> i've a bunch of hosts where i migrated HDD only OSDs to hybird ones >> using: >> sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path >> /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target >> /dev/bluefs_db1/db-osd${OSD}' >> >> while this worked fine and each OSD was running fine. >> >> It looses it's block.db symlink after reboot. >> >> If i manually recreate the block.db symlink inside: >> /var/lib/ceph/osd/ceph-* >> >> all osds start fine. Can anybody help who creates those symlinks and why >> they're not created automatically in case of migrated db? >> >> Greets, >> Stefan >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db symlink missing after each reboot
On 4/21/2020 4:59 PM, Stefan Priebe - Profihost AG wrote: Hi Igor, Am 21.04.20 um 15:52 schrieb Igor Fedotov: Hi Stefan, I think that's the cause: https://tracker.ceph.com/issues/42928 thanks yes that matches. Is there any way to fix this manually? I think so - AFAIK missed tags are pure LVM stuff and hence can be set by regular LVM tools. ceph-volume does that during OSD provisioning as well. But unfortunately I haven't dived into this topic deeper yet. So can't provide you with the details how to fix this step-by-step. And is this also related to: https://tracker.ceph.com/issues/44509 Probably unrelated. That's either a different bug or rather some artifact from RocksDB/BlueFS interaction. Leaving a request for more info in the ticket... Greets, Stefan On 4/21/2020 4:02 PM, Stefan Priebe - Profihost AG wrote: Hi there, i've a bunch of hosts where i migrated HDD only OSDs to hybird ones using: sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/bluefs_db1/db-osd${OSD}' while this worked fine and each OSD was running fine. It looses it's block.db symlink after reboot. If i manually recreate the block.db symlink inside: /var/lib/ceph/osd/ceph-* all osds start fine. Can anybody help who creates those symlinks and why they're not created automatically in case of migrated db? Greets, Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: some ceph general questions about the design
Hi Anthony,

You bring up a very valid point. My advice is to carefully choose the HBA and the disks, do extensive testing during the initial phase of the project, and run controlled firmware upgrade campaigns with a good pre-production setup.

In a multiple-RAID-0 scenario, there are some parameters you need to disable, such as rebuild priority or consistency checks, if you don't want your entire OSD server to temporarily go down in case of a single drive failure.

The points you bring up are valid for SATA flash disks too, as you still have to deal with disk, HBA and sometimes backplane firmware.

- Antoine

PS: the "preserved cache" issue you're referring to... I had to ditch an HBA that had that "feature" during my initial hardware tests. It was dramatically affecting the stability of the entire OSD.

From: Anthony D'Atri
Sent: Tuesday, April 21, 2020 2:59 AM
To: ceph-users
Subject: [ceph-users] Re: some ceph general questions about the design

> > 1. Should I use a RAID controller and create, for example, a RAID 5 with all disks on each OSD server? Or should I pass all disks through to the Ceph OSDs?
> >
> If your OSD servers have HDDs, buy a good RAID controller with a battery-backed write cache and configure it using multiple RAID-0 volumes (1 physical disk per volume). That way, reads and writes will be accelerated by the cache on the HBA.

I’ve lived this scenario and hated it. Multiple firmware and manufacturing issues; batteries/supercaps can fail and need to be monitored; bugs causing staged data to be lost before writing to disk; another bug that required replacing the card if there was preserved cache for a failed drive, because it would refuse to boot; difficulties in drive monitoring; an HBA monitoring utility that would lock the HBA or peg the CPU; the list goes on.

For the additional cost of the RoC, cache RAM, a supercap to (fingers crossed) protect the cache, and all the additional monitoring and remote-hands work … you might find that SATA SSDs on a JBOD HBA are no more expensive.

> > 3. If I have a 3-physical-node OSD cluster, do I need 5 physical MONs?
> >
> No. 3 MONs are enough

If you have good hands and spares. If your cluster is on a different continent and colo hands can’t find their own butts ….. it’s nice to survive a double failure. YMMV.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: PG deep-scrub does not finish
Hi Brad, Indeed - osd.694 kept crashing with a read error (medium error on the hard drive), and got restarted by systemd. So net net the system ended up in an infinite loop of deep scrub attempts on the PG for a week. Typically when a scrub encounters a read error, I get an inconsistent placement group, not an OSD crash and an infinite loop of scrub attempts. With the inconsistent placement group a pg repair fixes the read error (by reallocating the sector inside the drive). Here is the stack trace of the osd.694 crash on the scrub read error: Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Sense Key : Medium Error [current] [descriptor] Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Add. Sense: Unrecovered read error Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 CDB: Read(10) 28 00 6e a9 7a 30 00 00 80 00 Apr 19 03:39:17 popeye-oss-3-03 kernel: print_req_error: critical medium error, dev sdz, sector 14852804992 Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2020-04-19 03:39:17.095 7fffd2e2b700 -1 bluestore(/var/lib/ceph/osd/ceph-694) _do_read bdev-read failed: (61) No data available Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7fffd2e2b700 time 2020-04-19 03:39:17.099677 Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: 9214: FAILED ceph_assert(r == 0) Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: ceph version 14.2.8 (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable) Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55a1ea4d] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2: (()+0x4cac15) [0x55a1ec15] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3512) [0x55f64132] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 4: (BlueStore::read(boost::intrusive_ptr&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b8) [0x55f64778] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 5: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2c2) [0x55deb832] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 6: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x663) [0x55d082c3] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 7: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x8b) [0x55bbaacb] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 8: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x181c) [0x55be4fcc] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 9: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x55be61db] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 10: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x12) 
[0x55d8c7b2] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x55b1898f] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x560bd056] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560bfb70] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 14: (()+0x7e65) [0x75025e65] Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 15: (clone()+0x6d) [0x73ee988d] I ended up recreating the OSD (and thus overwriting all data) to fix the issue. Andras On 4/20/20 9:28 PM, Brad Hubbard wrote: On Mon, Apr 20, 2020 at 11:01 PM Andras Pataki wrote: On a cluster running Nautilus (14.2.8), we are getting a complaint about a PG not being deep-scrubbed on time. Looking at the primary OSD's logs, it looks like it tries to deep-scrub the PG every hour or so, emits some complaints that I don't understand, but the deep scrub does not finish (either with or without a scrub error). Here is the PG from pg dump: 1.43f 31794 00 0 0 66930087214 0 0 3004 3004 active+clean+scrubbing+deep 2020-04-20 04:48:13.055481 46286'483734 46286:563439 [354,694,851]354 [354,694,851]354 3959
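For completeness, when a deep scrub hits a plain read error and the OSD stays up, the usual sequence is to check the drive's SMART counters and then repair the inconsistent PG so the bad replica is rewritten from a good copy; the device name below is a placeholder, the PG ID is the one from this thread:

  # look for reallocated / pending / uncorrectable sector counters
  smartctl -a /dev/sdz

  # rewrite the bad replica once the PG has been flagged inconsistent
  ceph pg repair 1.43f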
[ceph-users] Re: block.db symlink missing after each reboot
Hi Igor, mhm i updated the missing lv tags: # lvs -o lv_tags /dev/ceph-3a295647-d5a1-423c-81dd-1d2b32d7c4c5/osd-block-c2676c5f-111c-4603-b411-473f7a7638c2 | tr ',' '\n' | sort LV Tags ceph.block_device=/dev/ceph-3a295647-d5a1-423c-81dd-1d2b32d7c4c5/osd-block-c2676c5f-111c-4603-b411-473f7a7638c2 ceph.block_uuid=0wBREi-I5t1-UeUa-EvbA-sET0-S9O0-VaxOgg ceph.cephx_lockbox_secret= ceph.cluster_fsid=7e242332-55c3-4926-9646-149b2f5c8081 ceph.cluster_name=ceph ceph.crush_device_class=None ceph.db_device=/dev/bluefs_db1/db-osd0 ceph.db_uuid=UUw35K-YnNT-HZZE-IfWd-Rtxn-0eVW-kTuQmj ceph.encrypted=0 ceph.osd_fsid=c2676c5f-111c-4603-b411-473f7a7638c2 ceph.osd_id=0 ceph.type=block ceph.vdo=0 # lvdisplay /dev/bluefs_db1/db-osd0 --- Logical volume --- LV Path/dev/bluefs_db1/db-osd0 LV Namedb-osd0 VG Namebluefs_db1 LV UUIDUUw35K-YnNT-HZZE-IfWd-Rtxn-0eVW-kTuQmj LV Write Accessread/write LV Creation host, time cloud10-1517, 2020-02-28 21:32:48 +0100 LV Status available # open 0 LV Size185,00 GiB Current LE 47360 Segments 1 Allocation inherit Read ahead sectors auto - currently set to 256 Block device 253:1 but lvm trigger still says: # /usr/sbin/ceph-volume lvm trigger 0-c2676c5f-111c-4603-b411-473f7a7638c2 --> RuntimeError: could not find db with uuid UUw35K-YnNT-HZZE-IfWd-Rtxn-0eVW-kTuQmj Mit freundlichen Grüßen Stefan Priebe Bachelor of Science in Computer Science (BSCS) Vorstand (CTO) --- Profihost AG Expo Plaza 1 30539 Hannover Deutschland Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282 URL: http://www.profihost.com | E-Mail: i...@profihost.com Sitz der Gesellschaft: Hannover, USt-IdNr. DE813460827 Registergericht: Amtsgericht Hannover, Register-Nr.: HRB 202350 Vorstand: Cristoph Bluhm, Stefan Priebe Aufsichtsrat: Prof. Dr. iur. Winfried Huck (Vorsitzender) Am 21.04.20 um 16:07 schrieb Igor Fedotov: > On 4/21/2020 4:59 PM, Stefan Priebe - Profihost AG wrote: >> Hi Igor, >> >> Am 21.04.20 um 15:52 schrieb Igor Fedotov: >>> Hi Stefan, >>> >>> I think that's the cause: >>> >>> https://tracker.ceph.com/issues/42928 >> thanks yes that matches. Is there any way to fix this manually? > > I think so - AFAIK missed tags are pure LVM stuff and hence can be set > by regular LVM tools. > > ceph-volume does that during OSD provisioning as well. But > unfortunately I haven't dived into this topic deeper yet. So can't > provide you with the details how to fix this step-by-step. > >> >> And is this also related to: >> https://tracker.ceph.com/issues/44509 > > Probably unrelated. That's either a different bug or rather some > artifact from RocksDB/BlueFS interaction. > > Leaving a request for more info in the ticket... > >> >> Greets, >> Stefan >> >>> On 4/21/2020 4:02 PM, Stefan Priebe - Profihost AG wrote: Hi there, i've a bunch of hosts where i migrated HDD only OSDs to hybird ones using: sudo -E -u ceph -- bash -c 'ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-${OSD} bluefs-bdev-new-db --dev-target /dev/bluefs_db1/db-osd${OSD}' while this worked fine and each OSD was running fine. It looses it's block.db symlink after reboot. If i manually recreate the block.db symlink inside: /var/lib/ceph/osd/ceph-* all osds start fine. Can anybody help who creates those symlinks and why they're not created automatically in case of migrated db? 
Greets, Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io >>> ___ >>> ceph-users mailing list -- ceph-users@ceph.io >>> To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
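A guess at what the manual fix looks like, based on the tags ceph-volume normally sets at OSD creation: the DB LV itself (not only the block LV) needs its own ceph.* tags, including ceph.type=db, before ceph-volume lvm trigger can resolve the db_uuid. The values below are copied from Stefan's lvs output; verify the complete tag set against an OSD that ceph-volume provisioned with a DB from the start:

  lvchange --addtag ceph.type=db /dev/bluefs_db1/db-osd0
  lvchange --addtag ceph.osd_id=0 /dev/bluefs_db1/db-osd0
  lvchange --addtag ceph.osd_fsid=c2676c5f-111c-4603-b411-473f7a7638c2 /dev/bluefs_db1/db-osd0
  lvchange --addtag ceph.db_device=/dev/bluefs_db1/db-osd0 /dev/bluefs_db1/db-osd0
  lvchange --addtag ceph.db_uuid=UUw35K-YnNT-HZZE-IfWd-Rtxn-0eVW-kTuQmj /dev/bluefs_db1/db-osd0
  # ...plus the remaining ceph.* tags (cluster_fsid, cluster_name, block_device,
  # block_uuid, crush_device_class, encrypted, vdo) mirrored from the block LV

  # then retry
  ceph-volume lvm trigger 0-c2676c5f-111c-4603-b411-473f7a7638c2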
[ceph-users] Rebuilding the Ceph.io site with Jekyll
Hi all, as part of the Ceph Foundation, we're considering to re-launch the Ceph website and migrate it away from a dated WordPress to Jekyll, backed by Git et al. (Either hosted on our own infrastructure or even GitHub pages.) This would involve building/customizing a Jekyll theme, providing feedback on the site structure proposal and usability, migrating content (where appropriate) from the existing site, and working with the Ceph infra team on getting it hosted/deployed. Some help with improving the design would be welcome. Content creation isn't necessarily part of the requirements, but working with stakeholders on filling in blanks is; and if we could get someone savvy with Ceph who wants to fill in a few pages, that's a plus! After the launch, we should be mostly self-sufficient again for day-to-day tasks. If that's the kind of contract work you or a friend is interested in, please reach out to me. (The Foundation hasn't yet approved the budget, we're still trying to get a feeling for the funding required. But I'd be fairly optimistic.) Regards, Lars -- SUSE Software Solutions Germany GmbH, MD: Felix Imendörffer, HRB 36809 (AG Nürnberg) "Architects should open possibilities and not determine everything." (Ueli Zbinden) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: PG deep-scrub does not finish
Looks like that drive is dying. On Wed, Apr 22, 2020 at 12:25 AM Andras Pataki wrote: > > Hi Brad, > > Indeed - osd.694 kept crashing with a read error (medium error on the > hard drive), and got restarted by systemd. So net net the system ended > up in an infinite loop of deep scrub attempts on the PG for a week. > Typically when a scrub encounters a read error, I get an inconsistent > placement group, not an OSD crash and an infinite loop of scrub > attempts. With the inconsistent placement group a pg repair fixes the > read error (by reallocating the sector inside the drive). > > Here is the stack trace of the osd.694 crash on the scrub read error: > > Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 FAILED > Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE > Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Sense > Key : Medium Error [current] [descriptor] > Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 Add. > Sense: Unrecovered read error > Apr 19 03:39:17 popeye-oss-3-03 kernel: sd 14:0:27:0: [sdz] tag#1 CDB: > Read(10) 28 00 6e a9 7a 30 00 00 80 00 > Apr 19 03:39:17 popeye-oss-3-03 kernel: print_req_error: critical medium > error, dev sdz, sector 14852804992 > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2020-04-19 03:39:17.095 > 7fffd2e2b700 -1 bluestore(/var/lib/ceph/osd/ceph-694) _do_read bdev-read > failed: (61) No data available > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: > In function 'int BlueStore::_do_read(BlueStore::Collection*, > BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, > uint64_t)' thread 7fffd2e2b700 time 2020-04-19 03:39:17.099677 > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/src/os/bluestore/BlueStore.cc: > 9214: FAILED ceph_assert(r == 0) > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: ceph version 14.2.8 > (2d095e947a02261ce61424021bb43bd3022d35cb) nautilus (stable) > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 1: > (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x14a) [0x55a1ea4d] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 2: (()+0x4cac15) [0x55a1ec15] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 3: > (BlueStore::_do_read(BlueStore::Collection*, > boost::intrusive_ptr, unsigned long, unsigned long, > ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3512) > [0x55f64132] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 4: > (BlueStore::read(boost::intrusive_ptr&, > ghobject_t const&, unsigned long, unsigned long, > ceph::buffer::v14_2_0::list&, unsigned int)+0x1b8) [0x55f64778] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 5: > (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, > ScrubMapBuilder&, ScrubMap::object&)+0x2c2) [0x55deb832] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 6: > (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x663) > [0x55d082c3] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 7: > (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, > hobject_t, bool, ThreadPool::TPHandle&)+0x8b) [0x55bbaacb] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 8: > (PG::chunky_scrub(ThreadPool::TPHandle&)+0x181c) [0x55be4fcc] > Apr 19 03:39:17 popeye-oss-3-03 
ceph-osd: 9: (PG::scrub(unsigned int, > ThreadPool::TPHandle&)+0x4bb) [0x55be61db] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 10: (PGScrub::run(OSD*, > OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x12) > [0x55d8c7b2] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 11: > (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x90f) [0x55b1898f] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 12: > (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) > [0x560bd056] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 13: > (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560bfb70] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 14: (()+0x7e65) [0x75025e65] > Apr 19 03:39:17 popeye-oss-3-03 ceph-osd: 15: (clone()+0x6d) > [0x73ee988d] > > I ended up recreating the OSD (and thus overwriting all data) to fix the > issue. > > Andras > > > On 4/20/20 9:28 PM, Brad Hubbard wrote: > > On Mon, Apr 20, 2020 at 11:01 PM Andras Pataki > > wrote: > >> On a cluster running Nautilus (14.2.8), we are getting a complaint about > >> a PG not being deep-scrubbed on time. Looking at the primary OSD's > >> logs, it looks like it tries to deep-scrub the PG every hour or so, > >> emits some complaints that I don't understand, but the deep scrub does > >> not finish (either with or without a scrub error). > >> > >> Here is the PG from pg dump: > >> > >> 1.43f