Can you include more of your OSD log file?
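If it is easier, here is a minimal sketch for grabbing the tail of an OSD log so it can be pasted into a reply. The path mentioned in the comment (/var/log/ceph/ceph-osd.0.log) is an assumption based on the default log location and an OSD id of 0; tail_lines and read_log_tail are hypothetical helper names, not part of any Ceph tooling.

```python
from collections import deque
from typing import Iterable, List


def tail_lines(lines: Iterable[str], count: int = 200) -> List[str]:
    """Return the last `count` lines, stripped of trailing newlines.

    A bounded deque keeps only `count` lines in memory, so a large
    log file is never loaded in full.
    """
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def read_log_tail(path: str, count: int = 200) -> str:
    """Read the tail of a log file and return it as one string.

    Example (assumed path, adjust the OSD id for your host):
        print(read_log_tail("/var/log/ceph/ceph-osd.0.log"))
    """
    # errors="replace" avoids a crash on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as log_file:
        return "\n".join(tail_lines(log_file, count))
```

The output around the time the OSD fails to start is usually the most useful part, so a larger count may be needed if the daemon logs a lot during startup.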
On July 28, 2018 9:46:16 AM CDT, ceph.nov...@habmalnefrage.de wrote:
>Dear users and developers.
>
>I've updated our dev-cluster from v13.2.0 to v13.2.1 yesterday and
>since then everything is badly broken.
>I've restarted all Ceph components via "systemctl" and also rebooted
>the servers SDS21 and SDS24, nothing changes.
>
>This cluster started as Kraken, was updated to Luminous (up to v12.2.5)
>and then to Mimic.
>
>Here is some system-related info, see
>https://semestriel.framapad.org/p/DTkBspmnfU
>
>Somehow I guess this may have to do with the various "ceph-disk",
>"ceph-volume", "ceph-lvm" changes in the last months?!?
>
>Thanks & regards
> Anton
>
>------------------------------------------------------
>
>Sent: Saturday, 28 July 2018 at 00:22
>From: "Bryan Stillwell" <bstillw...@godaddy.com>
>To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>Subject: Re: [ceph-users] v13.2.1 Mimic released
>
>I decided to upgrade my home cluster from Luminous (v12.2.7) to Mimic
>(v13.2.1) today and ran into a couple issues:
>
>1. When restarting the OSDs during the upgrade it seems to forget my
>upmap settings. I had to manually return them to the way they were
>with commands like:
>
>ceph osd pg-upmap-items 5.1 11 18 8 6 9 0
>ceph osd pg-upmap-items 5.1f 11 17
>
>I also saw this when upgrading from v12.2.5 to v12.2.7.
>
>2. Also after restarting the first OSD during the upgrade I saw 21
>messages like these in ceph.log:
>
>2018-07-27 15:53:49.868552 osd.1 osd.1 10.0.0.207:6806/4029643 97 : cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.922365 osd.6 osd.6 10.0.0.16:6804/90400 25 : cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.925585 osd.6 osd.6 10.0.0.16:6804/90400 26 : cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.944414 osd.18 osd.18 10.0.0.15:6808/120845 8 : cluster [WRN] failed to encode map e100467 with expected crc
>2018-07-27 15:53:49.944756 osd.17 osd.17 10.0.0.15:6800/120749 13 : cluster [WRN] failed to encode map e100467 with expected crc
>
>Is this a sign that full OSD maps were sent out by the mons to every
>OSD like back in the hammer days? I seem to remember that OSD maps
>should be a lot smaller now, so maybe this isn't as big of a problem as
>it was back then?
>
>Thanks,
>Bryan
>
>
>From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Sage Weil <sw...@redhat.com>
>Date: Friday, July 27, 2018 at 1:25 PM
>To: "ceph-annou...@lists.ceph.com" <ceph-annou...@lists.ceph.com>,
>"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>,
>"ceph-maintain...@lists.ceph.com" <ceph-maintain...@lists.ceph.com>,
>"ceph-de...@vger.kernel.org" <ceph-de...@vger.kernel.org>
>Subject: [ceph-users] v13.2.1 Mimic released
>
>This is the first bugfix release of the Mimic v13.2.x long term stable
>release series. This release contains many fixes across all components
>of Ceph, including a few security fixes. We recommend that all users upgrade.
>
>Notable Changes
>--------------
>
>* CVE 2018-1128: auth: cephx authorizer subject to replay attack (issue#24836 http://tracker.ceph.com/issues/24836, Sage Weil)
>* CVE 2018-1129: auth: cephx signature check is weak (issue#24837 http://tracker.ceph.com/issues/24837, Sage Weil)
>* CVE 2018-10861: mon: auth checks not correct for pool ops (issue#24838 http://tracker.ceph.com/issues/24838, Jason Dillaman)
>
>For more details and links to various issues and pull requests, please
>refer to the ceph release blog at
>https://ceph.com/releases/13-2-1-mimic-released
>
>Changelog
>---------
>
>* bluestore: common/hobject: improved hash calculation for hobject_t etc (pr#22777, Adam Kupczyk, Sage Weil)
>* bluestore,core: mimic: os/bluestore: don't store/use path_block.{db,wal} from meta (pr#22477, Sage Weil, Alfredo Deza)
>* bluestore: os/bluestore: backport 24319 and 24550 (issue#24550, issue#24502, issue#24319, issue#24581, pr#22649, Sage Weil)
>* bluestore: os/bluestore: fix incomplete faulty range marking when doing compression (pr#22910, Igor Fedotov)
>* bluestore: spdk: fix ceph-osd crash when activate SPDK (issue#24472, issue#24371, pr#22684, tone-zhang)
>* build/ops: build/ops: ceph.git has two different versions of dpdk in the source tree (issue#24942, issue#24032, pr#23070, Kefu Chai)
>* build/ops: build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25065, pr#23178, Kyr Shatskyy)
>* build/ops: build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24766, pr#22851, Dan Mick)
>* build/ops: cmake: enable RTTI for both debug and release RocksDB builds (pr#22299, Igor Fedotov)
>* build/ops: deb/rpm: add python-six as build-time and run-time dependency (issue#24885, pr#22948, Nathan Cutler, Kefu Chai)
>* build/ops: deb,rpm: fix block.db symlink ownership (pr#23246, Sage Weil)
>* build/ops: include: fix build with older clang (OSX target) (pr#23049, Christopher Blum)
>* build/ops: include: fix build with older clang (pr#23034, Kefu Chai)
>* build/ops,rbd: build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, issue#24734, pr#22843, Ilya Dryomov)
>* cephfs: cephfs: allow prohibiting user snapshots in CephFS (issue#24705, issue#24284, pr#22812, "Yan, Zheng")
>* cephfs: cephfs-journal-tool: Fix purging when importing an zero-length journal (issue#24861, pr#22981, yupeng chen, zhongyan gu)
>* cephfs: client: fix bug #24491 _ll_drop_pins may access invalid iterator (issue#24534, pr#22791, Liu Yangkuan)
>* cephfs: client: update inode fields according to issued caps (issue#24539, issue#24269, pr#22819, "Yan, Zheng")
>* cephfs: common/DecayCounter: set last_decay to current time when decoding dec… (issue#24440, issue#24537, pr#22816, Zhi Zhang)
>* cephfs,core: mon/MDSMonitor: do not send redundant MDS health messages to cluster log (issue#24308, issue#24330, pr#22265, Sage Weil)
>* cephfs: mds: add magic to header of open file table (issue#24541, issue#24240, pr#22841, "Yan, Zheng")
>* cephfs: mds: low wrlock efficiency due to dirfrags traversal (issue#24704, issue#24467, pr#22884, Xuehan Xu)
>* cephfs: PurgeQueue sometimes ignores Journaler errors (issue#24533, issue#24703, pr#22810, John Spray)
>* cephfs,rbd: osdc: Fix the wrong BufferHead offset (issue#24583, pr#22869, dongdong tao)
>* cephfs: repeated eviction of idle client until some IO happens (issue#24052, issue#24296, pr#22550, "Yan, Zheng")
>* cephfs: test gets ENOSPC from bluestore block device (issue#24238, issue#24913, issue#24899, issue#24758, pr#22835, Patrick Donnelly, Sage Weil)
>* cephfs,tests: pjd: cd: too many arguments (issue#24310, pr#22882, Neha Ojha)
>* cephfs,tests: qa: client socket inaccessible without sudo (issue#24872, issue#24904, pr#23030, Patrick Donnelly)
>* cephfs,tests: qa: fix ffsb cd argument (issue#24719, issue#24829, issue#24680, issue#24579, pr#22956, Yan, Zheng, Patrick Donnelly)
>* cephfs,tests: qa/suites: Add supported-random-distro$ links (issue#24706, issue#24138, pr#22700, Warren Usui)
>* ceph-volume describe better the options for migrating away from ceph-disk (pr#22514, Alfredo Deza)
>* ceph-volume dmcrypt and activate --all documentation updates (pr#22529, Alfredo Deza)
>* ceph-volume: error on commands that need ceph.conf to operate (issue#23941, pr#22747, Andrew Schoen)
>* ceph-volume expand on the LVM API to create multiple LVs at different sizes (pr#22508, Alfredo Deza)
>* ceph-volume initial take on auto sub-command (pr#22515, Alfredo Deza)
>* ceph-volume lvm.activate Do not search for a MON configuration (pr#22398, Wido den Hollander)
>* ceph-volume lvm.common use destroy-new, doesn't need admin keyring (issue#24585, pr#22900, Alfredo Deza)
>* ceph-volume: provide a nice errror message when missing ceph.conf (pr#22832, Andrew Schoen)
>* ceph-volume tests destroy osds on monitor hosts (pr#22507, Alfredo Deza)
>* ceph-volume tests do not include admin keyring in OSD nodes (pr#22425, Alfredo Deza)
>* ceph-volume tests.functional install new ceph-ansible dependencies (pr#22535, Alfredo Deza)
>* ceph-volume: tests/functional run lvm list after OSD provisioning (issue#24961, pr#23148, Alfredo Deza)
>* ceph-volume tests/functional use Ansible 2.6 (pr#23244, Alfredo Deza)
>* ceph-volume: unmount lvs correctly before zapping (issue#24796, pr#23127, Andrew Schoen)
>* cmake: bump up the required boost version to 1.67 (pr#22412, Kefu Chai)
>* common: common: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh (issue#24865, issue#23492, pr#23024, Sage Weil)
>* common: common: fix typo in rados bench write JSON output (issue#24292, issue#24199, pr#22406, Sandor Zeestraten)
>* common,core: common: partially revert 95fc248 to make get_process_name work (issue#24123, issue#24215, pr#22311, Mykola Golub)
>* common: osd: Change osd_skip_data_digest default to false and make it LEVEL_DEV (pr#23084, Sage Weil, David Zafman)
>* common: tell ... config rm <foo> not idempotent (issue#24468, issue#24408, pr#22552, Sage Weil)
>* core: bluestore: flush_commit is racy (issue#24261, issue#21480, pr#22382, Sage Weil)
>* core: ceph osd safe-to-destroy crashes the mgr (issue#24708, issue#23249, pr#22805, Sage Weil)
>* core: change default filestore_merge_threshold to -10 (issue#24686, issue#24747, pr#22813, Douglas Fuller)
>* core: common/hobject: improved hash calculation (pr#22722, Adam Kupczyk)
>* core: cosbench stuck at booting cosbench driver (issue#24473, pr#22887, Neha Ojha)
>* core: librados: fix buffer overflow for aio_exec python binding (issue#24475, pr#22707, Aleksei Gutikov)
>* core: mon: enable level_compaction_dynamic_level_bytes for rocksdb (issue#24375, issue#24361, pr#22361, Kefu Chai)
>* core: mon/MgrMonitor: change 'unresponsive' message to info level (issue#24246, issue#24222, pr#22333, Sage Weil)
>* core: mon/OSDMonitor: no_reply on MOSDFailure messages (issue#24322, issue#24350, pr#22297, Sage Weil)
>* core: os/bluestore: firstly delete db then delete bluefs if open db met error (pr#22525, Jianpeng Ma)
>* core: os/bluestore: fix races on SharedBlob::coll in ~SharedBlob (issue#24859, issue#24887, pr#23065, Radoslaw Zarzynski)
>* core: osd: choose_acting loop (issue#24383, issue#24618, pr#22889, Neha Ojha)
>* core: osd: do not blindly roll forward to log.head (issue#24597, pr#22997, Sage Weil)
>* core: osd: eternal stuck PG in 'unfound_recovery' (issue#24500, issue#24373, pr#22545, Sage Weil)
>* core: osd: fix deep scrub with osd_skip_data_digest=true (default) and blue… (issue#24922, issue#24958, pr#23094, Sage Weil)
>* core: osd: fix getting osd maps on initial osd startup (pr#22651, Paul Emmerich)
>* core: osd: increase default hard pg limit (issue#24355, pr#22621, Josh Durgin)
>* core: osd: may get empty info at recovery (issue#24771, issue#24588, pr#22861, Sage Weil)
>* core: osd/PrimaryLogPG: rebuild attrs from clients (issue#24768, issue#24805, pr#22960, Sage Weil)
>* core: osd: retry to read object attrs at EC recovery (issue#24406, pr#22394, xiaofei cui)
>* core: osd/Session: fix invalid iterator dereference in Sessoin::have_backoff() (issue#24486, issue#24494, pr#22730, Sage Weil)
>* core: PG: add custom_reaction Backfilled and release reservations after bac… (issue#24332, pr#22559, Neha Ojha)
>* core: set correctly shard for existed Collection (issue#24769, issue#24761, pr#22859, Jianpeng Ma)
>* core,tests: Bring back diff -y for non-FreeBSD (issue#24738, issue#24470, pr#22826, Sage Weil, David Zafman)
>* core,tests: ceph_test_rados_api_misc: fix LibRadosMiscPool.PoolCreationRace (issue#24204, issue#24150, pr#22291, Sage Weil)
>* core,tests: qa/workunits/suites/blogbench.sh: use correct dir name (pr#22775, Neha Ojha)
>* core,tests: Wip scrub omap (issue#24366, issue#24381, pr#22374, David Zafman)
>* core,tools: ceph-detect-init: stop using platform.linux_distribution (issue#18163, pr#21523, Nathan Cutler)
>* core: ValueError: too many values to unpack due to lack of subdir (issue#24617, pr#22888, Neha Ojha)
>* doc: ceph-bluestore-tool manpage not getting rendered correctly (issue#25062, issue#24800, pr#23176, Nathan Cutler)
>* doc: doc: update experimental features - snapshots (pr#22803, Jos Collin)
>* doc: fix the links in releases/schedule.rst (pr#22372, Kefu Chai)
>* doc: [mimic] doc/cephfs: remove lingering "experimental" note about multimds (pr#22854, John Spray)
>* lvm: when osd creation fails log the exception (issue#24456, pr#22640, Andrew Schoen)
>* mgr/dashboard: Fix bug when creating S3 keys (pr#22468, Volker Theile)
>* mgr/dashboard: fix lint error caused by codelyzer update (pr#22713, Tiago Melo)
>* mgr/dashboard: Fix some datatable CSS issues (pr#22274, Volker Theile)
>* mgr/dashboard: Float numbers incorrectly formatted (issue#24081, issue#24707, pr#22886, Stephan Müller, Tiago Melo)
>* mgr/dashboard: Missing breadcrumb on monitor performance counters page (issue#24764, pr#22849, Ricardo Marques, Tiago Melo)
>* mgr/dashboard: Replace Pool with Pools (issue#24699, pr#22807, Lenz Grimmer)
>* mgr: mgr/dashboard: Listen on port 8443 by default and not 8080 (pr#22449, Wido den Hollander)
>* mgr,mon: exception for dashboard in config-key warning (pr#22770, John Spray)
>* mgr,pybind: Python bindings use iteritems method which is not Python 3 compatible (issue#24803, issue#24779, pr#22917, Nathan Cutler)
>* mgr: Sync up ceph-mgr prometheus related changes (pr#22341, Boris Ranto)
>* mon: don't require CEPHX_V2 from mons until nautilus (pr#23233, Sage Weil)
>* mon/OSDMonitor: Respect paxos_propose_interval (pr#22268, Xiaoxi CHEN)
>* osd: forward-port osd_distrust_data_digest from luminous (pr#23184, Sage Weil)
>* osd/OSDMap: fix CEPHX_V2 osd requirement to nautilus, not mimic (pr#23250, Sage Weil)
>* qa/rgw: disable testing on ec-cache pools (issue#23965, pr#23096, Casey Bodley)
>* qa/suites/upgrade/mimic-p2p: allow target version to apply (pr#23262, Sage Weil)
>* qa/tests: added supported distro for powercycle suite (pr#22224, Yuri Weinstein)
>* qa/tests: changed distro symlink to point to new way using supported OSes (pr#22653, Yuri Weinstein)
>* rbd: librbd: deep_copy: resize head object map if needed (issue#24499, issue#24399, pr#22768, Mykola Golub)
>* rbd: librbd: fix crash when opening nonexistent snapshot (issue#24637, issue#24698, pr#22943, Mykola Golub)
>* rbd: librbd: force 'invalid object map' flag on-disk update (issue#24496, issue#24434, pr#22754, Mykola Golub)
>* rbd: librbd: utilize the journal disabled policy when removing images (issue#24388, issue#23512, pr#22662, Jason Dillaman)
>* rbd: Prevent the use of internal feature bits from outside cls/rbd (issue#24165, issue#24203, pr#22222, Jason Dillaman)
>* rbd: rbd-mirror daemon failed to stop on active/passive test case (issue#24390, pr#22667, Jason Dillaman)
>* rbd: [rbd-mirror] entries_behind_master will not be zero after mirror over (issue#24391, issue#23516, pr#22549, Jason Dillaman)
>* rbd: rbd-mirror simple image map policy doesn't always level-load instances (issue#24519, issue#24161, pr#22892, Venky Shankar)
>* rbd: rbd trash purge --threshold should support data pool (issue#24476, issue#22872, pr#22891, Mahati Chamarthy)
>* rbd,tests: qa: krbd_exclusive_option.sh: bump lock_timeout to 60 seconds (issue#25081, pr#23209, Ilya Dryomov)
>* rbd: yet another case when deep copying a clone may result in invalid object map (issue#24596, issue#24545, pr#22894, Mykola Golub)
>* rgw: cls_bucket_list fails causes cascading osd crashes (issue#24631, issue#24117, pr#22927, Yehuda Sadeh)
>* rgw: multisite: RGWSyncTraceNode released twice and crashed in reload (issue#24432, issue#24619, pr#22926, Tianshan Qu)
>* rgw: objects in cache never refresh after rgw_cache_expiry_interval (issue#24346, issue#24385, pr#22643, Casey Bodley)
>* rgw: add configurable AWS-compat invalid range get behavior (issue#24317, issue#24352, pr#22590, Matt Benjamin)
>* rgw: Admin OPS Api overwrites email when user is modified (issue#24253, pr#22523, Volker Theile)
>* rgw: fix gc may cause a large number of read traffic (issue#24807, issue#24767, pr#22941, Xin Liao)
>* rgw: have a configurable authentication order (issue#23089, issue#24547, pr#22842, Abhishek Lekshmanan)
>* rgw: index complete miss zones_trace set (issue#24701, issue#24590, pr#22818, Tianshan Qu)
>* rgw: Invalid Access-Control-Request-Request may bypass validate_cors_rule_method (issue#24809, issue#24223, pr#22935, Jeegn Chen)
>* rgw: meta and data notify thread miss stop cr manager (issue#24702, issue#24589, pr#22821, Tianshan Qu)
>* rgw:-multisite: endless loop in RGWBucketShardIncrementalSyncCR (issue#24700, issue#24603, pr#22815, cfanz)
>* rgw: performance regression for luminous 12.2.4 (issue#23379, issue#24633, pr#22929, Mark Kogan)
>* rgw: radogw-admin reshard status command should print text for reshar… (issue#24834, issue#23257, pr#23021, Orit Wasserman)
>* rgw: "radosgw-admin objects expire" always returns ok even if the pro… (issue#24831, issue#24592, pr#23001, Zhang Shaowen)
>* rgw: require --yes-i-really-mean-it to run radosgw-admin orphans find (issue#24146, issue#24843, pr#22986, Matt Benjamin)
>* rgw: REST admin metadata API paging failure bucket & bucket.instance: InvalidArgument (issue#23099, issue#24813, pr#22933, Matt Benjamin)
>* rgw: set cr state if aio_read err return in RGWCloneMetaLogCoroutine:state_send_rest_request (issue#24566, issue#24783, pr#22880, Tianshan Qu)
>* rgw: test/rgw: fix for bucket checkpoints (issue#24212, issue#24313, pr#22466, Casey Bodley)
>* rgw,tests: add unit test for cls bi list command (issue#24736, issue#24483, pr#22845, Orit Wasserman)
>* tests: mimic - qa/tests: Set ansible-version: 2.4 (issue#24926, pr#23122, Yuri Weinstein)
>* tests: osd sends op_reply out of order (issue#25010, pr#23136, Neha Ojha)
>* tests: qa/tests - added overrides stanza to allow runs on ovh on rhel OS (pr#23156, Yuri Weinstein)
>* tests: qa/tests - added skeleton for mimic point to point upgrades testing (pr#22697, Yuri Weinstein)
>* tests: qa/tests: fix supported distro lists for ceph-deploy (pr#23017, Vasu Kulkarni)
>* tests: qa: wait longer for osd to flush pg stats (issue#24321, pr#22492, Kefu Chai)
>* tests: tests: Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST) in powercycle (issue#25034, pr#23154, Neha Ojha)
>* tests: tests: make test_ceph_argparse.py pass on py3-only systems (issue#24825, issue#24816, pr#22988, Nathan Cutler)
>* tests: upgrade/luminous-x: whitelist REQUEST_SLOW for rados_mon_thrash (issue#25056, issue#25051, pr#23164, Nathan Cutler)
>
>Getting ceph:
>
>* Git at git://github.com/ceph/ceph.git
>* Tarball at http://download.ceph.com/tarballs/ceph-13.2.1.tar.gz
>* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
>* Release git sha1: 5533ecdc0fda920179d7ad84e0aa65a127b20d77
>_______________________________________________
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>--
>To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>the body of a message to majord...@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
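For the "failed to encode map ... with expected crc" warnings quoted above, a rough sketch like the following could tally them per OSD, which makes it easier to see whether every OSD logged them or only a few. The regular expression assumes the ceph.log line layout seen in the five quoted lines and nothing more; count_crc_warnings is a hypothetical helper name, not a Ceph tool.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout, taken from the quoted lines:
#   <date> <time> osd.N osd.N <addr> <seq> : cluster [WRN] failed to encode map eNNN with expected crc
CRC_WARNING = re.compile(
    r"\b(?P<osd>osd\.\d+) \S+ \d+ : cluster \[WRN\] "
    r"failed to encode map e(?P<epoch>\d+) with expected crc"
)


def count_crc_warnings(lines: Iterable[str]) -> Counter:
    """Count crc-mismatch warnings per reporting OSD."""
    counts: Counter = Counter()
    for line in lines:
        match = CRC_WARNING.search(line)
        if match:
            counts[match.group("osd")] += 1
    return counts


sample = [
    "2018-07-27 15:53:49.868552 osd.1 osd.1 10.0.0.207:6806/4029643 97 : cluster [WRN] failed to encode map e100467 with expected crc",
    "2018-07-27 15:53:49.922365 osd.6 osd.6 10.0.0.16:6804/90400 25 : cluster [WRN] failed to encode map e100467 with expected crc",
    "2018-07-27 15:53:49.925585 osd.6 osd.6 10.0.0.16:6804/90400 26 : cluster [WRN] failed to encode map e100467 with expected crc",
]
warning_counts = count_crc_warnings(sample)
```

Passing an open ceph.log file object instead of the sample list works the same way, since the function only iterates over lines.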