This will cap a single BlueFS space allocation. Currently it attempts to allocate 70 GB, which seems to overflow some 32-bit length fields. With the adjustment, such an allocation should be capped at ~700 MB.
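Back-of-the-envelope arithmetic (not Ceph code) for how the ratio change could scale a ~70 GB gift down to ~700 MB, assuming the gift is proportional to main-device capacity; the 0.02 default ratio and the ~3.5 TB device size are assumptions for illustration:

```python
# Hypothetical illustration of bluestore_bluefs_gift_ratio scaling.
GiB = 1024 ** 3

device_size = 3584 * GiB            # assumed ~3.5 TB main device
default_gift = 0.02 * device_size   # assumed default ratio -> ~70 GiB gift
capped_gift = 0.0002 * device_size  # proposed ratio -> ~0.7 GiB gift

print(default_gift / GiB)  # ~71.7 GiB
print(capped_gift / GiB)   # ~0.7 GiB
```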

I doubt there is any relation between this specific failure and the pool, at least at the moment.

In short, the history is: the starting OSD tries to flush BlueFS data to disk, detects a lack of space, and asks the main device for more. The allocation succeeds, but the returned extent has its length field set to 0.
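A minimal sketch (plain Python, not Ceph code) of how a large 64-bit length can come back as zero after passing through a 32-bit field: only the low 32 bits survive, so any length that happens to be an exact multiple of 4 GiB truncates to exactly 0. The 64 GiB figure below is hypothetical, chosen to show that case:

```python
# Illustration only: keep just the low 32 bits of a 64-bit length,
# as a 32-bit length field would.
GiB = 1024 ** 3

length = 64 * GiB                # hypothetical gift, an exact multiple of 4 GiB
truncated = length & 0xFFFFFFFF  # low 32 bits only
print(truncated)                 # 0 -- a "successful" allocation with length 0
```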

On 7/9/2019 8:33 PM, Brett Chancellor wrote:
What does bluestore_bluefs_gift_ratio do? I can't find any documentation on it. Also, do you think this could be related to the .rgw.meta pool having too many objects per PG? The disks that die always seem to be backfilling a PG from that pool, and they have ~550k objects per PG.

-Brett

On Tue, Jul 9, 2019 at 1:03 PM Igor Fedotov <ifedo...@suse.de> wrote:

    Please try to set bluestore_bluefs_gift_ratio to 0.0002


    On 7/9/2019 7:39 PM, Brett Chancellor wrote:
    Too large for pastebin. The problem is continually crashing new OSDs. Here is the latest one.

    On Tue, Jul 9, 2019 at 11:46 AM Igor Fedotov <ifedo...@suse.de> wrote:

        Could you please set debug_bluestore to 20 and collect a startup log for this specific OSD once again?


        On 7/9/2019 6:29 PM, Brett Chancellor wrote:
        I restarted most of the OSDs with the stupid allocator (6 of them
        wouldn't start unless the bitmap allocator was set), but I'm still
        seeing issues with OSDs crashing. Interestingly, it seems that the
        dying OSDs are always working on a PG from the .rgw.meta pool when
        they crash.

        Log : https://pastebin.com/yuJKcPvX

        On Tue, Jul 9, 2019 at 5:14 AM Igor Fedotov <ifedo...@suse.de> wrote:

            Hi Brett,

            in Nautilus you can do that via

            ceph config set osd.N bluestore_allocator stupid

            ceph config set osd.N bluefs_allocator stupid

            See https://ceph.com/community/new-mimic-centralized-configuration-management/
            for more details on the new way of setting configuration options.


            A known issue with the stupid allocator is a gradual increase in
            write request latency, occurring within several days after an OSD
            restart. It is seldom observed, though. There were some posts
            about that behavior on the mailing list this year.

            Thanks,

            Igor.


            On 7/8/2019 8:33 PM, Brett Chancellor wrote:


            I'll give that a try.  Is it something like...
            ceph tell 'osd.*' bluestore_allocator stupid
            ceph tell 'osd.*' bluefs_allocator stupid

            And should I expect any issues doing this?


            On Mon, Jul 8, 2019 at 1:04 PM Igor Fedotov <ifedo...@suse.de> wrote:

                I should have read the call stack more carefully... It's not
                about lacking free space; this is rather the bug from this
                ticket:

                http://tracker.ceph.com/issues/40080


                You should upgrade to v14.2.2 (once it's available) or
                temporarily switch to the stupid allocator as a workaround.


                Thanks,

                Igor



                On 7/8/2019 8:00 PM, Igor Fedotov wrote:

                Hi Brett,

                It looks like BlueStore is unable to allocate additional
                space for BlueFS on the main device. It's either lacking free
                space, or the free space is too fragmented...

                Would you share osd log, please?

                Also please run "ceph-bluestore-tool --path
                <substitute with path-to-osd!!!>
                bluefs-bdev-sizes" and share the output.

                Thanks,

                Igor

                On 7/3/2019 9:59 PM, Brett Chancellor wrote:
                Hi all! Today I've had 3 OSDs stop themselves, and they are
                unable to restart, all with the same error. These OSDs are
                all on different hosts. All are running 14.2.1.

                I did try the following two commands:
                - ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
                  ## This failed with the same error below
                - ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
                  ## After a couple of hours returned...
                2019-07-03 18:30:02.095 7fe7c1c1ef00 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record found, suggest to run store repair to get consistent statistic reports
                fsck success


                ## Error when trying to start one of the OSDs
                 -12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal (Aborted) **
                 in thread 7f5e42366700 thread_name:rocksdb:low0

                 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
                 1: (()+0xf5d0) [0x7f5e50bd75d0]
                 2: (gsignal()+0x37) [0x7f5e4f9ce207]
                 3: (abort()+0x148) [0x7f5e4f9cf8f8]
                 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55a7aaee96ab]
                 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55a7aaee982a]
                 6: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x3c6) [0x55a7ab212a66]
                 7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x74e) [0x55a7ab48253e]
                 8: (BlueFS::_expand_slow_device(unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >&)+0x111) [0x55a7ab59e921]
                 9: (BlueFS::_allocate(unsigned char, unsigned long, bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
                 10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xe5) [0x55a7ab59fce5]
                 11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
                 12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
                 13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
                 14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
                 15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xbaa) [0x55a7abc3b73a]
                 16: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0) [0x55a7abc3f150]
                 17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
                 18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7) [0x55a7aba7fb67]
                 19: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
                 20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
                 21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264) [0x55a7abc8d9c4]
                 22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55a7abc8db4f]
                 23: (()+0x129dfff) [0x55a7abd1afff]
                 24: (()+0x7dd5) [0x7f5e50bcfdd5]
                 25: (clone()+0x6d) [0x7f5e4fa95ead]
                 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

                _______________________________________________
                ceph-users mailing list
                ceph-users@lists.ceph.com
                http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

